On Wed, Oct 17, 2012 at 11:08 AM, Michael Dürig <[email protected]> wrote:
>
>
> On 17.10.12 10:03, Stefan Guggisberg wrote:
>>
>> On Wed, Oct 17, 2012 at 10:42 AM, Michael Dürig <[email protected]>
>> wrote:
>>>
>>>
>>> I wonder why the Microkernel API has an asymmetry here: for writing a
>>> binary you can pass a stream whereas for reading you need to pass a
>>> byte array.
>>
>>
>> the write method implies content-addressable storage for blobs,
>> i.e. identical binary content is identified by identical identifiers.
>> the identifier needs to be computed from the entire blob content.
>> that's why the signature takes a stream rather than supporting
>> chunked writes.
>
>
> Makes sense so far but this is only half of the story ;-) Why couldn't
> the read method also return a stream?
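A minimal sketch of the content-addressable write Stefan describes above:
the blob id is a digest over the full content, so write() must consume the
entire stream before an identifier exists at all, which is what rules out
chunked writes. SHA-256 and all names below are illustrative assumptions,
not the actual MicroKernel implementation.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Illustrative sketch only: the blob id is a digest over the *entire*
    // content, so the whole stream has to be consumed before the id is
    // known; a chunked write could not return an id per chunk.
    class ContentAddressedWrite {

        static String write(InputStream in, OutputStream store)
                throws IOException, NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                md.update(buf, 0, n);   // the digest sees every byte...
                store.write(buf, 0, n); // ...while the content is persisted
            }
            // hex-encode the digest and use it as the blob id
            StringBuilder id = new StringBuilder();
            for (byte b : md.digest()) {
                id.append(String.format("%02x", b));
            }
            return id.toString();
        }
    }

Writing the same bytes twice yields the same id, which is the deduplication
property Stefan refers to.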
it could, but then why should it? for cosmetic reasons?

personally i prefer the current signature for cleaner semantics and
ease of implementation.

cheers
stefan

>
> Michael
>
>
>>
>> cheers
>> stefan
>>
>>>
>>> Michael
>>>
>>>
>>> On 26.9.12 8:38, Mete Atamel wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I realized that MicroKernelIT#testBlobs takes a while to complete on
>>>> MongoMK. This is partly due to how the test was written and partly
>>>> due to how the blob read offset is implemented in MongoMK. I'm
>>>> looking for feedback on where to fix this.
>>>>
>>>> To give you an idea of testBlobs, it first writes a blob using MK.
>>>> Then it verifies that the blob bytes were written correctly by
>>>> reading the blob from MK. However, the blob read from MK is not done
>>>> in one shot. Instead, it's done via this input stream:
>>>>
>>>> InputStream in2 = new BufferedInputStream(new MicroKernelInputStream(mk, id));
>>>>
>>>> MicroKernelInputStream reads from the MK and BufferedInputStream
>>>> buffers the reads in 8K chunks. Then there's a while loop with
>>>> in2.read() to read the blob fully. This makes a call to the
>>>> MicroKernel#read method with the right offset for every 8K chunk
>>>> until the blob bytes are fully read.
>>>>
>>>> This is not a problem for small blob sizes, but for bigger blob
>>>> sizes, reading 8K chunks can be slow because in MongoMK, every read
>>>> with an offset triggers the following:
>>>> - Find the blob in GridFS
>>>> - Retrieve its input stream
>>>> - Skip to the right offset
>>>> - Read 8K
>>>> - Close the input stream
>>>>
>>>> I could fix this by changing the test to read the blob bytes in one
>>>> shot and then do the comparison. However, I was wondering if we
>>>> should also work on an optimization for successive reads from the
>>>> blob with incremental offsets? Maybe we could keep the input streams
>>>> of recently read blobs around for some time before closing them?
>>>>
>>>> Best,
>>>> Mete
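A minimal sketch of the optimization Mete proposes: keep the stream of a
recently read blob open, remembering its position, so successive
MicroKernel#read calls with incrementing offsets (exactly what the 8K
BufferedInputStream loop produces) reuse one stream instead of
find/skip/close per chunk. The BlobStore interface and all names here are
hypothetical, not MongoMK's actual code; a real version would also need
thread safety and time-based eviction.

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: cache the open blob stream together with its
    // current position so sequential offset reads become one pass over
    // the blob. Not thread-safe; no expiry of idle streams.
    class CachedBlobReader {

        interface BlobStore {
            InputStream open(String blobId) throws IOException; // e.g. GridFS lookup
        }

        private static class OpenStream {
            InputStream in;
            long pos;
        }

        private final BlobStore store;
        private final Map<String, OpenStream> cache = new HashMap<>();

        CachedBlobReader(BlobStore store) {
            this.store = store;
        }

        int read(String blobId, long blobOffset, byte[] buff, int off, int length)
                throws IOException {
            OpenStream s = cache.get(blobId);
            if (s == null || s.pos > blobOffset) {
                // no cached stream, or the caller seeked backwards: (re)open
                if (s != null) {
                    s.in.close();
                }
                s = new OpenStream();
                s.in = store.open(blobId);
                s.pos = 0;
                cache.put(blobId, s);
            }
            // skip forward to the requested offset; a no-op for sequential reads
            while (s.pos < blobOffset) {
                long skipped = s.in.skip(blobOffset - s.pos);
                if (skipped <= 0) {
                    return -1; // simplified: treat as offset beyond end of blob
                }
                s.pos += skipped;
            }
            int n = s.in.read(buff, off, length);
            if (n > 0) {
                s.pos += n;
            }
            return n;
        }
    }

With something like this in place, the testBlobs read loop degenerates to a
single sequential pass over each blob, at the cost of holding streams open
until they are evicted.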
