On Wed, Oct 17, 2012 at 10:42 AM, Michael Dürig <[email protected]> wrote:
>
> I wonder why the MicroKernel API has an asymmetry here: for writing a binary
> you can pass a stream whereas for reading you need to pass a byte array.
The write method implies content-addressable storage for blobs, i.e.
identical binary content is identified by identical identifiers. The
identifier needs to be computed from the entire blob content. That's why
the signature takes a stream rather than supporting chunked writes.

Cheers
Stefan

>
> Michael
>
>
> On 26.9.12 8:38, Mete Atamel wrote:
>>
>> Hi,
>>
>> I realized that MicroKernelIT#testBlobs takes a while to complete on
>> MongoMK. This is partly due to how the test was written and partly due to
>> how the blob read offset is implemented in MongoMK. I'm looking for
>> feedback on where to fix this.
>>
>> To give you an idea of testBlobs, it first writes a blob using the MK.
>> Then it verifies that the blob bytes were written correctly by reading the
>> blob back from the MK. However, the blob is not read from the MK in one
>> shot. Instead, it's read via this input stream:
>>
>> InputStream in2 = new BufferedInputStream(new MicroKernelInputStream(mk, id));
>>
>> MicroKernelInputStream reads from the MK, and BufferedInputStream buffers
>> the reads in 8K chunks. Then there's a while loop with in2.read() that
>> reads the blob fully. This calls the MicroKernel#read method with the
>> appropriate offset for every 8K chunk until the blob bytes are fully read.
>>
>> This is not a problem for small blobs, but for bigger blobs reading 8K
>> chunks can be slow, because in MongoMK every read with an offset triggers
>> the following:
>> - Find the blob in GridFS
>> - Retrieve its input stream
>> - Skip to the right offset
>> - Read 8K
>> - Close the input stream
>>
>> I could fix this by changing the test to read the blob bytes in one shot
>> and then do the comparison. However, I was wondering whether we should
>> also work on an optimization for successive reads from the same blob with
>> incremental offsets. Maybe we could keep the input streams of recently
>> read blobs around for some time before closing them?
>>
>> Best,
>> Mete
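
To make Stefan's point concrete, here is a minimal sketch of a
content-addressable write: the blob id can only be derived once the whole
stream has been consumed, which is why the write signature takes an
InputStream rather than supporting chunked or partial writes. The SHA-256
choice, class name, and hex encoding below are illustrative assumptions,
not the actual MicroKernel or MongoMK implementation.

    import java.io.IOException;
    import java.io.InputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class ContentAddressableWriteSketch {

        // Returns an identifier derived from the full blob content, so two
        // writes of identical bytes yield the same id.
        public static String computeBlobId(InputStream in)
                throws IOException, NoSuchAlgorithmException {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int n;
            // The id cannot be known until the stream has been read to the
            // end, so the entire content must be consumed before the write
            // can return an identifier.
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }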

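Along the lines of the optimization Mete proposes, a rough sketch of keeping
the open stream of a recently read blob around so that successive reads with
incremental offsets reuse it instead of re-opening the blob and skipping
ahead each time. The class name, the StreamFactory interface, and the
eviction-free map are assumptions for illustration; a real version would
also close streams that have been idle for some time.

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.HashMap;
    import java.util.Map;

    public class BlobStreamCacheSketch {

        // Minimal state kept per blob: the open stream and the offset it is
        // currently positioned at.
        private static class OpenBlob {
            InputStream in;
            long pos;
        }

        // Hypothetical hook for the backend, e.g. looking the blob up in
        // GridFS and returning its input stream.
        public interface StreamFactory {
            InputStream open(String blobId) throws IOException;
        }

        private final Map<String, OpenBlob> openBlobs = new HashMap<String, OpenBlob>();
        private final StreamFactory factory;

        public BlobStreamCacheSketch(StreamFactory factory) {
            this.factory = factory;
        }

        // Mirrors the shape of MicroKernel#read(blobId, pos, buff, off, length):
        // sequential 8K reads with incremental offsets hit the cached stream
        // instead of re-opening and re-skipping the blob every time.
        public synchronized int read(String blobId, long pos, byte[] buff,
                int off, int length) throws IOException {
            OpenBlob blob = openBlobs.get(blobId);
            if (blob == null || blob.pos != pos) {
                // Cache miss or non-sequential read: open a fresh stream and
                // skip to the requested offset, as MongoMK does today.
                if (blob != null) {
                    blob.in.close();
                }
                blob = new OpenBlob();
                blob.in = factory.open(blobId);
                long toSkip = pos;
                while (toSkip > 0) {
                    long skipped = blob.in.skip(toSkip);
                    if (skipped <= 0) {
                        break;
                    }
                    toSkip -= skipped;
                }
                blob.pos = pos;
                openBlobs.put(blobId, blob);
            }
            int read = blob.in.read(buff, off, length);
            if (read > 0) {
                blob.pos += read;
            }
            return read;
        }
    }

With something like this in place, the 8K-at-a-time loop in testBlobs would
only pay the GridFS lookup and skip cost once per blob rather than once per
chunk, while non-sequential reads fall back to the current behaviour.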