Thanks for the feedback. Using AbstractBlobStore instead of GridFS is indeed on the list of things I want to try out once the rest of the missing functionality is done in MongoMK. I'll report back once I get a chance to implement that.
-Mete

On 10/17/12 10:26 AM, "Thomas Mueller" <[email protected]> wrote:

>Hi,
>
>As a workaround, you could keep the last few streams open in the Mongo MK
>for some time (a cache) together with the current position. That way seek
>is not required in most cases, as usually binaries are read as a stream.
>
>However, keeping resources open is problematic (we do that in the
>DbDataStore in Jackrabbit, and we ran into various problems), and I would
>avoid it if possible. I would probably use the AbstractBlobStore instead,
>which splits blobs into blocks; I believe that way you can just use
>regular MongoDB features and don't need to use GridFS. But you might want
>to test which approach is faster / easier.
>
>Regards,
>Thomas
>
>
>On 9/26/12 9:48 AM, "Mete Atamel" <[email protected]> wrote:
>
>>Forgot to mention: I could also increase the BufferedInputStream's buffer
>>size to something high to speed up the large blob read. That's probably
>>what I'll do in the short term, but my question is more about whether the
>>optimization I mentioned in my previous email is worth pursuing at some
>>point.
>>
>>Best,
>>Mete
>>
>>On 9/26/12 9:38 AM, "Mete Atamel" <[email protected]> wrote:
>>
>>>Hi,
>>>
>>>I realized that MicroKernelIT#testBlobs takes a while to complete on
>>>MongoMK. This is partly due to how the test was written and partly due
>>>to how the blob read offset is implemented in MongoMK. I'm looking for
>>>feedback on where to fix this.
>>>
>>>To give you an idea of testBlobs: it first writes a blob using the MK.
>>>Then it verifies that the blob bytes were written correctly by reading
>>>the blob back from the MK. However, the blob read from the MK is not
>>>done in one shot. Instead, it's done via this input stream:
>>>
>>>InputStream in2 = new BufferedInputStream(new MicroKernelInputStream(mk,
>>>id));
>>>
>>>MicroKernelInputStream reads from the MK, and BufferedInputStream
>>>buffers the reads in 8K chunks. Then there's a while loop with
>>>in2.read() to read the blob fully. This makes a call to the
>>>MicroKernel#read method with the right offset for every 8K chunk until
>>>the blob bytes are fully read.
>>>
>>>This is not a problem for small blobs, but for bigger blobs, reading in
>>>8K chunks can be slow, because in MongoMK every read with an offset
>>>triggers the following:
>>>-Find the blob in GridFS
>>>-Retrieve its input stream
>>>-Skip to the right offset
>>>-Read 8K
>>>-Close the input stream
>>>
>>>I could fix this by changing the test to read the blob bytes in one
>>>shot and then do the comparison. However, I was wondering if we should
>>>also work on an optimization for successive reads from the blob with
>>>incremental offsets. Maybe we could keep the input streams of recently
>>>read blobs around for some time before closing them?
>>>
>>>Best,
>>>Mete
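
For reference, below is a minimal sketch of the read-with-offset path Mete
describes above: find the blob in GridFS, retrieve its stream, skip to the
offset, read one chunk, close. It uses the GridFS classes of the MongoDB
Java driver (com.mongodb.gridfs); the surrounding class and method names
are illustrative, not actual MongoMK code.

import java.io.IOException;
import java.io.InputStream;

import com.mongodb.DB;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;

// Illustrative sketch of the per-call overhead described above: every
// read with an offset re-opens the blob from scratch.
public class GridFSBlobReader {

    private final GridFS gridFS;

    public GridFSBlobReader(DB db) {
        this.gridFS = new GridFS(db);
    }

    // Called once per 8K chunk by the BufferedInputStream in the test.
    public int read(String blobId, long blobOffset, byte[] buff, int off,
            int length) throws IOException {
        GridFSDBFile file = gridFS.findOne(blobId);   // find the blob in GridFS
        if (file == null) {
            throw new IOException("Blob not found: " + blobId);
        }
        InputStream in = file.getInputStream();       // retrieve its input stream
        try {
            long skipped = 0;
            while (skipped < blobOffset) {            // skip to the right offset
                long n = in.skip(blobOffset - skipped);
                if (n <= 0) {
                    return -1;                        // offset is past the end
                }
                skipped += n;
            }
            return in.read(buff, off, length);        // read (typically 8K)
        } finally {
            in.close();                               // close the input stream
        }
    }
}

Every 8K chunk repeats all five steps, including skipping over everything
already read, which is why reading a large blob through many small offset
reads is slow.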

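And a rough sketch of the stream cache Thomas suggests as a workaround:
keep the last few streams open, keyed by blob id, together with the
current position, so sequential reads can continue without reopening and
re-skipping. All names here are hypothetical; a real version would also
need time-based eviction and would have to deal with the resource-leak
issues Thomas mentions from the DbDataStore.

import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache of the last few open blob streams, with positions.
public class BlobStreamCache {

    private static final int MAX_OPEN = 4;

    private static final class OpenStream {
        final InputStream in;
        long pos;
        OpenStream(InputStream in, long pos) { this.in = in; this.pos = pos; }
    }

    // Abstracts "find blob + getInputStream" (e.g. the GridFS lookup above).
    interface StreamOpener {
        InputStream open(String blobId) throws IOException;
    }

    // LRU map: closes and evicts the eldest stream once MAX_OPEN is exceeded.
    private final Map<String, OpenStream> cache =
            new LinkedHashMap<String, OpenStream>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, OpenStream> e) {
                    if (size() > MAX_OPEN) {
                        try { e.getValue().in.close(); } catch (IOException ignore) { }
                        return true;
                    }
                    return false;
                }
            };

    public synchronized int read(String blobId, long blobOffset, byte[] buff,
            int off, int length, StreamOpener opener) throws IOException {
        OpenStream s = cache.get(blobId);
        if (s == null || s.pos > blobOffset) {
            // cache miss, or the caller seeked backwards: (re-)open the stream
            if (s != null) {
                s.in.close();
            }
            s = new OpenStream(opener.open(blobId), 0);
            cache.put(blobId, s);
        }
        while (s.pos < blobOffset) {       // forward seek within the open stream
            long n = s.in.skip(blobOffset - s.pos);
            if (n <= 0) {
                return -1;
            }
            s.pos += n;
        }
        int read = s.in.read(buff, off, length);
        if (read > 0) {
            s.pos += read;                 // remember position for the next chunk
        }
        return read;
    }
}

With this, a sequential scan in 8K chunks hits the open stream at exactly
the cached position every time, so no seek is required, matching the
behavior Thomas describes.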