[ 
https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183612#comment-16183612
 ] 

Vikas Saurabh edited comment on OAK-6269 at 9/28/17 4:33 AM:
-------------------------------------------------------------

bq. Would be good to have OakDirectory#largeFile run with streaming mode.
So, the current {{BlackHoleBlobStore}} couldn't accommodate a single large blob 
as it couldn't allocate sufficient space on heap (maybe I could've played with 
Xmx to make it work... but that doesn't seem reasonable). Anyway, I adapted that 
store to not keep blobs in memory, but hash calculation is still involved - the 
test passes but takes more time.
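
For reference, a minimal sketch of the idea (class and method names here are 
illustrative, not the actual test code): the store drains the incoming stream 
through a {{DigestInputStream}}, so the content hash is still computed but 
nothing is buffered on heap.

{code:java}
// Illustrative sketch only: a blob store stub that never buffers blob content
// on heap but still pays the hashing cost, similar in spirit to the adapted
// BlackHoleBlobStore used by the largeFile test.
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingBlackHoleStore {

    /** Consumes the stream and returns a content hash usable as a blob id. */
    public String writeBlob(InputStream in) throws IOException {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            try (DigestInputStream din = new DigestInputStream(in, digest)) {
                byte[] buffer = new byte[8192];
                // Drain the stream: bytes are discarded, only the digest is kept.
                while (din.read(buffer) != -1) {
                    // no-op: black hole
                }
            }
            StringBuilder id = new StringBuilder();
            for (byte b : digest.digest()) {
                id.append(String.format("%02x", b));
            }
            return id.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
{code}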

bq. Or better parametrize the OakDirectory test for both modes
With respect to the previous point, tests that use a custom blob size or set 
the blob size while writing don't fit the bill in the streaming case.
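
For what it's worth, the kind of parametrization meant here would look roughly 
like the sketch below (class and field names are made up, not the actual test 
code); the blob-size specific tests are exactly the ones that would need 
special handling in streaming mode.

{code:java}
// Hypothetical sketch of running OakDirectory tests in both modes via JUnit
// parametrization; class and field names are illustrative only.
import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class OakDirectoryModeTest {

    @Parameters(name = "streaming={0}")
    public static Collection<Object[]> modes() {
        return Arrays.asList(new Object[][] {{true}, {false}});
    }

    private final boolean streamingMode;

    public OakDirectoryModeTest(boolean streamingMode) {
        this.streamingMode = streamingMode;
    }

    @Test
    public void writeAndReadBack() {
        // Build the directory with streamingMode deciding single-blob vs chunked
        // storage, then run the shared write/read assertions. Blob-size specific
        // tests would have to be skipped or adjusted in the streaming case.
        System.out.println("running with streaming=" + streamingMode);
    }
}
{code}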

That said, all tests pass (though largeFile takes time :-/) with a few sensible 
changes.

I have taken care of the other points in that comment \[0].

bq. Can we have this as OSGi config similar to support for blobSize
-I couldn't find how blob size is configurable using OSGi... that said, it'd 
probably be too far down the tracks to pass the enable/disable setting from 
{{LuceneIndexProviderService}} to {{BufferedOakDirectory}}. The current logic 
is that BufferedOakDirectory sets up the backing OakDirectory for streaming if 
{{\-Doak.lucene.enableSingleBlobIndexFiles}} is set to true. Currently, the 
default is also true.-
_UPDATE_: This is now exposed as an OSGi property on LuceneIndexProviderService 
that sets a static boolean on BufferedOakDirectory.
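
Roughly, the wiring looks like the sketch below; the property name, class name, 
and the static flag are placeholders here, not necessarily what the actual 
patch uses.

{code:java}
// Illustrative OSGi sketch, not the actual patch: a component reads a boolean
// config property and flips a static flag that the directory implementation
// consults. Property, class, and method names are placeholders.
import java.util.Map;

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;

@Component
public class SingleBlobIndexFilesConfigSketch {

    // Stand-in for the static boolean on BufferedOakDirectory.
    private static volatile boolean enableSingleBlobIndexFiles = true;

    @Activate
    protected void activate(Map<String, Object> config) {
        Object value = config.get("enableSingleBlobIndexFiles");
        // Default stays true when the property is absent.
        enableSingleBlobIndexFiles =
                value == null || Boolean.parseBoolean(String.valueOf(value));
    }

    public static boolean isSingleBlobIndexFilesEnabled() {
        return enableSingleBlobIndexFiles;
    }
}
{code}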

Btw, with the default set to true, all oak-lucene tests pass.

\[0]: Current work at 
https://github.com/catholicon/jackrabbit-oak/compare/trunk...catholicon:OAK-6269-non-chunking-OakDirectory?expand=1



> Support non chunk storage in OakDirectory
> -----------------------------------------
>
>                 Key: OAK-6269
>                 URL: https://issues.apache.org/jira/browse/OAK-6269
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>
>         Attachments: 
> 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 
> 0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 
> 0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
>
>
> Logging this issue based on offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each, so a 1 GB file 
> would be stored in 1000+ chunks of 1 MB.
> This design was done to support direct usage of OakDirectory with Lucene, as 
> Lucene makes use of random IO. Chunked storage allows it to seek to a random 
> position quickly. If the files are stored as Blobs then it's only possible to 
> access them via streaming, which would be slow.
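>
> As a back-of-the-envelope illustration of the chunk addressing this enables 
> (plain arithmetic, not actual OakDirectory code):
> {code:java}
> // With 1 MB chunks a random seek maps to a chunk index plus an offset, so
> // only that one chunk has to be fetched; a single-blob file would have to
> // be streamed from the start instead.
> public class ChunkSeekExample {
>     static final long CHUNK_SIZE = 1024 * 1024; // 1 MB
>
>     public static void main(String[] args) {
>         long position = 750L * 1024 * 1024; // seek to the 750 MB mark of a 1 GB file
>         long chunkIndex = position / CHUNK_SIZE;    // -> chunk 750
>         long offsetInChunk = position % CHUNK_SIZE; // -> 0
>         System.out.println("chunk " + chunkIndex + ", offset " + offsetInChunk);
>     }
> }
> {code}
>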
> As most setups now use copy-on-read and copy-on-write support and rely on a 
> local copy of the index, we can have an implementation which stores the file 
> as a single blob.
> *Pros*
> * Quite a bit of reduction in the number of small blobs stored in the 
> BlobStore, which should reduce the GC time, especially for S3 
> * Reduced overhead of storing a single file in the repository. Instead of an 
> array of 1k blob ids we would store a single blob id
> * Potential improvement in IO cost as the file can be read in one connection 
> and uploaded in one.
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very 
> slow) and we would always need to do a local copy.


