[jira] [Commented] (OAK-6269) Support non chunk storage in OakDirectory

Chetan Mehrotra (JIRA) Tue, 26 Sep 2017 02:50:21 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180556#comment-16180556
 ]


Chetan Mehrotra commented on OAK-6269:
--------------------------------------

bq. I wanted to split OakIndexFile into read and write parts. While that 
made sense for the streaming implementation, the same thing wasn't making much 
sense for the buffered one. Wdyt?

Yeah, let's keep it as is.

bq. there is a configuration which would disable writing of single blob based 
index file. the read side doesn't care about this flag - it respects whatever 
is found in the repo. Something to note: this creates forward incompatibility

Can we have this as an OSGi config, similar to the support for blobSize?

bq. should we enable this by default just now

Yes. Not sure if we can enable it for the default tests, but enable it for those 
tests where CoW is enabled. LucenePropertyIndexTest should be fine to use this.

bq. But, I couldn't quite come up with some benchmark (within oak or ad-hoc)

This may show better performance for S3DataStore.

> Support non chunk storage in OakDirectory
> -----------------------------------------
>
>                 Key: OAK-6269
>                 URL: https://issues.apache.org/jira/browse/OAK-6269
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>
>         Attachments: 
> 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 
> 0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 
> 0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
>
>
> Logging this issue based on offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each. So a 1 GB file 
> would be stored in 1000+ chunks of 1 MB.
> This design was chosen to support direct usage of OakDirectory with Lucene, as 
> Lucene makes use of random I/O. Chunked storage allows it to seek to a random 
> position quickly. If the files are stored as single blobs then it's only 
> possible to access them via streaming, which would be slow.
> As most setups now use copy-on-read and copy-on-write support and rely on a 
> local copy of the index, we can have an implementation which stores the file 
> as a single blob.
> *Pros*
> * Quite a bit of reduction in the number of small blobs stored in BlobStore, 
> which should reduce the GC time, especially for S3 
> * Reduced overhead of storing a single file in the repository. Instead of an 
> array of 1k blobids we would store a single blobid
> * Potential improvement in I/O cost, as a file can be read in one connection 
> and uploaded in one.
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very 
> slow), and we would always need to make a local copy.
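
To make the chunk arithmetic in the description concrete, here is a minimal Java sketch. It is not OakDirectory code: the class and method names are hypothetical, and it assumes "1 MB" means 1 MiB (1048576 bytes) and "1 GB" means 1 GiB. It shows how many chunks such a file needs and how a random seek maps to a chunk index plus an offset inside that chunk, which is what makes chunked storage friendly to Lucene's random I/O.

```java
// Hypothetical illustration only; not part of Oak.
public final class ChunkLayoutSketch {

    /** Number of fixed-size chunks needed to hold a file of the given length. */
    static long chunkCount(long fileLength, int chunkSize) {
        return (fileLength + chunkSize - 1) / chunkSize; // ceiling division
    }

    /** Index of the chunk that contains the given absolute file position. */
    static long chunkIndex(long position, int chunkSize) {
        return position / chunkSize;
    }

    /** Offset of the given absolute file position inside its chunk. */
    static int offsetInChunk(long position, int chunkSize) {
        return (int) (position % chunkSize);
    }

    public static void main(String[] args) {
        int chunkSize = 1 << 20;     // assumed chunk size: 1 MiB
        long fileLength = 1L << 30;  // assumed file size: 1 GiB

        // Chunked layout: one blob id per chunk.
        System.out.println("chunks: " + chunkCount(fileLength, chunkSize));

        // Random seek: only the chunk holding the position has to be loaded.
        long position = 700_000_000L;
        System.out.println("chunk index: " + chunkIndex(position, chunkSize));
        System.out.println("offset in chunk: " + offsetInChunk(position, chunkSize));

        // Single-blob layout accessed purely as a stream: reaching the same
        // position means skipping every byte before it.
        System.out.println("bytes to skip when streaming: " + position);
    }
}
```

With a single blob there is one blob id instead of one per chunk, at the price of losing the cheap chunk lookup, which is why the proposal relies on a local copy of the index.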



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
