[ 
https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikas Saurabh updated OAK-6269:
-------------------------------
    Attachment: 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
                0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
                0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch

[~chetanm], could you please take a look? I've broken the changes into 3 parts:
* [patch1|^0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch] : 
This is basically a refactor that extracts an interface from {{OakIndexFile}}. 
The concrete class is named {{OakBufferedIndexFile}}.
* [patch2|^0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch] : 
This contains the new implementation.
* [patch3|^0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch] : 
Adds tests.

A few things to note:
* I wanted to split {{OakIndexFile}} into read and write parts. While that 
made sense for the streaming implementation, it didn't make much sense for 
the buffered one. Wdyt?
* There is a configuration flag that disables writing of single-blob based 
index files. The read side doesn't care about this flag; it respects whatever 
is found in the repo. Something to note: this creates forward incompatibility 
(older versions won't understand index files written as a single blob).
* Should we enable this by default right away?
* Afaict, with CoW/CoRWithPrefetch, there should be no impact on performance. 
But I couldn't come up with a benchmark (within Oak or ad hoc) where 
I could control Lucene merges of big segments (where this feature should 
shine, or not, the most) and measure the perf. Any ideas?
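
To make the second note concrete, here is a minimal sketch of the intended behaviour, assuming the flag only influences the write path while the read path goes by whatever layout is stored. All names here ({{IndexFileFormatSketch}}, {{StoredFile}}, {{formatForWrite}}, {{formatForRead}}) are hypothetical and are not taken from the patches:

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch (names are NOT from the patches): the flag gates writes only. */
final class IndexFileFormatSketch {
    enum Format { CHUNKED, SINGLE_BLOB }

    /** Stand-in for what the repository holds for one index file. */
    static final class StoredFile {
        final List<String> chunkBlobIds; // populated for the chunked layout
        final String singleBlobId;       // non-null only for the single-blob layout

        private StoredFile(List<String> chunkBlobIds, String singleBlobId) {
            this.chunkBlobIds = chunkBlobIds;
            this.singleBlobId = singleBlobId;
        }

        static StoredFile chunked(List<String> chunkBlobIds) {
            return new StoredFile(chunkBlobIds, null);
        }

        static StoredFile singleBlob(String blobId) {
            return new StoredFile(Collections.<String>emptyList(), blobId);
        }
    }

    /** Write side: the (assumed) configuration flag picks the layout for new files. */
    static Format formatForWrite(boolean singleBlobWritesEnabled) {
        return singleBlobWritesEnabled ? Format.SINGLE_BLOB : Format.CHUNKED;
    }

    /** Read side: ignores the flag and follows whatever layout is found in the repo. */
    static Format formatForRead(StoredFile stored) {
        return stored.singleBlobId != null ? Format.SINGLE_BLOB : Format.CHUNKED;
    }
}
```

With this split, turning the flag off stops new single-blob files from being written, but files already stored as a single blob are still read as such.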

> Support non chunk storage in OakDirectory
> -----------------------------------------
>
>                 Key: OAK-6269
>                 URL: https://issues.apache.org/jira/browse/OAK-6269
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>
>         Attachments: 
> 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 
> 0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 
> 0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
>
>
> Logging this issue based on offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each. So a 1 GB file 
> would be stored in 1000+ chunks of 1 MB.
> This design was chosen to support direct usage of OakDirectory with Lucene, as 
> Lucene makes use of random IO. Chunked storage allows it to seek to a random 
> position quickly. If the files are stored as blobs then it's only possible to 
> access them via streaming, which would be slow.
> As most setups now use copy-on-read and copy-on-write support and rely on 
> a local copy of the index, we can have an implementation which stores the 
> file as a single blob.
> *Pros*
> * Quite a bit of reduction in the number of small blobs stored in the 
> BlobStore, which should reduce the GC time, especially for S3.
> * Reduced overhead of storing a single file in the repository. Instead of an 
> array of 1k blobids we would store a single blobid.
> * Potential improvement in IO cost, as a file can be read in one connection 
> and uploaded in one.
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very 
> slow) and we would always need to make a local copy.
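
The chunk arithmetic behind the description above can be sketched as follows. This is only an illustration, assuming fixed-size chunks of exactly 1024 * 1024 bytes; {{ChunkMath}} and its methods are hypothetical helpers and not Oak classes:

```java
/** Hypothetical helper (NOT an Oak class): arithmetic for fixed-size chunks. */
final class ChunkMath {
    /** Assumed chunk size: 1 MB, taken here as 1024 * 1024 bytes. */
    static final long CHUNK_SIZE = 1024L * 1024L;

    /** Number of chunks needed for a file of the given length (ceiling division). */
    static long chunkCount(long fileLength) {
        return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    /** Index of the chunk that contains the given byte position. */
    static long chunkIndex(long position) {
        return position / CHUNK_SIZE;
    }

    /** Offset of the given byte position inside its chunk. */
    static int offsetInChunk(long position) {
        return (int) (position % CHUNK_SIZE);
    }
}
```

A seek in the chunked layout only needs {{chunkIndex}} and {{offsetInChunk}} to pick the right blob, which is why random access is cheap there. Under the same assumption, a 1 GB file (2^30 bytes) needs {{chunkCount(1L << 30)}} = 1024 chunks, whereas the single-blob layout keeps one blobid per file.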



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
