[ https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Mueller updated OAK-6269:
--------------------------------
    Sprint: L11  (was: L10)

> Support non chunk storage in OakDirectory
> -----------------------------------------
>
>                 Key: OAK-6269
>                 URL: https://issues.apache.org/jira/browse/OAK-6269
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>
>
> Logging this issue based on an offline discussion with [~catholicon].
>
> Currently OakDirectory stores files in chunks of 1 MB each, so a 1 GB file
> would be stored in 1000+ chunks of 1 MB.
>
> This design was chosen to support direct usage of OakDirectory with Lucene,
> as Lucene makes use of random I/O. Chunked storage allows it to seek to a
> random position quickly. If the files are stored as blobs, then it is only
> possible to access them via streaming, which would be slow.
>
> As most setups now use copy-on-read and copy-on-write support and rely on a
> local copy of the index, we can have an implementation which stores the file
> as a single blob.
>
> *Pros*
> * A significant reduction in the number of small blobs stored in the
>   BlobStore, which should reduce GC time, especially for S3
> * Reduced overhead of storing a single file in the repository. Instead of an
>   array of 1k blobids we would store a single blobid
> * Potential improvement in I/O cost, as the file can be read in one
>   connection and uploaded in one
>
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very
> slow), and we would always need to make a local copy.
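
To make the chunk arithmetic in the description concrete, here is a minimal Java sketch. It is not OakDirectory code: the class `ChunkMath` and its method names are hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes and that 1 GB means 1024 MB. It shows how many chunks (and therefore blob ids) a file needs, and how a random seek position maps to a chunk, which is why chunked storage can serve random reads without streaming from the start of the file.

```java
// Hypothetical illustration only; not part of Oak or Lucene.
final class ChunkMath {
    // Assumed chunk size: 1 MB taken as 1024 * 1024 bytes.
    static final long CHUNK_SIZE = 1024L * 1024L;

    // Number of chunks needed to hold a file of the given length (rounds up).
    static long chunkCount(long fileLength) {
        return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    // Index of the chunk that contains the given absolute file position.
    static long chunkIndex(long position) {
        return position / CHUNK_SIZE;
    }

    // Offset of the given absolute file position within its chunk.
    static long offsetInChunk(long position) {
        return position % CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * CHUNK_SIZE; // assumed: 1 GB = 1024 MB
        System.out.println("chunks for 1 GB: " + chunkCount(oneGb));

        long position = 5_000_000L; // arbitrary seek target
        System.out.println("chunk index: " + chunkIndex(position));
        System.out.println("offset in chunk: " + offsetInChunk(position));
    }
}
```

Under these assumptions a 1 GB file maps to 1024 chunks, which matches the "1000+" figure above. With single-blob storage the same file would need one blob id, but a seek would require streaming up to the target position unless a local copy is used.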