[
https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180553#comment-16180553
]
Chetan Mehrotra edited comment on OAK-6269 at 9/26/17 10:03 AM:
----------------------------------------------------------------
[~catholicon] Patch looks good. Some feedback below
{noformat}
+ /**
+ * @return if the file implementation supports copying data from {@link DataInput} directly.
+ */
+ boolean supportsCopying();
{noformat}
Maybe name this {{supportsCopyFromDataInput}} instead.
{noformat}
+ /** Copy numBytes bytes from input to ourself. */
+ public void copyBytes(DataInput input, long numBytes) throws IOException {
{noformat}
Add {{@Override}} here.
Some other points:
* Earlier we saw some bugs due to integer overflow. It would be good to have
OakDirectory#largeFile run in streaming mode, or better, to parametrize the
OakDirectory test for both modes.
* Checking the test coverage of OakStreamingIndexFile shows no coverage for the
following and a few others. It would be good to have higher test coverage for
this class:
** {{copyBytes}}
** uniqueKey clause
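To illustrate the integer overflow concern, here is a minimal sketch of a copy loop that keeps the remaining byte count as a {{long}} and narrows to {{int}} only for each buffer-sized step. It is not the code from the patch: the class name {{CopySketch}}, the 8 KB buffer, and the use of plain {{java.io}} streams in place of Lucene's {{DataInput}} are all assumptions made to keep the example self-contained.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of Oak or Lucene.
class CopySketch {

    /**
     * Copies exactly numBytes bytes from in to out and returns the number copied.
     * The remaining count stays a long, so lengths above Integer.MAX_VALUE
     * (about 2 GB) do not wrap around.
     */
    public static long copyBytes(InputStream in, OutputStream out, long numBytes)
            throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = numBytes;
        while (remaining > 0) {
            // Safe narrowing: the result is never larger than buffer.length.
            int toRead = (int) Math.min((long) buffer.length, remaining);
            int read = in.read(buffer, 0, toRead);
            if (read < 0) {
                throw new EOFException("Input ended with " + remaining + " bytes left to copy");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
        return numBytes - remaining;
    }
}
```

A test for such a method would ideally cover a length above 2 GB as well as the short-input case, which is where a parametrized large-file test could help.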
> Support non chunk storage in OakDirectory
> -----------------------------------------
>
> Key: OAK-6269
> URL: https://issues.apache.org/jira/browse/OAK-6269
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: lucene
> Reporter: Chetan Mehrotra
> Assignee: Vikas Saurabh
> Fix For: 1.8
>
> Attachments:
> 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch,
> 0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch,
> 0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
>
>
> Logging this issue based on offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each, so a 1 GB file
> would be stored in 1000+ chunks of 1 MB.
> This design was chosen to support direct usage of OakDirectory with Lucene, as
> Lucene makes use of random IO. Chunked storage allows it to seek to a random
> position quickly. If the files are stored as blobs then it is only possible to
> access them via streaming, which would be slow.
> As most setups now use copy-on-read and copy-on-write support and rely on a
> local copy of the index, we can have an implementation which stores the file
> as a single blob.
> *Pros*
> * Considerable reduction in the number of small blobs stored in the BlobStore,
> which should reduce the GC time, especially for S3
> * Reduced overhead of storing a single file in the repository. Instead of an
> array of 1k blobids we would store a single blobid
> * Potential improvement in IO cost, as a file can be read in one connection and
> uploaded in one.
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very
> slow) and we would always need to make a local copy.
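
The chunk arithmetic described in the issue can be sketched as follows. This is an illustration only, assuming a chunk size of exactly 1 MiB (2^20 bytes); the class and method names are hypothetical and not taken from OakDirectory.

```java
// Hypothetical illustration of chunked-file arithmetic, not Oak code.
class ChunkMath {

    /** Number of chunks needed to hold fileLength bytes (ceiling division). */
    public static long chunkCount(long fileLength, int chunkSize) {
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    /** Index of the chunk containing the given absolute file position. */
    public static long chunkIndex(long position, int chunkSize) {
        return position / chunkSize;
    }

    /** Offset of the given absolute file position within its chunk. */
    public static int offsetInChunk(long position, int chunkSize) {
        // The remainder is always smaller than chunkSize, so it fits in an int.
        return (int) (position % chunkSize);
    }
}
```

With these definitions a 1 GiB file ({{1L << 30}} bytes) maps to 1024 chunks, and a random seek only needs to load the one chunk that {{chunkIndex}} selects. With a single blob, the same seek would mean streaming from the start of the blob, which is why the proposal relies on a local copy of the index.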