[
https://issues.apache.org/jira/browse/OAK-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836699#comment-16836699
]
Henry Saginor edited comment on OAK-8275 at 5/9/19 9:35 PM:
------------------------------------------------------------
See the specification of the reset method
https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#reset()
Specifically this - "If the method mark has not been called since the stream
was created, or the number of bytes read from the stream since mark was last
called is larger than the argument to mark at that last call, then an
IOException might be thrown."
Basically, the way I read this is that if you want to move the position to the
end of the stream and then reset it to the starting position (something the
Commons Compress library actually does), then you need to call the mark method
when you start processing, with a readlimit argument equal to the blob's
size+1. Otherwise you can get an IOException when you call reset, because the
marked position will have been invalidated. Since mark takes an int, readlimit
is capped at Integer.MAX_VALUE (i.e. blobs of roughly 2 GB). In a practical
sense, the readlimit argument should probably be a lot smaller than this to
avoid memory issues. In my code I have set the max readlimit to 1 GB, which is
probably still too large. But we can discuss this if/when the general approach
is accepted.
So this is why the InputStream wrapper would be limited in the blob size it
can support. I hope this clarifies it.
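A minimal sketch of the mark/reset pattern described above, using only java.io. The class, method names and the 1 GiB MAX_READ_LIMIT constant are hypothetical (the cap only mirrors the limit mentioned in the comment); the actual wrapper lives in the linked branch and may differ. BufferedInputStream is used because it honours mark() by keeping up to readlimit bytes buffered in memory, which is exactly where the memory concern comes from.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetSketch {

    // Hypothetical cap on readlimit, mirroring the 1 GB limit from the comment.
    static final int MAX_READ_LIMIT = 1 << 30; // 1 GiB

    /**
     * Returns the readlimit needed to read a blob of blobSize bytes to the end
     * and still be able to reset(). mark(int) cannot express more than
     * Integer.MAX_VALUE, and we also refuse anything above the hypothetical cap.
     */
    static int readLimitFor(long blobSize) {
        long needed = blobSize + 1;
        if (needed > Integer.MAX_VALUE || needed > MAX_READ_LIMIT) {
            throw new IllegalArgumentException(
                    "Blob too large for mark/reset: " + blobSize + " bytes");
        }
        return (int) needed;
    }

    /** Reads the stream to the end, then rewinds to the marked start; returns bytes read. */
    static long readToEndAndRewind(InputStream in, long blobSize) throws IOException {
        in.mark(readLimitFor(blobSize));
        long count = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            count += n;
        }
        // Valid only because no more than readlimit bytes were read since mark().
        in.reset();
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example blob content".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        long read = readToEndAndRewind(in, data.length);
        int first = in.read(); // back at the start of the stream
        System.out.println(read + " " + (char) first);
    }
}
```

Running main should print "20 e". For a multi-gigabyte blob, readLimitFor fails up front, which is the size limitation the comment describes.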
> Add NIO channel access to JCR binaries
> --------------------------------------
>
> Key: OAK-8275
> URL: https://issues.apache.org/jira/browse/OAK-8275
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Reporter: Henry Saginor
> Priority: Major
>
> This is a follow-up to the discussion started in OAK-8186. Currently, JCR
> binaries can only be accessed via InputStream, which is inefficient and can
> also be inadequate for some use cases. For example, handling some ZIP file
> formats, such as Deflate64, requires random access.
> The proposal is to add an API that returns a SeekableByteChannel.
> Here is the new API I am proposing:
>
> [https://github.com/hsaginor/jackrabbit/blob/createChannel/jackrabbit-api/src/main/java/org/apache/jackrabbit/api/ChannelBinary.java]
>
> [https://github.com/hsaginor/jackrabbit-oak/blob/createChannel2/oak-api/src/main/java/org/apache/jackrabbit/oak/api/Blob.java]
> (see the two added methods)
> And here are all of the implementation changes:
>
> [https://github.com/apache/jackrabbit-oak/compare/trunk...hsaginor:createChannel2]
>
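For contrast with the InputStream approach, here is a minimal sketch of random access through java.nio.channels.SeekableByteChannel. It uses Files.newByteChannel on a temporary file as a stand-in for a channel obtained from a JCR binary; the proposed ChannelBinary/Blob methods are in the links above and are not reproduced here. ChannelTailSketch and readTail are hypothetical names. ZIP archives keep their central directory at the end of the file, so readers typically seek to the tail first (Commons Compress's ZipFile can be constructed from a SeekableByteChannel). With a channel, rewinding is just another position() call, so no mark/readlimit bookkeeping or in-memory buffering is involved.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelTailSketch {

    /** Reads up to n bytes from the end of the channel, then moves back to the start. */
    static byte[] readTail(SeekableByteChannel ch, int n) throws IOException {
        long size = ch.size();
        int len = (int) Math.min(n, size);
        ch.position(size - len); // jump straight to the tail without reading what precedes it
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            if (ch.read(buf) == -1) {
                break;
            }
        }
        ch.position(0); // "reset" is just repositioning; nothing is buffered
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blob", ".bin");
        try {
            Files.write(tmp, "header...trailer".getBytes(StandardCharsets.US_ASCII));
            try (SeekableByteChannel ch = Files.newByteChannel(tmp, StandardOpenOption.READ)) {
                byte[] tail = readTail(ch, 7);
                System.out.println(new String(tail, StandardCharsets.US_ASCII) + " " + ch.position());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Running main should print "trailer 0". Because position() takes a long, this approach has no equivalent of the Integer.MAX_VALUE readlimit ceiling.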
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)