Andrei Dulceanu commented on OAK-5902:

In the end I chose to implement custom chunking logic on both client and server, for the following reasons:

* {{LengthFieldFrameDecoder}} can still be used on the client => no need to 
change the encoding/decoding for the rest of the packets sent between 
client and server (head state, segments, references, etc.)
* Using {{ChunkedStream}} and {{ChunkedWriteHandler}} on the server would have 
required altering the whole decoding logic on the client (removing 
{{LengthFieldFrameDecoder}}, see above), implementing a state machine, etc. 
Before this change, every packet had a header and a payload. After the change, 
all packets except blobs stay as they are, while a blob has a single header for 
the whole blob, followed by {{NO_CHUNKS}} chunks of data without any headers.
* Computing the hash of the {{InputStream}} on the fly with 
{{HashingInputStream}} (from Guava) means that, under the hood, this class 
buffers all the bytes in an internal stream, i.e. a lot of memory is consumed 
on the server side, which defeats the whole chunking concept (on both client 
and server).
* The {{ChunkedStream}} approach also produces chunks that are too small, 
because the chunk size is computed as {{Math.min(this.chunkSize, in.available())}}. 
In practice this means at most ~30 KB per chunk.
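
To make the last point concrete, here is a minimal sketch (not the actual Netty or Oak code; the class and method names are illustrative) of why the quoted computation caps chunks at whatever {{available()}} reports, regardless of the configured chunk size:

```java
// Illustrates the chunk-size computation quoted above:
// ChunkedStream reads Math.min(chunkSize, in.available()) bytes per chunk,
// and available() on a socket stream typically reports only the buffered
// bytes (~30 KB), not the configured chunk size.
public class ChunkSizeDemo {

    // Mirrors the computation Math.min(this.chunkSize, in.available()).
    static int nextChunkSize(int configuredChunkSize, int available) {
        return Math.min(configuredChunkSize, available);
    }

    public static void main(String[] args) {
        int configured = 8 * 1024 * 1024; // hypothetical 8 MB chunk size
        int available = 30_000;           // typical available() on a socket stream
        System.out.println(nextChunkSize(configured, available)); // prints 30000
    }
}
```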

All changes can be tracked in [0]. I'm a bit worried about the additional 
{{-Xmx3584m}} heap size parameter added in {{pom.xml}} for 
{{DataStoreTestBase.testSyncBigBlob}}. This test verifies the sync of a 2.5 GB 
blob and fails without the increased heap size. As a side note, this 
additional test increases the duration of {{ExternalPrivateStoreIT}} from ~30 
seconds to ~180 seconds on my machine, but I guess we can decide later whether 
we want to keep it.
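
The one-header-per-blob framing described in the list above can be sketched roughly as follows. This is an illustration only, not the actual Oak standby code; the class name, method signature, chunk size, and header layout are all assumptions:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Rough sketch of the one-header-per-blob framing: a single header carries
// the total blob length and the chunk count (NO_CHUNKS), and the chunks
// that follow carry no per-chunk headers.
public class BlobFraming {

    // Assumed chunk size; the real value lives in the standby protocol code.
    static final int CHUNK_SIZE = 1024 * 1024;

    static void writeBlob(InputStream blob, long length, DataOutputStream out)
            throws IOException {
        // NO_CHUNKS: number of chunks needed to cover the whole blob.
        long noChunks = (length + CHUNK_SIZE - 1) / CHUNK_SIZE;

        // Single header for the whole blob (hypothetical layout).
        out.writeLong(length);
        out.writeLong(noChunks);

        // Raw chunks of data, no per-chunk header.
        byte[] buf = new byte[CHUNK_SIZE];
        int read;
        while ((read = blob.read(buf)) != -1) {
            out.write(buf, 0, read);
        }
        out.flush();
    }
}
```

Because the header is written once per blob rather than once per packet, the blob length can exceed what a single length-prefixed frame (and hence {{Integer.MAX_VALUE}}) can describe.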

[~frm], could you take a look at the proposed changes?

[0] https://github.com/dulceanu/jackrabbit-oak/tree/cold-standby/chunking

> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -------------------------------------------------------------
>                 Key: OAK-5902
>                 URL: https://issues.apache.org/jira/browse/OAK-5902
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>    Affects Versions: 1.6.1
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>            Priority: Minor
>             Fix For: 1.8, 1.7.6
> Currently there is a limitation for the maximum binary size (in bytes) to be 
> synced between primary and standby instances. This matches 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes and no binaries bigger than this 
> limit can be synced between the instances.
> Per comment at [1], the current protocol needs to be changed to allow sending 
> of binaries in chunks, to surpass this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125
