[ 
https://issues.apache.org/jira/browse/OAK-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093043#comment-16093043
 ] 

Andrei Dulceanu commented on OAK-5902:
--------------------------------------

I took a look at what Netty provides for solving this problem and, as stated in 
an older comment in the code, {{ChunkedInput}} and {{ChunkedWriteHandler}} are 
the candidates to look at. As for the changes to be made, I identified the 
following:
* {{GetBlobResponse}}: instead of a {{byte[]}} array, holds an {{InputStream}} 
reference and a length field
* {{GetBlobRequestHandler}}: doesn't read the whole blob ahead of time, but 
only returns a reference to the underlying {{InputStream}} and its length
* {{GetBlobResponseEncoder}}: since the whole content is no longer read ahead 
of time, computing the content hash becomes a challenge. {{HashingInputStream}} 
could be used here to compute the hash on the fly. This stream would be wrapped 
in a {{ChunkedStream}} and written to the channel, along with the blob length 
and the hash (which comes last).
* {{StandbyServer}}: {{ChunkedWriteHandler}} needs to be added before all other 
handlers.
* {{ResponseDecoder}}: reads the blob length and then the content, chunk by 
chunk, ending up with the {{InputStream}} needed to re-create the 
{{GetBlobResponse}} value object.
* {{StandbyClient}}: {{#getBlob}} should return an {{InputStream}} instead of a 
{{byte[]}}.
* {{StandbyDiff}}: writes the blob to the blob store using the above 
{{InputStream}}, computing the hash on the fly, similar to 
{{GetBlobResponseEncoder}}.
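The hash-on-the-fly idea behind {{GetBlobResponseEncoder}} and {{StandbyDiff}} can be sketched JDK-only, using {{java.security.DigestInputStream}} in place of Guava's {{HashingInputStream}} and an in-memory output in place of the Netty channel; class and method names below are made up for illustration, and the chunk size is arbitrary:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: stream a blob chunk by chunk while computing its hash on the fly.
// The hash is only available after the last chunk has been read, which is
// why it has to be sent last on the wire.
public class StreamingHashSketch {

    static final int CHUNK_SIZE = 8192; // hypothetical chunk size

    // Copies everything from 'in' to 'out' in CHUNK_SIZE pieces and returns
    // the hex-encoded MD5 of the copied content.
    static String copyAndHash(InputStream in, ByteArrayOutputStream out)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (DigestInputStream hashing = new DigestInputStream(in, digest)) {
            byte[] chunk = new byte[CHUNK_SIZE];
            int n;
            while ((n = hashing.read(chunk)) != -1) {
                out.write(chunk, 0, n); // in the real encoder: write to the channel
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

In the real encoder the copy loop would be replaced by a {{ChunkedStream}} wrapping the {{DigestInputStream}}, with {{ChunkedWriteHandler}} driving the reads.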

*Advantages*:
* only {{CHUNK_SIZE}} bytes are sent from the server to the client at any given 
time
* lower memory consumption

*Disadvantages*:
* since the hash of the content can be computed only at the very end, by which 
time the blob has already been written to the blob store, a corrupt chunk will 
only be detected at the end. This needs special handling (i.e. deleting the 
blob from the blob store, or initially writing to a temporary blob store?).
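The "temporary blob store" mitigation above could look roughly like the following JDK-only sketch, where a temporary file stands in for the temporary blob store and the class and method names are invented for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;

// Sketch: stage the incoming blob in a temporary file, verify the hash once
// the stream ends, and only then move it into its final location. A corrupt
// transfer never reaches the real blob store, so no cleanup is needed there.
public class StagedBlobWrite {

    // Copies 'in' to a temp file while hashing on the fly; moves it to 'target'
    // only if the computed MD5 (hex) matches 'expectedHash', otherwise the
    // temp file is deleted and false is returned.
    static boolean writeIfValid(InputStream in, Path target, String expectedHash)
            throws IOException, GeneralSecurityException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        Path tmp = Files.createTempFile("blob-", ".tmp");
        try (DigestInputStream hashing = new DigestInputStream(in, digest)) {
            Files.copy(hashing, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        if (hex.toString().equals(expectedHash)) {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        }
        Files.delete(tmp);
        return false;
    }
}
```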

Another approach would be to implement custom chunking logic in 
{{GetBlobResponseEncoder}}: introduce a {{GetChunkResponse}} containing only 
{{CHUNK_SIZE}} bytes and the corresponding hash, and write a custom handler on 
the client to aggregate the chunks.
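The alternative with per-chunk hashes can be sketched as follows; the {{Chunk}} value class is a stand-in for the proposed {{GetChunkResponse}}, and the tiny {{CHUNK_SIZE}} is chosen purely for illustration. The upside over the streaming variant is that corruption is detected per chunk, on arrival:

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the custom-chunking alternative: the server splits the blob into
// CHUNK_SIZE pieces, each shipped with its own hash, so the client can
// validate every chunk as it arrives instead of only at the very end.
public class ChunkedTransferSketch {

    static final int CHUNK_SIZE = 4; // tiny value for illustration only

    // Hypothetical value object mirroring the proposed GetChunkResponse.
    static class Chunk {
        final byte[] data;
        final byte[] hash;
        Chunk(byte[] data, byte[] hash) { this.data = data; this.hash = hash; }
    }

    // Server side: split the blob and hash each chunk individually.
    static List<Chunk> split(byte[] blob) throws NoSuchAlgorithmException {
        List<Chunk> chunks = new ArrayList<>();
        for (int off = 0; off < blob.length; off += CHUNK_SIZE) {
            byte[] data = Arrays.copyOfRange(blob, off, Math.min(off + CHUNK_SIZE, blob.length));
            chunks.add(new Chunk(data, MessageDigest.getInstance("MD5").digest(data)));
        }
        return chunks;
    }

    // Client side: verify each chunk's hash on arrival, then aggregate.
    static byte[] aggregate(List<Chunk> chunks) throws NoSuchAlgorithmException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : chunks) {
            if (!MessageDigest.isEqual(MessageDigest.getInstance("MD5").digest(c.data), c.hash)) {
                throw new IllegalStateException("corrupt chunk detected early");
            }
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }
}
```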

[~frm], WDYT?


> Cold standby should allow syncing of blobs bigger than 2.2 GB
> -------------------------------------------------------------
>
>                 Key: OAK-5902
>                 URL: https://issues.apache.org/jira/browse/OAK-5902
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>    Affects Versions: 1.6.1
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>            Priority: Minor
>             Fix For: 1.8, 1.7.5
>
>
> Currently there is a limitation on the maximum binary size (in bytes) that 
> can be synced between primary and standby instances. The limit equals 
> {{Integer.MAX_VALUE}} (2,147,483,647) bytes, and no binaries bigger than it 
> can be synced between the instances.
> Per the comment at [1], the current protocol needs to be changed to allow 
> sending binaries in chunks, to overcome this limitation.
> [1] 
> https://github.com/apache/jackrabbit-oak/blob/1.6/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/standby/client/StandbyClient.java#L125



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
