[jira] [Updated] (OAK-6659) Cold standby should fail loudly when a big blob can't be timely transferred

Davide Giannella (JIRA) Mon, 08 Jan 2018 05:38:01 -0800

     [ 
https://issues.apache.org/jira/browse/OAK-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Davide Giannella updated OAK-6659:
----------------------------------
    Fix Version/s: 1.8.0

> Cold standby should fail loudly when a big blob can't be timely transferred
> ---------------------------------------------------------------------------
>
>                 Key: OAK-6659
>                 URL: https://issues.apache.org/jira/browse/OAK-6659
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar, tarmk-standby
>    Affects Versions: 1.7.6
>            Reporter: Andrei Dulceanu
>            Assignee: Andrei Dulceanu
>            Priority: Critical
>              Labels: cold-standby
>             Fix For: 1.7.8, 1.8.0
>
>         Attachments: OAK-6659.patch
>
>
> Due to changes made in OAK-4969, there are currently two 'sync blob' cycles 
> triggered by {{StandbyDiff#childNodeChanged}}. The test scenario is the same 
> as the one in {{DataStoreTestBase#testSyncBigBlob}}: on the primary file 
> store, a new big blob (1GB) is added and then a standby sync is triggered to 
> sync this content to the secondary file store. 
> The first 'sync blob' cycle happens as a result of {{#process}} being called 
> in {{StandbyDiff#childNodeChanged}}. Therefore, a new 'get blob' request is 
> created on the client and the server starts sending chunks from the big blob. 
> Now, if the time needed for transferring the entire blob from server to 
> client exceeds {{readTimeoutMs}}, an {{IllegalStateException}} will be 
> correctly thrown by {{StandbyDiff#readBlob}}, but it will be swallowed by the 
> catch clause in {{StandbyDiff#childNodeChanged}}. A second 'sync blob' 
> cycle will then be triggered and, if {{readTimeoutMs * 2}} is enough, the 
> blob will be synced on the standby. This happens because the server 
> continues sending the remaining chunks after the {{IllegalStateException}} 
> was thrown in the first 'sync blob' cycle.
> The consequence of these two 'sync blob' cycles is that deleting the 
> temporary file to which chunks are spooled on the client sometimes fails 
> (for example on Windows; see OAK-6641 specifically). As a result, instead of 
> the previous incomplete transfer being deleted, new chunks from the second 
> 'sync blob' cycle are added to it. The blob persisted in the blob store on 
> the client won't have the same size and id as the initial blob sent by the 
> server.
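
The failure mode described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the Oak code: the class SyncBlobSketch and its methods are hypothetical, the read timeout is simulated by always throwing after a fixed number of chunks, and a failed temporary-file deletion is modelled with a boolean flag. Rethrowing the exception is just one possible reading of "fail loudly"; the attached patch may do something different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative simulation only; none of these names exist in Oak. */
final class SyncBlobSketch {

    /** Appends chunkCount chunks of chunkSize bytes to the spool file. */
    static void spoolChunks(Path spool, int chunkCount, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        for (int i = 0; i < chunkCount; i++) {
            Files.write(spool, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    /** First cycle: spools part of the blob, then hits the simulated read timeout. */
    static void firstCycle(Path spool, int chunksBeforeTimeout, int chunkSize) throws IOException {
        spoolChunks(spool, chunksBeforeTimeout, chunkSize);
        throw new IllegalStateException("simulated read timeout");
    }

    /**
     * Runs the first (timed-out) cycle and, if the timeout is swallowed,
     * a second full cycle. Returns the final size of the spool file in bytes.
     */
    static long sync(int totalChunks, int chunksBeforeTimeout, int chunkSize,
                     boolean swallowTimeout, boolean deleteSucceeds) {
        Path spool = null;
        try {
            spool = Files.createTempFile("blob-spool", ".tmp");
            try {
                firstCycle(spool, chunksBeforeTimeout, chunkSize);
            } catch (IllegalStateException e) {
                if (!swallowTimeout) {
                    throw e; // fail loudly: the caller sees the timeout, no second cycle
                }
                // swallowed: execution silently continues with a second cycle
            }
            if (deleteSucceeds) {
                Files.delete(spool); // the incomplete transfer is discarded
            }
            // Second cycle: the whole blob is spooled again. If the delete above
            // was skipped, these chunks land after the stale ones.
            spoolChunks(spool, totalChunks, chunkSize);
            return Files.size(spool);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (spool != null) {
                try {
                    Files.deleteIfExists(spool);
                } catch (IOException ignored) {
                    // best-effort cleanup of the sketch's own temp file
                }
            }
        }
    }
}
```

With a blob of 4 chunks of 8 bytes and a timeout after 2 chunks, sync(4, 2, 8, true, true) returns 32, the expected size. sync(4, 2, 8, true, false) returns 48, because the 16 stale bytes from the first cycle are still in the file, which mirrors the size mismatch reported in this issue. sync(4, 2, 8, false, true) throws the IllegalStateException instead of retrying.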



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
