Andrei Dulceanu created OAK-6659:
------------------------------------
Summary: Cold standby should fail loudly when a big blob can't be
timely transferred
Key: OAK-6659
URL: https://issues.apache.org/jira/browse/OAK-6659
Project: Jackrabbit Oak
Issue Type: Bug
Components: segment-tar, tarmk-standby
Affects Versions: 1.7.6
Reporter: Andrei Dulceanu
Assignee: Andrei Dulceanu
Priority: Critical
Fix For: 1.7.8
Due to changes made in OAK-4969, {{StandbyDiff#childNodeChanged}} currently
triggers two 'sync blob' cycles. The test scenario is the same as the one in
{{DataStoreTestBase#testSyncBigBlob}}: a new big blob (1GB) is added on the
primary file store, and then a standby sync is triggered to sync this content
to the secondary file store.
The first 'sync blob' cycle happens when {{#process}} is called in
{{StandbyDiff#childNodeChanged}}. This creates a new 'get blob' request on the
client, and the server starts sending chunks of the big blob.
If transferring the entire blob from server to client takes longer than
{{readTimeoutMs}}, {{StandbyDiff#readBlob}} correctly throws an
{{IllegalStateException}}, but the exception is swallowed by the catch clause in
{{StandbyDiff#childNodeChanged}}. A second 'sync blob' cycle is then triggered,
and it sometimes succeeds with the same {{readTimeoutMs}} for which the first
cycle failed.
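A minimal Java sketch of the control flow described above may help. It is not Oak's {{StandbyDiff}} code: the class {{BlobSyncSketch}}, its methods and the simulated transfer are hypothetical, and a {{Supplier}} stands in for {{StandbyDiff#readBlob}}. It contrasts swallowing the timeout and re-running the transfer with letting the {{IllegalStateException}} propagate, which is the 'fail loudly' behaviour the summary asks for.

```java
import java.util.function.Supplier;

/**
 * Hypothetical illustration only; not Oak's StandbyDiff implementation.
 * The Supplier stands in for a blob read that may time out.
 */
public class BlobSyncSketch {

    /** Anti-pattern: the first timeout is swallowed and a second cycle runs. */
    public static boolean syncSwallowing(Supplier<Boolean> readBlob) {
        try {
            return readBlob.get();      // first 'sync blob' cycle
        } catch (IllegalStateException e) {
            // Swallowed: the caller never learns that the transfer timed out.
        }
        return readBlob.get();          // second 'sync blob' cycle, same timeout
    }

    /** Fail loudly: the timeout propagates to the caller, no silent retry. */
    public static boolean syncFailLoud(Supplier<Boolean> readBlob) {
        return readBlob.get();
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated transfer: times out on the first attempt, succeeds afterwards.
        Supplier<Boolean> flakyTransfer = () -> {
            if (attempts[0]++ == 0) {
                throw new IllegalStateException("Timeout reading blob");
            }
            return true;
        };

        boolean ok = syncSwallowing(flakyTransfer);
        System.out.println("swallowing: " + ok + " after " + attempts[0] + " attempts");

        attempts[0] = 0;
        try {
            syncFailLoud(flakyTransfer);
        } catch (IllegalStateException e) {
            System.out.println("fail loud: " + e.getMessage());
        }
    }
}
```

The swallowing variant reports success only because the simulated second attempt happens to finish in time, which mirrors the intermittent behaviour in the report; the fail-loud variant surfaces the first timeout to the caller.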
The consequence of these two 'sync blob' cycles is that deleting the temporary
file to which chunks are spooled on the client sometimes fails (on Windows, for
example). In that case, instead of the previous incomplete transfer being
discarded, the chunks from the second 'sync blob' cycle are appended to it. The
blob persisted in the client's blob store then has neither the same size nor the
same id as the original blob sent by the server.
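The size mismatch can be reproduced in miniature with plain java.nio. In this hypothetical sketch ({{ChunkSpoolSketch}} is not part of Oak), {{spoolAppending}} models a second cycle writing into a spool file that could not be deleted, while {{spoolTruncating}} shows one possible mitigation, assumed here rather than taken from the issue: truncate the spool file at the start of each cycle and fail loudly if the final size differs from the expected blob length.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical illustration of chunk spooling; not Oak's client code. */
public class ChunkSpoolSketch {

    /** Failure mode: chunks are appended to whatever a previous cycle left behind. */
    public static long spoolAppending(Path spool, List<byte[]> chunks) throws IOException {
        for (byte[] chunk : chunks) {
            Files.write(spool, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
        return Files.size(spool);
    }

    /** Assumed mitigation: start from an empty file and verify the final length. */
    public static long spoolTruncating(Path spool, List<byte[]> chunks, long expectedLength)
            throws IOException {
        try (FileChannel ch = FileChannel.open(spool, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (byte[] chunk : chunks) {
                ByteBuffer buf = ByteBuffer.wrap(chunk);
                while (buf.hasRemaining()) {
                    ch.write(buf);      // write() may be partial, so loop until done
                }
            }
        }
        long actual = Files.size(spool);
        if (actual != expectedLength) {
            // Fail loudly instead of persisting a blob with the wrong size and id.
            throw new IllegalStateException(
                    "Expected " + expectedLength + " bytes but spooled " + actual);
        }
        return actual;
    }

    public static void main(String[] args) throws IOException {
        Path spool = Files.createTempFile("blob-spool", ".tmp");
        List<byte[]> partial = List.of(new byte[4]);            // incomplete first cycle
        List<byte[]> full = List.of(new byte[4], new byte[4]);  // complete second cycle

        spoolAppending(spool, partial);
        // Simulate a failed delete: the partial spool file is still on disk.
        System.out.println(spoolAppending(spool, full));

        System.out.println(spoolTruncating(spool, full, 8));
        Files.deleteIfExists(spool);
    }
}
```

With a 4-byte leftover from the incomplete first cycle, the appending variant reports 12 bytes for an 8-byte blob, whereas the truncating variant reports 8.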
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)