Stefan Egli created OAK-3238:
--------------------------------
Summary: fine tune clock-sync check vs lease-check settings
Key: OAK-3238
URL: https://issues.apache.org/jira/browse/OAK-3238
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core
Affects Versions: 1.3.4
Reporter: Stefan Egli
There are now two components that try to ensure 'discovery-lite' (OAK-2844)
reports a coherent cluster view to the upper layers:
* OAK-2682 : time difference detection: by default, startup fails if the clock
is off by more than 2 seconds. Two instances can be off in opposite
directions, so that results in a 4 sec max margin in a document-cluster
* OAK-2739 : lease-checking: every instance checks whether its local lease is
valid upon any document access. This check is done against the actual
'leaseEndTime', which is updated every 30 seconds (by default) to be valid for
another 60 seconds (by default).
With these two factors combined, the worst case still leaves a 4 second time
window: the local instance fails to update the lease (eg the lease-thread
dies) but still considers itself the owner of a valid lease, while a remote
instance whose clock is those 4 seconds off already considers the lease
timed out.
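
The worst case above can be sketched as follows. This is only an illustration of the timing argument, not Oak code: the class, method and constant names are hypothetical, and the 'reference' time is an assumed common time base against which both clocks are off by the 2 second default.

```java
public class LeaseWindowSketch {
    // Default from the description above (OAK-2682); the name is hypothetical.
    static final long MAX_CLOCK_DIFF_MS = 2_000;

    /** An instance considers a lease valid while its own clock is before leaseEndTime. */
    static boolean leaseValid(long clockNowMs, long leaseEndTimeMs) {
        return clockNowMs < leaseEndTimeMs;
    }

    public static void main(String[] args) {
        long leaseEndTime = 60_000;  // last value written before the lease-thread died
        long referenceNow = 59_000;  // assumed common time base, 1 sec before the lease ends

        long localClock = referenceNow - MAX_CLOCK_DIFF_MS;   // local clock runs 2 sec slow
        long remoteClock = referenceNow + MAX_CLOCK_DIFF_MS;  // remote clock runs 2 sec fast

        // Local instance still believes it owns the lease ...
        System.out.println("local considers lease valid:  " + leaseValid(localClock, leaseEndTime));
        // ... while the remote instance already treats it as timed out.
        System.out.println("remote considers lease valid: " + leaseValid(remoteClock, leaseEndTime));
    }
}
```

For up to 4 seconds the two instances disagree about who owns the lease, which is the window the issue is about.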
So overall: the 3 factors 'lease duration', 'lease update frequency' and
'maximum allowed clock difference' must be tuned against each other to end up
with a stable mechanism.
Suggestion:
* increase the 'lease duration' to 3 x 'lease update frequency', ie 90sec
lease duration
* reduce the lease check failure limit from 'lease duration' to 2 x 'lease
update frequency', ie 60sec, assuming that one 'lease update interval' is much
larger than the 'maximum allowed clock difference'
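
A minimal sketch of how the suggested values relate, assuming the local check is measured from the last successful lease update. The names below are hypothetical and do not correspond to actual Oak configuration properties.

```java
public class LeaseTuningSketch {
    // Defaults quoted in this issue; constant names are hypothetical.
    static final long UPDATE_INTERVAL_MS = 30_000;
    static final long MAX_CLOCK_DIFF_MS = 2_000;  // per instance, so 4 sec between two instances

    // Suggested values.
    static final long LEASE_DURATION_MS = 3 * UPDATE_INTERVAL_MS;       // 90 sec
    static final long LOCAL_FAILURE_LIMIT_MS = 2 * UPDATE_INTERVAL_MS;  // 60 sec

    /** Local check: fail once 2 update intervals have passed without a lease update. */
    static boolean localCheckPasses(long localNowMs, long lastUpdateMs) {
        return localNowMs - lastUpdateMs < LOCAL_FAILURE_LIMIT_MS;
    }

    /** Remote view: the lease is timed out once the full lease duration has passed. */
    static boolean remoteSeesTimeout(long remoteNowMs, long lastUpdateMs) {
        return remoteNowMs - lastUpdateMs >= LEASE_DURATION_MS;
    }

    /** Time left between the local check failing and the earliest remote timeout. */
    static long safetyMarginMs() {
        return LEASE_DURATION_MS - LOCAL_FAILURE_LIMIT_MS - 2 * MAX_CLOCK_DIFF_MS;
    }

    public static void main(String[] args) {
        long lastUpdate = 0;
        long localNow = LOCAL_FAILURE_LIMIT_MS;             // local check has just started failing
        long remoteNow = localNow + 2 * MAX_CLOCK_DIFF_MS;  // remote clock is 4 sec ahead

        System.out.println("local check passes:  " + localCheckPasses(localNow, lastUpdate));
        System.out.println("remote sees timeout: " + remoteSeesTimeout(remoteNow, lastUpdate));
        System.out.println("safety margin (ms):  " + safetyMarginMs());
    }
}
```

With these numbers the local instance stops using its lease one full update interval before any remote instance can consider it timed out, so a 4 second clock difference no longer opens a window in which both views disagree.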