[
https://issues.apache.org/jira/browse/OAK-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Egli updated OAK-2682:
-----------------------------
Assignee: Stefan Egli
After discussing this with [~mduerig], the suggestion is to follow up on what
was discussed and, as it looks, agreed upon between [~mreutegg] and [~rombert]:
* DocumentStore implementations should expose an MBean function which
determines the +time difference between the local and the database server+ (in
milliseconds): {{getServerTimeDifferentMillis()}}
* That MBean function could then be used by a monitoring tool to react when the
difference grows above certain limits (perhaps with a lower 'warn' and a
higher 'panic' limit)
* Independent of monitoring, however, the DocumentStore should at +startup apply
an initial check+ on this 'server-time-diff' to assert that the clocks are in
sync at least initially. The assumption is that clock speed differences are
much less of a problem than an initial time difference. Given this, plus the
fact that a server startup is usually an admin-controlled activity, the initial
check can apply a rather strict limit (e.g. 2 seconds). Higher-level monitoring,
though, can be slightly more generous and, for example, have a 2-second warning
limit and a 5-second panic limit.
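To make the three points above more concrete, here is a rough Java sketch of how the difference could be computed and checked. Everything in it is an assumption for illustration only: the class {{ClockSkewCheck}}, its method names, the sign convention (server time minus local time) and the round-trip midpoint compensation are hypothetical and not part of any existing DocumentStore API. The clocks are passed in as suppliers so the sketch runs without a database.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not an existing Oak class.
final class ClockSkewCheck {

    enum Level { OK, WARN, PANIC }

    // Returns (server time - local time) in milliseconds.
    // Assumption: the server timestamp is taken roughly halfway through
    // the round trip, so it is compared against the local midpoint.
    static long serverTimeDifferenceMillis(LongSupplier localClock,
                                           LongSupplier serverClock) {
        long before = localClock.getAsLong();
        long server = serverClock.getAsLong();
        long after = localClock.getAsLong();
        long localMidpoint = before + (after - before) / 2;
        return server - localMidpoint;
    }

    // Monitoring side: map the absolute difference to a level using
    // a lower 'warn' and a higher 'panic' limit.
    static Level classify(long diffMillis, long warnMillis, long panicMillis) {
        long abs = Math.abs(diffMillis);
        if (abs > panicMillis) {
            return Level.PANIC;
        }
        if (abs > warnMillis) {
            return Level.WARN;
        }
        return Level.OK;
    }

    // Startup side: refuse to continue if the clocks are not in sync
    // within the given limit (e.g. 2000 ms).
    static void assertInSyncAtStartup(long diffMillis, long maxMillis) {
        if (Math.abs(diffMillis) > maxMillis) {
            throw new IllegalStateException("Server time difference of "
                    + diffMillis + " ms exceeds startup limit of "
                    + maxMillis + " ms");
        }
    }
}
```

With the example limits mentioned above, a monitoring tool would call {{classify(diff, 2000, 5000)}}, while startup would call {{assertInSyncAtStartup(diff, 2000)}}. How the server timestamp is actually obtained is left open here and depends on the DocumentStore implementation.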
I'll follow up on the MongoDocumentStore part of this feature next.
(RDBDocumentStore part will be handled in separate ticket)
> Introduce time difference detection for DocumentNodeStore
> ---------------------------------------------------------
>
> Key: OAK-2682
> URL: https://issues.apache.org/jira/browse/OAK-2682
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core, mongomk
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: resilience
> Fix For: 1.3.5
>
>
> Currently the lease mechanism in DocumentNodeStore/mongoMk is based on the
> assumption that the clocks are in perfect sync between all nodes of the
> cluster. The lease is valid for 60sec with a timeout of 30sec. If clocks are
> off by too much, and background operations happen to take a couple of seconds,
> you run the risk of timing out a lease. So introducing a check which WARNs if
> the clocks in a cluster are off by too much (1st threshold, e.g. 5sec?) would
> help increase awareness. A further, more drastic measure could be to prevent a
> startup of Oak at all if the difference is, for example, higher than a 2nd
> threshold (optional I guess, but could be 20sec?).