[ https://issues.apache.org/jira/browse/OAK-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated OAK-2682:
-----------------------------
    Assignee: Stefan Egli

After discussing this with [~mduerig], the suggestion is to follow up on what 
was discussed, and apparently agreed upon, between [~mreutegg] and [~rombert]:
* DocumentStore implementations should expose an MBean function that 
determines the +time difference between the local machine and the database server+ 
(in milliseconds): {{getServerTimeDifferentMillis()}}
* A monitoring tool could then use that MBean function to react when the 
difference grows above certain limits (perhaps with a lower 'warn' and a 
higher 'panic' limit)
* Independent of monitoring, however, the DocumentStore should +apply an initial 
check at startup+ on this 'server-time-diff' to assert that the clocks are in 
sync at least initially. The assumption is that differences in clock speed 
(drift) are much less of a problem than an initial time difference. Given this, 
and since a server startup is usually an admin-controlled activity, the initial 
check can apply a rather strict limit (e.g. 2 seconds). Higher-level monitoring, 
though, can be slightly more generous and, for example, have a 2-second warning 
limit and a 5-second panic limit.
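
The three points above can be sketched roughly as follows. This is a minimal, language-neutral illustration, not the actual Oak code: {{query_server_time_ms}} is a hypothetical callback standing in for however a DocumentStore reads the database server's clock, and the midpoint technique (comparing the server timestamp against the middle of the local request/response window, as NTP-style offset estimation does) is an assumption, since the comment does not say how the difference is measured. The limits (2 s at startup, 2 s warn, 5 s panic) are the example values from the comment.

```python
import time
from typing import Callable

# Example limits from the comment above, in milliseconds (assumed, configurable).
STARTUP_LIMIT_MS = 2_000
WARN_LIMIT_MS = 2_000
PANIC_LIMIT_MS = 5_000


def server_time_difference_millis(
    query_server_time_ms: Callable[[], int],
    local_time_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000,
) -> int:
    """Estimate (server clock - local clock) in ms.

    Hypothetical sketch: the server timestamp is compared with the midpoint
    of the local request/response window, which cancels a symmetric delay.
    """
    before = local_time_ms()
    server = query_server_time_ms()
    after = local_time_ms()
    midpoint = before + (after - before) // 2
    return server - midpoint


def check_at_startup(diff_ms: int, limit_ms: int = STARTUP_LIMIT_MS) -> None:
    """Refuse to start if the initial clock difference exceeds the strict limit."""
    if abs(diff_ms) > limit_ms:
        raise RuntimeError(
            f"server time difference {diff_ms} ms exceeds startup limit {limit_ms} ms"
        )


def monitoring_status(diff_ms: int) -> str:
    """Classify a measured difference the way a monitoring tool might."""
    magnitude = abs(diff_ms)
    if magnitude > PANIC_LIMIT_MS:
        return "panic"
    if magnitude > WARN_LIMIT_MS:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Fake clocks: local clock reads 1000 then 1010, server reports 4005.
    ticks = iter([1_000, 1_010])
    diff = server_time_difference_millis(lambda: 4_005, lambda: next(ticks))
    print(diff, monitoring_status(diff))  # prints "3000 warn"
```

Keeping the startup check separate from the monitoring classification mirrors the split proposed above: a hard, admin-visible failure once at startup, and softer, ongoing thresholds for monitoring.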

I'll follow up on the MongoDocumentStore part of this feature next 
(the RDBDocumentStore part will be handled in a separate ticket).

> Introduce time difference detection for DocumentNodeStore
> ---------------------------------------------------------
>
>                 Key: OAK-2682
>                 URL: https://issues.apache.org/jira/browse/OAK-2682
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, mongomk
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>              Labels: resilience
>             Fix For: 1.3.5
>
>
> Currently the lease mechanism in DocumentNodeStore/mongoMk is based on the 
> assumption that the clocks of all nodes in the cluster are in perfect sync. 
> The lease is valid for 60 sec with a timeout of 30 sec. If the clocks are 
> off by too much, and background operations happen to take a couple of 
> seconds, you run the risk of a lease timing out. Introducing a check that 
> WARNs if the clocks in a cluster are off by too much (1st threshold, e.g. 
> 5 sec?) would therefore help increase awareness. A further, more drastic 
> measure could be to prevent Oak from starting up at all if the difference 
> exceeds a 2nd threshold (optional, I guess, but could be 20 sec?).
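
To make the risk described above concrete, here is a small worked model. It rests on assumptions not stated in the ticket: "timeout of 30 sec" is read as "the lease is renewed once less than 30 s of the 60 s lease remain", and another cluster node judges expiry against its own clock. Under that reading, if the lease owner's clock lags the observer's by {{skew_ms}} and the renewal lands {{renewal_delay_ms}} late, the observer sees only 30 s minus delay minus skew of lease left. The 5 s and 20 s thresholds are the tentative values from the description; all function names are hypothetical.

```python
LEASE_DURATION_MS = 60_000
# Assumption: renewal is due once half the lease (30 s) remains.
RENEW_REMAINING_MS = LEASE_DURATION_MS // 2
WARN_SKEW_MS = 5_000     # 1st threshold suggested in the description
REFUSE_SKEW_MS = 20_000  # 2nd, optional threshold suggested in the description


def remaining_margin_ms(renewal_delay_ms: int, skew_ms: int) -> int:
    """Lease time left, as seen by another node, when a late renewal lands.

    skew_ms > 0 means the lease owner's clock is behind the observer's, so
    the observer sees the lease end skew_ms earlier than the owner does.
    A negative result means the observer already considers the lease expired.
    """
    return RENEW_REMAINING_MS - renewal_delay_ms - skew_ms


def skew_action(skew_ms: int) -> str:
    """Map a measured clock difference to the two proposed thresholds."""
    magnitude = abs(skew_ms)
    if magnitude > REFUSE_SKEW_MS:
        return "refuse-startup"
    if magnitude > WARN_SKEW_MS:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # A 6 s background-operation delay combined with 25 s of skew.
    print(remaining_margin_ms(6_000, 25_000), skew_action(25_000))
    # prints "-1000 refuse-startup"
```

In this model, a skew below the 5 s warning threshold leaves most of the 30 s margin for slow background operations, which is why catching large offsets early matters more than tracking slow drift.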



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
