Stefan Egli created OAK-3398:
--------------------------------

             Summary: make lease update more robust
                 Key: OAK-3398
                 URL: https://issues.apache.org/jira/browse/OAK-3398
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.3.6
            Reporter: Stefan Egli
            Assignee: Stefan Egli
             Fix For: 1.3.7


With the lease check introduced in OAK-2739 (and refined to do an oak-core stop 
in OAK-3397) it becomes more critical that the lease is always properly updated 
(to avoid an unnecessary oak-core stop). The following issues exist at the moment:
* Currently the lease is valid for 60sec by default and is updated every 20sec. 
The lease check fails with a margin of 20sec *before* the lease times out, i.e. 
40sec after the last update. This means that if the lease update thread is 
delayed by 20sec beyond its scheduled update, it will cause a stop. That is 
probably too small a tolerance.
** The suggestion is to increase the lease timeout from 60sec to 120sec, 
update the lease as soon as 10sec of it has elapsed, and keep the 20sec safety 
margin at the end. The update thread could then be idle for 90sec 
(120 - 10 - 20) before 'idle equals faulty' applies.
* On a machine under heavy load it seems likely that the lease update thread 
does not get scheduled in time, as it competes for CPU with all the other 
busy threads.
** The suggestion is to increase the thread priority of the lease update 
thread. If the VM supports thread priorities, that would help reduce lease 
failures that happen 'just because the cpu is too busy'.
* The ClusterNodeInfo, when renewing the lease, doesn't check whether the lease 
has been marked as timed-out/recovering by another instance. It just overwrites 
whatever is there.
** It should, however, only update the lease when it has not yet been marked as 
timed out.
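
The timing argument in the first item can be checked with a little arithmetic. 
The following is only a sketch: the class and method names are made up for 
illustration and are not part of Oak. It assumes the tolerated stall of the 
update thread is the lease timeout minus the renew delay minus the safety 
margin.

```java
// Hypothetical helper, not Oak code: computes how long the lease update
// thread may be late before the lease check starts failing.
final class LeaseTiming {

    /**
     * @param leaseTimeoutSec how long a freshly renewed lease is valid
     * @param renewAfterSec   how long after a renewal the next one is due
     * @param safetyMarginSec how long before expiry the lease check fails
     * @return seconds the update thread may stall past its due time
     */
    static long toleratedStallSeconds(long leaseTimeoutSec,
                                      long renewAfterSec,
                                      long safetyMarginSec) {
        return leaseTimeoutSec - renewAfterSec - safetyMarginSec;
    }

    public static void main(String[] args) {
        // Current defaults described above: 60sec lease, 20sec update, 20sec margin.
        System.out.println(toleratedStallSeconds(60, 20, 20));  // prints 20
        // Suggested values: 120sec lease, renew after 10sec, 20sec margin.
        System.out.println(toleratedStallSeconds(120, 10, 20)); // prints 90
    }
}
```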

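For the thread priority suggestion, a minimal sketch using only the standard 
java.lang.Thread API could look as follows. The class name, method name and 
thread name are hypothetical, and how Oak actually creates its lease update 
thread is not shown here. Whether a raised priority has any effect depends on 
the VM and the operating system.

```java
// Hypothetical factory, not Oak code: creates a background thread for lease
// renewal and asks the VM to schedule it with the highest priority.
final class LeaseUpdateThreadSketch {

    static Thread newLeaseUpdateThread(Runnable renewLease) {
        Thread t = new Thread(renewLease, "lease-update-sketch");
        t.setDaemon(true);                  // do not keep the VM alive
        t.setPriority(Thread.MAX_PRIORITY); // a hint only; the VM/OS may ignore it
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = newLeaseUpdateThread(
                () -> System.out.println("renewing lease (placeholder)"));
        t.start();
        t.join();
    }
}
```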

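The last item asks for a conditional update instead of a blind overwrite. The 
sketch below illustrates that idea with an in-memory stand-in 
(an AtomicReference) rather than the real document store. All names in it are 
assumptions made for illustration, and the store-level conditional update that 
Oak would need is not shown.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in, not Oak code: the lease is only renewed
// if no other instance has marked it as timed-out/recovering in the meantime.
final class LeaseStoreSketch {

    enum State { ACTIVE, RECOVERING }

    static final class Lease {
        final State state;
        final long leaseEndMillis;

        Lease(State state, long leaseEndMillis) {
            this.state = state;
            this.leaseEndMillis = leaseEndMillis;
        }
    }

    private final AtomicReference<Lease> entry;

    LeaseStoreSketch(long initialLeaseEndMillis) {
        entry = new AtomicReference<>(new Lease(State.ACTIVE, initialLeaseEndMillis));
    }

    /** Simulates another instance marking this lease as timed out. */
    void markRecovering() {
        entry.updateAndGet(l -> new Lease(State.RECOVERING, l.leaseEndMillis));
    }

    /** Renews the lease only while it is still ACTIVE; returns false otherwise. */
    boolean renew(long newLeaseEndMillis) {
        while (true) {
            Lease current = entry.get();
            if (current.state != State.ACTIVE) {
                return false; // marked by another instance: do not overwrite
            }
            Lease renewed = new Lease(State.ACTIVE, newLeaseEndMillis);
            if (entry.compareAndSet(current, renewed)) {
                return true;
            }
            // The entry changed concurrently: re-read and re-check the state.
        }
    }

    long leaseEndMillis() {
        return entry.get().leaseEndMillis;
    }
}
```

A caller that gets false back from renew would stop renewing and treat the 
lease as lost, rather than writing over the recovery marker.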

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
