[ 
https://issues.apache.org/jira/browse/OAK-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15454918#comment-15454918
 ] 

Martin Böttcher commented on OAK-4739:
--------------------------------------

[~mreutegg] modifying/decreasing the DB timeout can help in this situation but 
can lead to other problems (e.g. killing long-running queries). The point is 
that the lease logic should handle an isolated network issue by implementing 
proper retry logic. In the current implementation this retry *never gets 
called*. It is an improvement for the code to try to recover (at least once) 
from a network issue. 

> lease: immediate renew after long renew call
> --------------------------------------------
>
>                 Key: OAK-4739
>                 URL: https://issues.apache.org/jira/browse/OAK-4739
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: documentmk
>    Affects Versions: 1.5.8
>            Reporter: Martin Böttcher
>
> A single temporary network issue can shut down the DocumentStore. We observed 
> the following situation:
> # org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease was 
> called (this happens regularly and is completely normal)
> # the network had a temporary issue (of whatever kind)
> # the database call terminated only after a long time (the default db 
> maxWaitTime is 120 seconds).
> # org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo.renewLease 
> decides that the current lease is too old (>120 seconds, which is the 
> default for the oak.documentMK.leaseDurationSeconds property), sets a 
> leaseCheckFailed variable and throws an Exception
> # because leaseCheckFailed is set, all following attempts (if any) will 
> immediately throw an Exception, too.
> I'd recommend making the ClusterNodeInfo code more robust so that at least 
> one retry is made.
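
The requested behaviour (retry at least once before giving up for good) can be 
sketched roughly as follows. This is a minimal illustration only, not the 
actual ClusterNodeInfo code: the class LeaseRenewSketch, the LeaseStore 
interface, the injected clock and the fixed limit of two attempts are all 
assumptions made for the example. It also ignores whether another cluster node 
has taken over in the meantime, which a real implementation would have to 
check.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of "retry once before failing permanently". */
class LeaseRenewSketch {

    /** Hypothetical stand-in for the database call that extends the lease. */
    interface LeaseStore {
        void writeLeaseEnd(long newLeaseEndMillis) throws Exception;
    }

    // Assumption: the first call plus one immediate retry.
    private static final int MAX_ATTEMPTS = 2;

    private final LeaseStore store;
    private final LongSupplier clock;
    private final long leaseDurationMillis;
    private long leaseEndMillis;
    private boolean leaseCheckFailed;

    LeaseRenewSketch(LeaseStore store, LongSupplier clock, long leaseDurationMillis) {
        this.store = store;
        this.clock = clock;
        this.leaseDurationMillis = leaseDurationMillis;
        this.leaseEndMillis = clock.getAsLong() + leaseDurationMillis;
    }

    synchronized void renewLease() {
        if (leaseCheckFailed) {
            // Same sticky behaviour as described in the issue: once set, always fail.
            throw new IllegalStateException("lease check failed earlier; giving up");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            // Recompute the lease end for every attempt, so a retry after a
            // long-blocked call starts from the current time.
            long newLeaseEnd = clock.getAsLong() + leaseDurationMillis;
            try {
                store.writeLeaseEnd(newLeaseEnd);
                leaseEndMillis = newLeaseEnd;
                return;
            } catch (Exception e) {
                last = e; // remember the failure and try again immediately
            }
        }
        // Only after every attempt has failed is the flag set.
        leaseCheckFailed = true;
        throw new IllegalStateException(
                "lease renewal failed after " + MAX_ATTEMPTS + " attempts", last);
    }

    synchronized long getLeaseEndMillis() {
        return leaseEndMillis;
    }

    synchronized boolean isLeaseCheckFailed() {
        return leaseCheckFailed;
    }
}
```

With this shape a single failed database call does not set leaseCheckFailed; 
the flag is set only when the immediate retry fails as well.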



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
