[
https://issues.apache.org/jira/browse/OAK-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645550#comment-14645550
]
Chetan Mehrotra commented on OAK-2739:
--------------------------------------
[~egli] I am still not clear on the severity of this issue. Currently the
lease is updated periodically in the background (every 1 sec) by a dedicated
thread which performs just that single operation and not much else. So even if
there are issues in other parts, this thread would continue to work (which
might be wrong) and still update the lease every 1 sec.
So to me the lease update does not look like an operation which would take a
long time and cause the above-mentioned issues. Maybe I am missing something
here.
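For illustration only, the setup described above (a dedicated thread whose sole job is to push the lease forward at a fixed interval) could be sketched roughly as follows. The class LeaseRenewer and all of its members are hypothetical names chosen for this sketch, not Oak's actual implementation, and the lease is kept in memory instead of being written to the shared store.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the background lease update; not Oak's actual code.
class LeaseRenewer implements AutoCloseable {
    private final long leaseDurationMillis;
    private final AtomicLong leaseEndMillis = new AtomicLong();
    // One dedicated daemon thread, so renewal does not compete with other work.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lease-renewer");
                t.setDaemon(true);
                return t;
            });

    LeaseRenewer(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
        renewOnce(); // acquire the initial lease
    }

    // The single operation the thread performs: move the lease end forward.
    // A real implementation would also persist the new value to the shared store.
    void renewOnce() {
        leaseEndMillis.set(System.currentTimeMillis() + leaseDurationMillis);
    }

    // Schedule periodic renewal, independent of what other threads are doing.
    void start(long intervalMillis) {
        scheduler.scheduleWithFixedDelay(
                this::renewOnce, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    long leaseEndMillis() {
        return leaseEndMillis.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

One caveat with this kind of setup: if the scheduled task ever throws, ScheduledExecutorService suppresses all subsequent executions, so the renewal would stop silently while the rest of the process keeps running.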
> take appropriate action when lease cannot be renewed (in time)
> --------------------------------------------------------------
>
> Key: OAK-2739
> URL: https://issues.apache.org/jira/browse/OAK-2739
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: mongomk
> Affects Versions: 1.2
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Labels: resilience
> Fix For: 1.3.5
>
>
> Currently, in an oak-cluster when (e.g.) one oak-client stops renewing its
> lease (ClusterNodeInfo.renewLease()), this will be eventually noticed by the
> others in the same oak-cluster. Those then mark this client as {{inactive}}
> and start recovering it, subsequently removing that node from any further
> merge etc. operations.
> Now, whatever the reason was why that client stopped renewing the lease
> (could be an exception, deadlock, whatever) - that client itself still
> considers itself as {{active}} and continues to take part in the cluster
> action.
> This will result in an unbalanced situation where that one client 'sees'
> everybody as {{active}} while the others see this one as {{inactive}}.
> If this ClusterNodeInfo state should be something that can be built upon, and
> to avoid any inconsistency due to unbalanced handling, the inactive node
> should probably retire gracefully - or any other appropriate action should be
> taken, other than just continuing as today.
> This ticket is to keep track of ideas and actions taken wrt this.
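One possible shape for "retiring gracefully", as a rough sketch only: the node remembers when its own lease ends and checks that before every cluster operation, refusing to continue once the lease has lapsed. LeaseGuard, renewed(), checkLease() and isRetired() are hypothetical names for this sketch and are not part of Oak's API; the clock is injected so that the behaviour can be exercised without waiting.

```java
import java.util.function.LongSupplier;

// Hypothetical guard; names are illustrative, not part of Oak's API.
class LeaseGuard {
    private final LongSupplier clock;
    private volatile long leaseEndMillis;
    private volatile boolean retired;

    LeaseGuard(LongSupplier clock, long leaseEndMillis) {
        this.clock = clock;
        this.leaseEndMillis = leaseEndMillis;
    }

    // Called after a successful renewal. A retired node stays retired,
    // because the others may already have started recovery for it.
    void renewed(long newLeaseEndMillis) {
        if (!retired) {
            leaseEndMillis = newLeaseEndMillis;
        }
    }

    // Called before each cluster operation (merge etc.).
    void checkLease() {
        if (retired || clock.getAsLong() >= leaseEndMillis) {
            retired = true;
            throw new IllegalStateException(
                    "lease expired; this node must stop taking part in the cluster");
        }
    }

    boolean isRetired() {
        return retired;
    }
}
```

In this sketch the retirement is deliberately one-way: once the local lease has expired, a late renewal does not bring the node back, which avoids the situation where the node considers itself active while the others see it as inactive.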