[ 
https://issues.apache.org/jira/browse/OAK-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699331#comment-14699331
 ] 

Julian Reschke commented on OAK-2739:
-------------------------------------

With the latest changes, the unit tests fail:

Failed tests:
  readWriteMode(org.apache.jackrabbit.oak.plugins.document.ClusterInfoTest): performLeaseCheck: this instance failed to update the lease in time (leaseEndTime: 1439806623318, now: 1439806623319, leaseTime: 0) and is thus no longer eligible for taking part in the cluster. Shutting down NOW!

Also, many errors:

Tests in error:
  sameCluster(org.apache.jackrabbit.oak.plugins.blob.ClusterRepositoryInfoTest): clusterNodeInfo must not be null
  checkGetIdWhenNotRegistered(org.apache.jackrabbit.oak.plugins.blob.ClusterRepositoryInfoTest): clusterNodeInfo must not be null
  revisionComparisonMultipleClusterNode(org.apache.jackrabbit.oak.plugins.document.ClusterRevisionComparisonTest): clusterNodeInfo must not be null
  revisionComparisonTwoClusterNodes(org.apache.jackrabbit.oak.plugins.document.ClusterRevisionComparisonTest): clusterNodeInfo must not be null
  fromExternalChange(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  threeNodes(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  openCloseOpen(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  clusterBranchInVisibility(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  clusterBranchRebase(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  conflict(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  revisionVisibility(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  rollbackAfterConflict(org.apache.jackrabbit.oak.plugins.document.ClusterTest): clusterNodeInfo must not be null
  purge(org.apache.jackrabbit.oak.plugins.document.CollisionTest): clusterNodeInfo must not be null



> take appropriate action when lease cannot be renewed (in time)
> --------------------------------------------------------------
>
>                 Key: OAK-2739
>                 URL: https://issues.apache.org/jira/browse/OAK-2739
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core
>    Affects Versions: 1.2
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>              Labels: resilience
>             Fix For: 1.3.4
>
>         Attachments: OAK-2739.patch
>
>
> Currently, in an oak-cluster, when (e.g.) one oak-client stops renewing its 
> lease (ClusterNodeInfo.renewLease()), this will eventually be noticed by the 
> others in the same oak-cluster. Those then mark this client as {{inactive}} 
> and start recovering and subsequently removing that node from any further 
> merge etc. operations.
> Now, whatever the reason was why that client stopped renewing the lease 
> (could be an exception, deadlock, whatever) - that client itself still 
> considers itself {{active}} and continues to take part in the cluster 
> action.
> This results in an unbalanced situation where that one client 'sees' 
> everybody as {{active}} while the others see this one as {{inactive}}.
> If this ClusterNodeInfo state should be something that can be built upon, and 
> to avoid any inconsistency due to unbalanced handling, the inactive node 
> should probably retire gracefully - or some other appropriate action should be 
> taken, rather than just continuing as it does today.
> This ticket is to keep track of ideas and actions taken with respect to this.
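
To make the idea in the description concrete, here is a minimal sketch of a client-side lease check, assuming the node records its own lease end time on each renewal and verifies it before touching shared state. The class `LeaseGuard`, its constructor, and the injected clock are hypothetical names for illustration only; this is not Oak's ClusterNodeInfo implementation, and only the names renewLease and performLeaseCheck are borrowed from the messages above.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration; not the actual Oak ClusterNodeInfo code. */
final class LeaseGuard {
    private final long leaseTimeMillis;   // how long one renewal is valid
    private final LongSupplier clock;     // injected so the check is testable
    private volatile long leaseEndTime;   // absolute expiry, in clock millis

    LeaseGuard(long leaseTimeMillis, LongSupplier clock) {
        this.leaseTimeMillis = leaseTimeMillis;
        this.clock = clock;
        this.leaseEndTime = clock.getAsLong() + leaseTimeMillis;
    }

    /** Extends the lease; assumed to be called periodically in the background. */
    void renewLease() {
        leaseEndTime = clock.getAsLong() + leaseTimeMillis;
    }

    /** Fails fast once the local lease has run out, instead of carrying on as active. */
    void performLeaseCheck() {
        long now = clock.getAsLong();
        if (now < leaseEndTime) {
            return; // lease still valid
        }
        throw new IllegalStateException(
                "lease expired (leaseEndTime: " + leaseEndTime
                + ", now: " + now
                + ", leaseTime: " + leaseTimeMillis + ")");
    }
}
```

In this sketch a lease time of 0 makes the lease end at the instant it is granted, so the very next check is rejected. That would be consistent with the leaseTime: 0 value in the failure message above, although the sketch does not establish the actual cause of the test failure.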



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
