[jira] [Commented] (OAK-8627) Avoid late-arriving lastRev update from crashed instance

Marcel Reutegger (Jira) Thu, 19 Sep 2019 06:22:23 -0700


    [ 
https://issues.apache.org/jira/browse/OAK-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933371#comment-16933371
 ]


Marcel Reutegger commented on OAK-8627:
---------------------------------------

Remaining changes to discuss: [^OAK-8627.patch].

> Avoid late-arriving lastRev update from crashed instance
> --------------------------------------------------------
>
>                 Key: OAK-8627
>                 URL: https://issues.apache.org/jira/browse/OAK-8627
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: documentmk
>    Affects Versions: 1.16.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>         Attachments: OAK-8627.patch
>
>
> Recently, in a deployment with a two-node cluster, Sling Discovery Oak 
> showed a cluster view in which one clusterId was stuck in the deactivating 
> state. According to the entry in the clusterNodes collection, that clusterId 
> was inactive. However, the revisions for the _lastRev entry on the root 
> document and the lastWrittenRootRev did not match: the latter was slightly 
> more recent. This caused Sling Discovery Oak to consider the clusterId as 
> not entirely shut down.
> While there is no direct proof, one theoretical scenario [~mreutegg] 
> identified as a _potential_ root cause is that the lastRev for a clusterId 
> on the root document is set back to an earlier value due to a race 
> condition:
> Before the lease expired, the background update thread could have issued an 
> update for the root document that then took a very long time to reach the 
> DocumentStore: longer than the lease timeout plus the recovery, which 
> another instance must have performed in the meantime.
> If such a late-arriving update of the {{_lastRev}} is possible, then the 
> reset of the lastRev value on the root document could be explained, since the 
> update is currently done unconditionally.
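
To make the suspected race concrete, here is a minimal sketch in Java. It is a toy model only, not Oak code and not the attached patch: the class LastRevSketch, its methods, and the use of plain long values as revisions are all assumptions made for illustration. It contrasts an unconditional write of the lastRev with a write that is applied only when the new revision is more recent than the stored one, which is one possible guard among others.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (hypothetical, not Oak code) of the _lastRev entries on the
 * root document: clusterId -> revision, with revisions reduced to longs.
 */
public class LastRevSketch {

    private final Map<Integer, Long> lastRev = new HashMap<>();

    /** Unconditional write: a late-arriving update can move the value backwards. */
    public synchronized void setUnconditionally(int clusterId, long revision) {
        lastRev.put(clusterId, revision);
    }

    /** Conditional write: applied only if the revision is newer than the stored one. */
    public synchronized boolean setIfNewer(int clusterId, long revision) {
        Long current = lastRev.get(clusterId);
        if (current != null && current >= revision) {
            return false; // stale update, ignore it
        }
        lastRev.put(clusterId, revision);
        return true;
    }

    /** Returns the stored revision, or -1 if there is no entry for the clusterId. */
    public synchronized long get(int clusterId) {
        return lastRev.getOrDefault(clusterId, -1L);
    }

    public static void main(String[] args) {
        int crashedClusterId = 1;
        long delayedRev = 100L;   // issued before lease expiry, arrives late
        long recoveredRev = 120L; // written by the recovering instance

        LastRevSketch unconditional = new LastRevSketch();
        unconditional.setUnconditionally(crashedClusterId, recoveredRev);
        unconditional.setUnconditionally(crashedClusterId, delayedRev);
        System.out.println("unconditional: " + unconditional.get(crashedClusterId));

        LastRevSketch conditional = new LastRevSketch();
        conditional.setIfNewer(crashedClusterId, recoveredRev);
        conditional.setIfNewer(crashedClusterId, delayedRev);
        System.out.println("conditional: " + conditional.get(crashedClusterId));
    }
}
```

In the unconditional case the delayed write overwrites the value set during recovery, which would match the observed mismatch with lastWrittenRootRev. In the conditional case the stale write is rejected and the recovered value stays in place.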



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
