[
https://issues.apache.org/jira/browse/OAK-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946963#comment-14946963
]
Marcel Reutegger commented on OAK-3488:
---------------------------------------
Yes, that's correct. This shows another flaw of the current recovery locking
mechanism, which should be fixed as well.
We need a way to detect whether the recovering instance is still alive. If we
knew which instance is performing the recovery, we could use the existing lease
mechanism. However, this does not work when an instance recovers itself, at
least with the current implementation: the DocumentNodeStore runs recovery
before it starts the lease update thread.
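
To make the idea concrete, here is a rough sketch of such a liveness check.
It is not existing Oak code: the ClusterNodeEntry class, the recoveryBy field
and the decide() method are made-up names for illustration, and the lease is
reduced to a single end timestamp. It only shows how a lease check could tell
a live recoverer from a dead one, and why self-recovery stays undecidable as
long as the lease is not updated during recovery.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified cluster node entry; not Oak's actual document model. */
final class ClusterNodeEntry {
    final int clusterId;
    final long leaseEndTime;   // epoch millis; the lease counts as valid while now < leaseEndTime
    final Integer recoveryBy;  // assumed field: id of the instance holding the recovery lock, or null

    ClusterNodeEntry(int clusterId, long leaseEndTime, Integer recoveryBy) {
        this.clusterId = clusterId;
        this.leaseEndTime = leaseEndTime;
        this.recoveryBy = recoveryBy;
    }
}

final class RecoveryLockSketch {

    enum Decision { ACQUIRE, WAIT, BREAK_LOCK, UNKNOWN }

    /** Decides what to do about the recovery lock on {@code target} at time {@code now}. */
    static Decision decide(ClusterNodeEntry target, Map<Integer, ClusterNodeEntry> nodes, long now) {
        if (target.recoveryBy == null) {
            return Decision.ACQUIRE;          // nobody holds the recovery lock
        }
        if (target.recoveryBy.intValue() == target.clusterId) {
            // Self-recovery: the lease update thread is not running yet, so an
            // expired lease says nothing about whether the instance is alive.
            return Decision.UNKNOWN;
        }
        ClusterNodeEntry recoverer = nodes.get(target.recoveryBy);
        if (recoverer == null || now >= recoverer.leaseEndTime) {
            return Decision.BREAK_LOCK;       // recoverer is gone or its lease expired
        }
        return Decision.WAIT;                 // recoverer still holds a valid lease
    }

    public static void main(String[] args) {
        long now = 1_000L;
        Map<Integer, ClusterNodeEntry> nodes = new HashMap<>();
        nodes.put(1, new ClusterNodeEntry(1, 500L, 2));      // crashed node, locked by node 2
        nodes.put(2, new ClusterNodeEntry(2, 2_000L, null)); // recoverer with a valid lease
        System.out.println(decide(nodes.get(1), nodes, now));
    }
}
```

The UNKNOWN case is the gap described above: it would only go away if the
lease were also kept up to date while an instance recovers itself.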
> LastRevRecovery for self async?
> -------------------------------
>
> Key: OAK-3488
> URL: https://issues.apache.org/jira/browse/OAK-3488
> Project: Jackrabbit Oak
> Issue Type: Task
> Reporter: Julian Reschke
>
> Currently, when a cluster node starts and discovers that it wasn't shut down
> properly, it first runs the complete LastRevRecovery and only continues
> startup when done.
> However, when it fails to acquire the recovery lock, which implies that a
> different cluster node is already running the recovery on its behalf, it
> simply skips recovery and continues startup.
> So which is it? Is running the recovery before proceeding critical or not? If
> it is, this code in {{LastRevRecoveryAgent}} needs to change:
> {code}
> // TODO What if recovery is being performed for current clusterNode by some other node
> // should we halt the startup
> if (!lockAcquired) {
>     log.info("Last revision recovery already being performed by some other node. " +
>             "Would not attempt recovery");
>     return 0;
> }
> {code}
> If it's not critical, we may want to always run the recovery asynchronously.
> cc [~mreutegg] and [~chetanm]