[
https://issues.apache.org/jira/browse/OAK-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043966#comment-15043966
]
Vikas Saurabh edited comment on OAK-3733 at 12/6/15 5:42 PM:
-------------------------------------------------------------
The interesting revisions are:
# {{r151233e54e1-0-4}} (the conflicting rev) - it's marked committed at
{{2:p/r1512340e9c1-0-4/0}}. The revision is marked _deleted=false starting from
{{9:/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned/server_uuid}}
(and downwards). The revision's _commitRoot correctly points at the parent of
the created hierarchy - the {{assigned}} node.
# {{r151233e5114-0-2}} (the rev which deleted one of the parents and either
remained undetected OR didn't detect the hierarchy change mentioned above) - it's
marked committed at {{2:p/r151233e5114-0-2/0}}. The revision is marked
_deleted=true starting from
{{5:/oak:index/event.job.topic/:index/enc_value/var}} up to
{{8:/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned}}
(and most probably a sibling hierarchy that I didn't capture in mongoexport).
Depending upon which session got committed first, either #1 or #2 should
have detected a conflict due to changes on the {{assigned}} node. Given that the
revision timestamps are milliseconds apart, most probably both cluster ids 2
and 4 treated the other rev as being from the future.
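To make the "milliseconds apart" observation concrete, here is a minimal sketch that decodes the two revision strings. It assumes the usual {{r<timestamp>-<counter>-<clusterId>}} layout with all fields in hexadecimal; the {{Revision}} tuple and {{parse_revision}} helper are hypothetical names used only for illustration, not Oak APIs.

```python
from typing import NamedTuple


class Revision(NamedTuple):
    timestamp_ms: int  # milliseconds since the Unix epoch
    counter: int       # disambiguates revisions created in the same millisecond
    cluster_id: int    # id of the cluster node that created the revision


def parse_revision(rev: str) -> Revision:
    """Parse a revision string such as 'r151233e54e1-0-4'.

    Assumption: layout is r<timestamp>-<counter>-<clusterId>, all fields
    hexadecimal. Branch revisions (prefix 'br') are not handled here.
    """
    if not rev.startswith("r"):
        raise ValueError(f"not a revision string: {rev!r}")
    timestamp, counter, cluster_id = rev[1:].split("-")
    return Revision(int(timestamp, 16), int(counter, 16), int(cluster_id, 16))


added = parse_revision("r151233e54e1-0-4")    # revision #1 (the add)
deleted = parse_revision("r151233e5114-0-2")  # revision #2 (the delete)

# Positive gap means the delete carries the earlier timestamp.
gap_ms = added.timestamp_ms - deleted.timestamp_ms
print(f"cluster ids: add={added.cluster_id}, delete={deleted.cluster_id}")
print(f"timestamp gap: {gap_ms} ms")
```

Under that assumption the two timestamps differ by 973 ms, with the delete from cluster id 2 carrying the earlier one.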
was (Author: catholicon):
The interesting revisions are:
# {{r151233e54e1-0-4}} (the conflicting rev) - it's marked committed at
{{2:p/r1512340e9c1-0-4/0}}. The revision marked _deleted=false starting from
{{9:/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned/server_uuid}}
(and downwards). The revision was marked correctly for _commitRoot for the
parent of created hierarchy - {{assigned}} node.
# {{r151233e5114-0-2}} (the rev which deleted the one of the parent and
remained undetected OR didn't detect hierarchy change mentioned above) - it's
marked committed at {{:p/r151233e5114-0-2/0}}. The revision marked
_deleted=true starting from
{{5:/oak:index/event.job.topic/:index/enc_value/var}} upto
{{8:/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned}}
(and most probably a sibling hierarchy that I didn't capture in mongoexport).
Depending upon which session got to get committed first, either #1 or #2 should
have detected a conflict due to changes on {{assigned}} node. Given, the
revision timestamps are milliseconds apart - most probably both cluster id 2
and 4 would have treated the other rev as being from future.
> Sometimes hierarchy conflict between concurrent add/delete isn't detected
> ------------------------------------------------------------------------
>
> Key: OAK-3733
> URL: https://issues.apache.org/jira/browse/OAK-3733
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: core, documentmk
> Reporter: Vikas Saurabh
> Assignee: Vikas Saurabh
> Attachments: mongoexport.zip
>
>
> I'm not sure of the exact set of events that led to an incident on one of our
> test clusters. The cluster is running 3 AEM instances based on an oak build at
> 1.3.10.r1713699, backed by a single mongo 3 instance.
> Unfortunately, we found the issue too late and the logs had rolled over. Here's
> the exception that showed up over and over as workflow jobs were (trying to
> be) processed:
> {noformat}
> ....
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: javax.jcr.InvalidItemStateException: OakMerge0004: OakMerge0004: The node 8:/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned was already added in revision
> r151233e54e1-0-4, before
> r15166378b6a-0-2 (retries 5, 6830 ms)
>     at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:239)
>     at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:212)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.newRepositoryException(SessionDelegate.java:669)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:495)
>     at org.apache.jackrabbit.oak.jcr.session.SessionImpl$8.performVoid(SessionImpl.java:419)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:273)
>     at org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:416)
>     at org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProvider.commit(JcrResourceProvider.java:634)
>     ... 16 common frames omitted
> Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakMerge0004: OakMerge0004: The node 8:/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned was already added in revision
> r151233e54e1-0-4, before
> r15166378b6a-0-2 (retries 5, 6830 ms)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge0(DocumentNodeStoreBranch.java:200)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge(DocumentNodeStoreBranch.java:123)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentRootBuilder.merge(DocumentRootBuilder.java:158)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.merge(DocumentNodeStore.java:1497)
>     at org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:247)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:346)
>     at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:493)
>     ... 20 common frames omitted
> Caused by: org.apache.jackrabbit.oak.plugins.document.ConflictException: The node 8:/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned was already added in revision
> r151233e54e1-0-4, before
> r15166378b6a-0-2
>     at org.apache.jackrabbit.oak.plugins.document.Commit.checkConflicts(Commit.java:582)
>     at org.apache.jackrabbit.oak.plugins.document.Commit.createOrUpdateNode(Commit.java:487)
>     at org.apache.jackrabbit.oak.plugins.document.Commit.applyToDocumentStore(Commit.java:371)
>     at org.apache.jackrabbit.oak.plugins.document.Commit.applyToDocumentStore(Commit.java:265)
>     at org.apache.jackrabbit.oak.plugins.document.Commit.applyInternal(Commit.java:234)
>     at org.apache.jackrabbit.oak.plugins.document.Commit.apply(Commit.java:219)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.persist(DocumentNodeStoreBranch.java:290)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.persist(DocumentNodeStoreBranch.java:260)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.access$300(DocumentNodeStoreBranch.java:54)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch$InMemory.merge(DocumentNodeStoreBranch.java:498)
>     at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge0(DocumentNodeStoreBranch.java:180)
>     ... 26 common frames omitted
> ....
> {noformat}
> Doing the following removed the repo corruption and restored workflow processing:
> {noformat}
> oak.removeDescendantsAndSelf("/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned")
> {noformat}
> Attaching [mongoexport output|^mongoexport.zip] for
> {{/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned/6a389a6a-a8bf-4038-b57b-cb441c6ac557/com.adobe.granite.workflow.transient.job.etc.workflow.models.dam-xmp-writeback.jcr_content.model/2015/11/19/23/54/6a389a6a-a8bf-4038-b57b-cb441c6ac557_10}}
> (the hierarchy created at {{r151233e54e1-0-4}}). I've renamed a few path
> elements to make it more readable though (e.g.
> {{:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel}}
> -> {{enc_value}}).
> [~mreutegg], I'm assigning it to myself for now, but I think this would
> require your expertise all the way :).
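The document ids quoted in this issue carry a numeric prefix that matches the depth of the path, and the long index element is a percent-encoded path that was shortened to {{enc_value}} in the export. The following is a minimal sketch of how to read both, assuming plain ids of the form {{<depth>:<path>}} (ids such as {{2:p/...}} are not covered); {{split_document_id}} and {{path_depth}} are hypothetical helper names, not Oak APIs.

```python
from typing import Tuple
from urllib.parse import unquote


def split_document_id(doc_id: str) -> Tuple[int, str]:
    """Split '<depth>:<path>' into (depth, path).

    Only the first ':' separates the depth; later ones (as in 'oak:index')
    belong to the path itself.
    """
    depth, path = doc_id.split(":", 1)
    return int(depth), path


def path_depth(path: str) -> int:
    """Count the non-empty elements of an absolute path."""
    return len([element for element in path.split("/") if element])


doc_id = "8:/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned"
depth, path = split_document_id(doc_id)
print(depth, path_depth(path))  # the prefix should equal the element count

# The element that was renamed to 'enc_value' is a percent-encoded path.
encoded = (
    "com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow"
    "%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel"
)
decoded = unquote(encoded)
print(decoded)
```

Because the encoded value contains no literal {{/}}, it counts as a single path element, which is why {{.../assigned}} sits at depth 8 both in the stack trace and in the renamed export.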