[
https://issues.apache.org/jira/browse/OAK-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582095#comment-13582095
]
Marcel Reutegger commented on OAK-638:
--------------------------------------
Oh dear! This is probably getting a bit off topic and may rather belong to the
discussion thread on the mailing list you referenced, but I'll do it here
anyway because I think it's strongly related to this issue.
bq. To address this, RootImpl.commit() should not rebase commits that do not
result in persisted branches. Rather should the underlying Microkernel
implementation take care of this.
This will not work. The MicroKernel does not know about the CommitHook. The
CommitHook should run on the tree after the rebase and right before it is
promoted to head. This was also pointed out by Jukka in the conflict handling
discussion. On the other hand this is quite strict and we touched this topic in
previous discussions as well, e.g. when we talked about how to guarantee unique
UUIDs for referenceable nodes.
As far as I can see the H2MK does not strictly implement the conflict handling
as you outlined for the merge case. The H2MK actually performs a merge of
concurrent changes and may therefore render previously asserted conditions in a
commit hook invalid. Think of the unique UUID case again.
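
To make the UUID case concrete, here is a minimal Java sketch. It is not Oak or H2MK code: the class, the map-based tree and both methods are made up for illustration, and the merge is assumed to look only at paths. Two commits each add a node with the same UUID at different paths. Each one passes the uniqueness check against its own base, the merge sees no conflict, and the merged tree fails the check.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: these names are hypothetical, not Oak or MicroKernel APIs.
class UuidMergeSketch {

    // Stand-in for a commit hook check: every UUID occurs at most once.
    // A tree is reduced to a map of node path -> UUID.
    static boolean uuidsUnique(Map<String, String> tree) {
        Set<String> seen = new HashSet<>();
        for (String uuid : tree.values()) {
            if (!seen.add(uuid)) {
                return false;
            }
        }
        return true;
    }

    // Path-based application of added nodes: only adding the same path
    // twice counts as a conflict; UUID values are not inspected.
    static Map<String, String> apply(Map<String, String> tree, Map<String, String> added) {
        Map<String, String> result = new HashMap<>(tree);
        for (Map.Entry<String, String> e : added.entrySet()) {
            if (result.putIfAbsent(e.getKey(), e.getValue()) != null) {
                throw new IllegalStateException("conflict at " + e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of("/a", "uuid-1");
        Map<String, String> commitA = Map.of("/b", "uuid-2");
        Map<String, String> commitB = Map.of("/c", "uuid-2");

        // Each commit passes the check against the revision it was based on.
        System.out.println(uuidsUnique(apply(base, commitA)));
        System.out.println(uuidsUnique(apply(base, commitB)));

        // Merging both raises no path conflict, but the check no longer holds.
        Map<String, String> merged = apply(apply(base, commitA), commitB);
        System.out.println(uuidsUnique(merged));
    }
}
```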
So, what if we change the implementation to fail fast when the branchRevisionId
does not match the current head revision? This leads to another problem
discussed in the conflict handling thread: throughput will suffer when there
are concurrent changes. Now the atomic block effectively isn't just the head
revision check, but also includes running the commit hooks (in the layer on top
of the MK). I don't think this will work with larger commits and concurrent
smaller changes. The large commit will likely have to rebase and re-run the
commit hooks over and over again.
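
A rough Java sketch of that fail-fast loop, to show where the cost comes from. Everything here is hypothetical (the Revision class, the string-based content, the commit() signature) and none of it is the MicroKernel or oak-core API. The concurrent small commits are simulated deterministically in a single thread through a callback that runs between the hook and the head update.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Illustrative only: hypothetical names, not the MicroKernel or oak-core API.
class FailFastCommitSketch {

    // Immutable head revision: an id plus the content reduced to a string.
    static final class Revision {
        final long id;
        final String content;

        Revision(long id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    final AtomicReference<Revision> head = new AtomicReference<>(new Revision(0, ""));

    // Rebases the change onto the current head, runs the hook, then tries to
    // promote the result. Fails fast and starts over if the head has moved.
    // Returns how many times the change was rebased and the hook was run.
    int commit(UnaryOperator<String> change, UnaryOperator<String> hook, Runnable concurrentWork) {
        int attempts = 0;
        while (true) {
            attempts++;
            Revision base = head.get();
            String rebased = change.apply(base.content);
            String checked = hook.apply(rebased);
            concurrentWork.run(); // other committers get a chance to move the head
            if (head.compareAndSet(base, new Revision(base.id + 1, checked))) {
                return attempts;
            }
        }
    }

    // One large commit competing with a number of small commits that each
    // land while the large commit is still between rebase and head update.
    static int simulate(int smallCommits) {
        FailFastCommitSketch store = new FailFastCommitSketch();
        int[] remaining = {smallCommits};
        Runnable interference = () -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                store.commit(s -> s + "s", s -> s, () -> { });
            }
        };
        return store.commit(s -> s + "L", s -> s, interference);
    }

    public static void main(String[] args) {
        System.out.println("attempts without concurrency: " + simulate(0));
        System.out.println("attempts with 3 small commits: " + simulate(3));
    }
}
```

In this sketch every small commit that lands in the window costs the large commit one more rebase and hook run, so the work grows with the number of concurrent writers.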
I think it's important we distinguish different levels of consistency when it
comes to conflict handling. The MK as it is right now can only ensure what I
now call _structural_ consistency. It can e.g. detect when a property is
changed that was removed in the meantime. This kind of detection and conflict
handling is tied to the structure of the tree. Modifications applied to
different locations of the tree never conflict per our definition.
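
As a small Java sketch of what I mean by a structural check: the names are hypothetical and the rule is simplified so that any property path touched by both sides counts as a conflict. The real MK rules are more fine-grained than this.

```java
import java.util.Map;

// Illustrative only: hypothetical names, not the MicroKernel API.
class StructuralConflictSketch {

    enum Op { SET, REMOVE }

    // A change set maps a property path to the operation applied there.
    // Simplification: any path touched by both sides counts as a conflict.
    static boolean conflicts(Map<String, Op> ours, Map<String, Op> theirs) {
        for (String path : ours.keySet()) {
            if (theirs.containsKey(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Op> ours = Map.of("/a/title", Op.SET);
        // The property we change was removed in the meantime: conflict.
        System.out.println(conflicts(ours, Map.of("/a/title", Op.REMOVE)));
        // Changes at different locations: no conflict by this definition.
        System.out.println(conflicts(ours, Map.of("/b/title", Op.REMOVE)));
    }
}
```

Note that the check never looks at property values, which is why it cannot catch something like a duplicate UUID at two different paths.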
On the other hand _semantic_ consistency is ensured with the CommitHook.
Because this stronger consistency is layered on top of the MK, this either
means a) the MK merge() and commit() must be strict as described by Michael to
ensure semantic consistency or b) the MK merges concurrent changes in merge()
and commit() but semantic consistency is not guaranteed. I don't think a) is an
option because it has an impact on performance. b) is roughly what we do right
now, but is not in line with the conflict handling Michael outlined.
Jukka's SegmentMK is somewhere in between because it knows the CommitHook and
can run it in the rebase/atomic-head-update loop.
Now, what I think we must do is decide what kind of consistency we want from the
system. Right now we have a mix and at least to me it is not clear where we are
going. Let's recap: Michael's conflict handling strives for strong _semantic_
consistency, which IMO does not work efficiently with distributed concurrent
commits. H2MK does not ensure _semantic_ consistency because it merges
concurrent changes in commit() and merge() without CommitHook. SegmentMK
ensures _semantic_ consistency because it has access to the CommitHook, but I
think the cost may be quite high when there are concurrent commits.
And how is this related to this issue? To avoid creating the branch when a
rebase is requested, we'd have to again implement the rebase in memory in
KernelNodeStore, with all the conflict handling. But that doesn't guarantee
semantic consistency because the current MK implementations automatically merge
commits. Then why are we re-basing and running the CommitHook in the first
place? We could just as well say the CommitHooks are best effort and run them
without a rebase before the commit.
> Avoid branch/merge for small commits
> ------------------------------------
>
> Key: OAK-638
> URL: https://issues.apache.org/jira/browse/OAK-638
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Reporter: Marcel Reutegger
> Priority: Minor
> Attachments: OAK-638.patch
>
>
> The branch/merge features on the MicroKernel were initially introduced to
> stage changes of large commits. Currently oak-core creates a branch even for
> small changes like updating a property. I think this introduces quite some
> overhead for scenarios with highly concurrent updates. E.g. think of a
> twitter like application or a forum with comments. Well, basically user
> generated content. These updates tend to be rather small (couple of nodes) but
> frequent and concurrent.
> Right now oak-core always does:
> - MK.branch()
> - MK.commit() to branch
> - MK.merge()
> For small commits, it ideally should do:
> - MK.commit() to trunk