[jira] [Commented] (OAK-638) Avoid branch/merge for small commits

Marcel Reutegger (JIRA) Wed, 20 Feb 2013 03:01:18 -0800

    [ 
https://issues.apache.org/jira/browse/OAK-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582095#comment-13582095
 ]


Marcel Reutegger commented on OAK-638:
--------------------------------------

Oh dear! This is probably getting a bit off topic and may rather belong to the 
discussion thread on the mailing list you referenced, but I'll do it here 
anyway because I think it's strongly related to this issue.

bq. To address this, RootImpl.commit() should not rebase commits that do not 
result in persisted branches. Rather should the underlying Microkernel 
implementation take care of this.

This will not work. The MicroKernel does not know about the CommitHook. The 
CommitHook should run on the tree after the rebase and right before it is 
promoted to head. This was also pointed out by Jukka in the conflict handling 
discussion. On the other hand this is quite strict and we touched this topic in 
previous discussions as well, e.g. when we talked about how to guarantee unique 
UUIDs for referenceable nodes.

As far as I can see the H2MK does not strictly implement the conflict handling 
as you outlined for the merge case. The H2MK actually performs a merge of 
concurrent changes and may therefore render previously asserted conditions in a 
commit hook invalid. Think of the unique UUID case again.

So, what if we change the implementation to fail fast when the branchRevisionId 
does not match the current head revision? This leads to another problem 
discussed in the conflict handling thread: throughput will suffer when there 
are concurrent changes. Now the atomic block effectively isn't just the head 
revision check, but also includes running the commit hooks (in the layer on top 
of the MK). I don't think this will work with larger commits and concurrent 
smaller changes. The large commit will likely have to rebase and re-run the 
commit hooks over and over again.
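
To illustrate the fail-fast variant, here is a minimal sketch of such a rebase/hook/head-update loop. It is not Oak code: FailFastStore, Head and the rebaseAndRunHooks callback are hypothetical names, and the tree is reduced to a plain string. The point is only that the hooks sit inside the retry loop, so every concurrent commit that moves the head forces another full run.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the layer on top of the MK; not an Oak API.
final class FailFastStore {

    // Immutable head: a revision counter plus an opaque tree state.
    static final class Head {
        final long revision;
        final String state;

        Head(long revision, String state) {
            this.revision = revision;
            this.state = state;
        }
    }

    private final AtomicReference<Head> head =
            new AtomicReference<>(new Head(0, ""));

    Head head() {
        return head.get();
    }

    // Rebases onto the current head, runs the hooks (both modelled by the
    // callback) and promotes the result only if the head did not move in
    // the meantime. Returns how often the callback had to run.
    int commit(UnaryOperator<String> rebaseAndRunHooks) {
        int attempts = 0;
        while (true) {
            attempts++;
            Head base = head.get();
            String newState = rebaseAndRunHooks.apply(base.state);
            Head next = new Head(base.revision + 1, newState);
            // Fail fast: succeeds only if the head is still the base we read.
            if (head.compareAndSet(base, next)) {
                return attempts;
            }
        }
    }
}
```

In this model a commit whose callback is overtaken by two smaller commits has to rebase and run its hooks three times before it gets through, and the cost of each retry grows with the size of the commit.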

I think it's important we distinguish different levels of consistency when it 
comes to conflict handling. The MK as it is right now can only ensure what I 
now call _structural_ consistency. It can e.g. detect when a property is 
changed that was removed in the meantime. This kind of detection and conflict 
handling is tied to the structure of the tree. Modifications applied to 
different locations of the tree never conflict per our definition.
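
A rough sketch of what I mean by structural conflict detection, under heavy simplification: the tree is a flat map from property path to value, a change set maps a path to its new value (an empty Optional meaning removal), and only the one conflict type mentioned above is checked. StructuralMerge is a hypothetical name, not something in the MK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration; the real MK works on trees and revisions.
final class StructuralMerge {

    // Merges two concurrent change sets made against the same base.
    // Fails only if "ours" sets a property that "theirs" removed.
    static Map<String, String> merge(Map<String, String> base,
                                     Map<String, Optional<String>> theirs,
                                     Map<String, Optional<String>> ours) {
        for (Map.Entry<String, Optional<String>> change : ours.entrySet()) {
            Optional<String> concurrent = theirs.get(change.getKey());
            boolean removedMeanwhile =
                    concurrent != null && !concurrent.isPresent();
            if (removedMeanwhile && change.getValue().isPresent()) {
                throw new IllegalStateException(
                        "changed property was removed concurrently: "
                                + change.getKey());
            }
        }
        Map<String, String> result = new HashMap<>(base);
        apply(result, theirs);
        apply(result, ours);
        return result;
    }

    // Applies sets and removals to the given tree in place.
    private static void apply(Map<String, String> tree,
                              Map<String, Optional<String>> changes) {
        for (Map.Entry<String, Optional<String>> change : changes.entrySet()) {
            if (change.getValue().isPresent()) {
                tree.put(change.getKey(), change.getValue().get());
            } else {
                tree.remove(change.getKey());
            }
        }
    }
}
```

Changes to different paths pass through this check untouched, which is exactly why it cannot say anything about conditions that span several locations of the tree.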

On the other hand _semantic_ consistency is ensured with the CommitHook. 
Because this stronger consistency is layered on top of the MK, this either 
means a) the MK merge() and commit() must be strict as described by Michael to 
ensure semantic consistency or b) the MK merges concurrent changes in merge() 
and commit() but semantic consistency is not guaranteed. I don't think a) is an 
option because it has an impact on performance. b) is roughly what we do right 
now, but is not in line with the conflict handling Michael outlined.
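
The unique UUID case shows how b) loses semantic consistency. The following is only a sketch with made-up names (UuidHookDemo, uuidsUnique, mergeDisjoint) and a map from node path to UUID instead of a real tree. Each commit passes the uniqueness check against its own view, the structural merge sees two additions at different paths and accepts both, and the merged result violates the condition the hook asserted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical illustration of a semantic check layered on a structural merge.
final class UuidHookDemo {

    // Stand-in for a CommitHook: true if no two paths share a UUID.
    static boolean uuidsUnique(Map<String, String> uuidByPath) {
        return new HashSet<>(uuidByPath.values()).size() == uuidByPath.size();
    }

    // Stand-in for the MK merge: additions at different paths never
    // conflict structurally, so the result is simply the union.
    static Map<String, String> mergeDisjoint(Map<String, String> first,
                                             Map<String, String> second) {
        Map<String, String> merged = new HashMap<>(first);
        merged.putAll(second);
        return merged;
    }
}
```

If one commit adds a node at /x and a concurrent one adds a node at /y, both with the same UUID, uuidsUnique holds for each commit on its own but not for the result of mergeDisjoint.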

Jukka's SegmentMK is somewhere in between because it knows the CommitHook and 
can run it in the rebase/atomic-head-update loop.

Now, what I think we must do is decide what kind of consistency we want from 
the system. Right now we have a mix and at least to me it is not clear where we 
are going. Let's recap: Michael's conflict handling strives for strong 
_semantic_ consistency, which IMO does not work efficiently with distributed 
concurrent 
commits. H2MK does not ensure _semantic_ consistency because it merges 
concurrent changes in commit() and merge() without CommitHook. SegmentMK 
ensures _semantic_ consistency because it has access to the CommitHook, but I 
think the cost may be quite high when there are concurrent commits.

And how is this related to this issue? To avoid creating the branch when a 
rebase is requested, we'd have to again implement the rebase in memory in 
KernelNodeStore, with all the conflict handling. But that doesn't guarantee 
semantic consistency because the current MK implementations automatically merge 
commits. Then why are we re-basing and running the CommitHook in the first 
place? We could just as well say the CommitHooks are best effort and run them 
without a rebase before the commit.
                
> Avoid branch/merge for small commits
> ------------------------------------
>
>                 Key: OAK-638
>                 URL: https://issues.apache.org/jira/browse/OAK-638
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>            Reporter: Marcel Reutegger
>            Priority: Minor
>         Attachments: OAK-638.patch
>
>
> The branch/merge features on the MicroKernel were initially introduced to 
> stage changes of large commits. Currently oak-core creates a branch even for 
> small changes like updating a property. I think this introduces quite some 
> overhead for scenarios with highly concurrent updates. E.g. think of a 
> twitter like application or a forum with comments. Well, basically user 
> generated content. These updates tend to be rather small (couple of nodes) but 
> frequent and concurrent.
> Right now oak-core always does:
> - MK.branch()
> - MK.commit() to branch
> - MK.merge()
> For small commits, it ideally should do:
> - MK.commit() to trunk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
