Hi,

On 27.11.12 10:56, Jukka Zitting wrote:
Hi,

As discussed at length earlier, in oak-core we only keep track of tree
states instead of change logs, i.e. the sequences of changes that lead
from one state to another. This works pretty well in general and
avoids a lot of extra complexity, but poses the question of how to
best produce the JSOP change log that the MicroKernel expects.

The default case where content is simply added, modified or removed is
easily handled by the existing JsopDiff class that maps a tree diff to
the corresponding JSOP operations. The only big problem with this
approach is that it doesn't handle bulk operations like copies and
moves too well, instead it falls back to large add(+remove)
operations. Currently we work around this problem by mapping each
higher-level copy and move operation to separate branch commits that
contain just that operation, which isn't too ideal.

Michael already did quite a bit of work on this problem earlier and
realized that it's essentially impossible to reliably extract all
copies and moves from just a tree diff. He did come up with a pretty
brilliant mechanism for handling that case when we do have the
original change log, but I'd really like to avoid having to keep track
of all changes.

Thanks Jukka for picking up the ball! Our private chat from last week triggered some new ideas on my side. Basically I

* reviewed the preconditions: what information do we have or could we make available? How does this differ from the preconditions I based my original work on change log on [1, 2]?

* Applied the knowledge gathered while working on my change log based solution upon the new situation.

Thus I looked at how we could leverage the following two
simplifications to the problem posed by a general tree diff:

1) For each tree node we keep track of its original location. For
example in KernelNodeState we have a record of the path at which the
node existed in the base revision.

Right, this matches my observation. However, I think recording the path is not sufficient. We need to record the parent node.


2) We don't need optimal handling of all possible corner cases. As
long as we cover basic copy and move use cases we can let the handling
of trickier situations (like replacing a tree with one of its
subtrees, or swapping two subtrees) degrade to the fallback mechanism
we already have in JsopDiff. The lack of fully accurate move tracking
may cause difficulty in merging concurrent changes over moves, but I
think a few extra potential conflicts over non-trivial moves is a
reasonable price to pay.

If we record the original parent node for all moved/copied nodes, I think it is possible to come up with a change log which is not larger than twice the size of a minimal change log. The tricky part here is to find out in what order to execute copy and move operations.

In general the size of the change log should be acceptable and for most common cases I think it will be even minimal. If we really think we need to squeeze the change logs down to minimal, there is sill my initial change log consolidator [1, 2].


I did some work along these lines already earlier with the
CopyAndMoveAwareJsopDiff class you can find inside KernelRootBuilder,
but couldn't make it work properly. Based on discussions with Michael
last week I think we should be able to come up with an algorithm that
works pretty well for this purpose. More on that in follow-ups.

I'll also sketch my idea an follow up with a separate message.

Michael


[1] http://markmail.org/message/3c22jqy6gzzkyxx2
[2] http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup

BR,

Jukka Zitting

Reply via email to