Re: [RFC/PATCH 3/7] rerere: add some documentation

2018-06-03 Thread Thomas Gummerer
On 05/24, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > +Conflict normalization
> > +--
> > +
> > +To try and re-do a conflict resolution, even when different merge
> > +strategies are used, 'rerere' computes a conflict ID for each
> > +conflict in the file.
> > +
> > +This is done by discarding the common ancestor version in the
> > +diff3-style, and re-ordering the two sides of the conflict, in
> > +alphabetic order.
> 
> s/discarding.*-style/normalising the conflicted section to 'merge' style/
> 
> The motivation behind the normalization should probably be given
> upfront in the first paragraph.  It is to ensure the recorded
> resolutions can be looked up from the rerere database for
> application, even when branches are merged in different order.  I am
> not sure what you meant by even when different merge stratagies are
> used; I'd drop that if I were writing the paragraph.

What I meant was when different conflict styles are used, and when the
branches are merged in different orders.  But merge strategies is
obviously not a good word for that.  Will rephrase this.

> > +Using this technique a conflict that looks as follows when for example
> > +'master' was merged into a topic branch:
> > +
> > +<<< HEAD
> > +foo
> > +===
> > +bar
> > +>>> master
> > +
> > +and the opposite way when the topic branch is merged into 'master':
> > +
> > +<<< HEAD
> > +bar
> > +===
> > +foo
> > +>>> topic
> > +
> > +can be recognized as the same conflict, and can automatically be
> > +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
> > +be calculated from 'barfoo' in both cases.
> 
> You earlier talked about normalizing and reordering, but did not
> talk about "concatenate both with NUL in between and hash", so the
> explanation in the last two lines are not quite understandable by
> mere mortals, even though I know which part of the code you are
> referring to.  When you talk about hasing, you may want to make sure
> the readers understand that the branch label on <<< and >>> lines
> are ignored.
> 
> > +If there are multiple conflicts in one file, they are all appended to
> > +one another, both in the 'preimage' file as well as in the conflict
> > +ID.
> 
> In case it was not clear (and I do not think it is to those who only
> read your description and haven't thought things through
> themselves), this concatenation is why the normalization by
> reordering is helpful.  Imagine that a common ancestor had a file
> with a line with string "A" on it (I'll call such a line "line A"
> for brevity in the following) in its early part, and line X in its
> late part.  And then you fork four branches that do these things:
> 
> - AB: changes A to B
> - AC: changes A to C
> - XY: changes X to Y
> - XZ: changes X to Z
> 
> Now, forking a branch ABAC off of branch AB and then merging AC into
> it, and forking a branch ACAB off of branch AC and then merging AB
> into it, would yield the conflict in a different order.  The former
> would say "A became B or C, what now?" while the latter would say "A
> became C or B, what now?"
> 
> But the act of merging AC into ABAC and resolving the conflict to
> leave line D means that you declare: 
> 
> After examining what branches AB and AC did, I believe that
> making line A into line D is the best thing to do that is
> compatible with what AB and AC wanted to do.
> 
> So the conflict we would see when merging AB into ACAB should be
> resolved the same way---it is the resolution that is in line with
> that declaration.
> 
> Imagine that similarly you had previously forked branch XYXZ from
> XY, merged XZ into it, and resolved "X became Y or Z" into "X became
> W".
> 
> Now, if you forked a branch ABXY from AB and then merged XY, then
> ABXY would have line B in its early part and line Y in its later
> part.  Such a merge would be quite clean.  We can construct
> 4 combinations using these four branches ((AB, AC) x (XY, XZ)).
> 
> Merging ABXY and ACXZ would make "an early A became B or C, a late X
> became Y or Z" conflict, while merging ACXY and ABXZ would make "an
> early A became C or B, a late X became Y or Z".  We can see there
> are 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
> 
> By sorting, we can give the conflict its canonical name, namely, "an
> early part became B or C, a late part becames X or Y", and whenever
> any of these four patterns appear, we can get to the same conflict
> and resolution that we saw earlier.  Without the sorting, we will
> have to somehow find a previous resolution from combinatorial
> explosion ;-)

Thanks for the in depth explanation!  I'll incorporate this into the
document.

> These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'",
> 2016-05-23), the conflict ID can safely collide, i.e. hash
> collisions that drops completely different conflicts and their
> resolutions into the same 

Re: [RFC/PATCH 3/7] rerere: add some documentation

2018-05-24 Thread Junio C Hamano
Thomas Gummerer  writes:

> +Conflict normalization
> +--
> +
> +To try and re-do a conflict resolution, even when different merge
> +strategies are used, 'rerere' computes a conflict ID for each
> +conflict in the file.
> +
> +This is done by discarding the common ancestor version in the
> +diff3-style, and re-ordering the two sides of the conflict, in
> +alphabetic order.

s/discarding.*-style/normalising the conflicted section to 'merge' style/

The motivation behind the normalization should probably be given
upfront in the first paragraph.  It is to ensure the recorded
resolutions can be looked up from the rerere database for
application, even when branches are merged in different order.  I am
not sure what you meant by even when different merge stratagies are
used; I'd drop that if I were writing the paragraph.

> +Using this technique a conflict that looks as follows when for example
> +'master' was merged into a topic branch:
> +
> +<<< HEAD
> +foo
> +===
> +bar
> +>>> master
> +
> +and the opposite way when the topic branch is merged into 'master':
> +
> +<<< HEAD
> +bar
> +===
> +foo
> +>>> topic
> +
> +can be recognized as the same conflict, and can automatically be
> +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
> +be calculated from 'barfoo' in both cases.

You earlier talked about normalizing and reordering, but did not
talk about "concatenate both with NUL in between and hash", so the
explanation in the last two lines are not quite understandable by
mere mortals, even though I know which part of the code you are
referring to.  When you talk about hasing, you may want to make sure
the readers understand that the branch label on <<< and >>> lines
are ignored.

> +If there are multiple conflicts in one file, they are all appended to
> +one another, both in the 'preimage' file as well as in the conflict
> +ID.

In case it was not clear (and I do not think it is to those who only
read your description and haven't thought things through
themselves), this concatenation is why the normalization by
reordering is helpful.  Imagine that a common ancestor had a file
with a line with string "A" on it (I'll call such a line "line A"
for brevity in the following) in its early part, and line X in its
late part.  And then you fork four branches that do these things:

- AB: changes A to B
- AC: changes A to C
- XY: changes X to Y
- XZ: changes X to Z

Now, forking a branch ABAC off of branch AB and then merging AC into
it, and forking a branch ACAB off of branch AC and then merging AB
into it, would yield the conflict in a different order.  The former
would say "A became B or C, what now?" while the latter would say "A
became C or B, what now?"

But the act of merging AC into ABAC and resolving the conflict to
leave line D means that you declare: 

After examining what branches AB and AC did, I believe that
making line A into line D is the best thing to do that is
compatible with what AB and AC wanted to do.

So the conflict we would see when merging AB into ACAB should be
resolved the same way---it is the resolution that is in line with
that declaration.

Imagine that similarly you had previously forked branch XYXZ from
XY, merged XZ into it, and resolved "X became Y or Z" into "X became
W".

Now, if you forked a branch ABXY from AB and then merged XY, then
ABXY would have line B in its early part and line Y in its later
part.  Such a merge would be quite clean.  We can construct
4 combinations using these four branches ((AB, AC) x (XY, XZ)).

Merging ABXY and ACXZ would make "an early A became B or C, a late X
became Y or Z" conflict, while merging ACXY and ABXZ would make "an
early A became C or B, a late X became Y or Z".  We can see there
are 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").

By sorting, we can give the conflict its canonical name, namely, "an
early part became B or C, a late part becames X or Y", and whenever
any of these four patterns appear, we can get to the same conflict
and resolution that we saw earlier.  Without the sorting, we will
have to somehow find a previous resolution from combinatorial
explosion ;-)

These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'",
2016-05-23), the conflict ID can safely collide, i.e. hash
collisions that drops completely different conflicts and their
resolutions into the same .git/rr-cache/$id directory will not
interfere with proper operation of the system, thanks to that
rerere-multi topic that allows us to store multiple preimage
conflicts that happens to share the same conflict ID with their
corresponding postimage resolutions.

In theory, we *should* be able to stub out the SHA-1 computation and
give every conflict the same ID and rerere should still operate
correctly, even though I haven't tried it yet myself.