Re: [PATCH v3 05/11] rerere: add documentation for conflict normalization

2018-07-30 Thread Thomas Gummerer
On 07/30, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > +Different conflict styles and branch names are normalized by stripping
> > +the labels from the conflict markers, and removing extraneous
> > +information from the `diff3` conflict style. Branches that are merged
> 
> s/extraneous information/commmon ancestor version/ perhaps, to be
> fact-based without passing value judgment?

Yeah I meant "extraneous information for rerere", but common ancester
version is better.

> We drop the common ancestor version only because we cannot normalize
> from `merge` style to `diff3` style by adding one, and not because
> it is extraneous.  It does help humans understand the conflict a lot
> better to have that section.
> 
> > +By extension, this means that rerere should recognize that the above
> > +conflicts are the same.  To do this, the labels on the conflict
> > +markers are stripped, and the diff3 output is removed.  The above
> 
> s/diff3 output/common ancestor version/, as "diff3 output" would
> mean the whole thing between <<< and >>> to readers.

Makes sense, will fix in the re-roll, thanks!

> > diff --git a/rerere.c b/rerere.c
> > index be98c0afcb..da1ab54027 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int 
> > marker_size)
> >   * and NUL concatenated together.
> >   *
> >   * Return the number of conflict hunks found.
> > - *
> > - * NEEDSWORK: the logic and theory of operation behind this conflict
> > - * normalization may deserve to be documented somewhere, perhaps in
> > - * Documentation/technical/rerere.txt.
> >   */
> >  static int handle_path(unsigned char *sha1, struct rerere_io *io, int 
> > marker_size)
> >  {
> 
> Thanks for finally removing this age-old NEEDSWORK comment.


Re: [PATCH v3 05/11] rerere: add documentation for conflict normalization

2018-07-30 Thread Junio C Hamano
Thomas Gummerer  writes:

> +Different conflict styles and branch names are normalized by stripping
> +the labels from the conflict markers, and removing extraneous
> +information from the `diff3` conflict style. Branches that are merged

s/extraneous information/commmon ancestor version/ perhaps, to be
fact-based without passing value judgment?

We drop the common ancestor version only because we cannot normalize
from `merge` style to `diff3` style by adding one, and not because
it is extraneous.  It does help humans understand the conflict a lot
better to have that section.

> +By extension, this means that rerere should recognize that the above
> +conflicts are the same.  To do this, the labels on the conflict
> +markers are stripped, and the diff3 output is removed.  The above

s/diff3 output/common ancestor version/, as "diff3 output" would
mean the whole thing between <<< and >>> to readers.

> diff --git a/rerere.c b/rerere.c
> index be98c0afcb..da1ab54027 100644
> --- a/rerere.c
> +++ b/rerere.c
> @@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int 
> marker_size)
>   * and NUL concatenated together.
>   *
>   * Return the number of conflict hunks found.
> - *
> - * NEEDSWORK: the logic and theory of operation behind this conflict
> - * normalization may deserve to be documented somewhere, perhaps in
> - * Documentation/technical/rerere.txt.
>   */
>  static int handle_path(unsigned char *sha1, struct rerere_io *io, int 
> marker_size)
>  {

Thanks for finally removing this age-old NEEDSWORK comment.


[PATCH v3 05/11] rerere: add documentation for conflict normalization

2018-07-14 Thread Thomas Gummerer
Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---
 Documentation/technical/rerere.txt | 140 +
 rerere.c   |   4 -
 2 files changed, 140 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt 
b/Documentation/technical/rerere.txt
new file mode 100644
index 00..4102cce7aa
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,140 @@
+Rerere
+==
+
+This document describes the rerere logic.
+
+Conflict normalization
+--
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing extraneous
+information from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks.  More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Stripping extraneous information
+
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document).  In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+<<< HEAD
+B
+===
+C
+>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+<<< HEAD
+B
+||| merged common ancestors
+A
+===
+C
+>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+After examining what branches AB and AC did, I believe that making
+line A into line D is the best thing to do that is compatible with
+what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the diff3 output is removed.  The above
+examples would both result in the following normalized conflict:
+
+<<<
+B
+===
+C
+>>>
+
+Sorting hunks
+~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+- AB: changes A to B
+- AC: changes A to C
+- XY: changes X to Y
+- XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+After examining what branches AB and AC did, I believe that
+making line A into line D is the best thing to do that is
+compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z".  We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or