Add some documentation for the logic behind the conflict normalization
in rerere.
Helped-by: Junio C Hamano
Signed-off-by: Thomas Gummerer
---
Documentation/technical/rerere.txt | 140 +
rerere.c | 4 -
2 files changed, 140 insertions(+), 4 deletions(-)
create mode 100644 Documentation/technical/rerere.txt
diff --git a/Documentation/technical/rerere.txt
b/Documentation/technical/rerere.txt
new file mode 100644
index 00..4102cce7aa
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,140 @@
+Rerere
+==
+
+This document describes the rerere logic.
+
+Conflict normalization
+--
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing extraneous
+information from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks. More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Stripping extraneous information
+
+
+Say we have three branches AB, AC and AC2. The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document). In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+<<< HEAD
+B
+===
+C
+>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+<<< HEAD
+B
+||| merged common ancestors
+A
+===
+C
+>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+After examining what branches AB and AC did, I believe that making
+line A into line D is the best thing to do that is compatible with
+what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same. To do this, the labels on the conflict
+markers are stripped, and the diff3 output is removed. The above
+examples would both result in the following normalized conflict:
+
+<<<
+B
+===
+C
+>>>
+
+Sorting hunks
+~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part. And then four branches
+are forked that do these things:
+
+- AB: changes A to B
+- AC: changes A to C
+- XY: changes X to Y
+- XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order. The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+After examining what branches AB and AC did, I believe that
+making line A into line D is the best thing to do that is
+compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean. We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z". We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or