[PATCH] combine-diff: coalesce lost lines optimally

2013-03-17 Thread Antoine Pelisse
This replaces the greedy implementation to coalesce lost lines by using
dynamic programming to find the Longest Common Subsequence.

The O(n²) time complexity is obviously bigger than previous
implementation but it can produce shorter diff results (and most likely
easier to read).

List of lost lines is now doubly-linked because we reverse-read it when
reading the direction matrix.

Signed-off-by: Antoine Pelisse apeli...@gmail.com
---
Hi,
This is a very first draft for improving the way we coalesce lost
lines. It has only been tested with the two scenarios below.

What is left to do:
- Test it more extensively
- Had some tests scenarios

I'm also having a hard time trying it with more than two parents. How I
am supposed to have more than two parents while octopus merge refuses if
there are conflicts ?

Tested scenarios:
git init
test
git add test
git commit -m initial

git checkout -b side1
seq 10 test
git commit -m all -a
git checkout master
seq 1 2 10 test
git commit -m three -a

git merge side1
test
git commit -m merge -a
git show

AND

git init
test
git add test
git commit -m initial

echo 3\n1\n2\n4 test
git commit -m shuffled -a

git checkout -b side HEAD^
echo 1\n2\n3\n4 test
git commit -m sorted -a

git merge master
test
git commit -m merge -a
git show

 combine-diff.c |  192 ++--
 1 file changed, 160 insertions(+), 32 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 35d41cd..252dd72 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -73,16 +73,24 @@ static struct combine_diff_path *intersect_paths(struct 
combine_diff_path *curr,

 /* Lines lost from parent */
 struct lline {
-   struct lline *next;
+   struct lline *next, *prev;
int len;
unsigned long parent_map;
char line[FLEX_ARRAY];
 };

+/* Lines lost from current parent (before coalescing) */
+struct plost {
+   struct lline *lost_head, *lost_tail;
+   int len;
+};
+
 /* Lines surviving in the merge result */
 struct sline {
-   struct lline *lost_head, **lost_tail;
-   struct lline *next_lost;
+   /* Accumulated and coalesced lost lines */
+   struct lline *lost;
+   int lenlost;
+   struct plost plost;
char *bol;
int len;
/* bit 0 up to (N-1) are on if the parent has this line (i.e.
@@ -94,6 +102,132 @@ struct sline {
unsigned long *p_lno;
 };

+enum coalesce_direction { MATCH, BASE, NEW };
+
+/* Coalesce new lines into base by finding LCS */
+static struct lline *coalesce_lines(struct lline *base, int *lenbase,
+   struct lline *new, int lennew,
+   unsigned long parent)
+{
+   int **lcs;
+   enum coalesce_direction **directions;
+   struct lline *baseend, *newend;
+   int i, j, origbaselen = *lenbase;
+
+   if (new == NULL)
+   return base;
+
+   if (base == NULL) {
+   *lenbase = lennew;
+   return new;
+   }
+
+   /*
+* Coalesce new lines into base by finding the LCS
+* - Create the table to run dynamic programing
+* - Compute the LCS
+* - Then reverse read the direction structure:
+*   - If we have MATCH, assign parent to base flag, and consume
+*   both baseend and newend
+*   - Else if we have BASE, consume baseend
+*   - Else if we have NEW, insert newend lline into base and
+*   consume newend
+*/
+   lcs = xcalloc(origbaselen + 1, sizeof(int*));
+   directions = xcalloc(origbaselen + 1, sizeof(enum coalesce_direction*));
+   for (i = 0; i  origbaselen + 1; i++) {
+   lcs[i] = xcalloc(lennew + 1, sizeof(int));
+   directions[i] = xcalloc(lennew + 1, sizeof(enum 
coalesce_direction));
+   directions[i][0] = BASE;
+   }
+   for (j = 1; j  lennew + 1; j++)
+   directions[0][j] = NEW;
+
+   for (i = 1, baseend = base; i  origbaselen + 1; i++) {
+   for (j = 1, newend = new; j  lennew + 1; j++) {
+   if (baseend-len == newend-len 
+   !memcmp(baseend-line, newend-line, baseend-len)) 
{
+   lcs[i][j] = lcs[i - 1][j - 1] + 1;
+   directions[i][j] = MATCH;
+   } else if (lcs[i][j - 1] = lcs[i - 1][j]) {
+   lcs[i][j] = lcs[i][j - 1];
+   directions[i][j] = NEW;
+   } else {
+   lcs[i][j] = lcs[i - 1][j];
+   directions[i][j] = BASE;
+   }
+   if (newend-next)
+   newend = newend-next;
+   }
+   if (baseend-next)
+   baseend = baseend-next;
+   }
+
+   for (i = 0; i  origbaselen + 1; i++)
+   free(lcs[i]);
+  

Re: [PATCH] combine-diff: coalesce lost lines optimally

2013-03-17 Thread Antoine Pelisse
 I'm also having a hard time trying it with more than two parents. How I
 am supposed to have more than two parents while octopus merge refuses if
 there are conflicts ?

OK, creating the merge commit myself solves the issue:

git init
test
git add test
git commit -m initial

seq 100 test
git commit -m all -a

git checkout -b side1 HEAD^1
seq 1 2 100 test
git commit -m side1 -a

git checkout -b side2 HEAD^1
seq 1 4 100 test
git commit -m side2 -a

git checkout -b side3 HEAD^1
seq 1 8 100 test
git commit -m side3 -a

git checkout -b side4 HEAD^1
seq 1 16 100 test
git commit -m side4 -a

git checkout master
test
git add test
TREE=$(git write-tree)
COMMIT=$(git commit-tree $TREE -p master -p side1 -p side2 -p side3 -p
side4 -m merge)
git show $COMMIT

This will work with the basic greedy implementation if all parents are
in this order. But the optimal result will be lost if we change the
order of -p parameters in git-commit-tree.
The patch seems to be correct, always finding the best result (we
always have 100 lines diff) whatever the order of parents.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] combine-diff: coalesce lost lines optimally

2013-03-17 Thread Junio C Hamano
Antoine Pelisse apeli...@gmail.com writes:

 This replaces the greedy implementation to coalesce lost lines by using
 dynamic programming to find the Longest Common Subsequence.

 The O(n²) time complexity is obviously bigger than previous
 implementation but it can produce shorter diff results (and most likely
 easier to read).

 List of lost lines is now doubly-linked because we reverse-read it when
 reading the direction matrix.

 Signed-off-by: Antoine Pelisse apeli...@gmail.com
 ---
 Hi,
 This is a very first draft for improving the way we coalesce lost
 lines. It has only been tested with the two scenarios below.

 What is left to do:
 - Test it more extensively
 - Had some tests scenarios

 I'm also having a hard time trying it with more than two parents. How I
 am supposed to have more than two parents while octopus merge refuses if
 there are conflicts ?

9fdb62af92c7 ([ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica
asus fops pnpacpi branches into release, 2006-01-24) is one of the
amusing examples ;-)  Cf.

http://thread.gmane.org/gmane.comp.version-control.git/15486
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] combine-diff: coalesce lost lines optimally

2013-03-17 Thread Antoine Pelisse
Hopefully, my patch takes about the same time as git 1.7.9.5 and
produces the same output on that commit ;)
Unfortunately on a commit that would remove A LOT of lines (1)
from 7 parents, the times goes from 0.01s to 1.5s... I'm pretty sure
that scenario is quite uncommon though.

On Sun, Mar 17, 2013 at 9:10 PM, Junio C Hamano gits...@pobox.com wrote:
 Antoine Pelisse apeli...@gmail.com writes:

 This replaces the greedy implementation to coalesce lost lines by using
 dynamic programming to find the Longest Common Subsequence.

 The O(n²) time complexity is obviously bigger than previous
 implementation but it can produce shorter diff results (and most likely
 easier to read).

 List of lost lines is now doubly-linked because we reverse-read it when
 reading the direction matrix.

 Signed-off-by: Antoine Pelisse apeli...@gmail.com
 ---
 Hi,
 This is a very first draft for improving the way we coalesce lost
 lines. It has only been tested with the two scenarios below.

 What is left to do:
 - Test it more extensively
 - Had some tests scenarios

 I'm also having a hard time trying it with more than two parents. How I
 am supposed to have more than two parents while octopus merge refuses if
 there are conflicts ?

 9fdb62af92c7 ([ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica
 asus fops pnpacpi branches into release, 2006-01-24) is one of the
 amusing examples ;-)  Cf.

 http://thread.gmane.org/gmane.comp.version-control.git/15486
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] combine-diff: coalesce lost lines optimally

2013-03-17 Thread Junio C Hamano
Antoine Pelisse apeli...@gmail.com writes:

 +/* Coalesce new lines into base by finding LCS */
 +static struct lline *coalesce_lines(struct lline *base, int *lenbase,
 + struct lline *new, int lennew,
 + unsigned long parent)
 +{

Don't you want to pass flags so that you can use match_string_spaces()
at places like this:

 + for (i = 1, baseend = base; i  origbaselen + 1; i++) {
 + for (j = 1, newend = new; j  lennew + 1; j++) {
 + if (baseend-len == newend-len 
 + !memcmp(baseend-line, newend-line, baseend-len)) 
 {

IOW, it looks to me that this wants to be rebuilt on top of
fa04ae0be8cc (Allow combined diff to ignore white-spaces,
2013-03-14).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html