[PATCH] combine-diff: coalesce lost lines optimally
This replaces the greedy implementation to coalesce lost lines by using dynamic programming to find the Longest Common Subsequence. The O(n²) time complexity is obviously bigger than previous implementation but it can produce shorter diff results (and most likely easier to read). List of lost lines is now doubly-linked because we reverse-read it when reading the direction matrix. Signed-off-by: Antoine Pelisse apeli...@gmail.com --- Hi, This is a very first draft for improving the way we coalesce lost lines. It has only been tested with the two scenarios below. What is left to do: - Test it more extensively - Had some tests scenarios I'm also having a hard time trying it with more than two parents. How I am supposed to have more than two parents while octopus merge refuses if there are conflicts ? Tested scenarios: git init test git add test git commit -m initial git checkout -b side1 seq 10 test git commit -m all -a git checkout master seq 1 2 10 test git commit -m three -a git merge side1 test git commit -m merge -a git show AND git init test git add test git commit -m initial echo 3\n1\n2\n4 test git commit -m shuffled -a git checkout -b side HEAD^ echo 1\n2\n3\n4 test git commit -m sorted -a git merge master test git commit -m merge -a git show combine-diff.c | 192 ++-- 1 file changed, 160 insertions(+), 32 deletions(-) diff --git a/combine-diff.c b/combine-diff.c index 35d41cd..252dd72 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -73,16 +73,24 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, /* Lines lost from parent */ struct lline { - struct lline *next; + struct lline *next, *prev; int len; unsigned long parent_map; char line[FLEX_ARRAY]; }; +/* Lines lost from current parent (before coalescing) */ +struct plost { + struct lline *lost_head, *lost_tail; + int len; +}; + /* Lines surviving in the merge result */ struct sline { - struct lline *lost_head, **lost_tail; - struct lline *next_lost; + /* Accumulated and coalesced lost lines */ + struct lline *lost; + int lenlost; + struct plost plost; char *bol; int len; /* bit 0 up to (N-1) are on if the parent has this line (i.e. @@ -94,6 +102,132 @@ struct sline { unsigned long *p_lno; }; +enum coalesce_direction { MATCH, BASE, NEW }; + +/* Coalesce new lines into base by finding LCS */ +static struct lline *coalesce_lines(struct lline *base, int *lenbase, + struct lline *new, int lennew, + unsigned long parent) +{ + int **lcs; + enum coalesce_direction **directions; + struct lline *baseend, *newend; + int i, j, origbaselen = *lenbase; + + if (new == NULL) + return base; + + if (base == NULL) { + *lenbase = lennew; + return new; + } + + /* +* Coalesce new lines into base by finding the LCS +* - Create the table to run dynamic programing +* - Compute the LCS +* - Then reverse read the direction structure: +* - If we have MATCH, assign parent to base flag, and consume +* both baseend and newend +* - Else if we have BASE, consume baseend +* - Else if we have NEW, insert newend lline into base and +* consume newend +*/ + lcs = xcalloc(origbaselen + 1, sizeof(int*)); + directions = xcalloc(origbaselen + 1, sizeof(enum coalesce_direction*)); + for (i = 0; i origbaselen + 1; i++) { + lcs[i] = xcalloc(lennew + 1, sizeof(int)); + directions[i] = xcalloc(lennew + 1, sizeof(enum coalesce_direction)); + directions[i][0] = BASE; + } + for (j = 1; j lennew + 1; j++) + directions[0][j] = NEW; + + for (i = 1, baseend = base; i origbaselen + 1; i++) { + for (j = 1, newend = new; j lennew + 1; j++) { + if (baseend-len == newend-len + !memcmp(baseend-line, newend-line, baseend-len)) { + lcs[i][j] = lcs[i - 1][j - 1] + 1; + directions[i][j] = MATCH; + } else if (lcs[i][j - 1] = lcs[i - 1][j]) { + lcs[i][j] = lcs[i][j - 1]; + directions[i][j] = NEW; + } else { + lcs[i][j] = lcs[i - 1][j]; + directions[i][j] = BASE; + } + if (newend-next) + newend = newend-next; + } + if (baseend-next) + baseend = baseend-next; + } + + for (i = 0; i origbaselen + 1; i++) + free(lcs[i]); +
Re: [PATCH] combine-diff: coalesce lost lines optimally
I'm also having a hard time trying it with more than two parents. How I am supposed to have more than two parents while octopus merge refuses if there are conflicts ? OK, creating the merge commit myself solves the issue: git init test git add test git commit -m initial seq 100 test git commit -m all -a git checkout -b side1 HEAD^1 seq 1 2 100 test git commit -m side1 -a git checkout -b side2 HEAD^1 seq 1 4 100 test git commit -m side2 -a git checkout -b side3 HEAD^1 seq 1 8 100 test git commit -m side3 -a git checkout -b side4 HEAD^1 seq 1 16 100 test git commit -m side4 -a git checkout master test git add test TREE=$(git write-tree) COMMIT=$(git commit-tree $TREE -p master -p side1 -p side2 -p side3 -p side4 -m merge) git show $COMMIT This will work with the basic greedy implementation if all parents are in this order. But the optimal result will be lost if we change the order of -p parameters in git-commit-tree. The patch seems to be correct, always finding the best result (we always have 100 lines diff) whatever the order of parents. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] combine-diff: coalesce lost lines optimally
Antoine Pelisse apeli...@gmail.com writes: This replaces the greedy implementation to coalesce lost lines by using dynamic programming to find the Longest Common Subsequence. The O(n²) time complexity is obviously bigger than previous implementation but it can produce shorter diff results (and most likely easier to read). List of lost lines is now doubly-linked because we reverse-read it when reading the direction matrix. Signed-off-by: Antoine Pelisse apeli...@gmail.com --- Hi, This is a very first draft for improving the way we coalesce lost lines. It has only been tested with the two scenarios below. What is left to do: - Test it more extensively - Had some tests scenarios I'm also having a hard time trying it with more than two parents. How I am supposed to have more than two parents while octopus merge refuses if there are conflicts ? 9fdb62af92c7 ([ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica asus fops pnpacpi branches into release, 2006-01-24) is one of the amusing examples ;-) Cf. http://thread.gmane.org/gmane.comp.version-control.git/15486 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] combine-diff: coalesce lost lines optimally
Hopefully, my patch takes about the same time as git 1.7.9.5 and produces the same output on that commit ;) Unfortunately on a commit that would remove A LOT of lines (1) from 7 parents, the times goes from 0.01s to 1.5s... I'm pretty sure that scenario is quite uncommon though. On Sun, Mar 17, 2013 at 9:10 PM, Junio C Hamano gits...@pobox.com wrote: Antoine Pelisse apeli...@gmail.com writes: This replaces the greedy implementation to coalesce lost lines by using dynamic programming to find the Longest Common Subsequence. The O(n²) time complexity is obviously bigger than previous implementation but it can produce shorter diff results (and most likely easier to read). List of lost lines is now doubly-linked because we reverse-read it when reading the direction matrix. Signed-off-by: Antoine Pelisse apeli...@gmail.com --- Hi, This is a very first draft for improving the way we coalesce lost lines. It has only been tested with the two scenarios below. What is left to do: - Test it more extensively - Had some tests scenarios I'm also having a hard time trying it with more than two parents. How I am supposed to have more than two parents while octopus merge refuses if there are conflicts ? 9fdb62af92c7 ([ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica asus fops pnpacpi branches into release, 2006-01-24) is one of the amusing examples ;-) Cf. http://thread.gmane.org/gmane.comp.version-control.git/15486 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] combine-diff: coalesce lost lines optimally
Antoine Pelisse apeli...@gmail.com writes: +/* Coalesce new lines into base by finding LCS */ +static struct lline *coalesce_lines(struct lline *base, int *lenbase, + struct lline *new, int lennew, + unsigned long parent) +{ Don't you want to pass flags so that you can use match_string_spaces() at places like this: + for (i = 1, baseend = base; i origbaselen + 1; i++) { + for (j = 1, newend = new; j lennew + 1; j++) { + if (baseend-len == newend-len + !memcmp(baseend-line, newend-line, baseend-len)) { IOW, it looks to me that this wants to be rebuilt on top of fa04ae0be8cc (Allow combined diff to ignore white-spaces, 2013-03-14). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html