Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
On Tue, Aug 28, 2012 at 04:39:11PM -0700, Junio C Hamano wrote: > > git rev-list --committer=torva...@linux-foundation.org \ > > --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL > > > > cmd=' > > while read result parent1 parent2 > > do > > $GIT merge-base $parent1 $parent2 > > done > ' > > I have this suspicion that it is spending most of its time in > insert-by-date. Luckily, merge_bases_many() is totally outside of > the usual revision traversal and its use of the commit list is > pretty well isolated. > > Peff, can I interest you into resurrecting your $gmane/174007 and > $gmane/174008 (we do not need pop_commit_from_queue() in the latter; > everything can stay as static to commit.c for now) to see how well > priority queue based approach would perform? I did a quick port of merge_bases_many and get_merge_bases_many to use priority queues, but I didn't see any speedup at all on the above test case. According to perf, we spend most of our time in zlib, inflating commits. Only 1% is spent on commit_list_insert_by_date, which is about the same amount spent on queue insertion after my patches. Patches follow, just for reference. [1/2]: basic priority queue implementation [2/2]: commit: use a priority queue in merge base functions -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
Junio C Hamano writes: > Junio C Hamano writes: > >> Junio C Hamano writes: >> >>> for (i = 0; i < cnt; i++) { >>> - if (rslt[i]) >>> + /* >>> +* Is rslt[i] an ancestor of any of the others? >>> +* then it is not interesting to us. >>> +*/ >>> + for (j = 0; j < i; j++) >>> + others[j] = rslt[j]; >>> + for (j = 1 + 1; j < cnt; j++) >> >> s/1 + 1/i + 1/; >> >> With that, all tests seem to pass ;-) > > "git merge-base" itself seems to have more room for improvement. > Trying to recompute bases for recent 200 merges in the kernel > history with the attached script does not show any improvement with > or without the series on top of recent "master". Correctnesswise it > seems to be OK, though---I get byte-for-byte identical output. > > -- >8 -- > #!/bin/sh > > git rev-list --committer=torva...@linux-foundation.org \ > --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL > > cmd=' > while read result parent1 parent2 > do > $GIT merge-base $parent1 $parent2 > done ' > > GIT="rungit master" time sh -c "$cmd" >:stock > GIT=../git.git/git time sh -c "$cmd" >:optim > cmp :stock :optim > -- 8< -- I have this suspicion that it is spending most of its time in insert-by-date. Luckily, merge_bases_many() is totally outside of the usual revision traversal and its use of the commit list is pretty well isolated. Peff, can I interest you into resurrecting your $gmane/174007 and $gmane/174008 (we do not need pop_commit_from_queue() in the latter; everything can stay as static to commit.c for now) to see how well priority queue based approach would perform? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
Junio C Hamano writes: > Junio C Hamano writes: > >> for (i = 0; i < cnt; i++) { >> -if (rslt[i]) >> +/* >> + * Is rslt[i] an ancestor of any of the others? >> + * then it is not interesting to us. >> + */ >> +for (j = 0; j < i; j++) >> +others[j] = rslt[j]; >> +for (j = 1 + 1; j < cnt; j++) > > s/1 + 1/i + 1/; > > With that, all tests seem to pass ;-) "git merge-base" itself seems to have more room for improvement. Trying to recompute bases for recent 200 merges in the kernel history with the attached script does not show any improvement with or without the series on top of recent "master". Correctnesswise it seems to be OK, though---I get byte-for-byte identical output. -- >8 -- #!/bin/sh git rev-list --committer=torva...@linux-foundation.org \ --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL cmd=' while read result parent1 parent2 do $GIT merge-base $parent1 $parent2 done :stock GIT=../git.git/git time sh -c "$cmd" >:optim cmp :stock :optim -- 8< -- -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
Junio C Hamano writes: > for (i = 0; i < cnt; i++) { > - if (rslt[i]) > + /* > + * Is rslt[i] an ancestor of any of the others? > + * then it is not interesting to us. > + */ > + for (j = 0; j < i; j++) > + others[j] = rslt[j]; > + for (j = 1 + 1; j < cnt; j++) s/1 + 1/i + 1/; With that, all tests seem to pass ;-) > + others[j - 1] = rslt[j]; > + list = merge_bases_many(rslt[i], cnt - 1, others); > + clear_commit_marks(rslt[i], all_flags); > + for (j = 0; j < cnt - 1; j++) > + clear_commit_marks(others[j], all_flags); > + while (list) { > + if (rslt[i] == list->item) > + break; > + list = list->next; > + } > + if (!list) > commit_list_insert_by_date(rslt[i], &result); > + free_commit_list(list); > } > free(rslt); > + free(others); > return result; > } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
After merge_bases_many() paints the graph in two colors to find common ancestors, get_merge_bases_many() reduces the result by excluding commits that can be reached from other commits in the result. We used to do N * (N-1) traversals for this, but we can check if one commit is reachable from any of the other (N-1) commits by a single traversal, and repeat it N times to find the answer. Signed-off-by: Junio C Hamano --- commit.c | 42 +++--- 1 file changed, 23 insertions(+), 19 deletions(-) diff --git a/commit.c b/commit.c index f5211bd..ccf2c5a 100644 --- a/commit.c +++ b/commit.c @@ -715,7 +715,7 @@ struct commit_list *get_merge_bases_many(struct commit *one, int cleanup) { struct commit_list *list; - struct commit **rslt; + struct commit **rslt, **others; struct commit_list *result; int cnt, i, j; @@ -744,33 +744,37 @@ struct commit_list *get_merge_bases_many(struct commit *one, for (list = result, i = 0; list; list = list->next) rslt[i++] = list->item; free_commit_list(result); + result = NULL; clear_commit_marks(one, all_flags); for (i = 0; i < n; i++) clear_commit_marks(twos[i], all_flags); - for (i = 0; i < cnt - 1; i++) { - for (j = i+1; j < cnt; j++) { - if (!rslt[i] || !rslt[j]) - continue; - result = merge_bases_many(rslt[i], 1, &rslt[j]); - clear_commit_marks(rslt[i], all_flags); - clear_commit_marks(rslt[j], all_flags); - for (list = result; list; list = list->next) { - if (rslt[i] == list->item) - rslt[i] = NULL; - if (rslt[j] == list->item) - rslt[j] = NULL; - } - } - } - /* Surviving ones in rslt[] are the independent results */ - result = NULL; + others = xcalloc(cnt - 1, sizeof(*others)); for (i = 0; i < cnt; i++) { - if (rslt[i]) + /* +* Is rslt[i] an ancestor of any of the others? +* then it is not interesting to us. +*/ + for (j = 0; j < i; j++) + others[j] = rslt[j]; + for (j = 1 + 1; j < cnt; j++) + others[j - 1] = rslt[j]; + list = merge_bases_many(rslt[i], cnt - 1, others); + clear_commit_marks(rslt[i], all_flags); + for (j = 0; j < cnt - 1; j++) + clear_commit_marks(others[j], all_flags); + while (list) { + if (rslt[i] == list->item) + break; + list = list->next; + } + if (!list) commit_list_insert_by_date(rslt[i], &result); + free_commit_list(list); } free(rslt); + free(others); return result; } -- 1.7.12.116.g31e0100 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html