Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel

2012-08-29 Thread Jeff King
On Tue, Aug 28, 2012 at 04:39:11PM -0700, Junio C Hamano wrote:

> > git rev-list --committer=torva...@linux-foundation.org \
> > --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL
> >
> > cmd='
> > while read result parent1 parent2
> > do
> > $GIT merge-base $parent1 $parent2
> > done  > '
> 
> I have this suspicion that it is spending most of its time in
> insert-by-date.  Luckily, merge_bases_many() is totally outside of
> the usual revision traversal and its use of the commit list is
> pretty well isolated.
> 
> Peff, can I interest you into resurrecting your $gmane/174007 and
> $gmane/174008 (we do not need pop_commit_from_queue() in the latter;
> everything can stay as static to commit.c for now) to see how well
> priority queue based approach would perform?

I did a quick port of merge_bases_many and get_merge_bases_many to use
priority queues, but I didn't see any speedup at all on the above test
case.  According to perf, we spend most of our time in zlib, inflating
commits. Only 1% is spent on commit_list_insert_by_date, which is about
the same amount spent on queue insertion after my patches.

Patches follow, just for reference.

  [1/2]: basic priority queue implementation
  [2/2]: commit: use a priority queue in merge base functions

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel

2012-08-28 Thread Junio C Hamano
Junio C Hamano  writes:

> Junio C Hamano  writes:
>
>> Junio C Hamano  writes:
>>
>>> for (i = 0; i < cnt; i++) {
>>> -   if (rslt[i])
>>> +   /*
>>> +* Is rslt[i] an ancestor of any of the others?
>>> +* then it is not interesting to us.
>>> +*/
>>> +   for (j = 0; j < i; j++)
>>> +   others[j] = rslt[j];
>>> +   for (j = 1 + 1; j < cnt; j++)
>>
>> s/1 + 1/i + 1/;
>>
>> With that, all tests seem to pass ;-)
>
> "git merge-base" itself seems to have more room for improvement.
> Trying to recompute bases for recent 200 merges in the kernel
> history with the attached script does not show any improvement with
> or without the series on top of recent "master".  Correctnesswise it
> seems to be OK, though---I get byte-for-byte identical output.
>
> -- >8 --
> #!/bin/sh
>
> git rev-list --committer=torva...@linux-foundation.org \
> --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL
>
> cmd='
>   while read result parent1 parent2
>   do
>   $GIT merge-base $parent1 $parent2
>   done  '
>
> GIT="rungit master" time sh -c "$cmd" >:stock
> GIT=../git.git/git time sh -c "$cmd" >:optim
> cmp :stock :optim
> -- 8< --

I have this suspicion that it is spending most of its time in
insert-by-date.  Luckily, merge_bases_many() is totally outside of
the usual revision traversal and its use of the commit list is
pretty well isolated.

Peff, can I interest you into resurrecting your $gmane/174007 and
$gmane/174008 (we do not need pop_commit_from_queue() in the latter;
everything can stay as static to commit.c for now) to see how well
priority queue based approach would perform?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel

2012-08-28 Thread Junio C Hamano
Junio C Hamano  writes:

> Junio C Hamano  writes:
>
>>  for (i = 0; i < cnt; i++) {
>> -if (rslt[i])
>> +/*
>> + * Is rslt[i] an ancestor of any of the others?
>> + * then it is not interesting to us.
>> + */
>> +for (j = 0; j < i; j++)
>> +others[j] = rslt[j];
>> +for (j = 1 + 1; j < cnt; j++)
>
> s/1 + 1/i + 1/;
>
> With that, all tests seem to pass ;-)

"git merge-base" itself seems to have more room for improvement.
Trying to recompute bases for recent 200 merges in the kernel
history with the attached script does not show any improvement with
or without the series on top of recent "master".  Correctnesswise it
seems to be OK, though---I get byte-for-byte identical output.

-- >8 --
#!/bin/sh

git rev-list --committer=torva...@linux-foundation.org \
--max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL

cmd='
while read result parent1 parent2
do
$GIT merge-base $parent1 $parent2
done :stock
GIT=../git.git/git time sh -c "$cmd" >:optim
cmp :stock :optim
-- 8< --
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel

2012-08-27 Thread Junio C Hamano
Junio C Hamano  writes:

>   for (i = 0; i < cnt; i++) {
> - if (rslt[i])
> + /*
> +  * Is rslt[i] an ancestor of any of the others?
> +  * then it is not interesting to us.
> +  */
> + for (j = 0; j < i; j++)
> + others[j] = rslt[j];
> + for (j = 1 + 1; j < cnt; j++)

s/1 + 1/i + 1/;

With that, all tests seem to pass ;-)

> + others[j - 1] = rslt[j];
> + list = merge_bases_many(rslt[i], cnt - 1, others);
> + clear_commit_marks(rslt[i], all_flags);
> + for (j = 0; j < cnt - 1; j++)
> + clear_commit_marks(others[j], all_flags);
> + while (list) {
> + if (rslt[i] == list->item)
> + break;
> + list = list->next;
> + }
> + if (!list)
>   commit_list_insert_by_date(rslt[i], &result);
> + free_commit_list(list);
>   }
>   free(rslt);
> + free(others);
>   return result;
>  }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel

2012-08-27 Thread Junio C Hamano
After merge_bases_many() paints the graph in two colors to find
common ancestors, get_merge_bases_many() reduces the result by
excluding commits that can be reached from other commits in the
result.  We used to do N * (N-1) traversals for this, but we can
check if one commit is reachable from any of the other (N-1) commits
by a single traversal, and repeat it N times to find the answer.

Signed-off-by: Junio C Hamano 
---
 commit.c | 42 +++---
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/commit.c b/commit.c
index f5211bd..ccf2c5a 100644
--- a/commit.c
+++ b/commit.c
@@ -715,7 +715,7 @@ struct commit_list *get_merge_bases_many(struct commit *one,
 int cleanup)
 {
struct commit_list *list;
-   struct commit **rslt;
+   struct commit **rslt, **others;
struct commit_list *result;
int cnt, i, j;
 
@@ -744,33 +744,37 @@ struct commit_list *get_merge_bases_many(struct commit 
*one,
for (list = result, i = 0; list; list = list->next)
rslt[i++] = list->item;
free_commit_list(result);
+   result = NULL;
 
clear_commit_marks(one, all_flags);
for (i = 0; i < n; i++)
clear_commit_marks(twos[i], all_flags);
-   for (i = 0; i < cnt - 1; i++) {
-   for (j = i+1; j < cnt; j++) {
-   if (!rslt[i] || !rslt[j])
-   continue;
-   result = merge_bases_many(rslt[i], 1, &rslt[j]);
-   clear_commit_marks(rslt[i], all_flags);
-   clear_commit_marks(rslt[j], all_flags);
-   for (list = result; list; list = list->next) {
-   if (rslt[i] == list->item)
-   rslt[i] = NULL;
-   if (rslt[j] == list->item)
-   rslt[j] = NULL;
-   }
-   }
-   }
 
-   /* Surviving ones in rslt[] are the independent results */
-   result = NULL;
+   others = xcalloc(cnt - 1, sizeof(*others));
for (i = 0; i < cnt; i++) {
-   if (rslt[i])
+   /*
+* Is rslt[i] an ancestor of any of the others?
+* then it is not interesting to us.
+*/
+   for (j = 0; j < i; j++)
+   others[j] = rslt[j];
+   for (j = 1 + 1; j < cnt; j++)
+   others[j - 1] = rslt[j];
+   list = merge_bases_many(rslt[i], cnt - 1, others);
+   clear_commit_marks(rslt[i], all_flags);
+   for (j = 0; j < cnt - 1; j++)
+   clear_commit_marks(others[j], all_flags);
+   while (list) {
+   if (rslt[i] == list->item)
+   break;
+   list = list->next;
+   }
+   if (!list)
commit_list_insert_by_date(rslt[i], &result);
+   free_commit_list(list);
}
free(rslt);
+   free(others);
return result;
 }
 
-- 
1.7.12.116.g31e0100

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html