Re: [PATCH 4/8] git-remote-mediawiki: get rid of O(N^2) loop

2012-07-16 Thread Matthieu Moy
Junio C Hamano  writes:

> Matthieu Moy  writes:
>
>> The algorithm to find a path from the local revision to the remote one
>> was calling "git rev-list" and parsing its output N times. Run rev-list
>> only once, and fill a hashtable with the result to optimize the body of
>> the loop.
>
> Good thinking.  I wonder if it would further reduce the overhead if
> you stop using --children and do this using --parents instead, as
> you will be reading the parsed_sha1..local range either way yourself
> anyway.

It is possible, yes. I'll resend a version with --parents, but this
probably doesn't change the performance much: what we really need is for
Git to prune dead-ends in the subgraph, to make sure we find a path
without having to backtrack (i.e. we need parent rewriting history
simplification), so Git has to do something a bit clever anyway.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] git-remote-mediawiki: get rid of O(N^2) loop

2012-07-16 Thread Junio C Hamano
Matthieu Moy  writes:

> The algorithm to find a path from the local revision to the remote one
> was calling "git rev-list" and parsing its output N times. Run rev-list
> only once, and fill a hashtable with the result to optimize the body of
> the loop.

Good thinking.  I wonder if it would further reduce the overhead if
you stop using --children and do this using --parents instead, as
you will be reading the parsed_sha1..local range either way yourself
anyway.

> Signed-off-by: Matthieu Moy 
> ---
>  contrib/mw-to-git/git-remote-mediawiki | 25 ++---
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/contrib/mw-to-git/git-remote-mediawiki 
> b/contrib/mw-to-git/git-remote-mediawiki
> index 8e46e4e..f9c0cc6 100755
> --- a/contrib/mw-to-git/git-remote-mediawiki
> +++ b/contrib/mw-to-git/git-remote-mediawiki
> @@ -1196,16 +1196,27 @@ sub mw_push_revision {
>   if ($last_local_revid > 0) {
>   my $parsed_sha1 = $remoteorigin_sha1;
>   # Find a path from last MediaWiki commit to pushed commit
> + print STDERR "Computing path from local to remote ...\n";
> + my @local_ancestry = split(/\n/, run_git("rev-list --boundary 
> --children $local ^$parsed_sha1"));
> + my %local_ancestry;
> + foreach my $line (@local_ancestry) {
> + if (my ($parent, $child) = $line =~ m/^-?([a-f0-9]+) 
> ([a-f0-9]+)/) {
> + $local_ancestry{$parent} = $child;
> + if ($parent eq $parsed_sha1 || $child eq 
> $parsed_sha1) {
> + print STDERR "$parent -> $child\n";
> + }
> + } elsif (!$line =~ m/^([a-f0-9]+)/) {
> + die "Unexpected output from git rev-list: 
> $line";
> + }
> + }
>   while ($parsed_sha1 ne $HEAD_sha1) {
> - my @commit_info =  grep(/^$parsed_sha1/, split(/\n/, 
> run_git("rev-list --children $local")));
> - if (!@commit_info) {
> + my $child = $local_ancestry{$parsed_sha1};
> + if (!$child) {
> + printf STDERR "Cannot find a path in history 
> from remote commit to last commit\n";
>   return error_non_fast_forward($remote);
>   }
> - my @commit_info_split = split(/ |\n/, $commit_info[0]);
> - # $commit_info_split[1] is the sha1 of the commit to 
> export
> - # $commit_info_split[0] is the sha1 of its direct child
> - push(@commit_pairs, \@commit_info_split);
> - $parsed_sha1 = $commit_info_split[1];
> + push(@commit_pairs, [$parsed_sha1, $child]);
> + $parsed_sha1 = $child;
>   }
>   } else {
>   # No remote mediawiki revision. Export the whole
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html