Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-07 Thread Derrick Stolee

On 9/7/2018 1:15 PM, Jeff King wrote:

On Fri, Sep 07, 2018 at 05:23:31PM +0200, Ævar Arnfjörð Bjarmason wrote:


Hrm, no. I spoke too soon because I was conflating "commit-graph write"
v.s. "gc". For "gc" we're now with this change just e.g. spending 6
seconds on 2015-04-03-1M-git displaying nothing, because we're looping
through the commits and finding that we have no new work.

So I'm on the fence about this, but leaning towards just taking my
initial approch. I.e. it sucks if you're e.g. testing different "git gc"
options that we're churning in the background doing nothing, just
because we're trying to report how many *new* things we added to the
graph.

After all, the main point IMNSHO is not to show some diagnostic output
of exactly how much work we're doing, that I have 200 new commits with
generation numbers or whatever is just useless trivia, but rather to not
leave the user thinking the command is hanging.

I think there's some precedent for your view of things, too. For
example, "writing objects" counts _all_ of the objects, even though many
of them are just copying bytes straight from disk, and some are actually
generating a delta and/or zlib-deflating content.

So it's not the most precise measurement we could give, but it shows
there's activity, and the "average" movement over many objects tends to
be reasonably smooth.


So I think I'll just do what I was doing to begin with and change the
message to "Refreshing commit graph generation numbers" or something to
indicate that it's a find/verify/compute operation, not just a compute
operation.

So basically yes, I agree with this. :)


Same here. Thanks for the discussion.

-Stolee



Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-07 Thread Jeff King
On Fri, Sep 07, 2018 at 05:23:31PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Hrm, no. I spoke too soon because I was conflating "commit-graph write"
> v.s. "gc". For "gc" we're now with this change just e.g. spending 6
> seconds on 2015-04-03-1M-git displaying nothing, because we're looping
> through the commits and finding that we have no new work.
> 
> So I'm on the fence about this, but leaning towards just taking my
> initial approch. I.e. it sucks if you're e.g. testing different "git gc"
> options that we're churning in the background doing nothing, just
> because we're trying to report how many *new* things we added to the
> graph.
> 
> After all, the main point IMNSHO is not to show some diagnostic output
> of exactly how much work we're doing, that I have 200 new commits with
> generation numbers or whatever is just useless trivia, but rather to not
> leave the user thinking the command is hanging.

I think there's some precedent for your view of things, too. For
example, "writing objects" counts _all_ of the objects, even though many
of them are just copying bytes straight from disk, and some are actually
generating a delta and/or zlib-deflating content.

So it's not the most precise measurement we could give, but it shows
there's activity, and the "average" movement over many objects tends to
be reasonably smooth.

> So I think I'll just do what I was doing to begin with and change the
> message to "Refreshing commit graph generation numbers" or something to
> indicate that it's a find/verify/compute operation, not just a compute
> operation.

So basically yes, I agree with this. :)

-Peff


Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-07 Thread Ævar Arnfjörð Bjarmason


On Fri, Sep 07 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Sep 05 2018, Derrick Stolee wrote:
>
>> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>>> Ævar Arnfjörð Bjarmason   writes:
>>>
 With --stdin-packs we don't show any estimation of how much is left to
 do. This is because we might be processing more than one pack. We
 could be less lazy here and show progress, either detect by detecting
 that we're only processing one pack, or by first looping over the
 packs to discover how many commits they have. I don't see the point in
>>> I do not know if there is no point, but if we were to do it, I think
>>> slurping the list of packs and computing the number of objects is
>>> not all that bad.
>>
>> If you want to do that, I have nothing against it. However, I don't
>> expect users to use that option directly. That option is used by VFS
>> for Git to compute the commit-graph in the background after receiving
>> a pack of commits and trees, but not by 'git gc' which I expect is how
>> most users will compute commit-graphs.
>>
   static void compute_generation_numbers(struct packed_commit_list* 
 commits)
   {
int i;
struct commit_list *list = NULL;
 +  struct progress *progress = NULL;
   +progress = start_progress(
 +  _("Computing commit graph generation numbers"), commits->nr);
for (i = 0; i < commits->nr; i++) {
 +  display_progress(progress, i);
if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY 
 &&
commits->list[i]->generation != GENERATION_NUMBER_ZERO)
continue;
>>> I am wondering if the progress call should be moved after this
>>> conditional continue; would we want to count the entry whose
>>> generation is already known here?  Of course, as we give commits->nr
>>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>>> wrong.
>>
>> If we wanted to be completely right, we would count the commits in the
>> list that do not have a generation number and report that as the 100%
>> ceiling.
>>
>> Something like the diff below would work. I tested it in Linux by
>> first deleting my commit-graph and running the following:
>>
>> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
>> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
>> commit-graph write --stdin-commits
>> Annotating commits in commit graph: 1180333, done.
>> Computing commit graph generation numbers: 100% (590166/590166), done.
>> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
>> Annotating commits in commit graph: 1564087, done.
>> Computing commit graph generation numbers: 100% (191590/191590), done.
>>
>> -->8--
>>
>> From: Derrick Stolee 
>> Date: Wed, 5 Sep 2018 11:55:42 +
>> Subject: [PATCH] fixup! commit-graph write: add progress output
>>
>> Signed-off-by: Derrick Stolee 
>> ---
>> commit-graph.c | 15 +++
>> 1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 1a02fe019a..b933bc9f00 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -634,14 +634,20 @@ static void close_reachable(struct
>> packed_oid_list *oids)
>>
>> static void compute_generation_numbers(struct packed_commit_list* commits)
>> {
>> - int i;
>> + int i, count_uncomputed = 0;
>>  struct commit_list *list = NULL;
>>  struct progress *progress = NULL;
>>
>> + for (i = 0; i < commits->nr; i++)
>> + if (commits->list[i]->generation ==
>> GENERATION_NUMBER_INFINITY ||
>> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
>> + count_uncomputed++;
>> +
>>  progress = start_progress(
>> - _("Computing commit graph generation numbers"),
>> commits->nr);
>> + _("Computing commit graph generation numbers"),
>> count_uncomputed);
>> + count_uncomputed = 0;
>> +
>>  for (i = 0; i < commits->nr; i++) {
>> - display_progress(progress, i);
>>  if (commits->list[i]->generation !=
>> GENERATION_NUMBER_INFINITY &&
>>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>  continue;
>> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
>> packed_commit_list* commits)
>>
>>  if (current->generation >
>> GENERATION_NUMBER_MAX)
>>  current->generation =
>> GENERATION_NUMBER_MAX;
>> +
>> + display_progress(progress,
>> ++count_uncomputed);
>>  }
>>  }
>>  }
>> - display_progress(progress, i);
>>  stop_progress();
>> }
>
> One of the things I was trying to do with this series was to make sure
> that whenever we run "git gc" there's always some indication that if you
> set gc.writeCommitGraph=true that it's actualy doing work.
>
> This modifies that, which I think is actually fine, just something I
> wanted to note. I.e. if you run "git commit-graph write" twice in a row,
> the second time will have no output.
>
> Unless that is, your repo is big enough that some of the delayed timers
> kick in. So e.g. on git.git we get no output the second time 

Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-07 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 05 2018, Derrick Stolee wrote:

> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason   writes:
>>
>>> With --stdin-packs we don't show any estimation of how much is left to
>>> do. This is because we might be processing more than one pack. We
>>> could be less lazy here and show progress, either detect by detecting
>>> that we're only processing one pack, or by first looping over the
>>> packs to discover how many commits they have. I don't see the point in
>> I do not know if there is no point, but if we were to do it, I think
>> slurping the list of packs and computing the number of objects is
>> not all that bad.
>
> If you want to do that, I have nothing against it. However, I don't
> expect users to use that option directly. That option is used by VFS
> for Git to compute the commit-graph in the background after receiving
> a pack of commits and trees, but not by 'git gc' which I expect is how
> most users will compute commit-graphs.
>
>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>   {
>>> int i;
>>> struct commit_list *list = NULL;
>>> +   struct progress *progress = NULL;
>>>   + progress = start_progress(
>>> +   _("Computing commit graph generation numbers"), commits->nr);
>>> for (i = 0; i < commits->nr; i++) {
>>> +   display_progress(progress, i);
>>> if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY 
>>> &&
>>> commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>> continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.
>
> Something like the diff below would work. I tested it in Linux by
> first deleting my commit-graph and running the following:
>
> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
> commit-graph write --stdin-commits
> Annotating commits in commit graph: 1180333, done.
> Computing commit graph generation numbers: 100% (590166/590166), done.
> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
> Annotating commits in commit graph: 1564087, done.
> Computing commit graph generation numbers: 100% (191590/191590), done.
>
> -->8--
>
> From: Derrick Stolee 
> Date: Wed, 5 Sep 2018 11:55:42 +
> Subject: [PATCH] fixup! commit-graph write: add progress output
>
> Signed-off-by: Derrick Stolee 
> ---
> commit-graph.c | 15 +++
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 1a02fe019a..b933bc9f00 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -634,14 +634,20 @@ static void close_reachable(struct
> packed_oid_list *oids)
>
> static void compute_generation_numbers(struct packed_commit_list* commits)
> {
> - int i;
> + int i, count_uncomputed = 0;
>  struct commit_list *list = NULL;
>  struct progress *progress = NULL;
>
> + for (i = 0; i < commits->nr; i++)
> + if (commits->list[i]->generation ==
> GENERATION_NUMBER_INFINITY ||
> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
> + count_uncomputed++;
> +
>  progress = start_progress(
> - _("Computing commit graph generation numbers"),
> commits->nr);
> + _("Computing commit graph generation numbers"),
> count_uncomputed);
> + count_uncomputed = 0;
> +
>  for (i = 0; i < commits->nr; i++) {
> - display_progress(progress, i);
>  if (commits->list[i]->generation !=
> GENERATION_NUMBER_INFINITY &&
>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  continue;
> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
> packed_commit_list* commits)
>
>  if (current->generation >
> GENERATION_NUMBER_MAX)
>  current->generation =
> GENERATION_NUMBER_MAX;
> +
> + display_progress(progress,
> ++count_uncomputed);
>  }
>  }
>  }
> - display_progress(progress, i);
>  stop_progress();
> }

One of the things I was trying to do with this series was to make sure
that whenever we run "git gc" there's always some indication that if you
set gc.writeCommitGraph=true that it's actualy doing work.

This modifies that, which I think is actually fine, just something I
wanted to note. I.e. if you run "git commit-graph write" twice in a row,
the second time will have no output.

Unless that is, your repo is big enough that some of the delayed timers
kick in. So e.g. on git.git we get no output the second time around, but
do get output the first time around, and on linux.git we always get
output.

But in the common case people aren't running this in a loop, and it's
useful to see how many new things are 

Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-07 Thread Derrick Stolee

On 9/7/2018 8:40 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Sep 04 2018, Ævar Arnfjörð Bjarmason wrote:


Before this change the "commit-graph write" command didn't report any
progress. On my machine this command takes more than 10 seconds to
write the graph for linux.git, and around 1m30s on the
2015-04-03-1M-git.git[1] test repository, which is a test case for
larger monorepos.

There's a fun issue with this code that I'll fix, but thought was
informative to send a mail about.

Because the graph verification happens in the main "git gc" process, as
opposed to everything else via external commands, so all this progress
output gets written to .git/gc.log.

Then next time we do a "gc --auto" we vomit out a couple of KB of
progress bar output at the user, since spot that the gc.log isn't empty.
Good catch! (I do want to clarify that the graph _writing_ happens 
during 'git gc' since 'git commit-graph verify' is a different thing.)


Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-07 Thread Ævar Arnfjörð Bjarmason


On Tue, Sep 04 2018, Ævar Arnfjörð Bjarmason wrote:

> Before this change the "commit-graph write" command didn't report any
> progress. On my machine this command takes more than 10 seconds to
> write the graph for linux.git, and around 1m30s on the
> 2015-04-03-1M-git.git[1] test repository, which is a test case for
> larger monorepos.

There's a fun issue with this code that I'll fix, but thought was
informative to send a mail about.

Because the graph verification happens in the main "git gc" process, as
opposed to everything else via external commands, so all this progress
output gets written to .git/gc.log.

Then next time we do a "gc --auto" we vomit out a couple of KB of
progress bar output at the user, since spot that the gc.log isn't empty.


Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-05 Thread Derrick Stolee

On 9/5/2018 5:46 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


for (i = 0; i < commits->nr; i++) {
+   display_progress(progress, i);
if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY 
&&
commits->list[i]->generation != GENERATION_NUMBER_ZERO)
continue;

I am wondering if the progress call should be moved after this
conditional continue; would we want to count the entry whose
generation is already known here?  Of course, as we give commits->nr
as the 100% ceiling, we cannot avoid doing so, but it somehow smells
wrong.

If we wanted to be completely right, we would count the commits in the
list that do not have a generation number and report that as the 100%
ceiling.

Yeah, but I realize that the definition of "right" really depends on
what we consider a task being accomplished is in this loop.  If we
define the task to "we have some number of commits that lack
generation numbers and our task is to assign numbers to them.", then
yes counting the ones without generation number and culling the ones
that already have generation number is outside the work and we need
another loop to count them.  But the position the posted patch takes
is also a valid one: we have some commits and we are making sure
each and every one of them has assigned a generation number.

So I do not think it is necessary to introduce another loop just for
counting.

Thanks.

Makes sense to me. Thanks!


Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-05 Thread Junio C Hamano
Derrick Stolee  writes:

>>> for (i = 0; i < commits->nr; i++) {
>>> +   display_progress(progress, i);
>>> if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY 
>>> &&
>>> commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>> continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.

Yeah, but I realize that the definition of "right" really depends on
what we consider a task being accomplished is in this loop.  If we
define the task to "we have some number of commits that lack
generation numbers and our task is to assign numbers to them.", then
yes counting the ones without generation number and culling the ones
that already have generation number is outside the work and we need
another loop to count them.  But the position the posted patch takes
is also a valid one: we have some commits and we are making sure
each and every one of them has assigned a generation number.

So I do not think it is necessary to introduce another loop just for
counting.

Thanks.





Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 05 2018, Derrick Stolee wrote:

> On 9/4/2018 6:07 PM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason   writes:
>>
>>> With --stdin-packs we don't show any estimation of how much is left to
>>> do. This is because we might be processing more than one pack. We
>>> could be less lazy here and show progress, either detect by detecting
>>> that we're only processing one pack, or by first looping over the
>>> packs to discover how many commits they have. I don't see the point in
>> I do not know if there is no point, but if we were to do it, I think
>> slurping the list of packs and computing the number of objects is
>> not all that bad.
>
> If you want to do that, I have nothing against it. However, I don't
> expect users to use that option directly. That option is used by VFS
> for Git to compute the commit-graph in the background after receiving
> a pack of commits and trees, but not by 'git gc' which I expect is how
> most users will compute commit-graphs.

Yeah, I suspected only one guy at Microsoft would potentially benefit
from this, but added it just so we'd have progress regardless of entry
point :)

>>>   static void compute_generation_numbers(struct packed_commit_list* commits)
>>>   {
>>> int i;
>>> struct commit_list *list = NULL;
>>> +   struct progress *progress = NULL;
>>>   + progress = start_progress(
>>> +   _("Computing commit graph generation numbers"), commits->nr);
>>> for (i = 0; i < commits->nr; i++) {
>>> +   display_progress(progress, i);
>>> if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY 
>>> &&
>>> commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>>> continue;
>> I am wondering if the progress call should be moved after this
>> conditional continue; would we want to count the entry whose
>> generation is already known here?  Of course, as we give commits->nr
>> as the 100% ceiling, we cannot avoid doing so, but it somehow smells
>> wrong.
>
> If we wanted to be completely right, we would count the commits in the
> list that do not have a generation number and report that as the 100%
> ceiling.
>
> Something like the diff below would work. I tested it in Linux by
> first deleting my commit-graph and running the following:
>
> stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
> stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git
> commit-graph write --stdin-commits
> Annotating commits in commit graph: 1180333, done.
> Computing commit graph generation numbers: 100% (590166/590166), done.
> stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
> Annotating commits in commit graph: 1564087, done.
> Computing commit graph generation numbers: 100% (191590/191590), done.
>
> -->8--
>
> From: Derrick Stolee 
> Date: Wed, 5 Sep 2018 11:55:42 +
> Subject: [PATCH] fixup! commit-graph write: add progress output
>
> Signed-off-by: Derrick Stolee 
> ---
> commit-graph.c | 15 +++
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 1a02fe019a..b933bc9f00 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -634,14 +634,20 @@ static void close_reachable(struct
> packed_oid_list *oids)
>
> static void compute_generation_numbers(struct packed_commit_list* commits)
> {
> - int i;
> + int i, count_uncomputed = 0;
>  struct commit_list *list = NULL;
>  struct progress *progress = NULL;
>
> + for (i = 0; i < commits->nr; i++)
> + if (commits->list[i]->generation ==
> GENERATION_NUMBER_INFINITY ||
> + commits->list[i]->generation == GENERATION_NUMBER_ZERO)
> + count_uncomputed++;
> +
>  progress = start_progress(
> - _("Computing commit graph generation numbers"),
> commits->nr);
> + _("Computing commit graph generation numbers"),
> count_uncomputed);
> + count_uncomputed = 0;
> +
>  for (i = 0; i < commits->nr; i++) {
> - display_progress(progress, i);
>  if (commits->list[i]->generation !=
> GENERATION_NUMBER_INFINITY &&
>  commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>  continue;
> @@ -670,10 +676,11 @@ static void compute_generation_numbers(struct
> packed_commit_list* commits)
>
>  if (current->generation >
> GENERATION_NUMBER_MAX)
>  current->generation =
> GENERATION_NUMBER_MAX;
> +
> + display_progress(progress,
> ++count_uncomputed);
>  }
>  }
>  }
> - display_progress(progress, i);
>  stop_progress();
> }

Thanks! That looks good, and you obviously know this code a lot
better. I'll squash this into v2 pending further feedback I'll need to
address.


Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-05 Thread Derrick Stolee

On 9/4/2018 4:27 PM, Ævar Arnfjörð Bjarmason wrote:

@@ -591,8 +597,13 @@ static void close_reachable(struct packed_oid_list *oids)
  {
int i;
struct commit *commit;
+   struct progress *progress = NULL;
+   int j = 0;


The change below over-counts the number of commits we are processing (by 
at least double, possibly triple).




+   progress = start_delayed_progress(
+   _("Annotating commits in commit graph"), 0);
for (i = 0; i < oids->nr; i++) {
+   display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
if (commit)
commit->object.flags |= UNINTERESTING;
This count is the number of oids given to the method. For 'git 
commit-graph write --reachable', this will be the number of refs.

@@ -604,6 +615,7 @@ static void close_reachable(struct packed_oid_list *oids)
 * closure.
 */
for (i = 0; i < oids->nr; i++) {
+   display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
  
  		if (commit && !parse_commit(commit))


This is the important count, since we will be parsing commits and adding 
their parents to the list. The bulk of the work happens here.



@@ -611,19 +623,25 @@ static void close_reachable(struct packed_oid_list *oids)
}
  
  	for (i = 0; i < oids->nr; i++) {

+   display_progress(progress, ++j);
commit = lookup_commit(the_repository, >list[i]);
This iterates through the commits a second time and removes the 
UNINTERESTING flag.
  
  		if (commit)

commit->object.flags &= ~UNINTERESTING;
}
+   stop_progress();
  }


I think it is good to have the progress start before the first loop and 
end after the third loop, but the middle loop has the important count.


I tried deleting the first and third display_progress() methods and 
re-ran the process on the Linux repo and did not notice a delay at the 
0% and 100% progress spots. The count matches the number of commits.


Thanks,

-Stolee



Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-05 Thread Derrick Stolee

On 9/4/2018 6:07 PM, Junio C Hamano wrote:

Ævar Arnfjörð Bjarmason   writes:


With --stdin-packs we don't show any estimation of how much is left to
do. This is because we might be processing more than one pack. We
could be less lazy here and show progress, either detect by detecting
that we're only processing one pack, or by first looping over the
packs to discover how many commits they have. I don't see the point in

I do not know if there is no point, but if we were to do it, I think
slurping the list of packs and computing the number of objects is
not all that bad.


If you want to do that, I have nothing against it. However, I don't 
expect users to use that option directly. That option is used by VFS for 
Git to compute the commit-graph in the background after receiving a pack 
of commits and trees, but not by 'git gc' which I expect is how most 
users will compute commit-graphs.



  static void compute_generation_numbers(struct packed_commit_list* commits)
  {
int i;
struct commit_list *list = NULL;
+   struct progress *progress = NULL;
  
+	progress = start_progress(

+   _("Computing commit graph generation numbers"), commits->nr);
for (i = 0; i < commits->nr; i++) {
+   display_progress(progress, i);
if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY 
&&
commits->list[i]->generation != GENERATION_NUMBER_ZERO)
continue;

I am wondering if the progress call should be moved after this
conditional continue; would we want to count the entry whose
generation is already known here?  Of course, as we give commits->nr
as the 100% ceiling, we cannot avoid doing so, but it somehow smells
wrong.


If we wanted to be completely right, we would count the commits in the 
list that do not have a generation number and report that as the 100% 
ceiling.


Something like the diff below would work. I tested it in Linux by first 
deleting my commit-graph and running the following:


stolee@stolee-linux:~/linux$ rm .git/objects/info/commit-graph
stolee@stolee-linux:~/linux$ git rev-parse v4.6 | ~/git/git commit-graph 
write --stdin-commits

Annotating commits in commit graph: 1180333, done.
Computing commit graph generation numbers: 100% (590166/590166), done.
stolee@stolee-linux:~/linux$ ~/git/git commit-graph write --reachable
Annotating commits in commit graph: 1564087, done.
Computing commit graph generation numbers: 100% (191590/191590), done.

-->8--

From: Derrick Stolee 
Date: Wed, 5 Sep 2018 11:55:42 +
Subject: [PATCH] fixup! commit-graph write: add progress output

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 1a02fe019a..b933bc9f00 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -634,14 +634,20 @@ static void close_reachable(struct packed_oid_list 
*oids)


 static void compute_generation_numbers(struct packed_commit_list* commits)
 {
-   int i;
+   int i, count_uncomputed = 0;
    struct commit_list *list = NULL;
    struct progress *progress = NULL;

+   for (i = 0; i < commits->nr; i++)
+   if (commits->list[i]->generation == 
GENERATION_NUMBER_INFINITY ||

+   commits->list[i]->generation == GENERATION_NUMBER_ZERO)
+   count_uncomputed++;
+
    progress = start_progress(
-   _("Computing commit graph generation numbers"), 
commits->nr);
+   _("Computing commit graph generation numbers"), 
count_uncomputed);

+   count_uncomputed = 0;
+
    for (i = 0; i < commits->nr; i++) {
-   display_progress(progress, i);
    if (commits->list[i]->generation != 
GENERATION_NUMBER_INFINITY &&

    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
    continue;
@@ -670,10 +676,11 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)


    if (current->generation > 
GENERATION_NUMBER_MAX)
    current->generation = 
GENERATION_NUMBER_MAX;

+
+   display_progress(progress, 
++count_uncomputed);

    }
    }
    }
-   display_progress(progress, i);
    stop_progress();
 }

--
2.19.0.rc2



Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-04 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Before this change the "commit-graph write" command didn't report any

Please describe the pre-patch state in present tense without "Before
this change".

> progress. On my machine this command takes more than 10 seconds to
> write the graph for linux.git, and around 1m30s on the
> 2015-04-03-1M-git.git[1] test repository, which is a test case for
> larger monorepos.
>
> Furthermore, since the gc.writeCommitGraph setting was added in
> d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27),
> there was no indication at all from a "git gc" run that anything was
> different. This why one of the progress bars being added here uses

"This is why", I guess.

> start_progress() instead of start_delayed_progress(), so that it's
> guaranteed to be seen. E.g. on my tiny 867 commit dotfiles.git
> repository:
>
> $ git -c gc.writeCommitGraph=true gc
> Enumerating objects: 2821, done.
> [...]
> Total 2821 (delta 1670), reused 2821 (delta 1670)
> Computing commit graph generation numbers: 100% (867/867), done.
>
> On larger repositories, such as linux.git the delayed progress bar(s)

"such as linux.git, the delayed ..."

> With --stdin-packs we don't show any estimation of how much is left to
> do. This is because we might be processing more than one pack. We
> could be less lazy here and show progress, either detect by detecting
> that we're only processing one pack, or by first looping over the
> packs to discover how many commits they have. I don't see the point in

I do not know if there is no point, but if we were to do it, I think
slurping the list of packs and computing the number of objects is
not all that bad.

>  static void compute_generation_numbers(struct packed_commit_list* commits)
>  {
>   int i;
>   struct commit_list *list = NULL;
> + struct progress *progress = NULL;
>  
> + progress = start_progress(
> + _("Computing commit graph generation numbers"), commits->nr);
>   for (i = 0; i < commits->nr; i++) {
> + display_progress(progress, i);
>   if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY 
> &&
>   commits->list[i]->generation != GENERATION_NUMBER_ZERO)
>   continue;

I am wondering if the progress call should be moved after this
conditional continue; would we want to count the entry whose
generation is already known here?  Of course, as we give commits->nr
as the 100% ceiling, we cannot avoid doing so, but it somehow smells
wrong.


Re: [PATCH 1/2] commit-graph write: add progress output

2018-09-04 Thread Eric Sunshine
On Tue, Sep 4, 2018 at 4:27 PM Ævar Arnfjörð Bjarmason  wrote:
> With --stdin-packs we don't show any estimation of how much is left to
> do. This is because we might be processing more than one pack. We
> could be less lazy here and show progress, either detect by detecting

s/detect//

> that we're only processing one pack, or by first looping over the
> packs to discover how many commits they have. I don't see the point in
> doing that work. So instead we get (on 2015-04-03-1M-git.git):
>
> Signed-off-by: Ævar Arnfjörð Bjarmason