Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> I'm just saying it's hard in this case to remove one piece without the
>> whole Jenga tower collapsing, and it's probably a good idea in some of
>> these cases to pester the user about what he wants, but probably not via
>> gc --auto emitting the same warning every time, e.g. in one of these
>> threads I suggested maybe "git status" should emit this.
>
> I have to say, I don't have a lot of sympathy for this.
>
> I've been running with the patches I sent before for a while now, and
> the behavior that they create is great.  I think we can make further
> refinements on top.  To put it another way, I haven't actually
> experienced any bad knock-on effects, and I think other feature
> requests can be addressed separately.
>
> I do have sympathy for some wishes for changes to "git gc --auto"
> behavior (I think it should be synchronous regardless of config and
> the asynchrony should move to being requested explicitly through a
> command line option by the callers within Git) but I don't understand
> why this holds up a change that IMHO is wholly positive for users.
>
> To put it another way, I am getting the feeling that the objections to
> that series were theoretical, while the practical benefits of the
> patch are pretty immediate and real.  I'm happy to help anyone who
> wants to polish it but time has shown no one is working on that, so...

[I wrote this before seeing Jeff's reply, but just to be clear...]

Yes, like Jeff says I'm not referring to your gitster/jn/gc-auto with
this "Jenga tower" comment.

Re that patch: I've said what I think about tools printing error
messages saying "I can't do stuff" while not returning a non-zero exit
code, so I won't repeat that here. But whatever anyone thinks of that
it's ultimately a rather trivial detail, and doesn't have any knock-on
effects on the rest of git-gc behavior.

I'm talking about the "gc: do not warn about too many loose objects"
patch and similar approaches. FWIW what I'm describing in
<878t36f3ed@evledraar.gmail.com> isn't some theoretical concern. In
some large repositories at work that experience a lot of branch churn
and have fetch.prune / fetch.pruneTags turned on, active checkouts very
quickly get to the default 6700 limit.
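
For context on the 6700 figure: it is the default of gc.auto. As far as
I recall from builtin/gc.c, "gc --auto" does not count every loose
object; it samples one fan-out directory (objects/17) and compares the
count against the limit divided by 256, rounded up. The sketch below
only restates that arithmetic. The helper names are made up for
illustration and are not git's own.

```c
#include <assert.h>

/* Default value of gc.auto, the limit mentioned above. */
#define DEFAULT_GC_AUTO 6700

/*
 * Hypothetical helper: per-directory limit derived from gc.auto,
 * i.e. the limit spread over the 256 fan-out directories, rounded up.
 */
static int per_dir_threshold(int gc_auto)
{
	return (gc_auto + 255) / 256;
}

/*
 * Hypothetical helper: nonzero when the sampled directory holds more
 * loose objects than the per-directory limit. A gc_auto of zero or
 * less is treated as "automatic gc disabled".
 */
static int sample_says_too_many(int loose_in_sample_dir, int gc_auto)
{
	if (gc_auto <= 0)
		return 0;
	return loose_in_sample_dir > per_dir_threshold(gc_auto);
}
```

With the default of 6700 the per-directory limit works out to 27, so 28
loose objects whose names happen to start with "17" are enough to trip
the check.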

I've currently found that gc.pruneExpire=4.days.ago is close to a sweet
spot of avoiding that issue for now, while not e.g. gc-ing a loose
object someone committed on Friday before the same time on Monday. But
before I tweaked that, with the default of 2.weeks, we'd much more
regularly see the problem described in [1].

But as noted in the various GC threads linked from this one, that sort of
solution only works within the confines of the current implementation and
the configuration promises we've made, which leads to all sorts of stupidity.

1. https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Jeff King
On Wed, Oct 10, 2018 at 09:51:52AM -0700, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
> 
> > I'm just saying it's hard in this case to remove one piece without the
> > whole Jenga tower collapsing, and it's probably a good idea in some of
> > these cases to pester the user about what he wants, but probably not via
> > gc --auto emitting the same warning every time, e.g. in one of these
> > threads I suggested maybe "git status" should emit this.
> 
> I have to say, I don't have a lot of sympathy for this.
> 
> I've been running with the patches I sent before for a while now, and
> the behavior that they create is great.  I think we can make further
> refinements on top.  To put it another way, I haven't actually
> experienced any bad knock-on effects, and I think other feature
> requests can be addressed separately.

I think there may be some miscommunication here. The Jenga tower above
is referring (I think) to Jonathan Tan's original patch to drop the
warning entirely, which does have some unwanted side effects.

Your patches are much less controversial, I think, and are in next and
marked as "will merge to master" in the last "what's cooking".

-Peff


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> I'm just saying it's hard in this case to remove one piece without the
> whole Jenga tower collapsing, and it's probably a good idea in some of
> these cases to pester the user about what he wants, but probably not via
> gc --auto emitting the same warning every time, e.g. in one of these
> threads I suggested maybe "git status" should emit this.

I have to say, I don't have a lot of sympathy for this.

I've been running with the patches I sent before for a while now, and
the behavior that they create is great.  I think we can make further
refinements on top.  To put it another way, I haven't actually
experienced any bad knock-on effects, and I think other feature
requests can be addressed separately.

I do have sympathy for some wishes for changes to "git gc --auto"
behavior (I think it should be synchronous regardless of config and
the asynchrony should move to being requested explicitly through a
command line option by the callers within Git) but I don't understand
why this holds up a change that IMHO is wholly positive for users.

To put it another way, I am getting the feeling that the objections to
that series were theoretical, while the practical benefits of the
patch are pretty immediate and real.  I'm happy to help anyone who
wants to polish it but time has shown no one is working on that, so...

Thanks,
Jonathan


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Martin Langhoff
On Wed, Oct 10, 2018 at 8:21 AM Junio C Hamano  wrote:
> We probably can keep the "let's not run for a day" safety while
> pretending that "git gc --auto" succeeded for callers like "git svn"
> so that these callers do not have to do "eval { ... }" to hide our
> exit code, no?
>
> I think that is what Jonathan's patch (jn/gc-auto) does.

+1

`--auto` means "DTRT, but remember you're running as part of a larger
process; don't error out unless it's critical".

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>>  - We use this warning as a proxy for "let's not run for a day",
>>    otherwise we'll just grind on gc --auto trying to consolidate
>>    possibly many hundreds of K of loose objects only to find none of
>>    them can be pruned because they run into the expiry policy. With the
>>    warning we retry that once per day, which sucks less.
>>
>>  - This conflation of the user-visible warning and the policy is an
>>    emergent effect of how the different gc pieces interact, which as I
>>    note in the linked thread(s) sucks.
>>
>>    But we can't just yank one piece away (as Jonathan's patch does)
>>    without throwing the baby out with the bathwater.
>>
>>    It will mean that e.g. if you have 10k loose objects in your git.git,
>>    and created them just now, that every time you run anything that runs
>>    "gc --auto" we'll fork to the background, peg a core at 100% CPU for
>>    2-3 minutes or whatever it is, only to get nowhere and do the same
>>    thing again in ~3 minutes when you run your next command.
>
> We probably can keep the "let's not run for a day" safety while
> pretending that "git gc --auto" succeeded for callers like "git svn"
> so that these callers do not have to do "eval { ... }" to hide our
> exit code, no?
>
> I think that is what Jonathan's patch (jn/gc-auto) does.

Yeah we could take that patch to skip the eval {} suggested upthread.

As noted when it was discussed I'm *mildly* negative on hiding an IMO
meaningful exit code like that, but maybe sprinkling eval {} or other
"run but ignore exit code" in stuff running "gc --auto" is worth it, and
we could just document that you may want to check gc.log.

> From: Jonathan Nieder 
> Date: Mon, 16 Jul 2018 23:57:40 -0700
> Subject: [PATCH] gc: do not return error for prior errors in daemonized mode
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 95c8afd07b..ce8a663a01 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -438,9 +438,15 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>   return NULL;
>  }
>
> -static void report_last_gc_error(void)
> +/*
> + * Returns 0 if there was no previous error and gc can proceed, 1 if
> + * gc should not proceed due to an error in the last run. Prints a
> + * message and returns -1 if an error occured while reading gc.log
> + */
> +static int report_last_gc_error(void)
>  {
>   struct strbuf sb = STRBUF_INIT;
> + int ret = 0;
> ...
>   if (len < 0)
> + ret = error_errno(_("cannot read '%s'"), gc_log_path);
> + else if (len > 0) {
> + /*
> +  * A previous gc failed.  Report the error, and don't
> +  * bother with an automatic gc run since it is likely
> +  * to fail in the same way.
> +  */
> + warning(_("The last gc run reported the following. "
>  "Please correct the root cause\n"
>  "and remove %s.\n"
>  "Automatic cleanup will not be performed "
>  "until the file is removed.\n\n"
>  "%s"),
>   gc_log_path, sb.buf);
> + ret = 1;
> + }
>   strbuf_release(&sb);
>  done:
>   free(gc_log_path);
> + return ret;
>  }
>
> I.e. report_last_gc_error() returns 1 when it finds that the previous
> attempt to "gc --auto" failed.  And then
>
> @@ -561,7 +576,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>   fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n"));
>   }
>   if (detach_auto) {
> - report_last_gc_error(); /* dies on error */
> + int ret = report_last_gc_error();
> + if (ret < 0)
> + /* an I/O error occured, already reported */
> + exit(128);
> + if (ret == 1)
> + /* Last gc --auto failed. Skip this one. */
> + return 0;
>
> ... it exits with 0 without bothering to rerun "gc".
>
> So it won't get stuck for 3 minutes; the repository after "gc
> --auto" punts will stay suboptimal for a day, and the user
> will not get an "actionable" error notice (due to this hiding of
> previous error), hence cannot make changes that may help like
> shortening expiry period, though.

Right, because it still writes the gc.log, but we'll still be yelling at
the user on every commit/fetch etc. that we discovered such-and-such an
issue on the last gc for that full day.
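
To spell out the "for that full day" part: as far as I understand the
current code, the saved gc.log only counts while it is newer than
gc.logExpiry (default 1.day.ago); a missing, stale, or empty log is
ignored and gc --auto proceeds. A rough sketch of that decision, using
the return convention from the patch quoted above. The struct and
function names are invented for illustration, and the real code stats
and reads the file itself rather than taking a snapshot like this:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical snapshot of gc.log; not a git data structure. */
struct gc_log_info {
	int exists;    /* nonzero if gc.log is present */
	time_t mtime;  /* last modification time */
	long len;      /* bytes read; negative stands for a read error */
};

/*
 * Same return convention as report_last_gc_error() in the patch:
 * 0 = gc may proceed, 1 = skip this run, -1 = I/O error.
 * expire_before is "now minus gc.logExpiry" (assumed to be one day).
 */
static int last_gc_error_state(const struct gc_log_info *log,
			       time_t expire_before)
{
	if (!log->exists)
		return 0;
	if (log->mtime < expire_before)
		return 0;	/* stale log: give gc another try */
	if (log->len < 0)
		return -1;
	return log->len > 0 ? 1 : 0;
}
```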

That 3 minute comment was in reference to what would happen if we applied
Jonathan Tan's "[PATCH] gc: do not warn about too many loose objects"
without any other changes. Then we'd just keep returning true on
too_many_loose_objects() even though gc wouldn't help to resolve it.


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>  - We use this warning as a proxy for "let's not run for a day",
>    otherwise we'll just grind on gc --auto trying to consolidate
>    possibly many hundreds of K of loose objects only to find none of
>    them can be pruned because they run into the expiry policy. With the
>    warning we retry that once per day, which sucks less.
>
>  - This conflation of the user-visible warning and the policy is an
>    emergent effect of how the different gc pieces interact, which as I
>    note in the linked thread(s) sucks.
>
>    But we can't just yank one piece away (as Jonathan's patch does)
>    without throwing the baby out with the bathwater.
>
>    It will mean that e.g. if you have 10k loose objects in your git.git,
>    and created them just now, that every time you run anything that runs
>    "gc --auto" we'll fork to the background, peg a core at 100% CPU for
>    2-3 minutes or whatever it is, only to get nowhere and do the same
>    thing again in ~3 minutes when you run your next command.

We probably can keep the "let's not run for a day" safety while
pretending that "git gc --auto" succeeded for callers like "git svn"
so that these callers do not have to do "eval { ... }" to hide our
exit code, no?

I think that is what Jonathan's patch (jn/gc-auto) does.

From: Jonathan Nieder 
Date: Mon, 16 Jul 2018 23:57:40 -0700
Subject: [PATCH] gc: do not return error for prior errors in daemonized mode

diff --git a/builtin/gc.c b/builtin/gc.c
index 95c8afd07b..ce8a663a01 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -438,9 +438,15 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
return NULL;
 }
 
-static void report_last_gc_error(void)
+/*
+ * Returns 0 if there was no previous error and gc can proceed, 1 if
+ * gc should not proceed due to an error in the last run. Prints a
+ * message and returns -1 if an error occured while reading gc.log
+ */
+static int report_last_gc_error(void)
 {
struct strbuf sb = STRBUF_INIT;
+   int ret = 0;
...
if (len < 0)
+   ret = error_errno(_("cannot read '%s'"), gc_log_path);
+   else if (len > 0) {
+   /*
+* A previous gc failed.  Report the error, and don't
+* bother with an automatic gc run since it is likely
+* to fail in the same way.
+*/
+   warning(_("The last gc run reported the following. "
   "Please correct the root cause\n"
   "and remove %s.\n"
   "Automatic cleanup will not be performed "
   "until the file is removed.\n\n"
   "%s"),
gc_log_path, sb.buf);
+   ret = 1;
+   }
strbuf_release(&sb);
 done:
free(gc_log_path);
+   return ret;
 }
 
I.e. report_last_gc_error() returns 1 when it finds that the previous
attempt to "gc --auto" failed.  And then

@@ -561,7 +576,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n"));
}
if (detach_auto) {
-   report_last_gc_error(); /* dies on error */
+   int ret = report_last_gc_error();
+   if (ret < 0)
+   /* an I/O error occured, already reported */
+   exit(128);
+   if (ret == 1)
+   /* Last gc --auto failed. Skip this one. */
+   return 0;

... it exits with 0 without bothering to rerun "gc".

So it won't get stuck for 3 minutes; the repository after "gc
--auto" punts will stay suboptimal for a day, and the user
will not get an "actionable" error notice (due to this hiding of
previous error), hence cannot make changes that may help like
shortening expiry period, though.



Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Martin Langhoff wrote:

> On Wed, Oct 10, 2018 at 7:27 AM Ævar Arnfjörð Bjarmason
>  wrote:
>> As Jeff's
>> https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/
>> and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/
>> note it's a bit more complex than that.
>
> Ok, my bad for not reading the whole thread :-) thanks for the kind 
> explanation.
>
>>  - The warning is actionable, you can decide to up your expiration
>>    policy.
>
> A newbie-ish user shouldn't need to know git's internal store model
> _and the nuances of its special cases_ to get through.

Oh yeah, don't get me wrong. I think this whole thing sucks, and as the
linked threads show I've run into various sucky edge cases of this.

I'm just saying it's hard in this case to remove one piece without the
whole Jenga tower collapsing, and it's probably a good idea in some of
these cases to pester the user about what he wants, but probably not via
gc --auto emitting the same warning every time, e.g. in one of these
threads I suggested maybe "git status" should emit this.

>
>>  - We use this warning as a proxy for "let's not run for a day"
>
> Oh, so _that's_ the trick with creating gc.log? I then understand the
> idea of changing to exit 0.
>
> But it's far from clear, and a clear _flag_, and not spitting again
> the same warning, or differently-worded warning would be better.
>
> "We won't try running gc, a recent run was deemed pointless until some
> time passes. Nothing to worry about."

Yup. That would be better. Right now we don't write anything
machine-readable to the log, and we'd need to start doing that. E.g. we
could just as well be reporting that gc --auto is segfaulting and that's
why you have all this garbage. We just "cat" it.

>>  - This conflation of the user-visible warning and the policy is an
>>    emergent effect of how the different gc pieces interact, which as I
>>    note in the linked thread(s) sucks.
>
> It sure does, and that aspect should be easy to fix...(?)
>
>> So it's creating a lot of garbage during its cloning process that can
>> just be immediately thrown away? What is it doing? Using the object
>> store as a scratch pad for its own temporary state?
>
> Yeah, that's suspicious and I don't know why. I've worked on other
> importers and while those needed 'gc' to generate packs, they didn't
> generate garbage objects. After gc, the repo was "clean".

I tried to find this out in my reply-to-myself in
https://public-inbox.org/git/877eiqf2nk@evledraar.gmail.com/

But as noted I just looked at this briefly, and I haven't used git-svn
for years now, so I don't know and might be missing something.


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Oct 10 2018, Martin Langhoff wrote:
>
>> Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many
>> loose objects" makes sense to me.
>>
>> - remove unactionable warning
>> - as the warning is gone, no gc.log is produced
>> - subsequent gc runs don't exit due to gc.log
>>
>> My very humble +1 on that.
>>
>> As for downsides... if we have truly tons of _recent_ loose objects,
>> it'll ... take disk space? I'm fine with that.
>
> As Jeff's
> https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/
> and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/
> note it's a bit more complex than that.
>
> I.e.:
>
>  - The warning is actionable, you can decide to up your expiration
>    policy.
>
>  - We use this warning as a proxy for "let's not run for a day",
>    otherwise we'll just grind on gc --auto trying to consolidate
>    possibly many hundreds of K of loose objects only to find none of
>    them can be pruned because they run into the expiry policy. With the
>    warning we retry that once per day, which sucks less.
>
>  - This conflation of the user-visible warning and the policy is an
>    emergent effect of how the different gc pieces interact, which as I
>    note in the linked thread(s) sucks.
>
>    But we can't just yank one piece away (as Jonathan's patch does)
>    without throwing the baby out with the bathwater.
>
>    It will mean that e.g. if you have 10k loose objects in your git.git,
>    and created them just now, that every time you run anything that runs
>    "gc --auto" we'll fork to the background, peg a core at 100% CPU for
>    2-3 minutes or whatever it is, only to get nowhere and do the same
>    thing again in ~3 minutes when you run your next command.
>
>  - I think you may be underestimating some of the cases where this ends
>    up taking a huge amount of disk space (and now we'll issue at least
>    *some* warning). See my
>    https://public-inbox.org/git/87fu6bmr0j@evledraar.gmail.com/
>    where a repo's .git went from 2.5G to 30G due to being stuck in this
>    mode.
>
>> For more aggressive gc options, thoughts:
>>
>>  - Do we always consider git gc --prune=now "safe" in a "won't delete
>> stuff the user is likely to want" sense? For example -- are the
>> references from reflogs enough safety?
>
> The --prune=now command is not generally safe for the reasons noted in
> the "NOTES" section in "git help gc".
>
>>  - Even if we don't, for some commands it should be safe to run git gc
>> --prune=now at the end of the process, for example an import that
>> generates a new git repo (git svn clone).
>
> Yeah I don't see a problem with that, I didn't know about this
> interesting use-case, i.e. that "git svn clone" will create a lot of
> loose objects.
>
> As seen in my
> https://public-inbox.org/git/87tvm3go42@evledraar.gmail.com/ I'm
> working on making "gc --auto" run at the end of clone for unrelated
> reasons, i.e. so we generate the commit-graph, seems like "git svn
> clone" could do something similar.
>
> So it's creating a lot of garbage during its cloning process that can
> just be immediately thrown away? What is it doing? Using the object
> store as a scratch pad for its own temporary state?

To answer my own question (which was based on a thinko) it's continually
creating loose objects during import, i.e. packs are not involved (don't
know why I thought that), so yeah, because all of those have <2wks
expiry we end up warning as gc --auto is run.

But I actually think the git-svn import is revealing an entirely
different problem.

I.e. when I clone I seem to be getting a refs/remotes/git-svn branch
that's kept up-to-date, and when I "gc" everything's consolidated into a
pack, we don't have any loose objects that are meant for expiry.

But the reason git-svn is whining is because we're doing this in gc
(simplified for the sake of discussion):

    if (too_many_loose()) {
        expire();
        repack();
        if (too_many_loose())
            die("oh noes too many loose that don't match our expiry policy!");
    }

But they don't fall under our expiry policy at all, we're just assuming
that a crapload of loose objects haven't been added in the interim from
when we ran expire() + repack() until when we check too_many_loose()
again.

That's a logic error which we could just solve at some expense by seeing
*which* objects are loose and candidates for expiry at the beginning,
and not warning if at the end we have *different* loose objects that
should be consolidated, that just means we genuinely should run gc
again.
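
A minimal sketch of that idea, assuming we can snapshot the names of the
loose objects before expire() + repack() and again afterwards.
Everything here is hypothetical (the function name, passing object
names as sorted string arrays) and is not how gc.c is structured. The
point is only that the warning would be based on objects that were
loose at the start and are still loose at the end:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Both arrays hold hex object names sorted with strcmp().
 * Returns how many objects were loose before gc and are still loose
 * afterwards, i.e. the ones gc really could not deal with.
 */
static size_t count_surviving_loose(const char **before, size_t nr_before,
				    const char **after, size_t nr_after)
{
	size_t i = 0, j = 0, survivors = 0;

	while (i < nr_before && j < nr_after) {
		int cmp = strcmp(before[i], after[j]);

		if (cmp < 0) {
			i++;	/* was loose, has since been packed or pruned */
		} else if (cmp > 0) {
			j++;	/* new object written while gc was running */
		} else {
			survivors++;
			i++;
			j++;
		}
	}
	return survivors;
}
```

A result of zero with a non-empty "after" set would mean the remaining
loose objects are all new, so another gc run is warranted rather than a
warning.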

Or is this just wrong? I don't really know. If the above is true I'm
missing how tweaking gc.pruneExpire=5.minutes.ago is helping. Surely
we'd either just end up with the same set of loose objects (since the
clone is still running), or alternatively, if git-svn hadn't gotten
around to updating refs, create a corrupt repo.




>> m
>> On 

Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Martin Langhoff
On Wed, Oct 10, 2018 at 7:27 AM Ævar Arnfjörð Bjarmason
 wrote:
> As Jeff's
> https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/
> and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/
> note it's a bit more complex than that.

Ok, my bad for not reading the whole thread :-) thanks for the kind explanation.

>  - The warning is actionable, you can decide to up your expiration
>    policy.

A newbie-ish user shouldn't need to know git's internal store model
_and the nuances of its special cases_ to get through.


>  - We use this warning as a proxy for "let's not run for a day"

Oh, so _that's_ the trick with creating gc.log? I then understand the
idea of changing to exit 0.

But it's far from clear, and a clear _flag_, and not spitting again
the same warning, or differently-worded warning would be better.

"We won't try running gc, a recent run was deemed pointless until some
time passes. Nothing to worry about."

>  - This conflation of the user-visible warning and the policy is an
>    emergent effect of how the different gc pieces interact, which as I
>    note in the linked thread(s) sucks.

It sure does, and that aspect should be easy to fix...(?)

> So it's creating a lot of garbage during its cloning process that can
> just be immediately thrown away? What is it doing? Using the object
> store as a scratch pad for its own temporary state?

Yeah, that's suspicious and I don't know why. I've worked on other
importers and while those needed 'gc' to generate packs, they didn't
generate garbage objects. After gc, the repo was "clean".

cheers,



m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Wed, Oct 10 2018, Martin Langhoff wrote:

> Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many
> loose objects" makes sense to me.
>
> - remove unactionable warning
> - as the warning is gone, no gc.log is produced
> - subsequent gc runs don't exit due to gc.log
>
> My very humble +1 on that.
>
> As for downsides... if we have truly tons of _recent_ loose objects,
> it'll ... take disk space? I'm fine with that.

As Jeff's
https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/
and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/
note it's a bit more complex than that.

I.e.:

 - The warning is actionable, you can decide to up your expiration
   policy.

 - We use this warning as a proxy for "let's not run for a day",
   otherwise we'll just grind on gc --auto trying to consolidate
   possibly many hundreds of K of loose objects only to find none of
   them can be pruned because they run into the expiry policy. With the
   warning we retry that once per day, which sucks less.

 - This conflation of the user-visible warning and the policy is an
   emergent effect of how the different gc pieces interact, which as I
   note in the linked thread(s) sucks.

   But we can't just yank one piece away (as Jonathan's patch does)
   without throwing the baby out with the bathwater.

   It will mean that e.g. if you have 10k loose objects in your git.git,
   and created them just now, that every time you run anything that runs
   "gc --auto" we'll fork to the background, peg a core at 100% CPU for
   2-3 minutes or whatever it is, only to get nowhere and do the same
   thing again in ~3 minutes when you run your next command.

 - I think you may be underestimating some of the cases where this ends
   up taking a huge amount of disk space (and now we'll issue at least
   *some* warning). See my
   https://public-inbox.org/git/87fu6bmr0j@evledraar.gmail.com/
   where a repo's .git went from 2.5G to 30G due to being stuck in this
   mode.

> For more aggressive gc options, thoughts:
>
>  - Do we always consider git gc --prune=now "safe" in a "won't delete
> stuff the user is likely to want" sense? For example -- are the
> references from reflogs enough safety?

The --prune=now command is not generally safe for the reasons noted in
the "NOTES" section in "git help gc".

>  - Even if we don't, for some commands it should be safe to run git gc
> --prune=now at the end of the process, for example an import that
> generates a new git repo (git svn clone).

Yeah I don't see a problem with that, I didn't know about this
interesting use-case, i.e. that "git svn clone" will create a lot of
loose objects.

As seen in my
https://public-inbox.org/git/87tvm3go42@evledraar.gmail.com/ I'm
working on making "gc --auto" run at the end of clone for unrelated
reasons, i.e. so we generate the commit-graph, seems like "git svn
clone" could do something similar.

So it's creating a lot of garbage during its cloning process that can
just be immediately thrown away? What is it doing? Using the object
store as a scratch pad for its own temporary state?

> m
> On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano  wrote:
>>
>> Forwarding to Jonathan, as I think this is an interesting supporting
>> vote for the topic that we were stuck on.
>>
>> Eric Wong  writes:
>>
>> > Martin Langhoff  wrote:
>> >> Hi folks,
>> >>
>> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
>> >> I hit the gc error:
>> >>
>> >> warning: There are too many unreachable loose objects; run 'git prune'
>> >> to remove them.
>> >> gc --auto: command returned error: 255
>> >
>> > GC can be annoying when that happens... For git-svn, perhaps
>> > this can be appropriate to at least allow the import to continue:
>> >
>> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
>> > index 76b2965905..9b0caa3d47 100644
>> > --- a/perl/Git/SVN.pm
>> > +++ b/perl/Git/SVN.pm
>> > @@ -999,7 +999,7 @@ sub restore_commit_header_env {
>> >  }
>> >
>> >  sub gc {
>> > - command_noisy('gc', '--auto');
>> > + eval { command_noisy('gc', '--auto') };
>> >  };
>> >
>> >  sub do_git_commit {
>> >
>> >
>> > But yeah, somebody else who works on git regularly could
>> > probably stop repack from writing thousands of loose
>> > objects (and instead write a self-contained pack with
>> > those objects, instead).  I haven't followed git closely
>> > lately, myself.


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Martin Langhoff
Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many
loose objects" makes sense to me.

- remove unactionable warning
- as the warning is gone, no gc.log is produced
- subsequent gc runs don't exit due to gc.log

My very humble +1 on that.

As for downsides... if we have truly tons of _recent_ loose objects,
it'll ... take disk space? I'm fine with that.

For more aggressive gc options, thoughts:

 - Do we always consider git gc --prune=now "safe" in a "won't delete
stuff the user is likely to want" sense? For example -- are the
references from reflogs enough safety?

 - Even if we don't, for some commands it should be safe to run git gc
--prune=now at the end of the process, for example an import that
generates a new git repo (git svn clone).

cheers,


m
On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano  wrote:
>
> Forwarding to Jonathan, as I think this is an interesting supporting
> vote for the topic that we were stuck on.
>
> Eric Wong  writes:
>
> > Martin Langhoff  wrote:
> >> Hi folks,
> >>
> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
> >> I hit the gc error:
> >>
> >> warning: There are too many unreachable loose objects; run 'git prune'
> >> to remove them.
> >> gc --auto: command returned error: 255
> >
> > GC can be annoying when that happens... For git-svn, perhaps
> > this can be appropriate to at least allow the import to continue:
> >
> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
> > index 76b2965905..9b0caa3d47 100644
> > --- a/perl/Git/SVN.pm
> > +++ b/perl/Git/SVN.pm
> > @@ -999,7 +999,7 @@ sub restore_commit_header_env {
> >  }
> >
> >  sub gc {
> > - command_noisy('gc', '--auto');
> > + eval { command_noisy('gc', '--auto') };
> >  };
> >
> >  sub do_git_commit {
> >
> >
> > But yeah, somebody else who works on git regularly could
> > probably stop repack from writing thousands of loose
> > objects (and instead write a self-contained pack with
> > those objects, instead).  I haven't followed git closely
> > lately, myself.



-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted~  http://github.com/martin-langhoff
   by shiny stuff


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Ævar Arnfjörð Bjarmason


On Tue, Oct 09 2018, Martin Langhoff wrote:

> Hi folks,
>
> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
> I hit the gc error:
>
> warning: There are too many unreachable loose objects; run 'git prune'
> to remove them.
> gc --auto: command returned error: 255
>
> I don't seem to be the only one --
> https://stackoverflow.com/questions/35738680/avoiding-warning-there-are-too-many-unreachable-loose-objects-during-git-svn
>
> Looking at code history, it dropped the ability to pass options to git
> repack when it was converted to use git gc.
>
> Experimentally I find that tweaking it to run git gc --auto
> --prune=5.minutes.ago works well, while --prune=now breaks it.
> Attempts to commit fail with missing objects.
>
> - Why does --prune=now break it? Perhaps "gc" runs in the background,
> and races with the commit being prepared?
>
> - Would it be safe, sane to apply --prune=some.value on _clone_?
>
> - During _fetch_, --prune=some.value seems risky. In a checkout being
> actively used for development or merging it'd risk pruning objects
> users expect to be there for recovery. Would there be a safe, sane
> way?
>
> - Is there a safer, saner value than 5 minutes?

What you've found is the least sucky way to work around this right now,
but see my
https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/ and
https://public-inbox.org/git/87d0vmck55@evledraar.gmail.com/ for
some prior (and recent) discussion of this problem on-list.

FWIW this has nothing to do with git-svn per se; it also sometimes
happens to me, e.g. when I run 'git fetch --all' on git.git.
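
A minimal sketch of that workaround for anyone who would rather not patch
git-svn: instead of passing --prune on the command line, the grace period
can be recorded in the repository's config, where "git gc --auto" picks it
up. This assumes git is installed; gc.pruneExpire is an existing git
setting, while the helper names loose_count and set_prune_grace are made
up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes git is installed and "$1" is an existing repository.
# loose_count and set_prune_grace are hypothetical helper names.

# Print the number of loose objects, as reported by "git count-objects -v".
loose_count() {
	git -C "$1" count-objects -v | sed -n 's/^count: //p'
}

# Record a prune grace period in the repository's own config, so that any
# "git gc" run there (including "gc --auto") uses it instead of the
# default of two weeks.
set_prune_grace() {
	git -C "$1" config gc.pruneExpire "$2"
}

# Example (commented out so that sourcing this file changes nothing):
#   set_prune_grace /path/to/import 5.minutes.ago
#   loose_count /path/to/import
```

A short grace period is probably only reasonable for a repository used
purely as an import target; in a checkout used for daily work it discards
the recovery window that the default provides.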


Re: git svn clone/fetch hits issues with gc --auto

2018-10-09 Thread Junio C Hamano


Forwarding to Jonathan, as I think this is an interesting supporting
vote for the topic that we were stuck on.

Eric Wong writes:

> Martin Langhoff wrote:
>> Hi folks,
>> 
>> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
>> I hit the gc error:
>> 
>> warning: There are too many unreachable loose objects; run 'git prune'
>> to remove them.
>> gc --auto: command returned error: 255
>
> GC can be annoying when that happens... For git-svn, perhaps
> this can be appropriate to at least allow the import to continue:
>
> diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
> index 76b2965905..9b0caa3d47 100644
> --- a/perl/Git/SVN.pm
> +++ b/perl/Git/SVN.pm
> @@ -999,7 +999,7 @@ sub restore_commit_header_env {
>  }
>  
>  sub gc {
> - command_noisy('gc', '--auto');
> + eval { command_noisy('gc', '--auto') };
>  };
>  
>  sub do_git_commit {
>
>
> But yeah, somebody else who works on git regularly could
> probably stop repack from writing thousands of loose
> objects (and write a self-contained pack with
> those objects instead).  I haven't followed git closely
> lately, myself.


Re: git svn clone/fetch hits issues with gc --auto

2018-10-09 Thread Eric Wong


Martin Langhoff wrote:
> Hi folks,
> 
> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
> I hit the gc error:
> 
> warning: There are too many unreachable loose objects; run 'git prune'
> to remove them.
> gc --auto: command returned error: 255

GC can be annoying when that happens... For git-svn, perhaps
this can be appropriate to at least allow the import to continue:

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 76b2965905..9b0caa3d47 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -999,7 +999,7 @@ sub restore_commit_header_env {
 }
 
 sub gc {
-   command_noisy('gc', '--auto');
+   eval { command_noisy('gc', '--auto') };
 };
 
 sub do_git_commit {


But yeah, somebody else who works on git regularly could
probably stop repack from writing thousands of loose
objects (and write a self-contained pack with
those objects instead).  I haven't followed git closely
lately, myself.
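
The eval in the patch above makes a failing "gc --auto" non-fatal for the
import. For scripts that drive git from the shell rather than through
Git::SVN, roughly the same idea might look like the sketch below. The
helper name run_tolerant is hypothetical and not part of git or git-svn;
it only downgrades a non-zero exit status to a warning.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): run a command and, if it fails,
# print a warning to stderr instead of propagating the failure.
run_tolerant() {
	"$@" || echo "warning: '$*' exited with status $?; continuing" >&2
	return 0
}

# Intended use inside an import loop (commented out so that sourcing this
# file has no side effects):
#   run_tolerant git gc --auto
```

As with the eval, this hides the failure rather than fixing its cause, so
the loose objects keep accumulating until something prunes them.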


git svn clone/fetch hits issues with gc --auto

2018-10-09 Thread Martin Langhoff


Hi folks,

Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
I hit the gc error:

warning: There are too many unreachable loose objects; run 'git prune'
to remove them.
gc --auto: command returned error: 255

I don't seem to be the only one --
https://stackoverflow.com/questions/35738680/avoiding-warning-there-are-too-many-unreachable-loose-objects-during-git-svn

Looking at code history, git-svn dropped the ability to pass options to
git repack when it was converted to use git gc.

Experimentally I find that tweaking it to run git gc --auto
--prune=5.minutes.ago works well, while --prune=now breaks it.
Attempts to commit fail with missing objects.

- Why does --prune=now break it? Perhaps "gc" runs in the background,
and races with the commit being prepared?

- Would it be safe, sane to apply --prune=some.value on _clone_?

- During _fetch_, --prune=some.value seems risky. In a checkout being
actively used for development or merging it'd risk pruning objects
users expect to be there for recovery. Would there be a safe, sane
way?

- Is there a safer, saner value than 5 minutes?

cheers,


m
-- 
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted      ~  http://github.com/martin-langhoff
   by shiny stuff

