Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10 2018, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> I'm just saying it's hard in this case to remove one piece without the
>> whole Jenga tower collapsing, and it's probably a good idea in some of
>> these cases to pester the user about what he wants, but probably not via
>> gc --auto emitting the same warning every time, e.g. in one of these
>> threads I suggested maybe "git status" should emit this.
>
> I have to say, I don't have a lot of sympathy for this.
>
> I've been running with the patches I sent before for a while now, and
> the behavior that they create is great. I think we can make further
> refinements on top. To put it another way, I haven't actually
> experienced any bad knock-on effects, and I think other feature
> requests can be addressed separately.
>
> I do have sympathy for some wishes for changes to "git gc --auto"
> behavior (I think it should be synchronous regardless of config and
> the asynchrony should move to being requested explicitly through a
> command line option by the callers within Git) but I don't understand
> why this holds up a change that IMHO is wholly positive for users.
>
> To put it another way, I am getting the feeling that the objections to
> that series were theoretical, while the practical benefits of the
> patch are pretty immediate and real. I'm happy to help anyone who
> wants to polish it but time has shown no one is working on that, so...

[I wrote this before seeing Jeff's reply, but just to be clear...]

Yes, like Jeff says I'm not referring to your gitster/jn/gc-auto with
this "Jenga tower" comment.

Re that patch: I've said what I think about tools printing error
messages saying "I can't do stuff" while not returning a non-zero exit
code, so I won't repeat that here.

But whatever anyone thinks of that it's ultimately a rather trivial
detail, and doesn't have any knock-on effects on the rest of git-gc
behavior.
I'm talking about the "gc: do not warn about too many loose objects"
patch and similar approaches.

FWIW what I'm describing in <878t36f3ed@evledraar.gmail.com> isn't some
theoretical concern. In some large repositories at work that experience
a lot of branch churn and have fetch.prune / fetch.pruneTags turned on,
active checkouts very quickly get to the default 6700 limit.

I've currently found that gc.pruneExpire=4.days.ago is close to a sweet
spot of avoiding that issue for now, while not e.g. gc-ing a loose
object someone committed on Friday before the same time on Monday.
Before I tweaked that, with the default of 2.weeks, we'd much more
regularly see the problem described in [1].

But as noted in the various GC threads linked from this one, that sort
of solution stays within the confines of the current implementation and
the configuration promises we've made, which lead to all sorts of
stupidity.

1. https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/
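For context on how the 6700 limit is reached: as far as I recall from builtin/gc.c, "gc --auto" does not count every loose object. It samples one fan-out directory (objects/17) and compares that count against the gc.auto value scaled down by 256. A minimal sketch of that arithmetic follows; the function names are made up for illustration and are not git's.

```c
/*
 * Illustrative only: these names are hypothetical, not git's API.
 * gc.auto is a repository-wide limit; loose objects are spread over
 * 256 fan-out directories, so one directory serves as a sample.
 */
static int per_directory_threshold(int gc_auto)
{
    /* Round up, so a small non-zero limit still yields at least 1. */
    return (gc_auto + 255) / 256;
}

/* Returns 1 when the sampled directory suggests the limit is exceeded. */
static int looks_like_too_many_loose(int objects_in_sample_dir, int gc_auto)
{
    if (gc_auto <= 0)
        return 0; /* gc.auto=0 disables the check */
    return objects_in_sample_dir > per_directory_threshold(gc_auto);
}
```

With the default of 6700 the per-directory threshold is 27, so the check trips once the sampled directory holds 28 or more loose objects.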
Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10, 2018 at 09:51:52AM -0700, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>
> > I'm just saying it's hard in this case to remove one piece without the
> > whole Jenga tower collapsing, and it's probably a good idea in some of
> > these cases to pester the user about what he wants, but probably not via
> > gc --auto emitting the same warning every time, e.g. in one of these
> > threads I suggested maybe "git status" should emit this.
>
> I have to say, I don't have a lot of sympathy for this.
>
> I've been running with the patches I sent before for a while now, and
> the behavior that they create is great. I think we can make further
> refinements on top. To put it another way, I haven't actually
> experienced any bad knock-on effects, and I think other feature
> requests can be addressed separately.

I think there may be some miscommunication here. The Jenga tower above
is referring (I think) to Jonathan Tan's original patch to drop the
warning entirely, which does have some unwanted side effects.

Your patches are much less controversial, I think, and are in next and
marked as "will merge to master" in the last "what's cooking".

-Peff
Re: git svn clone/fetch hits issues with gc --auto
Hi,

Ævar Arnfjörð Bjarmason wrote:

> I'm just saying it's hard in this case to remove one piece without the
> whole Jenga tower collapsing, and it's probably a good idea in some of
> these cases to pester the user about what he wants, but probably not via
> gc --auto emitting the same warning every time, e.g. in one of these
> threads I suggested maybe "git status" should emit this.

I have to say, I don't have a lot of sympathy for this.

I've been running with the patches I sent before for a while now, and
the behavior that they create is great. I think we can make further
refinements on top. To put it another way, I haven't actually
experienced any bad knock-on effects, and I think other feature
requests can be addressed separately.

I do have sympathy for some wishes for changes to "git gc --auto"
behavior (I think it should be synchronous regardless of config and
the asynchrony should move to being requested explicitly through a
command line option by the callers within Git) but I don't understand
why this holds up a change that IMHO is wholly positive for users.

To put it another way, I am getting the feeling that the objections to
that series were theoretical, while the practical benefits of the
patch are pretty immediate and real. I'm happy to help anyone who
wants to polish it but time has shown no one is working on that, so...

Thanks,
Jonathan
Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10, 2018 at 8:21 AM Junio C Hamano wrote:
> We probably can keep the "let's not run for a day" safety while
> pretending that "git gc --auto" succeeded for callers like "git svn"
> so that these callers do not have to do "eval { ... }" to hide our
> exit code, no?
>
> I think that is what Jonathan's patch (jn/gc-auto) does.

+1

`--auto` means "DTRT, but remember you're running as part of a larger
process; don't error out unless it's critical".

cheers,

m
--
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted        ~  http://github.com/martin-langhoff
   by shiny stuff
Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> - We use this warning as a proxy for "let's not run for a day", >>otherwise we'll just grind on gc --auto trying to consolidate >>possibly many hundreds of K of loose objects only to find none of >>them can be pruned because the run into the expiry policy. With the >>warning we retry that once per day, which sucks less. >> >> - This conflation of the user-visible warning and the policy is an >>emergent effect of how the different gc pieces interact, which as I >>note in the linked thread(s) sucks. >> >>But we can't just yank one piece away (as Jonathan's patch does) >>without throwing the baby out with the bathwater. >> >>It will mean that e.g. if you have 10k loose objects in your git.git, >>and created them just now, that every time you run anything that runs >>"gc --auto" we'll fork to the background, peg a core at 100% CPU for >>2-3 minutes or whatever it is, only do get nowhere and do the same >>thing again in ~3 minutes when you run your next command. > > We probably can keep the "let's not run for a day" safety while > pretending that "git gc -auto" succeeded for callers like "git svn" > so that these callers do not hae to do "eval { ... }" to hide our > exit code, no? > > I think that is what Jonathan's patch (jn/gc-auto) does. Yeah we could take that patch to skip the eval {} suggested upthread. As noted when it was discussed I'm *mildly* negative on hiding a IMO meaningful exit code like that, but maybe sprinkling eval {} or other "run but ignore exit code" in stuff running "gc --auto" is worth it, and we could just document that you may want to check gc.log. 
> From: Jonathan Nieder > Date: Mon, 16 Jul 2018 23:57:40 -0700 > Subject: [PATCH] gc: do not return error for prior errors in daemonized mode > > diff --git a/builtin/gc.c b/builtin/gc.c > index 95c8afd07b..ce8a663a01 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -438,9 +438,15 @@ static const char *lock_repo_for_gc(int force, pid_t* > ret_pid) > return NULL; > } > > -static void report_last_gc_error(void) > +/* > + * Returns 0 if there was no previous error and gc can proceed, 1 if > + * gc should not proceed due to an error in the last run. Prints a > + * message and returns -1 if an error occured while reading gc.log > + */ > +static int report_last_gc_error(void) > { > struct strbuf sb = STRBUF_INIT; > + int ret = 0; > ... > if (len < 0) > + ret = error_errno(_("cannot read '%s'"), gc_log_path); > + else if (len > 0) { > + /* > + * A previous gc failed. Report the error, and don't > + * bother with an automatic gc run since it is likely > + * to fail in the same way. > + */ > + warning(_("The last gc run reported the following. " > "Please correct the root cause\n" > "and remove %s.\n" > "Automatic cleanup will not be performed " > "until the file is removed.\n\n" > "%s"), > gc_log_path, sb.buf); > + ret = 1; > + } > strbuf_release(); > done: > free(gc_log_path); > + return ret; > } > > I.e. report_last_gc_error() returns 1 when finds that the previous > attempt to "gc --auto" failed. And then > > @@ -561,7 +576,13 @@ int cmd_gc(int argc, const char **argv, const char > *prefix) > fprintf(stderr, _("See \"git help gc\" for manual > housekeeping.\n")); > } > if (detach_auto) { > - report_last_gc_error(); /* dies on error */ > + int ret = report_last_gc_error(); > + if (ret < 0) > + /* an I/O error occured, already reported */ > + exit(128); > + if (ret == 1) > + /* Last gc --auto failed. Skip this one. */ > + return 0; > > ... it exits with 0 without bothering to rerun "gc". 
>
> So it won't get stuck for 3 minutes; the repository after "gc
> --auto" punts will stay suboptimal for a day, and the user
> will not get an "actionable" error notice (due to this hiding of
> previous error), hence cannot make changes that may help like
> shortening expiry period, though.

Right, because it still writes the gc.log, but we'll still be yelling at
the user on every commit/fetch etc. that we discovered such-and-such an
issue on the last gc for that full day.

That 3 minute comment was in reference to what would happen if we
applied Jonathan Tan's "[PATCH] gc: do not warn about too many loose
objects" without any other changes. Then we'd just keep returning true
on too_many_loose_objects() even though gc wouldn't help to resolve it.
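The "full day" above comes from gc.logExpiry, which as far as I know defaults to 1.day: a gc.log older than that is ignored and "gc --auto" tries again. A rough sketch of that gate, assuming the file's mtime is passed in as a plain time_t; the helper name is hypothetical and this is not git's code.

```c
#include <time.h>

#define ONE_DAY_SECS (24L * 60 * 60)

/*
 * Hypothetical helper. Returns 1 if a gc.log last written at
 * log_mtime should still suppress "gc --auto" at time `now`,
 * given an expiry window of expiry_secs seconds.
 */
static int gc_log_still_blocks(time_t log_mtime, time_t now, long expiry_secs)
{
    return log_mtime >= now - expiry_secs;
}
```

A log written an hour ago still blocks the next automatic run; one written two days ago no longer does, so the expensive retry happens at most about once per expiry window.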
Re: git svn clone/fetch hits issues with gc --auto
Ævar Arnfjörð Bjarmason writes:

>  - We use this warning as a proxy for "let's not run for a day",
>    otherwise we'll just grind on gc --auto trying to consolidate
>    possibly many hundreds of K of loose objects only to find none of
>    them can be pruned because they run into the expiry policy. With the
>    warning we retry that once per day, which sucks less.
>
>  - This conflation of the user-visible warning and the policy is an
>    emergent effect of how the different gc pieces interact, which as I
>    note in the linked thread(s) sucks.
>
>    But we can't just yank one piece away (as Jonathan's patch does)
>    without throwing the baby out with the bathwater.
>
>    It will mean that e.g. if you have 10k loose objects in your git.git,
>    and created them just now, that every time you run anything that runs
>    "gc --auto" we'll fork to the background, peg a core at 100% CPU for
>    2-3 minutes or whatever it is, only to get nowhere and do the same
>    thing again in ~3 minutes when you run your next command.

We probably can keep the "let's not run for a day" safety while
pretending that "git gc --auto" succeeded for callers like "git svn"
so that these callers do not have to do "eval { ... }" to hide our
exit code, no?

I think that is what Jonathan's patch (jn/gc-auto) does.

    From: Jonathan Nieder
    Date: Mon, 16 Jul 2018 23:57:40 -0700
    Subject: [PATCH] gc: do not return error for prior errors in daemonized mode

    diff --git a/builtin/gc.c b/builtin/gc.c
    index 95c8afd07b..ce8a663a01 100644
    --- a/builtin/gc.c
    +++ b/builtin/gc.c
    @@ -438,9 +438,15 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
             return NULL;
     }

    -static void report_last_gc_error(void)
    +/*
    + * Returns 0 if there was no previous error and gc can proceed, 1 if
    + * gc should not proceed due to an error in the last run. Prints a
    + * message and returns -1 if an error occurred while reading gc.log
    + */
    +static int report_last_gc_error(void)
     {
             struct strbuf sb = STRBUF_INIT;
    +        int ret = 0;
     ...
             if (len < 0)
    +                ret = error_errno(_("cannot read '%s'"), gc_log_path);
    +        else if (len > 0) {
    +                /*
    +                 * A previous gc failed. Report the error, and don't
    +                 * bother with an automatic gc run since it is likely
    +                 * to fail in the same way.
    +                 */
    +                warning(_("The last gc run reported the following. "
                                   "Please correct the root cause\n"
                                   "and remove %s.\n"
                                   "Automatic cleanup will not be performed "
                                   "until the file is removed.\n\n"
                                   "%s"),
                                 gc_log_path, sb.buf);
    +                ret = 1;
    +        }
             strbuf_release(&sb);
     done:
             free(gc_log_path);
    +        return ret;
     }

I.e. report_last_gc_error() returns 1 when it finds that the previous
attempt to "gc --auto" failed. And then

    @@ -561,7 +576,13 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
                             fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n"));
                     }
                     if (detach_auto) {
    -                        report_last_gc_error(); /* dies on error */
    +                        int ret = report_last_gc_error();
    +                        if (ret < 0)
    +                                /* an I/O error occurred, already reported */
    +                                exit(128);
    +                        if (ret == 1)
    +                                /* Last gc --auto failed. Skip this one. */
    +                                return 0;

... it exits with 0 without bothering to rerun "gc".

So it won't get stuck for 3 minutes; the repository after "gc
--auto" punts will stay suboptimal for a day, and the user
will not get an "actionable" error notice (due to this hiding of
previous error), hence cannot make changes that may help like
shortening expiry period, though.
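The three-way return value described in the patch above can be restated in isolation. This is only a sketch of the decision table, with made-up enum and function names; the actual logic lives in cmd_gc() as quoted.

```c
/* Hypothetical names; only the 0 / 1 / negative convention is from the patch. */
enum auto_gc_outcome {
    AUTO_GC_RUN,      /* no earlier error: go ahead and run gc */
    AUTO_GC_SKIP_OK,  /* earlier gc failed: warn, skip, report success */
    AUTO_GC_FATAL     /* gc.log could not be read: give up */
};

static enum auto_gc_outcome classify_last_gc_status(int ret)
{
    if (ret < 0)
        return AUTO_GC_FATAL;
    if (ret == 1)
        return AUTO_GC_SKIP_OK;
    return AUTO_GC_RUN;
}

/*
 * Exit code for the cases that return early, or -1 when gc actually
 * runs and the exit code is decided later by that run.
 */
static int early_exit_code(enum auto_gc_outcome outcome)
{
    switch (outcome) {
    case AUTO_GC_FATAL:
        return 128;
    case AUTO_GC_SKIP_OK:
        return 0;
    default:
        return -1;
    }
}
```

The point for callers such as "git svn" is the middle case: a stale gc.log now leads to exit code 0, so no "eval { ... }" wrapper is needed just to survive it.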
Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10 2018, Martin Langhoff wrote: > On Wed, Oct 10, 2018 at 7:27 AM Ævar Arnfjörð Bjarmason > wrote: >> As Jeff's >> https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/ >> and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/ >> note it's a bit more complex than that. > > Ok, my bad for not reading the whole thread :-) thanks for the kind > explanation. > >> - The warning is actionable, you can decide to up your expiration >>policy. > > A newbie-ish user shouldn't need to know git's internal store model > _and the nuances of its special cases_ got get through. Oh yeah, don't get me wrong. I think this whole thing sucks, and as the linked threads show I've run into various sucky edge cases of this. I'm just saying it's hard in this case to remove one piece without the whole Jenga tower collapsing, and it's probably a good idea in some of these cases to pester the user about what he wants, but probably not via gc --auto emitting the same warning every time, e.g. in one of these threads I suggested maybe "git status" should emit this. > >> - We use this warning as a proxy for "let's not run for a day" > > Oh, so _that's_ the trick with creating gc.log? I then understand the > idea of changing to exit 0. > > But it's far from clear, and a clear _flag_, and not spitting again > the same warning, or differently-worded warning would be better. > > "We won't try running gc, a recent run was deemed pointless until some > time passes. Nothing to worry about." Yup. That would be better. Right now we don't write anything machine-readable to the log, and we'd need to start doing that. E.g. we could just as well be reporting that gc --auto is segfaulting and that's why you have all this garbage. We just "cat" it. >> - This conflation of the user-visible warning and the policy is an >>emergent effect of how the different gc pieces interact, which as I >>note in the linked thread(s) sucks. 
>
> It sure does, and that aspect should be easy to fix...(?)
>
>> So it's creating a lot of garbage during its cloning process that can
>> just be immediately thrown away? What is it doing? Using the object
>> store as a scratch pad for its own temporary state?
>
> Yeah, that's suspicious and I don't know why. I've worked on other
> importers and while those needed 'gc' to generate packs, they didn't
> generate garbage objects. After gc, the repo was "clean".

I tried to find this out in my reply-to-myself in
https://public-inbox.org/git/877eiqf2nk@evledraar.gmail.com/

But as noted I just looked at this briefly, and I haven't used git-svn
for years now, so I don't know and might be missing something.
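To illustrate the "machine-readable" idea from this message: gc.log has no such format today, so everything below is an assumption, including the tag string and all names. The sketch only shows how a tagged first line could let "gc --auto" tell "a recent run was pointless, wait" apart from a genuine failure that should be shown to the user.

```c
#include <string.h>

/* Entirely hypothetical: git does not write this tag to gc.log. */
#define BACKOFF_TAG "gc-auto-backoff:"

enum gc_log_kind {
    GC_LOG_EMPTY,    /* nothing recorded: no earlier problem */
    GC_LOG_BACKOFF,  /* tagged: last run was pointless, just wait */
    GC_LOG_ERROR     /* free-form text: treat as a real failure */
};

static enum gc_log_kind classify_gc_log(const char *contents)
{
    if (!contents || !*contents)
        return GC_LOG_EMPTY;
    if (!strncmp(contents, BACKOFF_TAG, strlen(BACKOFF_TAG)))
        return GC_LOG_BACKOFF;
    return GC_LOG_ERROR;
}
```

With something like this, only the GC_LOG_ERROR case would need to be repeated to the user; the back-off case could stay silent or print a one-line note.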
Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote: > On Wed, Oct 10 2018, Martin Langhoff wrote: > >> Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many >> loose objects" makes sense to me. >> >> - remove unactionable warning >> - as the warning is gone, no gc.log is produced >> - subsequent gc runs don't exit due to gc.log >> >> My very humble +1 on that. >> >> As for downsides... if we have truly tons of _recent_ loose objects, >> it'll ... take disk space? I'm fine with that. > > As Jeff's > https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/ > and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/ > note it's a bit more complex than that. > > I.e.: > > - The warning is actionable, you can decide to up your expiration >policy. > > - We use this warning as a proxy for "let's not run for a day", >otherwise we'll just grind on gc --auto trying to consolidate >possibly many hundreds of K of loose objects only to find none of >them can be pruned because the run into the expiry policy. With the >warning we retry that once per day, which sucks less. > > - This conflation of the user-visible warning and the policy is an >emergent effect of how the different gc pieces interact, which as I >note in the linked thread(s) sucks. > >But we can't just yank one piece away (as Jonathan's patch does) >without throwing the baby out with the bathwater. > >It will mean that e.g. if you have 10k loose objects in your git.git, >and created them just now, that every time you run anything that runs >"gc --auto" we'll fork to the background, peg a core at 100% CPU for >2-3 minutes or whatever it is, only do get nowhere and do the same >thing again in ~3 minutes when you run your next command. > > - I think you may be underestimating some of the cases where this ends >up taking a huge amount of disk space (and now we'll issue at least >*some*) warning. 
See my >https://public-inbox.org/git/87fu6bmr0j@evledraar.gmail.com/ >where a repo's .git went from 2.5G to 30G due to being stuck in this >mode. > >> For more aggressive gc options, thoughts: >> >> - Do we always consider git gc --prune=now "safe" in a "won't delete >> stuff the user is likely to want" sense? For example -- are the >> references from reflogs enough safety? > > The --prune=now command is not generally safe for the reasons noted in > the "NOTES" section in "git help gc". > >> - Even if we don't, for some commands it should be safe to run git gc >> --prune=now at the end of the process, for example an import that >> generates a new git repo (git svn clone). > > Yeah I don't see a problem with that, I didn't know about this > interesting use-case, i.e. that "git svn clone" will create a lot of > loose objects. > > As seen in my > https://public-inbox.org/git/87tvm3go42@evledraar.gmail.com/ I'm > working on making "gc --auto" run at the end of clone for unrelated > reasons, i.e. so we generate the commit-graph, seems like "git svn > clone" could do something similar. > > So it's creating a lot of garbage during its cloning process that can > just be immediately thrown away? What is it doing? Using the object > store as a scratch pad for its own temporary state? To answer my own question (which was based on a thinko) it's continually creating loose objects during import, i.e. packs are not involved (don't know why I thought that), so yeah, because all of those have <2wks expiry we end up warning as gc --auto is run. But I actually think the git-svn import is revealing an entirely different problem. I.e. when I clone I seem to be getting a refs/remotes/git-svn branch that's kept up-to-date, and when I "gc" everything's consolidated into a pack, we don't have any loose objects that are meant for expiry. 
But the reason git-svn is whining is because we're doing this in gc
(simplified for the sake of discussion):

    if (too_many_loose()) {
        expire();
        repack();
        if (too_many_loose())
            die("oh noes too many loose that don't match our expiry policy!");
    }

But they don't fall under our expiry policy at all, we're just assuming
that a crapload of loose objects haven't been added in the interim from
when we ran expire() + repack() until when we check too_many_loose()
again.

That's a logic error which we could just solve at some expense by seeing
*which* objects are loose and candidates for expiry at the beginning,
and not warning if at the end we have *different* loose objects that
should be consolidated; that just means we genuinely should run gc
again.

Or is this just wrong? I don't really know. If the above is true I'm
missing how tweaking gc.pruneExpire=5.minutes.ago is helping. Surely
we'd either just end up with the same set of loose objects (since the
clone is still running), or alternatively, if git-svn hadn't gotten
around to updating refs, create a corrupt repo.
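The fix floated in this message (remember which loose objects existed before gc, and only complain if those same objects are still around afterwards) might look roughly like the following. This is a sketch under stated assumptions: object names are plain strings, the snapshot arrays are supplied by the caller, and none of these functions exist in git.

```c
#include <string.h>

/*
 * Hypothetical sketch, not git's code. Counts how many entries of
 * `before` also appear in `after`. Quadratic, which is fine for an
 * illustration but not for a real object store.
 */
static size_t count_survivors(const char **before, size_t nr_before,
                              const char **after, size_t nr_after)
{
    size_t i, j, survivors = 0;

    for (i = 0; i < nr_before; i++) {
        for (j = 0; j < nr_after; j++) {
            if (!strcmp(before[i], after[j])) {
                survivors++;
                break;
            }
        }
    }
    return survivors;
}

/*
 * Warn only if the loose objects that were present when gc started
 * are still over the limit afterwards. Objects that appeared while
 * gc was running just mean another run is due.
 */
static int should_warn_after_gc(const char **before, size_t nr_before,
                                const char **after, size_t nr_after,
                                size_t limit)
{
    return count_survivors(before, nr_before, after, nr_after) > limit;
}
```

In the git-svn import case the "after" set would mostly be new objects written during the run, so the survivor count stays low and no warning is raised.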
Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10, 2018 at 7:27 AM Ævar Arnfjörð Bjarmason wrote:
> As Jeff's
> https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/
> and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/
> note it's a bit more complex than that.

Ok, my bad for not reading the whole thread :-) thanks for the kind
explanation.

>  - The warning is actionable, you can decide to up your expiration
>    policy.

A newbie-ish user shouldn't need to know git's internal store model
_and the nuances of its special cases_ to get through.

>  - We use this warning as a proxy for "let's not run for a day"

Oh, so _that's_ the trick with creating gc.log? I then understand the
idea of changing to exit 0.

But it's far from clear, and a clear _flag_, and not spitting again
the same warning, or differently-worded warning would be better.

"We won't try running gc, a recent run was deemed pointless until some
time passes. Nothing to worry about."

>  - This conflation of the user-visible warning and the policy is an
>    emergent effect of how the different gc pieces interact, which as I
>    note in the linked thread(s) sucks.

It sure does, and that aspect should be easy to fix...(?)

> So it's creating a lot of garbage during its cloning process that can
> just be immediately thrown away? What is it doing? Using the object
> store as a scratch pad for its own temporary state?

Yeah, that's suspicious and I don't know why. I've worked on other
importers and while those needed 'gc' to generate packs, they didn't
generate garbage objects. After gc, the repo was "clean".

cheers,

m
--
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted        ~  http://github.com/martin-langhoff
   by shiny stuff
Re: git svn clone/fetch hits issues with gc --auto
On Wed, Oct 10 2018, Martin Langhoff wrote: > Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many > loose objects" makes sense to me. > > - remove unactionable warning > - as the warning is gone, no gc.log is produced > - subsequent gc runs don't exit due to gc.log > > My very humble +1 on that. > > As for downsides... if we have truly tons of _recent_ loose objects, > it'll ... take disk space? I'm fine with that. As Jeff's https://public-inbox.org/git/20180716175103.gb18...@sigill.intra.peff.net/ and my https://public-inbox.org/git/878t69dgvx@evledraar.gmail.com/ note it's a bit more complex than that. I.e.: - The warning is actionable, you can decide to up your expiration policy. - We use this warning as a proxy for "let's not run for a day", otherwise we'll just grind on gc --auto trying to consolidate possibly many hundreds of K of loose objects only to find none of them can be pruned because the run into the expiry policy. With the warning we retry that once per day, which sucks less. - This conflation of the user-visible warning and the policy is an emergent effect of how the different gc pieces interact, which as I note in the linked thread(s) sucks. But we can't just yank one piece away (as Jonathan's patch does) without throwing the baby out with the bathwater. It will mean that e.g. if you have 10k loose objects in your git.git, and created them just now, that every time you run anything that runs "gc --auto" we'll fork to the background, peg a core at 100% CPU for 2-3 minutes or whatever it is, only do get nowhere and do the same thing again in ~3 minutes when you run your next command. - I think you may be underestimating some of the cases where this ends up taking a huge amount of disk space (and now we'll issue at least *some*) warning. See my https://public-inbox.org/git/87fu6bmr0j@evledraar.gmail.com/ where a repo's .git went from 2.5G to 30G due to being stuck in this mode. 
> For more aggressive gc options, thoughts: > > - Do we always consider git gc --prune=now "safe" in a "won't delete > stuff the user is likely to want" sense? For example -- are the > references from reflogs enough safety? The --prune=now command is not generally safe for the reasons noted in the "NOTES" section in "git help gc". > - Even if we don't, for some commands it should be safe to run git gc > --prune=now at the end of the process, for example an import that > generates a new git repo (git svn clone). Yeah I don't see a problem with that, I didn't know about this interesting use-case, i.e. that "git svn clone" will create a lot of loose objects. As seen in my https://public-inbox.org/git/87tvm3go42@evledraar.gmail.com/ I'm working on making "gc --auto" run at the end of clone for unrelated reasons, i.e. so we generate the commit-graph, seems like "git svn clone" could do something similar. So it's creating a lot of garbage during its cloning process that can just be immediately thrown away? What is it doing? Using the object store as a scratch pad for its own temporary state? > m > On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano wrote: >> >> Forwarding to Jonathan, as I think this is an interesting supporting >> vote for the topic that we were stuck on. >> >> Eric Wong writes: >> >> > Martin Langhoff wrote: >> >> Hi folks, >> >> >> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo >> >> I hit the gc error: >> >> >> >> warning: There are too many unreachable loose objects; run 'git prune' >> >> to remove them. >> >> gc --auto: command returned error: 255 >> > >> > GC can be annoying when that happens... 
For git-svn, perhaps >> > this can be appropriate to at least allow the import to continue: >> > >> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm >> > index 76b2965905..9b0caa3d47 100644 >> > --- a/perl/Git/SVN.pm >> > +++ b/perl/Git/SVN.pm >> > @@ -999,7 +999,7 @@ sub restore_commit_header_env { >> > } >> > >> > sub gc { >> > - command_noisy('gc', '--auto'); >> > + eval { command_noisy('gc', '--auto') }; >> > }; >> > >> > sub do_git_commit { >> > >> > >> > But yeah, somebody else who works on git regularly could >> > probably stop repack from writing thousands of loose >> > objects (and instead write a self-contained pack with >> > those objects, instead). I haven't followed git closely >> > lately, myself.
Re: git svn clone/fetch hits issues with gc --auto
Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many loose objects" makes sense to me. - remove unactionable warning - as the warning is gone, no gc.log is produced - subsequent gc runs don't exit due to gc.log My very humble +1 on that. As for downsides... if we have truly tons of _recent_ loose objects, it'll ... take disk space? I'm fine with that. For more aggressive gc options, thoughts: - Do we always consider git gc --prune=now "safe" in a "won't delete stuff the user is likely to want" sense? For example -- are the references from reflogs enough safety? - Even if we don't, for some commands it should be safe to run git gc --prune=now at the end of the process, for example an import that generates a new git repo (git svn clone). cheers, m On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano wrote: > > Forwarding to Jonathan, as I think this is an interesting supporting > vote for the topic that we were stuck on. > > Eric Wong writes: > > > Martin Langhoff wrote: > >> Hi folks, > >> > >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo > >> I hit the gc error: > >> > >> warning: There are too many unreachable loose objects; run 'git prune' > >> to remove them. > >> gc --auto: command returned error: 255 > > > > GC can be annoying when that happens... For git-svn, perhaps > > this can be appropriate to at least allow the import to continue: > > > > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm > > index 76b2965905..9b0caa3d47 100644 > > --- a/perl/Git/SVN.pm > > +++ b/perl/Git/SVN.pm > > @@ -999,7 +999,7 @@ sub restore_commit_header_env { > > } > > > > sub gc { > > - command_noisy('gc', '--auto'); > > + eval { command_noisy('gc', '--auto') }; > > }; > > > > sub do_git_commit { > > > > > > But yeah, somebody else who works on git regularly could > > probably stop repack from writing thousands of loose > > objects (and instead write a self-contained pack with > > those objects, instead). 
I haven't followed git closely > > lately, myself. -- martin.langh...@gmail.com - ask interesting questions ~ http://linkedin.com/in/martinlanghoff - don't be distracted~ http://github.com/martin-langhoff by shiny stuff
Re: git svn clone/fetch hits issues with gc --auto
On Tue, Oct 09 2018, Martin Langhoff wrote:

> Hi folks,
>
> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
> I hit the gc error:
>
> warning: There are too many unreachable loose objects; run 'git prune'
> to remove them.
> gc --auto: command returned error: 255
>
> I don't seem to be the only one --
> https://stackoverflow.com/questions/35738680/avoiding-warning-there-are-too-many-unreachable-loose-objects-during-git-svn
>
> Looking at code history, it dropped the ability to pass options to git
> repack when it was converted to using git gc.
>
> Experimentally I find that tweaking it to run git gc --auto
> --prune=5.minutes.ago works well, while --prune=now breaks it.
> Attempts to commit fail with missing objects.
>
> - Why does --prune=now break it? Perhaps "gc" runs in the background,
> and races with the commit being prepared?
>
> - Would it be safe, sane to apply --prune=some.value on _clone_?
>
> - During _fetch_, --prune=some.value seems risky. In a checkout being
> actively used for development or merging it'd risk pruning objects
> users expect to be there for recovery. Would there be a safe, sane
> way?
>
> - Is there a safer, saner value than 5 minutes?

What you've found is the least sucky way to work around this right now,
but see my
https://public-inbox.org/git/87inc89j38@evledraar.gmail.com/ and
https://public-inbox.org/git/87d0vmck55@evledraar.gmail.com/ for some
prior (and recent) discussion of this problem on-list.

FWIW this has nothing to do with git-svn per se, and also e.g. happens
to me when I do a 'git fetch --all' sometimes on git.git.
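On the --prune=5.minutes.ago versus --prune=now question quoted above, one plausible reading is that the grace period protects objects an in-progress import has written but not yet referenced. A minimal sketch of an mtime-based cutoff, assuming pruning is decided purely by comparing a loose object's mtime against now minus a grace period; the helper is hypothetical and ignores reachability.

```c
#include <time.h>

/*
 * Hypothetical helper, not git's code. An unreachable loose object
 * is a prune candidate once its mtime is at or before the cutoff.
 * grace_secs == 0 models --prune=now: everything qualifies.
 */
static int is_prune_candidate(time_t obj_mtime, time_t now, long grace_secs)
{
    return obj_mtime <= now - grace_secs;
}
```

Under this model an object written ten seconds ago is safe with a five-minute grace period but is removable with --prune=now, which would be consistent with the "missing objects" failures reported for --prune=now.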
Re: git svn clone/fetch hits issues with gc --auto
Forwarding to Jonathan, as I think this is an interesting supporting vote for the topic that we were stuck on. Eric Wong writes: > Martin Langhoff wrote: >> Hi folks, >> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo >> I hit the gc error: >> >> warning: There are too many unreachable loose objects; run 'git prune' >> to remove them. >> gc --auto: command returned error: 255 > > GC can be annoying when that happens... For git-svn, perhaps > this can be appropriate to at least allow the import to continue: > > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm > index 76b2965905..9b0caa3d47 100644 > --- a/perl/Git/SVN.pm > +++ b/perl/Git/SVN.pm > @@ -999,7 +999,7 @@ sub restore_commit_header_env { > } > > sub gc { > - command_noisy('gc', '--auto'); > + eval { command_noisy('gc', '--auto') }; > }; > > sub do_git_commit { > > > But yeah, somebody else who works on git regularly could > probably stop repack from writing thousands of loose > objects (and instead write a self-contained pack with > those objects, instead). I haven't followed git closely > lately, myself.
Re: git svn clone/fetch hits issues with gc --auto
Martin Langhoff wrote:
> Hi folks,
>
> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
> I hit the gc error:
>
> warning: There are too many unreachable loose objects; run 'git prune'
> to remove them.
> gc --auto: command returned error: 255

GC can be annoying when that happens... For git-svn, perhaps
this can be appropriate to at least allow the import to continue:

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 76b2965905..9b0caa3d47 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -999,7 +999,7 @@ sub restore_commit_header_env {
 }

 sub gc {
-	command_noisy('gc', '--auto');
+	eval { command_noisy('gc', '--auto') };
 };

 sub do_git_commit {

But yeah, somebody else who works on git regularly could
probably stop repack from writing thousands of loose
objects (and instead write a self-contained pack with
those objects, instead). I haven't followed git closely
lately, myself.
git svn clone/fetch hits issues with gc --auto
Hi folks,

Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
I hit the gc error:

warning: There are too many unreachable loose objects; run 'git prune'
to remove them.
gc --auto: command returned error: 255

I don't seem to be the only one --
https://stackoverflow.com/questions/35738680/avoiding-warning-there-are-too-many-unreachable-loose-objects-during-git-svn

Looking at code history, it dropped the ability to pass options to git
repack when it was converted to using git gc.

Experimentally I find that tweaking it to run git gc --auto
--prune=5.minutes.ago works well, while --prune=now breaks it.
Attempts to commit fail with missing objects.

- Why does --prune=now break it? Perhaps "gc" runs in the background,
and races with the commit being prepared?

- Would it be safe, sane to apply --prune=some.value on _clone_?

- During _fetch_, --prune=some.value seems risky. In a checkout being
actively used for development or merging it'd risk pruning objects
users expect to be there for recovery. Would there be a safe, sane
way?

- Is there a safer, saner value than 5 minutes?

cheers,

m
--
 martin.langh...@gmail.com
 - ask interesting questions  ~  http://linkedin.com/in/martinlanghoff
 - don't be distracted        ~  http://github.com/martin-langhoff
   by shiny stuff