Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, 2017-02-08 at 09:44 -0800, Junio C Hamano wrote:
> Duy Nguyen writes:
>
> > On second thought, perhaps gc.autoDetach should default to false if
> > there's no tty, since its main point is to stop breaking interactive
> > usage. That would make the server side happy (no tty there).
>
> Sounds like an idea, but wouldn't that keep the end-user coming over
> the network waiting after accepting a push until the GC completes, I
> wonder. If an impatient user disconnects, would that end up killing
> an ongoing GC? etc.

Regardless, it's impolite to keep the user waiting. So, I think we
should just not write the "too many unreachable loose objects" message
if auto-gc is on. Does that sound OK?
Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, Feb 08, 2017 at 04:18:25PM -0800, Junio C Hamano wrote:

> > We wrote something similar at GitHub, too, but we never ended up using
> > it in production. We found that with a sane scheduler, it's not too big
> > a deal to just do maintenance once in a while.
>
> Thanks again for this. I've also been wondering about how effective
> a "concatenate packs without paying reachability penalty" would be.

For the sake of posterity, I'll include our patch at the end (sorry,
not chunked into nice readable commits; that never existed in the
first place).

> > I'm still not sure if it's worth making the fatal/non-fatal distinction.
> > Doing so is perhaps safer, but it does mean that somebody has to decide
> > which errors are important enough to block a retry totally, and which
> > are not. In theory, it would be safe to always _try_ and then the gc
> > process can decide when something is broken and abort. And all you've
> > wasted is some processing power each day.
>
> Yup, and somebody or something needs to monitor so that repeated
> failures can be dealt with.

Yes. I think that part is probably outside the scope of Git. But if
auto-gc leaves gc.log lying around, it would be easy to visit each
repo and collect the various failures.

-- >8 --
This is the "pack-fast" patch, for reference. It applies on v2.6.5,
though I had to do some wiggling due to a few of our other custom
patches, so it's possible I introduced new bugs. It compiles, but I
didn't actually re-test the result. I _think_ the original at least
generated valid packs in all cases.

So I would certainly not recommend anybody run this. It's just a
possible base to work off of if anybody's interested in the topic. I
haven't looked at David's combine-packs at all to see if it is any
less gross.
:)

---
 Makefile            |   1 +
 builtin.h           |   1 +
 builtin/pack-fast.c | 618 +++
 cache.h             |   5 +
 git.c               |   1 +
 pack-bitmap-write.c | 167 +-
 pack-bitmap.c       |   2 +-
 pack-bitmap.h       |   8 +
 sha1_file.c         |   4 +-
 9 files changed, 792 insertions(+), 15 deletions(-)

diff --git a/Makefile b/Makefile
index 37e2d9e18..524b185ec 100644
--- a/Makefile
+++ b/Makefile
@@ -887,6 +887,7 @@ BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
 BUILTIN_OBJS += builtin/pack-objects.o
+BUILTIN_OBJS += builtin/pack-fast.o
 BUILTIN_OBJS += builtin/pack-redundant.o
 BUILTIN_OBJS += builtin/pack-refs.o
 BUILTIN_OBJS += builtin/patch-id.o
diff --git a/builtin.h b/builtin.h
index 79aaf0afe..df4e4d668 100644
--- a/builtin.h
+++ b/builtin.h
@@ -95,6 +95,7 @@ extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_objects(int argc, const char **argv, const char *prefix);
+extern int cmd_pack_fast(int argc, const char **argv, const char *prefix);
 extern int cmd_pack_redundant(int argc, const char **argv, const char *prefix);
 extern int cmd_patch_id(int argc, const char **argv, const char *prefix);
 extern int cmd_prune(int argc, const char **argv, const char *prefix);
diff --git a/builtin/pack-fast.c b/builtin/pack-fast.c
new file mode 100644
index 0..ad9f5e5f1
--- /dev/null
+++ b/builtin/pack-fast.c
@@ -0,0 +1,618 @@
+#include "builtin.h"
+#include "cache.h"
+#include "pack.h"
+#include "progress.h"
+#include "csum-file.h"
+#include "sha1-lookup.h"
+#include "parse-options.h"
+#include "tempfile.h"
+#include "pack-bitmap.h"
+#include "pack-revindex.h"
+
+static const char *pack_usage[] = {
+	N_("git pack-fast --quiet [options...] [base-name]"),
+	NULL
+};
+
+struct packwriter {
+	struct tempfile *tmp;
+	off_t total;
+	int fd;
+	uint32_t crc32;
+	unsigned do_crc;
+};
+
+static void packwriter_crc32_start(struct packwriter *w)
+{
+	w->crc32 = crc32(0, NULL, 0);
+	w->do_crc = 1;
+}
+
+static uint32_t packwriter_crc32_end(struct packwriter *w)
+{
+	w->do_crc = 0;
+	return w->crc32;
+}
+
+static void packwriter_write(struct packwriter *w, const void *buf, unsigned int count)
+{
+	if (w->do_crc)
+		w->crc32 = crc32(w->crc32, buf, count);
+	write_or_die(w->fd, buf, count);
+	w->total += count;
+}
+
+static off_t packwriter_total(struct packwriter *w)
+{
+	return w->total;
+}
+
+static void packwriter_init(struct packwriter *w)
+{
+	char tmpname[PATH_MAX];
+
+	w->fd = odb_mkstemp(tmpname, sizeof(tmpname), "pack/tmp_pack_XX");
+	w->total = 0;
+	w->do_crc = 0;
+	w->tmp = xcalloc(1, sizeof(*w->tmp));
+
+	register_tempfile(w->tmp, tmpname);
+}
+
+static int progress = 1;
+static struct progress *progress_state;
+static struct pack_idx_option pack_idx_opts;
+static const char *base_name = "pack-fast";
+static int
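Peff's remark earlier in this message, that it would be easy to visit each repo and collect the various gc.log failures, could be sketched like this. The repository layout and log contents below are fabricated for illustration; on a real server you would point `base` at your repository root.

```shell
# Sketch: walk a tree of bare repositories and report any leftover gc.log.
base=$(mktemp -d)
mkdir -p "$base/a.git" "$base/b.git"
echo "warning: There are too many unreachable loose objects" > "$base/a.git/gc.log"

for repo in "$base"/*.git; do
    if [ -f "$repo/gc.log" ]; then
        printf '%s: %s\n' "$repo" "$(cat "$repo/gc.log")"
    fi
done
```

In practice the output would feed into whatever monitoring tracks repeated failures, rather than being read by hand.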
Re: "disabling bitmap writing, as some objects are not being packed"?
Jeff King writes:

> In my experience, auto-gc has never been a low-maintenance operation on
> the server side (and I do think it was primarily designed with clients
> in mind).

I do not think auto-gc was ever tweaked to help server usage in its
history, since it was invented strictly to help end-users (mostly new
ones).

> At GitHub we disable it entirely, and do our own gc based on a throttled
> job queue ...
> I wish regular Git were more turn-key in that respect. Maybe it is for
> smaller sites, but we certainly didn't find it so. And I don't know that
> it's feasible to really share the solution. It's entangled with our
> database (to store last-pushed and last-maintenance values for repos)
> and our job scheduler.

Thanks for sharing the insights from the trenches ;-)

> Yeah, I'm certainly open to improving Git's defaults. If it's not clear
> from the above, I mostly just gave up for a site the size of GitHub. :)
>
> > Idea 1: when gc --auto would issue this message, instead it could create
> > a file named gc.too-much-garbage (instead of gc.log), with this message.
> > If that file exists, and it is less than one day (?) old, then we don't
> > attempt to do a full gc; instead we just run git repack -A -d. (If it's
> > more than one day old, we just delete it and continue anyway).
>
> I kind of wonder if this should apply to _any_ error. I.e., just check
> the mtime of gc.log and forcibly remove it when it's older than a day.
> You never want to get into a state that will fail to resolve itself
> eventually. That might still happen (e.g., corrupt repo), but at the
> very least it won't be because Git is too dumb to try again. ;-)
>
> > Idea 2: Like idea 1, but instead of repacking, just smash the existing
> > packs together into one big pack. In other words, don't consider
> > dangling objects, or recompute deltas. Twitter has a tool called "git
> > combine-pack" that does this:
> > https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c
>
> We wrote something similar at GitHub, too, but we never ended up using
> it in production. We found that with a sane scheduler, it's not too big
> a deal to just do maintenance once in a while.

Thanks again for this. I've also been wondering about how effective
a "concatenate packs without paying reachability penalty" would be.

> I'm still not sure if it's worth making the fatal/non-fatal distinction.
> Doing so is perhaps safer, but it does mean that somebody has to decide
> which errors are important enough to block a retry totally, and which
> are not. In theory, it would be safe to always _try_ and then the gc
> process can decide when something is broken and abort. And all you've
> wasted is some processing power each day.

Yup, and somebody or something needs to monitor so that repeated
failures can be dealt with.
Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, Feb 08, 2017 at 05:14:03PM -0500, David Turner wrote:

> > I wonder if you'd want to either bump the auto-gc object limit, or
> > possibly reduce the gc.pruneExpire limit to keep this situation from
> > coming up in the first place (or at least mitigating the amount of time
> > it's the case).
>
> Auto-gc might not succeed in pruning objects, but it will at least
> reduce the number of packs, which is super-important for performance.

Right, I mean to bump the loose-object limit but keep the
gc.autoPackLimit at 50. If you couple that with setting
transfer.unpackLimit, then each push creates a single pack, and you
repack after 50 pushes. You don't have to care about loose objects,
because you know you only get them when a "gc" ejects loose objects
(so they're not as efficient, but nothing actually accesses them;
they just hang around until their mtime grace period is up).

> I think the intent of automatic gc is to have a git repository be
> relatively low-maintenance from a server-operator perspective. (Side
> note: it's fairly trivial for a user with push access to mess with the
> check simply by pushing a bunch of objects whose shas start with 17.)
> It seems odd that git gets itself into a state where it refuses to do
> any maintenance just because at some point some piece of the maintenance
> didn't make progress.

In my experience, auto-gc has never been a low-maintenance operation
on the server side (and I do think it was primarily designed with
clients in mind).

At GitHub we disable it entirely, and do our own gc based on a
throttled job queue (one reason to throttle is that repacking is
memory and I/O intensive, so you really don't want a bunch of repacks
kicking off all at once). So basically any repo that gets pushed to
goes on the queue, and then we pick the worst cases from the queue
based on how badly they need packing[1].

I wish regular Git were more turn-key in that respect. Maybe it is
for smaller sites, but we certainly didn't find it so. And I don't
know that it's feasible to really share the solution. It's entangled
with our database (to store last-pushed and last-maintenance values
for repos) and our job scheduler.

[1] The "how bad" thing is a heuristic, and we found it's generally
proportional to the number of bytes stored in objects _outside_ of
the big "main" pack. So 2 big pushes may need maintenance more than
10 tiny pushes, because they have more objects (and our goal with
maintenance isn't just saving disk space or avoiding the linear pack
search, but having up-to-date bitmaps and good on-disk deltas to make
serving fetches as cheap as possible).

> Sure, I could change my configuration, but that doesn't help the other
> folks (e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813084 )
> who run into this.

Yeah, I'm certainly open to improving Git's defaults. If it's not
clear from the above, I mostly just gave up for a site the size of
GitHub. :)

> Idea 1: when gc --auto would issue this message, instead it could create
> a file named gc.too-much-garbage (instead of gc.log), with this message.
> If that file exists, and it is less than one day (?) old, then we don't
> attempt to do a full gc; instead we just run git repack -A -d. (If it's
> more than one day old, we just delete it and continue anyway).

I kind of wonder if this should apply to _any_ error. I.e., just
check the mtime of gc.log and forcibly remove it when it's older than
a day. You never want to get into a state that will fail to resolve
itself eventually. That might still happen (e.g., corrupt repo), but
at the very least it won't be because Git is too dumb to try again.

> Idea 2: Like idea 1, but instead of repacking, just smash the existing
> packs together into one big pack. In other words, don't consider
> dangling objects, or recompute deltas. Twitter has a tool called "git
> combine-pack" that does this:
> https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c

We wrote something similar at GitHub, too, but we never ended up
using it in production. We found that with a sane scheduler, it's not
too big a deal to just do maintenance once in a while.

Also, our original problem was that repos which have gotten out of
hand (say, 5000 packs) repacked _very_ slowly with a normal repack.
So a "fast pack" followed by a real pack was a viable way out of
that. In the end, I just made pack-objects handle this case better,
and we don't need the fast-pack.

> That's less space-efficient than a true repack, but it's no worse than
> having the packs separate, and it's a win for read performance because
> there's no need to do a linear search over N packs to find an object.

Over the long term you may end up with worse packs, because the true
repack will drop some delta opportunities between objects in the same
pack (reasoning that they weren't made into deltas last time, so it's
not worth trying again). You'd probably need to use "-f"
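If I read Peff's suggestion at the top of this message right, it corresponds to configuration along these lines. Only gc.autoPackLimit=50 and the use of transfer.unpackLimit come from the message; the exact unpackLimit and gc.auto values here are illustrative.

```shell
# One pack per push; all-into-one repack once ~50 packs accumulate;
# loose-object counts stop mattering for triggering gc.
git config transfer.unpackLimit 1   # never explode incoming pushes into loose objects
git config gc.autoPackLimit 50      # repack -A -d after ~50 pushes
git config gc.auto 100000           # illustrative: raise the loose-object threshold
```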
Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, 2017-02-08 at 14:08 -0500, Jeff King wrote:
> On Wed, Feb 08, 2017 at 02:05:42PM -0500, David Turner wrote:
>
> > On Wed, 2017-02-08 at 09:44 -0800, Junio C Hamano wrote:
> > > Duy Nguyen writes:
> > >
> > > > On second thought, perhaps gc.autoDetach should default to false if
> > > > there's no tty, since its main point is to stop breaking interactive
> > > > usage. That would make the server side happy (no tty there).
> > >
> > > Sounds like an idea, but wouldn't that keep the end-user coming over
> > > the network waiting after accepting a push until the GC completes, I
> > > wonder. If an impatient user disconnects, would that end up killing
> > > an ongoing GC? etc.
> >
> > Regardless, it's impolite to keep the user waiting. So, I think we
> > should just not write the "too many unreachable loose objects" message
> > if auto-gc is on. Does that sound OK?
>
> I thought the point of that message was to prevent auto-gc from kicking
> in over and over again due to objects that won't actually get pruned.
>
> I wonder if you'd want to either bump the auto-gc object limit, or
> possibly reduce the gc.pruneExpire limit to keep this situation from
> coming up in the first place (or at least mitigating the amount of time
> it's the case).

Auto-gc might not succeed in pruning objects, but it will at least
reduce the number of packs, which is super-important for performance.

I think the intent of automatic gc is to have a git repository be
relatively low-maintenance from a server-operator perspective. (Side
note: it's fairly trivial for a user with push access to mess with
the check simply by pushing a bunch of objects whose shas start with
17.) It seems odd that git gets itself into a state where it refuses
to do any maintenance just because at some point some piece of the
maintenance didn't make progress.

Sure, I could change my configuration, but that doesn't help the
other folks (e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=813084 )
who run into this.

I have three thoughts on this:

Idea 1: when gc --auto would issue this message, instead it could
create a file named gc.too-much-garbage (instead of gc.log), with
this message. If that file exists, and it is less than one day (?)
old, then we don't attempt to do a full gc; instead we just run git
repack -A -d. (If it's more than one day old, we just delete it and
continue anyway.)

Idea 2: Like idea 1, but instead of repacking, just smash the
existing packs together into one big pack. In other words, don't
consider dangling objects, or recompute deltas. Twitter has a tool
called "git combine-pack" that does this:
https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c

That's less space-efficient than a true repack, but it's no worse
than having the packs separate, and it's a win for read performance
because there's no need to do a linear search over N packs to find an
object.

Idea 3: As I suggested last time, separate fatal and non-fatal
errors. If gc fails because of EIO or something, we probably don't
want to touch the disk anymore. But here, the worst consequence is
that we waste some processing power. And it's better to occasionally
waste processing power in a non-interactive setting than it is to do
so when a user will be blocked on it. So non-fatal warnings should go
to gc.log, and fatal errors should go to gc.fatal; gc.log won't block
gc from running. I think this is my preferred option.
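The one-day mtime check in idea 1 could be sketched in shell roughly as follows. Note that gc.too-much-garbage is only a proposed file name, not something git writes today, and `stat -c %Y` is the GNU form (`stat -f %m` is the BSD fallback).

```shell
# Sketch of idea 1: skip the full gc while the marker is fresh,
# clear the marker (and proceed) once it is older than a day.
marker=$(mktemp)    # stands in for .git/gc.too-much-garbage
one_day=86400

now=$(date +%s)
mtime=$(stat -c %Y "$marker" 2>/dev/null || stat -f %m "$marker")
age=$(( now - mtime ))

if [ "$age" -lt "$one_day" ]; then
    echo "marker is fresh: run 'git repack -A -d' instead of a full gc"
else
    rm -f "$marker"
    echo "marker is stale: removed it, proceed with a full gc"
fi
```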
Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, Feb 08, 2017 at 02:05:42PM -0500, David Turner wrote:

> On Wed, 2017-02-08 at 09:44 -0800, Junio C Hamano wrote:
> > Duy Nguyen writes:
> >
> > > On second thought, perhaps gc.autoDetach should default to false if
> > > there's no tty, since its main point is to stop breaking interactive
> > > usage. That would make the server side happy (no tty there).
> >
> > Sounds like an idea, but wouldn't that keep the end-user coming over
> > the network waiting after accepting a push until the GC completes, I
> > wonder. If an impatient user disconnects, would that end up killing
> > an ongoing GC? etc.
>
> Regardless, it's impolite to keep the user waiting. So, I think we
> should just not write the "too many unreachable loose objects" message
> if auto-gc is on. Does that sound OK?

I thought the point of that message was to prevent auto-gc from
kicking in over and over again due to objects that won't actually get
pruned.

I wonder if you'd want to either bump the auto-gc object limit, or
possibly reduce the gc.pruneExpire limit to keep this situation from
coming up in the first place (or at least mitigating the amount of
time it's the case).

-Peff
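The two knobs Peff mentions could be adjusted like so; the values here are illustrative, not recommendations. For reference, gc.auto defaults to 6700 loose objects and gc.pruneExpire to 2.weeks.ago.

```shell
git config gc.auto 20000              # bump the auto-gc loose-object limit
git config gc.pruneExpire 3.days.ago  # prune unreachable objects sooner
```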
Re: "disabling bitmap writing, as some objects are not being packed"?
Duy Nguyen writes:

> On second thought, perhaps gc.autoDetach should default to false if
> there's no tty, since its main point is to stop breaking interactive
> usage. That would make the server side happy (no tty there).

Sounds like an idea, but wouldn't that keep the end-user coming over
the network waiting after accepting a push until the GC completes, I
wonder. If an impatient user disconnects, would that end up killing
an ongoing GC? etc.
Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, Feb 8, 2017 at 3:24 PM, David Turner wrote:
> On Wed, 2017-02-08 at 13:45 +0700, Duy Nguyen wrote:
>> On Wed, Feb 8, 2017 at 8:03 AM, David Turner wrote:
>> > On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
>> >> And we can't grep for fatal errors anyway. The problem that led to
>> >> 329e6e8794 was this line
>> >>
>> >> warning: There are too many unreachable loose objects; run 'git
>> >> prune' to remove them.
>> >>
>> >> which is not fatal.
>> >
>> > So, speaking of that message, I noticed that our git servers were
>> > getting slow again and found that message in gc.log.
>> >
>> > I propose to make auto gc not write that message either. Any objections?
>>
>> Does that really help? auto gc would run more often, but unreachable
>> loose objects are still present and potentially make your servers
>> slow? Should these servers run periodic and explicit gc/prune?
>
> At least pack files wouldn't accumulate. This is the major cause of
> slowdown, since each pack file must be checked for each object.
>
> (And, also, maybe those unreachable loose objects are too new to get
> gc'd, but if we retry next week, we'll gc them.)

I was about to suggest a config option that lets you run auto gc
unconditionally, which, I think, is better than suppressing the
message. Then I found gc.autoDetach. If you set it to false globally,
I think you'll get the behavior you want.

On second thought, perhaps gc.autoDetach should default to false if
there's no tty, since its main point is to stop breaking interactive
usage. That would make the server side happy (no tty there).
--
Duy
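For anyone who wants to try Duy's interim workaround, it is a one-liner:

```shell
# Run auto gc in the foreground everywhere, so failures surface
# immediately instead of accumulating silently in gc.log.
git config --global gc.autoDetach false
```

The trade-off, as discussed above, is that a pushing user may then wait on the server's gc before their push returns.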
Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, 2017-02-08 at 13:45 +0700, Duy Nguyen wrote:
> On Wed, Feb 8, 2017 at 8:03 AM, David Turner wrote:
> > On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
> >> And we can't grep for fatal errors anyway. The problem that led to
> >> 329e6e8794 was this line
> >>
> >> warning: There are too many unreachable loose objects; run 'git
> >> prune' to remove them.
> >>
> >> which is not fatal.
> >
> > So, speaking of that message, I noticed that our git servers were
> > getting slow again and found that message in gc.log.
> >
> > I propose to make auto gc not write that message either. Any objections?
>
> Does that really help? auto gc would run more often, but unreachable
> loose objects are still present and potentially make your servers
> slow? Should these servers run periodic and explicit gc/prune?

At least pack files wouldn't accumulate. This is the major cause of
slowdown, since each pack file must be checked for each object.

(And, also, maybe those unreachable loose objects are too new to get
gc'd, but if we retry next week, we'll gc them.)
Re: "disabling bitmap writing, as some objects are not being packed"?
On Wed, Feb 8, 2017 at 8:03 AM, David Turner wrote:
> On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
>> And we can't grep for fatal errors anyway. The problem that led to
>> 329e6e8794 was this line
>>
>> warning: There are too many unreachable loose objects; run 'git
>> prune' to remove them.
>>
>> which is not fatal.
>
> So, speaking of that message, I noticed that our git servers were
> getting slow again and found that message in gc.log.
>
> I propose to make auto gc not write that message either. Any objections?

Does that really help? auto gc would run more often, but unreachable
loose objects are still present and potentially make your servers
slow? Should these servers run periodic and explicit gc/prune?
--
Duy
Re: "disabling bitmap writing, as some objects are not being packed"?
On Sat, 2016-12-17 at 14:50 +0700, Duy Nguyen wrote:
> And we can't grep for fatal errors anyway. The problem that led to
> 329e6e8794 was this line
>
> warning: There are too many unreachable loose objects; run 'git
> prune' to remove them.
>
> which is not fatal.

So, speaking of that message, I noticed that our git servers were
getting slow again and found that message in gc.log.

I propose to make auto gc not write that message either. Any objections?
Re: "disabling bitmap writing, as some objects are not being packed"?
On Sat, Dec 17, 2016 at 4:28 AM, Junio C Hamano wrote:
> David Turner writes:
>
>> I'm a bit confused by the message "disabling bitmap writing, as some
>> objects are not being packed". I see it in my gc.log file on my git
>> server.
>>
>> 1. Its presence in the gc.log file prevents future automatic garbage
>> collection. This seems bad. I understand the desire to avoid making
>> things worse if a past gc has run into issues. But this warning is
>> non-fatal; the only consequence is that many operations get slower. But
>> a lack of gc when there are too many packs causes that consequence too
>> (often a much worse slowdown than would be caused by the missing
>> bitmap).
>>
>> So I wonder if it would be better for auto gc to grep gc.log for fatal
>> errors (as opposed to warnings) and only skip running if any are found.
>> Alternately, we could simply put warnings into gc.log.warning and
>> reserve gc.log for fatal errors. I'm not sure which would be simpler.
>
> I am not sure if string matching is really a good idea, as I'd
> assume that these messages are eligible for i18n.

And we can't grep for fatal errors anyway. The problem that led to
329e6e8794 was this line

warning: There are too many unreachable loose objects; run 'git
prune' to remove them.

which is not fatal.

> 329e6e8794 ("gc: save log from daemonized gc --auto and print it
> next time", 2015-09-19) wanted to notice that auto-gc is not
> making progress and used the presence of error messages as a cue.
> In your case, I think the auto-gc _is_ making progress, reducing
> the number of loose objects in the repository or consolidating many
> packfiles into one

Yeah, the key point is making progress, and to reliably detect that
we need some way for all the commands that git-gc executes to tell it
about that, git-repack in this particular case but...

> and the message is only about the fact that
> packing is punting and not producing a bitmap as you asked, which
> is different from not making any progress. I do not think log vs
> warn is a good criterion to tell them apart, either.
>
> In any case, as the error message asks the user to do, the user
> eventually wants to correct the root cause before removing the
> gc.log; I am not sure report_last_gc_error() is the place to correct
> this in the first place.
>
>> 2. I don't understand what would cause that message. That is, what bad
>> thing am I doing that I should stop doing? I've briefly skimmed the
>> code and commit message, but the answer isn't leaping out at me.
>
> Enabling bitmap generation for incremental packing that does not
> cram everything into a single pack is triggering it, I would
> presume. Perhaps we should ignore the -b option in most cases
> and enable it only for the "repack -a -d -f" codepath? Or detect
> that we are being run from "gc --auto" and automatically disable -b?

... since we have to change down in git-repack for that, perhaps
doing this is better. We can pass --auto (or something) to repack to
tell it about this special caller, so it only prints something to
stderr in serious cases. Or we detect cases where background gc'ing
won't work well and always do it in foreground (e.g. when bitmap
generation is enabled).

> I have a feeling that an approach along that line is closer to the
> real solution than tweaking report_last_gc_error() and trying to
> deduce if we are making any progress.
--
Duy
Re: "disabling bitmap writing, as some objects are not being packed"?
On Fri, Dec 16, 2016 at 04:40:16PM -0500, David Turner wrote:

> I would assume, based on the documentation, that auto gc would be doing
> an all-into-one repack:
>
> "If the number of packs exceeds the value of gc.autopacklimit, then
> existing packs (except those marked with a .keep file) are
> consolidated into a single pack by using the -A option of git
> repack."
>
> I don't have any settings that limit the size of packs, either. And a
> manual git repack -a -d creates only a single pack. Its loneliness
> doesn't last long, because pretty soon a new pack is created by an
> incoming push.

The interesting code is in need_to_gc():

	/*
	 * If there are too many loose objects, but not too many
	 * packs, we run "repack -d -l".  If there are too many packs,
	 * we run "repack -A -d -l".  Otherwise we tell the caller
	 * there is no need.
	 */
	if (too_many_packs())
		add_repack_all_option();
	else if (!too_many_loose_objects())
		return 0;

So if you have (say) 10 packs and 10,000 objects, we'll incrementally
pack those objects into a single new pack.

I never noticed this myself because we do not use auto-gc at GitHub
at all. We only ever do a big all-into-one repack.

> Unless this just means that some objects are being kept loose (perhaps
> because they are unreferenced)?

If they're unreferenced, they won't be part of the new pack. You
might accumulate loose objects that are ejected from previous packs,
which could trigger auto-gc to do an incremental pack (even though it
wouldn't be productive, because they're unreferenced!). You may also
get them from pushes (small pushes will be exploded into loose
objects by default).

-Peff
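As background to the need_to_gc() snippet above: the loose-object side of the check samples a single fan-out directory (objects/17) and extrapolates, which is why David's earlier side note about pushing objects whose shas start with 17 works. A simulated sketch (the paths here are a mock object database, not a real repository):

```shell
# Mock up the relevant parts of an object database and reproduce the
# pack-count and sampled loose-object estimates that auto-gc relies on.
gitdir=$(mktemp -d)
mkdir -p "$gitdir/objects/pack" "$gitdir/objects/17"
touch "$gitdir/objects/pack/pack-1.pack" "$gitdir/objects/pack/pack-2.pack"
touch "$gitdir/objects/17/aa" "$gitdir/objects/17/bb" "$gitdir/objects/17/cc"

packs=$(ls "$gitdir/objects/pack/"*.pack | wc -l)
loose17=$(ls "$gitdir/objects/17" | wc -l)

# gc --auto extrapolates: roughly 256 * (objects seen in 17/) loose
# objects repo-wide, compared against gc.auto.
echo "packs=$packs estimated_loose=$((256 * loose17))"
```

With 3 sampled objects this estimates 768 loose objects repo-wide; compare that against gc.auto (default 6700) and the pack count against gc.autoPackLimit to see which repack variant would run.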
Re: "disabling bitmap writing, as some objects are not being packed"?
On Fri, 2016-12-16 at 16:32 -0500, Jeff King wrote:
> On Fri, Dec 16, 2016 at 01:28:00PM -0800, Junio C Hamano wrote:
>
> > > 2. I don't understand what would cause that message. That is, what bad
> > > thing am I doing that I should stop doing? I've briefly skimmed the
> > > code and commit message, but the answer isn't leaping out at me.
> >
> > Enabling bitmap generation for incremental packing that does not
> > cram everything into a single pack is triggering it, I would
> > presume. Perhaps we should ignore -b option in most of the cases
> > and enable it only for "repack -a -d -f" codepath? Or detect that
> > we are being run from "gc --auto" and automatically disable -b? I
> > have a feeling that an approach along that line is closer to the
> > real solution than tweaking report_last_gc_error() and trying to
> > deduce if we are making any progress.
>
> Ah, indeed. I was thinking in my other response that "git gc" would
> always kick off an all-into-one repack. But "gc --auto" will not in
> certain cases. And yes, in those cases you definitely would want
> --no-write-bitmap-index. I think it would be reasonable for "git repack"
> to disable bitmap-writing automatically when not doing an all-into-one
> repack.

I do not have alternates and am not using --local. Nor do I have
.keep packs.

I would assume, based on the documentation, that auto gc would be
doing an all-into-one repack:

"If the number of packs exceeds the value of gc.autopacklimit, then
existing packs (except those marked with a .keep file) are
consolidated into a single pack by using the -A option of git
repack."

I don't have any settings that limit the size of packs, either. And a
manual git repack -a -d creates only a single pack. Its loneliness
doesn't last long, because pretty soon a new pack is created by an
incoming push.

Unless this just means that some objects are being kept loose
(perhaps because they are unreferenced)?
Re: "disabling bitmap writing, as some objects are not being packed"?
On Fri, Dec 16, 2016 at 01:28:00PM -0800, Junio C Hamano wrote:

> > 2. I don't understand what would cause that message. That is, what bad
> > thing am I doing that I should stop doing? I've briefly skimmed the
> > code and commit message, but the answer isn't leaping out at me.
>
> Enabling bitmap generation for incremental packing that does not
> cram everything into a single pack is triggering it, I would
> presume. Perhaps we should ignore -b option in most of the cases
> and enable it only for "repack -a -d -f" codepath? Or detect that
> we are being run from "gc --auto" and automatically disable -b? I
> have a feeling that an approach along that line is closer to the
> real solution than tweaking report_last_gc_error() and trying to
> deduce if we are making any progress.

Ah, indeed. I was thinking in my other response that "git gc" would
always kick off an all-into-one repack. But "gc --auto" will not in
certain cases. And yes, in those cases you definitely would want
--no-write-bitmap-index. I think it would be reasonable for "git
repack" to disable bitmap-writing automatically when not doing an
all-into-one repack.

-Peff
Re: "disabling bitmap writing, as some objects are not being packed"?
On Fri, Dec 16, 2016 at 04:05:31PM -0500, David Turner wrote:

> 1. Its presence in the gc.log file prevents future automatic garbage
> collection. This seems bad. I understand the desire to avoid making
> things worse if a past gc has run into issues. But this warning is
> non-fatal; the only consequence is that many operations get slower. But
> a lack of gc when there are too many packs causes that consequence too
> (often a much worse slowdown than would be caused by the missing
> bitmap).
>
> So I wonder if it would be better for auto gc to grep gc.log for fatal
> errors (as opposed to warnings) and only skip running if any are found.
> Alternately, we could simply put warnings into gc.log.warning and
> reserve gc.log for fatal errors. I'm not sure which would be simpler.

Without thinking too hard on it, that seems like the appropriate
solution to me, too.

> 2. I don't understand what would cause that message. That is, what bad
> thing am I doing that I should stop doing? I've briefly skimmed the
> code and commit message, but the answer isn't leaping out at me.

Do you have alternates and are using --local? Do you have .keep packs
and have set repack.packKeptObjects to false? There are other ways
(e.g., an incremental repack), but I think those are the likely ones
to get via "git gc".

-Peff
Re: "disabling bitmap writing, as some objects are not being packed"?
David Turner writes:

> I'm a bit confused by the message "disabling bitmap writing, as some
> objects are not being packed". I see it in my gc.log file on my git
> server.
>
> 1. Its presence in the gc.log file prevents future automatic garbage
> collection. This seems bad. I understand the desire to avoid making
> things worse if a past gc has run into issues. But this warning is
> non-fatal; the only consequence is that many operations get slower. But
> a lack of gc when there are too many packs causes that consequence too
> (often a much worse slowdown than would be caused by the missing
> bitmap).
>
> So I wonder if it would be better for auto gc to grep gc.log for fatal
> errors (as opposed to warnings) and only skip running if any are found.
> Alternately, we could simply put warnings into gc.log.warning and
> reserve gc.log for fatal errors. I'm not sure which would be simpler.

I am not sure if string matching is really a good idea, as I'd
assume that these messages are eligible for i18n.

329e6e8794 ("gc: save log from daemonized gc --auto and print it
next time", 2015-09-19) wanted to notice that auto-gc is not
making progress and used the presence of error messages as a cue.
In your case, I think the auto-gc _is_ making progress, reducing
the number of loose objects in the repository or consolidating many
packfiles into one, and the message is only about the fact that
packing is punting and not producing a bitmap as you asked, which
is different from not making any progress. I do not think log vs
warn is a good criterion to tell them apart, either.

In any case, as the error message asks the user to do, the user
eventually wants to correct the root cause before removing the
gc.log; I am not sure report_last_gc_error() is the place to correct
this in the first place.

> 2. I don't understand what would cause that message. That is, what bad
> thing am I doing that I should stop doing? I've briefly skimmed the
> code and commit message, but the answer isn't leaping out at me.

Enabling bitmap generation for incremental packing that does not
cram everything into a single pack is triggering it, I would
presume. Perhaps we should ignore the -b option in most cases
and enable it only for the "repack -a -d -f" codepath? Or detect
that we are being run from "gc --auto" and automatically disable -b?
I have a feeling that an approach along that line is closer to the
real solution than tweaking report_last_gc_error() and trying to
deduce if we are making any progress.