Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Nov 04, 2015 at 02:08:21PM -0600, Doug Kelly wrote: > On Wed, Nov 4, 2015 at 2:02 PM, Jeff Kingwrote: > > Definitely cleaning up the .bitmap is sane and not racy (it's in the > > same boat as the .idx, I think). > > > > .keep files are more tricky. I'd have to go over the receive-pack code > > to confirm, but I think they _are_ racy. That is, receive-pack will > > create them as a lockfile before moving the pack into place. That's OK, > > though, if we use mtimes to give ourselves a grace period (I haven't > > looked at your series yet). > > > > But moreover, .keep files can be created manually by the user. If the > > pack they referenced goes away, they are not really serving any purpose. > > But it's possible that the user would want to salvage the content of the > > file, or know that it was there. > > > > So I'd argue we should leave them. Or at least leave ones that do not > > have the generic "{receive,fetch}-pack $pid on $host comment in them, > > which were clearly created as lockfiles. > > Currently there's no mtime-guarding logic (I dug up that conversation > earlier, though, but after I'd done the respin on this series)... OK, > in that case, I'll create a separate patch that tests/cleans up > .bitmap, but doesn't touch .keep. This might be a small series since > I think the logic for finding pack garbage doesn't know anything about > .bitmap per-se, so it's looking like I'll extend that relevant code, > before adding the handling in gc and appropriate tests. I happened to be looking over your series again, and I noticed that we didn't end up with any mtime logic at all in what got merged. I _think_ that is probably OK, because we always write the pack, followed by the .idx, followed by the .bitmap (if any). And we don't drop .keep files (though I think we would perhaps note them as possible cruft?). So I don't think there are any races introduced here, but I wonder if we want to be a bit more conservative. Sorry to bring this up so much after the fact; I completely forgot about it when reviewing the patches. These changes are slated for the v2.7 release. Like I said, I don't think it's buggy, so we don't necessarily need to address it before the release. We could add an mtime check in the next cycle as a belt-and-suspenders safety, rather than a fix. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Nov 4, 2015 at 1:35 PM, Junio C Hamanowrote: > Doug Kelly writes: > >> I think the patches I sent (a bit prematurely) address the >> remaining comments... I did find there was a relevant test in >> t5304 already, so I added a new test in the same section (and >> cleaned up some of the garbage it wasn't removing before). I'm >> not sure if it's poor form to move tests around like this, but I >> figured it might be best to keep them logically grouped. > > OK, will queue as I didn't spot anything glaringly wrong ;-) > > I did wonder if we want to say anything about .bitmap files, though. > If there is one without matching .idx and .pack, shouldn't we report > just like we report .idx without .pack (or vice versa)? > > Thanks. I think you're right -- this would be something worth following up on. At least, t5304 doesn't cover this case explicitly, but when I tried adding an empty bitmap with a bogus name, I did see a "no corresponding .idx or .pack" error, similar to the stale .keep file. I'd trust your (and Jeff's) knowledge on this far more than my own, but would it be a bad idea to clean up .keep and .bitmap files if the .idx/.pack pair are missing? I think we may have had a discussion previously on how things along these lines might be racey -- but I don't know what order the .keep file is created in relation to the .idx/.pack. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Nov 04, 2015 at 11:35:52AM -0800, Junio C Hamano wrote: > Doug Kellywrites: > > > I think the patches I sent (a bit prematurely) address the > > remaining comments... I did find there was a relevant test in > > t5304 already, so I added a new test in the same section (and > > cleaned up some of the garbage it wasn't removing before). I'm > > not sure if it's poor form to move tests around like this, but I > > figured it might be best to keep them logically grouped. > > OK, will queue as I didn't spot anything glaringly wrong ;-) > > I did wonder if we want to say anything about .bitmap files, though. > If there is one without matching .idx and .pack, shouldn't we report > just like we report .idx without .pack (or vice versa)? Yeah, I think so. The logic should really extend to anything without a matching .pack. And I think the sane rule is probably: If we have pack-$sha.$ext, but not pack-$sha.pack, then: 1. if $ext is known to us as a cache that can be regenerated from the .pack (i.e., .idx, .bitmap), then delete it 2. if $ext is known to us as precious, do nothing (there is nothing in this category right now, though) 3. if $ext is not known to us, warn but do not delete (in case a future version adds something precious) The conservatism in (3) is the right thing to do, I think, but I doubt it will ever matter, because we probably cannot ever add non-cache auxiliary files to the pack. Old versions of git would not delete such precious files, but nor would they carry them forward during a repack. So short of a repo-version bump, I think we are effectively limited to adding only caches which can be re-generated from an original .pack. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Nov 04, 2015 at 01:56:38PM -0600, Doug Kelly wrote: > > I did wonder if we want to say anything about .bitmap files, though. > > If there is one without matching .idx and .pack, shouldn't we report > > just like we report .idx without .pack (or vice versa)? > > I think you're right -- this would be something worth following up on. > At least, t5304 doesn't cover this case explicitly, but when I tried > adding an empty bitmap with a bogus name, I did see a "no > corresponding .idx or .pack" error, similar to the stale .keep file. Yeah, that should be harmless warning (although note because the bitmap code only really handles a single bitmap, it can prevent loading of the "real" bitmap; so you'd want to clean it up, for sure). > I'd trust your (and Jeff's) knowledge on this far more than my own, > but would it be a bad idea to clean up .keep and .bitmap files if the > .idx/.pack pair are missing? I think we may have had a discussion > previously on how things along these lines might be racey -- but I > don't know what order the .keep file is created in relation to the > .idx/.pack. Definitely cleaning up the .bitmap is sane and not racy (it's in the same boat as the .idx, I think). .keep files are more tricky. I'd have to go over the receive-pack code to confirm, but I think they _are_ racy. That is, receive-pack will create them as a lockfile before moving the pack into place. That's OK, though, if we use mtimes to give ourselves a grace period (I haven't looked at your series yet). But moreover, .keep files can be created manually by the user. If the pack they referenced goes away, they are not really serving any purpose. But it's possible that the user would want to salvage the content of the file, or know that it was there. So I'd argue we should leave them. Or at least leave ones that do not have the generic "{receive,fetch}-pack $pid on $host comment in them, which were clearly created as lockfiles. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Nov 4, 2015 at 2:02 PM, Jeff Kingwrote: > On Wed, Nov 04, 2015 at 01:56:38PM -0600, Doug Kelly wrote: > >> > I did wonder if we want to say anything about .bitmap files, though. >> > If there is one without matching .idx and .pack, shouldn't we report >> > just like we report .idx without .pack (or vice versa)? >> >> I think you're right -- this would be something worth following up on. >> At least, t5304 doesn't cover this case explicitly, but when I tried >> adding an empty bitmap with a bogus name, I did see a "no >> corresponding .idx or .pack" error, similar to the stale .keep file. > > Yeah, that should be harmless warning (although note because the bitmap > code only really handles a single bitmap, it can prevent loading of the > "real" bitmap; so you'd want to clean it up, for sure). > >> I'd trust your (and Jeff's) knowledge on this far more than my own, >> but would it be a bad idea to clean up .keep and .bitmap files if the >> .idx/.pack pair are missing? I think we may have had a discussion >> previously on how things along these lines might be racey -- but I >> don't know what order the .keep file is created in relation to the >> .idx/.pack. > > Definitely cleaning up the .bitmap is sane and not racy (it's in the > same boat as the .idx, I think). > > .keep files are more tricky. I'd have to go over the receive-pack code > to confirm, but I think they _are_ racy. That is, receive-pack will > create them as a lockfile before moving the pack into place. That's OK, > though, if we use mtimes to give ourselves a grace period (I haven't > looked at your series yet). > > But moreover, .keep files can be created manually by the user. If the > pack they referenced goes away, they are not really serving any purpose. > But it's possible that the user would want to salvage the content of the > file, or know that it was there. > > So I'd argue we should leave them. Or at least leave ones that do not > have the generic "{receive,fetch}-pack $pid on $host comment in them, > which were clearly created as lockfiles. > > -Peff Currently there's no mtime-guarding logic (I dug up that conversation earlier, though, but after I'd done the respin on this series)... OK, in that case, I'll create a separate patch that tests/cleans up .bitmap, but doesn't touch .keep. This might be a small series since I think the logic for finding pack garbage doesn't know anything about .bitmap per-se, so it's looking like I'll extend that relevant code, before adding the handling in gc and appropriate tests. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Nov 04, 2015 at 02:08:21PM -0600, Doug Kelly wrote: > Currently there's no mtime-guarding logic (I dug up that conversation > earlier, though, but after I'd done the respin on this series)... OK, > in that case, I'll create a separate patch that tests/cleans up > .bitmap, but doesn't touch .keep. This might be a small series since > I think the logic for finding pack garbage doesn't know anything about > .bitmap per-se, so it's looking like I'll extend that relevant code, > before adding the handling in gc and appropriate tests. I'd hoped you could reuse the list of extensions found in builtin/repack.c (e.g., see remove_redundant_pack). But I guess that is not connected with the garbage-reporting code. And anyway, the simple list probably does not carry sufficient information (it does not know that ".keep" is potentially more precious than ".idx", for example). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
Doug Kellywrites: > I think the patches I sent (a bit prematurely) address the > remaining comments... I did find there was a relevant test in > t5304 already, so I added a new test in the same section (and > cleaned up some of the garbage it wasn't removing before). I'm > not sure if it's poor form to move tests around like this, but I > figured it might be best to keep them logically grouped. OK, will queue as I didn't spot anything glaringly wrong ;-) I did wonder if we want to say anything about .bitmap files, though. If there is one without matching .idx and .pack, shouldn't we report just like we report .idx without .pack (or vice versa)? Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Oct 28, 2015 at 5:43 PM, Doug Kellywrote: > On Wed, Oct 28, 2015 at 12:48 PM, Junio C Hamano wrote: >> Junio C Hamano writes: >> >>> Eric Sunshine writes: >>> > -static void real_report_garbage(const char *desc, const char *path) > +const char *bits_to_msg(unsigned seen_bits) If you don't expect other callers outside this file, then this should be declared 'static'. If you do expect future external callers, then this should be declared in a public header file (but renamed to be more meaningful). >>> >>> I think this can be private to this file. The sole point of moving >>> this logic to this file is to make it private, after all ;-) Thanks >>> for sharp eyes. >>> >>> Together with the need for a description on "why", this probably >>> deserves a test or two, probably at the end of t5304. >>> >>> Thanks. >> >> Does somebody want to help tying the final loose ends on this topic? >> It has been listed in the [Stalled] section for too long, I _think_ >> what it attempts to do is a worthy thing, and it is shame to see the >> initial implementation and review cycles we have spent so far go to >> waste. >> >> If I find nothing else to do before any taker appears, I could >> volunteer myself, but thought I should ask first. >> >> Thanks. > > I agree; I've been wanting to get back to it, but had some > higher-priority things at work for a while, so I've not had time. I'd > be happy to get back into it, but if you get to it first, believe me, > I'm not going to be offended. :) > > I'll see if I can't devote a little extra time to it this upcoming > week, though. Hopefully it doesn't need too much additional polishing > to be ready. > > P.S. Does a Googler want to tell the Inbox team that the inability to > send plain-text email is really annoying? :P I think the patches I sent (a bit prematurely) address the remaining comments... I did find there was a relevant test in t5304 already, so I added a new test in the same section (and cleaned up some of the garbage it wasn't removing before). I'm not sure if it's poor form to move tests around like this, but I figured it might be best to keep them logically grouped. Let me know if there's anything I can do, and once again, sorry for the delay! -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
Junio C Hamanowrites: > Eric Sunshine writes: > >>> -static void real_report_garbage(const char *desc, const char *path) >>> +const char *bits_to_msg(unsigned seen_bits) >> >> If you don't expect other callers outside this file, then this should >> be declared 'static'. If you do expect future external callers, then >> this should be declared in a public header file (but renamed to be >> more meaningful). > > I think this can be private to this file. The sole point of moving > this logic to this file is to make it private, after all ;-) Thanks > for sharp eyes. > > Together with the need for a description on "why", this probably > deserves a test or two, probably at the end of t5304. > > Thanks. Does somebody want to help tying the final loose ends on this topic? It has been listed in the [Stalled] section for too long, I _think_ what it attempts to do is a worthy thing, and it is shame to see the initial implementation and review cycles we have spent so far go to waste. If I find nothing else to do before any taker appears, I could volunteer myself, but thought I should ask first. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Wed, Oct 28, 2015 at 12:48 PM, Junio C Hamanowrote: > Junio C Hamano writes: > >> Eric Sunshine writes: >> -static void real_report_garbage(const char *desc, const char *path) +const char *bits_to_msg(unsigned seen_bits) >>> >>> If you don't expect other callers outside this file, then this should >>> be declared 'static'. If you do expect future external callers, then >>> this should be declared in a public header file (but renamed to be >>> more meaningful). >> >> I think this can be private to this file. The sole point of moving >> this logic to this file is to make it private, after all ;-) Thanks >> for sharp eyes. >> >> Together with the need for a description on "why", this probably >> deserves a test or two, probably at the end of t5304. >> >> Thanks. > > Does somebody want to help tying the final loose ends on this topic? > It has been listed in the [Stalled] section for too long, I _think_ > what it attempts to do is a worthy thing, and it is shame to see the > initial implementation and review cycles we have spent so far go to > waste. > > If I find nothing else to do before any taker appears, I could > volunteer myself, but thought I should ask first. > > Thanks. I agree; I've been wanting to get back to it, but had some higher-priority things at work for a while, so I've not had time. I'd be happy to get back into it, but if you get to it first, believe me, I'm not going to be offended. :) I'll see if I can't devote a little extra time to it this upcoming week, though. Hopefully it doesn't need too much additional polishing to be ready. P.S. Does a Googler want to tell the Inbox team that the inability to send plain-text email is really annoying? :P -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
Eric Sunshine sunsh...@sunshineco.com writes: -static void real_report_garbage(const char *desc, const char *path) +const char *bits_to_msg(unsigned seen_bits) If you don't expect other callers outside this file, then this should be declared 'static'. If you do expect future external callers, then this should be declared in a public header file (but renamed to be more meaningful). I think this can be private to this file. The sole point of moving this logic to this file is to make it private, after all ;-) Thanks for sharp eyes. Together with the need for a description on why, this probably deserves a test or two, probably at the end of t5304. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
From: Junio C Hamano gits...@pobox.com The hook to report garbage files in $GIT_OBJECT_DIRECTORY/pack/ could be generic but is too specific to count-object's needs. Move the part to produce human-readable messages to count-objects, and refine the interface to callback with the bits with values defined in the cache.h header file, so that other callers (e.g. prune) can later use the same mechanism to enumerate different kinds of garbage files and do something intelligent about them, other than reporting in textual messages. Signed-off-by: Junio C Hamano gits...@pobox.com --- builtin/count-objects.c | 26 -- cache.h | 7 +-- path.c | 2 +- sha1_file.c | 23 ++- 4 files changed, 36 insertions(+), 22 deletions(-) diff --git a/builtin/count-objects.c b/builtin/count-objects.c index ad0c799..4c3198e 100644 --- a/builtin/count-objects.c +++ b/builtin/count-objects.c @@ -15,9 +15,31 @@ static int verbose; static unsigned long loose, packed, packed_loose; static off_t loose_size; -static void real_report_garbage(const char *desc, const char *path) +const char *bits_to_msg(unsigned seen_bits) +{ + switch (seen_bits) { + case 0: + return no corresponding .idx or .pack; + case PACKDIR_FILE_GARBAGE: + return garbage found; + case PACKDIR_FILE_PACK: + return no corresponding .idx; + case PACKDIR_FILE_IDX: + return no corresponding .pack; + case PACKDIR_FILE_PACK|PACKDIR_FILE_IDX: + default: + return NULL; + } +} + +static void real_report_garbage(unsigned seen_bits, const char *path) { struct stat st; + const char *desc = bits_to_msg(seen_bits); + + if (!desc) + return; + if (!stat(path, st)) size_garbage += st.st_size; warning(%s: %s, desc, path); @@ -27,7 +49,7 @@ static void real_report_garbage(const char *desc, const char *path) static void loose_garbage(const char *path) { if (verbose) - report_garbage(garbage found, path); + report_garbage(PACKDIR_FILE_GARBAGE, path); } static int count_loose(const unsigned char *sha1, const char *path, void *data) diff --git a/cache.h b/cache.h index 6bb7119..2d4dedc 100644 --- a/cache.h +++ b/cache.h @@ -1212,8 +1212,11 @@ struct pack_entry { extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path); -/* A hook for count-objects to report invalid files in pack directory */ -extern void (*report_garbage)(const char *desc, const char *path); +/* A hook to report invalid files in pack directory */ +#define PACKDIR_FILE_PACK 1 +#define PACKDIR_FILE_IDX 2 +#define PACKDIR_FILE_GARBAGE 4 +extern void (*report_garbage)(unsigned seen_bits, const char *path); extern void prepare_packed_git(void); extern void reprepare_packed_git(void); diff --git a/path.c b/path.c index 10f4cbf..75ec236 100644 --- a/path.c +++ b/path.c @@ -143,7 +143,7 @@ void report_linked_checkout_garbage(void) strbuf_setlen(sb, len); strbuf_addstr(sb, path); if (file_exists(sb.buf)) - report_garbage(unused in linked checkout, sb.buf); + report_garbage(PACKDIR_FILE_GARBAGE, sb.buf); } strbuf_release(sb); } diff --git a/sha1_file.c b/sha1_file.c index 1cee438..0c0b652 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1183,27 +1183,16 @@ void install_packed_git(struct packed_git *pack) packed_git = pack; } -void (*report_garbage)(const char *desc, const char *path); +void (*report_garbage)(unsigned seen_bits, const char *path); static void report_helper(const struct string_list *list, int seen_bits, int first, int last) { - const char *msg; - switch (seen_bits) { - case 0: - msg = no corresponding .idx or .pack; - break; - case 1: - msg = no corresponding .idx; - break; - case 2: - msg = no corresponding .pack; - break; - default: + if (seen_bits == (PACKDIR_FILE_PACK|PACKDIR_FILE_IDX)) return; - } + for (; first last; first++) - report_garbage(msg, list-items[first].string); + report_garbage(seen_bits, list-items[first].string); } static void report_pack_garbage(struct string_list *list) @@ -1226,7 +1215,7 @@ static void report_pack_garbage(struct string_list *list) if (baselen == -1) { const char *dot = strrchr(path, '.'); if (!dot) { - report_garbage(garbage found, path); + report_garbage(PACKDIR_FILE_GARBAGE, path); continue; }
Re: [PATCH 1/2] prepare_packed_git(): refactor garbage reporting in pack directory
On Thu, Aug 13, 2015 at 2:02 PM, Doug Kelly dougk@gmail.com wrote: From: Junio C Hamano gits...@pobox.com The hook to report garbage files in $GIT_OBJECT_DIRECTORY/pack/ could be generic but is too specific to count-object's needs. Move the part to produce human-readable messages to count-objects, and refine the interface to callback with the bits with values defined in the cache.h header file, so that other callers (e.g. prune) can later use the same mechanism to enumerate different kinds of garbage files and do something intelligent about them, other than reporting in textual messages. Signed-off-by: Junio C Hamano gits...@pobox.com Since you're forwarding Junio's patch, you'd also want to sign-off (following his). --- diff --git a/builtin/count-objects.c b/builtin/count-objects.c index ad0c799..4c3198e 100644 --- a/builtin/count-objects.c +++ b/builtin/count-objects.c @@ -15,9 +15,31 @@ static int verbose; static unsigned long loose, packed, packed_loose; static off_t loose_size; -static void real_report_garbage(const char *desc, const char *path) +const char *bits_to_msg(unsigned seen_bits) If you don't expect other callers outside this file, then this should be declared 'static'. If you do expect future external callers, then this should be declared in a public header file (but renamed to be more meaningful). +{ + switch (seen_bits) { + case 0: + return no corresponding .idx or .pack; + case PACKDIR_FILE_GARBAGE: + return garbage found; + case PACKDIR_FILE_PACK: + return no corresponding .idx; + case PACKDIR_FILE_IDX: + return no corresponding .pack; + case PACKDIR_FILE_PACK|PACKDIR_FILE_IDX: + default: + return NULL; + } +} -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html