Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
Jeff King writes:

> So I dunno. That is far from exhaustive, but it does seem like most
> cases should assume the presence of the file.

You are right. I should have considered that "we got a random
object-name looking thing and we do not care if it does not exist" is a
very narrow minority case. Most of the object names we deal with come
from local repository traversal, and unless things like "fsck-promisor"
come into the picture, we should always have them available locally.

> But it may not be that big a deal. For the most part, all bets are off
> in a corrupt repo. My main concern is just that we do not want the
> corruption to spread or to make it harder for us to recover from (and
> that includes allowing us to delete or overwrite other data that would
> otherwise be forbidden, which is what's happening in the fetch case).

Absolutely. Thanks.
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
On Thu, Nov 23, 2017 at 11:35:21AM +0900, Junio C Hamano wrote:

> Not limiting us to the caller in the "fetch" codepath, I think the
> expectation by callers of lookup_commit_reference_gently() in the
> ideal world would be:
>
>  - It has an object name, and wants to use it as a point in the commit
>    DAG to define the traversal over the DAG, if it refers to a
>    commit known to us.
>
>  - It does not know if these object names represent a tag object, a
>    commit object, or some other object. It does not know if the
>    local repository actually has them (e.g. we received a "have"
>    from the other side---missing is expected).
>
>  - Hence, it would happily accept a NULL as "we do not have it" and
>    "we do have it, but it is not a commit-ish".
>
> And from that point of view, 2, 3a (missing), and 4 (0{40}) to yield
> NULL is perfectly fine. 3b (exists but broken) may be a noteworthy
> event, but for the purpose of the caller, it may want to proceed as
> if the object is missing from our end, so it might deserve warning()
> but not die(), at least as the default behaviour.

Hmm, yeah, I did not differentiate 3a and 3b in my analysis.

How we'd want to handle "missing" would vary from caller to caller, I
think. Certainly for this case in "fetch" where it's the "old" value
for a ref, it would be a corruption not to have it.

Just grepping around a few of the other callers, I see:

  - archive.c:parse_treeish_arg: fine not to have it (we'd complain
    soon after that it doesn't point to a tree either). But also fine
    to complain hard.

  - blame.c:dwim_reverse_initial, and checkout_paths and
    switch_branches in checkout.c: missing object would mean a broken
    HEAD

  - show-branch.c:append_ref: missing would mean a broken ref

  - clear_commit_marks_for_object_array: conceptually OK to have a
    missing object, though I suspect practically impossible since we
    examined and marked the objects earlier

  - ref-filter's --merged, --contains, etc: the low-level code quietly
    ignores a missing object or non-commit, but the command-line parser
    enforces that we find a commit. So probably impossible to trigger,
    but arguably the second call should be a BUG().

So I dunno. That is far from exhaustive, but it does seem like most
cases should assume the presence of the file.

As for your proposed behavior:

> And from that point of view, 2, 3a (missing), and 4 (0{40}) to yield
> NULL is perfectly fine. 3b (exists but broken) may be a noteworthy
> event, but for the purpose of the caller, it may want to proceed as
> if the object is missing from our end, so it might deserve warning()
> but not die(), at least as the default behaviour.

That's actually not far from what happens now, with the only difference
being that we _do_ actually die() on a corruption (really any error
except ENOENT). I forgot that when I wrote my earlier message.

You can see it by updating the "fetch" reproduction I sent earlier to
corrupt:

-- >8 --
git init parent
git -C parent commit --allow-empty -m base
git clone parent child
git -C parent commit --allow-empty -m more
cd child
for i in .git/objects/??/*
do
	chmod +w $i
	echo corrupt >$i
done
git fetch
-- 8< --

which gives something like:

  error: inflate: data stream error (incorrect header check)
  error: unable to unpack 55c66a401fd4190382f9cd8b70c11f9f5adb044e header
  fatal: loose object 55c66a401fd4190382f9cd8b70c11f9f5adb044e (stored in .git/objects/55/c66a401fd4190382f9cd8b70c11f9f5adb044e) is corrupt

So that's good.

I do still think that many of the callers of
lookup_commit_reference_gently() probably ought to die() in the
"missing" case rather than continue (and produce subtly wrong
answers). But it may not be that big a deal. For the most part, all
bets are off in a corrupt repo. My main concern is just that we do not
want the corruption to spread or to make it harder for us to recover
from (and that includes allowing us to delete or overwrite other data
that would otherwise be forbidden, which is what's happening in the
fetch case). Most of the callers probably don't fall into that
situation, but it might be nice to err on the side of caution.

-Peff
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
Jeff King writes:

> So there are four cases we care about for this call in fetch:
>
>   1. We fed a real sha1 and got a commit (or peeled to one).
>
>   2. We fed a real sha1 which resolved to a non-commit, and we got
>      NULL.
>
>   3. We fed a real sha1 and the object was missing or corrupted, and
>      we got NULL.
>
>   4. We fed a null sha1 and got NULL.
>
> Right now we lump cases 2-4 together as "do not do a fast-forward
> check". That's fine for 2 and 4, but probably not for 3. We can easily
> catch case 4 ourselves (if we care to), but distinguishing case 3 from
> the others is hard. How should lookup_commit_reference_gently() signal
> it to us?

Not limiting us to the caller in the "fetch" codepath, I think the
expectation by callers of lookup_commit_reference_gently() in the
ideal world would be:

 - It has an object name, and wants to use it as a point in the commit
   DAG to define the traversal over the DAG, if it refers to a
   commit known to us.

 - It does not know if these object names represent a tag object, a
   commit object, or some other object. It does not know if the
   local repository actually has them (e.g. we received a "have"
   from the other side---missing is expected).

 - Hence, it would happily accept a NULL as "we do not have it" and
   "we do have it, but it is not a commit-ish".

And from that point of view, 2, 3a (missing), and 4 (0{40}) to yield
NULL is perfectly fine. 3b (exists but broken) may be a noteworthy
event, but for the purpose of the caller, it may want to proceed as
if the object is missing from our end, so it might deserve warning()
but not die(), at least as the default behaviour.
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
On Wed, Nov 22, 2017 at 10:42:30AM +0900, Junio C Hamano wrote:

> Jeff King writes:
>
> > I'm not sure what the right behavior is, but I'm pretty sure that's
> > not it. Probably one of:
> >
> >   - skip updating the ref when we see the breakage
> >
> >   - ditto, but terminate the whole operation, since we might be
> >     deleting other refs and in a broken repo we're probably best to
> >     make as few changes as possible
> >
> >   - behave as if it was a non-ff, which would allow "--force" to
> >     overwrite the broken ref. Maybe convenient for fixing things,
> >     but possibly surprising (and it's not that hard to just delete
> >     the broken refs manually before proceeding).
>
> Perhaps the last one would be the ideal endgame, but the second one
> may be a good stopping point in the shorter term.

This turns out to be a lot trickier than I expected. The crux of the
matter is that the case we care about is hidden inside
lookup_commit_reference_gently(), which doesn't distinguish between
corruption and "not a commit".

So there are four cases we care about for this call in fetch:

  1. We fed a real sha1 and got a commit (or peeled to one).

  2. We fed a real sha1 which resolved to a non-commit, and we got
     NULL.

  3. We fed a real sha1 and the object was missing or corrupted, and we
     got NULL.

  4. We fed a null sha1 and got NULL.

Right now we lump cases 2-4 together as "do not do a fast-forward
check". That's fine for 2 and 4, but probably not for 3. We can easily
catch case 4 ourselves (if we care to), but distinguishing case 3 from
the others is hard. How should lookup_commit_reference_gently() signal
it to us?

Or should lookup_commit_reference_gently() die on corruption? That's
not very "gentle", but I think the "gently" here is really about "it
might not be a commit", not "the repo might be corrupted".

But I think even that may be the tip of the iceberg.
The next thing we do is feed the commits to in_merge_bases(), which
will happily return "nope" if the old commit cannot be parsed (because
it has only a boolean return value).

So I dunno. Maybe it is a losing battle to try to pass this kind of
corruption information up the stack. I'm tempted to say that there
should just be a "paranoid" flag to globally die() whenever we see a
corruption (and you could run with it normally, but relax it whenever
you're investigating a broken repo).

But I doubt even that works. Not having the "old_oid" object at all
would be a repo corruption here, but how are the low-level routines
supposed to know when a missing object is a corruption and when it is
not?

-Peff
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
On Wed, Nov 22, 2017 at 10:49:25AM +0900, Junio C Hamano wrote:

> WRT existing codepaths that pass 0{40} and refuse to notice a
> potential repository corruption (from getting a NULL for a non-null
> object name), I think we would need a sweep of the codebase and fix
> them in the longer term. As long as that will happen someday, either
> approach between "we know 'no loose object? let's redo the packs' is
> the part that matters performance-wise, so let's do a short-cut only
> for that" and "we know that callers that come with 0{40} want to get
> NULL back" should be OK, I would think.

I agree. Let's go with the "v2 5/5" I posted then.

I'll try to work up a patch for the fetch.c case I found tomorrow, but
I suspect there are many more. But that's largely orthogonal to the
series.

-Peff
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
Jeff King writes:

> Here's a re-roll of patch 5 that behaves this way (the patch should be
> unsurprising, but I've updated the commit message to match).
>
> I did notice one other thing: the function looks up replace objects,
> so we have both the original and the replaced sha1 available. My
> earlier patch used the original sha1, but this one uses the replaced
> value. I'm not sure if that's sane or not. It lets the fast-path kick
> in if you point a replace ref at 0{40}. And it lets you point
> refs/replace/0{40} to something else. I doubt that's useful, but it
> could perhaps be for debugging, etc.
>
> In most cases, of course, it wouldn't matter (since pointing to or
> from the null sha1 is vaguely crazy in the first place).

I tend to agree that those who go crazy/fancy with the replace
mechanism can keep both halves when it breaks.

WRT existing codepaths that pass 0{40} and refuse to notice a
potential repository corruption (from getting a NULL for a non-null
object name), I think we would need a sweep of the codebase and fix
them in the longer term. As long as that will happen someday, either
approach between "we know 'no loose object? let's redo the packs' is
the part that matters performance-wise, so let's do a short-cut only
for that" and "we know that callers that come with 0{40} want to get
NULL back" should be OK, I would think.
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
Jeff King writes:

> I'm not sure what the right behavior is, but I'm pretty sure that's
> not it. Probably one of:
>
>   - skip updating the ref when we see the breakage
>
>   - ditto, but terminate the whole operation, since we might be
>     deleting other refs and in a broken repo we're probably best to
>     make as few changes as possible
>
>   - behave as if it was a non-ff, which would allow "--force" to
>     overwrite the broken ref. Maybe convenient for fixing things, but
>     possibly surprising (and it's not that hard to just delete the
>     broken refs manually before proceeding).

Perhaps the last one would be the ideal endgame, but the second one
may be a good stopping point in the shorter term.
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
On Tue, Nov 21, 2017 at 02:20:19PM +0900, Junio C Hamano wrote:

> After queuing this series to an earlier part of 'pu' and resolving a
> conflict with the jh/fsck-promisors topic, I ended up with code that
> rejects 0{40} a lot earlier, before we try to see if a pack entry
> for 0{40} exists, even though the patch that is queued on this topic
> is more conservative (i.e. the above one).
>
> Perhaps we would want to use the alternate version that declares that
> 0{40} is a sentinel that signals that there is no such object in
> this topic---that would give us a consistent semantics without
> having to adjust jh/fsck-promisors when it becomes ready to be
> merged.

I think we could adjust the patches without too much effort. That said,
I am very on-the-fence about whether to just declare the null sha1 as
"not a real object" and automatically return from the lookup without
doing any work. But if you agree that it's not likely to hurt anything,
it's IMHO a lot easier to reason about.

Here's a re-roll of patch 5 that behaves this way (the patch should be
unsurprising, but I've updated the commit message to match).

I did notice one other thing: the function looks up replace objects, so
we have both the original and the replaced sha1 available. My earlier
patch used the original sha1, but this one uses the replaced value.
I'm not sure if that's sane or not. It lets the fast-path kick in if
you point a replace ref at 0{40}. And it lets you point
refs/replace/0{40} to something else. I doubt that's useful, but it
could perhaps be for debugging, etc.

In most cases, of course, it wouldn't matter (since pointing to or from
the null sha1 is vaguely crazy in the first place).

-- >8 --
Subject: [PATCH v2 5/5] sha1_file: fast-path null sha1 as a missing object

In theory nobody should ever ask the low-level object code for a null
sha1. It's used as a sentinel for "no such object" in lots of places,
so leaking through to this level is a sign that the higher-level code
is not being careful about its error-checking. In practice, though,
quite a few code paths seem to rely on the null sha1 lookup failing as
a way to quietly propagate non-existence (e.g., by feeding it to
lookup_commit_reference_gently(), which then returns NULL).

When this happens, we do two inefficient things:

  1. We actually search for the null sha1 in packs and in the loose
     object directory.

  2. When we fail to find it, we re-scan the pack directory in case a
     simultaneous repack happened to move it from loose to packed.
     This can be very expensive if you have a large number of packs.

Only the second one actually causes noticeable performance problems,
so we could treat them independently. But for the sake of simplicity
(both of code and of reasoning about it), it makes sense to just
declare that the null sha1 cannot be a real on-disk object, and
looking it up will always return "no such object".

There's no real loss of functionality to do so. Its use as a sentinel
value means that anybody who is unlucky enough to hit the 2^-160th
chance of generating an object with that sha1 is already going to find
the object largely unusable.

In an ideal world, we'd simply fix all of the callers to notice the
null sha1 and avoid passing it to us. But a simple experiment to catch
this with a BUG() shows that there are a large number of code paths
that do so. So in the meantime, let's fix the performance problem by
taking a fast exit from the object lookup when we see a null sha1.

p5551 shows off the improvement (when a fetched ref is new, the "old"
sha1 is 0{40}, which ends up being passed for fast-forward checks, the
status table abbreviations, etc):

  Test          HEAD^             HEAD
  -----------------------------------------------------
  5551.4: fetch 5.51(5.03+0.48)   0.17(0.10+0.06) -96.9%

Signed-off-by: Jeff King
---
 sha1_file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index 8a7c6b7eba..ae4b71f445 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1152,6 +1152,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		lookup_replace_object(sha1) :
 		sha1;
 
+	if (is_null_sha1(real))
+		return -1;
+
 	if (!oi)
 		oi = &blank_oi;
-- 
2.15.0.578.g35b419775f
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
On Tue, Nov 21, 2017 at 11:37:28AM +0900, Junio C Hamano wrote:

> Jeff King writes:
>
> > In an ideal world, we'd simply fix all of the callers to
> > notice the null sha1 and avoid passing it to us. But a
> > simple experiment to catch this with a BUG() shows that
> > there are a large number of code paths.
>
> Well, we can view this (or the alternative you sent later that does
> the same a bit earlier in the function) as "fixing the caller", but
> one that has already refactored the common logic to a helper function
> that all of these callers call into ;-).

Yes, I'm definitely tempted by that view. :)

What worries me, though, is that callers who lazily propagate the null
sha1 to the lookup functions cannot reasonably tell the difference
between "this object was corrupt or missing" and "we passed the null
sha1, and a missing object is expected".

For example, look at how fetch.c:update_local_ref() looks up objects.
It feeds the old and new sha1 to lookup_commit_reference_gently(), and
if either is NULL, it skips the fast-forward check. That makes sense if
we expect the null sha1 to get translated into a NULL commit. But it
also triggers for a broken ref, and we'd overwrite it (when the right
thing is probably refusing to update).

Here's a runnable example:

-- >8 --
git init parent
git -C parent commit --allow-empty -m base
git clone parent child
git -C parent commit --allow-empty -m more
cd child
rm -f .git/objects/??/*
git fetch
-- 8< --

That final fetch spews a bunch of errors about broken refs, and then
overwrites the value of origin/master, even though it's broken (in this
case it actually is a fast-forward, but the child repo doesn't even
know that).

I'm not sure what the right behavior is, but I'm pretty sure that's
not it.
Probably one of:

  - skip updating the ref when we see the breakage

  - ditto, but terminate the whole operation, since we might be
    deleting other refs and in a broken repo we're probably best to
    make as few changes as possible

  - behave as if it was a non-ff, which would allow "--force" to
    overwrite the broken ref. Maybe convenient for fixing things, but
    possibly surprising (and it's not that hard to just delete the
    broken refs manually before proceeding).

-Peff
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
Jeff King writes:

> This is the minimal fix that addresses the performance issues.
> I'd actually have no problem at all declaring that looking up a null
> sha1 is insane, and having the object-lookup routines simply return
> "no such object" without even doing the loose/pack lookup first.
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 8a7c6b7eba..dde0ad101d 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1180,6 +1180,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
>  	if (!sha1_loose_object_info(real, oi, flags))
>  		return 0;
>  
> +	if (is_null_sha1(sha1))
> +		return -1;
> +
>  	/* Not a loose object; someone else may have just packed it. */
>  	if (flags & OBJECT_INFO_QUICK) {
>  		return -1;

After queuing this series to an earlier part of 'pu' and resolving a
conflict with the jh/fsck-promisors topic, I ended up with code that
rejects 0{40} a lot earlier, before we try to see if a pack entry
for 0{40} exists, even though the patch that is queued on this topic
is more conservative (i.e. the above one).

Perhaps we would want to use the alternate version that declares that
0{40} is a sentinel that signals that there is no such object in
this topic---that would give us a consistent semantics without
having to adjust jh/fsck-promisors when it becomes ready to be
merged.
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
Jeff King writes:

> In an ideal world, we'd simply fix all of the callers to
> notice the null sha1 and avoid passing it to us. But a
> simple experiment to catch this with a BUG() shows that
> there are a large number of code paths.

Well, we can view this (or the alternative you sent later that does
the same a bit earlier in the function) as "fixing the caller", but
one that has already refactored the common logic to a helper function
that all of these callers call into ;-).
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
On Mon, Nov 20, 2017 at 12:47:53PM -0800, Stefan Beller wrote:

> > This is the minimal fix that addresses the performance issues.
> > I'd actually have no problem at all declaring that looking up a null
> > sha1 is insane, and having the object-lookup routines simply return
> > "no such object" without even doing the loose/pack lookup first.
> >
> > I'm also fine with going down the BUG() route and fixing the
> > higher-level callers, but it's a big enough task (with little enough
> > real-world impact) that I think it would be worth applying this in
> > the meantime.
>
> It would have a lot of impact in the future, when new developers are
> hindered from mis-using the API. The (unwritten) series introducing
> BUG() would help a lot in 'holding it right', and I would expect
> fewer performance regressions over time.

Yeah, I agree the BUG() route is the nicest. I started trying to hunt
down every code path causing a test failure, though, and soon got
overwhelmed.

I actually think it's not the worst thing in the world to say "it's OK
to look up the null sha1; the result is well-defined, and it does not
exist". It's perhaps philosophically a little ugly, but I think it's
quite practical.

> The patch is impressively small for such a performance gain.
> Personally I think (1) (which essentially means "making null sha1
> work like a regular sha1") is quite an endeavor at this point in time
> for this code base.

Yeah, it would be a crazy amount of effort to try to support that, for
very little gain. If we did want to just declare that the null sha1
does not exist (even if you happen to have objects/00/00... in your
repository), the patch is similarly easy:

diff --git a/sha1_file.c b/sha1_file.c
index 8a7c6b7eba..754fe7f03e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1152,6 +1152,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 		lookup_replace_object(sha1) :
 		sha1;
 
+	if (is_null_sha1(sha1))
+		return -1;
+
 	if (!oi)
 		oi = &blank_oi;

> As a tangent, a similar but solved problem in the diff code is how
> NUL in user data is treated in xdiff for example, as there we kept
> being careful since the beginning (though I think we don't have tests
> for it, so it might be broken)

There I think it's nice if we can be tolerant of NULs, because they're
a thing that might actually happen in user input. A real null sha1 has
a 2^-160 chance of happening, and the design of sha1 is such that it's
hard for somebody to do it intentionally. You might be able to get up
to some mischief with replace objects, though.

-Peff
Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
On Mon, Nov 20, 2017 at 12:35 PM, Jeff King wrote:

> In theory nobody should ever ask the low-level object code
> for a null sha1. It's used as a sentinel for "no such
> object" in lots of places, so leaking through to this level
> is a sign that the higher-level code is not being careful
> about its error-checking. In practice, though, quite a few
> code paths seem to rely on the null sha1 lookup failing as a
> way to quietly propagate non-existence (e.g., by feeding it
> to lookup_commit_reference_gently(), which then returns
> NULL).
>
> When this happens, we do two inefficient things:
>
>   1. We actually search for the null sha1 in packs and in
>      the loose object directory.
>
>   2. When we fail to find it, we re-scan the pack directory
>      in case a simultaneous repack happened to move it from
>      loose to packed.
>
> It's debatable whether (1) is a good idea or not. The
> performance drop is at least linear in the number of
> lookups, so it's not too bad. And if by some 2^-160th chance
> somebody does have such an object, we probably ought to
> access it. On the other hand, enough code paths treat the
> null sha1 specially that the object probably isn't usable,
> anyway.
>
> Problem (2), on the other hand, is a pretty severe
> performance issue. If you have a large number of packs,
> rescanning the pack directory can be very expensive. And it
> only helps in the case that you have an object with the null
> sha1 _and_ somebody was simultaneously repacking it. The
> tradeoff is certainly a bad one there.
>
> In an ideal world, we'd simply fix all of the callers to
> notice the null sha1 and avoid passing it to us. But a
> simple experiment to catch this with a BUG() shows that
> there are a large number of code paths.
>
> In the meantime, let's address problem (2). That's the
> minimal fix we can do to make the performance problems go
> away.
>
> p5551 shows this off (when a fetched ref is new, the
> "old" sha1 is 0{40}, which ends up being passed for
> fast-forward checks, the status table abbreviations, etc):
>
>   Test          HEAD^             HEAD
>   -----------------------------------------------------
>   5551.4: fetch 5.51(5.03+0.48)   0.17(0.10+0.06) -96.9%
>
> We could address (1), too, but there's not much performance
> improvement left to make.
>
> Signed-off-by: Jeff King
> ---
> This is the minimal fix that addresses the performance issues.
> I'd actually have no problem at all declaring that looking up a null
> sha1 is insane, and having the object-lookup routines simply return
> "no such object" without even doing the loose/pack lookup first.
>
> I'm also fine with going down the BUG() route and fixing the
> higher-level callers, but it's a big enough task (with little enough
> real-world impact) that I think it would be worth applying this in
> the meantime.

It would have a lot of impact in the future, when new developers are
hindered from mis-using the API. The (unwritten) series introducing
BUG() would help a lot in 'holding it right', and I would expect fewer
performance regressions over time.

The patch is impressively small for such a performance gain.
Personally I think (1) (which essentially means "making null sha1
work like a regular sha1") is quite an endeavor at this point in time
for this code base.

As a tangent, a similar but solved problem in the diff code is how
NUL in user data is treated in xdiff for example, as there we kept
being careful since the beginning (though I think we don't have tests
for it, so it might be broken).

Stefan
[PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1
In theory nobody should ever ask the low-level object code for a null
sha1. It's used as a sentinel for "no such object" in lots of places,
so leaking through to this level is a sign that the higher-level code
is not being careful about its error-checking. In practice, though,
quite a few code paths seem to rely on the null sha1 lookup failing as
a way to quietly propagate non-existence (e.g., by feeding it to
lookup_commit_reference_gently(), which then returns NULL).

When this happens, we do two inefficient things:

  1. We actually search for the null sha1 in packs and in the loose
     object directory.

  2. When we fail to find it, we re-scan the pack directory in case a
     simultaneous repack happened to move it from loose to packed.

It's debatable whether (1) is a good idea or not. The performance drop
is at least linear in the number of lookups, so it's not too bad. And
if by some 2^-160th chance somebody does have such an object, we
probably ought to access it. On the other hand, enough code paths
treat the null sha1 specially that the object probably isn't usable,
anyway.

Problem (2), on the other hand, is a pretty severe performance issue.
If you have a large number of packs, rescanning the pack directory can
be very expensive. And it only helps in the case that you have an
object with the null sha1 _and_ somebody was simultaneously repacking
it. The tradeoff is certainly a bad one there.

In an ideal world, we'd simply fix all of the callers to notice the
null sha1 and avoid passing it to us. But a simple experiment to catch
this with a BUG() shows that there are a large number of code paths.

In the meantime, let's address problem (2). That's the minimal fix we
can do to make the performance problems go away.
p5551 shows this off (when a fetched ref is new, the "old" sha1 is
0{40}, which ends up being passed for fast-forward checks, the status
table abbreviations, etc):

  Test          HEAD^             HEAD
  -----------------------------------------------------
  5551.4: fetch 5.51(5.03+0.48)   0.17(0.10+0.06) -96.9%

We could address (1), too, but there's not much performance
improvement left to make.

Signed-off-by: Jeff King
---
This is the minimal fix that addresses the performance issues.
I'd actually have no problem at all declaring that looking up a null
sha1 is insane, and having the object-lookup routines simply return
"no such object" without even doing the loose/pack lookup first.

I'm also fine with going down the BUG() route and fixing the
higher-level callers, but it's a big enough task (with little enough
real-world impact) that I think it would be worth applying this in the
meantime.

 sha1_file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index 8a7c6b7eba..dde0ad101d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1180,6 +1180,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 	if (!sha1_loose_object_info(real, oi, flags))
 		return 0;
 
+	if (is_null_sha1(sha1))
+		return -1;
+
 	/* Not a loose object; someone else may have just packed it. */
 	if (flags & OBJECT_INFO_QUICK) {
 		return -1;
-- 
2.15.0.494.g79a8547723