Re: Help with "fatal: unable to read ...." error during GC?
On Sat, Aug 11, 2018 at 4:25 PM Jeff King wrote:
> > I do still have these warnings and no amount of git gc/git fsck/etc.
> > has reduced them in any way:
> >
> > $ git gc
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
>
> I think these would go away via "reflog expire" (I'd have thought "git
> gc" would do so, though). I wonder if this is yet another tool that
> needs to be taught about worktree heads.

You would need "reflog expire --expire-unreachable=now" because the
default 30 days are probably too long for this case. And yes, "reflog
expire --all" needs to be aware of other heads (and other per-worktree
refs in general). I'm pretty sure right now it only cares about the
current worktree's head.
--
Duy
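A minimal sketch of the suggestion above, assuming that running "reflog expire" from inside each worktree is enough to reach that worktree's HEAD reflog. The thread only says the command looks at the current worktree's head, so that part is an inference, and the function name is made up for illustration. Be aware that this discards unreachable reflog entries immediately.

```shell
# Hypothetical helper: run the reflog expiry once per worktree.
expire_reflogs_everywhere() {
    # --porcelain prints one "worktree <path>" line per worktree.
    git worktree list --porcelain |
    sed -n 's/^worktree //p' |
    while IFS= read -r wt; do
        # </dev/null keeps the command from touching the loop's input.
        git -C "$wt" reflog expire --expire-unreachable=now --all </dev/null
    done
}
```

Calling expire_reflogs_everywhere from the main repository should visit the main worktree and every linked one; whether it silences the warnings depends on the theory above being right.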
Re: Help with "fatal: unable to read ...." error during GC?
On Sat, Aug 11, 2018 at 04:38:00PM +0200, Duy Nguyen wrote:
> On Sat, Aug 11, 2018 at 4:25 PM Jeff King wrote:
> > Responding myself and adding Duy to the cc to increase visibility among
> > worktree experts. :)
>
> I do silently watch this thread (and yes I still have to fix that fsck
> thing, hit a roadblock with ref names but I should really restart it
> soon). Now you have found one more thing for me to do. Why Jeff why?
> j/k

:) I was actually thinking about doing it myself, but was worried that
the refactoring might complicate things. And it sounds from the fact
that you looked into it and hit a roadblock that it is more complicated
than I thought.

So I'll leave it for now, but I'm happy to review or discuss ideas.

-Peff
Re: Help with "fatal: unable to read ...." error during GC?
On Sat, Aug 11, 2018 at 4:25 PM Jeff King wrote:
> Responding myself and adding Duy to the cc to increase visibility among
> worktree experts. :)

I do silently watch this thread (and yes I still have to fix that fsck
thing, hit a roadblock with ref names but I should really restart it
soon). Now you have found one more thing for me to do. Why Jeff why?
j/k
--
Duy
Re: Help with "fatal: unable to read ...." error during GC?
On Sat, Aug 11, 2018 at 10:23:41AM -0400, Jeff King wrote:
> > I do still have these warnings and no amount of git gc/git fsck/etc.
> > has reduced them in any way:
> >
> > $ git gc
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
> > warning: reflog of 'HEAD' references pruned commits
>
> I think these would go away via "reflog expire" (I'd have thought "git
> gc" would do so, though). I wonder if this is yet another tool that
> needs to be taught about worktree heads.
>
> > I've run git gc --prune=all then git fsck reports only these dangling
> > commits:
> >
> > dangling commit cef0678a5e0765506e3fac41286696fd37a9b1e9
> > dangling commit 1729195f021a1b95ea8ca10b9c32e76bf2257e67
> > dangling commit 08385b9731291607a8c6d4bf10272002d8f31e1f
> > dangling commit c4ddfb2139eeb5a3c132dbfc84cc6e27fdeb46d1
> > dangling commit 1df8ebcc1cd5f59dd224ce1f3ba39f24370cf4e7
> >
> > (this is down from probably 50 or so "dangling ..." commits, blobs, and
> > trees before).
>
> I'd also expect "--prune=all" to drop all dangling heads. But I think
> this is the worktree thing, again. The code in fsck starts its
> connectivity check with this:
>
>   if (head_points_at && !is_null_oid(&head_oid))
>           fsck_handle_ref("HEAD", &head_oid, 0, NULL);
>   for_each_rawref(fsck_handle_ref, NULL);
>   if (include_reflogs)
>           for_each_reflog(fsck_handle_reflog, NULL);
>
> but looking at the similar code in revision.c that has been upgraded to
> handle worktrees (e.g., add_reflogs_to_pending()), I think that is not
> going to look at worktree HEADs nor reflogs.
>
> I'd hoped to give you a one-liner to try out, but I think it will
> require some refactoring.

Responding myself and adding Duy to the cc to increase visibility among
worktree experts. :)

-Peff
Re: Help with "fatal: unable to read ...." error during GC?
On Sat, Aug 11, 2018 at 08:13:17AM -0400, Paul Smith wrote:
> I rebuilt Git 2.18.0 without optimization to try to get more debug
> information. Unfortunately I didn't think to create a backup of my
> problematic .git directory.
>
> When I ran the above command under the debugger using the non-optimized
> version of Git... it worked! That fixed the problem so that now when I
> run "git gc" using the original optimized version I no longer see the
> issue there either.
>
> So... clearly something is wrong but because I was dumb and didn't make
> a backup I can no longer reproduce the problem :(. On the other hand,
> my repository is no longer throwing errors so that's good.

Heh. Well, I would like to have known what the problem was. But if it
never happens again, we can go on with our lives. And if it does, then
we'll have another reproduction. :)

The fact that disabling optimizations changed things is worrisome. It
makes me wonder if this is somehow related to the struct-packing changes
in pack-objects. I don't know of any problems there, but in some
modifications to them post-v2.18, we had to deal with some race
conditions.

One alternative theory is that it wasn't the optimizations at all, but
rather the clock moving forward. The repack process cares about the
current time with respect to the mtime of the unreachable objects on
disk. It's possible that between yesterday and today, some objects
crossed the line to "too old to keep" (though from the earlier digging,
I'm not sure this is related to unreachable objects at all).

Thanks for your patience in digging into this, and please let us know if
you run into similar problems again.

> I do still have these warnings and no amount of git gc/git fsck/etc.
> has reduced them in any way:
>
> $ git gc
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits

I think these would go away via "reflog expire" (I'd have thought "git
gc" would do so, though). I wonder if this is yet another tool that
needs to be taught about worktree heads.

> I've run git gc --prune=all then git fsck reports only these dangling
> commits:
>
> dangling commit cef0678a5e0765506e3fac41286696fd37a9b1e9
> dangling commit 1729195f021a1b95ea8ca10b9c32e76bf2257e67
> dangling commit 08385b9731291607a8c6d4bf10272002d8f31e1f
> dangling commit c4ddfb2139eeb5a3c132dbfc84cc6e27fdeb46d1
> dangling commit 1df8ebcc1cd5f59dd224ce1f3ba39f24370cf4e7
>
> (this is down from probably 50 or so "dangling ..." commits, blobs, and
> trees before).

I'd also expect "--prune=all" to drop all dangling heads. But I think
this is the worktree thing, again. The code in fsck starts its
connectivity check with this:

  if (head_points_at && !is_null_oid(&head_oid))
          fsck_handle_ref("HEAD", &head_oid, 0, NULL);
  for_each_rawref(fsck_handle_ref, NULL);
  if (include_reflogs)
          for_each_reflog(fsck_handle_reflog, NULL);

but looking at the similar code in revision.c that has been upgraded to
handle worktrees (e.g., add_reflogs_to_pending()), I think that is not
going to look at worktree HEADs nor reflogs.

I'd hoped to give you a one-liner to try out, but I think it will
require some refactoring.

-Peff
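On the clock theory mentioned earlier in this message: one rough way to see which loose objects have crossed an age cutoff is to look at file mtimes directly. The sketch below is only an illustration. The function name is hypothetical, it assumes loose objects sit in two-hex-digit subdirectories of the objects directory, and it says nothing about whether an object is reachable.

```shell
# Hypothetical helper: list loose object files older than N days.
# Usage: old_loose_objects <objects-dir> <days>
# find rounds ages down to whole days, so "+14" means 15 days or more.
old_loose_objects() {
    (cd "$1" && find . -type f -path './[0-9a-f][0-9a-f]/*' -mtime +"$2")
}
```

For example, old_loose_objects .git/objects 14 roughly corresponds to the two-week cutoff that shows up in the pack-objects command line elsewhere in this thread.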
Re: Help with "fatal: unable to read ...." error during GC?
On Wed, 2018-08-08 at 14:24 -0400, Jeff King wrote:
> If so, can you try running it under gdb and getting a stack trace?
> Something like:
>
>   gdb git
>   [and then inside gdb...]
>   set args pack-objects --all --reflog --indexed-objects foo
>   break die
>   run
>   bt
>
> That might give us a clue where the broken object reference is coming
> from.

Oh no. I messed up :(.

I rebuilt Git 2.18.0 without optimization to try to get more debug
information. Unfortunately I didn't think to create a backup of my
problematic .git directory.

When I ran the above command under the debugger using the non-optimized
version of Git... it worked! That fixed the problem so that now when I
run "git gc" using the original optimized version I no longer see the
issue there either.

So... clearly something is wrong but because I was dumb and didn't make
a backup I can no longer reproduce the problem :(. On the other hand,
my repository is no longer throwing errors so that's good.

I do still have these warnings and no amount of git gc/git fsck/etc.
has reduced them in any way:

  $ git gc
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  Enumerating objects: 506556, done.
  Counting objects: 100% (506556/506556), done.
  Delta compression using up to 8 threads.
  Compressing objects: 100% (101199/101199), done.
  Writing objects: 100% (506556/506556), done.
  Total 506556 (delta 358957), reused 506556 (delta 358957)
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  Checking connectivity: 506556, done.

I've run git gc --prune=all then git fsck reports only these dangling
commits:

  dangling commit cef0678a5e0765506e3fac41286696fd37a9b1e9
  dangling commit 1729195f021a1b95ea8ca10b9c32e76bf2257e67
  dangling commit 08385b9731291607a8c6d4bf10272002d8f31e1f
  dangling commit c4ddfb2139eeb5a3c132dbfc84cc6e27fdeb46d1
  dangling commit 1df8ebcc1cd5f59dd224ce1f3ba39f24370cf4e7

(this is down from probably 50 or so "dangling ..." commits, blobs, and
trees before).
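To see what the remaining dangling commits actually are, the IDs can be pulled out of the fsck report and inspected one by one. This is a small sketch; the function name is made up, and it only assumes the "dangling commit <id>" line format shown above.

```shell
# Hypothetical helper: read "git fsck" output on stdin and print only
# the object IDs of dangling commits (blobs and trees are skipped).
dangling_commits() {
    awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}
```

A possible use is to pipe "git fsck" into dangling_commits and then run "git log -1 --oneline" on each ID, which shows whether a commit belongs to work in one of the worktrees.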
Re: Help with "fatal: unable to read ...." error during GC?
On Wed, Aug 08, 2018 at 10:45:49PM -0400, Paul Smith wrote:
> On Wed, 2018-08-08 at 14:24 -0400, Jeff King wrote:
> > If so, can you try running it under gdb and getting a stack trace?
> > Something like:
> >
> >   gdb git
> >   [and then inside gdb...]
> >   set args pack-objects --all --reflog --indexed-objects foo
> >   break die
> >   run
> >   bt
> >
> > That might give us a clue where the broken object reference is coming
>
> Here we go. I can rebuild with -Og or -O0 if more detailed debugging
> is needed; most everything appears to be optimized out:

No, I think this is enough to give a general sense of the problem
location.

> Compressing objects: 100% (10/10), done.
> Writing objects:  54% (274416/508176)
> Thread 1 "git" hit Breakpoint 1, die (err=err@entry=0x5a373a "unable to read
> %s") at usage.c:119
> 119     {
> (gdb) bt
> #0  die (err=err@entry=0x5a373a "unable to read %s") at usage.c:119
> #1  0x004563f3 in get_delta (entry=<optimized out>) at
> builtin/pack-objects.c:143
> #2  write_no_reuse_object () at builtin/pack-objects.c:308
> #3  0x00456592 in write_reuse_object (usable_delta=<optimized out>,
> limit=<optimized out>, entry=<optimized out>, f=<optimized out>) at
> builtin/pack-objects.c:516
> #4  write_object (write_offset=<optimized out>, entry=0x7fffc9a8d940,
> f=0x198fb70) at builtin/pack-objects.c:518
> #5  write_one () at builtin/pack-objects.c:576
> #6  0x004592f0 in write_pack_file () at builtin/pack-objects.c:849
> #7  cmd_pack_objects (argc=<optimized out>, argv=<optimized out>,
> prefix=<optimized out>) at builtin/pack-objects.c:3354
> #8  0x00404f06 in run_builtin (argv=<optimized out>, argc=<optimized out>,
> p=<optimized out>) at git.c:417
> #9  handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:632
> #10 0x00405f21 in run_argv (argv=0x7fffe210,
> argcp=0x7fffe21c) at git.c:761
> #11 cmd_main (argc=<optimized out>, argc@entry=6, argv=<optimized out>,
> argv@entry=0x7fffe448) at git.c:761
> #12 0x00404b15 in main (argc=6, argv=0x7fffe448) at
> common-main.c:45

So that's quite unexpected. I assumed we'd have hit this problem while
deciding _which_ objects to write. But we get all the way to the point
of writing out the result before we notice it's missing.

I don't think I've run such a case before, but I wonder if "pack-objects
--all" is too lax about adding missing blobs during its object traversal
(especially during the "unreachable but recent" part of the traversal
that I mentioned, which should silently omit missing objects).

I played around with recreating this situation, though, and I don't
think it's possible to cause the results you're seeing. We come up with
a list of recent objects, but we only use it as a look-up index for
discarding too-old objects. So:

  - it wouldn't ever cause us to choose to write an object into a pack,
    which is what you're seeing

  - we'd never consider a missing object; it's a pure lookup table, and
    the actual list of objects we consider is found by walking the set
    of packs

So that's probably a dead end.

What I really wonder is where we found out about that object name in the
first place. Can you instrument your Git build like this:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 71056d8294..5ff6de5ddf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1112,6 +1112,13 @@ static int add_object_entry(const struct object_id *oid, enum object_type type,
 	struct packed_git *found_pack = NULL;
 	off_t found_offset = 0;
 	uint32_t index_pos;
+	static const struct object_id funny_oid = {
+		"\xc1\x04\xb8\xfb\x36\x31\xb5\xc5\x46\x95"
+		"\x20\x6b\x2f\x73\x31\x0c\x02\x3c\x99\x63"
+	};
+
+	if (!oidcmp(oid, &funny_oid))
+		warning("found funny oid");

 	display_progress(progress_state, ++nr_seen);

and similarly get a backtrace when we hit that warning()? (Or if you're
a gdb expert, you could probably use a conditional breakpoint, but I
find just modifying the source easier).

-Peff
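The patch hard-codes the missing object ID as 20 raw bytes written as C string escapes. For anyone repeating this with a different object, the escapes can be generated from the hex ID instead of typed by hand. A small sketch follows; the function name is hypothetical.

```shell
# Hypothetical helper: turn a hex object ID into C "\xNN" escapes,
# two hex digits per byte.
oid_to_c_escapes() {
    printf '%s\n' "$1" | sed 's/../\\x&/g'
}
```

Running it on c104b8fb3631b5c54695206b2f73310c023c9963 produces the same 20 escapes that appear in the funny_oid initializer, ready to be split across two string literals.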
Re: Help with "fatal: unable to read ...." error during GC?
On Wed, 2018-08-08 at 14:24 -0400, Jeff King wrote:
> If so, can you try running it under gdb and getting a stack trace?
> Something like:
>
>   gdb git
>   [and then inside gdb...]
>   set args pack-objects --all --reflog --indexed-objects foo
>   break die
>   run
>   bt
>
> That might give us a clue where the broken object reference is coming

Here we go. I can rebuild with -Og or -O0 if more detailed debugging
is needed; most everything appears to be optimized out:

...
Compressing objects: 100% (10/10), done.
Writing objects:  54% (274416/508176)
Thread 1 "git" hit Breakpoint 1, die (err=err@entry=0x5a373a "unable to read %s") at usage.c:119
119     {
(gdb) bt
#0  die (err=err@entry=0x5a373a "unable to read %s") at usage.c:119
#1  0x004563f3 in get_delta (entry=<optimized out>) at builtin/pack-objects.c:143
#2  write_no_reuse_object () at builtin/pack-objects.c:308
#3  0x00456592 in write_reuse_object (usable_delta=<optimized out>, limit=<optimized out>, entry=<optimized out>, f=<optimized out>) at builtin/pack-objects.c:516
#4  write_object (write_offset=<optimized out>, entry=0x7fffc9a8d940, f=0x198fb70) at builtin/pack-objects.c:518
#5  write_one () at builtin/pack-objects.c:576
#6  0x004592f0 in write_pack_file () at builtin/pack-objects.c:849
#7  cmd_pack_objects (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin/pack-objects.c:3354
#8  0x00404f06 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:417
#9  handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:632
#10 0x00405f21 in run_argv (argv=0x7fffe210, argcp=0x7fffe21c) at git.c:761
#11 cmd_main (argc=<optimized out>, argc@entry=6, argv=<optimized out>, argv@entry=0x7fffe448) at git.c:761
#12 0x00404b15 in main (argc=6, argv=0x7fffe448) at common-main.c:45
Re: Help with "fatal: unable to read ...." error during GC?
On Wed, 2018-08-08 at 14:24 -0400, Jeff King wrote:
> Let's narrow it down first and make sure we're dying where I expect.
> Can you try:
>
>   GIT_TRACE=1 git gc
>
> and confirm the program running when the fatal error is produced?
>
> From what you've shown it's going to be git-repack, but what I'm not
> clear on is whether it is repack itself that is complaining, or the
> pack-objects process it spawns. I'd guess the latter.

You are correct:

  15:27:24.264161 git.c:415 trace: built-in: git pack-objects --keep-true-parents --honor-pack-keep --non-empty --all --reflog --indexed-objects --unpack-unreachable=2.weeks.ago --local --delta-base-offset .git/objects/pack/.tmp-17617-pack

> If so, can you try running it under gdb and getting a stack trace?

I would... but I discovered all my Git binaries are stripped to the max
and no symbols available. I'll do a quick rebuild with some debug info
and get back to you.

Thanks for the pointers!
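Since GIT_TRACE output can be long, a small filter can pick out the last command that was started before the failure. The sketch below assumes trace lines of the form "... trace: <kind>: <command>" as in the line quoted above; the function name is made up. The last traced command is only a hint, because it is not necessarily the process that printed the fatal error.

```shell
# Hypothetical helper: read GIT_TRACE output on stdin and print the
# command from the last "trace: <kind>: <command>" line.
last_traced_command() {
    sed -n 's/^.* trace: [a-z_-]*: //p' | tail -n 1
}
```

GIT_TRACE=1 writes to stderr, so something like "GIT_TRACE=1 git gc 2>&1 | last_traced_command" would be one way to use it.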
Re: Help with "fatal: unable to read ...." error during GC?
On Wed, Aug 08, 2018 at 01:35:30PM -0400, Paul Smith wrote:
> Thanks for the note! Unhappily for me none of these operations seem to
> find any actionable problems...
> [...]

Drat. One other option is that it _could_ be related to the "old
unreachable objects that are reachable from recent unreachable objects
should be kept" code. That's supposed to quietly ignore broken links in
unreachable objects, but there could be a bug.

Let's narrow it down first and make sure we're dying where I expect. Can
you try:

  GIT_TRACE=1 git gc

and confirm the program running when the fatal error is produced?

From what you've shown it's going to be git-repack, but what I'm not
clear on is whether it is repack itself that is complaining, or the
pack-objects process it spawns. I'd guess the latter.

If so, can you try running it under gdb and getting a stack trace?
Something like:

  gdb git
  [and then inside gdb...]
  set args pack-objects --all --reflog --indexed-objects foo
  break die
  run
  bt

That might give us a clue where the broken object reference is coming
from.

-Peff
Re: Help with "fatal: unable to read ...." error during GC?
On Wed, 2018-08-08 at 12:06 -0400, Jeff King wrote:
> I'd have expected fsck to find it, too. However, looking at the code,
> I'm not convinced that fsck is actually considering detached worktree
> heads properly, either. Try:
>
>   git rev-list --all --reflog --objects >/dev/null
>
> which I know checks worktrees correctly. I'd expect that to fail.
>
> If it does, then we need to narrow down which worktree is corrupt.
> Perhaps something like:
>
>   git worktree list |
>   while read worktree head junk; do
>     git rev-list --objects $head >/dev/null ||
>     echo "$worktree seems corrupt"
>   done

Thanks for the note! Unhappily for me none of these operations seem to
find any actionable problems...

  $ git rev-list --all --reflog --objects >/dev/null
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  $ echo $?
  0

  $ git worktree list | while read wt head junk; do \
      git rev-list --objects $head >/dev/null || echo "$wt seems corrupt"; \
    done
  $

Just to be sure I updated the loop above to echo $wt and $head and they
were correct.

I also re-ran git gc after the above and still got the original error
output so it didn't magically fix itself :).
Re: Help with "fatal: unable to read ...." error during GC?
On Wed, Aug 08, 2018 at 10:30:11AM -0400, Paul Smith wrote:
> I recently upgraded from Git 2.9.2 to 2.18.0 (note, I have no
> particular reason to believe this is related just passing info). I'm
> running on Linux (64bit Ubuntu 18.04.1 but I've compiled Git myself
> from source, I'm not using the distro version).
>
> I have a local repository I've been using for about two years (the
> .git/description file, which I don't use, has a TLM of July 31, 2016),
> with lots of worktrees being created/pruned/etc. during that time.
>
> Note I'm doing all these operations in the 'main' repository, not in
> any of the worktrees.

Hrm, there was a pretty serious corruption bug in early versions of the
worktree code (IIRC, pruning would not consider detached HEADs from
other worktrees, and could drop that object).

> Yesterday, when I tried to fetch from my upstream I got a notification
> about GC needed. Then GC failed with these errors (HEAD is set to
> master which is the same as origin/master):
>
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> warning: reflog of 'HEAD' references pruned commits
> fatal: unable to read c104b8fb3631b5c54695206b2f73310c023c9963
> error: failed to run repack

So that definitely looks like the corruption I'd expect from the
worktree bug, but...

> I ran a git fsck --full which showed me a lot of dangling commits and
> blobs, but no errors, no broken link messages, etc.

I'd have expected fsck to find it, too. However, looking at the code,
I'm not convinced that fsck is actually considering detached worktree
heads properly, either. Try:

  git rev-list --all --reflog --objects >/dev/null

which I know checks worktrees correctly. I'd expect that to fail.

If it does, then we need to narrow down which worktree is corrupt.
Perhaps something like:

  git worktree list |
  while read worktree head junk; do
    git rev-list --objects $head >/dev/null ||
    echo "$worktree seems corrupt"
  done

> I can't find that SHA anywhere: I looked in .git/objects, etc. I also
> can't find any problems with my repo; obviously I haven't checked
> everything but I can show the git log back to the initial commit, all
> my stashes look fine, all my worktrees seem to be OK (git status etc.
> work fine in all of them).

"git status" might succeed if the corruption is further back in the
history.

> I would hate to have to throw this setup away since it has 23 stashes
> and 25 worktrees in various states that would be annoying to have to
> recreate...

Definitely don't throw it away. I suspect you have a single corrupt
worktree, and everything else is fine.

-Peff
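A variant of the worktree loop above, sketched on the assumption that the machine-readable "git worktree list --porcelain" output is available (it prints "worktree <path>" and "HEAD <sha>" lines per worktree). Compared with parsing the default listing, this copes with paths that contain spaces and uses full object IDs. Both function names are made up for illustration.

```shell
# Hypothetical helper: read "git worktree list --porcelain" on stdin and
# print "<head-sha> <path>" for each worktree that has a HEAD line.
worktree_heads() {
    awk '
        $1 == "worktree" { path = substr($0, 10) }  # text after "worktree "
        $1 == "HEAD"     { print $2, path }
    '
}

# Hypothetical helper: walk every worktree HEAD and report the ones
# whose history cannot be fully traversed.
check_worktrees() {
    git worktree list --porcelain | worktree_heads |
    while read -r head path; do
        git rev-list --objects "$head" >/dev/null 2>&1 </dev/null ||
            echo "$path seems corrupt"
    done
}
```

Like the original loop, check_worktrees only walks what is reachable from each HEAD, so it would not notice damage that is referenced solely from a reflog.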
Help with "fatal: unable to read ...." error during GC?
I recently upgraded from Git 2.9.2 to 2.18.0 (note, I have no
particular reason to believe this is related, just passing info). I'm
running on Linux (64bit Ubuntu 18.04.1 but I've compiled Git myself
from source, I'm not using the distro version).

I have a local repository I've been using for about two years (the
.git/description file, which I don't use, has a TLM of July 31, 2016),
with lots of worktrees being created/pruned/etc. during that time.

Note I'm doing all these operations in the 'main' repository, not in
any of the worktrees.

Yesterday, when I tried to fetch from my upstream I got a notification
about GC needed. Then GC failed with these errors (HEAD is set to
master which is the same as origin/master):

  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  warning: reflog of 'HEAD' references pruned commits
  fatal: unable to read c104b8fb3631b5c54695206b2f73310c023c9963
  error: failed to run repack

I ran a git fsck --full which showed me a lot of dangling commits and
blobs, but no errors, no broken link messages, etc.

I ran git reflog expire --all --stale-fix but no change.

I can't find that SHA anywhere: I looked in .git/objects, etc. I also
can't find any problems with my repo; obviously I haven't checked
everything but I can show the git log back to the initial commit, all
my stashes look fine, all my worktrees seem to be OK (git status etc.
work fine in all of them).

But whenever I pull etc. Git wants to run gc and I get this set of
errors again. FWIW other repos created from the same remote don't show
any issues so it appears to be just this local copy of the repo.

I've seen many SO and blog posts about issues like this but all were
concentrating on recovering things and I don't even know if I've lost
anything... and anyway the operations they suggest don't work for me
because nothing can access that SHA; I just get "bad object".

Any ideas on what to look at next?

I would hate to have to throw this setup away since it has 23 stashes
and 25 worktrees in various states that would be annoying to have to
recreate...
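For reference, a quick way to confirm from the command line that an object really is absent from the object database is "git cat-file -e", which exits non-zero when the object does not exist or is not valid. The wrapper below is only a sketch with a made-up name.

```shell
# Hypothetical helper: report whether an object ID can be read from the
# repository in the current directory.
object_status() {
    if git cat-file -e "$1" 2>/dev/null; then
        echo "$1 present"
    else
        echo "$1 missing"
    fi
}
```

In the situation described above, object_status c104b8fb3631b5c54695206b2f73310c023c9963 would be expected to report the object as missing.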