Re: [PATCH 0/16] make prune mtime-checking more careful
Jeff King writes: > So that explains that bug (as a side note, you might think that if we > are flipping return values, lots of things would fail when they ask "do > we have this packed object" and it erroneously says "yes". But that does > not happen. The wrong return value comes from freshening the file, so we > only flip "yes" to "no", and never the opposite). > ... > When dt/cache-tree-repair is merged, we have a valid cache tree when we > run "git commit", and we realize that we do not need to write out the > tree object at all. Thus we never hit the buggy code, the object isn't > created, and the subsequent prune reports that there is nothing to > prune. Wow, that is a tricky "bug" (rather the series with the bug failed to fail when applied to 'master', so it is an "unbug" that hides a bug) to hunt down. Thanks for digging. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16] make prune mtime-checking more careful
On Sun, Oct 05, 2014 at 09:42:49PM -0400, Jeff King wrote: > On Sat, Oct 04, 2014 at 03:22:10PM -0700, Junio C Hamano wrote: > > > This applied on top of 'maint' (which does have c40fdd01) makes the > > test #9 (prune: do not prune detached HEAD with no reflog) fail. > > I'll fix the bone-headed error returns that René noticed and double > check that they were the complete culprit in the test failure you saw > (and not just masking some other problem). OK, I figured out what is going on. The short of it is that yes, it's the return-value bug René noticed. Read on only if you are really interested. :) The test runs "git prune" and intends to check that the detached HEAD commit is not in the list. It does so by checking the whole output of "git prune -n", which it expects to be empty (and it generally is, because we've gc'd in previous tests and all objects were either packed or pruned). However, when the test fails, there is a pruned object. But it is not the HEAD commit that we just wrote, but rather the tree that it points to. When we run "git commit", it tries to write out the tree with write_sha1_file. This in turn calls freshen_packed_object to check whether we have the object packed. We do, but the return-value bug makes us think we do not. As a result, we write out an extra copy of the tree as a loose object, and that is what gets pruned (and not by prune's normal is-it-old-and-unreachable code, but by its call to prune_packed). So that explains that bug (as a side note, you might think that if we are flipping return values, lots of things would fail when they ask "do we have this packed object" and it erroneously says "yes". But that does not happen. The wrong return value comes from freshening the file, so we only flip "yes" to "no", and never the opposite). > > If we merge 'dt/cache-tree-repair' (which in turn needs > > 'jc/reopen-lock-file') to 'maint' and then apply these on top, the > > said test passes. But I do not see an apparent reason why X-<. When dt/cache-tree-repair is merged, we have a valid cache tree when we run "git commit", and we realize that we do not need to write out the tree object at all. Thus we never hit the buggy code, the object isn't created, and the subsequent prune reports that there is nothing to prune. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16] make prune mtime-checking more careful
On Sat, Oct 04, 2014 at 03:22:10PM -0700, Junio C Hamano wrote: > This applied on top of 'maint' (which does have c40fdd01) makes the > test #9 (prune: do not prune detached HEAD with no reflog) fail. I'll fix the bone-headed error returns that René noticed and double check that they were the complete culprit in the test failure you saw (and not just masking some other problem). > If we merge 'dt/cache-tree-repair' (which in turn needs > 'jc/reopen-lock-file') to 'maint' and then apply these on top, the > said test passes. But I do not see an apparent reason why X-<. I suspect it's an unintended interaction that just happens to let my bogus code code the right thing, but I'll double check on that, too. Thanks both of you for spotting the problems. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16] make prune mtime-checking more careful
Am 05.10.2014 um 00:22 schrieb Junio C Hamano: Jeff King writes: There's quite a lot of patches here, but most of them are preparatory cleanups. The meat is in patches 13, 15, and 16. [01/16]: foreach_alt_odb: propagate return value from callback [02/16]: isxdigit: cast input to unsigned char [03/16]: object_array: factor out slopbuf-freeing logic [04/16]: object_array: add a "clear" function [05/16]: clean up name allocation in prepare_revision_walk [06/16]: reachable: clear pending array after walking it [07/16]: t5304: use test_path_is_* instead of "test -f" [08/16]: t5304: use helper to report failure of "test foo = bar" [09/16]: prune: factor out loose-object directory traversal [10/16]: count-objects: do not use xsize_t when counting object size [11/16]: count-objects: use for_each_loose_file_in_objdir [12/16]: sha1_file: add for_each iterators for loose and packed objects [13/16]: prune: keep objects reachable from recent objects [14/16]: pack-objects: refactor unpack-unreachable expiration check [15/16]: pack-objects: match prune logic for discarding objects [16/16]: write_sha1_file: freshen existing objects Somebody please help me out here. This applied on top of 'maint' (which does have c40fdd01) makes the test #9 (prune: do not prune detached HEAD with no reflog) fail. The test passes if the return value of freshen_file() is negated in freshen_packed_object() (see my reply to patch 16). If we merge 'dt/cache-tree-repair' (which in turn needs 'jc/reopen-lock-file') to 'maint' and then apply these on top, the said test passes. But I do not see an apparent reason why X-<. Didn't check this interaction, but it sounds strange. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16] make prune mtime-checking more careful
Jeff King writes: > There's quite a lot of patches here, but most of them are preparatory > cleanups. The meat is in patches 13, 15, and 16. > > [01/16]: foreach_alt_odb: propagate return value from callback > [02/16]: isxdigit: cast input to unsigned char > [03/16]: object_array: factor out slopbuf-freeing logic > [04/16]: object_array: add a "clear" function > [05/16]: clean up name allocation in prepare_revision_walk > [06/16]: reachable: clear pending array after walking it > [07/16]: t5304: use test_path_is_* instead of "test -f" > [08/16]: t5304: use helper to report failure of "test foo = bar" > [09/16]: prune: factor out loose-object directory traversal > [10/16]: count-objects: do not use xsize_t when counting object size > [11/16]: count-objects: use for_each_loose_file_in_objdir > [12/16]: sha1_file: add for_each iterators for loose and packed objects > [13/16]: prune: keep objects reachable from recent objects > [14/16]: pack-objects: refactor unpack-unreachable expiration check > [15/16]: pack-objects: match prune logic for discarding objects > [16/16]: write_sha1_file: freshen existing objects Somebody please help me out here. This applied on top of 'maint' (which does have c40fdd01) makes the test #9 (prune: do not prune detached HEAD with no reflog) fail. If we merge 'dt/cache-tree-repair' (which in turn needs 'jc/reopen-lock-file') to 'maint' and then apply these on top, the said test passes. But I do not see an apparent reason why X-<. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/16] make prune mtime-checking more careful
Jeff King writes: > ... The objects are removed by prune, which doesn't realize > that they are part of an ongoing operation. Prune uses the filesystem > mtime to determine this, but we are not very thorough in making sure > that is kept up to date. The whole series looked quite sensible. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/16] make prune mtime-checking more careful
At GitHub we've occasionally run across repos getting corrupted by trees and blobs near the tip going missing. We do a lot of "test merges" between branches and HEAD (this is what feeds the "OK to merge" button on the web interface), and the objects are almost always related to these merges. The objects are removed by prune, which doesn't realize that they are part of an ongoing operation. Prune uses the filesystem mtime to determine this, but we are not very thorough in making sure that is kept up to date. This series tries to fix that with two techniques: 1. When we try to write an object to disk, we optimize out the write if we already have the object. Instead, we should still update the mtime of the object in this case. 2. Treat objects reachable from "recent" objects as recent themselves. When we check that we have an object, we do not check whether we have all of the objects it can reach. If you have some new objects that refer to some old objects (e.g., you create and delete a tree on day 1, and then create a new tree referring to the blob on day 2), then prune may delete the old object but not the new (in this case, we delete the blob but not the tree). Any subsequent use of the new object will check that we have it (e.g., commit-tree makes sure we have the tree we feed it), but not other objects it can reach. This can lead to referencing a half-formed part of the graph. Note that this does not make prune race-free. For example, you could check for and update the mtime of an object just as prune is deleting it, and think that it is written when it is not. Fixing that would require some atomic mechanism for prune to check the mtime and delete. But I do think this series cuts us down to "real" race conditions, with millisecond-ish timing. The problems we're fixing here are much worse than that. The distance between an object being written and being referred may operate on human timescales (e.g., writing a commit message). Or the time distance between two objects that refer to each other may be days or weeks; a prune where one falls in the "recent" boundary and another does not can be disastrous. There's quite a lot of patches here, but most of them are preparatory cleanups. The meat is in patches 13, 15, and 16. [01/16]: foreach_alt_odb: propagate return value from callback [02/16]: isxdigit: cast input to unsigned char [03/16]: object_array: factor out slopbuf-freeing logic [04/16]: object_array: add a "clear" function [05/16]: clean up name allocation in prepare_revision_walk [06/16]: reachable: clear pending array after walking it [07/16]: t5304: use test_path_is_* instead of "test -f" [08/16]: t5304: use helper to report failure of "test foo = bar" [09/16]: prune: factor out loose-object directory traversal [10/16]: count-objects: do not use xsize_t when counting object size [11/16]: count-objects: use for_each_loose_file_in_objdir [12/16]: sha1_file: add for_each iterators for loose and packed objects [13/16]: prune: keep objects reachable from recent objects [14/16]: pack-objects: refactor unpack-unreachable expiration check [15/16]: pack-objects: match prune logic for discarding objects [16/16]: write_sha1_file: freshen existing objects Note that these aren't yet running on GitHub servers. I know that they fix real potential problems (see the new t6501 for examples), but I don't know for sure if they will catch the problems we have seen. The frequency of these issues is relatively rare, so even once deployed, we won't know for sure until a few weeks or months have passed. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html