Re: [PATCH v3 16/20] commit-graph: verify contents match checksum
On 6/2/2018 11:52 AM, Jakub Narebski wrote: Derrick Stolee writes: The commit-graph file ends with a SHA1 hash of the previous contents. If a commit-graph file has errors but the checksum hash is correct, then we know that the problem is a bug in Git and not simply file corruption after-the-fact. Compute the checksum right away so it is the first error that appears, and make the message translatable since this error can be "corrected" by a user by simply deleting the file and recomputing. The rest of the errors are useful only to developers. Should we then provide --quiet / --verbose options, so that ordinary user is not flooded with error messages meant for power users and Git developers, then? Be sure to continue checking the rest of the file data if the checksum is wrong. This is important for our tests, as we break the checksum as we modify bytes of the commit-graph file. Well, we could have used sha1sum program, or test-sha1 helper to fix the checksum after corrupting the commit-graph file... Signed-off-by: Derrick Stolee --- commit-graph.c | 16 ++-- t/t5318-commit-graph.sh | 6 ++ 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index d2b291aca2..a33600c584 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir, oids.nr = 0; } +#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2 static int verify_commit_graph_error; static void graph_report(const char *fmt, ...) @@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...) int verify_commit_graph(struct commit_graph *g) { uint32_t i, cur_fanout_pos = 0; - struct object_id prev_oid, cur_oid; + struct object_id prev_oid, cur_oid, checksum; + struct hashfile *f; + int devnull; if (!g) { graph_report("no commit-graph file loaded"); @@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g) if (verify_commit_graph_error) return verify_commit_graph_error; + devnull = open("/dev/null", O_WRONLY); + f = hashfd(devnull, NULL); + hashwrite(f, g->data, g->data_len - g->hash_len); + finalize_hashfile(f, checksum.hash, CSUM_CLOSE); + if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) { + graph_report(_("the commit-graph file has incorrect checksum and is likely corrupt")); + verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH; + } Is it the best way of calculating the SHA-1 checksum that out internal APIs provide? Is it how SHA-1 checksum is calculated and checked for packfiles? This pattern is similar to hashfd_check() in csum-file.c, except we are hashing the file data directly instead of re-creating it from scratch (as is done for 'git index-pack --verify'). + for (i = 0; i < g->num_commits; i++) { struct commit *graph_commit; @@ -916,7 +928,7 @@ int verify_commit_graph(struct commit_graph *g) cur_fanout_pos++; } - if (verify_commit_graph_error) + if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH) return verify_commit_graph_error; So if we detected that checksum do not match, but we have not found an error, we say that it is all right? This only prevents us from stopping early. We will still report an error. for (i = 0; i < g->num_commits; i++) { diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 240aef6add..2680a2ebff 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16` GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \ $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS` GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4` +GRAPH_BYTE_FOOTER=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES` # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopus merge' ' "invalid parent" ' +test_expect_success 'detect invalid checksum hash' ' + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \ + "incorrect checksum" This would not work under GETTEXT_POISON, as the message is marked as translatable, but corrupt_graph_and_verify uses 'grep' and not 'test_i18grep' from t/test-lib-functions.sh I fixed this locally based on Szeder's comment. +' If it is pure checksum corruption, wouldn't this fail because it is not a failure (exit code is 0)? It is not zero, so the test passes.
Re: [PATCH v3 16/20] commit-graph: verify contents match checksum
Derrick Stolee writes: > The commit-graph file ends with a SHA1 hash of the previous contents. If > a commit-graph file has errors but the checksum hash is correct, then we > know that the problem is a bug in Git and not simply file corruption > after-the-fact. > > Compute the checksum right away so it is the first error that appears, > and make the message translatable since this error can be "corrected" by > a user by simply deleting the file and recomputing. The rest of the > errors are useful only to developers. Should we then provide --quiet / --verbose options, so that ordinary user is not flooded with error messages meant for power users and Git developers, then? > > Be sure to continue checking the rest of the file data if the checksum > is wrong. This is important for our tests, as we break the checksum as > we modify bytes of the commit-graph file. Well, we could have used sha1sum program, or test-sha1 helper to fix the checksum after corrupting the commit-graph file... > > Signed-off-by: Derrick Stolee > --- > commit-graph.c | 16 ++-- > t/t5318-commit-graph.sh | 6 ++ > 2 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index d2b291aca2..a33600c584 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir, > oids.nr = 0; > } > > +#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2 > static int verify_commit_graph_error; > > static void graph_report(const char *fmt, ...) > @@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...) > int verify_commit_graph(struct commit_graph *g) > { > uint32_t i, cur_fanout_pos = 0; > - struct object_id prev_oid, cur_oid; > + struct object_id prev_oid, cur_oid, checksum; > + struct hashfile *f; > + int devnull; > > if (!g) { > graph_report("no commit-graph file loaded"); > @@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g) > if (verify_commit_graph_error) > return verify_commit_graph_error; > > + devnull = open("/dev/null", O_WRONLY); > + f = hashfd(devnull, NULL); > + hashwrite(f, g->data, g->data_len - g->hash_len); > + finalize_hashfile(f, checksum.hash, CSUM_CLOSE); > + if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) { > + graph_report(_("the commit-graph file has incorrect checksum > and is likely corrupt")); > + verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH; > + } Is it the best way of calculating the SHA-1 checksum that out internal APIs provide? Is it how SHA-1 checksum is calculated and checked for packfiles? > + > for (i = 0; i < g->num_commits; i++) { > struct commit *graph_commit; > > @@ -916,7 +928,7 @@ int verify_commit_graph(struct commit_graph *g) > cur_fanout_pos++; > } > > - if (verify_commit_graph_error) > + if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH) > return verify_commit_graph_error; So if we detected that checksum do not match, but we have not found an error, we say that it is all right? > > for (i = 0; i < g->num_commits; i++) { > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index 240aef6add..2680a2ebff 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16` > GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \ > $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS` > GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4` > +GRAPH_BYTE_FOOTER=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES` > > # usage: corrupt_graph_and_verify > # Manipulates the commit-graph file at the position > @@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopus > merge' ' > "invalid parent" > ' > > +test_expect_success 'detect invalid checksum hash' ' > + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \ > + "incorrect checksum" This would not work under GETTEXT_POISON, as the message is marked as translatable, but corrupt_graph_and_verify uses 'grep' and not 'test_i18grep' from t/test-lib-functions.sh > +' If it is pure checksum corruption, wouldn't this fail because it is not a failure (exit code is 0)? > + > test_done
Re: [PATCH v3 16/20] commit-graph: verify contents match checksum
> diff --git a/commit-graph.c b/commit-graph.c > index d2b291aca2..a33600c584 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir, > oids.nr =3D 0; > } > =20 > +#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2 > static int verify_commit_graph_error; > =20 > static void graph_report(const char *fmt, ...) > @@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...) > int verify_commit_graph(struct commit_graph *g) > { > uint32_t i, cur_fanout_pos =3D 0; > - struct object_id prev_oid, cur_oid; > + struct object_id prev_oid, cur_oid, checksum; > + struct hashfile *f; > + int devnull; > =20 > if (!g) { > graph_report("no commit-graph file loaded"); > @@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g) > if (verify_commit_graph_error) > return verify_commit_graph_error; > =20 > + devnull =3D open("/dev/null", O_WRONLY); > + f =3D hashfd(devnull, NULL); > + hashwrite(f, g->data, g->data_len - g->hash_len); > + finalize_hashfile(f, checksum.hash, CSUM_CLOSE); > + if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) { > + graph_report(_("the commit-graph file has incorrect checksum > and is likely corrupt")); This error message is translated ... > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index 240aef6add..2680a2ebff 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopu= > s merge' ' > "invalid parent" > ' > =20 > +test_expect_success 'detect invalid checksum hash' ' > + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \ > + "incorrect checksum" ... but here in 'corrupt_graph_and_verify' you look for "incorrect checksum" with plain 'grep' (as opposed to 'test_i18ngrep', which won't find that string in a GETTEXT_POISON build, and ultimately causes the test to fail.