Re: [PATCH v3 16/20] commit-graph: verify contents match checksum

2018-06-04 Thread Derrick Stolee

On 6/2/2018 11:52 AM, Jakub Narebski wrote:

Derrick Stolee  writes:


The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Should we then provide --quiet / --verbose options, so that ordinary
user is not flooded with error messages meant for power users and Git
developers, then?


Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Well, we could have used sha1sum program, or test-sha1 helper to fix the
checksum after corrupting the commit-graph file...


Signed-off-by: Derrick Stolee 
---
  commit-graph.c  | 16 ++--
  t/t5318-commit-graph.sh |  6 ++
  2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index d2b291aca2..a33600c584 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
  }
  
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2

  static int verify_commit_graph_error;
  
  static void graph_report(const char *fmt, ...)

@@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...)
  int verify_commit_graph(struct commit_graph *g)
  {
uint32_t i, cur_fanout_pos = 0;
-   struct object_id prev_oid, cur_oid;
+   struct object_id prev_oid, cur_oid, checksum;
+   struct hashfile *f;
+   int devnull;
  
  	if (!g) {

graph_report("no commit-graph file loaded");
@@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g)
if (verify_commit_graph_error)
return verify_commit_graph_error;
  
+	devnull = open("/dev/null", O_WRONLY);

+   f = hashfd(devnull, NULL);
+   hashwrite(f, g->data, g->data_len - g->hash_len);
+   finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+   if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+   graph_report(_("the commit-graph file has incorrect checksum and is 
likely corrupt"));
+   verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+   }

Is it the best way of calculating the SHA-1 checksum that out internal
APIs provide?  Is it how SHA-1 checksum is calculated and checked for
packfiles?
This pattern is similar to hashfd_check() in csum-file.c, except we are 
hashing the file data directly instead of re-creating it from scratch 
(as is done for 'git index-pack --verify').





+
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit;
  
@@ -916,7 +928,7 @@ int verify_commit_graph(struct commit_graph *g)

cur_fanout_pos++;
}
  
-	if (verify_commit_graph_error)

+   if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
return verify_commit_graph_error;

So if we detected that checksum do not match, but we have not found an
error, we say that it is all right?


This only prevents us from stopping early. We will still report an error.



  
  	for (i = 0; i < g->num_commits; i++) {

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 240aef6add..2680a2ebff 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16`
  GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \
$GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS`
  GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4`
+GRAPH_BYTE_FOOTER=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES`
  
  # usage: corrupt_graph_and_verify   

  # Manipulates the commit-graph file at the position
@@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopus 
merge' '
"invalid parent"
  '
  
+test_expect_success 'detect invalid checksum hash' '

+   corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+   "incorrect checksum"

This would not work under GETTEXT_POISON, as the message is marked as
translatable, but corrupt_graph_and_verify uses 'grep' and not
'test_i18grep' from t/test-lib-functions.sh


I fixed this locally based on Szeder's comment.




+'

If it is pure checksum corruption, wouldn't this fail because it is not
a failure (exit code is 0)?


It is not zero, so the test passes.



Re: [PATCH v3 16/20] commit-graph: verify contents match checksum

2018-06-02 Thread Jakub Narebski
Derrick Stolee  writes:

> The commit-graph file ends with a SHA1 hash of the previous contents. If
> a commit-graph file has errors but the checksum hash is correct, then we
> know that the problem is a bug in Git and not simply file corruption
> after-the-fact.
>
> Compute the checksum right away so it is the first error that appears,
> and make the message translatable since this error can be "corrected" by
> a user by simply deleting the file and recomputing. The rest of the
> errors are useful only to developers.

Should we then provide --quiet / --verbose options, so that ordinary
user is not flooded with error messages meant for power users and Git
developers, then?

>
> Be sure to continue checking the rest of the file data if the checksum
> is wrong. This is important for our tests, as we break the checksum as
> we modify bytes of the commit-graph file.

Well, we could have used sha1sum program, or test-sha1 helper to fix the
checksum after corrupting the commit-graph file...

>
> Signed-off-by: Derrick Stolee 
> ---
>  commit-graph.c  | 16 ++--
>  t/t5318-commit-graph.sh |  6 ++
>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index d2b291aca2..a33600c584 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir,
>   oids.nr = 0;
>  }
>  
> +#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
>  static int verify_commit_graph_error;
>  
>  static void graph_report(const char *fmt, ...)
> @@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...)
>  int verify_commit_graph(struct commit_graph *g)
>  {
>   uint32_t i, cur_fanout_pos = 0;
> - struct object_id prev_oid, cur_oid;
> + struct object_id prev_oid, cur_oid, checksum;
> + struct hashfile *f;
> + int devnull;
>  
>   if (!g) {
>   graph_report("no commit-graph file loaded");
> @@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g)
>   if (verify_commit_graph_error)
>   return verify_commit_graph_error;
>  
> + devnull = open("/dev/null", O_WRONLY);
> + f = hashfd(devnull, NULL);
> + hashwrite(f, g->data, g->data_len - g->hash_len);
> + finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
> + if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
> + graph_report(_("the commit-graph file has incorrect checksum 
> and is likely corrupt"));
> + verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
> + }

Is it the best way of calculating the SHA-1 checksum that out internal
APIs provide?  Is it how SHA-1 checksum is calculated and checked for
packfiles?

> +
>   for (i = 0; i < g->num_commits; i++) {
>   struct commit *graph_commit;
>  
> @@ -916,7 +928,7 @@ int verify_commit_graph(struct commit_graph *g)
>   cur_fanout_pos++;
>   }
>  
> - if (verify_commit_graph_error)
> + if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
>   return verify_commit_graph_error;

So if we detected that checksum do not match, but we have not found an
error, we say that it is all right?

>  
>   for (i = 0; i < g->num_commits; i++) {
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 240aef6add..2680a2ebff 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=`expr $HASH_LEN + 16`
>  GRAPH_OCTOPUS_DATA_OFFSET=`expr $GRAPH_COMMIT_DATA_OFFSET + \
>   $GRAPH_COMMIT_DATA_WIDTH \* $NUM_COMMITS`
>  GRAPH_BYTE_OCTOPUS=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4`
> +GRAPH_BYTE_FOOTER=`expr $GRAPH_OCTOPUS_DATA_OFFSET + 4 \* $NUM_OCTOPUS_EDGES`
>  
>  # usage: corrupt_graph_and_verify   
>  # Manipulates the commit-graph file at the position
> @@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopus 
> merge' '
>   "invalid parent"
>  '
>  
> +test_expect_success 'detect invalid checksum hash' '
> + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> + "incorrect checksum"

This would not work under GETTEXT_POISON, as the message is marked as
translatable, but corrupt_graph_and_verify uses 'grep' and not
'test_i18grep' from t/test-lib-functions.sh

> +'

If it is pure checksum corruption, wouldn't this fail because it is not
a failure (exit code is 0)?

> +
>  test_done


Re: [PATCH v3 16/20] commit-graph: verify contents match checksum

2018-05-30 Thread SZEDER Gábor


> diff --git a/commit-graph.c b/commit-graph.c
> index d2b291aca2..a33600c584 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -841,6 +841,7 @@ void write_commit_graph(const char *obj_dir,
>   oids.nr =3D 0;
>  }
> =20
> +#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
>  static int verify_commit_graph_error;
> =20
>  static void graph_report(const char *fmt, ...)
> @@ -860,7 +861,9 @@ static void graph_report(const char *fmt, ...)
>  int verify_commit_graph(struct commit_graph *g)
>  {
>   uint32_t i, cur_fanout_pos =3D 0;
> - struct object_id prev_oid, cur_oid;
> + struct object_id prev_oid, cur_oid, checksum;
> + struct hashfile *f;
> + int devnull;
> =20
>   if (!g) {
>   graph_report("no commit-graph file loaded");
> @@ -879,6 +882,15 @@ int verify_commit_graph(struct commit_graph *g)
>   if (verify_commit_graph_error)
>   return verify_commit_graph_error;
> =20
> + devnull =3D open("/dev/null", O_WRONLY);
> + f =3D hashfd(devnull, NULL);
> + hashwrite(f, g->data, g->data_len - g->hash_len);
> + finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
> + if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
> + graph_report(_("the commit-graph file has incorrect checksum 
> and is likely corrupt"));

This error message is translated ...

> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 240aef6add..2680a2ebff 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh

> @@ -388,4 +389,9 @@ test_expect_success 'detect incorrect parent for octopu=
> s merge' '
>   "invalid parent"
>  '
> =20
> +test_expect_success 'detect invalid checksum hash' '
> + corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
> + "incorrect checksum"

... but here in 'corrupt_graph_and_verify' you look for "incorrect
checksum" with plain 'grep' (as opposed to 'test_i18ngrep', which
won't find that string in a GETTEXT_POISON build, and ultimately
causes the test to fail.