Re: [RFC PATCH 08/12] commit-graph: verify commit contents against odb

2018-04-20 Thread Jakub Narebski
Derrick Stolee  writes:

One more check that could been done, and which do not require accessing
the object database, would be testing correctness of the Large Edge List
(EDGE) chunk.

For each commit in the commit-graph (in the Commit Data (CDAT) chunk),
check if it has more than two parents (if the value for second parent is
different from 0x but has the most significant bit set).  If
there is any such commit, then.

1. Check that EDGE chunk exists
2. For each octopus merge:
   - check that the index into EDGE array is less than number of its
 elements (sanity check the index)
   - for each parent in the EDGE list, check if the position is valid:
 is less than number of commits in the commit-graph
   - check that list of parents in EDGE terminates
3. If feasible, also check that
   - all entries in EDGE chunk are referenced directly or indirectly
   - number of parent list terminators (with most significant bit set)
 is equal to the number of octopus merges (no overlap of parent
 lists) -- if it is considered an error

> When running 'git commit-graph check', compare the contents of the
> commits that are loaded from the commit-graph file with commits that are
> loaded directly from the object database. This includes checking the
> root tree object ID, commit date, and parents.

All right, this part requires checking the object database.

>
> In addition, verify the generation number calculation is correct for all
> commits in the commit-graph file.

But if I understand the code correctly, this one does not require
accessing the object database.  This is entirely separate check, and in
my opinion it should be a separate commit (a separate patch).

Also, there might be a problem with handling GENERATION_NUMBER_MAX.

>
> Signed-off-by: Derrick Stolee 
> ---
>  commit-graph.c | 82 ++
>  1 file changed, 82 insertions(+)

I guess testing for this would be hard - you would need to create
invalid commit-graph file...

>
> diff --git a/commit-graph.c b/commit-graph.c
> index 80a2ac2a6a..b5bce2bac4 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -899,5 +899,87 @@ int check_commit_graph(struct commit_graph *g)
>graph_commit->graph_pos, i);
>   }
>  
> + for (i = 0; i < g->num_commits; i++) {
> + struct commit *graph_commit, *odb_commit;
> + struct commit_list *graph_parents, *odb_parents;
> + int num_parents = 0;
> +
> + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
> +
> + graph_commit = lookup_commit(_oid);
> + odb_commit = (struct commit *)create_object(cur_oid.hash, 
> alloc_commit_node());
> + if (parse_commit_internal(odb_commit, 0, 0))
> + graph_report(_("failed to parse %s from object 
> database"), oid_to_hex(_oid));

Is it an error to have commit in the commit-graph that is not present in
the object database?

It may happen (if commit-graph is not automatically updated on gc and
pruning), that since creating the commit-graph, the user have deleted
the branch and pruned object database -- then commit-graph can contain
objects that are not present in the object database.

> +
> + if (oidcmp(_commit_tree_in_graph_one(g, 
> graph_commit)->object.oid,
> +get_commit_tree_oid(odb_commit)))
> + graph_report(_("root tree object ID for commit %s in 
> commit-graph is %s != %s"),
> +  oid_to_hex(_oid),
> +  
> oid_to_hex(get_commit_tree_oid(graph_commit)),
> +  
> oid_to_hex(get_commit_tree_oid(odb_commit)));

Looks good to me, nicely detailed error message.

> +
> + if (graph_commit->date != odb_commit->date)
> + graph_report(_("commit date for commit %s in 
> commit-graph is %"PRItime" != %"PRItime""),
> +  oid_to_hex(_oid),
> +  graph_commit->date,
> +  odb_commit->date);

Looks good to me, nicely detailed error message.
It is good to have the same format of the message.

> +
> +
> + graph_parents = graph_commit->parents;
> + odb_parents = odb_commit->parents;
> +
> + while (graph_parents) {
> + num_parents++;
> +
> + if (odb_parents == NULL)
> + graph_report(_("commit-graph parent list for 
> commit %s is too long (%d)"),
> +  oid_to_hex(_oid),
> +  num_parents);
> +
> + if (oidcmp(_parents->item->object.oid, 
> _parents->item->object.oid))
> + graph_report(_("commit-graph parent for %s is 
> %s != %s"),
> +   

[RFC PATCH 08/12] commit-graph: verify commit contents against odb

2018-04-17 Thread Derrick Stolee
When running 'git commit-graph check', compare the contents of the
commits that are loaded from the commit-graph file with commits that are
loaded directly from the object database. This includes checking the
root tree object ID, commit date, and parents.

In addition, verify the generation number calculation is correct for all
commits in the commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 80a2ac2a6a..b5bce2bac4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -899,5 +899,87 @@ int check_commit_graph(struct commit_graph *g)
 graph_commit->graph_pos, i);
}
 
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
+   int num_parents = 0;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   graph_commit = lookup_commit(_oid);
+   odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
+   if (parse_commit_internal(odb_commit, 0, 0))
+   graph_report(_("failed to parse %s from object 
database"), oid_to_hex(_oid));
+
+   if (oidcmp(_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report(_("root tree object ID for commit %s in 
commit-graph is %s != %s"),
+oid_to_hex(_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report(_("commit date for commit %s in 
commit-graph is %"PRItime" != %"PRItime""),
+oid_to_hex(_oid),
+graph_commit->date,
+odb_commit->date);
+
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   num_parents++;
+
+   if (odb_parents == NULL)
+   graph_report(_("commit-graph parent list for 
commit %s is too long (%d)"),
+oid_to_hex(_oid),
+num_parents);
+
+   if (oidcmp(_parents->item->object.oid, 
_parents->item->object.oid))
+   graph_report(_("commit-graph parent for %s is 
%s != %s"),
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid),
+
oid_to_hex(_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+
+   if (odb_parents != NULL)
+   graph_report(_("commit-graph parent list for commit %s 
terminates early"),
+oid_to_hex(_oid));
+
+   if (graph_commit->generation) {
+   uint32_t max_generation = 0;
+   graph_parents = graph_commit->parents;
+
+   while (graph_parents) {
+   if (graph_parents->item->generation == 
GENERATION_NUMBER_ZERO ||
+   graph_parents->item->generation == 
GENERATION_NUMBER_INFINITY)
+   graph_report(_("commit-graph has valid 
generation for %s but not its parent, %s"),
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid));
+   if (graph_parents->item->generation > 
max_generation)
+   max_generation = 
graph_parents->item->generation;
+   graph_parents = graph_parents->next;
+   }
+
+   if (graph_commit->generation != max_generation + 1)
+   graph_report(_("commit-graph has incorrect 
generation for %s"),
+oid_to_hex(_oid));
+   } else {
+   graph_parents = graph_commit->parents;
+
+   while (graph_parents) {
+   if (graph_parents->item->generation)
+   graph_report(_("commit-graph has 
generation ZERO for %s but not its parent, %s"),