Re: [PATCH 2/2] patch-ids: skip merge commits

2016-09-07 Thread Jeff King
On Wed, Sep 07, 2016 at 02:46:53PM -0400, Jeff King wrote:

> > With this change, commit_patch_id() will return 0 for merge commits
> > (indicating success) but it will not have touched the sha1! Which means it
> > may very well have all kinds of crap in the sha1 that may, or may not,
> > match another, real patch ID randomly.
> 
> Eek, thanks. Somehow I got it into my head that diff_flush_patch_id()
> below was what added it to the list, but clearly that is not the case.
> Looking at it again, I can't imagine how that is the case.

Ah, I see. I initially was looking at an older git (v2.6.x), in which
commit_patch_id() is a static function inside patch-ids.c, and we do not
do any lazy-load trickery. But note that the patch is still wrong even
there; it should return "-1" from commit_patch_id() to instruct the
caller not to add it to the hash.

Anyway...

> > I would suggest to simply copy the merge commit's SHA-1. It is no patch
> > ID, of course, but collisions are as unlikely as commit name collisions,
> > and it would make the "patch ID" of a merge commit deterministic again.
> 
> I agree that would work, though it does mean carrying extra useless
> entries in the patch_id hash. I'll see how bad it would be to simply
> omit them entirely, but this seems like a good fallback plan.

It's not too hard to do so, but it raises a question of what
"format-patch --base" would want. I've just sent another RFC cc-ing
folks interested in that area.

Thanks again for the review.

-Peff


Re: [PATCH 2/2] patch-ids: skip merge commits

2016-09-07 Thread Jeff King
On Wed, Sep 07, 2016 at 02:52:04PM +0200, Johannes Schindelin wrote:

> > diff --git a/patch-ids.c b/patch-ids.c
> > index 77e4663..b1f8514 100644
> > --- a/patch-ids.c
> > +++ b/patch-ids.c
> > @@ -7,10 +7,12 @@
> >  int commit_patch_id(struct commit *commit, struct diff_options *options,
> > unsigned char *sha1, int diff_header_only)
> >  {
> > -   if (commit->parents)
> > +   if (commit->parents) {
> > +   if (commit->parents->next)
> > +   return 0;
> > diff_tree_sha1(commit->parents->item->object.oid.hash,
> >commit->object.oid.hash, "", options);
> > -   else
> > +   } else
> 
> With this change, commit_patch_id() will return 0 for merge commits
> (indicating success) but it will not have touched the sha1! Which means it
> may very well have all kinds of crap in the sha1 that may, or may not,
> match another, real patch ID randomly.

Eek, thanks. Somehow I got it into my head that diff_flush_patch_id()
below was what added it to the list, but clearly that is not the case.
Looking at it again, I can't imagine how that is the case.

> I would suggest to simply copy the merge commit's SHA-1. It is no patch
> ID, of course, but collisions are as unlikely as commit name collisions,
> and it would make the "patch ID" of a merge commit deterministic again.

I agree that would work, though it does mean carrying extra useless
entries in the patch_id hash. I'll see how bad it would be to simply
omit them entirely, but this seems like a good fallback plan.

Thanks, and sorry for the obviously braindead patch.

-Peff


Re: [PATCH 2/2] patch-ids: skip merge commits

2016-09-07 Thread Johannes Schindelin
Hi Peff,

On Wed, 7 Sep 2016, Jeff King wrote:

> [...]
> This patch just ignores merge commits entirely when
> generating patch-ids, meaning they will never be matched
> (from either side of a symmetric-diff traversal).

... except it does not ignore merge commits:

> diff --git a/patch-ids.c b/patch-ids.c
> index 77e4663..b1f8514 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -7,10 +7,12 @@
>  int commit_patch_id(struct commit *commit, struct diff_options *options,
>   unsigned char *sha1, int diff_header_only)
>  {
> - if (commit->parents)
> + if (commit->parents) {
> + if (commit->parents->next)
> + return 0;
>   diff_tree_sha1(commit->parents->item->object.oid.hash,
>  commit->object.oid.hash, "", options);
> - else
> + } else

With this change, commit_patch_id() will return 0 for merge commits
(indicating success) but it will not have touched the sha1! Which means it
may very well have all kinds of crap in the sha1 that may, or may not,
match another, real patch ID randomly.

See e.g. the call site in builtin/log.c's prepare_bases():

[...]
if (commit_patch_id(commit, , sha1, 0))
die(_("cannot get patch id"));
ALLOC_GROW(bases->patch_id, bases->nr_patch_id + 1, 
bases->alloc_patch_id);
patch_id = bases->patch_id + bases->nr_patch_id;
hashcpy(patch_id->hash, sha1);
bases->nr_patch_id++;

So the sha1 is actually used, even if it was not initialized.

I would suggest to simply copy the merge commit's SHA-1. It is no patch
ID, of course, but collisions are as unlikely as commit name collisions,
and it would make the "patch ID" of a merge commit deterministic again.

I.e. something like

if (commit->parents) {
-   if (commit->parents->next)
+   if (commit->parents->next) {
+   hashcpy(sha1, commit->object.oid.hash);
return 0;
+   }
diff_tree_sha1(commit->parents->item->object.oid.hash,

on top.

Ciao,
Dscho