Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-08 Thread Junio C Hamano
Michael Haggerty  writes:

> So `ref_transaction_update()` *does* need to set or clear the `HAVE_NEW`
> and `HAVE_OLD` bits as I sketched, to impedance-match between the two
> conventions.

OK, so ignoring HAVE_NEW/HAVE_OLD bits that the callers of
ref_transaction_update() may set in flags, and having
ref_transaction_update() compute these bits based on new/old_sha1
pointers from scratch, would be the right thing to do.

IOW

flags &= ~(REF_HAVE_NEW|REF_HAVE_OLD);
if (new_sha1)
flags |= REF_HAVE_NEW;
if (old_sha1)
flags |= REF_HAVE_OLD;

and your earlier "Does the warning go away if you change the line
to" does essentially the same thing.

> It's a shame how much time we've wasted discussing this. Maybe the code
> is trying to be too clever/efficient and needs a rethink.

It might be the case, but I do not know what to blame is "the two
conventions", an over-eager compiler, or a confused commenter on the
thread (that's me), though ;-).


Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-08 Thread Michael Haggerty
On 09/08/2017 02:46 AM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> I did just realize one thing: `ref_transaction_update()` takes `flags`
>> as an argument and alters it using
>>
>>> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 
>>> 0);
>>
>> Perhaps gcc is *more* intelligent than we give it credit for, and is
>> actually worried that the `flags` argument passed in by the caller
>> might *already* have one of these bits set. In that case
>> `ref_transaction_add_update()` would indeed be called incorrectly.
>> Does the warning go away if you change that line to
>>
>>> if (new_sha1)
>>> flags |=REF_HAVE_NEW;
>>> else
>>> flags &= ~REF_HAVE_NEW;
>>> if (old_sha1)
>>> flags |=REF_HAVE_OLD;
>>> else
>>> flags &= ~REF_HAVE_OLD;
>>
>> ? This might be a nice change to have anyway, to isolate
>> `ref_transaction_update()` from mistakes by its callers.
> 
> I understand "drop HAVE_NEW bit if new_sha1 is NULL" part, but not
> the other side "add HAVE_NEW if new_SHA1 is not NULL"---doesn't the
> NEW/OLD flag exist exactly because some callers pass the address of
> an embedded oid.hash[] or null_sha1, instead of NULL, when one side 
> does not exist?  So new|old being NULL is a definite signal that we
> need to drop HAVE_NEW|OLD, but the reverse may not be true, no?  Is
> it OK to overwrite null_sha1[] that is passed from some codepaths?
> 
> ref_transaction_create and _delete pass null_sha1 on the missing
> side, while ref_transaction_verify passes NULL, while calling
> _update().  Should this distinction affect how _add_update() gets
> called?

There are two functions under discussion:

* `ref_transaction_add_update()` is the low-level, private function that
uses the `HAVE_{NEW,OLD}` bits to decide what to do.

* `ref_transaction_update()` (like
`ref_transaction_{create,delete,verify}()`) are public functions that
ignore the `HAVE_{NEW,OLD}` bits and base their behavior on whether
`new_sha1` and `old_sha1` are NULL.

Each of these functions has to support three possibilities for its SHA-1
arguments:

1. The SHA-1 is provided and not `null_sha1`—in this case it must match
the old value (if `old_sha1`) or it is the value to be set as the new
value (if `new_sha1`).

2. The SHA-1 is provided and is equal to `null_sha1`—in this case the
reference must not already exist (if `old_sha1` is `null_sha1`) or it
will be deleted (if `new_sha1` is `null_sha1`).

3. The SHA-1 is not provided at all—in this case the old value is
ignored (if `old_sha1` is not provided) or the reference is left
unchanged (if `new_sha1` is not provided).

Much of the current confusion stems because
`ref_transaction_add_update()` encodes the third condition using the
`REF_HAVE_*` bits, whereas `ref_transaction_update()` and its friends
encode the third condition by setting `old_sha1` or `new_sha1` to `NULL`.

So `ref_transaction_update()` *does* need to set or clear the `HAVE_NEW`
and `HAVE_OLD` bits as I sketched, to impedance-match between the two
conventions.

It's a shame how much time we've wasted discussing this. Maybe the code
is trying to be too clever/efficient and needs a rethink.

Michael


Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-07 Thread Junio C Hamano
Michael Haggerty  writes:

> I did just realize one thing: `ref_transaction_update()` takes `flags`
> as an argument and alters it using
>
>> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 
>> 0);
>
> Perhaps gcc is *more* intelligent than we give it credit for, and is
> actually worried that the `flags` argument passed in by the caller
> might *already* have one of these bits set. In that case
> `ref_transaction_add_update()` would indeed be called incorrectly.
> Does the warning go away if you change that line to
>
>> if (new_sha1)
>> flags |=REF_HAVE_NEW;
>> else
>> flags &= ~REF_HAVE_NEW;
>> if (old_sha1)
>> flags |=REF_HAVE_OLD;
>> else
>> flags &= ~REF_HAVE_OLD;
>
> ? This might be a nice change to have anyway, to isolate
> `ref_transaction_update()` from mistakes by its callers.

I understand "drop HAVE_NEW bit if new_sha1 is NULL" part, but not
the other side "add HAVE_NEW if new_SHA1 is not NULL"---doesn't the
NEW/OLD flag exist exactly because some callers pass the address of
an embedded oid.hash[] or null_sha1, instead of NULL, when one side 
does not exist?  So new|old being NULL is a definite signal that we
need to drop HAVE_NEW|OLD, but the reverse may not be true, no?  Is
it OK to overwrite null_sha1[] that is passed from some codepaths?

ref_transaction_create and _delete pass null_sha1 on the missing
side, while ref_transaction_verify passes NULL, while calling
_update().  Should this distinction affect how _add_update() gets
called?



Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-07 Thread Thomas Gummerer
On 09/07, Michael Haggerty wrote:
> On Wed, Sep 6, 2017 at 3:26 AM, Junio C Hamano  wrote:
> > Thomas Gummerer  writes:
> >
> >> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
> >> as the second parameter of memcpy.
> >> [...]
> >
> > It is hugely annoying to see a halfway-intelligent compiler forces
> > you to add such pointless asserts.
> >
> > The only way the compiler could error on this is by inferring the
> > fact that new_sha1/old_sha1 could be NULL by looking at the callsite
> > in ref_transaction_update() where these are used as conditionals to
> > set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
> > doing the whole-program analysis, the other two callsites of the
> > function pass the address of oid.hash[] in an oid structure so it
> > should know these won't be NULL.
> >
> > [...]
> >
> > I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
> > codepaths, though.  Perhaps we can instead declare !!new_sha1 means
> > we have the new side and rewrite the above part to
> >
> > if (new_sha1)
> > hashcpy(update->new_oid.hash, new_sha1);
> >
> > without an extra and totally pointless assert()?
> 
> The ultimate reason for those flags is that `struct ref_update` embeds
> `new_oid` and `old_oid` directly in the struct, so there is no way to
> set it to "NULL". (The `is_null_sha1` value is used for a different
> purpose.) So those flags keep track of whether the corresponding value
> is specified or absent.
> 
> Four of the five callers of `ref_transaction_add_update()` are
> constructing a new `ref_update` from an old one. They currently don't
> have to look into `flags`; they just pass it on (possibly changing a
> bit or two). Implementing your proposal would oblige those callers to
> change from something like

Thanks for the explanation!

> > new_update = ref_transaction_add_update(
> > transaction, "HEAD",
> > update->flags | REF_LOG_ONLY | REF_NODEREF,
> > update->new_oid.hash, update->old_oid.hash,
> > update->msg);
> 
> to
> 
> > new_update = ref_transaction_add_update(
> > transaction, "HEAD",
> > update->flags | REF_LOG_ONLY | REF_NODEREF,
> > (update->flags & REF_HAVE_NEW) ? update->new_oid.hash : NULL,
> > (update->flags & REF_HAVE_OLD) ? update->old_oid.hash : NULL,
> > update->msg);
> 
> It's not the end of the world, but it's annoying.
> `ref_transaction_add_update()` was meant to be a low-level,
> low-overhead way of allocating a `struct ref_update` and add it to a
> transaction.
> 
> Another solution (also annoying, but maybe a tad less so) would be to
> change the one iffy caller, `ref_transaction_update()`, to pass in a
> pointer to the null OID for `new_sha1` and `old_sha1` when the
> corresponding flags are turned off. That value would never be looked
> at, but it would hopefully reassure gcc.
> 
> I did just realize one thing: `ref_transaction_update()` takes `flags`
> as an argument and alters it using
> 
> > flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 
> > 0);
> 
> Perhaps gcc is *more* intelligent than we give it credit for, and is
> actually worried that the `flags` argument passed in by the caller
> might *already* have one of these bits set. In that case
> `ref_transaction_add_update()` would indeed be called incorrectly.
> Does the warning go away if you change that line to
> 
> > if (new_sha1)
> > flags |=REF_HAVE_NEW;
> > else
> > flags &= ~REF_HAVE_NEW;
> > if (old_sha1)
> > flags |=REF_HAVE_OLD;
> > else
> > flags &= ~REF_HAVE_OLD;
> 
> ?

Indeed that fixes it, great catch!  gcc is indeed smarter than we gave
it credit for, this makes a lot of sense.

Interestingly stripping away the flags fixes the compiler warning:

diff --git a/refs.c b/refs.c
index ba22f4acef..2e6871beac 100644
--- a/refs.c
+++ b/refs.c
@@ -921,6 +921,9 @@ int ref_transaction_update(struct ref_transaction 
*transaction,
return -1;
}
 
+   flags &= ~REF_HAVE_NEW;
+   flags &= ~REF_HAVE_OLD;
+
flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);
 
ref_transaction_add_update(transaction, refname, flags,

while checking that the flags are not there in the first place still
leaves the compiler warning (whether I use "die()" or just "return -1"
doesn't matter in this case):

diff --git a/refs.c b/refs.c
index ba22f4acef..62ff283755 100644
--- a/refs.c
+++ b/refs.c
@@ -921,6 +921,9 @@ int ref_transaction_update(struct ref_transaction 
*transaction,
return -1;
}
 
+   if ((flags & REF_HAVE_NEW) != 0 || (flags & REF_HAVE_OLD) != 0)
+   die("BUG: passed invalid flag to ref_transaction_update");
+
flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);
 

Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-07 Thread Michael Haggerty
On Wed, Sep 6, 2017 at 3:26 AM, Junio C Hamano  wrote:
> Thomas Gummerer  writes:
>
>> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
>> as the second parameter of memcpy.
>> [...]
>
> It is hugely annoying to see a halfway-intelligent compiler forces
> you to add such pointless asserts.
>
> The only way the compiler could error on this is by inferring the
> fact that new_sha1/old_sha1 could be NULL by looking at the callsite
> in ref_transaction_update() where these are used as conditionals to
> set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
> doing the whole-program analysis, the other two callsites of the
> function pass the address of oid.hash[] in an oid structure so it
> should know these won't be NULL.
>
> [...]
>
> I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
> codepaths, though.  Perhaps we can instead declare !!new_sha1 means
> we have the new side and rewrite the above part to
>
> if (new_sha1)
> hashcpy(update->new_oid.hash, new_sha1);
>
> without an extra and totally pointless assert()?

The ultimate reason for those flags is that `struct ref_update` embeds
`new_oid` and `old_oid` directly in the struct, so there is no way to
set it to "NULL". (The `is_null_sha1` value is used for a different
purpose.) So those flags keep track of whether the corresponding value
is specified or absent.

Four of the five callers of `ref_transaction_add_update()` are
constructing a new `ref_update` from an old one. They currently don't
have to look into `flags`; they just pass it on (possibly changing a
bit or two). Implementing your proposal would oblige those callers to
change from something like

> new_update = ref_transaction_add_update(
> transaction, "HEAD",
> update->flags | REF_LOG_ONLY | REF_NODEREF,
> update->new_oid.hash, update->old_oid.hash,
> update->msg);

to

> new_update = ref_transaction_add_update(
> transaction, "HEAD",
> update->flags | REF_LOG_ONLY | REF_NODEREF,
> (update->flags & REF_HAVE_NEW) ? update->new_oid.hash : NULL,
> (update->flags & REF_HAVE_OLD) ? update->old_oid.hash : NULL,
> update->msg);

It's not the end of the world, but it's annoying.
`ref_transaction_add_update()` was meant to be a low-level,
low-overhead way of allocating a `struct ref_update` and add it to a
transaction.

Another solution (also annoying, but maybe a tad less so) would be to
change the one iffy caller, `ref_transaction_update()`, to pass in a
pointer to the null OID for `new_sha1` and `old_sha1` when the
corresponding flags are turned off. That value would never be looked
at, but it would hopefully reassure gcc.

I did just realize one thing: `ref_transaction_update()` takes `flags`
as an argument and alters it using

> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 
> 0);

Perhaps gcc is *more* intelligent than we give it credit for, and is
actually worried that the `flags` argument passed in by the caller
might *already* have one of these bits set. In that case
`ref_transaction_add_update()` would indeed be called incorrectly.
Does the warning go away if you change that line to

> if (new_sha1)
> flags |=REF_HAVE_NEW;
> else
> flags &= ~REF_HAVE_NEW;
> if (old_sha1)
> flags |=REF_HAVE_OLD;
> else
> flags &= ~REF_HAVE_OLD;

? This might be a nice change to have anyway, to isolate
`ref_transaction_update()` from mistakes by its callers. For that
matter, one might want to be even more selective about what bits are
allowed in the `flags` argument to `ref_transaction_update()`'s
callers:

> flags &= REF_ALLOWED_FLAGS; /* value would need to be determined */

Michael


Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-06 Thread Thomas Gummerer
On Wed, Sep 6, 2017 at 2:26 AM, Junio C Hamano  wrote:
> Thomas Gummerer  writes:
>
>> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
>> as the second parameter of memcpy.
>>
>> In file included from refs.c:5:0:
>> refs.c: In function ‘ref_transaction_verify’:
>> cache.h:948:2: error: argument 2 null where non-null expected 
>> [-Werror=nonnull]
>>   memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ);
>>   ^~~~
>> In file included from git-compat-util.h:165:0,
>>  from cache.h:4,
>>  from refs.c:5:
>> /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared 
>> here
>>  extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
>>   ^~
>> ...
>> diff --git a/refs.c b/refs.c
>> index ba22f4acef..d8c12a9c44 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -896,10 +896,14 @@ struct ref_update *ref_transaction_add_update(
>>
>>   update->flags = flags;
>>
>> - if (flags & REF_HAVE_NEW)
>> + if (flags & REF_HAVE_NEW) {
>> + assert(new_sha1);
>>   hashcpy(update->new_oid.hash, new_sha1);
>> - if (flags & REF_HAVE_OLD)
>> + }
>> + if (flags & REF_HAVE_OLD) {
>> + assert(old_sha1);
>>   hashcpy(update->old_oid.hash, old_sha1);
>> + }
>
> It is hugely annoying to see a halfway-intelligent compiler forces
> you to add such pointless asserts.
>
> The only way the compiler could error on this is by inferring the
> fact that new_sha1/old_sha1 could be NULL by looking at the callsite
> in ref_transaction_update() where these are used as conditionals to
> set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
> doing the whole-program analysis, the other two callsites of the
> function pass the address of oid.hash[] in an oid structure so it
> should know these won't be NULL.
>
> Or is the compiler being really dumb and triggering an error for
> every use of
>
> memcpy(dst, src, size);
>
> that must now be written as
>
> assert(src);
> memcpy(dst, src, size);
>
> ???  That would be doubly annoying

No, I think it can't quite deal with the flags that are passed in.
I'm on a different
machine today, so I can't actually check, but that's what I would
expect at least.

> I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
> codepaths, though.  Perhaps we can instead declare !!new_sha1 means
> we have the new side and rewrite the above part to
>
> if (new_sha1)
> hashcpy(update->new_oid.hash, new_sha1);
>
> without an extra and totally pointless assert()?

Yeah, that seems much nicer.  I'll try that and send a new a patch
(though I won't
get to it before tomorrow).  Thanks for the review.

>>   update->msg = xstrdup_or_null(msg);
>>   return update;
>>  }


Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-05 Thread Junio C Hamano
Thomas Gummerer  writes:

> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
> as the second parameter of memcpy.
>
> In file included from refs.c:5:0:
> refs.c: In function ‘ref_transaction_verify’:
> cache.h:948:2: error: argument 2 null where non-null expected 
> [-Werror=nonnull]
>   memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ);
>   ^~~~
> In file included from git-compat-util.h:165:0,
>  from cache.h:4,
>  from refs.c:5:
> /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared 
> here
>  extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
>   ^~
> ...
> diff --git a/refs.c b/refs.c
> index ba22f4acef..d8c12a9c44 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -896,10 +896,14 @@ struct ref_update *ref_transaction_add_update(
>  
>   update->flags = flags;
>  
> - if (flags & REF_HAVE_NEW)
> + if (flags & REF_HAVE_NEW) {
> + assert(new_sha1);
>   hashcpy(update->new_oid.hash, new_sha1);
> - if (flags & REF_HAVE_OLD)
> + }
> + if (flags & REF_HAVE_OLD) {
> + assert(old_sha1);
>   hashcpy(update->old_oid.hash, old_sha1);
> + }

It is hugely annoying to see a halfway-intelligent compiler forces
you to add such pointless asserts.

The only way the compiler could error on this is by inferring the
fact that new_sha1/old_sha1 could be NULL by looking at the callsite
in ref_transaction_update() where these are used as conditionals to
set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
doing the whole-program analysis, the other two callsites of the
function pass the address of oid.hash[] in an oid structure so it
should know these won't be NULL.

Or is the compiler being really dumb and triggering an error for
every use of

memcpy(dst, src, size);

that must now be written as

assert(src);
memcpy(dst, src, size);

???  That would be doubly annoying.

I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
codepaths, though.  Perhaps we can instead declare !!new_sha1 means
we have the new side and rewrite the above part to

if (new_sha1)
hashcpy(update->new_oid.hash, new_sha1);

without an extra and totally pointless assert()?

>   update->msg = xstrdup_or_null(msg);
>   return update;
>  }