Re: [PATCH] refs: make sure we never pass NULL to hashcpy
Michael Haggertywrites: > So `ref_transaction_update()` *does* need to set or clear the `HAVE_NEW` > and `HAVE_OLD` bits as I sketched, to impedance-match between the two > conventions. OK, so ignoring HAVE_NEW/HAVE_OLD bits that the callers of ref_transaction_update() may set in flags, and having ref_transaction_update() compute these bits based on new/old_sha1 pointers from scratch, would be the right thing to do. IOW flags &= ~(REF_HAVE_NEW|REF_HAVE_OLD); if (new_sha1) flags |= REF_HAVE_NEW; if (old_sha1) flags |= REF_HAVE_OLD; and your earlier "Does the warning go away if you change the line to" does essentially the same thing. > It's a shame how much time we've wasted discussing this. Maybe the code > is trying to be too clever/efficient and needs a rethink. It might be the case, but I do not know what to blame is "the two conventions", an over-eager compiler, or a confused commenter on the thread (that's me), though ;-).
Re: [PATCH] refs: make sure we never pass NULL to hashcpy
On 09/08/2017 02:46 AM, Junio C Hamano wrote: > Michael Haggertywrites: > >> I did just realize one thing: `ref_transaction_update()` takes `flags` >> as an argument and alters it using >> >>> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : >>> 0); >> >> Perhaps gcc is *more* intelligent than we give it credit for, and is >> actually worried that the `flags` argument passed in by the caller >> might *already* have one of these bits set. In that case >> `ref_transaction_add_update()` would indeed be called incorrectly. >> Does the warning go away if you change that line to >> >>> if (new_sha1) >>> flags |=REF_HAVE_NEW; >>> else >>> flags &= ~REF_HAVE_NEW; >>> if (old_sha1) >>> flags |=REF_HAVE_OLD; >>> else >>> flags &= ~REF_HAVE_OLD; >> >> ? This might be a nice change to have anyway, to isolate >> `ref_transaction_update()` from mistakes by its callers. > > I understand "drop HAVE_NEW bit if new_sha1 is NULL" part, but not > the other side "add HAVE_NEW if new_SHA1 is not NULL"---doesn't the > NEW/OLD flag exist exactly because some callers pass the address of > an embedded oid.hash[] or null_sha1, instead of NULL, when one side > does not exist? So new|old being NULL is a definite signal that we > need to drop HAVE_NEW|OLD, but the reverse may not be true, no? Is > it OK to overwrite null_sha1[] that is passed from some codepaths? > > ref_transaction_create and _delete pass null_sha1 on the missing > side, while ref_transaction_verify passes NULL, while calling > _update(). Should this distinction affect how _add_update() gets > called? There are two functions under discussion: * `ref_transaction_add_update()` is the low-level, private function that uses the `HAVE_{NEW,OLD}` bits to decide what to do. * `ref_transaction_update()` (like `ref_transaction_{create,delete,verify}()`) are public functions that ignore the `HAVE_{NEW,OLD}` bits and base their behavior on whether `new_sha1` and `old_sha1` are NULL. Each of these functions has to support three possibilities for its SHA-1 arguments: 1. The SHA-1 is provided and not `null_sha1`—in this case it must match the old value (if `old_sha1`) or it is the value to be set as the new value (if `new_sha1`). 2. The SHA-1 is provided and is equal to `null_sha1`—in this case the reference must not already exist (if `old_sha1` is `null_sha1`) or it will be deleted (if `new_sha1` is `null_sha1`). 3. The SHA-1 is not provided at all—in this case the old value is ignored (if `old_sha1` is not provided) or the reference is left unchanged (if `new_sha1` is not provided). Much of the current confusion stems because `ref_transaction_add_update()` encodes the third condition using the `REF_HAVE_*` bits, whereas `ref_transaction_update()` and its friends encode the third condition by setting `old_sha1` or `new_sha1` to `NULL`. So `ref_transaction_update()` *does* need to set or clear the `HAVE_NEW` and `HAVE_OLD` bits as I sketched, to impedance-match between the two conventions. It's a shame how much time we've wasted discussing this. Maybe the code is trying to be too clever/efficient and needs a rethink. Michael
Re: [PATCH] refs: make sure we never pass NULL to hashcpy
Michael Haggertywrites: > I did just realize one thing: `ref_transaction_update()` takes `flags` > as an argument and alters it using > >> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : >> 0); > > Perhaps gcc is *more* intelligent than we give it credit for, and is > actually worried that the `flags` argument passed in by the caller > might *already* have one of these bits set. In that case > `ref_transaction_add_update()` would indeed be called incorrectly. > Does the warning go away if you change that line to > >> if (new_sha1) >> flags |=REF_HAVE_NEW; >> else >> flags &= ~REF_HAVE_NEW; >> if (old_sha1) >> flags |=REF_HAVE_OLD; >> else >> flags &= ~REF_HAVE_OLD; > > ? This might be a nice change to have anyway, to isolate > `ref_transaction_update()` from mistakes by its callers. I understand "drop HAVE_NEW bit if new_sha1 is NULL" part, but not the other side "add HAVE_NEW if new_SHA1 is not NULL"---doesn't the NEW/OLD flag exist exactly because some callers pass the address of an embedded oid.hash[] or null_sha1, instead of NULL, when one side does not exist? So new|old being NULL is a definite signal that we need to drop HAVE_NEW|OLD, but the reverse may not be true, no? Is it OK to overwrite null_sha1[] that is passed from some codepaths? ref_transaction_create and _delete pass null_sha1 on the missing side, while ref_transaction_verify passes NULL, while calling _update(). Should this distinction affect how _add_update() gets called?
Re: [PATCH] refs: make sure we never pass NULL to hashcpy
On 09/07, Michael Haggerty wrote: > On Wed, Sep 6, 2017 at 3:26 AM, Junio C Hamanowrote: > > Thomas Gummerer writes: > > > >> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed > >> as the second parameter of memcpy. > >> [...] > > > > It is hugely annoying to see a halfway-intelligent compiler forces > > you to add such pointless asserts. > > > > The only way the compiler could error on this is by inferring the > > fact that new_sha1/old_sha1 could be NULL by looking at the callsite > > in ref_transaction_update() where these are used as conditionals to > > set HAVE_NEW/HAVE_OLD that are passed. Even if the compiler were > > doing the whole-program analysis, the other two callsites of the > > function pass the address of oid.hash[] in an oid structure so it > > should know these won't be NULL. > > > > [...] > > > > I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these > > codepaths, though. Perhaps we can instead declare !!new_sha1 means > > we have the new side and rewrite the above part to > > > > if (new_sha1) > > hashcpy(update->new_oid.hash, new_sha1); > > > > without an extra and totally pointless assert()? > > The ultimate reason for those flags is that `struct ref_update` embeds > `new_oid` and `old_oid` directly in the struct, so there is no way to > set it to "NULL". (The `is_null_sha1` value is used for a different > purpose.) So those flags keep track of whether the corresponding value > is specified or absent. > > Four of the five callers of `ref_transaction_add_update()` are > constructing a new `ref_update` from an old one. They currently don't > have to look into `flags`; they just pass it on (possibly changing a > bit or two). Implementing your proposal would oblige those callers to > change from something like Thanks for the explanation! > > new_update = ref_transaction_add_update( > > transaction, "HEAD", > > update->flags | REF_LOG_ONLY | REF_NODEREF, > > update->new_oid.hash, update->old_oid.hash, > > update->msg); > > to > > > new_update = ref_transaction_add_update( > > transaction, "HEAD", > > update->flags | REF_LOG_ONLY | REF_NODEREF, > > (update->flags & REF_HAVE_NEW) ? update->new_oid.hash : NULL, > > (update->flags & REF_HAVE_OLD) ? update->old_oid.hash : NULL, > > update->msg); > > It's not the end of the world, but it's annoying. > `ref_transaction_add_update()` was meant to be a low-level, > low-overhead way of allocating a `struct ref_update` and add it to a > transaction. > > Another solution (also annoying, but maybe a tad less so) would be to > change the one iffy caller, `ref_transaction_update()`, to pass in a > pointer to the null OID for `new_sha1` and `old_sha1` when the > corresponding flags are turned off. That value would never be looked > at, but it would hopefully reassure gcc. > > I did just realize one thing: `ref_transaction_update()` takes `flags` > as an argument and alters it using > > > flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : > > 0); > > Perhaps gcc is *more* intelligent than we give it credit for, and is > actually worried that the `flags` argument passed in by the caller > might *already* have one of these bits set. In that case > `ref_transaction_add_update()` would indeed be called incorrectly. > Does the warning go away if you change that line to > > > if (new_sha1) > > flags |=REF_HAVE_NEW; > > else > > flags &= ~REF_HAVE_NEW; > > if (old_sha1) > > flags |=REF_HAVE_OLD; > > else > > flags &= ~REF_HAVE_OLD; > > ? Indeed that fixes it, great catch! gcc is indeed smarter than we gave it credit for, this makes a lot of sense. Interestingly stripping away the flags fixes the compiler warning: diff --git a/refs.c b/refs.c index ba22f4acef..2e6871beac 100644 --- a/refs.c +++ b/refs.c @@ -921,6 +921,9 @@ int ref_transaction_update(struct ref_transaction *transaction, return -1; } + flags &= ~REF_HAVE_NEW; + flags &= ~REF_HAVE_OLD; + flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0); ref_transaction_add_update(transaction, refname, flags, while checking that the flags are not there in the first place still leaves the compiler warning (whether I use "die()" or just "return -1" doesn't matter in this case): diff --git a/refs.c b/refs.c index ba22f4acef..62ff283755 100644 --- a/refs.c +++ b/refs.c @@ -921,6 +921,9 @@ int ref_transaction_update(struct ref_transaction *transaction, return -1; } + if ((flags & REF_HAVE_NEW) != 0 || (flags & REF_HAVE_OLD) != 0) + die("BUG: passed invalid flag to ref_transaction_update"); + flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);
Re: [PATCH] refs: make sure we never pass NULL to hashcpy
On Wed, Sep 6, 2017 at 3:26 AM, Junio C Hamanowrote: > Thomas Gummerer writes: > >> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed >> as the second parameter of memcpy. >> [...] > > It is hugely annoying to see a halfway-intelligent compiler forces > you to add such pointless asserts. > > The only way the compiler could error on this is by inferring the > fact that new_sha1/old_sha1 could be NULL by looking at the callsite > in ref_transaction_update() where these are used as conditionals to > set HAVE_NEW/HAVE_OLD that are passed. Even if the compiler were > doing the whole-program analysis, the other two callsites of the > function pass the address of oid.hash[] in an oid structure so it > should know these won't be NULL. > > [...] > > I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these > codepaths, though. Perhaps we can instead declare !!new_sha1 means > we have the new side and rewrite the above part to > > if (new_sha1) > hashcpy(update->new_oid.hash, new_sha1); > > without an extra and totally pointless assert()? The ultimate reason for those flags is that `struct ref_update` embeds `new_oid` and `old_oid` directly in the struct, so there is no way to set it to "NULL". (The `is_null_sha1` value is used for a different purpose.) So those flags keep track of whether the corresponding value is specified or absent. Four of the five callers of `ref_transaction_add_update()` are constructing a new `ref_update` from an old one. They currently don't have to look into `flags`; they just pass it on (possibly changing a bit or two). Implementing your proposal would oblige those callers to change from something like > new_update = ref_transaction_add_update( > transaction, "HEAD", > update->flags | REF_LOG_ONLY | REF_NODEREF, > update->new_oid.hash, update->old_oid.hash, > update->msg); to > new_update = ref_transaction_add_update( > transaction, "HEAD", > update->flags | REF_LOG_ONLY | REF_NODEREF, > (update->flags & REF_HAVE_NEW) ? update->new_oid.hash : NULL, > (update->flags & REF_HAVE_OLD) ? update->old_oid.hash : NULL, > update->msg); It's not the end of the world, but it's annoying. `ref_transaction_add_update()` was meant to be a low-level, low-overhead way of allocating a `struct ref_update` and add it to a transaction. Another solution (also annoying, but maybe a tad less so) would be to change the one iffy caller, `ref_transaction_update()`, to pass in a pointer to the null OID for `new_sha1` and `old_sha1` when the corresponding flags are turned off. That value would never be looked at, but it would hopefully reassure gcc. I did just realize one thing: `ref_transaction_update()` takes `flags` as an argument and alters it using > flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : > 0); Perhaps gcc is *more* intelligent than we give it credit for, and is actually worried that the `flags` argument passed in by the caller might *already* have one of these bits set. In that case `ref_transaction_add_update()` would indeed be called incorrectly. Does the warning go away if you change that line to > if (new_sha1) > flags |=REF_HAVE_NEW; > else > flags &= ~REF_HAVE_NEW; > if (old_sha1) > flags |=REF_HAVE_OLD; > else > flags &= ~REF_HAVE_OLD; ? This might be a nice change to have anyway, to isolate `ref_transaction_update()` from mistakes by its callers. For that matter, one might want to be even more selective about what bits are allowed in the `flags` argument to `ref_transaction_update()`'s callers: > flags &= REF_ALLOWED_FLAGS; /* value would need to be determined */ Michael
Re: [PATCH] refs: make sure we never pass NULL to hashcpy
On Wed, Sep 6, 2017 at 2:26 AM, Junio C Hamanowrote: > Thomas Gummerer writes: > >> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed >> as the second parameter of memcpy. >> >> In file included from refs.c:5:0: >> refs.c: In function ‘ref_transaction_verify’: >> cache.h:948:2: error: argument 2 null where non-null expected >> [-Werror=nonnull] >> memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ); >> ^~~~ >> In file included from git-compat-util.h:165:0, >> from cache.h:4, >> from refs.c:5: >> /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared >> here >> extern void *memcpy (void *__restrict __dest, const void *__restrict __src, >> ^~ >> ... >> diff --git a/refs.c b/refs.c >> index ba22f4acef..d8c12a9c44 100644 >> --- a/refs.c >> +++ b/refs.c >> @@ -896,10 +896,14 @@ struct ref_update *ref_transaction_add_update( >> >> update->flags = flags; >> >> - if (flags & REF_HAVE_NEW) >> + if (flags & REF_HAVE_NEW) { >> + assert(new_sha1); >> hashcpy(update->new_oid.hash, new_sha1); >> - if (flags & REF_HAVE_OLD) >> + } >> + if (flags & REF_HAVE_OLD) { >> + assert(old_sha1); >> hashcpy(update->old_oid.hash, old_sha1); >> + } > > It is hugely annoying to see a halfway-intelligent compiler forces > you to add such pointless asserts. > > The only way the compiler could error on this is by inferring the > fact that new_sha1/old_sha1 could be NULL by looking at the callsite > in ref_transaction_update() where these are used as conditionals to > set HAVE_NEW/HAVE_OLD that are passed. Even if the compiler were > doing the whole-program analysis, the other two callsites of the > function pass the address of oid.hash[] in an oid structure so it > should know these won't be NULL. > > Or is the compiler being really dumb and triggering an error for > every use of > > memcpy(dst, src, size); > > that must now be written as > > assert(src); > memcpy(dst, src, size); > > ??? That would be doubly annoying No, I think it can't quite deal with the flags that are passed in. I'm on a different machine today, so I can't actually check, but that's what I would expect at least. > I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these > codepaths, though. Perhaps we can instead declare !!new_sha1 means > we have the new side and rewrite the above part to > > if (new_sha1) > hashcpy(update->new_oid.hash, new_sha1); > > without an extra and totally pointless assert()? Yeah, that seems much nicer. I'll try that and send a new a patch (though I won't get to it before tomorrow). Thanks for the review. >> update->msg = xstrdup_or_null(msg); >> return update; >> }
Re: [PATCH] refs: make sure we never pass NULL to hashcpy
Thomas Gummererwrites: > gcc on arch linux (version 7.1.1) warns that a NULL argument is passed > as the second parameter of memcpy. > > In file included from refs.c:5:0: > refs.c: In function ‘ref_transaction_verify’: > cache.h:948:2: error: argument 2 null where non-null expected > [-Werror=nonnull] > memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ); > ^~~~ > In file included from git-compat-util.h:165:0, > from cache.h:4, > from refs.c:5: > /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared > here > extern void *memcpy (void *__restrict __dest, const void *__restrict __src, > ^~ > ... > diff --git a/refs.c b/refs.c > index ba22f4acef..d8c12a9c44 100644 > --- a/refs.c > +++ b/refs.c > @@ -896,10 +896,14 @@ struct ref_update *ref_transaction_add_update( > > update->flags = flags; > > - if (flags & REF_HAVE_NEW) > + if (flags & REF_HAVE_NEW) { > + assert(new_sha1); > hashcpy(update->new_oid.hash, new_sha1); > - if (flags & REF_HAVE_OLD) > + } > + if (flags & REF_HAVE_OLD) { > + assert(old_sha1); > hashcpy(update->old_oid.hash, old_sha1); > + } It is hugely annoying to see a halfway-intelligent compiler forces you to add such pointless asserts. The only way the compiler could error on this is by inferring the fact that new_sha1/old_sha1 could be NULL by looking at the callsite in ref_transaction_update() where these are used as conditionals to set HAVE_NEW/HAVE_OLD that are passed. Even if the compiler were doing the whole-program analysis, the other two callsites of the function pass the address of oid.hash[] in an oid structure so it should know these won't be NULL. Or is the compiler being really dumb and triggering an error for every use of memcpy(dst, src, size); that must now be written as assert(src); memcpy(dst, src, size); ??? That would be doubly annoying. I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these codepaths, though. Perhaps we can instead declare !!new_sha1 means we have the new side and rewrite the above part to if (new_sha1) hashcpy(update->new_oid.hash, new_sha1); without an extra and totally pointless assert()? > update->msg = xstrdup_or_null(msg); > return update; > }