Re: [PATCH 2/7] get-tar-commit-id: check write_in_full() return against 0

2017-09-14 Thread Jeff King
On Wed, Sep 13, 2017 at 02:09:27PM -0700, Jonathan Nieder wrote:

> > We ask to write 41 bytes and make sure that the return value
> > is at least 41. This is the same "dangerous" pattern that
> > was fixed in the prior commit (wherein a negative return
> > value is promoted to unsigned), though it is not dangerous
> > here because our "41" is a constant, not an unsigned
> > variable.
> >
> > But we should convert it anyway to avoid modeling a
> > dangerous construct.
> >
> > Signed-off-by: Jeff King 
> > ---
> >  builtin/get-tar-commit-id.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> I kind of disagree with calling this dangerous (and I think that is
> what you alluded to above by putting it in quotes), but I like the
> postimage more than the preimage.

Right, this instance is fine, but the pattern of using "<" is not. If
you swapped out "41" for:

  size_t len = 41;

then it would be a bug. Which I think would surprise most people.

> The variable 'n' could be eliminated to simplify this further.  I
> realize that would go against the spirit of this patch, but (1) it's
> on-topic for the patch, since it is another ssize_t vs constant
> comparison and (2) as mentioned elsewhere in this thread, it's a very
> common idiom with read_in_full.  If we want to eliminate it then we
> could introduce a separate helper to distinguish between
> read_this_much_i_mean_it and read_this_much_or_to_eof.

Yes, I noticed that, too, after you brought up read_in_full() as a
potential source of problems. But I would rather deal with
read_in_full() separately on top. Can you do it as a separate patch on
top (possibly with other read_in_full() cleanups, though I think this is
the only "<" case that exists).

-Peff


Re: [PATCH 2/7] get-tar-commit-id: check write_in_full() return against 0

2017-09-13 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> We ask to write 41 bytes and make sure that the return value
> is at least 41. This is the same "dangerous" pattern that
> was fixed in the prior commit (wherein a negative return
> value is promoted to unsigned), though it is not dangerous
> here because our "41" is a constant, not an unsigned
> variable.
>
> But we should convert it anyway to avoid modeling a
> dangerous construct.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/get-tar-commit-id.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

I kind of disagree with calling this dangerous (and I think that is
what you alluded to above by putting it in quotes), but I like the
postimage more than the preimage.

The variable 'n' could be eliminated to simplify this further.  I
realize that would go against the spirit of this patch, but (1) it's
on-topic for the patch, since it is another ssize_t vs constant
comparison and (2) as mentioned elsewhere in this thread, it's a very
common idiom with read_in_full.  If we want to eliminate it then we
could introduce a separate helper to distinguish between
read_this_much_i_mean_it and read_this_much_or_to_eof.

Anyway, I think the below is an improvement.

With or without this tweak,
Reviewed-by: Jonathan Nieder 

Thanks.

diff --git i/builtin/get-tar-commit-id.c w/builtin/get-tar-commit-id.c
index 6d9a79f9b3..03ef7c5ba4 100644
--- i/builtin/get-tar-commit-id.c
+++ w/builtin/get-tar-commit-id.c
@@ -20,13 +20,11 @@ int cmd_get_tar_commit_id(int argc, const char **argv, 
const char *prefix)
struct ustar_header *header = (struct ustar_header *)buffer;
char *content = buffer + RECORDSIZE;
const char *comment;
-   ssize_t n;
 
if (argc != 1)
usage(builtin_get_tar_commit_id_usage);
 
-   n = read_in_full(0, buffer, HEADERSIZE);
-   if (n < HEADERSIZE)
+   if (read_in_full(0, buffer, HEADERSIZE) < HEADERSIZE)
die("git get-tar-commit-id: read error");
if (header->typeflag[0] != 'g')
return 1;


Re: [PATCH 2/7] get-tar-commit-id: check write_in_full() return against 0

2017-09-13 Thread Jeff King
On Wed, Sep 13, 2017 at 02:02:32PM -0400, Jeff King wrote:

> > Does read_in_full need a similar treatment?
> 
> It might actually return fewer than the requested number of bytes, so it
> can't just use "< 0" in the same way (nor be adapted to return 0 on
> success).  But I think it's still a bug to do:
> 
>   char buf[20];
>   size_t len = sizeof(buf);
>   if (read_in_full(fd, buf, len) < len)
>   die(...);
> 
> since that will promote the -1 to a size_t. So it's probably worth
> auditing.

I looked it over and found one potentially buggy case. In
read-pack_header(), we do:

  if (read_in_full(fd, header, sizeof(*header)) < sizeof(*header))
  /* "eof before pack header was fully read" */
  return PH_ERROR_EOF;

which will treat a read error as a silent success. I don't think it's
harmful in practice because the next line checks that the header
matches the PACK signature (which yes, means we're reading uninitialized
bytes, but the chances are 1 in 2^32 that they match the signature.
Unless the buffer had an old PACK signature in it, of course ;) ).

There's one other harmless "< len" check. There are a bunch of "!="
checks which are OK as-is. Converting them to something equally short is
hard, though we could introduce a helper similar to your write_fully(),
like:

  int read_exactly(int fd, char *buf, size_t len)
  {
ssize_t ret = read_in_full(fd, buf, len);
if (ret < 0)
return -1; /* real error */
else if (ret < len)
return -1; /* early eof */
return 0;
  }

But the trouble is that a _good_ caller actually handles those cases
separately to provide better error messages (by doing that same
if-cascade themselves). If those if's were turned into error() or die()
calls, then we'd actually be improving the callsites. But of course not
all of them actually print an error or die. And when they do, they
usually say something specific about _what_ they were trying to read.

So it may not be worth trying to do a mass-cleanup in the same way here.

-Peff


Re: [PATCH 2/7] get-tar-commit-id: check write_in_full() return against 0

2017-09-13 Thread Jeff King
On Wed, Sep 13, 2017 at 10:53:57AM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > We ask to write 41 bytes and make sure that the return value
> > is at least 41. This is the same "dangerous" pattern that
> > was fixed in the prior commit (wherein a negative return
> > value is promoted to unsigned), though it is not dangerous
> > here because our "41" is a constant, not an unsigned
> > variable.
> >
> > But we should convert it anyway to avoid modeling a
> > dangerous construct.
> 
> If the above logic is correct, then I suspect this series does not go
> far enough.  write_in_full() would be one of those APIs that is easy
> to misuse and difficult to use correctly, and if so we should fix that
> at the source instead of trying to teach callers not to hold it wrong.

Yes, this series is just removing bad examples. It doesn't do anything
to make write_in_full() less potentially dangerous.

On the other hand, it's no more or less dangerous than write(), which
has the same return-value semantics.

> E.g. what would you think of
> 
>  1. Introduce a write_fully (sorry, I am bad at names) function
> that returns 0 on success and a coccinelle semantic patch in
> contrib/coccinelle to migrate callers in "make coccicheck":

Yes, I considered that, though some callers really do care about
assigning the number of bytes written. The fact that write() has the
same problem, plus the fact that there were only 2 buggy instances
across the code base made me think there's not a huge gain to that extra
step.

> @@
> expression E;
> expression F;
> expression G;
> @@
> -write_in_full(E, F, G) < G
> +write_fully(E, F, G)
> 
>  2. Run "make coccicheck" and apply the result.
>  3. Remove the write_in_full function.

There's a step between those where you have to update all of the
write_in_full() callers that store the result. Some of them would be
trivial conversions, but some of them actually care about the length
E.g., the one in imap-send.c, which is the only one I didn't convert
away from "!= len" because it's half of an #ifdef with SSL_write()
(which uses an "int" as the return value!).

> Does read_in_full need a similar treatment?

It might actually return fewer than the requested number of bytes, so it
can't just use "< 0" in the same way (nor be adapted to return 0 on
success).  But I think it's still a bug to do:

  char buf[20];
  size_t len = sizeof(buf);
  if (read_in_full(fd, buf, len) < len)
  die(...);

since that will promote the -1 to a size_t. So it's probably worth
auditing.

-Peff


Re: [PATCH 2/7] get-tar-commit-id: check write_in_full() return against 0

2017-09-13 Thread Jonathan Nieder
Jeff King wrote:

> We ask to write 41 bytes and make sure that the return value
> is at least 41. This is the same "dangerous" pattern that
> was fixed in the prior commit (wherein a negative return
> value is promoted to unsigned), though it is not dangerous
> here because our "41" is a constant, not an unsigned
> variable.
>
> But we should convert it anyway to avoid modeling a
> dangerous construct.

If the above logic is correct, then I suspect this series does not go
far enough.  write_in_full() would be one of those APIs that is easy
to misuse and difficult to use correctly, and if so we should fix that
at the source instead of trying to teach callers not to hold it wrong.

E.g. what would you think of

 1. Introduce a write_fully (sorry, I am bad at names) function
that returns 0 on success and a coccinelle semantic patch in
contrib/coccinelle to migrate callers in "make coccicheck":

@@
expression E;
expression F;
expression G;
@@
-write_in_full(E, F, G) < G
+write_fully(E, F, G)

 2. Run "make coccicheck" and apply the result.
 3. Remove the write_in_full function.

Does read_in_full need a similar treatment?

Thanks,
Jonathan