Re: [PATCH v2 1/4] automatically ban strcpy()

2018-07-26 Thread Jeff King
On Tue, Jul 24, 2018 at 01:20:58PM -0400, Eric Sunshine wrote:

> On Tue, Jul 24, 2018 at 5:26 AM Jeff King  wrote:
> >   1. We'll only trigger with -Wimplicit-function-declaration
> >  (and only stop compilation with -Werror). These are
> >  generally enabled by DEVELOPER=1. If you _don't_ have
> >  that set, we'll still catch the problem, but only at
> >  link-time, with a slightly less useful message:
> >
> >  If instead we convert this to a reference to an
> >  undefined variable, that always dies immediately. But
> >  gcc seems to print the set of errors twice, which
> >  clutters things up.
> 
> The above does a pretty good job of convincing me that this ought to
> be implemented via an undefined variable rather than undefined
> function, exactly because it is the newcomer or casual contributor who
> is most likely to trip over a banned function, and almost certainly
> won't have DEVELOPER=1 set. The gcc clutter seems a minor point
> against the benefit this provides to that audience.

OK. I was on the fence, but it should be pretty trivial to switch. Let
me see if I can just make a replacement for patch 1, or if the whole
thing needs to be rebased on top.

-Peff


Re: [PATCH v2 1/4] automatically ban strcpy()

2018-07-24 Thread Eric Sunshine
On Tue, Jul 24, 2018 at 5:26 AM Jeff King  wrote:
>   1. We'll only trigger with -Wimplicit-function-declaration
>  (and only stop compilation with -Werror). These are
>  generally enabled by DEVELOPER=1. If you _don't_ have
>  that set, we'll still catch the problem, but only at
>  link-time, with a slightly less useful message:
>
>  If instead we convert this to a reference to an
>  undefined variable, that always dies immediately. But
>  gcc seems to print the set of errors twice, which
>  clutters things up.

The above does a pretty good job of convincing me that this ought to
be implemented via an undefined variable rather than undefined
function, exactly because it is the newcomer or casual contributor who
is most likely to trip over a banned function, and almost certainly
won't have DEVELOPER=1 set. The gcc clutter seems a minor point
against the benefit this provides to that audience.


[PATCH v2 1/4] automatically ban strcpy()

2018-07-24 Thread Jeff King
There are a few standard C functions (like strcpy) which are
easy to misuse. E.g.:

  char path[PATH_MAX];
  strcpy(path, arg);

may overflow the "path" buffer. Sometimes there's an earlier
constraint on the size of "arg", but even in such a case
it's hard to verify that the code is correct. If the size
really is unbounded, you're better off using a dynamic
helper like strbuf:

  struct strbuf path = STRBUF_INIT;
  strbuf_addstr(path, arg);

or if it really is bounded, then use xsnprintf to show your
expectation (and get a run-time assertion):

  char path[PATH_MAX];
  xsnprintf(path, sizeof(path), "%s", arg);

which makes further auditing easier.

We'd usually catch undesirable code like this in a review,
but there's no automated enforcement. Adding that
enforcement can help us be more consistent and save effort
(and a round-trip) during review.

This patch teaches the compiler to report an error when it
sees strcpy (and will become a model for banning a few other
functions). This has a few advantages over a separate
linting tool:

  1. We know it's run as part of a build cycle, so it's
 hard to ignore. Whereas an external linter is an extra
 step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
 parsing the code anyway.

  3. We know it's robust against false positives (unlike a
 grep-based linter).

The two big disadvantages are:

  1. We'll only check code that is actually compiled, so it
 may miss code that isn't triggered on your particular
 system. But since presumably people don't add new code
 without compiling it (and if they do, the banned
 function list is the least of their worries), we really
 only care about failing to clean up old code when
 adding new functions to the list. And that's easy
 enough to address with a manual audit when adding a new
 function.

  2. If this ends up generating false positives, it's going
 to be harder to disable (as opposed to a separate
 linter, which may have mechanisms for overriding a
 particular case).

 But the intent is to only ban functions which are
 obviously bad, and for which we accept using an
 alternative even when this particular use isn't buggy
 (e.g., the xsnprintf alternative above).

The implementation here is simple: we'll define a macro for
the banned function which replaces it with a descriptively
named but undefined function.  Replacing it with any invalid
code would work (since we just want to break compilation).
But ideally we'd meet these goals:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The output with this patch looks like this (using gcc 7, on
a checkout without 022d2ac1f3, which removes the final
strcpy from blame.c):

  CC builtin/blame.o
  In file included from ./git-compat-util.h:1242:0,
   from ./cache.h:4,
   from builtin/blame.c:8:
  builtin/blame.c: In function ‘cmd_blame’:
  ./banned.h:11:22: error: implicit declaration of function 
‘sorry_strcpy_is_a_banned_function’ [-Werror=implicit-function-declaration]
   #define BANNED(func) sorry_##func##_is_a_banned_function()
^
  ./banned.h:13:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
   ^~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
  strcpy(repeated_meta_color, GIT_COLOR_CYAN);
  ^~

This pretty clearly shows the original callsite along with
the descriptive function name, and it mentions banned.h,
which contains more information.

What's not perfectly ideal is:

  1. We'll only trigger with -Wimplicit-function-declaration
 (and only stop compilation with -Werror). These are
 generally enabled by DEVELOPER=1. If you _don't_ have
 that set, we'll still catch the problem, but only at
 link-time, with a slightly less useful message:

   /usr/bin/ld: builtin/blame.o: in function `cmd_blame':
   /home/peff/tmp/builtin/blame.c:1074: undefined reference to 
`sorry_strcpy_is_a_banned_function'
   collect2: error: ld returned 1 exit status

 If instead we convert this to a reference to an
 undefined variable, that always dies immediately. But
 gcc seems to print the set of errors twice, which
 clutters things up.

  2. The expansion through BANNED() adds an extra layer of
 error. Curiously, though, removing it (and just
 expanding strcpy directly to the bogus function name)
 causes gcc _not_ to report the