Re: [PATCH 0/10] towards clean leak-checker output

2017-09-06 Thread Jeff King
On Tue, Sep 05, 2017 at 10:41:51PM +0200, Martin Ågren wrote:

> FWIW, this series (combined with the other series you mentioned) makes
> t and t0001 pass for me with gcc/clang. There are actually some
> leaks in t, they're just silently being reported to stderr, since
> the exit statuses from git are hidden by pipes. Maybe you're already
> aware of it. Depending on your definition of "running clean" it might be
> out of scope for this series, which is of course still very interesting
> and enlightening as it stands.

Yeah, I saw those. I don't think they're worth hunting down at this
point. It's quite probable that the same leaks are exercised elsewhere,
and we'll catch them later.

Once the whole test suite runs without reporting any leaks via abort(),
I'll feel more compelled to start grepping stderr for other cases. :)

> One is in cmd_show (you did mention git show) where we leak data in rev.
> The other is some use of strdup. I can't immediately figure out how to
> get a useful stacktrace (you mentioned this as well) and it's past
> bed-time here. I'll try to play more with this tomorrow.

I think I got funny truncated stack traces with just LSAN that went away
with ASAN, but I didn't dig. If you try that, remember that:

  - Technically just SANITIZE=address turns on the leak detector, but
you need "leak" in the string to turn on the UNLEAK() magic. So use
"SANITIZE=address,leak".

  - You'll have to set ASAN_OPTIONS=abort_on_error=1 in the environment
manually to enable leak detection.

Once the test suite passes with leak detection on, I think we'd probably
do both of those automatically (but until then they make ASAN by itself
unusable).

> Note for self: getting rid of all pipes would probably also help flush
> out a few leaks (or "introduce" them, depending on your viewpoint).

Agreed. These leaks are really just a special form of bug. Those pipes
could be hiding other bugs, too. But again, I'm not too concerned. The
places that pipe git are doing so because they are not meant to be
testing that particular command (and it should generally be covered
elsewhere).

-Peff


Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Junio C Hamano
Jeff King  writes:

> In fact, the whole extract_argv0_path thing is pointless without
> RUNTIME_PREFIX. So I think one reasonable fix is just:
>
> diff --git a/exec_cmd.c b/exec_cmd.c
> index fb94aeba9c..09f05c3bc3 100644
> --- a/exec_cmd.c
> +++ b/exec_cmd.c
> @@ -5,7 +5,10 @@
>  #define MAX_ARGS 32
>  
>  static const char *argv_exec_path;
> +
> +#ifdef RUNTIME_PREFIX
>  static const char *argv0_path;
> +#endif
>  
>  char *system_path(const char *path)
>  {
> @@ -40,6 +43,7 @@ char *system_path(const char *path)
>  
>  void git_extract_argv0_path(const char *argv0)
>  {
> +#ifdef RUNTIME_PREFIX
>   const char *slash;
>  
>   if (!argv0 || !*argv0)
> @@ -49,6 +53,7 @@ void git_extract_argv0_path(const char *argv0)
>  
>   if (slash)
>   argv0_path = xstrndup(argv0, slash - argv0);
> +#endif
>  }
>  
>  void git_set_argv_exec_path(const char *exec_path)
>
> -Peff

Yeah, I kind of like this (I would have reduced the ifdef noise by
leaving a totally-unused argv0_path in the BSS, though).



Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Martin Ågren
On 5 September 2017 at 21:02, Jeff King  wrote:
> On Tue, Sep 05, 2017 at 07:50:10PM +0200, Martin Ågren wrote:
>
>> > That line is the setting of argv0_path, which is a global (and thus
>> > shouldn't be marked as leaking). Interestingly, it only happens with
>> > -O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
>> > what.
>> >
>> > I did most of my testing with clang-6.0, which gets this case right.
>>
>> Hmmm, I got the same wrong results (IMHO) from Valgrind, which
>> classified this as "definitely lost". Like you I found that -O0 helped.
>> And yes, that was with gcc. Maybe gcc with optimization somehow manages
>> to hide the pointers from these tools. I know too little about the
>> technical details to have any real ideas, though. My searches did not
>> bring up anything useful. (gcc 5.4.0)
>
> Yeah, I think it is just optimizing out the variable entirely. If
> RUNTIME_PREFIX isn't defined (and it's not for non-Windows platforms)
> then we never look at the variable at all, and it's a dead assignment.
> And the compiler can see that easily because it's got static linkage. So
> it drops the variable completely, but it can't drop the call to
> xstrdup() with the information in exec_cmd.c. It has to call the
> function and throw away the result, resulting in the leak.

I see. Yeah, that makes sense.

FWIW, this series (combined with the other series you mentioned) makes
t and t0001 pass for me with gcc/clang. There are actually some
leaks in t, they're just silently being reported to stderr, since
the exit statuses from git are hidden by pipes. Maybe you're already
aware of it. Depending on your definition of "running clean" it might be
out of scope for this series, which is of course still very interesting
and enlightening as it stands.

One is in cmd_show (you did mention git show) where we leak data in rev.
The other is some use of strdup. I can't immediately figure out how to
get a useful stacktrace (you mentioned this as well) and it's past
bed-time here. I'll try to play more with this tomorrow.

Note for self: getting rid of all pipes would probably also help flush
out a few leaks (or "introduce" them, depending on your viewpoint).

Martin


Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Jeff King
On Tue, Sep 05, 2017 at 07:50:10PM +0200, Martin Ågren wrote:

> > That line is the setting of argv0_path, which is a global (and thus
> > shouldn't be marked as leaking). Interestingly, it only happens with
> > -O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
> > what.
> >
> > I did most of my testing with clang-6.0, which gets this case right.
> 
> Hmmm, I got the same wrong results (IMHO) from Valgrind, which
> classified this as "definitely lost". Like you I found that -O0 helped.
> And yes, that was with gcc. Maybe gcc with optimization somehow manages
> to hide the pointers from these tools. I know too little about the
> technical details to have any real ideas, though. My searches did not
> bring up anything useful. (gcc 5.4.0)

Yeah, I think it is just optimizing out the variable entirely. If
RUNTIME_PREFIX isn't defined (and it's not for non-Windows platforms)
then we never look at the variable at all, and it's a dead assignment.
And the compiler can see that easily because it's got static linkage. So
it drops the variable completely, but it can't drop the call to
xstrdup() with the information in exec_cmd.c. It has to call the
function and throw away the result, resulting in the leak.

In fact, the whole extract_argv0_path thing is pointless without
RUNTIME_PREFIX. So I think one reasonable fix is just:

diff --git a/exec_cmd.c b/exec_cmd.c
index fb94aeba9c..09f05c3bc3 100644
--- a/exec_cmd.c
+++ b/exec_cmd.c
@@ -5,7 +5,10 @@
 #define MAX_ARGS   32
 
 static const char *argv_exec_path;
+
+#ifdef RUNTIME_PREFIX
 static const char *argv0_path;
+#endif
 
 char *system_path(const char *path)
 {
@@ -40,6 +43,7 @@ char *system_path(const char *path)
 
 void git_extract_argv0_path(const char *argv0)
 {
+#ifdef RUNTIME_PREFIX
const char *slash;
 
if (!argv0 || !*argv0)
@@ -49,6 +53,7 @@ void git_extract_argv0_path(const char *argv0)
 
if (slash)
argv0_path = xstrndup(argv0, slash - argv0);
+#endif
 }
 
 void git_set_argv_exec_path(const char *exec_path)

-Peff


Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Martin Ågren
On 5 September 2017 at 15:01, Jeff King  wrote:
> Using leak-checking tools like valgrind or LSAN is usually not all that
> productive with Git, because the output is far from clean. And _most_ of
> these are just not interesting, as they're either:
>
>   1. Real leaks, but ones that only be triggered a small, set number of
>  times per program.
>
>   2. Intentional leaks of variables as the main cmd_* functions go out
>  of scope (as opposed to manually cleaning).
>
> I focused here on getting t and t0001 to run clean against a
> leak-checking tool. That's a fraction of the total test suite, but my
> goal was have a tractable sample which could let us see if the path
> seems sane.
>
> I feel positive overall about the approach this series takes. The
> suppressions aren't too terrible to write, and I found some real (albeit
> minor) leaks in these few tests. I hope it can serve as a base for an
> ongoing effort to get the whole test suite clean.
>
> A few things to note:
>
>   - I switched from valgrind to ASAN/LSAN early on. It's just way
> faster, which makes the compile-test-fix cycle a lot more pleasant.
> With these patches, you should be able to do:
>
>   make SANITIZE=leak && (cd t && ./t1234-* -v -i)
>
> and get a leak report for a single script dumped to stderr.
>
> If you want to try it with t, you'll need the lock-file series I
> just posted at:
>
>   
> https://public-inbox.org/git/20170905121353.62zg3mtextmq5...@sigill.intra.peff.net/
>
> and the fix from Martin at:
>
>   
> https://public-inbox.org/git/CAN0heSqn59sFF3A-eQ593G+ZDWnO9pKM5F=sgisqk+prur-...@mail.gmail.com/
>
> to get a clean run.
>
>   - I'm using LSAN instead of the full-on ASAN because it's faster. The
> docs warn that it's a bit experimental, and I did notice a few funny
> behaviors (like truncated backtraces), but overall it seems fine.
> You can also do:
>
>   make SANITIZE=address &&
> (cd t && ASAN_OPTIONS=abort_on_error=1 ./t1234-* -v -i)
>
> to do a full ASAN run (the extra environment setting is necessary to
> override test-lib's defaults).
>
>   - gcc's leak-checker doesn't seem to handle reachability correctly (or
> maybe I'm holding it wrong). As a simple case, building with ASAN
> and running git-init complains:
>
>   $ make SANITIZE=address && ./git init foo
>   ...
>   Direct leak of 2 byte(s) in 1 object(s) allocated from:
>   #0 0x7f011dc699e0 in malloc 
> (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd99e0)
>   #1 0x558eeedbdab5 in do_xmalloc /home/peff/compile/git/wrapper.c:60
>   #2 0x558eeedbdbab in do_xmallocz 
> /home/peff/compile/git/wrapper.c:100
>   #3 0x558eeedbdd0d in xmallocz /home/peff/compile/git/wrapper.c:108
>   #4 0x558eeedbdd0d in xmemdupz /home/peff/compile/git/wrapper.c:124
>   #5 0x558eeedbdd0d in xstrndup /home/peff/compile/git/wrapper.c:130
>   #6 0x558eeea0507a in main /home/peff/compile/git/common-main.c:39
>   #7 0x7f011cf612e0 in __libc_start_main 
> (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
>
> That line is the setting of argv0_path, which is a global (and thus
> shouldn't be marked as leaking). Interestingly, it only happens with
> -O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
> what.
>
> I did most of my testing with clang-6.0, which gets this case right.

Hmmm, I got the same wrong results (IMHO) from Valgrind, which
classified this as "definitely lost". Like you I found that -O0 helped.
And yes, that was with gcc. Maybe gcc with optimization somehow manages
to hide the pointers from these tools. I know too little about the
technical details to have any real ideas, though. My searches did not
bring up anything useful. (gcc 5.4.0)

I guess clang will be my next attempt. And asan/lsan. Valgrind is slow..

I'll look through this series and get back if I have any input.

>   - I have no idea who close or far this is to covering the whole suite.
> Only 98 test scripts complete with no leaks. But many of those
> failures will be hitting the same leaks over and over. It looks like
> running "git show' hits a few, which is going to affect a lot of
> scripts. But bear in mind when reading the patches that this might
> be the first step on a successful road, or it could be a dead end. :)
>
> Most of this is actual leak fixes. The interesting part, I think, is the
> UNLEAK annotation in patch 10.
>
>   [01/10]: test-lib: --valgrind should not override --verbose-log
>   [02/10]: test-lib: set LSAN_OPTIONS to abort by default
>   [03/10]: add: free leaked pathspec after add_files_to_cache()
>   [04/10]: update-index: fix cache entry leak in add_one_file()
>   [05/10]: config: plug user_config leak
>   [06/10]: reset: make tree counting less confusing
>   [07/10]: reset: free allocated tree buffers
>   [08/10]: repository: free fields before overwriting 

[PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Jeff King
Using leak-checking tools like valgrind or LSAN is usually not all that
productive with Git, because the output is far from clean. And _most_ of
these are just not interesting, as they're either:

  1. Real leaks, but ones that only be triggered a small, set number of
 times per program.

  2. Intentional leaks of variables as the main cmd_* functions go out
 of scope (as opposed to manually cleaning).

I focused here on getting t and t0001 to run clean against a
leak-checking tool. That's a fraction of the total test suite, but my
goal was have a tractable sample which could let us see if the path
seems sane.

I feel positive overall about the approach this series takes. The
suppressions aren't too terrible to write, and I found some real (albeit
minor) leaks in these few tests. I hope it can serve as a base for an
ongoing effort to get the whole test suite clean.

A few things to note:

  - I switched from valgrind to ASAN/LSAN early on. It's just way
faster, which makes the compile-test-fix cycle a lot more pleasant.
With these patches, you should be able to do:

  make SANITIZE=leak && (cd t && ./t1234-* -v -i)

and get a leak report for a single script dumped to stderr.

If you want to try it with t, you'll need the lock-file series I
just posted at:

  
https://public-inbox.org/git/20170905121353.62zg3mtextmq5...@sigill.intra.peff.net/

and the fix from Martin at:

  
https://public-inbox.org/git/CAN0heSqn59sFF3A-eQ593G+ZDWnO9pKM5F=sgisqk+prur-...@mail.gmail.com/

to get a clean run.

  - I'm using LSAN instead of the full-on ASAN because it's faster. The
docs warn that it's a bit experimental, and I did notice a few funny
behaviors (like truncated backtraces), but overall it seems fine.
You can also do:

  make SANITIZE=address &&
(cd t && ASAN_OPTIONS=abort_on_error=1 ./t1234-* -v -i)

to do a full ASAN run (the extra environment setting is necessary to
override test-lib's defaults).

  - gcc's leak-checker doesn't seem to handle reachability correctly (or
maybe I'm holding it wrong). As a simple case, building with ASAN
and running git-init complains:

  $ make SANITIZE=address && ./git init foo
  ...
  Direct leak of 2 byte(s) in 1 object(s) allocated from:
  #0 0x7f011dc699e0 in malloc 
(/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd99e0)
  #1 0x558eeedbdab5 in do_xmalloc /home/peff/compile/git/wrapper.c:60
  #2 0x558eeedbdbab in do_xmallocz /home/peff/compile/git/wrapper.c:100
  #3 0x558eeedbdd0d in xmallocz /home/peff/compile/git/wrapper.c:108
  #4 0x558eeedbdd0d in xmemdupz /home/peff/compile/git/wrapper.c:124
  #5 0x558eeedbdd0d in xstrndup /home/peff/compile/git/wrapper.c:130
  #6 0x558eeea0507a in main /home/peff/compile/git/common-main.c:39
  #7 0x7f011cf612e0 in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x202e0)

That line is the setting of argv0_path, which is a global (and thus
shouldn't be marked as leaking). Interestingly, it only happens with
-O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
what.

I did most of my testing with clang-6.0, which gets this case right.

  - I have no idea who close or far this is to covering the whole suite.
Only 98 test scripts complete with no leaks. But many of those
failures will be hitting the same leaks over and over. It looks like
running "git show' hits a few, which is going to affect a lot of
scripts. But bear in mind when reading the patches that this might
be the first step on a successful road, or it could be a dead end. :)

Most of this is actual leak fixes. The interesting part, I think, is the
UNLEAK annotation in patch 10.

  [01/10]: test-lib: --valgrind should not override --verbose-log
  [02/10]: test-lib: set LSAN_OPTIONS to abort by default
  [03/10]: add: free leaked pathspec after add_files_to_cache()
  [04/10]: update-index: fix cache entry leak in add_one_file()
  [05/10]: config: plug user_config leak
  [06/10]: reset: make tree counting less confusing
  [07/10]: reset: free allocated tree buffers
  [08/10]: repository: free fields before overwriting them
  [09/10]: set_git_dir: handle feeding gitdir to itself
  [10/10]: add UNLEAK annotation for reducing leak false positives

 Makefile   |  3 +++
 builtin/add.c  |  3 +++
 builtin/commit.c   |  1 +
 builtin/config.c   | 11 +--
 builtin/init-db.c  |  2 ++
 builtin/ls-files.c |  1 +
 builtin/reset.c| 24 +---
 builtin/update-index.c |  4 +++-
 builtin/worktree.c |  2 ++
 environment.c  |  4 +++-
 git-compat-util.h  |  7 +++
 repository.c   | 14 +++---
 setup.c|  5 -
 t/test-lib.sh  |  7 ++-
 usage.c| 13 +
 15 files changed, 77 insertions(+), 24 deletions(-)

-Peff