Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-11-21 Thread Julien Moutinho
Le ven. 19 nov. 2021 17h17 -0500, Chet Ramey a écrit :
> This fix is in the most recent set of patches I released this week
> (it's patch 9).
Thank you!

They've been merged a few days ago in the staging branch of NixOS:
https://github.com/NixOS/nixpkgs/pull/146463



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-11-19 Thread Chet Ramey

On 10/12/21 3:38 PM, Julien Moutinho wrote:

Le mar. 05 oct. 2021 16h12 -0400, Chet Ramey a écrit :

On 10/5/21 1:50 PM, Dominique Martinet wrote:

If I change malloc_usable_size to return p->mh_nbytes instead of
maxbytes, then the crash disappears.[2]


That's the right fix.


Chet, when you'll have time, would you mind publishing such fix at:
https://ftp.gnu.org/gnu/bash/bash-5.1-patches/
Because then I could try to get it merged into Nixpkgs before
the next biannual release of NixOS next November.


This fix is in the most recent set of patches I released this week
(it's patch 9).

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-12 Thread Julien Moutinho
Le mar. 05 oct. 2021 16h12 -0400, Chet Ramey a écrit :
> On 10/5/21 1:50 PM, Dominique Martinet wrote:
> > If I change malloc_usable_size to return p->mh_nbytes instead of
> > maxbytes, then the crash disappears.[2]
> 
> That's the right fix.

Chet, when you'll have time, would you mind publishing such fix at:
https://ftp.gnu.org/gnu/bash/bash-5.1-patches/
Because then I could try to get it merged into Nixpkgs before
the next biannual release of NixOS next November.

Thanks in advance!



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-05 Thread Chet Ramey
On 10/5/21 1:50 PM, Dominique Martinet wrote:

> If I change malloc_usable_size to return p->mh_nbytes instead of
> maxbytes, then the crash disappears.[2]
> 
> I did not read the full bash malloc code but I suspect the buffer really
> could be grown, but we would need to fix p->mh_nbytes to maxbytes and
> also adjust the end block to pass sanity checks on free -- e.g. it
> should be considered as a lightweight inplace realloc.
> 
> I'm not sure we care enough to be honest and returning what is really
> usable feels like the simplest solution, what do you think?
> 

Thanks for your work tracking this down.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-05 Thread Chet Ramey
On 10/5/21 1:50 PM, Dominique Martinet wrote:

> Turns out I'm lucky enough on address consistency..
> 
> So,
>  - since we have a nice before/after with systemd, I took a moment to
> bisect it.
> It comes down to this commit[1] which is basically using
> malloc_usable_size() to use buffers beyond what it initially requested
> 
> [1] 
> https://github.com/systemd/systemd/commit/319a4f4bc46b230fc660321e99aaac1bc449deea
> 
> - through gdb/printf I see that the failing pointer has been allocated
> with (what comes down to) malloc(64), but malloc_usable_size returns
> 108, so systemd happily writes beyond the 64 bytes it originally
> requested -- which bash allocator treats as an overflow.

Sure. The bash malloc_usable_size basically returns the size of the chunk:
what you could realloc it to without having to go back for more memory
(that's how I interpreted `usable bytes', but that's clearly not how it's
intended). I suppose without the bounds checking that would have been fine.


> If I change malloc_usable_size to return p->mh_nbytes instead of
> maxbytes, then the crash disappears.[2]

That's the right fix.

> 
> I did not read the full bash malloc code but I suspect the buffer really
> could be grown, but we would need to fix p->mh_nbytes to maxbytes and
> also adjust the end block to pass sanity checks on free -- e.g. it
> should be considered as a lightweight inplace realloc.

If you want realloc, use realloc.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-05 Thread Dominique Martinet
Chet Ramey wrote on Mon, Oct 04, 2021 at 10:11:10PM -0400:
> > I'm running busybox sh in a unit (which starts properly), then
> > interactively test things from there.
> > 
> > Running in gdb does fail the same way as running normally, so I've also
> > been looking at that a bit, but nothing obvious poped up.
> > I'd like to trace back which allocation corresponds to the failing one,
> > and break from there next time.
> 
> Without the allocation tracing, which you don't get from allocations that
> aren't directly from bash, it's tedious, but doable. It only works if the
> memory layout is consistent enough that you get the same address failing
> every time (that is, the address being passed to the failing free() is the
> same).
> 
> If you do, you write it down, put a breakpoint in internal_malloc at the
> point where it returns the memory to its caller, and just run, breaking
> at that point, until malloc returns that address. Then run a backtrace and
> see where you are.

Turns out I'm lucky enough on address consistency..

So,
 - since we have a nice before/after with systemd, I took a moment to
bisect it.
It comes down to this commit[1] which is basically using
malloc_usable_size() to use buffers beyond what it initially requested

[1] 
https://github.com/systemd/systemd/commit/319a4f4bc46b230fc660321e99aaac1bc449deea

- through gdb/printf I see that the failing pointer has been allocated
with (what comes down to) malloc(64), but malloc_usable_size returns
108, so systemd happily writes beyond the 64 bytes it originally
requested -- which bash allocator treats as an overflow.


copying bash's malloc_usable_size here for convenience:

malloc_usable_size (mem)
 void *mem;
{
  register union mhead *p;
  register char *ap;
  register int maxbytes;


  if ((ap = (char *)mem) == 0)
return 0;

  /* Find the true start of the memory block to discover which bin */
  p = (union mhead *) ap - 1;
  if (p->mh_alloc == ISMEMALIGN)
{
  ap -= p->mh_nbytes;
  p = (union mhead *) ap - 1;
}

  /* XXX - should we return 0 if ISFREE? */
  maxbytes = binsize(p->mh_index);

  /* So the usable size is the maximum number of bytes in the bin less the
 malloc overhead */
  maxbytes -= MOVERHEAD + MSLOP;
  return (maxbytes);
}


If I change malloc_usable_size to return p->mh_nbytes instead of
maxbytes, then the crash disappears.[2]

I did not read the full bash malloc code but I suspect the buffer really
could be grown, but we would need to fix p->mh_nbytes to maxbytes and
also adjust the end block to pass sanity checks on free -- e.g. it
should be considered as a lightweight inplace realloc.

I'm not sure we care enough to be honest and returning what is really
usable feels like the simplest solution, what do you think?



[2] return p->mh_nbytes
--
diff --git a/lib/malloc/malloc.c b/lib/malloc/malloc.c
index 439f8ef11af2..ad27d793b7f4 100644
--- a/lib/malloc/malloc.c
+++ b/lib/malloc/malloc.c
@@ -1272,8 +1272,6 @@ malloc_usable_size (mem)
 {
   register union mhead *p;
   register char *ap;
-  register int maxbytes;
-
 
   if ((ap = (char *)mem) == 0)
 return 0;
@@ -1286,13 +1284,7 @@ malloc_usable_size (mem)
   p = (union mhead *) ap - 1;
 }
 
-  /* XXX - should we return 0 if ISFREE? */
-  maxbytes = binsize(p->mh_index);
-
-  /* So the usable size is the maximum number of bytes in the bin less the
- malloc overhead */
-  maxbytes -= MOVERHEAD + MSLOP;
-  return (maxbytes);
+  return p->mh_nbytes;
 }
 
 #if !defined (NO_VALLOC)

--


Thanks,
-- 
Dominique



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey

On 10/4/21 9:44 PM, Dominique Martinet wrote:

Chet Ramey wrote on Mon, Oct 04, 2021 at 09:23:11PM -0400:

   - I could reproduce the same as Julien, with -DDISABLE_MALLOC_WRAPPERS
the crash still happens when bash is run directly but nothing complains
in valgrind.


I assume you mean using systemd. Has anyone tried running a bash linked to
the systemd library that provides the getpw functions, but not as a systemd
unit? You could then run it in a debugger if it crashes, for instance.


I'm running busybox sh in a unit (which starts properly), then
interactively test things from there.

Running in gdb does fail the same way as running normally, so I've also
been looking at that a bit, but nothing obvious poped up.
I'd like to trace back which allocation corresponds to the failing one,
and break from there next time.


Without the allocation tracing, which you don't get from allocations that
aren't directly from bash, it's tedious, but doable. It only works if the
memory layout is consistent enough that you get the same address failing
every time (that is, the address being passed to the failing free() is the
same).

If you do, you write it down, put a breakpoint in internal_malloc at the
point where it returns the memory to its caller, and just run, breaking
at that point, until malloc returns that address. Then run a backtrace and
see where you are.



If you have a nixos system, not necessarily on master, this should be
enough to reproduce:


I don't.


Hm, that won't necessary work with LTO though.
If they call reallocarray, which is in libc, LTO means reallocarray can
call libc's realloc without going through symbol interposition.
(That's a discussion that came up on fedora-devel mailing list when
talking about LTO, and breaking LD_PRELOAD no longer overloading calls
internal to a lib)


Yeah, that would break a whole bunch of things that have worked forever.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Dominique Martinet
Chet Ramey wrote on Mon, Oct 04, 2021 at 09:23:11PM -0400:
> >   - I could reproduce the same as Julien, with -DDISABLE_MALLOC_WRAPPERS
> > the crash still happens when bash is run directly but nothing complains
> > in valgrind.
> 
> I assume you mean using systemd. Has anyone tried running a bash linked to
> the systemd library that provides the getpw functions, but not as a systemd
> unit? You could then run it in a debugger if it crashes, for instance.

I'm running busybox sh in a unit (which starts properly), then
interactively test things from there.

Running in gdb does fail the same way as running normally, so I've also
been looking at that a bit, but nothing obvious poped up.
I'd like to trace back which allocation corresponds to the failing one,
and break from there next time.

If you have a nixos system, not necessarily on master, this should be
enough to reproduce:
$ git clone -b master git://github.com/NixOS/nixpkgs.git
$ cd nixpkgs
$ nix-shell -A systemd
^ this will fetch systemd 249.4
  $ echo $out
  /nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4
  $ exit
$ nix-shell -A bash
^ likewise with bash 5.1-p8, feel free to prepare it otherwise
  $ echo $out
  /nix/store/qfb4j7w2fjjq953nd9xncz5mymj5n0kb-bash-5.1-p8
  $ exit
$ su
# systemd-run --pipe -p DynamicUser=1 -p BindPaths=/etc -p BindPaths=/nix -p 
BindPaths=/run -p MountAPIVFS=1 -p RootDirectory=/tmp $(readlink -e $(which 
bash))
[I have no name!@odin:/]$ 
LD_LIBRARY_PATH=/nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4/lib 
/nix/store/qfb4j7w2fjjq953nd9xncz5mymj5n0kb-bash-5.1-p8/bin/bash --norc -c 
'echo ~'
malloc: unknown:0: assertion botched
free: start and end chunk sizes differ
Aborting...


This would probably work with a non-nixos system, installing only the
nix command, but I didn't try.


> > This could mean that systemd is overflowing bash malloc safeguards as
> > you pointed out (I just don't understand why it wouldn't overflow with
> > internal malloc), but it could also mean that the memory has been
> > allocated somewhere else (e.g. libc's malloc) and freed by bash malloc.
> 
> I have a tough time with that one. If the bash free/realloc get memory that
> the bash malloc hasn't allocated, you're going to fail several sanity tests
> before you get to the point of checking for overflow.

Ok.
I was writing that with the assumption that bash would fail the check by
reading some uninialized memory after the allocated buffer, but you're
right that other sanity checks would fail before.. and valgrind would
have spotted that anyway, it definitely wasn't a good idea.

> > nss systemd has started using reallocarray() since v247 and that is not
> > tracked by bash, I would think that's a good candidate?
> 
> I can't see how? reallocarray() is not a memory allocation primitive. It's
> going to call malloc/realloc to do its work (it's essentially just a call
> to realloc(mem, nmemb * size)). Those will eventually call the bash
> malloc/realloc/free.

Hm, that won't necessary work with LTO though.
If they call reallocarray, which is in libc, LTO means reallocarray can
call libc's realloc without going through symbol interposition.
(That's a discussion that came up on fedora-devel mailing list when
talking about LTO, and breaking LD_PRELOAD no longer overloading calls
internal to a lib)

I assume nixos does not compile glibc with LTO, so in practice you are
correct here - it should fall back to bash's realloc.
(that explains my question about earlier versions of nss systemd
working)

-- 
Dominique



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey

On 10/4/21 8:39 PM, Dominique Martinet wrote:


(not sure how that works? bash internal
malloc just passes to free pointers it doesn't know about?)


What does this mean? What pointers it doesn't know about? (If bash realloc
or free gets a pointer that wasn't allocated by the bash malloc, it
throws an error.)

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey

On 10/4/21 8:15 PM, Dominique Martinet wrote:


(I've been following this with Julien as I can reproduce the behaviour
on my nixos system -- you don't have to run the latest systemd, just
install the derivation and use its path in LD_LIBRARY_PATH instead of
the system's... That also probably could bring its own set of
incompatibility but so far I'm getting the same behaviour as him running
systemd properly)


I'd love to have a simpler reproducer here. It's just hard for me to see
how just changing systemd, and nothing else, produces an error. There might
be some obscure bug in the bash malloc, but the code we're talking about
here -- the code that detects overflow -- hasn't changed in years.


OTOH, valgrind *should* complain when using the system malloc (configure
--without-bash-malloc), and it does not, so for me that means there
really is some weird thing happening. Forgive me for trusting valgrind
analysis more than bash malloc debugging here...


The bash overflow detection is much less sophisticated than valgrind,
that's for sure, but it's so simple it's fairly easy to verify.



  - I could reproduce the same as Julien, with -DDISABLE_MALLOC_WRAPPERS
the crash still happens when bash is run directly but nothing complains
in valgrind.


I assume you mean using systemd. Has anyone tried running a bash linked to
the systemd library that provides the getpw functions, but not as a systemd
unit? You could then run it in a debugger if it crashes, for instance.


This could mean that systemd is overflowing bash malloc safeguards as
you pointed out (I just don't understand why it wouldn't overflow with
internal malloc), but it could also mean that the memory has been
allocated somewhere else (e.g. libc's malloc) and freed by bash malloc.


I have a tough time with that one. If the bash free/realloc get memory that
the bash malloc hasn't allocated, you're going to fail several sanity tests
before you get to the point of checking for overflow.



nss systemd has started using reallocarray() since v247 and that is not
tracked by bash, I would think that's a good candidate?


I can't see how? reallocarray() is not a memory allocation primitive. It's
going to call malloc/realloc to do its work (it's essentially just a call
to realloc(mem, nmemb * size)). Those will eventually call the bash
malloc/realloc/free.


I don't have time right now, but I think adding an implementation for
reallocarray (wrapper around realloc which does exist) would be the next
thing to do.


You could try it, but I'm skeptical that it will have any effect.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Dominique Martinet
Dominique Martinet wrote on Tue, Oct 05, 2021 at 09:15:48AM +0900:
>  - I could reproduce the same as Julien, with -DDISABLE_MALLOC_WRAPPERS
> the crash still happens when bash is run directly but nothing complains
> in valgrind.
> This could mean that systemd is overflowing bash malloc safeguards as
> you pointed out (I just don't understand why it wouldn't overflow with
> internal malloc), but it could also mean that the memory has been
> allocated somewhere else (e.g. libc's malloc) and freed by bash malloc.
> 
> nss systemd has started using reallocarray() since v247 and that is not
> tracked by bash, I would think that's a good candidate?
> 
> I don't have time right now, but I think adding an implementation for
> reallocarray (wrapper around realloc which does exist) would be the next
> thing to do.

grmbl, curiosity killed the cat so I actually took a moment to try, and
while reallocarray *is* called, it doesn't seem to change anything, and
already was used plenty before (not sure how that works? bash internal
malloc just passes to free pointers it doesn't know about?)

So back to square one.


Here's the patch I used if anyone cares:

diff --git a/lib/malloc/malloc.c b/lib/malloc/malloc.c
index 439f8ef11af2..8819cadca3a7 100644
--- a/lib/malloc/malloc.c
+++ b/lib/malloc/malloc.c
@@ -1440,6 +1440,20 @@ realloc (mem, nbytes)
   return internal_realloc (mem, nbytes, (char *)NULL, 0, 0);
 }
 
+PTR_T
+reallocarray (mem, nmemb, size)
+   PTR_T mem;
+   size_t  nmemb;
+   size_t size;
+{
+  size_t nbytes;
+  if (__builtin_mul_overflow(nmemb, size, )) {
+ errno = ENOMEM;
+ return 0;
+  }
+  return internal_realloc (mem, nbytes, (char *)NULL, 0, 0);
+}
+
 void
 free (mem)
  PTR_T mem;

-- 
Dominique



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Dominique Martinet
Hello,

(I've been following this with Julien as I can reproduce the behaviour
on my nixos system -- you don't have to run the latest systemd, just
install the derivation and use its path in LD_LIBRARY_PATH instead of
the system's... That also probably could bring its own set of
incompatibility but so far I'm getting the same behaviour as him running
systemd properly)


So:
 - I'm with Andreas on this, valgrind would detect invalid accesses
--- except that if you stack both bash malloc and valgrind, then
valgrind will consider the slightly bigger buffer allocated by bash, so
the underlying overflow will not be detected when using the internal
malloc.

OTOH, valgrind *should* complain when using the system malloc (configure
--without-bash-malloc), and it does not, so for me that means there
really is some weird thing happening. Forgive me for trusting valgrind
analysis more than bash malloc debugging here...



 - I could reproduce the same as Julien, with -DDISABLE_MALLOC_WRAPPERS
the crash still happens when bash is run directly but nothing complains
in valgrind.
This could mean that systemd is overflowing bash malloc safeguards as
you pointed out (I just don't understand why it wouldn't overflow with
internal malloc), but it could also mean that the memory has been
allocated somewhere else (e.g. libc's malloc) and freed by bash malloc.

nss systemd has started using reallocarray() since v247 and that is not
tracked by bash, I would think that's a good candidate?

I don't have time right now, but I think adding an implementation for
reallocarray (wrapper around realloc which does exist) would be the next
thing to do.




 - Unrelated to the systemd bug, valgrind really seems thrown off by
wrappers..  I also get some invalid reads in the bash malloc code:

$ valgrind /bash --norc -c true
-> invalid free in evalstring, fixed by Andreas' patch
$ valgrind /bash --norc -c 'echo ~'
-> this aborts even without systemd in the library path for me, but only
with valgrind

==407545== Memcheck, a memory error detector
==407545== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==407545== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==407545== Command: /bash --norc -c echo\ ~
==407545== 
==407545== Invalid read of size 1
==407545==at 0x4E5DEC: internal_free.constprop.0 (malloc.c:957)
==407545==by 0x46B165: expand_word_internal (subst.c:10637)
==407545==by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==by 0x47057E: expand_words (subst.c:11345)
==407545==by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==by 0x426AA1: main (shell.c:741)
==407545==  Address 0x4a98090 is 16 bytes before a block of size 17 alloc'd
==407545==at 0x483E751: malloc (in 
/nix/store/fazpzv26bal3z6j0mvi8y3k54x3xxi81-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==407545==by 0x4912AB: xmalloc (xmalloc.c:114)
==407545==by 0x4E4ACD: tilde_expand (tilde.c:196)
==407545==by 0x437770: bash_tilde_expand (general.c:1211)
==407545==by 0x46A424: expand_word_internal (subst.c:10285)
==407545==by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==by 0x47057E: expand_words (subst.c:11345)
==407545==by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==by 0x426AA1: main (shell.c:741)
==407545== 
==407545== Invalid read of size 1
==407545==at 0x4E5E05: internal_free.constprop.0 (malloc.c:968)
==407545==by 0x46B165: expand_word_internal (subst.c:10637)
==407545==by 0x46FCDD: shell_expand_word_list.constprop.0 (subst.c:11865)
==407545==by 0x47057E: expand_word_list_internal (subst.c:11989)
==407545==by 0x47057E: expand_words (subst.c:11345)
==407545==by 0x44199E: execute_simple_command (execute_cmd.c:4377)
==407545==by 0x44199E: execute_command_internal (execute_cmd.c:846)
==407545==by 0x498E38: parse_and_execute (evalstring.c:489)
==407545==by 0x4280AC: run_one_command.isra.0 (shell.c:1440)
==407545==by 0x426AA1: main (shell.c:741)
==407545==  Address 0x4a98090 is 16 bytes before a block of size 17 alloc'd
==407545==at 0x483E751: malloc (in 
/nix/store/fazpzv26bal3z6j0mvi8y3k54x3xxi81-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==407545==by 0x4912AB: xmalloc (xmalloc.c:114)
==407545==by 0x4E4ACD: tilde_expand (tilde.c:196)
==407545==by 0x437770: bash_tilde_expand 

Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Andreas Schwab
On Okt 04 2021, Chet Ramey wrote:

> You'd think. This is the kind of overflow that will produce that error
> message from the bash malloc:

Only after the fact.  valgrind finds it before it is happening, and even
if the overflow hits a padding between memory blocks.

$ valgrind ./a.out 
==31974== Memcheck, a memory error detector
==31974== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31974== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==31974== Command: ./a.out
==31974== 
==31974== Invalid write of size 1
==31974==at 0x4006CB: main (in /home/andreas/a.out)
==31974==  Address 0x5213068 is 0 bytes after a block of size 40 alloc'd
==31974==at 0x4C312EF: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31974==by 0x40068F: main (in /home/andreas/a.out)
==31974== 
==31974== Invalid write of size 1
==31974==at 0x4006ED: main (in /home/andreas/a.out)
==31974==  Address 0x521318a is 0 bytes after a block of size 218 alloc'd
==31974==at 0x4C338CF: realloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31974==by 0x4006DE: main (in /home/andreas/a.out)

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey

On 10/4/21 4:44 PM, Andreas Schwab wrote:

On Okt 04 2021, Chet Ramey wrote:



I suspect this is a buffer overflow introduced between systemd-247 and
systemd-249. It's not caught when building bash without the bash malloc
because the default libc malloc probably doesn't do the bounds checking
the bash malloc does, even without malloc debugging turned on.


If it's a buffer overflow, then valgrind should be able to catch it
(when bash is configured --without-bash-malloc).  valgrind's bounds
checking is much more advanced than what a checking malloc can do.


You'd think. This is the kind of overflow that will produce that error
message from the bash malloc:

int
main(int c, char **v)
{
char*buf;

buf = (char *)malloc (40);
if (buf == 0) {
fprintf(stderr, "malloc failed\n");
exit(1);
}
buf[40]='\254';

buf = realloc(buf, 218);
buf[218]='\214';

free(buf);
exit(0);
}




--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey

On 10/4/21 5:28 PM, Julien Moutinho wrote:

On Okt 04 2021, Chet Ramey wrote:

I suspect this is a buffer overflow introduced between systemd-247 and
systemd-249. It's not caught when building bash without the bash malloc
because the default libc malloc probably doesn't do the bounds checking
the bash malloc does, even without malloc debugging turned on.


Chet, thanks for you detailed analysis,
I've opened an issue to get some inputs from systemd's devs:
https://github.com/systemd/systemd/issues/20931

Le lun. 04 oct. 2021 22h44 +0200, Andreas Schwab a écrit :

If it's a buffer overflow, then valgrind should be able to catch it
(when bash is configured --without-bash-malloc).  valgrind's bounds
checking is much more advanced than what a checking malloc can do.


Andreas, just to confirm that so far I'm unable to get a crash or error
when using --without-bash-malloc, even in valgrind (but I'm a newbie at 
valgrind).


If you want to make sure that you're checking bash with valgrind, even in
the presence of bash optimizing execution and performing an `exec' of `id'
without forking, use a bash builtin like `true' instead of `id'.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Julien Moutinho
On Okt 04 2021, Chet Ramey wrote:
> I suspect this is a buffer overflow introduced between systemd-247 and
> systemd-249. It's not caught when building bash without the bash malloc
> because the default libc malloc probably doesn't do the bounds checking
> the bash malloc does, even without malloc debugging turned on.

Chet, thanks for you detailed analysis,
I've opened an issue to get some inputs from systemd's devs:
https://github.com/systemd/systemd/issues/20931

Le lun. 04 oct. 2021 22h44 +0200, Andreas Schwab a écrit :
> If it's a buffer overflow, then valgrind should be able to catch it
> (when bash is configured --without-bash-malloc).  valgrind's bounds
> checking is much more advanced than what a checking malloc can do.

Andreas, just to confirm that so far I'm unable to get a crash or error
when using --without-bash-malloc, even in valgrind (but I'm a newbie at 
valgrind).

# systemd-run --pipe -p DynamicUser=1 -E LD_LIBRARY_PATH=$(nix-store -q $(which 
systemctl))/lib -pBindReadOnlyPaths={/etc,/nix,/run} -p RootDirectory=/run/bash 
-- $(readlink $(which valgrind)) --trace-children=yes -- $(readlink -e 
bash5-without-bash-malloc/bin/bash) --norc -c $(readlink $(which id))
> Running as unit: run-u3128.service
> ==669426== Memcheck, a memory error detector
> ==669426== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==669426== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
> ==669426== Command: 
> /nix/store/2kw8gj9lm1kn6zbpw5nf68h7msm1y716-bash-5.1-p8/bin/bash --norc -c 
> /nix/store/j93py7g2fd0qmxq5q2mhnvc6ziijkjb8-coreutils-8.32/bin/id
> ==669426== 
> ==669426== Memcheck, a memory error detector
> ==669426== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==669426== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
> ==669426== Command: 
> /nix/store/j93py7g2fd0qmxq5q2mhnvc6ziijkjb8-coreutils-8.32/bin/id
> ==669426== 
> ==669426== 
> ==669426== HEAP SUMMARY:
> ==669426== in use at exit: 3,550 bytes in 10 blocks
> ==669426==   total heap usage: 903 allocs, 893 frees, 5,165,001 bytes 
> allocated
> ==669426== 
> ==669426== LEAK SUMMARY:
> ==669426==definitely lost: 0 bytes in 0 blocks
> ==669426==indirectly lost: 0 bytes in 0 blocks
> ==669426==  possibly lost: 0 bytes in 0 blocks
> ==669426==still reachable: 3,446 bytes in 9 blocks
> ==669426== suppressed: 104 bytes in 1 blocks
> ==669426== Rerun with --leak-check=full to see details of leaked memory
> ==669426== 
> ==669426== For lists of detected and suppressed errors, rerun with: -s
> ==669426== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

# systemd-run --pipe -p DynamicUser=1 -E LD_LIBRARY_PATH=$(nix-store -q $(which 
systemctl))/lib -pBindReadOnlyPaths={/etc,/nix,/run} -p RootDirectory=/run/bash 
-- $(readlink -e bash5-without-bash-malloc/bin/bash) --norc -c $(readlink 
$(which id))
> Running as unit: run-u3109.service
> uid=62878(run-u3109) gid=62878(run-u3109) groups=62878(run-u3109)



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Andreas Schwab
On Okt 04 2021, Chet Ramey wrote:

> On 10/3/21 11:59 PM, Julien Moutinho wrote:
>> Bash Version: 5.1
>> Patch Level: 8
>> Release Status: release
>> Architecture: x86_64-linux
>> 
>> Description:
>> 
>> bash-5.1 reaches crashing code paths
>> when launched by systemd-249 or valgrind.
>> I cannot get such crashes when bash is built using:
>> ./configure --without-bash-malloc
>
> I suspect this is a buffer overflow introduced between systemd-247 and
> systemd-249. It's not caught when building bash without the bash malloc
> because the default libc malloc probably doesn't do the bounds checking
> the bash malloc does, even without malloc debugging turned on.

If it's a buffer overflow, then valgrind should be able to catch it
(when bash is configured --without-bash-malloc).  valgrind's bounds
checking is much more advanced than what a checking malloc can do.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Julien Moutinho
Le lun. 04 oct. 2021 14h51 -0400, Chet Ramey a écrit :
> It's a problem with valgrind, described in another thread with this
> subject. Build bash with -DDISABLE_MALLOC_WRAPPERS to work around it.
Thanks Chet, that flag makes those crashes disappear.
However the crash after _nss_systemd_getpwuid_r() remains:

# systemd-run --pipe -p DynamicUser=1 -E LD_LIBRARY_PATH=$(nix-store -q $(which 
systemctl))/lib -pBindReadOnlyPaths={/etc,/nix,/run} -p RootDirectory=/run/bash 
-- $(readlink -e bash5-with-bash-malloc/bin/bash) --norc -c $(readlink $(which 
id))
> Running as unit: run-u2893.service
> malloc: unknown:0: assertion botched
> realloc: start and end chunk sizes differ
> Aborting...

Strangely that crash no longer happens when bash is run within valgrind,
and the correct dynamic username is retrieved by libnss_systemd.so.2:

# systemd-run --pipe -p DynamicUser=1 -E LD_LIBRARY_PATH=$(nix-store -q $(which 
systemctl))/lib -pBindReadOnlyPaths={/etc,/nix,/run} -p RootDirectory=/run/bash 
-- $(readlink $(which valgrind)) -- $(readlink -e 
bash5-with-bash-malloc/bin/bash) --norc -c $(readlink $(which id))
> Running as unit: run-u2961.service
> ==649969== Memcheck, a memory error detector
> ==649969== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==649969== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
> ==649969== Command: 
> /nix/store/kjf6d8a28jia291s6vf1a1qi3apbk252-bash-5.1-p8/bin/bash --norc -c 
> /nix/store/j93py7g2fd0qmxq5q2mhnvc6ziijkjb8-coreutils-8.32/bin/id
> ==649969== 
> uid=63383(run-u2961) gid=63383(run-u2961) groups=63383(run-u2961)



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey
On 10/3/21 11:59 PM, Julien Moutinho wrote:
> Bash Version: 5.1
> Patch Level: 8
> Release Status: release
> Architecture: x86_64-linux
> 
> Description:
> 
> bash-5.1 reaches crashing code paths
> when launched by systemd-249 or valgrind.
> I cannot get such crashes when bash is built using:
> ./configure --without-bash-malloc

I suspect this is a buffer overflow introduced between systemd-247 and
systemd-249. It's not caught when building bash without the bash malloc
because the default libc malloc probably doesn't do the bounds checking
the bash malloc does, even without malloc debugging turned on.

It may be, but probably is not, due to the single change between the
bash-5.0 and bash-5.1 malloc: aligning returned memory on 16-byte
boundaries on systems with 64-bit pointers. (There is another change
between bash-4.4 and bash-5.0, using mmap for large allocations, but it's
not a factor here.)

I'll explain my reasoning as we go.

> Here's a Nix recipe for reproducible builds,
> but you should be able to reproduce the crash with your own build method.

I use the devel branch, which already has these defines, but I don't have
any handy systems running systemd-249. Anyway, onward.

>> malloc: unknown:0: assertion botched
>> malloc: 0x37c8fa10: allocated: last allocated from unknown:0
>> realloc: start and end chunk sizes differ
>> last command: (null)
>> Aborting...

This error tells us that realloc failed because the stored sizes of the
allocated block, which are stored before and after the block, don't match.
The bash malloc keeps track of the requested size of the block and uses
it in various checks.

The fact that the last allocation location is `unknown:0' means that
the allocation wasn't called directly by bash.

The important thing looks to be that bash is linked against systemd-249's
version of the nss library, and goes through systemd for the password
library functions. Non-systemd versions of the getpw functions don't
seem to exhibit this problem. systemd-247 doesn't exhibit this problem.

>> #3  0x0049e408 in xbotch (mem=mem@entry=0x37c8fa10, e=e@entry=8, 
>> s=0x4b9708 "realloc: start and end chunk sizes differ", 
>> file=file@entry=0x0, line=line@entry=0) at malloc.c:376
>> #4  0x0049f5c3 in internal_realloc (mem=0x37c8fa10, n=218, 
>> file=file@entry=0x0, line=line@entry=0, 
>> flags=flags@entry=0) at malloc.c:1150
>> #5  0x0049f867 in realloc (mem=, nbytes=> out>) at malloc.c:1440
>> #6  0x619871c02526 in greedy_realloc ()
>>from 
>> /nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4/lib/libnss_systemd.so.2
>> #7  0x619871c04c44 in read_line_full ()
>>from 
>> /nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4/lib/libnss_systemd.so.2
>> #8  0x619871c04eb6 in read_one_line_file ()
>>from 
>> /nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4/lib/libnss_systemd.so.2
>> #9  0x619871bfd1b8 in proc_cmdline_parse ()
>>from 
>> /nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4/lib/libnss_systemd.so.2
>> #10 0x619871bfa939 in log_parse_environment ()
>>from 
>> /nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4/lib/libnss_systemd.so.2
>> #11 0x619871b965a7 in __pthread_once_slow (once_control=0x619871c490a4 
>> , 
>> init_routine=0x619871bf3b20 ) at pthread_once.c:116
>> #12 0x619871bf40ff in _nss_systemd_getpwuid_r ()
>>from 
>> /nix/store/5iw38x4l9nyp2srnlxfi9l1q9np0pchq-systemd-249.4/lib/libnss_systemd.so.2
>> #13 0x619871ee7b46 in __getpwuid_r (uid=uid@entry=65483, 
>> resbuf=resbuf@entry=0x619871fdbf40 , 
>> buffer=, buflen=buflen@entry=1024, 
>> result=result@entry=0x7b3c23e18658) at ../nss/getXXbyYY_r.c:274
>> #14 0x619871ee7378 in getpwuid (uid=65483) at ../nss/getXXbyYY.c:135
>> #15 0x0041c9d2 in get_current_user_info () at shell.c:1869
>> #16 0x0041cc8a in shell_initialize () at shell.c:1932
>> #17 0x0041d1e8 in main (argc=2, argv=0x7b3c23e18918, 
>> env=0x7b3c23e18930) at shell.c:572

This tells us that this is internal to the systemd-249 getpwuid -- the
buffer is allocated there and reallocated there. There's no bash code
executed.

That points the finger at systemd-249 a little more -- bash built with
its internal malloc doesn't have a problem with realloc unless systemd
is running its getpwuid implementation. This is the case when bash is
linked against the functions from systemd-247.

> AFAICS:
> - No crash happens with bash5-with-bash-malloc on systemd-247.

So it doesn't appear to be bash (nor the bash malloc), since the bash
version is the same, the bash malloc is the same, and the code paths
are the same (getpwuid -> systemd nss code). The only thing that's
changed is the systemd version.

> - No crash happens with bash5-without-bash-malloc on systemd-247 or 
> systemd-249.

Because the default malloc doesn't do this checking.

> - A slightly different crash, still involving _nss_systemd_getpwuid_r,
>   happens with 

Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey
On 10/4/21 12:23 PM, Julien Moutinho wrote:
> Le lun. 04 oct. 2021 10h34 +0200, Andreas Schwab a écrit :
>> Here is a patch:
> Thanks Andreas, that particular crash disappears with this patch.
> However the crash after _nss_systemd_getpwuid_r() is still happening for me,
> and valgrind can still find a similar crash after source_builtin():

It's a problem with valgrind, described in another thread with this
subject. Build bash with -DDISABLE_MALLOC_WRAPPERS to work around it.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey
On 10/4/21 12:27 PM, Andreas Schwab wrote:
> On Okt 04 2021, Chet Ramey wrote:
> 
>> Nope. These are all functions internal to bash and the bash malloc, and
>> they are all defined.
> 
> The problem is that the xmalloc macro redirects directly to the internal
> malloc implementation, whereas the xfree function calls it indirectly
> through the free function.  The latter is seen by valgrind, the former
> isn't, so it didn't track it.  Thus the xmalloc, xrealloc, xfree macros
> need to be disabled if valgrind is used.

OK, now we're back to the invalidating some of valgrind's assumptions.
For this, you can define DISABLE_MALLOC_WRAPPERS while building bash,
which has been there since at least bash-2.05b. I haven't built it that
way in quite a while, but it shouldn't have any problems.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Andreas Schwab
On Okt 04 2021, Chet Ramey wrote:

> Nope. These are all functions internal to bash and the bash malloc, and
> they are all defined.

The problem is that the xmalloc macro redirects directly to the internal
malloc implementation, whereas the xfree function calls it indirectly
through the free function.  The latter is seen by valgrind, the former
isn't, so it didn't track it.  Thus the xmalloc, xrealloc, xfree macros
need to be disabled if valgrind is used.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Julien Moutinho
Le lun. 04 oct. 2021 10h34 +0200, Andreas Schwab a écrit :
> Here is a patch:
Thanks Andreas, that particular crash disappears with this patch.
However the crash after _nss_systemd_getpwuid_r() is still happening for me,
and valgrind can still find a similar crash after source_builtin():

$ nix -L build .#bash5-with-bash-malloc
$ valgrind result/bin/bash -c '. /dev/null'
> ==547049== Memcheck, a memory error detector
> ==547049== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==547049== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
> ==547049== Command: result/bin/bash -c .\ /dev/null
> ==547049== 
> ==547049== Invalid free() / delete / delete[] / realloc()
> ==547049==at 0x483F8E9: free (in 
> /nix/store/7s7hzqaf5imxmpjlxh2n6fs7ixml98ya-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==547049==by 0x47330F: xfree (xmalloc.c:150)
> ==547049==by 0x4644FA: unwind_frame_run_internal (unwind_prot.c:325)
> ==547049==by 0x4640B6: without_interrupts (unwind_prot.c:117)
> ==547049==by 0x464656: run_unwind_frame (unwind_prot.c:143)
> ==547049==by 0x483E59: source_builtin (source.def:197)
> ==547049==by 0x430960: execute_builtin (execute_cmd.c:4828)
> ==547049==by 0x4377AC: execute_builtin_or_function (execute_cmd.c:5341)
> ==547049==by 0x433DC4: execute_simple_command (execute_cmd.c:4597)
> ==547049==by 0x434A03: execute_command_internal (execute_cmd.c:846)
> ==547049==by 0x479E75: parse_and_execute (evalstring.c:495)
> ==547049==by 0x41C0A5: run_one_command (shell.c:1440)
> ==547049==  Address 0x4051f70 is in the brk data segment 0x4033000-0x4055fff
> ==547049== 
> ==547049== 
> ==547049== HEAP SUMMARY:
> ==547049== in use at exit: 0 bytes in 0 blocks
> ==547049==   total heap usage: 2,410 allocs, 2,411 frees, 156,759 bytes 
> allocated
> ==547049== 
> ==547049== All heap blocks were freed -- no leaks are possible
> ==547049== 
> ==547049== For lists of detected and suppressed errors, rerun with: -s
> ==547049== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)




Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey
On 10/4/21 11:08 AM, Andreas Schwab wrote:
> On Okt 04 2021, Chet Ramey wrote:
> 
>> Nope, I don't buy that as the reason. xfree (name of function) and xfree(x)
>> (macro defined in xmalloc.h) are not the same thing.
> 
> That's exactly the problem.  You cannot pass the return value from
> sh_xmalloc to xfree, only sh_xfree.

That is absolutely not true with the bash malloc. It might be that valgrind
makes certain assumptions about these functions, but those are not valid
assumptions against the bash malloc implementation. You can freely mix
calls to xmalloc/sh_xmalloc/malloc and xfree/sh_xfree/free. They all end up
using the same set of internal functions and the same allocation pool.

I've explained this previously, most recently in

https://lists.gnu.org/archive/html/bug-bash/2017-04/msg00053.html

>> and everything works correctly.
> 
> Nope, it's undefined behaviour, as pointed out by valgrind.

Nope. These are all functions internal to bash and the bash malloc, and
they are all defined.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Andreas Schwab
On Okt 04 2021, Chet Ramey wrote:

> Nope, I don't buy that as the reason. xfree (name of function) and xfree(x)
> (macro defined in xmalloc.h) are not the same thing.

That's exactly the problem.  You cannot pass the return value from
sh_xmalloc to xfree, only sh_xfree.

> and everything works correctly.

Nope, it's undefined behaviour, as pointed out by valgrind.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey
On 10/4/21 10:17 AM, Andreas Schwab wrote:
> On Okt 04 2021, Chet Ramey wrote:
> 
>> On 10/4/21 4:34 AM, Andreas Schwab wrote:
>>> On Okt 04 2021, Julien Moutinho wrote:
>>>
 - bash crashes inside valgrind too,
   but apparently something different is happening
   because it crashes even without systemd being involved:

 $ nix build .#bash5-with-bash-malloc
 $ valgrind result/bin/bash --norc -c true
> ==307088== Memcheck, a memory error detector
> ==307088== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==307088== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright 
> info
> ==307088== Command: result/bin/bash --norc -c true
> ==307088== 
> ==307088== Invalid free() / delete / delete[] / realloc()
> ==307088==at 0x483F8E9: free (in 
> /nix/store/7s7hzqaf5imxmpjlxh2n6fs7ixml98ya-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==307088==by 0x47330F: xfree (xmalloc.c:150)
> ==307088==by 0x4644FA: unwind_frame_run_internal (unwind_prot.c:325)
> ==307088==by 0x4640B6: without_interrupts (unwind_prot.c:117)
> ==307088==by 0x464656: run_unwind_frame (unwind_prot.c:143)
> ==307088==by 0x479ACA: parse_and_execute (evalstring.c:523)
> ==307088==by 0x41C0A5: run_one_command (shell.c:1440)
> ==307088==by 0x41D6A1: main (shell.c:741)
> ==307088==  Address 0x404be10 is in the brk data segment 
> 0x4033000-0x4054fff
>>>
>>> Here is a patch:
>>
>> How does this fix the problem with valgrind? How does wrapping xfree in a
>> local function help?
> 
> Because xfree is a function-like macro, so taking the address does not
> work.

Nope, I don't buy that as the reason. xfree (name of function) and xfree(x)
(macro defined in xmalloc.h) are not the same thing. Unless you mean that
the existence of the macro confuses valgrind? Or something else?

Indeed, when you step through the code in a debugger, breaking at the spot
in parse_prologue that installs the unwind-protect, you see:

(gdb) b evalstring.c:257
Breakpoint 1 at 0x48d1b5: file
/usr/homes/chet/src/bash/src/builtins/evalstring.c, line 257.
(gdb) r -c 'echo a'
Starting program:
/mnt/src-build/build/linux/chet/bash/bash-current.rhe7/bash -c 'echo a'

Breakpoint 1, parse_prologue (string=0x7c6150 "echo a", flags=20,
tag=0x4f4c33 "parse_and_execute top")
at /usr/homes/chet/src/bash/src/builtins/evalstring.c:257
257 add_unwind_protect (xfree, orig_string);
Missing separate debuginfos, use: debuginfo-install
glibc-2.17-324.el7_9.x86_64 ncurses-libs-5.9-14.20130511.el7_4.x86_64
(gdb) p orig_string
$1 = 0x7c6150 "echo a"
(gdb) s
add_unwind_protect (cleanup=0x486220 , arg=0x7c6150 "echo a")
at /usr/homes/chet/src/bash/src/unwind_prot.c:152
152   without_interrupts (add_unwind_protect_internal, (char *)cleanup, 
arg);

and everything works correctly.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Andreas Schwab
On Okt 04 2021, Chet Ramey wrote:

> On 10/4/21 4:34 AM, Andreas Schwab wrote:
>> On Okt 04 2021, Julien Moutinho wrote:
>> 
>>> - bash crashes inside valgrind too,
>>>   but apparently something different is happening
>>>   because it crashes even without systemd being involved:
>>>
>>> $ nix build .#bash5-with-bash-malloc
>>> $ valgrind result/bin/bash --norc -c true
 ==307088== Memcheck, a memory error detector
 ==307088== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
 ==307088== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright 
 info
 ==307088== Command: result/bin/bash --norc -c true
 ==307088== 
 ==307088== Invalid free() / delete / delete[] / realloc()
 ==307088==at 0x483F8E9: free (in 
 /nix/store/7s7hzqaf5imxmpjlxh2n6fs7ixml98ya-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
 ==307088==by 0x47330F: xfree (xmalloc.c:150)
 ==307088==by 0x4644FA: unwind_frame_run_internal (unwind_prot.c:325)
 ==307088==by 0x4640B6: without_interrupts (unwind_prot.c:117)
 ==307088==by 0x464656: run_unwind_frame (unwind_prot.c:143)
 ==307088==by 0x479ACA: parse_and_execute (evalstring.c:523)
 ==307088==by 0x41C0A5: run_one_command (shell.c:1440)
 ==307088==by 0x41D6A1: main (shell.c:741)
 ==307088==  Address 0x404be10 is in the brk data segment 
 0x4033000-0x4054fff
>> 
>> Here is a patch:
>
> How does this fix the problem with valgrind? How does wrapping xfree in a
> local function help?

Because xfree is a function-like macro, so taking the address does not
work.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Chet Ramey
On 10/4/21 4:34 AM, Andreas Schwab wrote:
> On Okt 04 2021, Julien Moutinho wrote:
> 
>> - bash crashes inside valgrind too,
>>   but apparently something different is happening
>>   because it crashes even without systemd being involved:
>>
>> $ nix build .#bash5-with-bash-malloc
>> $ valgrind result/bin/bash --norc -c true
>>> ==307088== Memcheck, a memory error detector
>>> ==307088== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
>>> ==307088== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright 
>>> info
>>> ==307088== Command: result/bin/bash --norc -c true
>>> ==307088== 
>>> ==307088== Invalid free() / delete / delete[] / realloc()
>>> ==307088==at 0x483F8E9: free (in 
>>> /nix/store/7s7hzqaf5imxmpjlxh2n6fs7ixml98ya-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==307088==by 0x47330F: xfree (xmalloc.c:150)
>>> ==307088==by 0x4644FA: unwind_frame_run_internal (unwind_prot.c:325)
>>> ==307088==by 0x4640B6: without_interrupts (unwind_prot.c:117)
>>> ==307088==by 0x464656: run_unwind_frame (unwind_prot.c:143)
>>> ==307088==by 0x479ACA: parse_and_execute (evalstring.c:523)
>>> ==307088==by 0x41C0A5: run_one_command (shell.c:1440)
>>> ==307088==by 0x41D6A1: main (shell.c:741)
>>> ==307088==  Address 0x404be10 is in the brk data segment 0x4033000-0x4054fff
> 
> Here is a patch:

How does this fix the problem with valgrind? How does wrapping xfree in a
local function help?

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Using systemd-249's libnss_systemd.so.2 triggers a crash in bash-5.1's malloc.c

2021-10-04 Thread Andreas Schwab
On Okt 04 2021, Julien Moutinho wrote:

> - bash crashes inside valgrind too,
>   but apparently something different is happening
>   because it crashes even without systemd being involved:
>
> $ nix build .#bash5-with-bash-malloc
> $ valgrind result/bin/bash --norc -c true
>> ==307088== Memcheck, a memory error detector
>> ==307088== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
>> ==307088== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
>> ==307088== Command: result/bin/bash --norc -c true
>> ==307088== 
>> ==307088== Invalid free() / delete / delete[] / realloc()
>> ==307088==at 0x483F8E9: free (in 
>> /nix/store/7s7hzqaf5imxmpjlxh2n6fs7ixml98ya-valgrind-3.16.1/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==307088==by 0x47330F: xfree (xmalloc.c:150)
>> ==307088==by 0x4644FA: unwind_frame_run_internal (unwind_prot.c:325)
>> ==307088==by 0x4640B6: without_interrupts (unwind_prot.c:117)
>> ==307088==by 0x464656: run_unwind_frame (unwind_prot.c:143)
>> ==307088==by 0x479ACA: parse_and_execute (evalstring.c:523)
>> ==307088==by 0x41C0A5: run_one_command (shell.c:1440)
>> ==307088==by 0x41D6A1: main (shell.c:741)
>> ==307088==  Address 0x404be10 is in the brk data segment 0x4033000-0x4054fff

Here is a patch:

diff --git i/builtins/evalstring.c w/builtins/evalstring.c
index 18928a17..ae684d26 100644
--- i/builtins/evalstring.c
+++ w/builtins/evalstring.c
@@ -197,6 +197,12 @@ parse_and_execute_cleanup (old_running_trap)
 parse_and_execute_level = 0;   /* XXX */
 }
 
+static void
+free_string (char *string)
+{
+  xfree (string);
+}
+
 static void
 parse_prologue (string, flags, tag)
  char *string;
@@ -247,7 +253,7 @@ parse_prologue (string, flags, tag)
 add_unwind_protect (parser_restore_alias, (char *)NULL);
 
   if (orig_string && ((flags & SEVAL_NOFREE) == 0))
-add_unwind_protect (xfree, orig_string);
+add_unwind_protect (free_string, orig_string);
   end_unwind_frame ();
 
   if (flags & (SEVAL_NONINT|SEVAL_INTERACT))

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."