Re: getdelim: Work around buggy implementation on macOS 10.13

2022-10-24 Thread Jeffrey Walton
On Sun, Oct 23, 2022 at 11:57 PM Simon Josefsson via Gnulib discussion
list  wrote:
>
> Bruno Haible  writes:
>
> > While testing a GNU sed snapshot on macOS 10.13, I see this test failure:
> ...
> > ==85029== Invalid read of size 16
> ...
> > An out-of-bounds read. Oh oh. When I reconfigure and recompile with the
> > environment variable
> >   gl_cv_func_working_getdelim=no
> > this test succeeds. Assuming that the GNU sed code is correct (it's not a
> > particularly complex code), this proves that the out-of-bounds read comes
> > from the macOS getdelim() function.
> ...
> > +  [case "$host_os" in
> > + darwin*)
> > +   dnl On macOS 10.13, valgrind detected an out-of-bounds read 
> > during
> > +   dnl the GNU sed test suite:
> > +   dnl   Invalid read of size 16
> > +   dnl  at 0x100EE6A05: _platform_memchr$VARIANT$Base (in 
> > /usr/lib/system/libsystem_platform.dylib)
> > +   dnl  by 0x100B7B0BD: getdelim (in 
> > /usr/lib/system/libsystem_c.dylib)
> > +   dnl  by 0x1B0BE: ck_getdelim (utils.c:254)
> > +   gl_cv_func_working_getdelim=no ;;
>
> I don't care strongly about this issue, but this brings up a design
> perspective for gnulib wrt valgrind: we have several valgrind
> suppression files (see lib/*.valgrind) to silence valgrind complaints
> already.  Your solution here choses a different path.
>
> It can be difficult to assess wether a valgrind complaint is a false
> positive or not, especially for system functions (and we've had valgrind
> complaints for libc issues that looked problematic in theory but not in
> practice).  Unless we can trigger a real bug in the code and test for
> that, we have a choice how to handle valgrind complaints: 1) add a
> valgrind suppressions file for the complaint, or 2) unconditionally
> bring in a gnulib replacement code without testing for that behaviour.
>
> I prefer 1) since 2) will over time leads to us bringing in the entire
> gnulib replacement code on all systems, which is really bloated and
> leads to other problems.
>
> What do you think about adding a valgrind suppressions file for the
> output you found instead?  And possibly improve documentation about how
> gnulib intends these suppression files to be used by people who run the
> test-suite under valgrind, if anything is missing in that area today.

The Valgrind guide says to build at -O0 or -O1. At -O2 or above
memcheck is not accurate. From [1]:

Compile your program with -g to include debugging
information so that Memchecks error messages include exact
line numbers. Using -O0 is also a good idea, if you can
tolerate the slowdown. With -O1 line numbers in error
messages can be inaccurate, although generally speaking
running Memcheck on code compiled at -O1 works fairly well,
and the speed improvement compared to running -O0 is quite
significant. Use of -O2 and above is not recommended as
Memcheck occasionally reports uninitialised-value errors
which dont really exist.

Before you attempt to classify as a false positive (or not), you
should build at -O1 or possibly -O0.

[1] https://valgrind.org/docs/manual/quick-start.html

Jeff



Re: getdelim: Work around buggy implementation on macOS 10.13

2022-10-23 Thread Bruno Haible
I wrote:
> TODO: Someone should test the same thing on the *BSD systems as well.

Done: On FreeBSD 13.1, with valgrind 3.19.0, in a 'sed' binary that uses
getdelim from FreeBSD's libc, there is no failure in sed's test suite.

valgrind is not ported to NetBSD and OpenBSD; let's assume that these
systems work like FreeBSD.

Bruno






Re: getdelim: Work around buggy implementation on macOS 10.13

2022-10-23 Thread Bruno Haible
Hi Simon,

> > +  [case "$host_os" in
> > + darwin*)
> > +   dnl On macOS 10.13, valgrind detected an out-of-bounds read 
> > during
> > +   dnl the GNU sed test suite:
> > +   dnl   Invalid read of size 16
> > +   dnl  at 0x100EE6A05: _platform_memchr$VARIANT$Base (in 
> > /usr/lib/system/libsystem_platform.dylib)
> > +   dnl  by 0x100B7B0BD: getdelim (in 
> > /usr/lib/system/libsystem_c.dylib)
> > +   dnl  by 0x1B0BE: ck_getdelim (utils.c:254)
> > +   gl_cv_func_working_getdelim=no ;;
> 
> ... this brings up a design
> perspective for gnulib wrt valgrind: we have several valgrind
> suppression files (see lib/*.valgrind) to silence valgrind complaints
> already.  Your solution here choses a different path.

For false positives, I would add a suppression file; for real bugs
I would add a workaround in gnulib.

> It can be difficult to assess wether a valgrind complaint is a false
> positive or not

Exactly. The source code of 'getdelim' in macOS has 120 lines of code,
with invocations of '__srefill' and 'sappend'. I don't have time to analyze
it in depth, because that would require to understand how these libc-
internal functions work.

I verified that by using the Gnulib code for 'getdelim' instead of the
system function, the valgrind error goes away. To me, that's a strong
indication that there is a bug in that macOS system function.

> Unless we can trigger a real bug in the code and test for
> that, we have a choice how to handle valgrind complaints: 1) add a
> valgrind suppressions file for the complaint, or 2) unconditionally
> bring in a gnulib replacement code without testing for that behaviour.
> 
> I prefer 1) since 2) will over time leads to us bringing in the entire
> gnulib replacement code on all systems, which is really bloated and
> leads to other problems.

I prefer 2) since I can't knowingly let Jim make a GNU sed release when
there is a high probability that this macOS system function has a bug
that can possibly be exploited like a CVE.

The "the entire gnulib replacement code on all systems" argument is
exaggerated, because
  - It's not on all systems; it's on those systems where we have seen
suspicious valgrind errors.
  - There are platforms on which a lot of Gnulib replacement code is
used. Such as Solaris 10 or mingw. Yes, it makes the binaries large.
Yes, it occasionally causes a symbol conflict here and there. But
that gets reported and fixed. It's not unmanageable.

Bruno






Re: getdelim: Work around buggy implementation on macOS 10.13

2022-10-23 Thread Simon Josefsson via Gnulib discussion list
Bruno Haible  writes:

> While testing a GNU sed snapshot on macOS 10.13, I see this test failure:
...
> ==85029== Invalid read of size 16
...
> An out-of-bounds read. Oh oh. When I reconfigure and recompile with the
> environment variable
>   gl_cv_func_working_getdelim=no
> this test succeeds. Assuming that the GNU sed code is correct (it's not a
> particularly complex code), this proves that the out-of-bounds read comes
> from the macOS getdelim() function.
...
> +  [case "$host_os" in
> + darwin*)
> +   dnl On macOS 10.13, valgrind detected an out-of-bounds read during
> +   dnl the GNU sed test suite:
> +   dnl   Invalid read of size 16
> +   dnl  at 0x100EE6A05: _platform_memchr$VARIANT$Base (in 
> /usr/lib/system/libsystem_platform.dylib)
> +   dnl  by 0x100B7B0BD: getdelim (in 
> /usr/lib/system/libsystem_c.dylib)
> +   dnl  by 0x1B0BE: ck_getdelim (utils.c:254)
> +   gl_cv_func_working_getdelim=no ;;

I don't care strongly about this issue, but this brings up a design
perspective for gnulib wrt valgrind: we have several valgrind
suppression files (see lib/*.valgrind) to silence valgrind complaints
already.  Your solution here choses a different path.

It can be difficult to assess wether a valgrind complaint is a false
positive or not, especially for system functions (and we've had valgrind
complaints for libc issues that looked problematic in theory but not in
practice).  Unless we can trigger a real bug in the code and test for
that, we have a choice how to handle valgrind complaints: 1) add a
valgrind suppressions file for the complaint, or 2) unconditionally
bring in a gnulib replacement code without testing for that behaviour.

I prefer 1) since 2) will over time leads to us bringing in the entire
gnulib replacement code on all systems, which is really bloated and
leads to other problems.

What do you think about adding a valgrind suppressions file for the
output you found instead?  And possibly improve documentation about how
gnulib intends these suppression files to be used by people who run the
test-suite under valgrind, if anything is missing in that area today.

/Simon


signature.asc
Description: PGP signature


getdelim: Work around buggy implementation on macOS 10.13

2022-10-16 Thread Bruno Haible
While testing a GNU sed snapshot on macOS 10.13, I see this test failure:

FAIL: testsuite/bug32082


valgrind report for 'posix' test:
==
--85029-- run: /usr/bin/dsymutil 
"/Users/haible/devel/sed-4.8.39-23ea/build-64/./sed/sed"
--85029-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--85029-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--85029-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
--85029-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 8 times)
==85029== Invalid read of size 16
==85029==at 0x100EE6A05: _platform_memchr$VARIANT$Base (in 
/usr/lib/system/libsystem_platform.dylib)
==85029==by 0x100B7B0BD: getdelim (in /usr/lib/system/libsystem_c.dylib)
==85029==by 0x1B0BE: ck_getdelim (utils.c:254)
==85029==by 0x190A9: read_file_line (execute.c:388)
==85029==by 0x18A12: read_pattern_space (execute.c:718)
==85029==by 0x166DA: process_files (execute.c:1682)
==85029==by 0x1A5D9: main (sed.c:378)
==85029==  Address 0x1019bf1a0 is 16 bytes before a block of size 65,536 alloc'd
==85029==at 0x1000E7545: malloc (vg_replace_malloc.c:302)
==85029==by 0x100B7B500: __smakebuf (in /usr/lib/system/libsystem_c.dylib)
==85029==by 0x100B7EC75: __srefill0 (in /usr/lib/system/libsystem_c.dylib)
==85029==by 0x100B7ED65: __srefill (in /usr/lib/system/libsystem_c.dylib)
==85029==by 0x100B7B136: getdelim (in /usr/lib/system/libsystem_c.dylib)
==85029==by 0x1B0BE: ck_getdelim (utils.c:254)
==85029==by 0x190A9: read_file_line (execute.c:388)
==85029==by 0x18A12: read_pattern_space (execute.c:718)
==85029==by 0x166DA: process_files (execute.c:1682)
==85029==by 0x1A5D9: main (sed.c:378)
==85029== 
...
FAIL testsuite/bug32082.sh (exit status: 1)

An out-of-bounds read. Oh oh. When I reconfigure and recompile with the
environment variable
  gl_cv_func_working_getdelim=no
this test succeeds. Assuming that the GNU sed code is correct (it's not a
particularly complex code), this proves that the out-of-bounds read comes
from the macOS getdelim() function.

This patch adds a workaround.
TODO: Someone should test the same thing on the *BSD systems as well.


2022-10-16  Bruno Haible  

    getdelim: Work around buggy implementation on macOS 10.13.
* doc/posix-functions/getdelim.texi: Mention the macOS bug.
* m4/getdelim.m4 (gl_FUNC_GETDELIM): Let the "checking for working
getdelim function" test answer 'no' on macOS.

diff --git a/doc/posix-functions/getdelim.texi 
b/doc/posix-functions/getdelim.texi
index b076658574..7e1851fac3 100644
--- a/doc/posix-functions/getdelim.texi
+++ b/doc/posix-functions/getdelim.texi
@@ -12,6 +12,9 @@ Portability problems fixed by Gnulib:
 This function is missing on some platforms:
 Mac OS X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, Minix 3.1.8, AIX 5.1, 
HP-UX 11, IRIX 6.5, Solaris 10, mingw, MSVC 14, Android 4.2.
 @item
+This function makes out-of-bounds reads on some platforms:
+macOS 10.13.
+@item
 This function crashes when passed a pointer to a NULL buffer together with a
 pointer to a non-zero buffer size on some platforms:
 FreeBSD 8.0.
diff --git a/m4/getdelim.m4 b/m4/getdelim.m4
index 0b63b55351..703c1f876f 100644
--- a/m4/getdelim.m4
+++ b/m4/getdelim.m4
@@ -1,4 +1,4 @@
-# getdelim.m4 serial 15
+# getdelim.m4 serial 16
 
 dnl Copyright (C) 2005-2007, 2009-2022 Free Software Foundation, Inc.
 dnl
@@ -11,7 +11,7 @@ AC_PREREQ([2.59])
 AC_DEFUN([gl_FUNC_GETDELIM],
 [
   AC_REQUIRE([gl_STDIO_H_DEFAULTS])
-  AC_REQUIRE([AC_CANONICAL_HOST]) dnl for cross-compiles
+  AC_REQUIRE([AC_CANONICAL_HOST])
 
   dnl Persuade glibc  to declare getdelim().
   AC_REQUIRE([AC_USE_SYSTEM_EXTENSIONS])
@@ -24,8 +24,18 @@ AC_DEFUN([gl_FUNC_GETDELIM],
 dnl Found it in some library.  Verify that it works.
 AC_CACHE_CHECK([for working getdelim function],
   [gl_cv_func_working_getdelim],
-  [echo fooNbarN | tr -d '\012' | tr N '\012' > conftest.data
-   AC_RUN_IFELSE([AC_LANG_SOURCE([[
+  [case "$host_os" in
+ darwin*)
+   dnl On macOS 10.13, valgrind detected an out-of-bounds read during
+   dnl the GNU sed test suite:
+   dnl   Invalid read of size 16
+   dnl  at 0x100EE6A05: _platform_memchr$VARIANT$Base (in 
/usr/lib/system/libsystem_platform.dylib)
+   dnl  by 0x100B7B0BD: getdelim (in 
/usr/lib/system/libsystem_c.dylib)
+   dnl  by 0x1B0BE: ck_getdelim (utils.c:254)
+   gl_cv_func_working_getdelim=no ;;
+ *)
+   echo fooNbarN | tr -d '\012' | tr N '\012' > conftest.data
+   AC_RUN_IFELSE([AC_LANG_SOURCE([[
 #include 
 #include 
 #include 
@@ -57,26 +67,28 @@ AC_DEFUN([gl_FUNC_GETDELIM],
   return 0;
 }
 ]])],
- [gl_cv_func_working_getdelim=yes],
-