bug#55212: GNU Linux "sort -g" can hang indefinitely when run on standard input if NaNs are involved

2022-05-10 Thread Paul Eggert

On 5/10/22 12:08, coreut...@tlinx.org wrote:

    Unless there is some magic about -n1238095,


The test is random and there's no magic, just luck. The larger the 
random test, the more likely you'll run into the unlucky situation where 
the unpatched 'sort' infloops.






bug#55212: GNU Linux "sort -g" can hang indefinitely when run on standard input if NaNs are involved

2022-05-10 Thread coreutils

On 2022/05/01 10:16, Giulio Genovese wrote:

As explained here
,
when running "sort -g" from standard input, if NaNs are involved, this can
cause "sort -g" to hang indefinitely while consuming 100% of the CPU. This
seems to be system dependent. I cannot reproduce the bug on a RHEL7
machine. However, multiple users seem to be able to reproduce the bug. The
following command can provoke the bug:

yes nan | head -n128095 | timeout 5 sort -g
  


   Unless there is some magic about -n1238095,
Using a smaller number like -n1 or -n12 both terminate after 1 or 12 lines.
I wasn't willing to wait for 1.2M lines being output, though the -n128085
case did start to output until I terminated it -- which would seem to
indicate that the sort had finished and was dumping its output.  I.e.
no indefinite hang.

sort from 8.26.18-5e871
from suse old tumbleweed (out of sync - no longer updates)







bug#55212: GNU Linux "sort -g" can hang indefinitely when run on standard input if NaNs are involved

2022-05-02 Thread Paul Eggert

On 5/2/22 06:31, Pádraig Brady wrote:

This is a bit slower of course, but since an edge case not a big concern:


Yes, my thoughts too. There are ways to speed up common lots-o-NaN cases 
portably (I toyed with the idea of using ieee754.h), but I went with the 
simple approach for now.


A nit: 'time' needs to be at the end of the pipeline:

$ yes nan | head -n128095 | bash -c 'time src/sort -g' >/dev/null

real0m0.552s
user0m0.551s
sys 0m0.001s
$ yes nan | head -n128095 | bash -c 'time sort -g' >/dev/null

real0m0.392s
user0m0.382s
sys 0m0.009s
512-day $ yes nan | head -n128095 | bash -c 'time sort -g' >/dev/null
[Here I had to control-C since 'sort' inflooped.]





bug#55212: GNU Linux "sort -g" can hang indefinitely when run on standard input if NaNs are involved

2022-05-02 Thread Pádraig Brady

On 02/05/2022 07:03, Paul Eggert wrote:

Thanks for the bug report. This bug is entertaining, as it comes from
GCC now being so smart that it optimizes away a memset that cleared
padding bits. We added the memset in coreutils 8.14 (2011) to try to fix
the sort -g infinite loop bug (introduced in 1999), but the memset isn't
guaranteed to fix the bug because the memset can be optimized away.

If the padding bits happen to be clear already sort is OK, but if not
the results can be inconsistent when you compare two NaNs to each other,
and inconsistent results can make sort infloop.

The C standard allows this level of intelligence in the compiler, so
it's a bug in GNU 'sort'.

I installed the attached patch; please give it a try. For now I'll
boldly close the bug report; we can easily reopen it if this patch
doesn't actually fix the problem.


This is a bit slower of course, but since an edge case not a big concern:

  $ time yes nan | head -n128095 | timeout 10 sort -g >/dev/null
  real  0m0.693s

  $ time yes nan | head -n128095 | timeout 10 src/sort -g >/dev/null
  real  0m0.924s

I'll add the test case I think (attached)
since the existing one didn't trigger this.

thanks!
PádraigFrom bf12c9128523631d7f0ec251734fa05405f70c1c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Mon, 2 May 2022 14:27:34 +0100
Subject: [PATCH] tests: sort-NaN-infloop: augment testing for recent fix

* tests/misc/sort-NaN-infloop.sh: Add test case from
https://unix.stackexchange.com/a/700967/37127
* src/sort.c: Avoid syntax-check failure.
---
 src/sort.c | 2 +-
 tests/misc/sort-NaN-infloop.sh | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/sort.c b/src/sort.c
index b2a465cf5..8af356c66 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -2006,7 +2006,7 @@ numcompare (char const *a, char const *b)
 static int
 nan_compare (long double a, long double b)
 {
-  char buf[2][sizeof "-nan()" + CHAR_BIT * sizeof a];
+  char buf[2][sizeof "-nan""()" + CHAR_BIT * sizeof a];
   snprintf (buf[0], sizeof buf[0], "%Lf", a);
   snprintf (buf[1], sizeof buf[1], "%Lf", b);
   return strcmp (buf[0], buf[1]);
diff --git a/tests/misc/sort-NaN-infloop.sh b/tests/misc/sort-NaN-infloop.sh
index 93cf9bd77..3e459884e 100755
--- a/tests/misc/sort-NaN-infloop.sh
+++ b/tests/misc/sort-NaN-infloop.sh
@@ -23,6 +23,9 @@ echo nan > F || framework_failure_
 printf 'nan\nnan\n' > exp || framework_failure_
 timeout 10 sort -g -m F F > out || fail=1
 
+# This was seen to infloop on some systems until coreutils v9.2 (bug 55212)
+yes nan | head -n128095 | timeout 10 sort -g > /dev/null || fail=1
+
 compare exp out || fail=1
 
 Exit $fail
-- 
2.26.2



bug#55212: GNU Linux "sort -g" can hang indefinitely when run on standard input if NaNs are involved

2022-05-02 Thread Paul Eggert
Thanks for the bug report. This bug is entertaining, as it comes from 
GCC now being so smart that it optimizes away a memset that cleared 
padding bits. We added the memset in coreutils 8.14 (2011) to try to fix 
the sort -g infinite loop bug (introduced in 1999), but the memset isn't 
guaranteed to fix the bug because the memset can be optimized away.


If the padding bits happen to be clear already sort is OK, but if not 
the results can be inconsistent when you compare two NaNs to each other, 
and inconsistent results can make sort infloop.


The C standard allows this level of intelligence in the compiler, so 
it's a bug in GNU 'sort'.


I installed the attached patch; please give it a try. For now I'll 
boldly close the bug report; we can easily reopen it if this patch 
doesn't actually fix the problem.From 2f56f5a42033dc6db15d8963e54566f01fa0d61d Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 1 May 2022 22:46:21 -0700
Subject: [PATCH] sort: fix sort -g infloop again

Problem reported by Giulio Genovese (Bug#55212).
* src/sort.c (nan_compare): To compare NaNs, simply printf+strcmp.
This avoids the problem of padding bits and unspecified behavior.
Args are now long double instead of char *; caller changed.
---
 NEWS   |  6 ++
 src/sort.c | 21 ++---
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index 26eb52ca0..3a9148637 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,12 @@ GNU coreutils NEWS-*- outline -*-
   'mv --backup=simple f d/' no longer mistakenly backs up d/f to f~.
   [bug introduced in coreutils-9.1]
 
+  'sort -g' no longer infloops when given multiple NaNs on platforms
+  like x86-64 where 'long double' has padding bits in memory.
+  Although the fix alters sort -g's NaN ordering, that ordering has
+  long been documented to be platform-dependent.
+  [bug introduced 1999-05-02 and only partly fixed in coreutils-8.14]
+
 
 * Noteworthy changes in release 9.1 (2022-04-15) [stable]
 
diff --git a/src/sort.c b/src/sort.c
index 3b775d6bb..b2a465cf5 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -2003,22 +2003,13 @@ numcompare (char const *a, char const *b)
   return strnumcmp (a, b, decimal_point, thousands_sep);
 }
 
-/* Work around a problem whereby the long double value returned by glibc's
-   strtold ("NaN", ...) contains uninitialized bits: clear all bytes of
-   A and B before calling strtold.  FIXME: remove this function if
-   gnulib guarantees that strtold's result is always well defined.  */
 static int
-nan_compare (char const *sa, char const *sb)
+nan_compare (long double a, long double b)
 {
-  long double a;
-  memset (, 0, sizeof a);
-  a = strtold (sa, NULL);
-
-  long double b;
-  memset (, 0, sizeof b);
-  b = strtold (sb, NULL);
-
-  return memcmp (, , sizeof a);
+  char buf[2][sizeof "-nan()" + CHAR_BIT * sizeof a];
+  snprintf (buf[0], sizeof buf[0], "%Lf", a);
+  snprintf (buf[1], sizeof buf[1], "%Lf", b);
+  return strcmp (buf[0], buf[1]);
 }
 
 static int
@@ -2046,7 +2037,7 @@ general_numcompare (char const *sa, char const *sb)
   : a == b ? 0
   : b == b ? -1
   : a == a ? 1
-  : nan_compare (sa, sb));
+  : nan_compare (a, b));
 }
 
 /* Return an integer in 1..12 of the month name MONTH.
-- 
2.34.1



bug#55212: GNU Linux "sort -g" can hang indefinitely when run on standard input if NaNs are involved

2022-05-01 Thread Giulio Genovese
As explained here
,
when running "sort -g" from standard input, if NaNs are involved, this can
cause "sort -g" to hang indefinitely while consuming 100% of the CPU. This
seems to be system dependent. I cannot reproduce the bug on a RHEL7
machine. However, multiple users seem to be able to reproduce the bug. The
following command can provoke the bug:

yes nan | head -n128095 | timeout 5 sort -g