bug#66253: sort manpage should be more explicit

2023-09-28 Thread Paul Eggert

On 9/28/23 04:22, Pádraig Brady wrote:


   -n, --numeric-sort  compare according to string numerical value.
     leading blanks, negative sign, decimal point,
     and thousands separators are supported.


Although a valiant effort, this is likely to cause other trouble, as it 
uses multiple terms (blanks, decimal point, thousands separator) without 
explanation, and it omits the role of the locale. I suggest instead that 
we simply say "see the manual", and tighten up the manual to explain 
these and, while we're at it, other things (e.g., -0 vs 0).
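
The "-0 vs 0" point can be checked with a small sketch. It assumes GNU sort and the C locale; the sample values and the variable name are only illustrative. With -s the last-resort byte comparison is off, so lines that compare numerically equal keep their input order:

```shell
# Zeros with a sign or with leading zeros all compare equal under -n.
# -s (stable) disables the last-resort comparison, so equal lines stay
# in input order instead of being reordered bytewise.
zeros=$(printf '%s\n' '1' '0' '-0' '000' '-1' | LC_ALL=C sort -s -n)
printf '%s\n' "$zeros"
# Prints: -1, 0, -0, 000, 1 (one per line)
```

Without -s the three zeros would still compare equal numerically, but the final bytewise comparison would decide their relative order.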


I gave that a shot by installing the attached.

PS to Jorge: Changing behavior as you suggested would likely cause 
trouble, as many programs depend on the current behavior, which is 
standardized by POSIX here:


https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#tag_20_119_04

From a2434d3e58e8ead6c4c92fd989da32fe648e1545 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Thu, 28 Sep 2023 18:02:25 -0700
Subject: [PATCH] sort: improve --help

Problem reported by Jorge Stolfi (bug#66253).
* src/sort.c (usage): Suggest looking at the manual for -n details.
---
 doc/coreutils.texi | 12 ++++++++----
 src/sort.c         |  3 ++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index ee3b1ce11..be4b610be 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4678,18 +4678,22 @@ can change this.
 @opindex --numeric-sort
 @opindex --sort
 @cindex numeric sort
+@vindex LC_CTYPE
 @vindex LC_NUMERIC
 Sort numerically.  The number begins each line and consists
 of optional blanks, an optional @samp{-} sign, and zero or more
 digits possibly separated by thousands separators, optionally followed
 by a decimal-point character and zero or more digits.  An empty
-number is treated as @samp{0}.  The @env{LC_NUMERIC}
-locale specifies the decimal-point character and thousands separator.
-By default a blank is a space or a tab, but the @env{LC_CTYPE} locale
-can change this.
+number is treated as @samp{0}.  Signs on zeros and leading zeros do
+not affect ordering.
 
 Comparison is exact; there is no rounding error.
 
+The @env{LC_CTYPE} locale specifies which characters are blanks and
+the @env{LC_NUMERIC} locale specifies the thousands separator and
+decimal-point character.  In the C locale, spaces and tabs are blanks,
+there is no thousands separator, and @samp{.} is the decimal point.
+
 Neither a leading @samp{+} nor exponential notation is recognized.
 To compare such strings numerically, use the
 @option{--general-numeric-sort} (@option{-g}) option.
diff --git a/src/sort.c b/src/sort.c
index abee57d7a..5c86b8332 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -444,7 +444,8 @@ Ordering options:\n\
   -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)\n\
 "), stdout);
       fputs (_("\
-  -n, --numeric-sort          compare according to string numerical value\n\
+  -n, --numeric-sort          compare according to string numerical value;\n\
+                                see manual for which strings are supported\n\
   -R, --random-sort           shuffle, but group identical keys.  See shuf(1)\n\
       --random-source=FILE    get random bytes from FILE\n\
   -r, --reverse               reverse the result of comparisons\n\
-- 
2.41.0
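
The revised manual text says that the C locale has no thousands separator. A minimal sketch of what that means in practice, assuming GNU sort (the inputs and the variable name are made up for illustration): the number stops at the comma, so '1,000' is compared as 1.

```shell
# In the C locale ',' is not a thousands separator, so the numeric
# prefix of '1,000' is just '1' and it sorts before '2' and '10'.
c_locale=$(printf '%s\n' '2' '1,000' '10' | LC_ALL=C sort -n)
printf '%s\n' "$c_locale"
# Prints: 1,000 then 2 then 10 (one per line)
```

In a locale whose LC_NUMERIC defines ',' as the thousands separator, the same input would be expected to place 1,000 last; that depends on which locales are installed, so it is not shown here.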



bug#66253: sort manpage should be more explicit

2023-09-28 Thread Pádraig Brady

On 28/09/2023 11:11, Jorge Stolfi wrote:

The full documentation of sort explains that numeric sorting (as in
"sort -n") accepts a leading "-" sign, decimal points, thousands
separators, etc, but does not accept an explicit "+" sign. Values with
explicit "+" are treated as numeric 0 and ties are broken by alpha sort.

However, the manpage only says that "-n" "compares according to string
numerical value" -- and one would expect the numerical value of "+100"
to be 100, not zero.

It took me an hour to figure out that my "sort -n" was failing because
of this "feature".  Surely many users have wasted time too, or worse.
So please either

1) explain precisely IN THE MANPAGE what is a valid number;

2) make numeric sort accept a leading "+", as users would expect;

3) make numeric sort abort with an error message if any field that is
supposed to be sorted numerically is not a valid number.

I think the best solution for users would be to implement all three of
these...

Thank you, and all the best


Note the --debug option really helps with all this:

  $ printf '%s\n' '+4' ' 5' '-1,2.3' | sort -s -n --debug
  sort: note numbers use ‘.’ as a decimal point in this locale
  -1,2.3
  __
  +4
  ^ no match for key
   5
   _


In saying that, sorting numbers is such a common use case,
it's probably worth adding an extra couple of lines to the man page.
I think I'll apply the following later:

  -n, --numeric-sort  compare according to string numerical value.
    leading blanks, negative sign, decimal point,
    and thousands separators are supported.

cheers,
Pádraig





bug#66253: sort manpage should be more explicit

2023-09-28 Thread Jorge Stolfi
The full documentation of sort explains that numeric sorting (as in  
"sort -n") accepts a leading "-" sign, decimal points, thousands  
separators, etc, but does not accept an explicit "+" sign. Values with  
explicit "+" are treated as numeric 0 and ties are broken by alpha sort.
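
The behavior described above can be reproduced with a short sketch, assuming GNU sort in the C locale (the sample values and variable names are illustrative only). Under -n the string '+100' has no numeric prefix and compares as 0; the full manual points to --general-numeric-sort (-g) for strings with a leading '+':

```shell
# -n does not recognize a leading '+', so '+100' compares as 0 and
# lands between -3 and 20.
plain=$(printf '%s\n' '+100' '20' '-3' | LC_ALL=C sort -n)
# -g parses the key as a floating-point number, which accepts '+'.
general=$(printf '%s\n' '+100' '20' '-3' | LC_ALL=C sort -g)
printf 'sort -n:\n%s\n' "$plain"
printf 'sort -g:\n%s\n' "$general"
# sort -n prints: -3, +100, 20
# sort -g prints: -3, 20, +100
```

Note that -g is a GNU extension and is slower than -n, so it is a workaround rather than a drop-in replacement.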


However, the manpage only says that "-n" "compares according to string  
numerical value" -- and one would expect the numerical value of "+100"  
to be 100, not zero.


It took me an hour to figure out that my "sort -n" was failing because  
of this "feature".  Surely many users have wasted time too, or worse.  
So please either


1) explain precisely IN THE MANPAGE what is a valid number;

2) make numeric sort accept a leading "+", as users would expect;

3) make numeric sort abort with an error message if any field that is  
supposed to be sorted numerically is not a valid number.


I think the best solution for users would be to implement all three of  
these...


Thank you, and all the best

--jorge

--
Jorge Stolfi - Professor Titular/Full Professor
Instituto de Computação/Computer Science Dept
Universidade Estadual de Campinas/State University of Campinas
Campinas, SP - Brazil