Re: problem with command sort after uniq -c

2008-03-11 Thread Damien ANCELIN
You're right, my locale is set to fr_FR. I've tried en_EN and
en_US, and both work fine (as does -k1,1).
I think I understand the problem with the fr_FR locale: in French,
123456.78 is written in an easily readable form as 123 456,78 (in
English it's 123,456.78).
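
A minimal sketch of checking this without changing the session locale,
assuming a POSIX shell and a sort that honours LC_ALL. The temporary file
and the four sample rows are only illustrative. The C locale defines no
digit-grouping character, so the space should end the number there.

```shell
# Illustrative rows in the "count value" layout produced by uniq -c.
data=$(mktemp)
printf '%s\n' '  1320 51970' ' 22681 8060' '  2668 33603' '  5447 2' > "$data"

# LC_ALL=C affects only this one command. The C locale has no thousands
# separator, so -n stops reading each number at the first space.
sorted=$(LC_ALL=C sort -n "$data")
printf '%s\n' "$sorted"

rm -f "$data"
```

Setting the variable on the command itself keeps the rest of the session
in fr_FR.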


Thanks Philip and Andreas for your answers (and sorry for polluting the 
bug mailing list).

Damien

Andreas Schwab wrote:

> Damien ANCELIN [EMAIL PROTECTED] writes:
>
> > I met a problem with the sort command : I've used the uniq command with
> > the -c option to count some numbers, and then applying sort -n don't sort
> > lines by numeric order of the first field.
> > Here is an example (my sort version is 5.97) :
> > $ cat bug_sort | sort -n
>
> This is a useless use of cat, you can just redirect sort's standard
> input from the file.
>
> >   1320 51970
> >   1692 12345
> >  22681 8060
> >  26063 8649
> >   2668 33603
> >   3487 44496
> >   4350 23246
> >  47013 8000
> >   5447 2
> >  81724 5000
>
> I assume that you use the fr_FR locale.  In this locale a number can be
> grouped with a space, thus it is considered part of the number.  If you
> want to be sure that sort only considers the first field as sort key you
> should use -k1,1 to limit it.  The default is to always use the
> whole line as sort key, and sort -n will take as much as possible from
> the key to match a number.
>
> Andreas.


--
Damien ANCELIN
INRIA - ENS-Lyon, LIP (RESO)
Bureau 322 Sud
Tel : +33 4 72 72 85 02



___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


problem with command sort after uniq -c

2008-03-10 Thread Damien ANCELIN

Hello,

I ran into a problem with the sort command: I used the uniq command with
the -c option to count some numbers, but applying sort -n afterwards doesn't
sort the lines in numeric order of the first field.

Here is an example (my sort version is 5.97):
$ cat bug_sort | sort -n
  1320 51970
  1692 12345
 22681 8060
 26063 8649
  2668 33603
  3487 44496
  4350 23246
 47013 8000
  5447 2
 81724 5000

If I add a non-numeric, non-space character between the two fields,
sort -n works properly:

$ cat bug_sort | sed 's/\([0-9]\) \([0-9]\)/\1 -\2/' | sort -n
  1320 -51970
  1692 -12345
  2668 -33603
  3487 -44496
  4350 -23246
  5447 -2
 22681 -8060
 26063 -8649
 47013 -8000
 81724 -5000

With only spaces between the two fields, sort -n reads one number per line
and uses it for the sort: 2668 33603 is read as 266833603. With that in
mind, the result of sort is correct, but it's not what I
expected (and I didn't see this behaviour in the documentation).


Regards,
Damien

--
Damien ANCELIN
INRIA - ENS-Lyon, LIP (RESO)
Bureau 322 Sud
Tel : +33 4 72 72 85 02





Re: problem with command sort after uniq -c

2008-03-10 Thread Philip Rowlands

On Mon, 10 Mar 2008, Damien ANCELIN wrote:

> I met a problem with the sort command : I've used the uniq command with the -c
> option to count some numbers, and then applying sort -n don't sort lines by
> numeric order of the first field.
>
> Here is an example (my sort version is 5.97) :
> $ cat bug_sort | sort -n
>   1320 51970
>   1692 12345
>  22681 8060
>  26063 8649
>   2668 33603
>   3487 44496
>   4350 23246
>  47013 8000
>   5447 2
>  81724 5000


You don't say which locale your environment is configured to use for 
sorting, but I'd bet it's one which treats whitespace differently to how 
you expect.


> With only spaces between the 2 fields, sort -n read 1 number per line
> and use it to do the sort : 2668 33603 is read as 266833603. With this
> consideration, the result of sort is correct, but it's not what I
> expected (and I didn't see this behaviour in the documentation).


The command sort -n treats the whole line as the sort key. Specifying 
sort -k1,1n will use just the first field, in ascending numerical 
order.
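
A small sketch of that invocation, assuming a POSIX shell. The temporary
file and its four rows are made up for illustration and stand in for the
real uniq -c output.

```shell
# Hypothetical sample in the "count value" layout, deliberately unsorted.
data=$(mktemp)
printf '%s\n' ' 47013 8000' '  4350 23246' ' 81724 5000' '  3487 44496' > "$data"

# -k1,1 starts and ends the key at field 1; the trailing n compares
# that key numerically, so the second column is not part of the numeric key.
by_count=$(sort -k1,1n "$data")
printf '%s\n' "$by_count"

rm -f "$data"
```

Because the key stops at the end of the first field, the space before the
second column is never offered to the numeric comparison.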



Cheers,
Phil




Re: problem with command sort after uniq -c

2008-03-10 Thread Andreas Schwab
Damien ANCELIN [EMAIL PROTECTED] writes:

> I met a problem with the sort command : I've used the uniq command with
> the -c option to count some numbers, and then applying sort -n don't sort
> lines by numeric order of the first field.
> Here is an example (my sort version is 5.97) :
> $ cat bug_sort | sort -n

This is a useless use of cat, you can just redirect sort's standard
input from the file.
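
As a rough illustration, assuming a POSIX shell, the three spellings below
should feed sort the same bytes. The temporary file is only a stand-in for
bug_sort, and its three rows are sample data.

```shell
bug_sort=$(mktemp)   # stand-in for the bug_sort file in the report
printf '%s\n' '  5447 2' '  1320 51970' '  2668 33603' > "$bug_sort"

with_cat=$(cat "$bug_sort" | sort -n)      # extra process and pipe
with_redirect=$(sort -n < "$bug_sort")     # the shell opens the file as stdin
with_operand=$(sort -n "$bug_sort")        # sort opens the file itself

# All three commands see identical input, so their output is identical.
[ "$with_cat" = "$with_redirect" ] && [ "$with_redirect" = "$with_operand" ] \
  && echo "same output"

rm -f "$bug_sort"
```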

>   1320 51970
>   1692 12345
>  22681 8060
>  26063 8649
>   2668 33603
>   3487 44496
>   4350 23246
>  47013 8000
>   5447 2
>  81724 5000

I assume that you use the fr_FR locale.  In this locale a number can be
grouped with a space, thus it is considered part of the number.  If you
want to be sure that sort only considers the first field as sort key you
should use -k1,1 to limit it.  The default is to always use the
whole line as sort key, and sort -n will take as much as possible from
the key to match a number.
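
A sketch of the whole count-then-sort pipeline with the key limited this
way, assuming a POSIX shell. The input values and the temporary file are
invented for the example.

```shell
# Hypothetical input: 2 occurs three times, 8000 twice, 5000 once.
values=$(mktemp)
printf '%s\n' 2 8000 2 5000 8000 2 > "$values"

# The first sort groups equal lines so uniq -c can count them; the second
# sort orders the "count value" lines by the count field alone.
counted=$(sort "$values" | uniq -c | sort -k1,1n)
printf '%s\n' "$counted"

rm -f "$values"
```

The lines should come out in ascending order of count (5000, then 8000,
then 2), whatever digits appear in the second column.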

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.




Re: problem with command sort after uniq -c

2008-03-10 Thread Bauke Jan Douma

Andreas Schwab wrote on 10-03-08 19:54:

> Damien ANCELIN [EMAIL PROTECTED] writes:
>
> > I met a problem with the sort command : I've used the uniq command with
> > the -c option to count some numbers, and then applying sort -n don't sort
> > lines by numeric order of the first field.
> > Here is an example (my sort version is 5.97) :
> > $ cat bug_sort | sort -n
>
> This is a useless use of cat, you can just redirect sort's standard
> input from the file.


True, but such constructs do happen.

What might have been the case here, and it's a
situation I sometimes find myself in, is this:
you want to run 'filter1 FILE | filter2'
(or 'filter1 <FILE | filter2').  Somehow the output
isn't what you expected.  You investigate, and
part of that is temporarily replacing filter1 with
plain cat, so the command becomes 'cat FILE | filter2'.

Most of the time this is on the command-line.

bjd





Re: problem with command sort after uniq -c

2008-03-10 Thread Bob Proulx
Bauke Jan Douma wrote:
> What might have been the case here, and which is a
> situation that I find myself in sometimes, is this:
> you want to do 'filter1 FILE | filter2'
> (or 'filter1 <FILE | filter2').  Somehow the output
> isn't what's to be expected.  You investigate, and
> part of that is temporarily substituting filter1 for
> plain cat and the command becomes 'cat FILE | filter2'.
>
> Most of the time this is on the command-line.

On your own command line is fine.  It is your command line.  No one
else would ever see it.

The objections come in when people write these into scripts and into
test cases and share these around to other people.  Many people have a
belief that cat into a pipe is the only way to do it.  I have seen
hundreds of lines written this way in a single script!  It is a
misunderstanding.

Educating users to improve their programming abilities is just one of
the many burdens that must be endured, or else endure the endless
burden of even more programs written poorly with these misconceptions.

Bob

