bug#26741: Numeric sort in ls

2017-05-02 Thread Tony Malykh
From a technical point of view, it should be possible to add a new
option, --numeric-sort or --natural-sort, that wouldn't conflict with
-X or any other sort option of ls.

I am a C++ developer, so I can implement this new option myself. Would
the community of coreutils developers be interested in accepting a
patch like this?

I've looked at the source code, and it seems that the function that
does lexicographic sorting is strcoll. The implementation of this
function is quite complicated, and it is implemented in glibc. What I
would propose is to write a new function strcoll_natural in ls code,
that would identify which characters are digits in the input strings.
Then it would pass non-digit chunks to strcoll to preserve the
handling of locales, and it would do the smart comparison of digits as
numbers.
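
For illustration, the effect of such a comparison can be approximated
outside ls with a shell filter. The sketch below is not the
chunk-by-chunk strcoll_natural described above: it zero-pads every run
of digits to a fixed width (20, an arbitrary choice), lets sort collate
the padded keys in the current locale, and then drops the keys again.
The name natural_sort is made up for this example, and names containing
tabs or newlines are not handled.

```shell
# Hypothetical helper: natural-order sort of names read from stdin.
natural_sort() {
  awk '{
    key = ""; rest = $0
    # Walk the name left to right, zero-padding each run of digits.
    while (match(rest, /[0-9]+/)) {
      num = sprintf("%20s", substr(rest, RSTART, RLENGTH))
      gsub(/ /, "0", num)
      key = key substr(rest, 1, RSTART - 1) num
      rest = substr(rest, RSTART + RLENGTH)
    }
    # Emit "padded key <TAB> original name".
    printf "%s\t%s\n", key rest, $0
  }' |
  sort -t "$(printf '\t')" -k1,1 |   # collate on the padded key only
  cut -f2-                           # drop the key, keep the name
}

printf '%s\n' p11.py p2.py p1.py | natural_sort
```

Because the non-digit parts still go through sort, they keep whatever
locale handling sort applies; only the digit runs are normalized.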

As a separate note, what I dislike about the ls -v option is that it
handles capital letters differently than ls does without -v:
$ ls -1
a
B
c
D

$ ls -1v
B
D
a
c

I have no particular preference between these two, but I'd rather see
them consistent. It seems like this is a known "caveat" of the -v
option, since it ignores the locale.
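
The second ordering is what plain byte comparison produces, since every
uppercase ASCII letter has a smaller code than any lowercase one. A
quick way to see this without ls, assuming the difference comes only
from locale-aware versus byte-wise collation (c_locale_sort is a name
made up here):

```shell
# Hypothetical helper: sort stdin by raw byte values, ignoring the locale.
c_locale_sort() {
  LC_ALL=C sort
}

# Uppercase names come out first, as in the "ls -1v" listing above.
printf '%s\n' a B c D | c_locale_sort
```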

Tony


On 5/2/17, Andreas Schwab wrote:
> On May 01 2017, Tony Malykh wrote:
>
>> It appears that version sort (-v) option achieves exactly that.
>> However, the problem with it is that it seems to be incompatible with
>> extension sort (-X) option.
>
> This isn't limited to these two options.  Generally, ls only supports a
> single sort option, with later options overriding previous ones.
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
>






2017-05-02 Thread Andreas Schwab
On May 01 2017, Tony Malykh wrote:

> It appears that version sort (-v) option achieves exactly that.
> However, the problem with it is that it seems to be incompatible with
> extension sort (-X) option.

This isn't limited to these two options.  Generally, ls only supports a
single sort option, with later options overriding previous ones.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






2017-05-01 Thread Eric Blake
tag 26741 notabug
thanks

On 05/01/2017 07:50 PM, Tony Malykh wrote:
> Hi all,
> 
> I am wondering if there is a way to do a numeric sort in ls?

Thanks for the report; the quick answer is that sort can already do what
you want, so post-process your ls output with sort.

> Here is a concrete example.

Thanks, that makes it much easier to reproduce what you want, and
demonstrate along the way what is going on.

> 
> Here are some files
> $ ls -1
> 1.py
> 11.py
> 2.pl
> 2.py

I'll compress that set of data to:

$ printf %s\\n 1.py 11.py 2.pl 2.py

> 
> Desired output:
> $ ls -1 --
> 2.pl
> 1.py
> 2.py
> 11.py

If you are okay with:

ls -1 | sort 

then we are good to go. In fact, in that configuration, you can use
plain 'ls' rather than 'ls -1' (since POSIX requires one entry per line
to be the default whenever the output of ls is not a terminal, as in a
pipeline).

> # Note that 2.pl goes to the top because of its extension.
> # Note also that the *.py files are sorted numerically and not
> lexicographically.

So that says that you want your primary sort key to be the extension
field, sorted lexically; and your secondary sort field to be the rest of
the name, sorted numerically.  How about the following?

$ printf %s\\n 1.py 11.py 2.pl 2.py | sort -t . -k2,2 -k1,1n
2.pl
1.py
2.py
11.py

Again, with your directory layout, you could use:

$ ls | sort -t . -k2,2 -k1,1n

for the same results.

Now, is that entirely robust? Not really - all it takes is one file with
no extension, or one file with embedded '.' in the name beyond the
extension, and you are no longer able to reliably specify which sort
field is the extension.

$ printf %s\\n 4.4.py 3 1.py 11.py 2.pl 2.py | sort -t . -k2,2 -k1,1n
3
4.4.py
2.pl
1.py
2.py
11.py

But never fear - we can use the decorate-sort-undecorate pattern to
temporarily swap things around:

$ printf %s\\n 4.4.py 3 1.py 11.py 2.pl 2.py \
 | sed 's/\(.*\)\.\([^.]*\)$/\2.\1/' \
 | sort -t . -k1,1 -k2n \
 | sed 's/\(^[^.]*\)\.\(.*\)/\2.\1/'
3
2.pl
1.py
2.py
4.4.py
11.py

(although when you start getting that complex, awk, perl, or a dedicated
C program start to sound more appealing).
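
As a rough sketch of the awk route, the decoration can put the
extension, the stem, and the original name in separate tab-delimited
columns, so that dots inside the name no longer confuse the field
splitting. The function name sort_ext_then_num is hypothetical; the
sketch assumes names contain no tabs or newlines, and treats a name
with no dot (or only a leading dot) as having an empty extension.

```shell
# Hypothetical helper: sort names by extension, then numerically by stem.
sort_ext_then_num() {
  awk '{
    ext = ""; stem = $0
    # Split at the last dot, if any; a lone leading dot does not count.
    i = match($0, /\.[^.]*$/)
    if (i > 1) { ext = substr($0, i + 1); stem = substr($0, 1, i - 1) }
    # Decorate: extension, stem, then the untouched original name.
    printf "%s\t%s\t%s\n", ext, stem, $0
  }' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3 |
  cut -f3-   # undecorate: keep only the original name
}

printf '%s\n' 4.4.py 3 1.py 11.py 2.pl 2.py | sort_ext_then_num
```

On this sample the result is the same listing as the sed pipeline
above; the difference is that the original name is carried along
unchanged, so nothing has to be swapped back afterwards.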

> 
> Attempt #1:
> $ ls -1Xv
> 1.py
> 2.pl
> 2.py
> 11.py
> # Note that the files are sorted numerically, but the extension sort
> (-X) option seems to have been ignored, since the 2.pl file is among
> the *.py files.

That's because 'ls' has exactly ONE level of sort. You cannot specify a
primary and secondary key to ls, but rather the last sort type requested
overrules all earlier requests.  The only program with multiple levels
of sort is, not surprisingly, 'sort'.  'ls -Xv' is identical to 'ls -v'.

> 
> Attempt #2:
> $ ls -1vX
> 2.pl
> 1.py
> 11.py
> 2.py
> # Now the extension sort is working, but the files are sorted
> lexicographically and not numerically.
> 

'ls -vX' is identical to 'ls -X'.

It appears that your request is to modify ls directly to subsume the
ability that sort already has to list multiple sort keys, and break ties
under one key by resorting to the next.  However, the bar is VERY high
to add any new features to ls, and unless you can point to existing
practice of some other ls implementation that does the same, we are
probably going to leave it at requiring you to post-process the data.
Besides, you'd have to wait for a new version of ls to be built and land
in your distro, while you already have post-processing tools at your
disposal that are more portable.

So, for now, I'm closing this as not a bug, although you should feel
free to continue the conversation if you have more to add.

> Here is my version:
> $ ls --version
> ls (GNU coreutils) 8.26
> Packaged by Cygwin (8.26-2)

[hmm - a good reminder that my TODO list includes packaging an updated
coreutils for cygwin...]

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org






2017-05-01 Thread Tony Malykh
Hi all,

I am wondering if there is a way to do a numeric sort in ls? I am
aware of the version sort option -v, and below I explain the problem
with it.

What I'd like to achieve:
Suppose I have files with numbers in their names:
p1.py
p2.py
p11.py

It appears that version sort (-v) option achieves exactly that.
However, the problem with it is that it seems to be incompatible with
extension sort (-X) option. This limitation seems artificial to me: it
is fairly easy to sort files by the extension, and then sort them
numerically within every extension group.

If these files are sorted lexicographically, p2.py comes in the last
position. I would like p2.py to appear between p1.py and p11.py.

Maybe I don't understand something about the version sort algorithm;
maybe it does something more than numeric sort. But in that case, is
it possible to add a new flag for numeric sort, that is, one that would
treat digits as numbers and would not conflict with the extension sort
-X?

Here is a concrete example.

Here are some files
$ ls -1
1.py
11.py
2.pl
2.py

Desired output:
$ ls -1 --
2.pl
1.py
2.py
11.py
# Note that 2.pl goes to the top because of its extension.
# Note also that the *.py files are sorted numerically and not
lexicographically.

Attempt #1:
$ ls -1Xv
1.py
2.pl
2.py
11.py
# Note that the files are sorted numerically, but the extension sort
(-X) option seems to have been ignored, since the 2.pl file is among
the *.py files.

Attempt #2:
$ ls -1vX
2.pl
1.py
11.py
2.py
# Now the extension sort is working, but the files are sorted
lexicographically and not numerically.

Here is my version:
$ ls --version
ls (GNU coreutils) 8.26
Packaged by Cygwin (8.26-2)

Any suggestions will be appreciated!
Thanks
Tony