bug#26741: Numeric sort in ls
It should be possible from a technical point of view to add a new option, --numeric-sort or --natural-sort, that wouldn't conflict with -X or any other sort option of ls. I am a C++ developer, so I can implement this new option myself. Would the community of coreutils developers be interested in accepting a patch like this?

I've looked at the source code, and it seems that the function that does the lexicographic sorting is strcoll. Its implementation is quite complicated, and it lives in glibc. What I would propose is to write a new function, strcoll_natural, in the ls code that identifies which characters in the input strings are digits. It would then pass the non-digit chunks to strcoll, to preserve the handling of locales, and compare the digit chunks as numbers.

As a separate note, what I dislike about the ls -v option is that it handles capital letters differently from ls without -v:

$ ls -1
a
B
c
D
$ ls -1v
B
D
a
c

I have no particular preference between these two, but I'd rather see them consistent. It seems like this is a known "caveat" of the -v option, since it ignores the locale.

Tony

On 5/2/17, Andreas Schwab wrote:
> On May 01 2017, Tony Malykh wrote:
>
>> It appears that version sort (-v) option achieves exactly that.
>> However, the problem with it is that it seems to be incompatible with
>> extension sort (-X) option.
>
> This isn't limited to these two options. Generally, ls only supports a
> single sort option, with later options overriding previous ones.
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
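The chunking idea proposed in this message can be sketched in a few lines. This is only an illustration in Python, not a patch for ls: the name strcoll_natural is the hypothetical one from the proposal, and the sketch assumes that only the ASCII digits 0-9 count as digits and that names whose numbers are equal in value (such as "01" and "1") may compare as equal.

```python
import functools
import locale
import re

# Hypothetical function named after the proposal; nothing like it exists in
# glibc or coreutils. Digits are assumed to be ASCII 0-9 only.
_CHUNKS = re.compile(r"[0-9]+|[^0-9]+")


def strcoll_natural(a: str, b: str) -> int:
    """Compare chunk by chunk: digit runs as integers, other text via strcoll."""
    chunks_a = _CHUNKS.findall(a)
    chunks_b = _CHUNKS.findall(b)
    for x, y in zip(chunks_a, chunks_b):
        if "0" <= x[0] <= "9" and "0" <= y[0] <= "9":
            # Both chunks are digit runs: compare their numeric values.
            result = (int(x) > int(y)) - (int(x) < int(y))
        else:
            # At least one chunk is text: defer to the locale's collation.
            result = locale.strcoll(x, y)
        if result != 0:
            return result
    # All shared chunks are equal: the name with fewer chunks sorts first.
    return len(chunks_a) - len(chunks_b)


names = ["p1.py", "p11.py", "p2.py"]
ordered = sorted(names, key=functools.cmp_to_key(strcoll_natural))
```

Without a prior call to locale.setlocale, locale.strcoll uses the "C" collation; calling locale.setlocale(locale.LC_COLLATE, "") first would make the text chunks follow the user's locale, which is the property the proposal wants to preserve.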
bug#26741: Numeric sort in ls
On May 01 2017, Tony Malykh wrote:

> It appears that version sort (-v) option achieves exactly that.
> However, the problem with it is that it seems to be incompatible with
> extension sort (-X) option.

This isn't limited to these two options. Generally, ls only supports a single sort option, with later options overriding previous ones.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
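The "later options override earlier ones" rule described here can be modelled as follows. This is a simplified Python model of the observable behaviour, not code taken from ls; the SORT_OPTIONS table and the effective_sort helper are hypothetical and cover only a subset of the sort-selecting short options.

```python
# Simplified model of the behaviour described above, not code from ls itself.
# Only an illustrative subset of ls's sort-selecting short options is listed.
SORT_OPTIONS = {"S": "size", "t": "time", "v": "version", "X": "extension"}


def effective_sort(flags: str) -> str:
    """Return the sort mode left in effect after reading the flags left to right."""
    mode = "name"  # default when no sort option is given
    for flag in flags:
        if flag in SORT_OPTIONS:
            # A later sort option replaces the earlier one; they never combine.
            mode = SORT_OPTIONS[flag]
    return mode
```

Under this model, effective_sort("1Xv") gives "version" and effective_sort("1vX") gives "extension", which is why neither order of -X and -v produces a two-level sort.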
bug#26741: Numeric sort in ls
tag 26741 notabug
thanks

On 05/01/2017 07:50 PM, Tony Malykh wrote:
> Hi all,
>
> I am wondering if there is a way to do a numeric sort in ls?

Thanks for the report; the quick answer is that sort can already do what you want, so post-process your ls output with sort.

> Here is a concrete example.

Thanks, that makes it much easier to reproduce what you want, and to demonstrate along the way what is going on.

>
> Here are some files
> $ ls -1
> 1.py
> 11.py
> 2.pl
> 2.py

I'll compress that set of data to:

$ printf %s\\n 1.py 11.py 2.pl 2.py

>
> Desired output:
> $ ls -1 --
> 2.pl
> 1.py
> 2.py
> 11.py

If you are okay with:

ls -1 | sort

then we are good to go. In fact, in that configuration, you can use plain 'ls' rather than 'ls -1' (since POSIX requires one-entry-per-line output, as with -1, when the stdout of ls is not a terminal, as in a pipeline).

> # Note that 2.pl goes to the top because of its extension.
> # Note also that the *.py files are sorted numerically and not
> lexicographically.

So that says that you want your primary sort key to be the extension field, sorted lexically, and your secondary sort key to be the rest of the name, sorted numerically. How about the following?

$ printf %s\\n 1.py 11.py 2.pl 2.py | sort -t . -k2,2 -k1,1n
2.pl
1.py
2.py
11.py

Again, with your directory layout, you could use:

$ ls | sort -t . -k2,2 -k1,1n

for the same results.

Now, is that entirely robust? Not really - all it takes is one file with no extension, or one file with an embedded '.' in the name beyond the extension, and you are no longer able to reliably specify which sort field is the extension.

$ printf %s\\n 4.4.py 3 1.py 11.py 2.pl 2.py | sort -t . -k2,2 -k1,1n
3
4.4.py
2.pl
1.py
2.py
11.py

But never fear - we can use the decorate-sort-undecorate pattern to temporarily swap things around:

$ printf %s\\n 4.4.py 3 1.py 11.py 2.pl 2.py \
  | sed 's/\(.*\)\.\([^.]*\)$/\2.\1/' \
  | sort -t . -k1,1 -k2n \
  | sed 's/\(^[^.]*\)\.\(.*\)/\2.\1/'
3
2.pl
1.py
2.py
4.4.py
11.py

(although when you start getting that complex, awk, perl, or a dedicated C program starts to sound more appealing).

>
> Attempt #1:
> $ ls -1Xv
> 1.py
> 2.pl
> 2.py
> 11.py
> # Note that the files are sorted numerically, but the extension sort
> (-X) option seem to have been ignored, since the 2.pl file is among
> the *.py files.

That's because 'ls' has exactly ONE level of sort. You cannot specify a primary and a secondary key to ls; rather, the last sort type requested overrules all earlier requests. The only program with multiple levels of sort is, not surprisingly, 'sort'.

'ls -Xv' is identical to 'ls -v'.

>
> Attempt #2:
> $ ls -1vX
> 2.pl
> 1.py
> 11.py
> 2.py
> # Now the extension sort is working, but the files are sorted
> lexicographically and not numerically.
>

'ls -vX' is identical to 'ls -X'.

It appears that your request is to modify ls directly to subsume the ability that sort already has to list multiple sort keys, and to break ties under one key by resorting to the next. However, the bar is VERY high to add any new features to ls, and unless you can point to existing practice of some other ls implementation that does the same, we are probably going to leave it at requiring you to post-process the data. Besides, you'd have to wait for a new version of ls to be built and land in your distro, while you already have post-processing tools at your disposal that are more portable.

So, for now, I'm closing this as not a bug, although you should feel free to continue the conversation if you have more to add.

> Here is my version:
> $ ls --version
> ls (GNU coreutils) 8.26
> Packaged by Cygwin (8.26-2)

[hmm - a good reminder that my TODO list includes packaging an updated coreutils for cygwin...]

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
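For readers who prefer a script to the sed/sort pipeline, the same two-level ordering (extension as the primary key, leading number of the remaining name as the secondary key) can be approximated with a sort key. This is a Python sketch under stated assumptions, not something from the thread: the helper name extension_then_number is hypothetical, the extension is taken to be everything after the last '.', and a name that does not start with a number is given the numeric value 0, roughly as sort -n would treat it.

```python
import re


def extension_then_number(name: str):
    """Sort key: extension first (lexically), then the rest of the name numerically."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No '.' at all: the whole name is the stem and the extension is empty.
        stem, ext = name, ""
    # Leading decimal number of the stem, or 0 if the stem does not start with one.
    match = re.match(r"[0-9]+(\.[0-9]+)?", stem)
    number = float(match.group()) if match else 0.0
    # The stem itself is the final tie-breaker, compared as plain text.
    return (ext, number, stem)


names = ["4.4.py", "3", "1.py", "11.py", "2.pl", "2.py"]
ordered = sorted(names, key=extension_then_number)
```

Because the key is a tuple, ties on the extension fall through to the number and then to the stem, which is the multi-key behaviour that ls lacks and sort provides.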
bug#26741: Numeric sort in ls
Hi all,

I am wondering if there is a way to do a numeric sort in ls? I am aware of the version sort option -v, and below I explain the problem with it.

What I'd like to achieve: suppose I have files with numbers in their names:

p1.py
p2.py
p11.py

If these files are sorted lexicographically, p2.py comes in the last position. I would like p2.py to appear between p1.py and p11.py.

It appears that the version sort (-v) option achieves exactly that. However, the problem with it is that it seems to be incompatible with the extension sort (-X) option. This limitation seems artificial to me: it is fairly easy to sort files by extension, and then sort them numerically within every extension group.

Maybe I don't understand something about the version sort algorithm; maybe it does something more than numeric sort. But in that case, is it possible to add a new flag for numeric sort, that is, one that would treat digits as numbers and would not conflict with the extension sort -X?

Here is a concrete example.

Here are some files
$ ls -1
1.py
11.py
2.pl
2.py

Desired output:
$ ls -1 --
2.pl
1.py
2.py
11.py
# Note that 2.pl goes to the top because of its extension.
# Note also that the *.py files are sorted numerically and not lexicographically.

Attempt #1:
$ ls -1Xv
1.py
2.pl
2.py
11.py
# Note that the files are sorted numerically, but the extension sort (-X) option seems to have been ignored, since the 2.pl file is among the *.py files.

Attempt #2:
$ ls -1vX
2.pl
1.py
11.py
2.py
# Now the extension sort is working, but the files are sorted lexicographically and not numerically.

Here is my version:
$ ls --version
ls (GNU coreutils) 8.26
Packaged by Cygwin (8.26-2)

Any suggestions will be appreciated!

Thanks
Tony