Re: [PATCH] ls: add --files0-from=FILE option
On Wed, 21 Apr 2021, Carl Edquist wrote: Thanks Pádraig for the thoughtful reply! You bring up some good points, which for the sake of interesting discussion i'd like to follow up on also. (Maybe later this week...) So to follow up - i don't have any action items here, just some details that i thought might make for interesting discussion. On Tue, 20 Apr 2021, p...@draigbrady.com wrote: In this case, xargs batches *13* separate invocations of ls; so the overall sorting is completely lost. But with the new option: [linux]$ find -name '*.[ch]' -print0 | ls -lrSh --files0-from=- The sizes all scroll in order One can also implement this functionality with the DSU pattern like: nlargest=10 find . -printf '%s\t%p\0' | sort -z -k1,1n | tail -z -n"$nlargest" | cut -z -f2 | xargs -r0 ls -lUd --color=auto -- I appreciate you taking the time to write out a full DSU example. I was going to save this topic for a separate thread, but yeah one of my observations over the years is i would see people make (i would say oversimplified) comments that "you can do that with find", but they rarely go through the effort to show it (and thus demonstrate just how much more complicated the equivalent find-based solution is to type out). (More in this vein a bit later.) Arguably that's more scalable as the sort operation will not be restricted by available RAM, and will use temp storage. Agreed! This is a good point to keep in mind in general for ls. On the one hand, ls already has file metadata information available to it in raw format (eg, size, timestamps), so there isn't the overhead to convert relevant keys to sortable strings and write it all across pipes for IPC. But on the other hand, as you are saying, for the things that find(1) can give sort(1) to sort (which is _most_ of the ls --sort modes), sort(1) can sort a bit more efficiently and can handle situations with very many file names and very limited system memory. But by the same argument, find . -mindepth 1 -maxdepth 1 ! -name .\* -printf '%f\0' | sort -z | xargs -r0 ls $LS_OPTIONS -dU -- is more scalable than a bare 'ls' invocation, since there can be literally millions of entries in a single directory. But 'ls' is certainly easier to type. And if you want to sort on anything more interesting than the name itself, the DSU pipeline just gets more complex. So at some point you can weigh scalability vs usability based on the situation. (In the linux tree example above, the "ls -lrSh --files0-from=-" run with 45648 input source files has a maxrss of ~16M, so for instance this use case is small enough for me not to worry about mem use.) [By the same token, the sorting of shell globs in general (which in bash can be expanded to an arbitrary number of arguments), can be more scalably done outside of the shell (with find|sort) than in a shell glob itself. And while that is not a coreutils issue, the point is that despite this, (as with ls) shell globs can still be more convenient to use for sorting a set of files, perhaps in most cases, than an equivalent multi-tool pipeline.] But doing the sorting in ls (rather than find|sort|cut|xargs) is not just easier to type -- it's also easier to _get it right_ without a lot of trial and error. And for casual interactive use, that's kind of a feature, too. For instance, in your DSU example, i spy a subtle bug in the undecorate step: looks like it should be "cut -zf2-" instead of "cut -zf2" ... because after all, tabs are legal in filenames, too. The DSU in pixelbeat's "newest" script (mentioned later) is also not free from filename handling issues [1]. (Point being - even for experienced users, writing a replacement for ls's internal sorting with an equivalent DSU pipeline can be tricky & time consuming to get right, and easy to get subtly wrong.) Also there are some other subtle differences that might be worth keeping in mind: - To get the same sort order as ls, you actually need "sort -rzk1,1n" instead of just "sort -zk1,1n", since for tie-breaking ls sorts the keys and the names in opposite directions for ls -t and -S (whether or not you pass -r to ls). (Likewise if you want the same order as ls -S without the -r, you need a slightly different "sort -zk1,1nr".) - when you put 'ls' after a pipe (as in "... | ls -lrSh --files0-from=-") you typically get the alias (eg, slackware has ls='/bin/ls $LS_OPTIONS'). Meanwhile 'xargs ls' is whichever 'ls' is first in PATH, and does not include LS_OPTIONS. Not a big deal, but, buyer beware. - lastly (and perhaps the only thing you can't do anything about), when you use 'xargs ls' rather than a single ls invocation, you lose the aggregate width alignments for any ls formats with column output (in this case -l, but it's also true for -C and -x). Also it's more robust as ls is the last step in the pipeline, presenting the files to the user. If you wanted the
Re: [PATCH] ls: add --files0-from=FILE option
On Tue, 20 Apr 2021, p...@draigbrady.com wrote: > One can also implement this functionality with the DSU pattern like: > > nlargest=10 > find . -printf '%s\t%p\0' | > sort -z -k1,1n | tail -z -n"$nlargest" | cut -z -f2 | > xargs -r0 ls -lUd --color=auto -- > > Arguably that's more scalable as the sort operation will not > be restricted by available RAM, and will use temp storage. > > Also it's more robust as ls is the last step in the pipeline, > presenting the files to the user. If you wanted the largest 10 > with ls --files0-from then file names containing a newline > would mess up further processing by tail etc. > >> Similarly, say you would like to view / scroll through your extensive mp3 >> collection in chronological order (based on when you added the files to >> your collection). You can do it now with the new option: >> >> [music]$ find -name \*.mp3 -print0 | ls -lrth --files0-from=- > > I've used a https://www.pixelbeat.org/scripts/newest script > for a long time with similar find|xargs technique as I described above. > > In saying all of the above, I do agree though that for consistency > commands that need to process all arguments with global context, > should have a --files0-from option. > Currently that's du and wc for total counts, and sort(1) for sorting. > Since ls has sorting functionality, it should have this option too. Thanks Pádraig for the thoughtful reply! You bring up some good points, which for the sake of interesting discussion i'd like to follow up on also. (Maybe later this week...) Have a good day! Carl
Re: [PATCH] ls: add --files0-from=FILE option
Hi Berny, On Wed, 21 Apr 2021, Bernhard Voelker wrote: shouldn't it use the 'argv-iter' gnulib module (like du.c and wc.c) instead of directly using the underlying ... +#include "readtokens0.h" I considered this, too! :) I think the short answer is that du and wc don't actually need to keep any information around from all the files they process except for the running totals, so "at scale" the argv-iter version for files0 processing allows for a constant (?) memory usage in du and wc regardless of the number of input items. But for ls, even if using argv-iter would result in one less copy of the filenames, the scaled memory requirements are still the same for sorting (fileinfo for each entry has to be kept till the end). [If anything, i would look more carefully into whether gobble_file in ls.c really needs to make a copy of each filename, rather than just using the command line arguments or names from files0 input directly.] So i ended up following the pattern closer to what's in sort.c. Note that argv-iter is not used in there either, perhaps for a similar reason (it's not going to bring the memory requirements down to O(1), the way it could for wc and du). The one case ls might possibly find an improvement with argv-iter is for unsorted, non-column output (so, specifically -U1), where the names are only needed once. But in that case there's no need to use the '--files0-from' option for global sort or summary info -- you could use 'xargs -0 ls -U1 ...' instead for identical output. It just didn't seem like there was a strong argument for it (same memory scaling regardless). And the other frank truth is (if you look at wc.c and du.c for examples), it seemed like adding argv_iterator processing would significantly complicate the code. Those were my thoughts anyway :) Have a good one, Carl
Re: [PATCH] ls: add --files0-from=FILE option
On 4/20/21 2:57 PM, Pádraig Brady wrote: > On 19/04/2021 18:08, Carl Edquist wrote: > In saying all of the above, I do agree though that for consistency > commands that need to process all arguments with global context, > should have a --files0-from option. > Currently that's du and wc for total counts, and sort(1) for sorting. > Since ls has sorting functionality, it should have this option too. I don't have a strong opinion about whether to add it or not, but if we add it, then shouldn't it use the 'argv-iter' gnulib module (like du.c and wc.c) instead of directly using the underlying ... >> +#include "readtokens0.h" ? Have a nice day, Berny
Re: [PATCH] ls: add --files0-from=FILE option
On 19/04/2021 18:08, Carl Edquist wrote: Greetings Coreutils, I'm submitting for your consideration here a patch to add the standard '--files0-from=FILE' option to ls. (To read NUL-terminated names from FILE rather than using command line arguments.) Motivation for adding this to ls is mainly to let ls sort arbitrarily many items, though it is also necessary for getting the correct aggregate column widths to align long format (-l) output across all file names. As a real example, if you want to use ls to list (say, in long format with human sizes) all the sources in the linux kernel tree according to size, you might naively try one of [linux]$ find -name '*.[ch]' -exec ls -lrSh {} + [linux]$ find -name '*.[ch]' -print0 | xargs -0 ls -lrSh but you'll see the sizes spiral over and over, finally ending somewhere in the middle: ... -rw-r--r-- 1 kx users 81K Apr 15 04:30 ./arch/arm/boot/dts/imx6sll-pinfunc.h -rw-r--r-- 1 kx users 83K Apr 15 04:30 ./arch/arm/boot/dts/imx6dl-pinfunc.h -rw-r--r-- 1 kx users 87K Apr 15 04:30 ./arch/arm/mach-imx/iomux-mx35.h -rw-r--r-- 1 kx users 107K Apr 15 04:30 ./arch/arm/boot/dts/imx7d-pinfunc.h -rw-r--r-- 1 kx users 143K Apr 15 04:30 ./arch/arm/boot/dts/imx6sx-pinfunc.h In this case, xargs batches *13* separate invocations of ls; so the overall sorting is completely lost. But with the new option: [linux]$ find -name '*.[ch]' -print0 | ls -lrSh --files0-from=- The sizes all scroll in order One can also implement this functionality with the DSU pattern like: nlargest=10 find . -printf '%s\t%p\0' | sort -z -k1,1n | tail -z -n"$nlargest" | cut -z -f2 | xargs -r0 ls -lUd --color=auto -- Arguably that's more scalable as the sort operation will not be restricted by available RAM, and will use temp storage. Also it's more robust as ls is the last step in the pipeline, presenting the files to the user. If you wanted the largest 10 with ls --files0-from then file names containing a newline would mess up further processing by tail etc. Similarly, say you would like to view / scroll through your extensive mp3 collection in chronological order (based on when you added the files to your collection). You can do it now with the new option: [music]$ find -name \*.mp3 -print0 | ls -lrth --files0-from=- I've used a https://www.pixelbeat.org/scripts/newest script for a long time with similar find|xargs technique as I described above. In saying all of the above, I do agree though that for consistency commands that need to process all arguments with global context, should have a --files0-from option. Currently that's du and wc for total counts, and sort(1) for sorting. Since ls has sorting functionality, it should have this option too. thanks! Pádraig
Re: [PATCH] ls: add --files0-from=FILE option
On Mon, 19 Apr 2021, Carl Edquist wrote: I'm submitting for your consideration here a patch to add the standard '--files0-from=FILE' option to ls. Oops! Somehow I ended up attaching an old version of the patch - this one includes some cleanups and help output. Thanks, CarlFrom 30fb711962bbd432aea4268baf93d7ba17aae4bc Mon Sep 17 00:00:00 2001 From: Carl Edquist Date: Fri, 10 May 2019 17:05:47 -0500 Subject: [PATCH] ls: add --files0-from=FILE option Useful for things like find [...] -type f -print0 | ls -lrSh --files0-from=- where an attempt to do the same with 'find -print0 | xargs -0' or 'find -exec' would fail to fit all the file names onto a single command line, and thus ls would not sort all the input items together, nor align the columns from -l across all items. * src/ls.c (files_from): New var for input filename. (long_options): Add new long option. (read_files0): Add helper function to consume files0 input. (main): Add logic for files_from handling. (decode_switches): Handle FILES0_FROM_OPTION. * tests/ls/files0-from-option.sh: Excercise new option. * tests/local.mk: Include new test. * doc/coreutils.texi: Document --files0-from=FILE option. * NEWS: Mention the new feature. --- NEWS | 3 ++ doc/coreutils.texi | 2 ++ src/ls.c | 69 +++--- tests/local.mk | 1 + tests/ls/files0-from-option.sh | 40 5 files changed, 111 insertions(+), 4 deletions(-) create mode 100755 tests/ls/files0-from-option.sh diff --git a/NEWS b/NEWS index 090fbc7..7f7f86b 100644 --- a/NEWS +++ b/NEWS @@ -73,6 +73,9 @@ GNU coreutils NEWS-*- outline -*- ls now accepts the --sort=width option, to sort by file name width. This is useful to more compactly organize the default vertical column output. + ls now accepts the --files0-from=FILE option, where FILE contains a + list of NUL-terminated file names. + nl --line-increment can now take a negative number to decrement the count. ** Improvements diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 3e3aedb..e008746 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -7467,6 +7467,8 @@ option has been specified (@option{--classify} (@option{-F}), @option{--dereference} (@option{-L}), or @option{--dereference-command-line} (@option{-H})). +@filesZeroFromOption{ls,,sorted output (and aligned columns in -l mode)} + @item --group-directories-first @opindex --group-directories-first Group all the directories before the files and then sort the diff --git a/src/ls.c b/src/ls.c index 4586b5e..0d0678b 100644 --- a/src/ls.c +++ b/src/ls.c @@ -102,6 +102,7 @@ #include "mpsort.h" #include "obstack.h" #include "quote.h" +#include "readtokens0.h" #include "smack.h" #include "stat-size.h" #include "stat-time.h" @@ -356,6 +357,9 @@ static bool align_variable_outer_quotes; static void **sorted_file; static size_t sorted_file_alloc; +/* input filename for --files0-from */ +static char *files_from; + /* When true, in a color listing, color each symlink name according to the type of file it points to. Otherwise, color them according to the 'ln' directive in LS_COLORS. Dangling (orphan) symlinks are treated specially, @@ -833,6 +837,7 @@ enum COLOR_OPTION, DEREFERENCE_COMMAND_LINE_SYMLINK_TO_DIR_OPTION, FILE_TYPE_INDICATOR_OPTION, + FILES0_FROM_OPTION, FORMAT_OPTION, FULL_TIME_OPTION, GROUP_DIRECTORIES_FIRST_OPTION, @@ -892,6 +897,7 @@ static struct option const long_options[] = {"block-size", required_argument, NULL, BLOCK_SIZE_OPTION}, {"context", no_argument, 0, 'Z'}, {"author", no_argument, NULL, AUTHOR_OPTION}, + {"files0-from", required_argument, NULL, FILES0_FROM_OPTION}, {GETOPT_HELP_OPTION_DECL}, {GETOPT_VERSION_OPTION_DECL}, {NULL, 0, NULL, 0} @@ -1626,11 +1632,38 @@ signal_restore (void) signal_setup (false); } +static void +read_files0(struct Tokens *tokp) +{ + FILE *stream; + if (STREQ (files_from, "-")) +stream = stdin; + else +{ + stream = fopen (files_from, "r"); + if (stream == NULL) +die (EXIT_FAILURE, errno, _("cannot open %s for reading"), + quoteaf (files_from)); +} + + readtokens0_init (tokp); + + if (! readtokens0 (stream, tokp) || fclose (stream) != 0) +die (LS_FAILURE, 0, _("cannot read file names from %s"), + quoteaf (files_from)); + + if (! tokp->n_tok) +die (LS_FAILURE, 0, _("no input from %s"), + quoteaf (files_from)); +} + int main (int argc, char **argv) { int i; struct pending *thispend; + struct Tokens tok; + char **files; int n_files; initialize_main (, ); @@ -1727,6 +1760,23 @@ main (int argc, char **argv) clear_files (); n_files = argc - i; + files = argv + i; + + if (files_from) +{ + if (n_files > 0) +{ + error (0, 0, _("extra operand %s"), quoteaf