bug#46060: Offer ls --limit=...
> "PE" == Paul Eggert writes: PE> That argument would apply to any program, no? "cat", "diff", "sh", PE> "node", PE> Not sure why "ls" needs a convenience flag that would complicate the PE> documentation and maintenance and be so rarely useful. OK, then I'll close the bug then.
bug#46060: Offer ls --limit=...
On 1/23/21 1:13 PM, 積丹尼 Dan Jacobson wrote:
> And any database command already has a --limit option these days, and
> does not rely on a second program to trim its output because it can't
> control itself. Indeed, on some remote connections one would only want
> to launch one program, not two.

That argument would apply to any program, no? "cat", "diff", "sh",
"node", ...

Not sure why "ls" needs a convenience flag that would complicate the
documentation and maintenance and be so rarely useful.
bug#46048: split -n K/N loses data, sum of output files is smaller than input file.
On 1/24/21 8:52 AM, Pádraig Brady wrote:
> -        if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
> +        if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)

Dumb question: will this handle the case where you're splitting from
stdin, and stdin is a seekable file whose initial file offset is
nonzero?
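For context, a quick way to see the situation being asked about (stdin arriving at a process with a nonzero file offset) is a compound command whose members share one open file description; the file name here is made up for illustration:

```shell
# Create a small test file (hypothetical name).
printf 'abcdefgh' > data.tmp

# Both commands in the braces inherit the same open file description
# from the redirection, so they share one file offset.
{
  # dd consumes the first 2 bytes, advancing the shared offset...
  dd bs=2 count=1 2>/dev/null > /dev/null
  # ...so cat starts reading at byte 2, not byte 0.
  cat
} < data.tmp
# prints: cdefgh
```

A SEEK_SET in the second reader would compute positions from the start of the file, ignoring that inherited offset, which is exactly the concern raised above.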
bug#46048: split -n K/N loses data, sum of output files is smaller than input file.
On 24/01/2021 16:52, Pádraig Brady wrote:
> diff --git a/src/split.c b/src/split.c
> index 0660da13f..6aa8d50e9 100644
> --- a/src/split.c
> +++ b/src/split.c
> @@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf, size_t bufsize,
>        }
>      else
>        {
> -        if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
> +        if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
>            die (EXIT_FAILURE, errno, "%s", quotef (infile));
>          initial_read = SIZE_MAX;
>        }

The same adjustment is needed in lines_chunk_split().
I'll add a test also.

cheers,
Pádraig
bug#46048: split -n K/N loses data, sum of output files is smaller than input file.
On 23/01/2021 04:58, Paul Hirst wrote:
> split --number K/N appears to lose data, with the sum of the sizes of
> the output files being smaller than the original input file by 131072
> bytes.
>
> $ split --version
> split (GNU coreutils) 8.30
> ...
> $ head -c 100 < /dev/urandom > test.dat
> $ split --number=1/4 test.dat > t1
> $ split --number=2/4 test.dat > t2
> $ split --number=3/4 test.dat > t3
> $ split --number=4/4 test.dat > t4
> $ ls -l
> -rw-r--r-- 1 user user     25 Jan 22 18:36 t1
> -rw-r--r-- 1 user user     25 Jan 22 18:36 t2
> -rw-r--r-- 1 user user     25 Jan 22 18:36 t3
> -rw-r--r-- 1 user user 118928 Jan 22 18:36 t4
> -rw-r--r-- 1 user user    100 Jan 22 18:33 test.dat
>
> Surely this should not be the case?

Ugh. This functionality was broken for all files > 128KiB due to
adjustments for handling /dev/zero:

$ truncate -s 100 test.dat
$ split --number=4/4 test.dat | wc -c
118928

The following patch fixes it here. I need to do some more testing
before committing. thanks!

diff --git a/src/split.c b/src/split.c
index 0660da13f..6aa8d50e9 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf, size_t bufsize,
       }
     else
       {
-        if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+        if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
           die (EXIT_FAILURE, errno, "%s", quotef (infile));
         initial_read = SIZE_MAX;
       }
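For illustration, the K/N byte-chunk boundaries can be sketched with plain shell arithmetic (this is my paraphrase of the chunk logic, not the actual bytes_chunk_extract code): chunk k of n covers bytes [(k-1)*size/n, k*size/n), so "start" is an absolute position from the beginning of the file, which is why SEEK_SET is the correct whence for the lseek above.

```shell
# Chunk boundaries for a 100-byte file split into 4 byte chunks
# (sketch; real split also handles remainders and short reads).
size=100 n=4
for k in 1 2 3 4; do
  start=$(( (k - 1) * size / n ))   # absolute offset of chunk k
  end=$(( k * size / n ))           # one past its last byte
  printf 'chunk %d/%d: start=%d len=%d\n' \
         "$k" "$n" "$start" "$(( end - start ))"
done
# prints start=0, 25, 50, 75, each with len=25
```

With SEEK_CUR, the same "start" value would instead be added to wherever the file offset already was after the initial buffered read, producing the short output seen in the report.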
bug#46060: Offer ls --limit=...
E.g.:

"What is API pagination? Some APIs, such as Contacts, can return
millions of results. We obviously can't return all of them at once, so
we need to return a subset - or a page - at a time. This technique is
called paging and is common to most APIs. Paging can be implemented in
many different ways, some better than others."

Anyway, this ls command was built in the early years of computer
science...
bug#46060: Offer ls --limit=...
Sure, it is against the https://en.wikipedia.org/wiki/Unix_philosophy,
but SQL has LIMIT, and unicode(1) has:

$ unicode --help
  -m MAXCOUNT, --max=MAXCOUNT
                        Maximal number of codepoints to display...

Just like "we want to stop pollution at the source", rather than always
"clean up after it".
bug#46060: Offer ls --limit=...
Hi Dan,

On 23.01.21 22:13, 積丹尼 Dan Jacobson wrote:
> I hereby propose "ls --limit=..."
>
> $ ls --limit=1 # Would only print one result item:
> A
>
> You might say: "Jacobson, just use "ls|sed q". Closed: Worksforme."
> Ah, but I am talking about items, not lines:

You can use the ls option '-1' to print one item per line:

$ touch {a..z}
$ ls -1 | head -n8
a
b
c
d
e
f
g
h

You can use 'column' (from package "bsdmainutils" in Debian etc.) to
columnate the result:

$ ls -1 | head -n8 | column
a  b  c  d  e  f  g  h

> Indeed, directories might be huge. And any database command already
> has a --limit option these days, and does not rely on a second program
> to trim its output because it can't control itself. Indeed, on some
> remote connections one would only want to launch one program, not two.

Thanks. It might be nice not to have to create all the output that is
to be discarded, especially on remote and/or slow file systems. The one
program requirement could be fulfilled by a script or shell function.

I am sorry if my email hinders possible acceptance of an implementation
of your suggestion, but I did want to show that there is a workaround
(adding non-GNU software to the mix, though).

Thanks,
Erik
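For what it's worth, the "script or shell function" route might look like this minimal sketch (the lslimit name and interface are made up here, not an agreed design):

```shell
# lslimit LIMIT [LS_ARGS...]: print at most LIMIT items, one per line.
# With ls -1 each output line is one item, so limiting lines limits
# items (it does not columnate; pipe through column for that).
lslimit () {
  limit=$1
  shift
  ls -1 "$@" | head -n "$limit"
}

# Usage: in a scratch directory with files a..e, show the first 3 items.
dir=$(mktemp -d)
( cd "$dir" && touch a b c d e && lslimit 3 )
# prints: a, b, c (one per line)
rm -rf "$dir"
```

From the caller's point of view this is still one command to launch, though it does spawn a pipeline internally, so it does not address the "stop generating output at the source" argument, only the convenience one.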