bug#46060: Offer ls --limit=...

2021-01-24 Thread 積丹尼 Dan Jacobson
> "PE" == Paul Eggert  writes:
PE> That argument would apply to any program, no? "cat", "diff", "sh",
PE> "node",

PE> Not sure why "ls" needs a convenience flag that would complicate the
PE> documentation and maintenance and be so rarely useful.

OK, I'll close the bug then.





bug#46060: Offer ls --limit=...

2021-01-24 Thread Paul Eggert

On 1/23/21 1:13 PM, 積丹尼 Dan Jacobson wrote:

And any database command already has
a --limit option these days, and does not rely on a second program to
trim its output because it can't control itself. Indeed, on some remote
connections one would only want to launch one program, not two.


That argument would apply to any program, no? "cat", "diff", "sh", 
"node",


Not sure why "ls" needs a convenience flag that would complicate the 
documentation and maintenance and be so rarely useful.






bug#46048: split -n K/N loses data, sum of output files is smaller than input file.

2021-01-24 Thread Paul Eggert

On 1/24/21 8:52 AM, Pádraig Brady wrote:

-          if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+          if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)


Dumb question: will this handle the case where you're splitting from 
stdin and stdin is a seekable file and its initial file offset is nonzero?
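
For concreteness, here is one way to set up that scenario (the file
name "f" is arbitrary; dd consumes the first 100 bytes, so split
inherits a nonzero offset on the shared, seekable stdin):

$ seq 1000 > f
$ { dd bs=100 count=1 of=/dev/null 2>/dev/null; split -n 2/2 -; } < f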






bug#46048: split -n K/N loses data, sum of output files is smaller than input file.

2021-01-24 Thread Pádraig Brady

On 24/01/2021 16:52, Pádraig Brady wrote:

diff --git a/src/split.c b/src/split.c
index 0660da13f..6aa8d50e9 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf, size_t bufsize,
         }
       else
         {
-          if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+          if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
             die (EXIT_FAILURE, errno, "%s", quotef (infile));
           initial_read = SIZE_MAX;
         }


The same adjustment is needed in lines_chunk_split().
I'll add a test also.
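
Something like this for the test, perhaps (just a sanity check that
the chunks concatenate back to the input; not necessarily what will
be committed):

$ head -c 1000000 /dev/urandom > in
$ for i in 1 2 3 4; do split -n $i/4 in; done | cmp - in && echo OK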

cheers,
Pádraig






bug#46048: split -n K/N loses data, sum of output files is smaller than input file.

2021-01-24 Thread Pádraig Brady

On 23/01/2021 04:58, Paul Hirst wrote:

split --number K/N appears to lose data, with the sum of the sizes of
the output files being smaller than the original input file by 131072 bytes.

$ split --version
split (GNU coreutils) 8.30
...

$ head -c 1000000 < /dev/urandom > test.dat
$ split --number=1/4 test.dat > t1
$ split --number=2/4 test.dat > t2
$ split --number=3/4 test.dat > t3
$ split --number=4/4 test.dat > t4

$ ls -l
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t1
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t2
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t3
-rw-r--r-- 1 user user  118928 Jan 22 18:36 t4
-rw-r--r-- 1 user user 1000000 Jan 22 18:33 test.dat

Surely this should not be the case?


Ugh. This functionality was broken for all files > 128KiB
due to adjustments for handling /dev/zero.

$ truncate -s 1000000 test.dat
$ split --number=4/4 test.dat | wc -c
118928
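
That matches the arithmetic, assuming the initial read consumed a
full 128KiB (131072-byte) buffer before seeking: chunk 4 of 4 should
start at byte 750000, but SEEK_CUR adds the bytes already read:

$ echo $(( 3 * (1000000 / 4) ))    # intended start of chunk 4
750000
$ echo $(( 131072 + 750000 ))      # where SEEK_CUR actually lands
881072
$ echo $(( 1000000 - 881072 ))     # bytes left for chunk 4
118928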

The following patch fixes it here.
I need to do some more testing before committing.

thanks!

diff --git a/src/split.c b/src/split.c
index 0660da13f..6aa8d50e9 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf, size_t bufsize,
         }
       else
         {
-          if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+          if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
             die (EXIT_FAILURE, errno, "%s", quotef (infile));
           initial_read = SIZE_MAX;
         }





bug#46060: Offer ls --limit=...

2021-01-24 Thread 積丹尼 Dan Jacobson
E.g.,
"What is API pagination? Some APIs, such as Contacts can return millions
of results. We obviously can't return all of them at once, so we need to
return a subset - or a page - at a time. This technique is called paging
and is common to most APIs. Paging can be implemented in many different
ways, some better than others."
Anyway, this ls command was built in the early years of computer science...





bug#46060: Offer ls --limit=...

2021-01-24 Thread 積丹尼 Dan Jacobson
Sure, it goes against the Unix philosophy
(https://en.wikipedia.org/wiki/Unix_philosophy), but just like SQL
has LIMIT, there is

$ unicode --help
  -m MAXCOUNT, --max=MAXCOUNT
        Maximal number of codepoints to display...

Just like "we want to stop pollution at the source", not always
"clean up after it".





bug#46060: Offer ls --limit=...

2021-01-24 Thread Erik Auerswald

Hi Dan,

On 23.01.21 22:13, 積丹尼 Dan Jacobson wrote:

I hereby propose "ls --limit=..."

$ ls --limit=1 # Would only print one result item:
A

You might say:
"Jacobson, just use "ls|sed q". Closed: Worksforme."

Ah, but I am talking about items, not lines:


You can use the ls option '-1' to print one item per line:

$ touch {a..z}
$ ls -1 | head -n8
a
b
c
d
e
f
g
h

You can use 'column' (from package "bsdmainutils" in Debian etc.)
to columnate the result:

$ ls -1 | head -n8 | column
a   b   c   d   e   f   g   h


Indeed, directories might be huge. And any database command already has
a --limit option these days, and does not rely on a second program to
trim its output because it can't control itself. Indeed, on some remote
connections one would only want to launch one program, not two. Thanks.


It might be nice not to have to create all the output that is to be
discarded, especially on remote and/or slow file systems.

The one-program requirement could be fulfilled by a script or shell
function, for example the sketch below.
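
A rough sketch (the name "lsn" is made up):

lsn () { ls -1 "${2:-.}" | head -n "$1"; }

$ lsn 8               # first eight entries of the current directory
$ lsn 8 /some/dir     # or of another directory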

I am sorry if my email hinders possible acceptance of an
implementation of your suggestion, but I did want to show that there
is a workaround (adding non-GNU software to the mix, though).

Thanks,
Erik