bug#54112: dd seek_bytes etc. is confusing

2022-02-22 Thread Paul Eggert

On 2/22/22 09:29, Pádraig Brady wrote:

That is a more concise and direct way to achieve the same functionality.
+1

I guess we should remove docs for the other options,
but leave support there for backwards compat.


Sounds good, I installed the attached and am closing the bug report.

From 155cc945db54ab541594f3a59cfe808bc9aea3fd Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Tue, 22 Feb 2022 18:27:09 -0800
Subject: [PATCH] dd: counts ending in "B" now count bytes

This implements my suggestion in Bug#54112.
* src/dd.c (usage): Document the change.
(parse_integer, scanargs): Implement the change.
Omit some now-obsolete checks for invalid flags.
* tests/dd/bytes.sh: Test the new behavior, while retaining
checks for the now-obsolete usage.
* tests/dd/nocache_eof.sh: Avoid now-obsolete usage.
---
 NEWS                    |   6 +++
 doc/coreutils.texi      |  53 ++-
 src/dd.c                | 114
 tests/dd/bytes.sh       |  67 ---
 tests/dd/nocache_eof.sh |   2 +-
 5 files changed, 116 insertions(+), 126 deletions(-)

diff --git a/NEWS b/NEWS
index de03f0d47..b6713bfc5 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,12 @@ GNU coreutils NEWS                                    -*- outline -*-
   dd now supports the aliases iseek=N for skip=N, and oseek=N for seek=N,
   like FreeBSD and other operating systems.
 
+  dd now counts bytes instead of blocks if a block count ends in "B".
+  For example, 'dd count=100KiB' now copies 100 KiB of data, not
+  102,400 blocks of data.  The flags count_bytes, skip_bytes and
+  seek_bytes are therefore obsolescent and are no longer documented,
+  though they still work.
+
   timeout --foreground --kill-after=... will now exit with status 137
   if the kill signal was sent, which is consistent with the behavior
   when the --foreground option is not specified.  This allows users to
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 5419c61ef..641680e11 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9268,9 +9268,9 @@ use @var{bytes} as the fixed record length.
 @opindex skip
 @opindex iseek
 Skip @var{n} @samp{ibs}-byte blocks in the input file before copying.
-With @samp{iflag=skip_bytes}, interpret @var{n}
+If @var{n} ends in the letter @samp{B}, interpret @var{n}
 as a byte count rather than a block count.
-(The @samp{iseek=} spelling is an extension to POSIX.)
+(@samp{B} and the @samp{iseek=} spelling are GNU extensions to POSIX.)
 
 @item seek=@var{n}
 @itemx oseek=@var{n}
@@ -9278,16 +9278,17 @@ as a byte count rather than a block count.
 @opindex oseek
 Skip @var{n} @samp{obs}-byte blocks in the output file before
 truncating or copying.
-With @samp{oflag=seek_bytes}, interpret @var{n}
+If @var{n} ends in the letter @samp{B}, interpret @var{n}
 as a byte count rather than a block count.
-(The @samp{oseek=} spelling is an extension to POSIX.)
+(@samp{B} and the @samp{oseek=} spelling are GNU extensions to POSIX.)
 
 @item count=@var{n}
 @opindex count
 Copy @var{n} @samp{ibs}-byte blocks from the input file, instead
 of everything until the end of the file.
-With @samp{iflag=count_bytes}, interpret @var{n}
-as a byte count rather than a block count.
+If @var{n} ends in the letter @samp{B},
+interpret @var{n} as a byte count rather than a block count;
+this is a GNU extension to POSIX.
 If short reads occur, as could be the case
 when reading from a pipe for example, @samp{iflag=fullblock}
 ensures that @samp{count=} counts complete input blocks
@@ -9627,27 +9628,6 @@ as they may return short reads. In that case,
 this flag is needed to ensure that a @samp{count=} argument is
 interpreted as a block count rather than a count of read operations.
 
-@item count_bytes
-@opindex count_bytes
-Interpret the @samp{count=} operand as a byte count,
-rather than a block count, which allows specifying
-a length that is not a multiple of the I/O block size.
-This flag can be used only with @code{iflag}.
-
-@item skip_bytes
-@opindex skip_bytes
-Interpret the @samp{skip=} or @samp{iseek=} operand as a byte count,
-rather than a block count, which allows specifying
-an offset that is not a multiple of the I/O block size.
-This flag can be used only with @code{iflag}.
-
-@item seek_bytes
-@opindex seek_bytes
-Interpret the @samp{seek=} or @samp{oseek=} operand as a byte count,
-rather than a block count, which allows specifying
-an offset that is not a multiple of the I/O block size.
-This flag can be used only with @code{oflag}.
-
 @end table
 
 These flags are all GNU extensions to POSIX.
@@ -9680,23 +9660,22 @@ should not be too large---values larger than a few megabytes
 are generally wasteful or (as in the gigabyte..exabyte case) downright
 counterproductive or error-inducing.
 
-To process data that is at an offset or size that is not a
-multiple of the I/O@ block size, you can use the @samp{skip_bytes},
-@samp{seek_bytes} and @samp{count_bytes} flags.  Alternatively
-the traditional 

bug#54112: dd seek_bytes etc. is confusing

2022-02-22 Thread Pádraig Brady

On 22/02/2022 17:03, Paul Eggert wrote:

While looking into Bug#45648 I noticed that the GNU extensions
count_bytes, seek_bytes, and skip_bytes are confusing, and the proposed
fix to bug#45648 would make them even more confusing. To fix this
confusion, we should deprecate these options, and instead say that if
you want to use byte counts you should use a number string ending in "B".

Here's another way to put it.  Currently this:

 dd oseek=100KiB

means "seek 102,400 blocks". It should simply mean "seek 102,400 bytes",
which is what it says. And if we change oseek's meaning this way, we
don't need "oseek_bytes".

Although this is an incompatible change to GNU dd, I don't think it'll
affect real-world uses (who would use oseek in such a confusing way
now?) and overall it will be a win.


That is a more concise and direct way to achieve the same functionality.
+1

I guess we should remove docs for the other options,
but leave support there for backwards compat.

thanks,
Pádraig
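
The rule discussed above ("a number string ending in "B" is a byte count") can be sketched in shell. This is only an illustration of the rule as stated in the thread, not the code in src/dd.c; the helper name operand_unit is hypothetical.

```shell
# Hypothetical helper, not part of dd: classify a count=/skip=/seek=
# operand under the proposed rule that a trailing "B" selects bytes.
operand_unit() {
  case $1 in
    *B) echo bytes ;;    # e.g. 100KiB, 7B
    *)  echo blocks ;;   # e.g. 100, 100K
  esac
}

unit_kib=$(operand_unit 100KiB)
unit_plain=$(operand_unit 100)
unit_k=$(operand_unit 100K)

echo "100KiB is counted in $unit_kib"
echo "100 is counted in $unit_plain"
echo "100K is counted in $unit_k"
```

Only the final letter matters in this sketch, so 100K still counts blocks while 100KiB counts bytes.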





bug#45648: `dd` seek/skip which way is up?

2022-02-22 Thread Paul Eggert

On 1/4/21 20:08, Paul Eggert wrote:

On 1/4/21 7:44 PM, Bela Lubkin wrote:

TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
them as pure synonyms for 'skip' and 'seek'.


Thanks for doing all that research. It's compelling, and I think your 
patch (or something like it) should go in. I'll wait for a bit to hear 
other opinions.


After thinking about the patch a bit more, let's omit the part about 
adding new conversions iseek_bytes etc., as I think there's a better way 
to address that issue. I proposed something in .


So instead of your patch, I installed the attached patches. The first 
one adds the iseek and oseek operands that you suggested; the second one 
clarifies dd documentation, as I found several things were confusing 
when rereading it carefully. Something like these patches should appear 
in the next coreutils release.

From 6ad981900cc170258d4914197e2796fc94a37863 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 21 Feb 2022 11:23:02 -0800
Subject: [PATCH 1/2] dd: support iseek= and oseek=

Alias iseek=N to skip=N, oseek=N to seek=N (Bug#45648).
* src/dd.c (scanargs): Parse iseek= and oseek=.
* tests/dd/skip-seek.pl (sk-seek5): New test case.
---
 NEWS                  |  3 +++
 doc/coreutils.texi    | 16 ++--
 src/dd.c              |  8
 tests/dd/skip-seek.pl | 10 ++
 4 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index ef65b4ab8..de03f0d47 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,9 @@ GNU coreutils NEWS                                    -*- outline -*-
   dd conv=fsync now synchronizes output even after a write error,
   and similarly for dd conv=fdatasync.
 
+  dd now supports the aliases iseek=N for skip=N, and oseek=N for seek=N,
+  like FreeBSD and other operating systems.
+
   timeout --foreground --kill-after=... will now exit with status 137
   if the kill signal was sent, which is consistent with the behavior
   when the --foreground option is not specified.  This allows users to
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 8d2974bde..4ec998802 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9189,8 +9189,7 @@ Read from @var{file} instead of standard input.
 @item of=@var{file}
 @opindex of
 Write to @var{file} instead of standard output.  Unless
-@samp{conv=notrunc} is given, @command{dd} truncates @var{file} to zero
-bytes (or the size specified with @samp{seek=}).
+@samp{conv=notrunc} is given, truncate @var{file} before writing it.
 
 @item ibs=@var{bytes}
 @opindex ibs
@@ -9230,15 +9229,20 @@ When converting variable-length records to fixed-length ones
 use @var{bytes} as the fixed record length.
 
 @item skip=@var{n}
+@itemx iseek=@var{n}
 @opindex skip
+@opindex iseek
 Skip @var{n} @samp{ibs}-byte blocks in the input file before copying.
 If @samp{iflag=skip_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.
 
 @item seek=@var{n}
+@itemx oseek=@var{n}
 @opindex seek
-Skip @var{n} @samp{obs}-byte blocks in the output file before copying.
-if @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
+@opindex oseek
+Skip @var{n} @samp{obs}-byte blocks in the output file before
+truncating or copying.
+If @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.
 
 @item count=@var{n}
@@ -9588,14 +9592,14 @@ This flag can be used only with @code{iflag}.
 
 @item skip_bytes
 @opindex skip_bytes
-Interpret the @samp{skip=} operand as a byte count,
+Interpret the @samp{skip=} or @samp{iseek=} operand as a byte count,
 rather than a block count, which allows specifying
 an offset that is not a multiple of the I/O block size.
 This flag can be used only with @code{iflag}.
 
 @item seek_bytes
 @opindex seek_bytes
-Interpret the @samp{seek=} operand as a byte count,
+Interpret the @samp{seek=} or @samp{oseek=} operand as a byte count,
 rather than a block count, which allows specifying
 an offset that is not a multiple of the I/O block size.
 This flag can be used only with @code{oflag}.
diff --git a/src/dd.c b/src/dd.c
index 7360a4973..1c30e414d 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -562,8 +562,8 @@ Copy a file, converting and formatting according to the operands.\n\
  obs=BYTES       write BYTES bytes at a time (default: 512)\n\
  of=FILE         write to FILE instead of stdout\n\
  oflag=FLAGS     write as per the comma separated symbol list\n\
-  seek=N          skip N obs-sized blocks at start of output\n\
-  skip=N          skip N ibs-sized blocks at start of input\n\
+  seek=N          (or oseek=N) skip N obs-sized output blocks\n\
+  skip=N          (or iseek=N) skip N ibs-sized input blocks\n\
  status=LEVEL    The LEVEL of information to print to stderr;\n\
                  'none' suppresses everything but error messages,\n\
                  'noxfer' suppresses the final transfer statistics,\n\
@@ -1564,9 +1564,9 @@ scanargs (int argc, 

bug#54112: dd seek_bytes etc. is confusing

2022-02-22 Thread Paul Eggert

While looking into Bug#45648 I noticed that the GNU extensions 
count_bytes, seek_bytes, and skip_bytes are confusing, and the proposed 
fix to bug#45648 would make them even more confusing. To fix this 
confusion, we should deprecate these options, and instead say that if 
you want to use byte counts you should use a number string ending in "B".


Here's another way to put it.  Currently this:

   dd oseek=100KiB

means "seek 102,400 blocks". It should simply mean "seek 102,400 bytes", 
which is what it says. And if we change oseek's meaning this way, we 
don't need "oseek_bytes".


Although this is an incompatible change to GNU dd, I don't think it'll 
affect real-world uses (who would use oseek in such a confusing way 
now?) and overall it will be a win.
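
To make the difference concrete, here is the arithmetic for the oseek=100KiB example, assuming dd's default output block size of 512 bytes (obs=512). This is a sketch using shell arithmetic only; it does not invoke dd, and the variable names are illustrative.

```shell
obs=512                   # assumed: dd's default output block size
n=$((100 * 1024))         # the number written as 100KiB: 102400

old_offset=$((n * obs))   # current meaning: seek 102400 blocks of obs bytes
new_offset=$n             # proposed meaning: seek 102400 bytes

echo "current:  oseek=100KiB seeks $old_offset bytes"
echo "proposed: oseek=100KiB seeks $new_offset bytes"
```

With these assumptions the current interpretation lands 50 MiB into the output, 512 times further than the operand suggests.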