bug#54112: dd seek_bytes etc. is confusing

2022-02-22 Thread Paul Eggert

On 2/22/22 09:29, Pádraig Brady wrote:

That is a more concise and direct way to achieve the same functionality.
+1

I guess we should remove docs for the other options,
but leave support there for backwards compat.


Sounds good, I installed the attached and am closing the bug report.

From 155cc945db54ab541594f3a59cfe808bc9aea3fd Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Tue, 22 Feb 2022 18:27:09 -0800
Subject: [PATCH] dd: counts ending in "B" now count bytes

This implements my suggestion in Bug#54112.
* src/dd.c (usage): Document the change.
(parse_integer, scanargs): Implement the change.
Omit some now-obsolete checks for invalid flags.
* tests/dd/bytes.sh: Test the new behavior, while retaining
checks for the now-obsolete usage.
* tests/dd/nocache_eof.sh: Avoid now-obsolete usage.
---
 NEWS                    |   6 +++
 doc/coreutils.texi      |  53 ++-
 src/dd.c                | 114 
 tests/dd/bytes.sh       |  67 ---
 tests/dd/nocache_eof.sh |   2 +-
 5 files changed, 116 insertions(+), 126 deletions(-)

diff --git a/NEWS b/NEWS
index de03f0d47..b6713bfc5 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,12 @@ GNU coreutils NEWS -*- outline -*-
   dd now supports the aliases iseek=N for skip=N, and oseek=N for seek=N,
   like FreeBSD and other operating systems.
 
+  dd now counts bytes instead of blocks if a block count ends in "B".
+  For example, 'dd count=100KiB' now copies 100 KiB of data, not
+  102,400 blocks of data.  The flags count_bytes, skip_bytes and
+  seek_bytes are therefore obsolescent and are no longer documented,
+  though they still work.
+
   timeout --foreground --kill-after=... will now exit with status 137
   if the kill signal was sent, which is consistent with the behavior
   when the --foreground option is not specified.  This allows users to
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 5419c61ef..641680e11 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9268,9 +9268,9 @@ use @var{bytes} as the fixed record length.
 @opindex skip
 @opindex iseek
 Skip @var{n} @samp{ibs}-byte blocks in the input file before copying.
-With @samp{iflag=skip_bytes}, interpret @var{n}
+If @var{n} ends in the letter @samp{B}, interpret @var{n}
 as a byte count rather than a block count.
-(The @samp{iseek=} spelling is an extension to POSIX.)
+(@samp{B} and the @samp{iseek=} spelling are GNU extensions to POSIX.)
 
 @item seek=@var{n}
 @itemx oseek=@var{n}
@@ -9278,16 +9278,17 @@ as a byte count rather than a block count.
 @opindex oseek
 Skip @var{n} @samp{obs}-byte blocks in the output file before
 truncating or copying.
-With @samp{oflag=seek_bytes}, interpret @var{n}
+If @var{n} ends in the letter @samp{B}, interpret @var{n}
 as a byte count rather than a block count.
-(The @samp{oseek=} spelling is an extension to POSIX.)
+(@samp{B} and the @samp{oseek=} spelling are GNU extensions to POSIX.)
 
 @item count=@var{n}
 @opindex count
 Copy @var{n} @samp{ibs}-byte blocks from the input file, instead
 of everything until the end of the file.
-With @samp{iflag=count_bytes}, interpret @var{n}
-as a byte count rather than a block count.
+If @var{n} ends in the letter @samp{B},
+interpret @var{n} as a byte count rather than a block count;
+this is a GNU extension to POSIX.
 If short reads occur, as could be the case
 when reading from a pipe for example, @samp{iflag=fullblock}
 ensures that @samp{count=} counts complete input blocks
@@ -9627,27 +9628,6 @@ as they may return short reads. In that case,
 this flag is needed to ensure that a @samp{count=} argument is
 interpreted as a block count rather than a count of read operations.
 
-@item count_bytes
-@opindex count_bytes
-Interpret the @samp{count=} operand as a byte count,
-rather than a block count, which allows specifying
-a length that is not a multiple of the I/O block size.
-This flag can be used only with @code{iflag}.
-
-@item skip_bytes
-@opindex skip_bytes
-Interpret the @samp{skip=} or @samp{iseek=} operand as a byte count,
-rather than a block count, which allows specifying
-an offset that is not a multiple of the I/O block size.
-This flag can be used only with @code{iflag}.
-
-@item seek_bytes
-@opindex seek_bytes
-Interpret the @samp{seek=} or @samp{oseek=} operand as a byte count,
-rather than a block count, which allows specifying
-an offset that is not a multiple of the I/O block size.
-This flag can be used only with @code{oflag}.
-
 @end table
 
 These flags are all GNU extensions to POSIX.
@@ -9680,23 +9660,22 @@ should not be too large---values larger than a few megabytes
 are generally wasteful or (as in the gigabyte..exabyte case) downright
 counterproductive or error-inducing.
 
-To process data that is at an offset or size that is not a
-multiple of the I/O@ block size, you can use the @samp{skip_bytes},
-@samp{seek_bytes} and @samp{count_bytes} flags.  Alternatively
-the traditional 


2022-02-22 Thread Pádraig Brady

On 22/02/2022 17:03, Paul Eggert wrote:

While looking into Bug#45648 I noticed that the GNU extensions
count_bytes, seek_bytes, and skip_bytes are confusing, and the proposed
fix to bug#45648 would make them even more confusing. To fix this
confusion, we should deprecate these options, and instead say that if
you want to use byte counts you should use a number string ending in "B".

Here's another way to put it.  Currently this:

 dd oseek=100KiB

means "seek 102,400 blocks". It should simply mean "seek 102,400 bytes",
which is what it says. And if we change oseek's meaning this way, we
don't need "oseek_bytes".

Although this is an incompatible change to GNU dd, I don't think it'll
affect real-world uses (who would use oseek in such a confusing way
now?) and overall it will be a win.


That is a more concise and direct way to achieve the same functionality.
+1

I guess we should remove docs for the other options,
but leave support there for backwards compat.

thanks,
Pádraig
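
To illustrate how the backwards-compatible spelling relates to the new one, here is a minimal Python sketch. The helper name modernize and the OLD_FLAGS table are hypothetical and are not part of coreutils. It assumes that an operand made only of decimal digits combined with the matching *_bytes flag is equivalent to the same operand with "B" appended, and it leaves every other case untouched.

```python
# Hypothetical helper, not part of coreutils: rewrite the obsolescent
# *_bytes flags into "B"-suffixed operands. Only operands made purely of
# decimal digits are rewritten; everything else is returned unchanged.
OLD_FLAGS = {
    "count_bytes": ("iflag", ("count",)),
    "skip_bytes": ("iflag", ("skip", "iseek")),
    "seek_bytes": ("oflag", ("seek", "oseek")),
}


def modernize(args):
    """Return a copy of a dd operand list that uses "B" suffixes."""
    pairs = [arg.partition("=") for arg in args]  # (name, "=", value)

    # Collect the flags given to iflag= and oflag=.
    given = {}
    for name, _, value in pairs:
        if name in ("iflag", "oflag"):
            given.setdefault(name, []).extend(value.split(","))

    rewrite = set()  # operand names that get a "B" appended
    drop = set()     # (holder, flag) pairs that become redundant
    for flag, (holder, names) in OLD_FLAGS.items():
        if flag not in given.get(holder, []):
            continue
        values = [v for n, _, v in pairs if n in names]
        if values and all(v.isascii() and v.isdigit() for v in values):
            rewrite.update(names)
            drop.add((holder, flag))

    result = []
    for name, sep, value in pairs:
        if name in rewrite:
            value += "B"
        elif name in ("iflag", "oflag"):
            kept = [f for f in value.split(",") if (name, f) not in drop]
            if not kept:
                continue  # the whole iflag=/oflag= operand is now empty
            value = ",".join(kept)
        result.append(name + sep + value)
    return result


print(modernize(["if=in", "of=out", "count=100", "iflag=count_bytes,fullblock"]))
```

Operands that already carry a multiplier suffix, such as count=1K, are deliberately left alone, because appending "B" to them could change the multiplier rather than just the unit.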






2022-02-22 Thread Paul Eggert

While looking into Bug#45648 I noticed that the GNU extensions 
count_bytes, seek_bytes, and skip_bytes are confusing, and the proposed 
fix to bug#45648 would make them even more confusing. To fix this 
confusion, we should deprecate these options, and instead say that if 
you want to use byte counts you should use a number string ending in "B".


Here's another way to put it.  Currently this:

   dd oseek=100KiB

means "seek 102,400 blocks". It should simply mean "seek 102,400 bytes", 
which is what it says. And if we change oseek's meaning this way, we 
don't need "oseek_bytes".


Although this is an incompatible change to GNU dd, I don't think it'll 
affect real-world uses (who would use oseek in such a confusing way 
now?) and overall it will be a win.
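
The difference between the two readings of oseek=100KiB can be worked through numerically. The following Python sketch is a simplified model and is not the parser in src/dd.c. The names parse_operand and seek_offset are hypothetical, only a few multiplier suffixes are modelled, and the output block size is assumed to be dd's default of 512 bytes.

```python
# Simplified model of the operand rule discussed in this thread.
# Assumption: only a handful of multiplier suffixes are modelled here;
# this is not the parser in src/dd.c.
MULTIPLIERS = {
    "": 1, "B": 1,
    "K": 1024, "KiB": 1024, "kB": 1000,
    "M": 1024 ** 2, "MiB": 1024 ** 2, "MB": 1000 ** 2,
}


def parse_operand(operand):
    """Split e.g. "100KiB" into (102400, True); True means "ends in B"."""
    digits = 0
    while digits < len(operand) and operand[digits] in "0123456789":
        digits += 1
    suffix = operand[digits:]
    if digits == 0 or suffix not in MULTIPLIERS:
        raise ValueError("unsupported operand: " + operand)
    return int(operand[:digits]) * MULTIPLIERS[suffix], suffix.endswith("B")


def seek_offset(operand, obs=512, new_rule=True):
    """Byte offset implied by oseek=OPERAND with output block size OBS."""
    value, ends_in_b = parse_operand(operand)
    if new_rule and ends_in_b:
        return value      # proposed meaning: a byte count
    return value * obs    # otherwise: a count of obs-byte blocks


print(seek_offset("100KiB", new_rule=False))  # 52428800
print(seek_offset("100KiB"))                  # 102400
print(seek_offset("100K"))                    # 52428800
```

Under the old reading the operand seeks 102,400 blocks of 512 bytes, which is 52,428,800 bytes; under the proposed reading it seeks 102,400 bytes. An operand without a trailing "B", such as 100K, keeps its block-count meaning in both cases.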