bug#45648: `dd` seek/skip which way is up?

2022-02-24 Thread Pádraig Brady

On 22/02/2022 17:12, Paul Eggert wrote:

> On 1/4/21 20:08, Paul Eggert wrote:
>
>> On 1/4/21 7:44 PM, Bela Lubkin wrote:
>>
>>> TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
>>> them as pure synonyms for 'skip' and 'seek'.
>>
>>
>> Thanks for doing all that research. It's compelling, and I think your
>> patch (or something like it) should go in. I'll wait for a bit to hear
>> other opinions.
>
>
> After thinking about the patch a bit more, let's omit the part about
> adding new conversions iseek_bytes etc., as I think there's a better way
> to address that issue. I proposed something in .
>
> So instead of your patch, I installed the attached patches. The first
> one adds the iseek and oseek operands that you suggested; the second one
> clarifies dd documentation, as I found several things were confusing
> when rereading it carefully. Something like these patches should appear
> in the next coreutils release.


+1

The aliases are useful.
I always remembered it like skIp for Input,
but that is awkward.

As for the overlap in Solaris with disabling reading,
I think that would be better as a flag, like "seek_only",
if deemed useful.

thanks,
Pádraig





bug#45648: `dd` seek/skip which way is up?

2022-02-22 Thread Paul Eggert

On 1/4/21 20:08, Paul Eggert wrote:

> On 1/4/21 7:44 PM, Bela Lubkin wrote:
>
>> TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
>> them as pure synonyms for 'skip' and 'seek'.
>
>
> Thanks for doing all that research. It's compelling, and I think your
> patch (or something like it) should go in. I'll wait for a bit to hear
> other opinions.


After thinking about the patch a bit more, let's omit the part about 
adding new conversions iseek_bytes etc., as I think there's a better way 
to address that issue. I proposed something in .


So instead of your patch, I installed the attached patches. The first 
one adds the iseek and oseek operands that you suggested; the second one 
clarifies dd documentation, as I found several things were confusing 
when rereading it carefully. Something like these patches should appear
in the next coreutils release.

From 6ad981900cc170258d4914197e2796fc94a37863 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 21 Feb 2022 11:23:02 -0800
Subject: [PATCH 1/2] dd: support iseek= and oseek=

Alias iseek=N to skip=N, oseek=N to seek=N (Bug#45648).
* src/dd.c (scanargs): Parse iseek= and oseek=.
* tests/dd/skip-seek.pl (sk-seek5): New test case.
---
 NEWS                  |  3 +++
 doc/coreutils.texi    | 16 ++++++++++------
 src/dd.c              |  8 ++++----
 tests/dd/skip-seek.pl | 10 ++++++++++
 4 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index ef65b4ab8..de03f0d47 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,9 @@ GNU coreutils NEWS    -*- outline -*-
   dd conv=fsync now synchronizes output even after a write error,
   and similarly for dd conv=fdatasync.
 
+  dd now supports the aliases iseek=N for skip=N, and oseek=N for seek=N,
+  like FreeBSD and other operating systems.
+
   timeout --foreground --kill-after=... will now exit with status 137
   if the kill signal was sent, which is consistent with the behavior
   when the --foreground option is not specified.  This allows users to
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 8d2974bde..4ec998802 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9189,8 +9189,7 @@ Read from @var{file} instead of standard input.
 @item of=@var{file}
 @opindex of
 Write to @var{file} instead of standard output.  Unless
-@samp{conv=notrunc} is given, @command{dd} truncates @var{file} to zero
-bytes (or the size specified with @samp{seek=}).
+@samp{conv=notrunc} is given, truncate @var{file} before writing it.
 
 @item ibs=@var{bytes}
 @opindex ibs
@@ -9230,15 +9229,20 @@ When converting variable-length records to fixed-length ones
 use @var{bytes} as the fixed record length.
 
 @item skip=@var{n}
+@itemx iseek=@var{n}
 @opindex skip
+@opindex iseek
 Skip @var{n} @samp{ibs}-byte blocks in the input file before copying.
 If @samp{iflag=skip_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.
 
 @item seek=@var{n}
+@itemx oseek=@var{n}
 @opindex seek
-Skip @var{n} @samp{obs}-byte blocks in the output file before copying.
-if @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
+@opindex oseek
+Skip @var{n} @samp{obs}-byte blocks in the output file before
+truncating or copying.
+If @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.
 
 @item count=@var{n}
@@ -9588,14 +9592,14 @@ This flag can be used only with @code{iflag}.
 
 @item skip_bytes
 @opindex skip_bytes
-Interpret the @samp{skip=} operand as a byte count,
+Interpret the @samp{skip=} or @samp{iseek=} operand as a byte count,
 rather than a block count, which allows specifying
 an offset that is not a multiple of the I/O block size.
 This flag can be used only with @code{iflag}.
 
 @item seek_bytes
 @opindex seek_bytes
-Interpret the @samp{seek=} operand as a byte count,
+Interpret the @samp{seek=} or @samp{oseek=} operand as a byte count,
 rather than a block count, which allows specifying
 an offset that is not a multiple of the I/O block size.
 This flag can be used only with @code{oflag}.
diff --git a/src/dd.c b/src/dd.c
index 7360a4973..1c30e414d 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -562,8 +562,8 @@ Copy a file, converting and formatting according to the operands.\n\
   obs=BYTES       write BYTES bytes at a time (default: 512)\n\
   of=FILE         write to FILE instead of stdout\n\
   oflag=FLAGS     write as per the comma separated symbol list\n\
-  seek=N          skip N obs-sized blocks at start of output\n\
-  skip=N          skip N ibs-sized blocks at start of input\n\
+  seek=N          (or oseek=N) skip N obs-sized output blocks\n\
+  skip=N          (or iseek=N) skip N ibs-sized input blocks\n\
   status=LEVEL    The LEVEL of information to print to stderr;\n\
                   'none' suppresses everything but error messages,\n\
                   'noxfer' suppresses the final transfer statistics,\n\
@@ -1564,9 +1564,9 @@ scanargs (int argc, 
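
The documentation changes above describe how the count for each operand is interpreted:

* skip=N (or iseek=N) counts ibs-sized blocks.
* seek=N (or oseek=N) counts obs-sized blocks.
* iflag=skip_bytes or oflag=seek_bytes makes N a plain byte count instead.

Below is a minimal C sketch of that arithmetic. The name operand_to_offset is hypothetical and does not come from dd.c. The overflow check is an assumption about how such a helper ought to behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not dd.c code): convert the N of skip=N/iseek=N
   or seek=N/oseek=N into a byte offset.  BLOCKSIZE is ibs for input or
   obs for output; IN_BYTES models iflag=skip_bytes or oflag=seek_bytes.
   Returns false for invalid input or if N * BLOCKSIZE would overflow.  */
bool
operand_to_offset (intmax_t n, intmax_t blocksize, bool in_bytes,
                   intmax_t *offset)
{
  if (n < 0 || blocksize <= 0)
    return false;
  if (in_bytes)
    {
      /* N is already a byte count; the block size is irrelevant.  */
      *offset = n;
      return true;
    }
  if (n > INTMAX_MAX / blocksize)
    return false;               /* N * BLOCKSIZE does not fit.  */
  *offset = n * blocksize;
  return true;
}
```

For example, skip=3 with ibs=512 corresponds to byte offset 1536. With iflag=skip_bytes, skip=3 corresponds to byte offset 3.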

bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Paul Eggert

On 1/4/21 7:44 PM, Bela Lubkin wrote:

> TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
> them as pure synonyms for 'skip' and 'seek'.


Thanks for doing all that research. It's compelling, and I think your 
patch (or something like it) should go in. I'll wait for a bit to hear 
other opinions.






bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Bela Lubkin
TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
them as pure synonyms for 'skip' and 'seek'.



The implementation where I encountered it was SCO OpenServer.  Like
Solaris, there was a distinction between 'iseek' and 'skip' ('skip' reads,
'iseek' seeks); no distinction between 'oseek' and 'seek'.

I consulted with freebsd.org/cgi/man.cgi?query=dd -- this shows that *many*
OSes support these keywords.  The current default display is FreeBSD 12.2,
which says:

'iseek=n  Seek on the input file n blocks. This is synonymous with skip=n.'
'oseek=n  Seek on the output file n blocks. This is synonymous with seek=n.'

Identical text exists since FreeBSD 4.0 (2000-03); Darwin 5.0.1; HP-UX
11.1; NetBSD 6.0; DEC OSF/1 4.0.  These are *ancient* OSes.

IRIX 6.5.30 actually documents 'seek' as 'Identical to oseek, retained for
backward compatibility.', i.e. 'oseek' is the real flag in this man page's
mind.

The man pages from Plan 9 & Inferno 4th edition (AT&T research OSes)
document 'skip', 'iseek', 'oseek', but not 'seek' at all!

Regarding the actual implementation, being able to manually control seeking
vs. actually doing useless I/O does not seem useful to me in 2021.  The
distinction exist(ed) for the benefit of things like tape drives, which of
course do still exist.  But back then, information about what was or was
not seekable was poorly plumbed up from drivers to userland.  Today, it
should be clear whether a file (whatever its fundamental implementation is)
is, or is not, seekable; `dd` should always attempt to seek if possible,
and slog through the corresponding I/O only if the underlying file cannot seek.
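
The seek-if-possible, read-if-not approach described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not GNU dd's own skip logic:

* skip_input is a hypothetical name.
* A file is treated as non-seekable only when lseek fails with ESPIPE, as it does for pipes.
* Seeking past the end of a regular file, which lseek permits, is not diagnosed.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: advance FD by OFFSET bytes.  Seek if the file
   supports it; otherwise (ESPIPE) read and discard the bytes.
   Returns false on error or if EOF arrives before OFFSET bytes.  */
bool
skip_input (int fd, off_t offset)
{
  if (lseek (fd, offset, SEEK_CUR) >= 0)
    return true;                /* Seekable: no I/O needed.  */
  if (errno != ESPIPE)
    return false;               /* A real error, not "cannot seek".  */

  char buf[4096];
  while (offset > 0)
    {
      size_t want = offset < (off_t) sizeof buf ? (size_t) offset
                                                : sizeof buf;
      ssize_t got = read (fd, buf, want);
      if (got < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted; retry the read.  */
          return false;
        }
      if (got == 0)
        return false;           /* EOF before OFFSET bytes were skipped.  */
      offset -= got;
    }
  return true;
}
```

On a pipe holding "abcdef", skip_input (fd, 4) consumes "abcd" by reading, so the next read returns "ef". On a regular file, the same call simply moves the file offset.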

In fact, the pointed-to Open Group specification precisely supports that
position:

'skip' says, 'Skip n input blocks ... On seekable files, ... read the
blocks or seek past them; on non-seekable files, ... read and ...
[discard]';

'seek' says, 'Skip n [output] blocks ... On non-seekable files, [read]
existing blocks ...; on seekable files, ... seek ... or read ...'

i.e. 'do I/O if not seekable; implementer's choice if seekable'.
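
For a seekable regular file, the 'seek' case above amounts to three steps:

1. Open the output without discarding its contents.
2. Position the file offset at the requested byte.
3. Truncate the file at that point unless conv=notrunc is given.

The data before the offset is kept. Below is a minimal sketch, assuming the output is a seekable regular file. open_output_at is a hypothetical helper, not dd.c code. The 0666 creation mode is an assumption. The non-seekable branch ("read existing blocks") is deliberately left out.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: open NAME for writing at byte OFFSET (from
   seek=N or oseek=N).  O_TRUNC is not used, so data before OFFSET
   survives; unless NOTRUNC (as with conv=notrunc) is set, the file is
   then truncated to exactly OFFSET bytes.  Returns the descriptor,
   or -1 on error.  */
int
open_output_at (const char *name, off_t offset, bool notrunc)
{
  int fd = open (name, O_WRONLY | O_CREAT, 0666);
  if (fd < 0)
    return -1;
  if (lseek (fd, offset, SEEK_SET) < 0
      || (!notrunc && ftruncate (fd, offset) != 0))
    {
      close (fd);               /* Do not leak the descriptor on error.  */
      return -1;
    }
  return fd;
}
```

Applied to a 10-byte file with OFFSET 4, the file ends up 4 bytes long, positioned at its end. With NOTRUNC set, the size is left alone and only the position changes.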

The Solaris page is the only one where there is a possible implication that
'oseek' is different from 'seek', but only because the 'oseek' description
is vestigial.  (Exact same text persists from Solaris 2.5.1 through the
11.2 pointed to above.)

Should coreutils `dd` insist that if one uses 'oseek' and the file isn't
seekable, it should fail?  This violates least surprise.  'iseek' and
'oseek' should seek if possible, read if not.  Whereas 'skip' and 'seek'
*may* seek if possible, read if not.  This distinction is uninteresting
since the implementation *should* take advantage of the *may*.

Both the Solaris and Open Group man pages describe 'seek' as 'Skip[s] n
blocks', again showing that the words are not at all bound to a particular
direction.

>Bela<

On Mon, Jan 4, 2021 at 6:06 PM Paul Eggert  wrote:

> On 1/4/21 3:07 PM, Bernhard Voelker wrote:
> >> I previously encountered a `dd` implementation which also accepted
> >> 'oseek=N' and 'iseek=N', which I found far more natural and easy to
> >> remember.
> > What 'dd' implementation was this specifically?
>
> Solaris dd has iseek and oseek. However, they are not aliases for skip
> and seek. If coreutils dd were to add these features I expect we should
> do them the Solaris way, instead of making them aliases for skip and
> seek. This would take more work than the proposed patches.
>
> https://docs.oracle.com/cd/E36784_01/html/E36871/dd-1m.html
>


bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Bernhard Voelker
On 1/5/21 3:06 AM, Paul Eggert wrote:
> On 1/4/21 3:07 PM, Bernhard Voelker wrote:
>> What 'dd' implementation was this specifically?
> 
> Solaris dd has iseek and oseek. However, they are not aliases for skip 
> and seek. If coreutils dd were to add these features I expect we should 
> do them the Solaris way, instead of making them aliases for skip and 
> seek. This would take more work than the proposed patches.
> 
> https://docs.oracle.com/cd/E36784_01/html/E36871/dd-1m.html

That would make the situation even more confusing for the user
... and more complex because such an implementation would interfere
with GNU dd's seek/skip and iflag=skip_bytes and oflag=seek_bytes
functionality.  Doesn't sound like a good idea.

Have a nice day,
Berny





bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Paul Eggert

On 1/4/21 3:07 PM, Bernhard Voelker wrote:

>> I previously encountered a `dd` implementation which also accepted
>> 'oseek=N' and 'iseek=N', which I found far more natural and easy to
>> remember.
>
> What 'dd' implementation was this specifically?


Solaris dd has iseek and oseek. However, they are not aliases for skip 
and seek. If coreutils dd were to add these features I expect we should 
do them the Solaris way, instead of making them aliases for skip and 
seek. This would take more work than the proposed patches.


https://docs.oracle.com/cd/E36784_01/html/E36871/dd-1m.html





bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Bernhard Voelker
On 1/4/21 4:03 AM, Bela Lubkin wrote:
> I constantly confuse 'seek=N' and 'skip=N'.  The two words have no natural
> affinity to one I/O direction or the other.

While the words 'seek' and 'skip' may not be strong enough for everyone
to be clear about whether they apply on input or output - e.g. for non-native
English speakers like myself - they are well documented in usage() and
elsewhere:

  $ dd --help | grep -E ' (skip|seek)=N '
    seek=N          skip N obs-sized blocks at start of output
    skip=N          skip N ibs-sized blocks at start of input

FWIW these terms are required by POSIX:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

> I previously encountered a `dd` implementation which also accepted
> 'oseek=N' and 'iseek=N', which I found far more natural and easy to
> remember.

What 'dd' implementation was this specifically?

> Here is a small patch implementing the same for coreutils `dd`.

In my opinion: if the word chosen for an option is not clear enough
to distinguish from another one, then adding yet another alias would
just increase confusion.

Options added to coreutils programs have to be chosen carefully.
The only reason I'd see to add such an alias would be existing
behavior in one of the other major implementations.

Have a nice day,
Berny





bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Andreas Schwab
On Jan 03 2021, Bela Lubkin wrote:

> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
> index e9dd21c4e..417857c5e 100644
> --- a/doc/coreutils.texi
> +++ b/doc/coreutils.texi
> @@ -9100,6 +9100,15 @@ Skip @var{n} @samp{obs}-byte blocks in the output file before copying.
>  if @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
>  as a byte count rather than a block count.
>
> +@item oseek
> +@item iseek

The second @item needs to be @itemx.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





bug#45648: `dd` seek/skip which way is up?

2021-01-03 Thread Bela Lubkin
Hello --

I constantly confuse 'seek=N' and 'skip=N'.  The two words have no natural
affinity to one I/O direction or the other.

I previously encountered a `dd` implementation which also accepted
'oseek=N' and 'iseek=N', which I found far more natural and easy to
remember.

Here is a small patch implementing the same for coreutils `dd`.  Patch is
against just-gotten git tree; `dd --version` reports 'dd (coreutils)
8.32.101-ebf2c-dirty'.  (I probably got the .texi formatting wrong; please
repair as needed.)

While in the area, I slightly improved some of the help (and therefore man
page).

>Bela<



diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index e9dd21c4e..417857c5e 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9100,6 +9100,15 @@ Skip @var{n} @samp{obs}-byte blocks in the output file before copying.
 if @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.

+@item oseek
+@item iseek
+@opindex oseek
+@opindex iseek
+As the distinction between @samp{seek} and @samp{skip}
+is easily confused, @samp{oseek} is accepted as an alias
+for @samp{seek}; @samp{iseek} for @samp{skip}.
+Do not use these in scripts, as this reduces compatibility.
+
 @item count=@var{n}
 @opindex count
 Copy @var{n} @samp{ibs}-byte blocks from the input file, instead
@@ -9457,6 +9466,15 @@ rather than a block count, which allows specifying
 an offset that is not a multiple of the I/O block size.
 This flag can be used only with @code{oflag}.

+@item oseek_bytes
+@item iseek_bytes
+@opindex oseek_bytes
+@opindex iseek_bytes
+As the distinction between @samp{seek_bytes} and @samp{skip_bytes}
+is easily confused, @samp{oseek_bytes} is accepted as an alias
+for @samp{seek_bytes}; @samp{iseek_bytes} for @samp{skip_bytes}.
+Do not use these in scripts, as this reduces compatibility.
+
 @end table

 These flags are not supported on all systems, and @samp{dd} rejects
diff --git a/src/dd.c b/src/dd.c
index 9152a2550..a187522c2 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -381,7 +381,9 @@ static struct symbol_value const flags[] =
   {"fullblock",   O_FULLBLOCK}, /* Accumulate full blocks from input.  */
   {"count_bytes", O_COUNT_BYTES},
   {"skip_bytes",  O_SKIP_BYTES},
+  {"iseek_bytes", O_SKIP_BYTES},
   {"seek_bytes",  O_SEEK_BYTES},
+  {"oseek_bytes", O_SEEK_BYTES},
   {"", 0}
 };

@@ -571,7 +573,7 @@ Copy a file, converting and formatting according to the operands.\n\
                   overrides ibs and obs\n\
   cbs=BYTES       convert BYTES bytes at a time\n\
   conv=CONVS      convert the file as per the comma separated symbol list\n\
-  count=N         copy only N input blocks\n\
+  count=N         copy only N input blocks (bytes if iflag=count_bytes)\n\
   ibs=BYTES       read up to BYTES bytes at a time (default: 512)\n\
 "), stdout);
   fputs (_("\
@@ -580,8 +582,8 @@ Copy a file, converting and formatting according to the operands.\n\
   obs=BYTES       write BYTES bytes at a time (default: 512)\n\
   of=FILE         write to FILE instead of stdout\n\
   oflag=FLAGS     write as per the comma separated symbol list\n\
-  seek=N          skip N obs-sized blocks at start of output\n\
-  skip=N          skip N ibs-sized blocks at start of input\n\
+  seek=N (or oseek=N)  skip N obs-sized blocks at start of output (bytes if oflag=seek_bytes)\n\
+  skip=N (or iseek=N)  skip N ibs-sized blocks at start of input (bytes if iflag=skip_bytes)\n\
   status=LEVEL    The LEVEL of information to print to stderr;\n\
                   'none' suppresses everything but error messages,\n\
                   'noxfer' suppresses the final transfer statistics,\n\
@@ -660,10 +662,10 @@ Each FLAG symbol may be:\n\
     fputs (_("  count_bytes  treat 'count=N' as a byte count (iflag only)\n\
 "), stdout);
   if (O_SKIP_BYTES)
-    fputs (_("  skip_bytes  treat 'skip=N' as a byte count (iflag only)\n\
+    fputs (_("  skip_bytes (or iseek_bytes)  treat 'skip=N' as a byte count (iflag only)\n\
 "), stdout);
   if (O_SEEK_BYTES)
-    fputs (_("  seek_bytes  treat 'seek=N' as a byte count (oflag only)\n\
+    fputs (_("  seek_bytes (or oseek_bytes)  treat 'seek=N' as a byte count (oflag only)\n\
 "), stdout);

   {
@@ -1554,9 +1556,11 @@ scanargs (int argc, char *const *argv)
   n_max = SIZE_MAX;
   conversion_blocksize = n;
 }
-  else if (operand_is (name, "skip"))
+  else if (operand_is (name, "skip") ||
+operand_is (name, "iseek"))
 skip = n;
-  else if (operand_is (name, "seek"))
+  else if (operand_is (name, "seek") ||
+   operand_is (name, "oseek"))
 seek = n;
   else if (operand_is (name, "count"))
 count = n;
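
Both patches treat iseek and oseek purely as alternative spellings. Each alias sets the same variable as skip or seek. Bela's patch already expresses aliases as extra rows in the flags[] table for the iflag/oflag symbols. The sketch below applies the same lookup-table idea to operand names. The enum, table, and classify_operand function are hypothetical illustrations, not coreutils code. coreutils instead chains operand_is() tests in scanargs().

```c
#include <string.h>

/* Hypothetical sketch: several spellings map to one canonical operand,
   so skip=/iseek= and seek=/oseek= cannot drift apart in behavior.  */
enum operand { OP_UNKNOWN, OP_SKIP, OP_SEEK, OP_COUNT };

struct operand_name
{
  char const *name;
  enum operand op;
};

static struct operand_name const operand_names[] =
{
  {"skip",  OP_SKIP},
  {"iseek", OP_SKIP},           /* Alias for skip=.  */
  {"seek",  OP_SEEK},
  {"oseek", OP_SEEK},           /* Alias for seek=.  */
  {"count", OP_COUNT},
};

/* Classify ARG of the form NAME=VALUE by its NAME part; an argument
   without '=' or with an unknown name yields OP_UNKNOWN.  */
enum operand
classify_operand (char const *arg)
{
  char const *eq = strchr (arg, '=');
  if (!eq)
    return OP_UNKNOWN;
  size_t len = eq - arg;
  for (size_t i = 0; i < sizeof operand_names / sizeof operand_names[0]; i++)
    if (strlen (operand_names[i].name) == len
        && memcmp (arg, operand_names[i].name, len) == 0)
      return operand_names[i].op;
  return OP_UNKNOWN;
}
```

With this arrangement, "iseek=3" and "skip=3" both classify as OP_SKIP. Adding another alias is then a one-line table change.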