bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Paul Eggert

On 1/4/21 7:44 PM, Bela Lubkin wrote:

> TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
> them as pure synonyms for 'skip' and 'seek'.


Thanks for doing all that research. It's compelling, and I think your 
patch (or something like it) should go in. I'll wait for a bit to hear 
other opinions.






bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Bela Lubkin
TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
them as pure synonyms for 'skip' and 'seek'.



The implementation where I encountered it was SCO OpenServer.  Like
Solaris, there was a distinction between 'iseek' and 'skip' ('skip' reads,
'iseek' seeks); no distinction between 'oseek' and 'seek'.

I consulted with freebsd.org/cgi/man.cgi?query=dd -- this shows that *many*
OSes support these keywords.  The current default display is FreeBSD 12.2,
which says:

'iseek=n  Seek on the input file n blocks. This is synonymous with skip=n.'
'oseek=n  Seek on the output file n blocks. This is synonymous with seek=n.'

Identical text has existed since FreeBSD 4.0 (2000-03); Darwin 5.0.1; HP-UX
11.1; NetBSD 6.0; DEC OSF/1 4.0.  These are *ancient* OSes.

IRIX 6.5.30 actually documents 'seek' as 'Identical to oseek, retained for
backward compatibility.', i.e. 'oseek' is the real flag in this man page's
mind.

The man pages from Plan 9 & Inferno 4th edition (AT&T research OSes)
document 'skip', 'iseek', 'oseek', but not 'seek' at all!

Regarding the actual implementation, being able to manually control seeking
vs. actually doing useless I/O does not seem useful to me in 2021.  The
distinction exist(ed) for the benefit of things like tape drives, which of
course do still exist.  But back then, information about what was or was
not seekable was poorly plumbed up from drivers to userland.  Today, it
should be clear whether a file (whatever its fundamental implementation is)
is, or is not, seekable; `dd` should always attempt to seek if possible, and
slog through the corresponding I/O only if the underlying file cannot seek.

In fact, the pointed-to Open Group specification precisely supports that
position:

'skip' says, 'Skip n input blocks ... On seekable files, ... read the
blocks or seek past them; on non-seekable files, ... read and ...
[discard]';

'seek' says, 'Skip n [output] blocks ... On non-seekable files, [read]
existing blocks ...; on seekable files, ... seek ... or read ...'

i.e. 'do I/O if not seekable; implementer's choice if seekable'.
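
The 'do I/O if not seekable; implementer's choice if seekable' rule can be
observed from the shell.  What follows is only a sketch, assuming a dd that
accepts 'bs=' and 'skip=' on both regular files and pipes (GNU dd does); the
scratch file name is made up for the demo.  Whether dd really seeks on the
regular file is an implementation detail that the output does not reveal.

```shell
# Scratch directory so the demo leaves nothing behind.
tmp=$(mktemp -d)
printf 'abcdefgh' > "$tmp/in"

# Regular file: seekable, so dd is free to seek past the first 4 bytes.
from_file=$(dd if="$tmp/in" bs=1 skip=4 2>/dev/null)

# Pipe: not seekable, so dd has to read and discard the first 4 bytes.
from_pipe=$(cat "$tmp/in" | dd bs=1 skip=4 2>/dev/null)

# Both paths deliver the same data to the caller.
echo "file: $from_file  pipe: $from_pipe"

rm -rf "$tmp"
```

Either way the caller sees the same bytes, which is why the seek-versus-read
choice can be left to the implementation.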

The Solaris page is the only one where there is a possible implication that
'oseek' is different from 'seek', but only because the 'oseek' description
is vestigial.  (Exact same text persists from Solaris 2.5.1 through the
11.2 pointed to above.)

Should coreutils `dd` insist that if one uses 'oseek' and the file isn't
seekable, it should fail?  This violates least surprise.  'iseek' and
'oseek' should seek if possible, read if not.  Whereas 'skip' and 'seek'
*may* seek if possible, read if not.  This distinction is uninteresting
since the implementation *should* take advantage of the *may*.
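
To make the 'pure synonym' reading concrete, here is a hypothetical shell
wrapper.  It is not part of coreutils and not the proposed patch; the name
'dd_compat' is invented for illustration.  It only rewrites 'iseek=N' to
'skip=N' and 'oseek=N' to 'seek=N' before handing everything to the installed
dd, so any seek-or-read decision remains with dd itself.

```shell
# Hypothetical wrapper: treat iseek=/oseek= as aliases for skip=/seek=.
dd_compat() {
    # Rebuild the argument list in place, one operand at a time.
    for arg do
        shift
        case $arg in
            iseek=*) set -- "$@" "skip=${arg#iseek=}" ;;
            oseek=*) set -- "$@" "seek=${arg#oseek=}" ;;
            *)       set -- "$@" "$arg" ;;
        esac
    done
    dd "$@"
}

# Usage: skip the first 5 input bytes via the alias.
aliased=$(printf 'abcdefgh' | dd_compat bs=1 iseek=5 2>/dev/null)
echo "$aliased"
```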

Both the Solaris and Open Group man pages describe 'seek' as 'Skip[s] n
blocks', again showing that the words are not at all bound to a particular
direction.

>Bela<

On Mon, Jan 4, 2021 at 6:06 PM Paul Eggert wrote:

> On 1/4/21 3:07 PM, Bernhard Voelker wrote:
> >> I previously encountered a `dd` implementation which also accepted
> >> 'oseek=N' and 'iseek=N', which I found far more natural and easy to
> >> remember.
> > What 'dd' implementation was this specifically?
>
> Solaris dd has iseek and oseek. However, they are not aliases for skip
> and seek. If coreutils dd were to add these features I expect we should
> do them the Solaris way, instead of making them aliases for skip and
> seek. This would take more work than the proposed patches.
>
> https://docs.oracle.com/cd/E36784_01/html/E36871/dd-1m.html
>


bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Bernhard Voelker
On 1/5/21 3:06 AM, Paul Eggert wrote:
> On 1/4/21 3:07 PM, Bernhard Voelker wrote:
>> What 'dd' implementation was this specifically?
> 
> Solaris dd has iseek and oseek. However, they are not aliases for skip 
> and seek. If coreutils dd were to add these features I expect we should 
> do them the Solaris way, instead of making them aliases for skip and 
> seek. This would take more work than the proposed patches.
> 
> https://docs.oracle.com/cd/E36784_01/html/E36871/dd-1m.html

That would make the situation even more confusing for the user
... and more complex because such an implementation would interfere
with GNU dd's seek/skip and iflag=skip_bytes and oflag=seek_bytes
functionality.  Doesn't sound like a good idea.

Have a nice day,
Berny





bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Paul Eggert

On 1/4/21 3:07 PM, Bernhard Voelker wrote:

>> I previously encountered a `dd` implementation which also accepted
>> 'oseek=N' and 'iseek=N', which I found far more natural and easy to
>> remember.
>
> What 'dd' implementation was this specifically?


Solaris dd has iseek and oseek. However, they are not aliases for skip 
and seek. If coreutils dd were to add these features I expect we should 
do them the Solaris way, instead of making them aliases for skip and 
seek. This would take more work than the proposed patches.


https://docs.oracle.com/cd/E36784_01/html/E36871/dd-1m.html





bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Bernhard Voelker
On 1/4/21 4:03 AM, Bela Lubkin wrote:
> I constantly confuse 'seek=N' and 'skip=N'.  The two words have no natural
> affinity to one I/O direction or the other.

While the words 'seek' and 'skip' may not be strong enough for everyone
to be clear about whether they apply on input or output - e.g. for a
non-native English speaker like myself - they are well documented in
usage() and elsewhere:

  $ dd --help | grep -E ' (skip|seek)=N '
  seek=N  skip N obs-sized blocks at start of output
  skip=N  skip N ibs-sized blocks at start of input
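
For illustration, the two directions can be checked with a short session.
This is a minimal sketch, assuming GNU dd and a scratch directory from
mktemp; the file names and the variables 'skipped' and 'sought' are made up
for the demo.

```shell
# Scratch directory so the demo leaves nothing behind.
tmp=$(mktemp -d)
printf 'abcdefgh' > "$tmp/in"

# skip=N passes over N blocks at the start of the INPUT.
skipped=$(dd if="$tmp/in" bs=1 skip=3 2>/dev/null)
echo "$skipped"    # prints defgh

# seek=N passes over N blocks at the start of the OUTPUT before writing.
# conv=notrunc keeps the bytes already in the output file.
printf 'XXXXXXXX' > "$tmp/out"
printf 'ab' | dd of="$tmp/out" bs=1 seek=3 conv=notrunc 2>/dev/null
sought=$(cat "$tmp/out")
echo "$sought"     # prints XXXabXXX

rm -rf "$tmp"
```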

FWIW these terms are required by POSIX:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

> I previously encountered a `dd` implementation which also accepted
> 'oseek=N' and 'iseek=N', which I found far more natural and easy to
> remember.

What 'dd' implementation was this specifically?

> Here is a small patch implementing the same for coreutils `dd`.

In my opinion: if the word chosen for an option is not clear enough
to distinguish from another one, then adding yet another alias would
just increase confusion.

Adding options to coreutils programs has to be carefully chosen.
The only reason I'd see to add such an alias would be existing
behavior in one of the other major implementations.

Have a nice day,
Berny





bug#45648: `dd` seek/skip which way is up?

2021-01-04 Thread Andreas Schwab
On Jan 03 2021, Bela Lubkin wrote:

> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
> index e9dd21c4e..417857c5e 100644
> --- a/doc/coreutils.texi
> +++ b/doc/coreutils.texi
> @@ -9100,6 +9100,15 @@ Skip @var{n} @samp{obs}-byte blocks in the output
> file before copying.
>  if @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
>  as a byte count rather than a block count.
>
> +@item oseek
> +@item iseek

The second @item needs to be @itemx.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."