Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Stephane Chazelas
2018-04-30 16:49:34 +0100, Geoff Clare:
[...]
> Yes, but it clearly shows that this offset is intended to be honoured
> by the next utility to read from stdin, when it says:
> 
> tail -n +2 file
> (sed -n 1q; cat) < file
> [...]
> The second command is equivalent to the first only when the file is
> seekable.

True, but "cat" spec says it reads stdin, not that it reads
stdin *from the start of the file*. The line-number addresses
for "sed" are expressed in terms of "input lines" not "nth line
of *files*".

[...]
> > But would you agree that it's not what the text currently says?
> > Should we create a ticket for that?
> 
> Yes, it needs a ticket.  It may well affect a lot of utilities, so
> perhaps adding something in XCU 1.4 under STDIN would be the best
> solution.
[...]

Thanks. Though it would certainly help to have a clarification
in XCU 1.4 under STDIN, I don't think the problem is that bad.

I'd say the problem is mostly with utilities that explicitly
reference offsets within files/input.

There's also a problem with dd whose "seek" description is wrong
(it says the offset should be relative to the start of the file,
while when there's no of=file, the offset should be relative to
the current position on stdout).
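
As an illustration only (a sketch, not part of any specification
text; the scratch file and its contents are made up, and mktemp is
assumed to be available), the following checks which reference point
a given dd uses for "seek" when there is no of= operand. stdout is
opened read-write with 1<> so the file is not truncated, and printf
first moves the position to offset 2:

```sh
f=$(mktemp)                # hypothetical scratch file
printf 'abcdefgh' > "$f"

# 1<> opens the file read-write on stdout without truncating it.
# printf writes "XY" at offsets 0-1, leaving stdout at offset 2;
# dd is then asked to seek 2 one-byte blocks before writing "ZZ".
{
  printf 'XY'
  printf 'ZZ' | dd bs=1 seek=2 conv=notrunc 2>/dev/null
} 1<> "$f"

result=$(cat "$f")
rm -f "$f"
printf '%s\n' "$result"
```

If the seek is applied relative to the current position, ZZ lands at
offsets 4-5 and the file reads XYcdZZgh; an implementation that seeks
from the start of the file would give XYZZefgh instead.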

Now it's true that there are a lot of cases where utility
descriptions reference "input files" instead of just "input"
which can be misleading/ambiguous when dealing with stdin.

For instance, in:

{ head -n 1 > /dev/null # skip header
  join/comm - file2
} < file1

file1 may not be sorted, as the header would likely break the
sorting, but that's not a problem because we skip the header
before feeding the rest to join. What matters is that the input
join sees is sorted, even if the input file itself is not
sorted.

Still, I don't think anyone would infer from the current text
that the behaviour is unspecified because the input files are
not sorted.
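
A concrete version of the example above, as a sketch only (the file
names and contents are made up, mktemp -d is assumed to be available,
and it relies on head leaving the offset of a seekable stdin just
after the line it printed):

```sh
d=$(mktemp -d)             # hypothetical scratch directory
printf '%s\n' 'id name' '1 a' '2 b' > "$d/file1"   # header makes file1 unsorted
printf '%s\n' '1 x' '2 y' > "$d/file2"

joined=$(
  {
    head -n 1 > /dev/null  # skip header
    join - "$d/file2"      # reads the rest of file1 from stdin
  } < "$d/file1"
)
rm -rf "$d"
printf '%s\n' "$joined"
```

When head behaves that way, join only ever sees the sorted lines
"1 a" and "2 b", even though file1 as a whole is not sorted.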

-- 
Stephane



Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Geoff Clare
Stephane Chazelas wrote, on 30 Apr 2018:
>
> 2018-04-30 15:50:10 +0100, Geoff Clare:
> > Stephane Chazelas wrote, on 30 Apr 2018:
> > > 
> > > The head/tail specifications refer to line/byte offsets as
> > > offsets within *files* as opposed to *input*.
> > > 
> > > Does it mean that:
> > > 
> > > { head -n 1; head -n 1; } < file
> > > { tail -n 1; tail -n 1; } < file
> > > 
> > > are required to print the first/last line of "file" twice
> > > (assuming "file" is seekable and is not modified between the two
> > > head/tail invocations)?
> > > 
> > > In the case of "head", I can't find any implementation that
> > > does, they all return the first line of their *input* as opposed
> > > to the first line of whatever file may be open on stdin.
> > 
> > The intended behaviour of the head example is that the first head
> > writes the first line of "file" and the second head writes the second
> > line of "file".  See XCU 1.4 under INPUT FILES.
> 
> Thanks, but that text covers where the utility shall *leave*
> stdin's position *after* it has processed its input, but not
> whether it may change it before reading the input.

Yes, but it clearly shows that this offset is intended to be honoured
by the next utility to read from stdin, when it says:

tail -n +2 file
(sed -n 1q; cat) < file
[...]
The second command is equivalent to the first only when the file is
seekable.

> [...]
> > > However, in the case of "tail", for seekable stdin, traditional
> > > implementations used to seek to the end of the file open on
> > > stdin and look backward for the last line from there even if the
> > > initial position of stdin was past the start of that last line
> > > (it could even be past the end of the file).
> > 
> > The intention is certainly that when reading from standard input,
> > tail should not write anything that is before the initial offset of
> > standard input.
> [...]
> 
> Thanks.
> 
> But would you agree that it's not what the text currently says?
> Should we create a ticket for that?

Yes, it needs a ticket.  It may well affect a lot of utilities, so
perhaps adding something in XCU 1.4 under STDIN would be the best
solution.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Stephane Chazelas
2018-04-30 15:50:10 +0100, Geoff Clare:
> Stephane Chazelas wrote, on 30 Apr 2018:
> > 
> > The head/tail specifications refer to line/byte offsets as
> > offsets within *files* as opposed to *input*.
> > 
> > Does it mean that:
> > 
> > { head -n 1; head -n 1; } < file
> > { tail -n 1; tail -n 1; } < file
> > 
> > are required to print the first/last line of "file" twice
> > (assuming "file" is seekable and is not modified between the two
> > head/tail invocations)?
> > 
> > In the case of "head", I can't find any implementation that
> > does, they all return the first line of their *input* as opposed
> > to the first line of whatever file may be open on stdin.
> 
> The intended behaviour of the head example is that the first head
> writes the first line of "file" and the second head writes the second
> line of "file".  See XCU 1.4 under INPUT FILES.

Thanks, but that text covers where the utility shall *leave*
stdin's position *after* it has processed its input, but not
whether it may change it before reading the input.

(Note that this started from the unix.stackexchange.com Q&A at
https://unix.stackexchange.com/a/239562 where I already quote
part of the "INPUT FILES" section, though there it was to discuss
where head leaves the position afterwards.)

In

{ tail -n 1; tail -n 1; } < file

outputting the last line twice in some implementations,
the problem is not that the first tail leaves stdin's position at
the start of the last line (it doesn't: it leaves it at the end
of the last line, or possibly even further if it was already
past the end of the file, in the implementations that I consider
correct),

but that the second tail then moves the position back (rewinds)
from where the first tail left it (in those implementations that
I consider incorrect).
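
A quick way to see which behaviour a given tail has, as a sketch only
(the file contents are made up and mktemp is assumed to be
available):

```sh
f=$(mktemp)                # hypothetical scratch file
printf '%s\n' one two three > "$f"

# Both tails share the same open file description, hence the same offset.
out=$( { tail -n 1; tail -n 1; } < "$f" )

# Count how many times the last line of the file was printed.
count=$(printf '%s\n' "$out" | grep -c '^three$')
rm -f "$f"
printf 'last line printed %s time(s)\n' "$count"
```

A count of 1 means the second tail started from where the first one
left stdin (at the end of the file) and found nothing to print; a
count of 2 means it rewound.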

[...]
> > However, in the case of "tail", for seekable stdin, traditional
> > implementations used to seek to the end of the file open on
> > stdin and look backward for the last line from there even if the
> > initial position of stdin was past the start of that last line
> > (it could even be past the end of the file).
> 
> The intention is certainly that when reading from standard input,
> tail should not write anything that is before the initial offset of
> standard input.
[...]

Thanks.

But would you agree that it's not what the text currently says?
Should we create a ticket for that?

-- 
Stephane



Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Geoff Clare
Stephane Chazelas wrote, on 30 Apr 2018:
> 
> The head/tail specifications refer to line/byte offsets as
> offsets within *files* as opposed to *input*.
> 
> Does it mean that:
> 
> { head -n 1; head -n 1; } < file
> { tail -n 1; tail -n 1; } < file
> 
> are required to print the first/last line of "file" twice
> (assuming "file" is seekable and is not modified between the two
> head/tail invocations)?
> 
> In the case of "head", I can't find any implementation that
> does, they all return the first line of their *input* as opposed
> to the first line of whatever file may be open on stdin.

The intended behaviour of the head example is that the first head
writes the first line of "file" and the second head writes the second
line of "file".  See XCU 1.4 under INPUT FILES.

However, I can see that text such as "The first number lines
of each input file shall be copied" for head -n is misleading in
this respect.
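
As a sketch of that intended behaviour (the sample data is made up
and mktemp is assumed to be available), on a seekable file each head
should carry on from where the previous one left the offset:

```sh
f=$(mktemp)                # hypothetical scratch file
printf '%s\n' one two three > "$f"

# Two heads reading from the same open file description.
out=$( { head -n 1; head -n 1; } < "$f" )

first=$(printf '%s\n' "$out" | sed -n 1p)
second=$(printf '%s\n' "$out" | sed -n 2p)
rm -f "$f"
printf 'first=%s second=%s\n' "$first" "$second"
```

With the behaviour described above this would report first=one
second=two. If the same data were fed through a pipe, the first head
may consume more than one line, so the second result cannot be
relied upon.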

> However, in the case of "tail", for seekable stdin, traditional
> implementations used to seek to the end of the file open on
> stdin and look backward for the last line from there even if the
> initial position of stdin was past the start of that last line
> (it could even be past the end of the file).

The intention is certainly that when reading from standard input,
tail should not write anything that is before the initial offset of
standard input.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England