Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Stephane Chazelas
2018-04-30 16:49:34 +0100, Geoff Clare:
[...]
> Yes, but it clearly shows that this offset is intended to be honoured
> by the next utility to read from stdin, when it says:
> 
> tail -n +2 file
> (sed -n 1q; cat) < file
> [...]
> The second command is equivalent to the first only when the file is
> seekable.

True, but "cat" spec says it reads stdin, not that it reads
stdin *from the start of the file*. The line-number addresses
for "sed" are expressed in terms of "input lines" not "nth line
of *files*".
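
A minimal sketch of the "input lines" point, assuming a sed that
follows the POSIX rule that line numbers count cumulatively across
its input operands (the scratch files "a" and "b" are made up for
the illustration):

```shell
# Hypothetical scratch files; the names are illustrative only.
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/a"
printf 'b1\nb2\n' > "$tmp/b"

# Address 3 counts input lines across both operands, so it selects
# the first line of the second file, not a "third line" of either file.
third=$(sed -n 3p "$tmp/a" "$tmp/b")
printf '%s\n' "$third"

rm -rf "$tmp"
```

Neither file has a third line, yet the address matches, which is the
sense in which sed addresses refer to input rather than to files.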

[...]
> > But would you agree that it's not what the text currently says?
> > Should we create a ticket for that?
> 
> Yes, it needs a ticket.  It may well affect a lot of utilities, so
> perhaps adding something in XCU 1.4 under STDIN would be the best
> solution.
[...]

Thanks. Though it would certainly help to have a clarification
in XCU 1.4 under STDIN, I don't think the problem is that bad.

I'd say the problem is mostly with utilities that explicitly
reference offsets within files/input.

There's also a problem with dd whose "seek" description is wrong
(it says the offset should be relative to the start of the file,
while when there's no of=file, the offset should be relative to
the current position on stdout).
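
A rough way to observe the dd behaviour described above, assuming a dd
that seeks relative to the current position of stdout when no of= is
given (I believe GNU dd does; the file name "out" is made up):

```shell
tmp=$(mktemp -d)

# stdout is at offset 3 once printf has written "abc"; dd is then
# asked to seek 2 one-byte blocks before writing the single byte "X".
{ printf abc; printf X | dd bs=1 seek=2 2>/dev/null; } > "$tmp/out"

size=$(wc -c < "$tmp/out")
size=$((size))                          # strip padding some wc add
visible=$(tr -d '\000' < "$tmp/out")    # drop the NUL bytes of the hole
echo "size=$size visible=$visible"

rm -rf "$tmp"
```

With a seek relative to the current position the file should end up as
"abc", a two-byte hole, then "X". A seek relative to the start of the
file would instead overwrite the "c".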

Now it's true that there are a lot of cases where utility
descriptions reference "input files" instead of just "input"
which can be misleading/ambiguous when dealing with stdin.

For instance, in:

{ head -n 1 > /dev/null # skip header
  join/comm - file2
} < file1

file1 may not be sorted as the header would likely break the
sorting, but it's not a problem as we removed it. It's OK
because we skip it before feeding to join. What matters is that
the input join sees is sorted even if the input file is not
sorted itself.

Still, I don't think anyone would infer from the current text
that the behaviour is unspecified because the input files are
not sorted.
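
A self-contained sketch of the header-skipping idiom, under stated
assumptions: the file contents are made up, and the shell's read
builtin stands in for "head -n 1 > /dev/null", since whether head
leaves the offset just after the first line is exactly what is under
discussion in this thread:

```shell
tmp=$(mktemp -d)
printf 'id name\n1 alice\n2 bob\n' > "$tmp/file1"   # header, then sorted body
printf '1 red\n2 blue\n' > "$tmp/file2"

joined=$(
  {
    IFS= read -r header      # consume the header line only
    join - "$tmp/file2"      # join sees sorted input from line 2 on
  } < "$tmp/file1"
)
printf '%s\n' "$joined"

rm -rf "$tmp"
```

join never sees the header, so the fact that file1 as a whole is not
sorted does not matter.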

-- 
Stephane



Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Geoff Clare
Stephane Chazelas wrote, on 30 Apr 2018:
>
> 2018-04-30 15:50:10 +0100, Geoff Clare:
> > Stephane Chazelas wrote, on 30 Apr 2018:
> > > 
> > > The head/tail specifications refer to line/byte offsets as
> > > offsets within *files* as opposed to *input*.
> > > 
> > > Does it mean that:
> > > 
> > > { head -n 1; head -n 1; } < file
> > > { tail -n 1; tail -n 1; } < file
> > > 
> > > are required to print the first/last line of "file" twice
> > > (assuming "file" is seekable and is not modified between the two
> > > head/tail invocations)?
> > > 
> > > In the case of "head", I can't find any implementation that
> > > does, they all return the first line of their *input* as opposed
> > > to the first line of whatever file may be open on stdin.
> > 
> > The intended behaviour of the head example is that the first head
> > writes the first line of "file" and the second head writes the second
> > line of "file".  See XCU 1.4 under INPUT FILES.
> 
> Thanks, but that text covers where the utility shall *leave*
> stdin's position *after* it has processed its input, but not
> whether it may change it before reading the input.

Yes, but it clearly shows that this offset is intended to be honoured
by the next utility to read from stdin, when it says:

tail -n +2 file
(sed -n 1q; cat) < file
[...]
The second command is equivalent to the first only when the file is
seekable.

> [...]
> > > However, in the case of "tail", for seekable stdin, traditional
> > > implementations used to seek to the end of the file open on
> > > stdin and look backward for the last line from there even if the
> > > initial position of stdin was past the start of that last line
> > > (it could even be past the end of the file).
> > 
> > The intention is certainly that when reading from standard input,
> > tail should not write anything that is before the initial offset of
> > standard input.
> [...]
> 
> Thanks.
> 
> But would you agree that it's not what the text currently says?
> Should we create a ticket for that?

Yes, it needs a ticket.  It may well affect a lot of utilities, so
perhaps adding something in XCU 1.4 under STDIN would be the best
solution.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Stephane Chazelas
2018-04-30 15:50:10 +0100, Geoff Clare:
> Stephane Chazelas wrote, on 30 Apr 2018:
> > 
> > The head/tail specifications refer to line/byte offsets as
> > offsets within *files* as opposed to *input*.
> > 
> > Does it mean that:
> > 
> > { head -n 1; head -n 1; } < file
> > { tail -n 1; tail -n 1; } < file
> > 
> > are required to print the first/last line of "file" twice
> > (assuming "file" is seekable and is not modified between the two
> > head/tail invocations)?
> > 
> > In the case of "head", I can't find any implementation that
> > does, they all return the first line of their *input* as opposed
> > to the first line of whatever file may be open on stdin.
> 
> The intended behaviour of the head example is that the first head
> writes the first line of "file" and the second head writes the second
> line of "file".  See XCU 1.4 under INPUT FILES.

Thanks, but that text covers where the utility shall *leave*
stdin's position *after* it has processed its input, but not
whether it may change it before reading the input.

(note that it started from that unix.stackexchange.com Q&A
https://unix.stackexchange.com/a/239562 where I already quote
part of the "INPUT FILES" section, but to discuss where head
leaves the position after).

In

{ tail -n 1; tail -n 1; } < file

outputting the last line twice in some implementations,
the problem is not that the first tail leaves stdin position at
the start of the last line (it doesn't, it leaves it at the end
of the last line, or possibly even further if it was already
past the end of the file in the implementations that I consider
correct), but that the second tail then moves the position back
(rewinds) from where the first tail left it (in those
implementations that I consider incorrect).
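
A small probe for that rewinding, as a sketch only (the three-line
test file is made up, and cat is used merely to move the shared
offset to end-of-file before tail starts):

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/file"

# cat drains the file, leaving the shared offset at end-of-file.
# A tail that honours the initial offset then has nothing to print;
# a tail that seeks backward from the end of the file prints "three".
out=$( { cat > /dev/null; tail -n 1; } < "$tmp/file" )

if [ -z "$out" ]; then
  verdict='honours the initial offset'
else
  verdict='rewinds stdin'
fi
echo "tail $verdict"

rm -rf "$tmp"
```

Which verdict is printed depends on the tail implementation under
test, so no particular result is claimed here.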

[...]
> > However, in the case of "tail", for seekable stdin, traditional
> > implementations used to seek to the end of the file open on
> > stdin and look backward for the last line from there even if the
> > initial position of stdin was past the start of that last line
> > (it could even be past the end of the file).
> 
> The intention is certainly that when reading from standard input,
> tail should not write anything that is before the initial offset of
> standard input.
[...]

Thanks.

But would you agree that it's not what the text currently says?
Should we create a ticket for that?

-- 
Stephane



Re: are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Geoff Clare
Stephane Chazelas wrote, on 30 Apr 2018:
> 
> The head/tail specifications refer to line/byte offsets as
> offsets within *files* as opposed to *input*.
> 
> Does it mean that:
> 
> { head -n 1; head -n 1; } < file
> { tail -n 1; tail -n 1; } < file
> 
> are required to print the first/last line of "file" twice
> (assuming "file" is seekable and is not modified between the two
> head/tail invocations)?
> 
> In the case of "head", I can't find any implementation that
> does, they all return the first line of their *input* as opposed
> to the first line of whatever file may be open on stdin.

The intended behaviour of the head example is that the first head
writes the first line of "file" and the second head writes the second
line of "file".  See XCU 1.4 under INPUT FILES.

However, I can see that text such as "The first number lines
of each input file shall be copied" for head -n is misleading in
this respect.

> However, in the case of "tail", for seekable stdin, traditional
> implementations used to seek to the end of the file open on
> stdin and look backward for the last line from there even if the
> initial position of stdin was past the start of that last line
> (it could even be past the end of the file).

The intention is certainly that when reading from standard input,
tail should not write anything that is before the initial offset of
standard input.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Pushing/restoring a file descriptor for a compound command

2018-04-30 Thread Chet Ramey
On 4/27/18 9:53 PM, Martijn Dekker wrote:

> That's what I've got. Is that a sane interpretation?
> 
> It would be nice if there were something more unequivocal in the standard,
> but it seems there isn't...
> 
>> You might have more luck with bash (perhaps.)
> 
> Chet, what do you think?

There's nothing in Posix that specifies the behavior one way or another, so
it's just an incompatibility between shells. I'll take a look.

Chet


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs

2018-04-30 Thread Austin Group Bug Tracker

A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1105 
== 
Reported By:        stephane
Assigned To:
==
Project:            1003.1(2016)/Issue7+TC2
Issue ID:           1105
Category:           Shell and Utilities
Type:               Enhancement Request
Severity:           Editorial
Priority:           normal
Status:             New
Name:               Stéphane Chazelas
Organization:
User Reference:
Section:            awk
Page Number:
Line Number:
Interp Status:      ---
Final Accepted Text:
==
Date Submitted:     2016-12-05 21:52 UTC
Last Modified:      2018-04-30 12:00 UTC
==
Summary:            problems with backslashes in awk strings and EREs
== 

-- 
 (0004018) stephane (reporter) - 2018-04-30 12:00
 http://austingroupbugs.net/view.php?id=1105#c4018 
-- 
re: http://austingroupbugs.net/view.php?id=1105#c3999

Yes, I came across that recently as well (though I didn't find the POSIX
spec ambiguous then) at
https://unix.stackexchange.com/questions/439752/replace-pattern-each-time-with-a-different-string-taken-from-external-file/439754#439754

I had to use

gsub(/[&\\]/, "&", repl)

to "escape" the & and \ characters in "repl" (so it can later be used
verbatim in another sub()), and for gawk, I had to enable the POSIX mode
for that "&" to mean a literal backslash followed by the matched
string. I noticed it didn't work in heirloom awk_su3 (presumably the same
as Solaris awk), I assumed a bug.

I agree that if the spec is not already clear there, it would be worth
addressing as part of that same bug.

Issue History
Date Modified     Username    Field                 Change
==
2016-12-05 21:52  stephane    New Issue
2016-12-05 21:52  stephane    Name                  => Stéphane Chazelas
2016-12-05 21:52  stephane    Section               => awk
2018-04-25 22:27  McDutchie   Note Added: 0003999
2018-04-26 08:53  joerg       Note Added: 0004000
2018-04-30 09:59  geoffclare  Note Added: 0004014
2018-04-30 10:39  stephane    Note Added: 0004015
2018-04-30 11:06  McDutchie   Note Added: 0004016
2018-04-30 11:26  geoffclare  Note Added: 0004017
2018-04-30 12:00  stephane    Note Added: 0004018
==




[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs

2018-04-30 Thread Austin Group Bug Tracker

A NOTE has been added to this issue. 

-- 
 (0004017) geoffclare (manager) - 2018-04-30 11:26
 http://austingroupbugs.net/view.php?id=1105#c4017 
-- 
Re: http://austingroupbugs.net/view.php?id=1105#c4015

We agreed with your other points and are going to make changes to
address them.  This will include a fix for the /\./ issue (which
consequently will mean /pat/ is different from $0 ~ "pat").

/\\./ is a new one which will need further consideration. 





[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs

2018-04-30 Thread Austin Group Bug Tracker

A NOTE has been added to this issue. 

-- 
 (0004016) McDutchie (reporter) - 2018-04-30 11:06
 http://austingroupbugs.net/view.php?id=1105#c4016 
-- 
Re: Note: 0004000

I used the POSIX /usr/xpg4/bin/awk, and observed the behaviour described
in Note: 0003999 with it.





are head/tail allowed (required?) to rewind stdin

2018-04-30 Thread Stephane Chazelas
Hello,

The head/tail specifications refer to line/byte offsets as
offsets within *files* as opposed to *input*.

Does it mean that:

{ head -n 1; head -n 1; } < file
{ tail -n 1; tail -n 1; } < file

are required to print the first/last line of "file" twice
(assuming "file" is seekable and is not modified between the two
head/tail invocations)?

In the case of "head", I can't find any implementation that
does, they all return the first line of their *input* as opposed
to the first line of whatever file may be open on stdin.

However, in the case of "tail", for seekable stdin, traditional
implementations used to seek to the end of the file open on
stdin and look backward for the last line from there even if the
initial position of stdin was past the start of that last line
(it could even be past the end of the file).

That was fixed in GNU tail in 1995 and in ksh93's tail builtin
in 2006 (AFAICT), but not in many other implementations (I had a
vague recollection that busybox tail had been fixed as well at
some point, but either they have reverted it, or it was bad
memory on my part).

The tail of Solaris 10, FreeBSD, OpenBSD still output the last
line twice in

{ tail -n 1; tail -n 1; } < file

(but not in cat file | { tail -n 1; tail -n 1; } of course).
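
For anyone wanting to try this on their own system, here is a sketch
(the three-line test file is made up; the result of the seekable case
is deliberately not asserted since it is what varies between
implementations):

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/file"

# Seekable stdin: the result depends on the tail under test.
seekable=$( { tail -n 1; tail -n 1; } < "$tmp/file" )

# Pipe: the first tail has to read to end-of-input to find the last
# line, so the second tail sees no input at all.
piped=$( cat "$tmp/file" | { tail -n 1; tail -n 1; } )

printf 'seekable:\n%s\npiped:\n%s\n' "$seekable" "$piped"

rm -rf "$tmp"
```

A tail that rewinds stdin shows "three" twice under "seekable:"; one
that honours the initial offset shows it once. The "piped:" part
should show it once either way.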

IMO, both head/tail without file arguments should give the head
and tail of their input and the fact that some tail
implementations may end up rewinding their stdin is an oversight
on their part. But it looks like POSIX doesn't agree with me,
though that looks like a similar oversight (or possibly there's
text somewhere else that covers that case?).

-- 
Stephane



[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs

2018-04-30 Thread Austin Group Bug Tracker

A NOTE has been added to this issue. 

-- 
 (0004015) stephane (reporter) - 2018-04-30 10:39
 http://austingroupbugs.net/view.php?id=1105#c4015 
-- 
Re: http://austingroupbugs.net/view.php?id=1105#c4014

I must admit I don't remember how I reached that conclusion as it's been
over a year since I looked into that. But from what I read of that "item
6", in itself, it would mean that

awk '/\./'  would be unspecified and awk '/\\./' would match on literal "."
(it would yield the "\." ERE which matches a literal dot), and /.../ would
be no different from $0 ~ "...", which is not true in any awk
implementation (and one of the main points of having a /.../ so we can
express EREs less awkwardly than with "..."). 
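
To illustrate the difference in practice, a short sketch with made-up
two-line input, comparing the ERE token form with the string form
that needs one more level of backslash escaping:

```shell
# ERE token: a single backslash is enough to match a literal dot.
slash=$(printf 'a.b\nabc\n' | awk '/\./')

# String used as an ERE: the string lexer consumes one backslash
# level first, so the backslash has to be doubled.
string=$(printf 'a.b\nabc\n' | awk '$0 ~ "\\."')

printf 'ERE token: %s\nstring:    %s\n' "$slash" "$string"
```

Both forms select only the line containing a literal dot, but the
string form needs "\\." where the ERE token needs only \. to say it.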





Re: Minutes of the 26th April 2018 Teleconference

2018-04-30 Thread Geoff Clare
Robert Elz wrote, on 27 Apr 2018:
>
> This is kind of odd...
> 
>   | Attendees:
>   | Nick Stoughton, USENIX, ISO/IEC JTC 1/SC 22 OR
>   | Joerg Schilling, FOKUS Fraunhofer
> [...]
> 
>   | We deferred bugs 1084, 1085 and 1100 until Joerg is on the call.
> 
> It looks as if he was. 
> 
> What causes one of those (many, not just those 3) considerations
> of bugs that have been deferred to ever get considered again?
> Clearly the stated pre-condition was not enough.

I guess we normally rely on Andrew to remind us of such things, and
he wasn't on the call.

In the case of these three, had the question arisen of whether to
return to them, I think we would still have deferred them.  That's
because last time we worked on them, Richard was the one proposing
wording changes and he wasn't on this call.

For older cases, I think they are all ones where someone volunteered
to do some "homework" as input to a later meeting, and they have not
(yet) completed it.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England