Re: are head/tail allowed (required?) to rewind stdin
2018-04-30 16:49:34 +0100, Geoff Clare:
[...]
> Yes, but it clearly shows that this offset is intended to be honoured
> by the next utility to read from stdin, when it says:
>
>     tail -n +2 file
>     (sed -n 1q; cat) < file
>
> [...]
> The second command is equivalent to the first only when the file is
> seekable.

True, but the "cat" spec says it reads stdin, not that it reads
stdin *from the start of the file*. The line-number addresses
for "sed" are expressed in terms of "input lines", not "nth line
of *files*".

[...]
> > But would you agree that it's not what the text currently says?
> > Should we create a ticket for that?
>
> Yes, it needs a ticket. It may well affect a lot of utilities, so
> perhaps adding something in XCU 1.4 under STDIN would be the best
> solution.
[...]

Thanks.

Though it would certainly help to have a clarification in XCU
1.4 under STDIN, I don't think the problem is that bad.

I'd say the problem is mostly with utilities that explicitly
reference offsets within files/input.

There's also a problem with dd, whose "seek" description is wrong
(it says the offset should be relative to the start of the file,
while when there's no of=file, the offset should be relative to
the current position on stdout).

Now it's true that there are a lot of cases where utility
descriptions reference "input files" instead of just "input",
which can be misleading/ambiguous when dealing with stdin.

For instance, in:

    {
      head -n 1 > /dev/null # skip header
      join/comm - file2
    } < file1

file1 may not be sorted, as the header would likely break the
sorting, but that's not a problem as we removed it. It's OK because
we skip it before feeding the rest to join. What matters is that the
input join sees is sorted, even if the input file is not sorted
itself.

Still, I don't think anyone would infer from the current text
that the behaviour is unspecified because the input files are
not sorted.

-- 
Stephane
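To illustrate the join example above, here is a minimal sketch in sh.
The file contents and the names tmpdir, file1, file2 and joined are
made up for the demonstration, and mktemp is assumed to be available
(it is not specified by POSIX). The header is skipped with the shell's
"read" built-in rather than with head, so that the result does not
depend on where a particular head implementation leaves stdin, which
is the very point under discussion in this thread.

```shell
# Assumed sample data: file1 starts with a header line ("key value")
# that sorts after the data lines, so file1 as a whole is not sorted.
tmpdir=$(mktemp -d)
printf '%s\n' 'key value' 'a 1' 'b 2' > "$tmpdir/file1"
printf '%s\n' 'a x' 'b y' > "$tmpdir/file2"

# Skip the header, then let join read the rest of the same open file.
# join only ever sees the sorted remainder of file1 on its stdin.
joined=$(
  {
    IFS= read -r header          # consumes the header line only
    LC_ALL=C join - "$tmpdir/file2"
  } < "$tmpdir/file1"
)

rm -rf "$tmpdir"
printf '%s\n' "$joined"
```

Replacing the read line with "head -n 1 > /dev/null" gives the form
quoted in the message; it behaves the same only where head leaves
stdin positioned just after the first line.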
Re: are head/tail allowed (required?) to rewind stdin
Stephane Chazelas wrote, on 30 Apr 2018:
>
> 2018-04-30 15:50:10 +0100, Geoff Clare:
> > Stephane Chazelas wrote, on 30 Apr 2018:
> > >
> > > The head/tail specifications refer to line/byte offsets as
> > > offsets within *files* as opposed to *input*.
> > >
> > > Does it mean that:
> > >
> > > { head -n 1; head -n 1; } < file
> > > { tail -n 1; tail -n 1; } < file
> > >
> > > are required to print the first/last line of "file" twice
> > > (assuming "file" is seekable and is not modified between the two
> > > head/tail invocations)?
> > >
> > > In the case of "head", I can't find any implementation that
> > > does, they all return the first line of their *input* as opposed
> > > to the first line of whatever file may be open on stdin.
> >
> > The intended behaviour of the head example is that the first head
> > writes the first line of "file" and the second head writes the second
> > line of "file". See XCU 1.4 under INPUT FILES.
>
> Thanks, but that text covers where the utility shall *leave*
> stdin's position *after* it has processed its input, but not
> whether it may change it before reading the input.

Yes, but it clearly shows that this offset is intended to be honoured
by the next utility to read from stdin, when it says:

    tail -n +2 file
    (sed -n 1q; cat) < file

[...]
The second command is equivalent to the first only when the file is
seekable.

> [...]
> > > However, in the case of "tail", for seekable stdin, traditional
> > > implementations used to seek to the end of the file open on
> > > stdin and look backward for the last line from there even if the
> > > initial position of stdin was past the start of that last line
> > > (it could even be past the end of the file).
> >
> > The intention is certainly that when reading from standard input,
> > tail should not write anything that is before the initial offset of
> > standard input.
> [...]
>
> Thanks.
>
> But would you agree that it's not what the text currently says?
> Should we create a ticket for that?

Yes, it needs a ticket. It may well affect a lot of utilities, so
perhaps adding something in XCU 1.4 under STDIN would be the best
solution.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: are head/tail allowed (required?) to rewind stdin
2018-04-30 15:50:10 +0100, Geoff Clare:
> Stephane Chazelas wrote, on 30 Apr 2018:
> >
> > The head/tail specifications refer to line/byte offsets as
> > offsets within *files* as opposed to *input*.
> >
> > Does it mean that:
> >
> > { head -n 1; head -n 1; } < file
> > { tail -n 1; tail -n 1; } < file
> >
> > are required to print the first/last line of "file" twice
> > (assuming "file" is seekable and is not modified between the two
> > head/tail invocations)?
> >
> > In the case of "head", I can't find any implementation that
> > does, they all return the first line of their *input* as opposed
> > to the first line of whatever file may be open on stdin.
>
> The intended behaviour of the head example is that the first head
> writes the first line of "file" and the second head writes the second
> line of "file". See XCU 1.4 under INPUT FILES.

Thanks, but that text covers where the utility shall *leave*
stdin's position *after* it has processed its input, but not
whether it may change it before reading the input.

(Note that it started from that unix.stackexchange.com Q
https://unix.stackexchange.com/a/239562 where I already quote
part of the "INPUT FILES" section, but to discuss where head
leaves the position after.)

In

    { tail -n 1; tail -n 1; } < file

outputting the last line twice in some implementations, the
problem is not that the first tail leaves stdin's position at the
start of the last line (it doesn't: it leaves it at the end of
the last line, or possibly even further if it was already past
the end of the file, in the implementations that I consider
correct), but that the second tail then moves the position back
(rewinds) from where the first tail left it (in those
implementations that I consider incorrect).

[...]
> > However, in the case of "tail", for seekable stdin, traditional
> > implementations used to seek to the end of the file open on
> > stdin and look backward for the last line from there even if the
> > initial position of stdin was past the start of that last line
> > (it could even be past the end of the file).
>
> The intention is certainly that when reading from standard input,
> tail should not write anything that is before the initial offset of
> standard input.
[...]

Thanks.

But would you agree that it's not what the text currently says?
Should we create a ticket for that?

-- 
Stephane
Re: are head/tail allowed (required?) to rewind stdin
Stephane Chazelas wrote, on 30 Apr 2018:
>
> The head/tail specifications refer to line/byte offsets as
> offsets within *files* as opposed to *input*.
>
> Does it mean that:
>
> { head -n 1; head -n 1; } < file
> { tail -n 1; tail -n 1; } < file
>
> are required to print the first/last line of "file" twice
> (assuming "file" is seekable and is not modified between the two
> head/tail invocations)?
>
> In the case of "head", I can't find any implementation that
> does, they all return the first line of their *input* as opposed
> to the first line of whatever file may be open on stdin.

The intended behaviour of the head example is that the first head
writes the first line of "file" and the second head writes the second
line of "file". See XCU 1.4 under INPUT FILES.

However, I can see that text such as "The first number lines of each
input file shall be copied" for head -n is misleading in this respect.

> However, in the case of "tail", for seekable stdin, traditional
> implementations used to seek to the end of the file open on
> stdin and look backward for the last line from there even if the
> initial position of stdin was past the start of that last line
> (it could even be past the end of the file).

The intention is certainly that when reading from standard input,
tail should not write anything that is before the initial offset of
standard input.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
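The idea that successive commands sharing one open file each continue
from where the previous one stopped can be sketched as follows. This
is only an illustration under stated assumptions: the sample file and
the variable names (tmp, first, second, head_out) are hypothetical,
and mktemp is assumed to be available. The shell's "read" built-in is
used as the reference, since it consumes exactly one line per call;
what the two head invocations print is left for the reader to observe
on their own system.

```shell
# Assumed sample data: a three-line seekable file.
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"

# Two reads on the same open file: the second continues where the
# first one stopped, because both share the same file offset.
{ IFS= read -r first; IFS= read -r second; } < "$tmp"

# The head example from the thread, run against the same file.
# The first head prints the first line; what the second prints depends
# on where the first one left the offset of stdin.
head_out=$({ head -n 1; head -n 1; } < "$tmp")

rm -f "$tmp"
printf 'read: %s, %s\n' "$first" "$second"
printf 'head:\n%s\n' "$head_out"
```

If the head output matches what read saw, the local head leaves the
offset of stdin just after the lines it wrote.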
Re: Pushing/restoring a file descriptor for a compound command
On 4/27/18 9:53 PM, Martijn Dekker wrote:

> That's what I've got. Is that a sane interpretation?
>
> It would be nice if there were something more unequivocal in the standard,
> but it seems there isn't...
>
>> You might have more luck with bash (perhaps.)
>
> Chet, what do you think?

There's nothing in Posix that specifies the behavior one way or another,
so it's just an incompatibility between shells. I'll take a look.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs
A NOTE has been added to this issue.
==
http://austingroupbugs.net/view.php?id=1105
==
Reported By:          stephane
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1105
Category:             Shell and Utilities
Type:                 Enhancement Request
Severity:             Editorial
Priority:             normal
Status:               New
Name:                 Stéphane Chazelas
Organization:
User Reference:
Section:              awk
Page Number:
Line Number:
Interp Status:        ---
Final Accepted Text:
==
Date Submitted:       2016-12-05 21:52 UTC
Last Modified:        2018-04-30 12:00 UTC
==
Summary:              problems with backslashes in awk strings and EREs
==

--
(0004018) stephane (reporter) - 2018-04-30 12:00
http://austingroupbugs.net/view.php?id=1105#c4018
--
re: http://austingroupbugs.net/view.php?id=1105#c3999

Yes, I came across that recently as well (though I didn't find the
POSIX spec ambiguous then) at
https://unix.stackexchange.com/questions/439752/replace-pattern-each-time-with-a-different-string-taken-from-external-file/439754#439754

I had to use

    gsub(/[&\\]/, "&", repl)

to "escape" the & and \ characters in "repl" (so it can later be used
verbatim in another sub()), and for gawk, I had to enable the POSIX
mode for that "&" to mean a literal backslash followed by the matched
string. I noticed it didn't work in heirloom awk_su3 (presumably the
same as Solaris awk); I assumed a bug.

I agree that if the spec is not already clear there, it would be worth
addressing as part of that same bug.

Issue History
Date Modified     Username    Field        Change
==
2016-12-05 21:52  stephane    New Issue
2016-12-05 21:52  stephane    Name         => Stéphane Chazelas
2016-12-05 21:52  stephane    Section      => awk
2018-04-25 22:27  McDutchie   Note Added:  0003999
2018-04-26 08:53  joerg       Note Added:  0004000
2018-04-30 09:59  geoffclare  Note Added:  0004014
2018-04-30 10:39  stephane    Note Added:  0004015
2018-04-30 11:06  McDutchie   Note Added:  0004016
2018-04-30 11:26  geoffclare  Note Added:  0004017
2018-04-30 12:00  stephane    Note Added:  0004018
==
[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs
A NOTE has been added to this issue.
==
http://austingroupbugs.net/view.php?id=1105
==
Reported By:          stephane
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1105
Category:             Shell and Utilities
Type:                 Enhancement Request
Severity:             Editorial
Priority:             normal
Status:               New
Name:                 Stéphane Chazelas
Organization:
User Reference:
Section:              awk
Page Number:
Line Number:
Interp Status:        ---
Final Accepted Text:
==
Date Submitted:       2016-12-05 21:52 UTC
Last Modified:        2018-04-30 11:26 UTC
==
Summary:              problems with backslashes in awk strings and EREs
==

--
(0004017) geoffclare (manager) - 2018-04-30 11:26
http://austingroupbugs.net/view.php?id=1105#c4017
--
Re: http://austingroupbugs.net/view.php?id=1105#c4015

We agreed with your other points and are going to make changes to
address them. This will include a fix for the /\./ issue (which
consequently will mean /pat/ is different from $0 ~ "pat").

/\\./ is a new one which will need further consideration.

Issue History
Date Modified     Username    Field        Change
==
2016-12-05 21:52  stephane    New Issue
2016-12-05 21:52  stephane    Name         => Stéphane Chazelas
2016-12-05 21:52  stephane    Section      => awk
2018-04-25 22:27  McDutchie   Note Added:  0003999
2018-04-26 08:53  joerg       Note Added:  0004000
2018-04-30 09:59  geoffclare  Note Added:  0004014
2018-04-30 10:39  stephane    Note Added:  0004015
2018-04-30 11:06  McDutchie   Note Added:  0004016
2018-04-30 11:26  geoffclare  Note Added:  0004017
==
[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs
A NOTE has been added to this issue.
==
http://austingroupbugs.net/view.php?id=1105
==
Reported By:          stephane
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1105
Category:             Shell and Utilities
Type:                 Enhancement Request
Severity:             Editorial
Priority:             normal
Status:               New
Name:                 Stéphane Chazelas
Organization:
User Reference:
Section:              awk
Page Number:
Line Number:
Interp Status:        ---
Final Accepted Text:
==
Date Submitted:       2016-12-05 21:52 UTC
Last Modified:        2018-04-30 11:06 UTC
==
Summary:              problems with backslashes in awk strings and EREs
==

--
(0004016) McDutchie (reporter) - 2018-04-30 11:06
http://austingroupbugs.net/view.php?id=1105#c4016
--
Re: Note: 0004000

I used, and observed the behaviour described in Note: 0003999 on, the
POSIX /usr/xpg4/bin/awk.

Issue History
Date Modified     Username    Field        Change
==
2016-12-05 21:52  stephane    New Issue
2016-12-05 21:52  stephane    Name         => Stéphane Chazelas
2016-12-05 21:52  stephane    Section      => awk
2018-04-25 22:27  McDutchie   Note Added:  0003999
2018-04-26 08:53  joerg       Note Added:  0004000
2018-04-30 09:59  geoffclare  Note Added:  0004014
2018-04-30 10:39  stephane    Note Added:  0004015
2018-04-30 11:06  McDutchie   Note Added:  0004016
==
are head/tail allowed (required?) to rewind stdin
Hello,

The head/tail specifications refer to line/byte offsets as
offsets within *files* as opposed to *input*.

Does it mean that:

    { head -n 1; head -n 1; } < file
    { tail -n 1; tail -n 1; } < file

are required to print the first/last line of "file" twice
(assuming "file" is seekable and is not modified between the two
head/tail invocations)?

In the case of "head", I can't find any implementation that
does, they all return the first line of their *input* as opposed
to the first line of whatever file may be open on stdin.

However, in the case of "tail", for seekable stdin, traditional
implementations used to seek to the end of the file open on
stdin and look backward for the last line from there even if the
initial position of stdin was past the start of that last line
(it could even be past the end of the file).

That was fixed in GNU tail in 1995 and in ksh93's tail builtin
in 2006 (AFAICT), but not in many other implementations (I had a
vague recollection that busybox tail had been fixed as well at
some point, but either they have reverted it, or it was bad
memory on my part).

The tail of Solaris 10, FreeBSD and OpenBSD still outputs the last
line twice in

    { tail -n 1; tail -n 1; } < file

(but not in

    cat file | { tail -n 1; tail -n 1; }

of course).

IMO, both head and tail without file arguments should give the head
and tail of their input, and the fact that some tail implementations
may end up rewinding their stdin is an oversight on their part.

But it looks like POSIX doesn't agree with me, though it looks
like a similar oversight (or possibly there's text somewhere
else that covers that case?).

-- 
Stephane
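A small probe along the lines of the tail example above can tell which
of the two behaviours the local tail has. This is a minimal sketch,
not a conformance test: the sample file and the names tmp, out and
verdict are hypothetical, and mktemp is assumed to be available.

```shell
# Assumed sample data: a three-line seekable file.
tmp=$(mktemp)
printf '%s\n' one two three > "$tmp"

# Both tails share one open file. After the first tail, the offset of
# stdin is at (or past) the end of the file.
out=$({ tail -n 1; tail -n 1; } < "$tmp")
rm -f "$tmp"

case $out in
  three)
    # The second tail wrote nothing: it did not look behind the
    # initial offset of its stdin.
    verdict='honours offset' ;;
  'three
three')
    # The second tail sought backward from the end of the file.
    verdict='rewinds stdin' ;;
  *)
    verdict='unexpected' ;;
esac
printf '%s\n' "$verdict"
```

Running the same pair of tails on the output of "cat file" instead of
a redirection removes the difference, since a pipe cannot be rewound.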
[1003.1(2016)/Issue7+TC2 0001105]: problems with backslashes in awk strings and EREs
A NOTE has been added to this issue.
==
http://austingroupbugs.net/view.php?id=1105
==
Reported By:          stephane
Assigned To:
==
Project:              1003.1(2016)/Issue7+TC2
Issue ID:             1105
Category:             Shell and Utilities
Type:                 Enhancement Request
Severity:             Editorial
Priority:             normal
Status:               New
Name:                 Stéphane Chazelas
Organization:
User Reference:
Section:              awk
Page Number:
Line Number:
Interp Status:        ---
Final Accepted Text:
==
Date Submitted:       2016-12-05 21:52 UTC
Last Modified:        2018-04-30 10:39 UTC
==
Summary:              problems with backslashes in awk strings and EREs
==

--
(0004015) stephane (reporter) - 2018-04-30 10:39
http://austingroupbugs.net/view.php?id=1105#c4015
--
Re: http://austingroupbugs.net/view.php?id=1105#c4014

I must admit I don't remember how I reached that conclusion as it's
been over a year since I looked into that.

But from what I read of that "item 6", in itself, it would mean that

    awk '/\./'

would be unspecified and

    awk '/\\./'

would match on literal "." (it would yield the "\." ERE which matches
a literal dot), and /.../ would be no different from $0 ~ "...", which
is not true in any awk implementation (and one of the main points of
having a /.../ is so we can express EREs less awkwardly than with
"...").

Issue History
Date Modified     Username    Field        Change
==
2016-12-05 21:52  stephane    New Issue
2016-12-05 21:52  stephane    Name         => Stéphane Chazelas
2016-12-05 21:52  stephane    Section      => awk
2018-04-25 22:27  McDutchie   Note Added:  0003999
2018-04-26 08:53  joerg       Note Added:  0004000
2018-04-30 09:59  geoffclare  Note Added:  0004014
2018-04-30 10:39  stephane    Note Added:  0004015
==
Re: Minutes of the 26th April 2018 Teleconference
Robert Elz wrote, on 27 Apr 2018:
>
> This is kind of odd...
>
>   | Attendees:
>   | Nick Stoughton, USENIX, ISO/IEC JTC 1/SC 22 OR
>   | Joerg Schilling, FOKUS Fraunhofer
> [...]
>
>   | We deferred bugs 1084, 1085 and 1100 until Joerg is on the call.
>
> It looks as if he was.
>
> What causes one of those (many, not just those 3) considerations
> of bugs that have been deferred to ever get considered again?
> Clearly the stated pre-condition was not enough.

I guess we normally rely on Andrew to remind us of such things, and
he wasn't on the call.

In the case of these three, had the question arisen of whether to
return to them, I think we would still have deferred them. That's
because last time we worked on them, Richard was the one proposing
wording changes and he wasn't on this call.

For older cases, I think they are all ones where someone volunteered
to do some "homework" as input to a later meeting, and they have not
(yet) completed it.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England