Re: Request RFC3339 format option for date utility
Date:Sun, 12 May 2024 18:59:51 -0500 From:"Andrew Pennebaker via austin-group-l at The Open Group" Message-ID:

I do not send mail to @gmail addresses, so I will reply just to the list.

  | I would like the standard POSIX date utility to receive an option to format
  | timestamps with modern RFC3339 format.

This is not the appropriate forum to make that happen - you'd need to get the various implementations to agree on a new option, which once common, could then be proposed to be added to the standard.

I don't really see the need for that; 3339 format is trivial to produce already...

    jacaranda$ date -u +%Y-%m-%dT%H:%M:%SZ
    2024-05-13T07:20:06Z

(That isn't GNU date, but I'd be a little surprised if it couldn't do the same).

  | The GNU date utility seems to do this poorly, using the overly elaborate
  | pattern "...+00:00" instead of "...Z" for UTC timezone.

That's allowed by 3339; if you don't like it, or want an option to change just that, you should take that up with the maintainers of GNU date, not here.

kre
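As a small sketch of the point made above (not part of the original mail), the existing conversions can simply be wrapped in a function. The name `rfc3339_utc` is made up for illustration; the only things assumed are `date -u` and the `%Y %m %d %H %M %S` conversions, which POSIX date already specifies.

```sh
#!/bin/sh
# rfc3339_utc: hypothetical helper name, not a standard utility.
# Prints the current time in UTC as an RFC 3339 timestamp with a "Z"
# suffix, using only conversions that POSIX date already provides.
rfc3339_utc() {
    date -u +%Y-%m-%dT%H:%M:%SZ
}

rfc3339_utc
```

Because `-u` forces UTC, the literal `Z` in the format is always correct; no new option is needed for this particular form.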
Re: [Issue 8 drafts 0001798]: Must posix_getdents remember file offsets across exec?
Date:Fri, 22 Mar 2024 09:48:37 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: | A NOTE has been added to this issue. This comment doesn't need to be an attached note I don't think... | If we reword in terms of directory entries, I think no explicit statement | about renaming will be needed. Agreed, that was what I meant when I made the comment about rename wrt posix_getdents() - and I agree "directory entry" is better than the "file name" I suggested. kre
Re: Austin Group WEBEX +1-408-792-6300 PIN 668 216 233
Date:Tue, 12 Mar 2024 14:56:27 + From:Jonathan Wakely Message-ID: Thanks: | The emails have Content-type: multipart/mixed and the text/html part | includes the meeting time: Well, technically, the multipart/mixed is the body, and the calendar info. The body is multipart/alternative and has text/plain and text/html. As you may have surmised, I only ever read text/plain if that is present (in a multipart/alternative the content of the parts is supposed to be the same, with only the presentation varying - though this is not even close to the worst breach of that rule I have seen). | Strangely, that's missing from the text/plain part. Maybe because it's | in a that can't easily be converted to plain text, so it's | just omitted by whatever software generates the email. The info could be added to both, in the free text part; then the fact that the table at the head of the html version got omitted wouldn't be an issue. kre ps: if nothing changes, I'll keep reading the text/plain and using the calendar attachment for the date & time. And I still think the date/time should be given in UTC rather than America/New_York, which requires anyone not in North America to know their summer time rules to translate - if given in UTC one only needs to know one's own (perhaps varying) zone offset.
Re: Austin Group WEBEX +1-408-792-6300 PIN 668 216 233
Date:Tue, 12 Mar 2024 08:16:51 -0400 (EDT) From:"Single UNIX Specification via austin-group-l at The Open Group" Message-ID: <202403120816.c9d2d0b3357afd28622ef410caf1f...@opengroup.org>

Something I've been meaning to ask about for ages (not exactly exciting, or I hope controversial). Twice a week (or more sometimes) messages like this are sent out:

  | Topic: Austin Group teleconference
  | ---
  | Audio conference information
  | ---
  |
  | You are invited to a WEBEX meeting. <<<
  |
  | Andrew Josey's Personal Room
  | https://opengroupevents.webex.com/meet/a.josey
  | 668216233
  |
  | Join by video system

followed by a whole bunch of details about how to join, etc.

But nowhere in the body of the message does it say when the meeting to which everyone is invited is to be held. That seems kind of lacking in an invitation. It is in the attached calendar info, if one either adds that to a calendar, or just reads it, but wouldn't it be nicer if it said something like:

    You are invited to a WEBEX meeting on 18-Mar-2024 at 11:00 America/New_York

(I cut/pasted the actual date & time, for this particular invitation, from the calendar info.) Or even better if it gave the time in UTC, so "at 15:00 UTC" - or whatever it is this week.

That the calls are anchored to US Eastern time (about which I have always wondered, most of the regular participants seem to be outside the US, but never mind) is irrelevant for this - a particular meeting (which is what this is about) is always at some specific UTC time, regardless of why that particular time was chosen.

Whether the info goes on that line, or somewhere else, isn't important, just that the date & time of the invitation gets included in the message body, somewhere. Could that be made to happen?

kre
Re: sh 'continue' shenanigans: negating
Date:Wed, 14 Feb 2024 20:15:59 -0800 (PST) From:"Roger Marquis via austin-group-l at The Open Group" Message-ID: <6sn184nr-6299-838p-qpro-03qs07401...@mx.roble.com> | Never seen a script use "!" in this way. Is it undocumented? No. That particular usage is bizarre however, and it is no surprise you've never seen it, I doubt anyone has in a real script. | Another question about this code is whether the return value would be | from "! continue" or "done". "done" is a reserved word, not any kind of command, it has no exit status (saying "return value" only makes any sense at all in functions where "return" works, and it doesn't even make much sense there). The exit status of a for (or while or until) loop (which is what the "done" is the end of) is defined to be the exit status of the last command executed in the body of the loop (the part between "do" and "done") (or 0 if no commands were ever executed in the body). In these examples ! continue (or ! break in the more recent one) is the last command executed in the body, as it was the only command there (so the only one which could possibly be executed). As long as the loop body is executed at least once (which it must be when it is "for x in y ...") then the exit status of the for command is the exit status of that ! continue (or ! break). And as the ! inverts the (logical) status of the following pipeline, and both break and continue (unless they fail for some usage error) always have an exit status of 0, the exit status of ! continue (or ! break) must be 1. kre
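The case discussed above is small enough to try directly. The following is only a probe, not a statement of what any given shell does: it runs a loop whose body is the single command `! continue` and reports the loop's exit status. By the reasoning in the message the value should be 1; the script prints whatever the shell at hand actually produces.

```sh
#!/bin/sh
# The loop body's only command is "! continue", and "for x in y"
# guarantees the body is executed exactly once.
for x in y; do
    ! continue
done
# Capture the exit status of the for command itself.
status=$?
printf 'exit status of the for loop: %s\n' "$status"
```

Replacing `continue` with `break` gives the variant mentioned in the more recent example.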
Re: sh 'continue' shenanigans: negating
Date:Thu, 15 Feb 2024 00:40:24 +0100 From:"Christoph Anton Mitterer via austin-group-l at The Open Group" Message-ID: <9e56d4028f077e0d5dcc2ec2448de62b400a69a3.ca...@scientia.org> | If so, then IMO strictly speaking, it doesn't say whose $? shall be set | that way. That makes no sense, there is just one '?' special parameter ($? is just the syntax by which it is accessed, not a thing itself). I suspect you're confusing exit status and the ? special param - they're not the same thing. Every utility, and the shell compound commands, have an exit status. What actually appears in ? is specified, somewhere, but it certainly is not every exit status of every command run (not even in the shell environment in which they're invoked). Return (and exit) are kind of special in how they're defined to set '?'. kre
Re: [1003.1(2008)/Issue 7 0001219]: snprintf reequirement to fail when n > INT_MAX conflicts with C
Actually, apologies - forget my previous reply - the change to the fwprintf() page (for swprintf()) did happen as the resolution of that bug specified. No idea how I looked at that (I had the page still open when I went back to it just now) and failed to see that the text had been changed. But I did. What made you believe that nothing had been done there? kre
Re: [1003.1(2008)/Issue 7 0001219]: snprintf reequirement to fail when n > INT_MAX conflicts with C
Date:Wed, 17 Jan 2024 17:54:23 -0500 From:"Rich Felker via austin-group-l at The Open Group" Message-ID: <20240117225423.gb24...@brightrain.aerifal.cx>

  | I went to apply the resolution of this issue to musl libc and noticed
  | that the corresponding issue in swprintf was never brought up or
  | addressed. Should I open a new issue for it or can it be fixed along
  | with this?

Actually, I think it was, the accepted resolution contains:

    Change page 990 line 33924 in D2.1 from:
        The value of n is greater than {INT_MAX}.
    to:
        The number of wide characters requested to be written was n or more.

Page 990 is in the fwprintf() page in D2.1, and line 33924 is the one which says the "from" above, in the paragraph:

    The swprintf( ) shall fail if:
    CX [EOVERFLOW] The value of n is greater than {INT_MAX}.

So, I think it was intended that the change be applied, and it simply didn't happen. Now it has been pointed out, no more action should be required - that one should simply get fixed.

The change for snprintf() simply deleted that whole error, that is the:

    Delete lines 30917-30918 in D2.1 (page 904).

part of Note 5895 in that mantis issue. That one happened.

kre
Re: sh: set -o pipefail by default
Date:Mon, 15 Jan 2024 00:13:47 -0600 From:"Daniel Santos via austin-group-l at The Open Group" Message-ID: <08afc6b7-e88f-698a-c9ad-5bdce60a7...@pobox.com>

I agree with what you say, but beware:

  | Otherwise, I myfn() { local shell_restore=$(set +o | grep 'pipefail$');
  | set -o pipefail; ; eval "$shell_restore"; }

that needless optimisation attempt is not guaranteed to work.

'set +o' generates an implementation defined string which when executed will restore any options altered between the set, and executing the output from it, to their values at the time the set was executed. That's what you want there.

However nothing guarantees that you can extract a line from that output string, and execute that - in fact the shell might just output one long set command with lots of +o and -o options in it (and -x or +x for any which have no long names, or just any which have a 1 letter equiv, just to make the string shorter). Or all kinds of other techniques.

The NetBSD shell does it like this...

    $ set +o
    set -o default -o promptcmds -o vi -o xlock -o xtrace
    $ set -o pipefail
    $ set +o
    set -o default -o pipefail -o promptcmds -o vi -o xlock -o xtrace

eval'ing the output from the first set +o would restore things to how they were before, that's the magic "set -o default" which returns all options to their shell startup values. (There's a spec for what that means, but it isn't relevant here). But if you grep for pipefail you won't find it, so your eval would end up executing nothing.

So forget that attempted optimisation and just save, and then eval, the entire output from set +o - that's what is specified to work, what isn't specified is how.

If your plan is to allow some other option to be changed by and persist to the caller, then you simply have to change that option after the eval (perhaps again, if the also needs the effect of the change). That's rare however.
If you don't care about portability, then some shells offer simpler mechanisms that are much easier to use, and have the same effect. But definitely not standard mechanisms. kre
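The advice above (save the entire output of `set +o`, then eval all of it) can be sketched as a small wrapper. This is an illustration under assumptions, not the poster's code: `with_option` and `noglob_is_on` are hypothetical names, the `wo_` variables are plain globals (so the helper is not re-entrant), and `noglob` stands in for `pipefail` only because every POSIX shell has it; substitute `pipefail` where the shell supports that option.

```sh
#!/bin/sh
# with_option: hypothetical helper. Runs a command with one extra shell
# option turned on, then restores every option to its previous state.
# Usage: with_option OPTION COMMAND [ARG...]
with_option() {
    # Save the complete, implementation-defined output of "set +o".
    # No attempt is made to pick individual lines out of it.
    wo_saved=$(set +o)
    wo_opt=$1
    shift
    set -o "$wo_opt"
    "$@"
    wo_status=$?
    # Re-execute the saved string as a whole; this is the only use of it
    # that is specified to restore the options.
    eval "$wo_saved"
    return "$wo_status"
}

# Helper for the demonstration: succeeds if noglob (-f) is currently set.
noglob_is_on() {
    case $- in
        *f*) return 0 ;;
        *)   return 1 ;;
    esac
}

with_option noglob noglob_is_on && printf '%s\n' 'noglob was on inside the call'
noglob_is_on || printf '%s\n' 'noglob is off again afterwards'
```

The wrapper never inspects the saved string, so it does not matter whether the shell emits one `set` command per option or a single long one.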
Re: IANA TZ / NerBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z
Date:Thu, 04 Jan 2024 23:24:26 +0100 From:Steffen Nurpmeso Message-ID: <20240104222426.ai7_3Mvo@steffen%sdaoden.eu> | I was hoping for the draft; the selection list does not offer | anything but ..TC2 and it. If you want, you can submit a bug now, using any base standard that is in some way still current. It just won't get processed at all (beyond random notes being added) until the next standard is being worked on, so submitting now is kind of pointless. On the other hand, delaying may lead to a much better proposal. I in particular would like to see "struct tm" given a complete overhaul - resulting in a struct with a different name of course. And then, naturally, the interface routines that manipulate it all need redesigning (and renaming). That would be the perfect opportunity to make all the new ones thread safe, and just allow what is there now to wither away. Of course, this is not the place to do that design (and implementation) that needs to happen elsewhere, and then be spread amongst the various systems first - only then should anything happen in the standards universe. kre
Re: IANA TZ / NerBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z
Date:Thu, 04 Jan 2024 00:21:45 +0100 From:"Steffen Nurpmeso via austin-group-l at The Open Group" Message-ID: <20240103232145.6dAnvvQf@steffen%sdaoden.eu> | My question: against which standard should an issue be opened? The next one, after it is issued (ie: just wait, and send in the request after the next standard is published, which is probably this year sometime) - it is far too late for new interfaces in the one currently being developed (the cutoff for those was back in August or something like that). That means Issue 9 is the earliest that any new interfaces can be added. kre
Re: Fwd: Bug 1778 in Minutes of the 27th November 2023 Teleconference
Date:Fri, 8 Dec 2023 07:11:17 + From:"Andrew Josey via austin-group-l at The Open Group" Message-ID: | > In edited post-d3 line 111861: | > | >literal value of a following *and* shall prevent a | > | > should this *and* be /or/? | > | > Using *and* seems to imply that you would need to specify: | > | > \\ | > | > to use it, while /or/ should more clearly indicate the intended | > alternatives: I don't agree, it was intended to specify that the \ does both of those things - it escapes the following char - or if that char is a newline, it makes the pair vanish. That is, implementations don't get to choose which of those it should implement, and ignore the other. If the simple wording leaves that ambiguous in some way (I'm not convinced it does) then the whole sentence should be reworded (made more explicit) - just changing "and" to "or" wouldn't do it. kre
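Assuming the sentence under discussion is the one describing backslash handling in the read utility without -r (bug 1778 is the read utility bug), the "does both" reading can be seen with two short inputs. This is a sketch using the default IFS; the variable names are arbitrary.

```sh
#!/bin/sh
# Case 1: backslash followed by an ordinary character (here a space).
# The backslash is removed and the space loses its meaning as a field
# separator, so "a b" lands in the first variable.
printf '%s\n' 'a\ b c' | {
    read first rest
    printf 'first=[%s] rest=[%s]\n' "$first" "$rest"
}

# Case 2: backslash followed by a newline.
# The pair vanishes and read carries on with the next input line.
printf '%s\n' 'one\' 'two' | {
    read joined
    printf 'joined=[%s]\n' "$joined"
}
```

With the default IFS this prints `first=[a b] rest=[c]` and then `joined=[onetwo]`: the same backslash processing handles both cases, which is the sense of "and" defended in the message.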
Re: A philosophical question regarding shell vars & shell built-in utilities
Date:Mon, 23 Oct 2023 11:02:10 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | do a search for "unexported" in the subject field | (which produces 114 results). I have now read all 114 - I will admit with some trepidation, as Stephane indicated that there were some messages from me, and I wondered if the state of my knowledge about all of this 7 and a half years ago was sufficient that I wouldn't now be appalled at some of what I may have written. Fortunately, with one exception not germane to anything related to the topic of my more recent message re-introducing this topic (not having remembered it was ever considered before) I couldn't find anything I said back then which I disagree with now. Unfortunately, of the 114 messages in that thread, only the first few (I didn't count how many, probably not more than a dozen or so) had any relationship at all with the topic of my recent message, before the thread branched off into a long discussion about the resolution of bug 854 (PATH searching for builtins) and then even further afield (bugs reported, or not reported, about GNU utilities). The 854 discussion is where I no longer agree with what I said then, then I indicated I could almost understand the dumb "find builtins via path searching" nonsense - now, with more appreciation of the issues, I don't accept that at all - it is a completely absurd way to specify things. I could expand upon why shells should simply consider all built-in commands to be intrinsic (or if you prefer, to always find a built-in command before going anywhere near a PATH search) another time, that's not the current issue. OK, now back to the real thing ... 
those messages did touch upon the issue of whether or not the built-in utilities (or perhaps just the intrinsic ones) can access unexported shell variables - but I didn't see any definite conclusion reached during that discussion (rather a difference of opinions) - and I certainly did not see any reference to anything in the standard which is intended to specify which answer is correct.

But that was only the first of the questions I asked in my message of (the early hours of) Fri Oct 20 (it was still Thu Oct 19 most places). And that question was just preparatory lead up to the real issue I was seeking an answer to, one example of which was given by the example command sequence

    pwd; OLDPWD=/foo; OLDPWD=/bar cd /tmp; echo $OLDPWD

where the question is what should be output by that final "echo" (and for this, let's all just assume that OLDPWD never contains anything which might cause different versions of echo to produce different results; replace that final 'echo $OLDPWD' by ' printf %s\\n "$OLDPWD" ' if you prefer, the two are intended to produce identical results here).

That is, should that final echo output the same thing as the pwd command printed, or something different, and if different, what should that be, and why? That's first just a philosophical question (but by all means read the definition of what cd is required to do with OLDPWD to assist with that). Then, whatever you believe should be done here, where in the standard is there any language that supports (or contradicts) your interpretation?

Nothing related to this was in that earlier thread. Apparently there was an even earlier 2009 thread, which was much before my time, so I can't say what was discussed in that one.
There's a related issue (a slight complication of that one) which applies, or not, depending upon the answer to the first question, and this one, which is where a built-in utility is required to modify a shell variable, which it also uses as part of its operation - if the consensus is that built-ins should only be able to access exported variables (as if the built-in were not built-in) and the variable that is being modified in the shell is not exported in the current environment - after the variable has been modified by the built-in utility, if its value is to be used again, should the value to be used be the one that was modified (which according to the assumed rule is not accessible) or the original (perhaps a default) value? While pondering this message, I have also realised there's another problem with how getopts is specified (related to all of this, in a sense) which I will add as a note to bug 1784. kre
Re: system(NULL) overly restrictive?
Date:Mon, 23 Oct 2023 18:37:40 -0700 From:"enh via austin-group-l at The Open Group" Message-ID: | i'm assuming the intention here was "you're not a POSIX system without | a shell, so it's not possible for system(NULL) to fail to report that | a command processor is available" ... but is that true? what does | "available" mean? POSIX requires that there be a shell which can execute commands. If there isn't one, it isn't a POSIX conforming environment. That doesn't mean that the environment is useless, or that it cannot still be very similar to a POSIX environment when that makes sense - but does mean that arbitrary applications cannot assume that what POSIX says will work will always work. Beyond that there is nothing more the standard can, or should, say. It would be ludicrous for the standard to attempt to say how an implementation should indicate it is non-conforming, as an implementation that conforms needs no such method, and one that doesn't is non-conforming, and so would not have any particular reason to implement such a method (the standard is essentially irrelevant, one either conforms or does not). If implementations like to agree upon some common method that applications can use to check for specific (common) non-conformance issues, that's fine, and they can do that - but nothing about that is rational in the standard. Nor is it really appropriate to discuss here how to do that, this list is concerned with what happens (or should happen) in conforming environments. If you want to suggest changing the standard so some requirement is no longer required, that's fine, you can do that, but that's about the limit. That is here, if you wanted to suggest that to be conforming no shell should be needed, you could ask for that change (but not probably expect it to be accepted, is my guess) - but if you accept that a shell is needed for a posix conforming system, there's no point asking for a standard way to say "in the current environment there is no shell". kre
Re: A philosophical question regarding shell vars & shell built-in utilities
Date:Sat, 21 Oct 2023 17:42:50 +0100 From:Stephane Chazelas Message-ID: <20231021164250.tfuborbgdf64e...@chazelas.org> | See | news://news.gmane.io/gmane.comp.standards.posix.austin.general/12491 | from May 2016. | | (with lynx for instance) and ensuing (long) discussion, to which | you participated I believe. Too far back for me to remember. I can't access that with any browser I have installed, which doesn't include lynx or anything similar, and haven't done anything usenet related in decades... kre
A philosophical question regarding shell vars & shell built-in utilities
While generating https://www.austingroupbugs.net/view.php?id=1778#c6550 (note 6550 to bug 1778, mostly about field splitting with the read utility, and in particular whether reading into some vars should have unspecified effects if changes to those variables could affect the field splitting behaviour - reading into, and hence changing, IFS is an obvious example) and even earlier, I started to consider what the relationship should be between shell variables, and shell built-in utilities.

Utilities like read (also getopts, cd, ...) which (almost) must be built in as they are specified to alter shell variables are something of a special case, so I'll defer discussion of those until later in this message.

[Aside: just "almost must be built in" for some of these, as an implementation could have some other method to allow a utility to interact with the shell, and use that to allow designated utilities to alter shell variables, or other aspects of the shell environment.]

So, for now, let's just consider the "often" built in utilities, like printf, echo, test (aka '[') etc. With those, if a shell does something like

    unset LANG LC_ALL LC_CTYPE LC_COLLATE LC_MONETARY LC_TIME
    LANG=weird
    printf format arg arg arg

Is printf allowed, required, or prohibited from doing its output as if LANG==weird? Note that LANG here is not exported (that was part of the point of the unset) and if printf were not built in, it would have no access to the shell's internal LANG variable. But if it is builtin, it does.

Is there any language in the current (or forthcoming) standard that is intended to specify this? (If anyone knows of some, please reference or quote it.) Similarly with test, and the collating sequence for the weird LANG.

Note that if we were instead to do

    export LANG=weird
    printf format arg arg arg

or

    LANG=weird printf format arg arg arg

then it is clear that the exported LANG is intended (required) for printf to use (and similarly for any other utilities, built-in or not).
Now we get to the issue of those utilities which are required to alter shell variables, where for consistency I think some of the answers will depend upon the answer to the question above. Let's take a particularly simple (and now clear) example first

    X=whatever
    X=something unset X

In the forthcoming standard, it is clear that when this completes, X must be unset, and not have either "whatever" or "something" as its value, and must not be exported. That applies to any special built-in utility which modifies shell variables.

Now let's look at a similar, closely related (but much more complex) case

    X=whatever
    X=something . script

and assume the script does X=newvalue as one of its commands (whole command, not a var-assign for something else), and that that is the sole mention of X in "script" (or perhaps it is expanded as well, but that doesn't affect its value). Since '.' is a special builtin, I believe the same rule applies, and that when the dot script completes, the shell environment should have X=newvalue as part of it, though it is less clear to me what the requirement is wrt X's export status (must be, must not be, unspecified whether ...). If we had instead

    unset X; X=newvalue

in the script, then I think it would be clear, when the script is complete the shell environment must have X=newvalue and X must not be exported.

[Aside: for anyone wanting to make exceptions in case X is readonly, then we know here it cannot be, as we are making assignments to X before running the dot script.]

To make this less abstract, a more likely example perhaps

    PATH=/where/my/script/lives . script

and "script" sets PATH to whatever I really want it to be. That might be all it does, script might be a single line containing PATH=/bin:/usr/bin (or something). There'd be no question if I instead did

    . /where/my/script/lives/script

but I didn't, I chose to find the script using the temporary exported PATH.
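The first example above can be turned into a probe that reports what a particular shell does. This is only a sketch: it makes no claim about any specific shell, and the variable `x_state` is introduced purely for the report. According to the message, a shell following the forthcoming standard leaves X unset.

```sh
#!/bin/sh
# Probe: what is X after a prefix assignment on the special built-in unset?
X=whatever
X=something unset X

# ${X+set} expands to "set" only if X is still set (even if empty).
x_state=${X+set}
if [ "$x_state" = set ]; then
    printf 'X is still set, value: %s\n' "$X"
else
    printf '%s\n' 'X is unset'
fi
```

The same pattern (run the command, then test `${VAR+set}` and print the value) can be reused for the dot-script and PATH variants, with the dot script's contents chosen to match the case being examined.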
All of this is now (will be in POSIX Issue 8) specified for special built in utilities. In the PATH example, in both invocations, PATH must end up being what the script set it to, not whatever it had previously held, and not the value exported into the script in the first invocation (though that would be what it would be required to be if the script did not set PATH). But all that doesn't cover other utilities that are built in, which are not special built-in, like read, cd and getopts, but which do set variables. It would (or could) also cover extensions in various shells, like bash's printf's -v option (write the output into a shell variable) or its %n format specifier (next arg is a var name, which gets set to the number of bytes (or maybe chars, doesn't matter here) which have been output before that format specifier (just like printf(3)). OK, first question here, and
Re: [Issue 8 drafts 0001778]: The read utility needs field splitting updates/corrections )and a little more)
Date:Mon, 2 Oct 2023 16:20:50 + From:Austin Group Bug Tracker Message-ID: | -- | (0006507) geoffclare (manager) - 2023-10-02 16:20 | https://austingroupbugs.net/view.php?id=1778#c6507 | -- | Re https://austingroupbugs.net/view.php?id=1778#c6503 | I have changed this to be an Issue 8 draft 3 bug, as requested. Thanks, and for adding the link between 1778 and 1649. kre
Re: [Issue 8 drafts 0001649]: Field splitting is woefully under specified, and in places, simply wrong
Date:Mon, 2 Oct 2023 14:17:29 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: <924a973badc1b5dcc1d92d7095978...@www.austingroupbugs.net> | A NOTE has been added to this issue. | == | https://www.austingroupbugs.net/view.php?id=1649 [...] | -- | (0006501) kre (reporter) - 2023-10-02 14:17 | https://www.austingroupbugs.net/view.php?id=1649#c6501 | -- | Re https://www.austingroupbugs.net/view.php?id=1649#c6498 (a note added to | bug:1649), where it says: Apologies for that, I added that note (this one mentioned there) to 1649 when I meant to add it to 1778 instead, so I deleted this (you'll no longer find it attached to 1649) and added a new note to 1778. However if you read this stuff as delivered via e-mail, rather than from the web interface to mantis, then you should read this one (note 6501) rather than the later message (about note 6502) which was supposed to be identical - but I totally botched the way I transferred the content of the note from 6501 to 6502, so the e-mail about 6502 has a total nonsense version of the test script I used (most of the rest should be the same). The actual note (6502, on bug 1778) has been edited to correct it now, but editing of notes doesn't get reported to the mailing list (nor does the removal of a note). kre
Re: bug#65659: RFC: changing printf(1) behavior on %b
Date:Sun, 3 Sep 2023 07:36:59 +0100 From:Stephane Chazelas Message-ID: <20230903063659.mzyfen4evyrnz...@chazelas.org>

  | though has the same limitation as my bash echo -e "$*\n\c"

Yes, I know, though as nothing anywhere says what echo is supposed to do with a lone trailing \ (or in fact, a \ that is not followed by one of the defined escape sequences), I treat that as unspecified, and so anything that is produced should be acceptable - I doubt that real applications would ever do that (the way to output a \, in a version of echo that handles the escape sequences at all, is to write \\).

  | $ LC_ALL=zh_TW luit
  | $ locale title charmap
  | Chinese locale for Taiwan R.O.C.
  | BIG5
  | $ echo() { printf '%b ' "$@"\\n\\c; }
  | $ echo 'α'
  | αn%

That one is a different issue, and seems to me to be a simple implementation bug (and no, I am not claiming that NetBSD wouldn't act just like that) - characters ought to be fully formed before testing their values. That the encoding of some of them might happen to include a bit sequence, which in other environments, would represent a backslash, should be irrelevant.

kre
Re: bug#65659: RFC: changing printf(1) behavior on %b
Date:Fri, 1 Sep 2023 07:15:14 -0500 From:"Eric Blake via austin-group-l at The Open Group" Message-ID:

  | > That is dependent on the current value of $IFS. You'd need:
  | >
  | > xsi_echo() (
  | >   IFS=' '
  | >   printf '%b\n' "$*"
  | > )
  |
  | So yes, the standard does mention the requirement to have a sane IFS,

The SysIII echo (abomination) can be done using printf %b independent of IFS:

    echo() { printf '%b ' "$@"\\n\\c; }

works. But there is no point in defining such a function unless it is called 'echo' (the suggestion of calling it something else, then using an alias to map that to echo is simply farcical IMO) - the only point of doing this is for use in a script which is assuming echo works like that, when run on a system where it probably doesn't.

Implementing unix (as in 6th edn, 7th edn, ...) echo using printf is harder, without depending upon IFS. It can be done, but is a bit messy (requires more than just one printf).

kre
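To make the one-liner above easier to follow, here is the same function with comments and a few calls. The function body is taken from the message; the sample arguments are made up for illustration.

```sh
#!/bin/sh
# SysIII-style echo built on printf %b, as given in the message above.
# Each argument goes through '%b ' (escapes interpreted, then a space).
# The last argument has \n\c appended to it: %b turns \n into a newline,
# and \c stops all further output, which also drops the trailing space
# that the format would otherwise add. IFS is never consulted.
echo() { printf '%b ' "$@"\\n\\c; }

echo 'one\ttwo' three          # tab between one and two, then " three"
echo                           # no arguments: just a newline
echo 'no newline after this\c' # \c in the argument ends output early
echo ' ...same line'
```

With no arguments, `"$@"\\n\\c` still expands to the single word `\n\c`, so the function prints an empty line just as echo would.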
Re: [Issue 8 drafts 0001771]: support or reserve %q as printf-utility format specifier
Date:Sat, 2 Sep 2023 09:01:06 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: | If we don't deprecate %b now, the alternative is to deprecate it in Issue 9 Why? I don't mean why is that not a consequent of the condition, but why is it the only one? Why not "don't deprecate %b in printf(1) at all" ?? | Issue 9 will have an inconsistency between the printf() function and the | printf utility. Yes. And exactly why is that a problem? Has anyone seen any demand for the printf utility (printf(1)) to output binary in the 0b format? I haven't. | and add %#s in draft 4. There is already a patch for coreutils | printf, but I think we would need buy-in from at least one other printf | implementation to even consider doing that. I looked at our implementation, and while it would take more code than has been described as required for the coreutils version, it would not be a significant amount. One issue is that our code does not look at the printf(3) flags at all, simply skips them - then passes the format string to printf(3) (except for %b and one or two other weird cases that need special handling) as it was given to printf(1). Any handling of '#' (and "'" which we already support as much as our rather limited locale handling allows - that is, if it works for a C program, it will work for a sh script using printf(1) as well) is all currently done by printf(3). Not a huge change, all we need to do is actually look for it, and then in the %s case, do %b handling instead of %s handling if the # was present, but it isn't just nothing. I didn't already add it, as whatever we do with %#s I cannot see a time when %b in our printf(1) ever means anything different than it does today, whatever the standard requires.
I suspect that might be true of most other implementations as well - there is simply too much application code using it to expect it to ever be changed, unless we were to force it - and as long as %b keeps on working for applications, they have no real reason to ever want to change, hence I don't really foresee a time when almost anything would use %#s if we did add it. It is different when superior functionality is replacing something inferior (like printf and echo, or fgets() and gets()) but when we would be just offering the exact same thing, with a different name, and the old one still works anyway ??? Further, I suspect it is more likely that some future version of C will find a need to define a meaning for %#s (and %S, and almost anything else they haven't already defined) than there will ever be a demand for 0b output from printf(1) via a dedicated conversion character - a more general form allowing multiple bases perhaps, but not just that. If we had to pick something as a replacement for %b, I'd be choosing %p - ignoring its printf(3) usage, which makes no sense at all in printf(1), it is more natural ("print") IMO than even %b was, and has zero chance of being usurped by the C committee (and would be easier for me to implement). kre ps: while I'm here (first time on the list for a while) apologies for my absence, my system broke, and for a whole set of weird reasons, took a long time (close to 2 months) to get repaired, so I haven't been following anything of what has been happening here until the back end of this past week (not what has been happening in NetBSD either). All my e-mail accumulated on munnari, so nothing was lost, but I am nowhere near caught up.
Re: Access to the nightly draft
Date:Tue, 27 Jun 2023 10:36:32 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | To get hold of the latest build you need gitlab access. I suspect Roland was asking for something a bit less than that (though he might accept that as an access method - I wouldn't). Better would be much more restricted than the current drafts are, generated PDF files - not the sources that make it (nor info on the number of intermediate updates that are actually made to achieve the desired changes). Not necessarily daily, just whenever a batch of changes have been applied, and are considered complete. Getting a whole new draft, with hundreds, or even thousands, of changes dumped upon us makes reviewing difficult - there's just too much to attempt (I haven't found time to even really start on draft 3 yet). But having a draft having just the past couple of days worth of changes, along with the messages on the list which indicate which changes have been applied, would make that far easier - there would be a much more limited set of pages that actually need reading, the whole thing would be nicely spread over a much longer period (further utilities to diff PDF files exist, and are usable, as long as the set of changes is not too large - once there start to be getting to be a lot, almost every page can have "changes" (perhaps just page numbers) and that method of seeing exactly what altered, quickly, is lost). kre
Re: out-of-bounds numbers in shell utility arguments
Date:Tue, 27 Jun 2023 09:41:02 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's | allowed for anything that 1.1.2 requires to be "equivalent to the | ISO C standard signed long data type". And of course, that means that even though the >> operator is in Table 1-2 as one that must be supported, it cannot actually work, as >> is unspecified (or even undefined, I forget) on signed values, and POSIX sh arithmetic only allows for signed values. << may have similar issues (at least some compilers are starting to complain about the use of << with a signed left operand, which I am guessing means at least some version of the C standard has made that be unspecified/undefined as well). The implementation I work with ignores that, and when an operation works better with unsigned operands, it simply treats them as unsigned instead of signed. I suspect other shells might do the same. kre
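A small sketch of the shift issue, assuming nothing beyond POSIX arithmetic expansion. As far as I know C leaves the result of right-shifting a negative value implementation-defined, so the second result below is deliberately left without an expected value.

```shell
# Right shift of a non-negative value is well defined.
pos=$(( 16 >> 2 ))

# Right shift of a negative value depends on the implementation: a shell
# may produce -4 (arithmetic shift) or a large positive number (if it
# treats the operand as unsigned, as described above).
neg=$(( -16 >> 2 ))

printf 'pos=%s neg=%s\n' "$pos" "$neg"
```

A script that needs a portable result can avoid the question by dividing instead of shifting whenever the operand may be negative.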
Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox
Date:Wed, 14 Jun 2023 12:56:16 -0500 From:"G. Branden Robinson via austin-group-l at The Open Group" Message-ID: <20230614175616.ilqpqzpbeiipu7s7@illithid> | The question is, did thread A receive SIGINT or not? No, that isn't the question at all. That's a simple race, and irrelevant to the current discussion. | Is the current draft language therefore redundant? Can lines 59787-8 be | deleted without damaging anything? Those line numbers in which draft? In the current draft (the most recent available one, Issue 8 draft 3) those are the (whole) DESCRIPTION section of pthread_setspecific() - and something tells me that's not what you're proposing removing. In e-mail, it is generally better to quote the lines, than line numbers, that's something everyone can understand, and can know exactly which text is in question - at the minute I'm not sure what you're referring to. For large sections, unless some specific wording therein is important, it's OK to just quote the first part and the ending, we can find the whole thing in the draft that way. But do always make it clear which section (for XSH 3 and XCU 3 give the function/utility name, elsewhere the section number, and ideally its title, as numbers sometimes alter). | Thanks for emphasizing the narrow scope. I've tried to direct my reply | accordingly. Except the narrow scope related to what happens when the signal mask is changed to unblock signals that were blocked, and in particular, when one (or more) of those signals are pending. You concentrated on the exact opposite case, when blocking a signal, which has no particular issues at all. The issue here is that the current standard contains language which while clear enough about its intent, is logically absurd (it requires something to be done after, and at the same time before, something else). We could just leave it alone - no-one is going to doubt what it means. But fixing it would be better - and we now have language that does that which works. 
Beyond that, an APPLICATION USAGE section is being added (technically, it is already there, but just says "None" - that "None" is being replaced by other text) to explain to application writers what can happen, to avoid misunderstandings. The wording of that is the most recent topic of discussion, but that's settled now too. In both cases, naturally, unless someone else sees a problem with them. There never really was anything substantive here, it is all just wording things properly. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox
Date:Tue, 13 Jun 2023 16:38:54 -0500 From:"G. Branden Robinson via austin-group-l at The Open Group" Message-ID: <20230613213854.hk3z6zzpkhdiunsk@illithid> | I apologize for the possibly academic recapitulation of multitasking, | but the key point is that the foregoing model does not require the | process to "enter the kernel" to service the signal. "enter" was perhaps a poor choice of words, for the general case, "be in" might be better. But Geoff's message, to which I was replying, said: is currently running, and is executing user code. A process in that state was the one to which I was referring. Such a process must actually enter the kernel (since it is not there at the time) in order for the kernel to deliver a signal to it. And yes, there is something about kernel design being assumed here, but that's not particularly important to anything. The issue was that even in that case, the signal is pending for some period (usually a very short period, perhaps just a few microseconds, but also possibly longer). It is never true, even in the simplest case: kill(getpid(), SIGwhatever) that the signal is not pending for some period. Even when that system call results in no context switch, and "immediately" invokes the application's SIGwhatever handler, there is a brief period, between when the signal is posted, and the handler is invoked. During that period, the signal is pending, as that is defined. This one, on a fast processor, may be for considerably less than a microsecond, but it is never zero. | There's arguably not much difference between your presentation and mine; | in mine, something special and kernelly _might_ need to happen when | returning from the signal handler, returning from the signal handler isn't the issue, it is calling it in the first place. Signals are kernel events, for the application handler to be invoked (in application space, running application code) the kernel needs to be running, to set up the application environment. 
While the application could have something resembling signal handlers which operate entirely without kernel assistance, those would not be actual signal handlers. | No mode switch is necessary. [when returning from a signal handler] - that's true in most cases, but not if the signal handler blocked signals as part of its invocation. In that case, some kind of call to the kernel is needed to return the signal mask to its state before the handler was invoked. | I guess the question from the POSIX perspective is whether a signal can | be pending if a process cannot observe it to be. I don't think that matters. The notion is used as a mechanism to allow the existence of signals which do not get immediately delivered to the process. | That's good. I surmise, then, that "signal | pendingness" is not a trait that POSIX needs to define, or even employ. Perhaps not, but it does. | The standard should avoid the term if using it--even just for expository | purposes--is going to provoke controversy among highly seasoned Unix | kernel engineers who are accustomed to using it with a more | implementation-specific meaning. No, there has been no controversy here about what that means (with the slight glitch when I didn't bother to look at the POSIX definition, and thought that possibly only unblocked signals were considered pending, but that was just my laziness coming through). The recent discussion has been entirely about how to write down the notion that a signal might be delivered to a process while it is executing the function that unblocks (other) signals. kre ps: SIGQUIT is not an "un-handleable signal" - the only signals that cannot be caught, are SIGKILL and SIGSTOP. There's nothing particularly special about SIGQUIT at all.
Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox
Date:Tue, 13 Jun 2023 09:29:52 + From:Austin Group Bug Tracker Message-ID: <5a1cedd82cfb7ca6b01a38e53243a...@austingroupbugs.net> | You don't seem to have considered the case where the thread that receives | the signal is currently running, and is executing user code. No, that's one of the shortish pending cases. | Then there is | nothing to delay the delivery - it can happen immediately after generation, No, it can't in that case, it needs to wait until the process enters the kernel for some reason. Typically if a signal is delivered to a process while it is in application mode, it will be the result of a kill() from another process running on a different CPU (anything the process does to itself, including traps, result in the process being in the kernel when the signal is delivered - the kernel side of the process is posting the signal to itself). When that happens, the other CPU (the one running the application) needs to be notified that there's an event it needs to process, which will result in that process being (temporarily) suspended and the kernel taking over. When that is done (which may be immediate, or may be later if the cpu in consideration switches to some other process) and the kernel is returning control to the application, is when the signal is delivered. In the interim period (which may be very short, or may be lengthy (in computer terms anyway)) the signal is pending. But not blocked. (Of course, that is assuming it wasn't being blocked). | Having said that, it would make sense to reword to avoid any subtle | distinctions about exactly when a signal becomes pending. I will try to | come up with something that merges parts of your suggestion with parts of | my previous attempt. Your new version looks OK to me. Still not sure the APPLICATION USAGE section is needed however. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox
Date:Mon, 12 Jun 2023 15:36:31 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | Yes, I assumed some context rather than stating it. I should have | said "In particular, when a signal is generated it can become pending | for reasons other than being blocked". That's no better, particularly in light of your explanation of what "pending" means in the standard (which I was too lazy to go and check). Signals never become pending because they're blocked, nor do they become pending upon being unblocked. | I'll edit the note to make that change. Please make a somewhat different change, which just makes it clear that some other signal may have become pending while this function is running, and omit mentions of being blocked, which seem to be confusing things. That, or don't add the application usage text at all. Maybe something like "An unrelated signal may have become pending while..." kre
Re: make: -j documentation consistency enhancements
Date:Thu, 20 Apr 2023 19:26:00 -0500 From:"Andrew Pennebaker via austin-group-l at The Open Group" Message-ID: I hope you're on the list, as gmail refuses mail from me, so I cannot reply directly (I'd suggest getting a more rational e-mail provider). | As an aside, can we please generate embedded page numbers to align more | closely with the logical PDF page counter? In any case, back to the make | utility. That's unlikely to happen - what would be nice would be (if it is possible) to make the PDF page numbering match the actual file, but I have no idea how PDF files handle page numbers i ii iii (etc) which precede page 1. The page number to quote is always the one on the page itself, ignore what your PDF reader thinks it might be. When using PDF files with any recent version (issue 7 onwards) I just use the PDF index, so to find make, select XCU, then section 3 (Utilities) then just pick "make"... | First, the -j option is uniquely missing from the SYNOPSIS section. I think that was already noticed, and will be fixed. | There at line 104481, we have: | -f *makefile* | But at line 104488, we have: | -j | That is, no value. I am not sure anyone noticed that one, but yes, that is a defect and should be fixed. kre
Re: $? behaviour after comsub in same command
Date:Mon, 10 Apr 2023 10:30:08 -0400 From:Chet Ramey Message-ID: <78038281-f431-775e-6d60-a44126d1d...@case.edu> | The different semantics are that the standard specifies the status of the | simple command in terms of the command substitution that's part of the | assignment statement, so you have to hang onto it for a while. I suspect that's because you are treating the assignments (more or less) as statements of their own, and expanding and then assigning each, one by one, left to right as you encounter them. If you treated several var assigns just like they were args to commands, expanded them all (for this purpose, left to right) and then ran the command - which involves putting the values to be assigned from var-assigns into the environment of the command to be run ... in this case, the null command, so that means the assignments affect the current shell environment, then there is no issue, and no real need to "hang onto it for a while". In the case where there's no command, the exit status of the last cmdsub is simply there, for the next command to use (not this one, because there are no more expansions to be made) - in the case where there is a command the command execution comes next, and the exit status from that overrides the exit status from the command substitution, before there is any possibility of the cmdsub status (for the one that might matter, or any earlier ones that might also have been executed, which are already lost) becoming visible, to anything, as no more expansions are happening at this point. But because the standard doesn't actually say which order these things need to be evaluated, but does say how $? is supposed to be affected, the implementations can get messy to handle all of this properly, if the implementation chooses a different way of handling the unspecified part (which really, is unspecified just because some early implementations did that). 
Note, that before we do any of this (var-assign, and redirect, processing) we have already expanded all the rest of the command line, the words that are not related to redirects or var-assigns, we have the command name (if any) and know if it is there at all (or not, a null command) and if it is a built-in of some kind (so whether or not we shall fork() ... create a new shell environment) or not - and if we want, when there is to be one, most of the rest (redirects and var-assigns) can be expanded in that new environment (in the child process). Or not. That's all just implementation detail (provided we don't leave any inappropriate results in the parent shell environment .. which means special care if the fork() is implemented using vfork()). kre
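The two cases described above (no command name versus a command name) can be sketched as follows. The variable names are illustrative, and the `||` lists are only there so the sketch also behaves under `set -e`.

```shell
# Null command: only a variable assignment, no command name.
# The status of the last command substitution is what the next command sees.
null_status=0
a=$(exit 42) || null_status=$?

# With a command name present, that command's own status wins and the
# substitution's status never becomes visible.
cmd_status=0
b=$(exit 42) true || cmd_status=$?

echo "null command: $null_status, with command: $cmd_status"
```

Neither case requires the shell to remember the substitution's status any longer than the expansion step itself.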
Re: $? behaviour after comsub in same command
Date:Fri, 7 Apr 2023 05:38:16 +0300 From:=?UTF-8?B?T8SfdXo=?= Message-ID: | a=${b#prefix} a=${a%suffix} | | is common enough a pattern to consider despite having no benefit other than | looking organized. Most shells interpret it the way average user would | expect too Most might, but it is still unspecified (and is not something I think I have ever encountered). It is trivial to fix by putting a ';' or newline between the two assignments, then it works everywhere. Why wouldn't you? And what's more, tell the authors of anyone else making this mistake that it is unspecified, and how simple it is to fix. kre ps: replying only to the list, as gmail simply bounces any messages I send to its users directly.
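A minimal sketch of the suggested fix, using made-up values: with a `;` between the assignments each one is its own command, so the second is guaranteed to see the result of the first.

```shell
b=prefix-middle-suffix

# Two separate commands: the result is the same in every conforming shell.
a=${b#prefix-}; a=${a%-suffix}

echo "$a"
```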
Re: $? behaviour after comsub in same command
Date:Fri, 07 Apr 2023 03:14:47 +0200 From:Steffen Nurpmeso Message-ID: <20230407011447.ptyvc%stef...@sdaoden.eu> | There i say I'll omit the quotes from the standard... | So everything should be handled sequentially, making it a bug. From where do you get sequentially? I don't see that anywhere. And sequential what? And where do you see whatever that is specified? | And that is true, no? If expansion has to take place, and the | assignment has been performed, .. it has been performed? Sure, but the normal way to evaluate any command (omitting irrelevant aspects here, like redirects, etc) is to evaluate all the words (perform expansions) and then execute it. Why would evaluating var assigns be any different? Expand all the words, then execute (assign). Seems to me like the obvious (and correct) way. | So maybe null command and that is not a bug? No, I don't think it is. | But all shells except FreeBSD do this; also from the report: NetBSD too, and according to reports, dash only just changed. My guess (no more than that) is that sometimes it is easier to give in to the desires of the masses rather than maintain the correct approach. To people who don't understand sh syntax, a=1 b=2 c=3 kind of looks like 3 commands that should be executed in order as written, just like a=1, b=2 and c=3 on three separate lines would be. But the first form isn't 3 commands, it is one. There is nothing there (except the final newline) which is a command terminator. Note here that I am not claiming that shells which do it the "other" way are non-conforming, about all the standard says is that the words need to be expanded before the assignment is performed - it doesn't say to expand all the words, then do assignments, it doesn't say expand each word and then assign, and then go on to the next, and it doesn't say which order to do the expansions or assignments (left to right, right to left, or random). That means that all of that is unspecified, and shells can do it in whatever order makes sense to them. 
I have my own views on what is best here, and won't be changing the NetBSD sh from how it behaves in this area. I hope FreeBSD don't change either. It also means that applications that use any of this unspecified behaviour, expecting some particular result, are broken, and cannot legitimately complain when some shell doesn't work the way they expect. It doesn't mean they won't, unfortunately. kre
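A small probe, offered as a sketch, that shows which of the permitted orders a given shell happens to use. No expected output is given because, as explained above, either answer is allowed.

```shell
unset a b

# One simple command: two assignments and no command name.
a=1 b=${a-unset}

# Prints b=1 in shells that expand and assign one word at a time, and
# b=unset in shells that expand every word before assigning anything.
echo "b=$b"
```

A script that relies on either result is depending on unspecified behaviour.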
Re: $? behaviour after comsub in same command
Date:Wed, 5 Apr 2023 10:35:58 -0400 From:"Chet Ramey via austin-group-l at The Open Group" Message-ID: | A variant with slightly different semantics: | | (exit 8) | a=4 b=$(exit 42) c=$? | echo status:$? c=$c | | The standard is clear about what $? should be for the echo, but should it | be set from the command substitution for the assignment to c? It isn't really different semantics, it is the same thing. The exit status from the command substitution in that case is used as the exit status for the empty command that is line 2 (you're right, that is clear). But that command doesn't get to set an exit status until it finishes, and it can't do that until its associated var assigns have all been performed, which (even leaving aside the question of the order in which they, and the args for them, are processed) cannot possibly be before c=$? is expanded and assigned. Needless to say, the same (exact) set of shells which produced N:N in the example in my previous message, set c to 42, and all the rest (including the older ksh93) set c to 8 (which really is what it should be - the other possibility here would be "unspecified" as even if the exit status were to become available in the middle of evaluating the args for a command, here we don't know whether c= or b= will be evaluated first). All the standard actually says is: 4. Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value. There's nothing there about the order in which they're processed (unlike, for example, redirects, which are required to be processed left to right) which makes the order implicitly unspecified. Anything is possible. 
But as, in any sane implementation, assigning the values to the variables should not in any way affect the values assigned to other variables in the same set of var assigns, it really should not matter the order in which they're processed, unless someone is idiotic enough to write a=1 a=2 a=3 in which case what value gets left in a is anyone's guess, and they get what they deserve. kre
Re: $? behaviour after comsub in same command
Date:Thu, 6 Apr 2023 11:17:43 -0400 From:"Chet Ramey via austin-group-l at The Open Group" Message-ID: <023c0028-e682-e1b6-99db-c8a596cdf...@case.edu> | My question is why they would choose something other than | what the so-called reference implementations (SVR4 sh, ksh88) did. Not that I was participating at the time, so I have no actual knowledge of any of this, but I get the impression that back then it was considered OK to change things if the group believed it "made it better for the users". Hence we got that absurd PATH search rule for builtins, that no shell of the time did anything like, "because a user might want to override a builtin with a version in their own bin directory, earlier in PATH than where the standard version of the command exists", or the even stupider (and fortunately, going to be gone) rule that all the normal built-in commands needed to be available in the file system (not so much so the preceding PATH rule would allow them to be overridden - that didn't work in practice anyway, but in case someone wants to "nohup cd" or "find ... -exec umask whatever". Nonsense.) This is likely more of the same - but in this case I actually agree with it - $? only gets updated when a command finishes, and only one in the current execution environment. That's clear, simple, and easy to use - otherwise using $? other than as S=$? immediately after the command whose status is of interest, becomes a total crap shoot. That's reinforced in this case, by wording that makes it clear that the only way to ever observe the exit status from a command in a command substitution (other than the command there writing the value of its $? somewhere) is to run it with a null command (just a var-assign, or redirect). That is, when there is no command, the status of the last command substitution (if any) becomes $?. That's the only way. 
Otherwise things like return $(true) would need to work (as an equiv of return 0 - and return $(false) for return 1 - and the standard has never required that). Given: $SHELL -c 'f() { return $( exit $1 ); }; e() { for A; do f "$A"; echo "$A:$?"; done; }; e 0 1 2 3 99' which I will unwrap to make it easier to read:
    $SHELL -c '
        f() { return $( exit $1 ); };
        e() { for A; do f "$A"; echo "$A:$?"; done; };
        e 0 1 2 3 99
    '
bash, zsh, and a current ksh93 (Version AJM 93u+m/1.0.4 2022-10-22) actually print N:N for all of the output lines, whereas everything else I tested, including an older ksh93 (Version AJM 93u+ 2012-08-01) and ancient pdksh, prints N:0 for everything. Since the return is effectively "return" (the command substitution doesn't output anything - if it were $( echo $1 ) instead of exit $1 things would be different) it should return with the status of the last command to finish - which here is always either 0, from the status set by the function definition for e (the very first time) or the result from the "echo" after the previous iteration, every other time. Since echo's status is (generally, and always here) 0, the return should always be "return 0". kre ps: all this is really esoteric, and makes no real difference to any sane application.
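For a script that really does want a function to return the status of a command substitution, a sketch that avoids the disputed behaviour is to put the substitution in an assignment with no command name and pass its status to return explicitly. The names `f`, `unused`, and `results` are illustrative only.

```shell
f() {
    # Null command: the substitution's status becomes $?, and the ||
    # hands that status to return explicitly.
    unused=$(exit "$1") || return $?
    return 0
}

results=
for n in 0 1 2 3 99; do
    f "$n" && s=0 || s=$?
    results="$results $n:$s"
done
echo "$results"
```

Written this way the result does not depend on how a shell treats `return $( ... )` with an empty expansion.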
Re: $? behaviour after comsub in same command
Date:Wed, 5 Apr 2023 18:25:32 +0300 From:"=?UTF-8?B?T8SfdXo=?= via austin-group-l at The Open Group" Message-ID: | Outliers are ash based shells; they apply | assignments concurrently but it isn't useful at all. No we don't, to do it concurrently we'd need to run multiple threads, and synchronise them carefully, and we don't... What we do is separate the process of doing the expansions from doing the execution of any command. The expansions happen first, the rest comes after. The issue here is that people tend to think of a=1 as a command. It isn't (not as people think of it anyway). But with that mindset they treat a=1 b=$a c=$b as 3 commands, one after the other. It isn't. The simple a=1 case is a null command, with a var-assign prepended. The other case a=1 b=$a c=$b is also a (single) null command, this time with 3 var-assigns prepended. If you want sequential execution, that's easy to achieve, just change a=1 b=$a c=$b into a=1;b=$a;c=$b then you have 3 null commands, each with a single var-assign, and those will be executed, in order, one at a time, just like you want the other one to be, in any shell that isn't completely broken. As reported in a later message, the "isn't useful at all" is wrong, as doing the expansions first, and then the assignments later, when it is all part of the same command (whether a null command or just var-assigns preceding any other command) means that a=$b b=$a does work to swap a and b, and doesn't require creating a new var, which in a case like t=$a a=$b b=$t command would result in placing t into the environment for command, which might be harmless, or might not be if you happened to accidentally pick the wrong temporary name to use. a=$b b=$a command doesn't do that, it just puts a and b (as desired) in the environment, and works sensibly. That it works exactly the same way when command is missing, would, I would have thought, be expected. It is the right way. 
We don't need two different ways to achieve a=1;b=$a;c=$b that one is quite sufficient. Just use it. kre
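The point about a temporary variable leaking into a command's environment can be checked with a short sketch. The name `t` and the value `scratch` are placeholders.

```shell
unset t

# Assignments written before a command name are placed in that command's
# environment, so the child sh can see t.
seen=$(t=scratch sh -c 'echo "${t-unset}"')

echo "child saw: $seen"
```

This is why a prefix like `t=$a a=$b b=$t command` exports `t` to `command` whether or not that was intended.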
Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1), realpath(1) utility
Date:Wed, 22 Mar 2023 14:31:16 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: <9b820bbcf17033e4b2b83a4cd13eb...@www.austingroupbugs.net> | A NOTE has been added to this issue. | == | https://www.austingroupbugs.net/view.php?id=1457 This issue is in a state that doesn't allow ordinary mortals to add notes, so this e-mail instead. Adding -v/-q (which BSD readlink has as well, -v is the default) wouldn't help anything here (or not by itself). Changing the standard to allow an error return (since readlink is a "provide information", not a "test" utility, non-zero status is an error) without a diagnostic seems unlikely, even if -v/-q were added, -v is likely to remain the default. Changing coreutils seems like the sane solution - users who use -q are going outside the standard, and then the lack of an err message is acceptable. While it isn't up to me, I would have thought that getting an error message (by default) when readlink fails is more to be expected than the other way, so I wouldn't have the change depend upon POSIXLY_CORRECT. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001640]: The rationale given for retaining "true" is nonsense.
Date:Tue, 14 Mar 2023 10:31:52 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | I think this and some other differences between ":" and "true" are | worth mentioning in the standard. I don't think that would do any harm, or is incorrect, but I'm not sure it is necessary either. Some of us recognise that true and : are (in many uses) more or less interchangeable. That doesn't mean that we need to explain why both exist, or what the differences are. It is often possible to replace grep with sed - the standard does not need to say that, or explain how, or what grep can do that is not so easy using sed. Same here. Just removing the Rationale would be enough, but I don't mind if you really believe the rest of this is needed. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001640]: The rationale given for retaining "true" is nonsense.
Thanks, I think that's all we needed to know about what it does. kre
Re: Syntax error with "command . file" (was: [1003.1(2016/18)/Issue7+TC2 0001629]: Shell vs. read(2) errors on the script)
Date:Fri, 10 Mar 2023 23:40:18 + From:"Harald van Dijk via austin-group-l at The Open Group" Message-ID: | Based on past experiences, I am assuming the e-mail this is a reply to | was meant to be sent to the list and I am quoting it in full and | replying on the list for that reason. Thanks, and yes, it was - my MUA absolutely believes in the one true meaning of Reply-To (where the author of the message to which the reply is being sent requests that replies be sent -- to addresses in that field, and no others). I need to manually override it when I choose to ignore that request and send to different addresses (which is allowed, but in general, done only with proper consideration of why). This list always directs that all replies go only to the author of the message, and never to the list itself. Irritating... | Sourcing arbitrary script fragments and having assurance that they do | not exit the shell is not reasonable, as the arbitrary script fragment | could contain an 'exit' command. Of course, deliberate exits aren't the issue, only accidental ones. | Beyond shell options and variable assignments not persisting in the | parent shell, are there any other issues you see with running them in a | subshell? The whole point of many . scripts is to alter the shell's environment, if they were just arbitrary commands, not intended to affect the current shell, they'd just be sh scripts, and run the normal way. The very act of using the '.' command more or less means "must run in the current shell". "(. file)" is silly, "file" would accomplish the same thing (if executable, otherwise "sh < file" after finding the path to file) in a more obvious way. Apart from options and variables, . files often define functions, change the umask and perhaps ulimit, and may alter the current directory, set exit (or other) traps, ... anything in fact. As an example, consider what you might put in your .profile or $ENV file - those are run in more or less the same way as a '.' 
file (just without the PATH search to locate the file). XRAT C.2.5.3 says almost exactly that about ENV. (Strangely though, even though .profile is mentioned several times as a place where things can be set, it doesn't appear in the standard (as something that shells process) at all - which is kind of odd really, since it is considerably older than ENV, and as best I can tell, supported by everything. The closest that we get is a mention in XRAT that "some shells" run it at startup of a login shell. Which are the other shells? That is, the ones that don't run .profile? And I don't mean in situations like bash, which prefers .bash_profile if it exists.) I doubt that you'd want those scripts run in a subshell environment, I also doubt that you want the shell to exit if there's an error in one of them. How would you ever be able to log in (and start a shell) if it exited before you ever had a chance to run a command? If you can't log in, because your shell won't start, how would you ever fix the problem? As best I can tell (I have done very limited testing of this) shells tend to simply abort processing one of those scripts upon encountering an error (like a syntax error, etc - not executing "exit" - that should exit) and just go on to the next step of initialising the shell. They don't just exit because there's a syntax error - most shells report the error (not all), but I couldn't find one which exits. | You have left out bash 4 here. For the same reason I didn't include ancient versions of all the other shells either. That's obsolete, not going to change in the future, and has been replaced. [And because I happen not to have a binary of it at the minute - I could make one, I do have sources, just don't really see the need.] 
| I do not expect bosh to have a large user base (even if it will be wider | than mine), but as I am sure Jörg would have pointed out, the shell has | historical significance in that it is a descendant of the Bourne shell | from which POSIX shell language is also derived. So is/was ksh88, and then ksh93 ... they were just modified more. | (Although I wouldn't be | opposed to a change to POSIX to *allow* something different.) As I hinted in the note in bugid:1629 which spawned this discussion (bugnote:6200) I expect this part might need to move to "may exit" rather than "shall not exit" (away from "shall exit" which it is now, in the cases in question, not all) for a release cycle (or two) - but then again given the number, and popularity, of the shells which already don't exit in these circumstances, perhaps that won't be needed. That should be discussed further. The reason that read errors are different in this regard (at least in the main script, not in . files -- not sure it is possible to have an equivalent to a read error in "eval" - perhaps an EILSEQ (bad char encoding) in the string might count? -- and that
Re: [1003.1(2016/18)/Issue7+TC2 0001640]: The rationale given for retaining "true" is nonsense.
Date:Sun, 12 Mar 2023 16:54:34 + From:Austin Group Bug Tracker Message-ID: <0a945390fc5d0c6c366071bcd2d29...@austingroupbugs.net> | A NOTE has been added to this issue. I don't think this discussion needs to be in notes, or not unless something relevant to the actual issue itself is revealed. | GNU true accepts some --version, --help options. That is perhaps not surprising - weird though, as if true was going to need multiple versions to get it right, or add features, or that anyone needs help writing "true" ... But OK, and apart from one potential issue (later). | I don't have access to ksh93 just now but I'd expect its true to support | those as well as --author --man --usage and many more in that vein like | most of its builtins do. Not that I can see, it appears to ignore any operands to true, just as (almost) everyone else's (and all sane) versions do. With the GNU version, what would be more interesting to know, is what it does when run as "true --nonsense", "true --", or "true '--:) (-;'" (and similar). What's the exit status, is there any output, and if so, to stdout or stderr? I'm also assuming that the --version and --help (to be meaningfully "accepted" rather than just ignored - everyone's true allows and ignores those, and any other args given) actually produce some output. stdout or stderr? What happens (exit status etc) if there's a write error while writing that output? kre
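The questions above can be put to whatever external true is installed with something like the following sketch. The helper name probe_true is invented here, and "env true" is used on the assumption that it finds the external utility rather than the shell's builtin; no particular results are claimed, they are whatever the local implementation does.

```sh
# probe_true ARGS...: run the external true (located by env, so not the
# shell's builtin) and report its exit status and whether it wrote
# anything to stdout or stderr.
probe_true() {
    out=$(env true "$@" 2>/dev/null)
    status=$?
    err=$(env true "$@" 2>&1 >/dev/null)
    printf 'status=%s stdout=%s stderr=%s\n' \
        "$status" "${out:+non-}empty" "${err:+non-}empty"
}

probe_true --nonsense
probe_true --
probe_true '--:) (-;'
probe_true --version
probe_true --help

# Write-error case; /dev/full is a Linux-ism, so only try it if present.
if [ -w /dev/full ]; then
    env true --version >/dev/full 2>/dev/null
    echo "write-error status=$?"
fi
```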
Re: Syntax error with "command . file" (was: [1003.1(2016/18)/Issue7+TC2 0001629]: Shell vs. read(2) errors on the script)
Date:Fri, 10 Mar 2023 18:13:00 + From:"Harald van Dijk via austin-group-l at The Open Group" Message-ID: | Other shells that exit are bosh, yash, and my own. It's both what POSIX | currently requires (contrary to what kre wrote on the bug) That's not how I intended what I wrote to be interpreted, I meant exactly what you said - when I wrote "most shells are doing what shells always have, and what the standard requires", I meant "exiting". But as I wrote in my previous message, I was actually testing "command eval" rather than "command ." which I would normally expect to work about the same way in this regard, but it turns out that not all shells do. Further, and subsequent to when I sent that last message, I went and looked at my tests again - the way I do these is by (for tests like this one) composing a command line as input to one shell (each has its own xterm in my shell testing root window page - they're all tiled), then pasting it into the windows for all the others - then I can see the results from all of them, at the same time, and easily compare what happened (the command always starts $SHELL kind of like the example Geoff showed, except I do not quote that, because sometimes SHELL needs to be "bash -o posix" or similar, and I want that field split, not treated as a quoted word). For this, I tested both without, and with, "command" present ... but it turns out that somehow, for some of the shells, instead of running both tests, I managed to paste the wrong command, and ran the one without "command" twice, without noticing. That even included the NetBSD sh test, which contrary to what I said before, turns out does do the same thing for "." and "eval" in both cases (exit without command, not exit with it) which is what I had expected, before I saw the results of the incorrect test - before I noticed it was incorrect. | and what I think is probably the right thing for shells to do. I don't. 
I want to be able to source arbitrary script fragments, and eval arbitrary strings (there are no security issues here, the fragments and strings, are all provided by the user running the shell - anything that could be done buried one of those other ways, could simply be done as a command without subterfuge) without risking the shell exiting. Sometimes running them in a subshell works, but only sometimes. | Whether bug 1629 should introduce a significant shell consistency issue | is not separate from bug 1629. Perhaps that one, and some new one, yet to be submitted, should be considered together, but resolving 1629 the right way should not be held hostage by other ancient weirdness that might not be so easy to alter. But perhaps after all, it might be - if it is only yash, bosh and your shell not already continuing after "command . file" fails because of a syntax error, then those might not matter, and those, plus, I think, mksh and ancient pdksh (and consequently, probably ksh88 as well) for "command eval 'gibberish<;)'" failing the same way then I'd guess mksh can get changed, and the others also no longer really matter. | Bug 1629 started as trying to see what | shell authors are willing to implement. No, it started because read errors were not being handled in a rational way. A proposed solution depended upon what shell authors are willing to implement. | and I know bosh sadly isn't going to see an update anyway, Really? I thought some group of people had taken over Schilling's stuff. Whether they consider bosh worth continuing with I am not sure (it still has more important issues than this remaining in it, and I don't believe is used much, if at all). 
| but I would hope that authors | of the other shells also have the good sense to implement something that | makes sense to them and keep it internally consistent, There is so much in the shell already which is not internally consistent, that one more thing (particularly in an area rarely seen) would hardly be noticed, but I very much doubt that there will not be at least an attempt to alter the "what happens when there's an error which is "shall exit" detected when running a special built-in as a sub-command of "command". Syntax errors aren't the only one, command eval 'shift 0 >/' is another (redirection errors are also "shall exit" when used with a special built-in as is being done here). I expect that will probably succeed, even if we all need to make some more changes to almost never encountered parts of the shell, and most probably it won't be "we all" in any case. kre
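For anyone wanting to repeat this kind of comparison without a window full of xterms, here is a rough sketch. The function name probe and the list of shells are made up for the example, and the deliberately broken string "fi" stands in for any syntax error; the result depends entirely on the shell being probed, so none is predicted.

```sh
# probe SHELL_COMMAND: does the shell survive a syntax error inside
# "command eval"?  Prints "continued" or "exited".
probe() {
    # $1 is deliberately unquoted so that "bash -o posix" is field split.
    if $1 -c 'command eval "fi"; echo survived' 2>/dev/null |
        grep -q survived
    then
        echo continued
    else
        echo exited
    fi
}

for sh in sh "bash -o posix" dash mksh yash; do
    # Skip shells that are not installed here.
    command -v "${sh%% *}" >/dev/null 2>&1 || continue
    printf '%s: %s\n' "$sh" "$(probe "$sh")"
done
```

Replacing the eval with ". file" on a file containing the same broken text, or with 'shift 0 >/', would probe the other cases mentioned above in the same way.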
Re: Syntax error with "command . file" (was: [1003.1(2016/18)/Issue7+TC2 0001629]: Shell vs. read(2) errors on the script)
Date:Fri, 10 Mar 2023 17:12:50 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | All of bash, ksh88, ksh93, dash and mksh reported the syntax error and | then executed the echo command (which output a non-zero number). yash and bosh don't, they simply exit. But you caught me ... the tests I did yesterday were of "command eval" which I assumed would be treated the same (I see no reason why there should be a difference), but apparently isn't in many shells (including the NetBSD one). kre
Re: [Issue 8 drafts 0001639]: Clarify minimun length requirement of "quoted" std and dst names in POSIX TZ string.
Date:Sun, 5 Mar 2023 07:44:38 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: | The mismatched < em > has been replaced by < /blockquote >. Thanks. | The matched < em > < /em > pairs have been replaced by the more | common < i > < /i > pairs. There I was just copying what I had seen elsewhere, and it seemed to work! kre
TZ setting of "std" and "dst" allowed characters (minor question)
This is just something I am wondering about, rather than any kind of problem, but I'd like to make sure I'm not missing something. In XBD 8.3, in the section on the TZ variable, in the case of what we generally call a "POSIX TZ string" (though in the D2.1 it is just the form that doesn't start with a ':', and in D3 will be the 2nd format) the text in D2.1 says of "std" and "dst" -- In the unquoted form, all characters in these fields shall be alphabetic characters from the portable character set in the current locale. And similarly in the quoted form (I'll just cut the relevant phrase) alphanumeric characters from the portable character set in the current locale, [...] My question is why those two say "in the current locale" - what information or restriction or special meaning is implied by those 4 words? XBD 6.1 says a lot about the Portable Character set, those characters must be present in every locale (their encodings may vary, but they must all be one byte values, and in a char variable must have positive (or 0 for nul) values). I'd have thought "alphabetic characters from the portable character set" (in the first case, and similar in the second) would be enough. I would point out that the quoted form (which is actually first in the text, though I put it second in this message) continues: the <plus-sign> ('+') character, or the <hyphen-minus> ('-') character. Those ones don't say "the <plus-sign> ('+') character in the current locale" (or similar for '-'), which I would have thought they would need to, if those extra 4 words actually mean something. If those words have no purpose, then I'll submit a mantis issue to have them removed (they're just wasting space, and causing confusion - mine if no-one else's). On the other hand, if they are needed for something then perhaps we also need to add them with the '+' and '-' chars that are allowed in the quoted case. 
About the only possibilities I can think of for this, are first if locales, while being required to include the portable character set, were permitted to not include the ASCII letters as "alpha" - but XBD 7.3.1 seems to prohibit that. What's left is the possibility that a locale adds some other character, from the portable character set, as type upper or lower (and hence, alpha) eg: ESC perhaps, or maybe '<' and '>' which would make the spec ambiguous, as then one of those fields might be the quoted form or the unquoted form. I don't see that as being forbidden, as long as ESC (or whatever is added) is not included in class cntrl (or punct, or blank) - but even assuming that a locale is allowed to do that, is it really intended that if a locale were to define things that way, then ESC (etc) would become a permitted char in "std" or "dst" - is that why those words are there? If it is, I suspect we should consider changing that... kre
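To make the two forms concrete, here is a small sketch of one reading of the D2.1 wording, with "alphabetic" taken to mean only the ASCII letters as classified in the C locale (which is exactly the assumption the question above is about). The function name tz_name_form is invented, and the length limits on the fields are deliberately not checked.

```sh
# tz_name_form FIELD: classify a TZ "std" or "dst" field as "quoted",
# "unquoted" or "invalid".  The body is a subshell, so forcing the C
# locale for the bracket expressions does not leak into the caller.
tz_name_form() (
    LC_ALL=C
    case $1 in
        '<'*'>')
            inner=${1#'<'}
            inner=${inner%'>'}
            case $inner in
                '' | *[!A-Za-z0-9+-]*) echo invalid ;;
                *) echo quoted ;;
            esac ;;
        '' | *[!A-Za-z]*) echo invalid ;;
        *) echo unquoted ;;
    esac
)

for field in EST '<+05>' '<UTC+5>' E5T '<a b>'; do
    printf '%s: %s\n' "$field" "$(tz_name_form "$field")"
done
```

If a locale were allowed to class '<' or '>' as alpha, the first two branches could both claim the same field, which is the ambiguity described above.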
Re: [Issue 8 drafts 0001638]: Requirement that TZ "std" and "dst" be 3 chars long (when given) is apparently ambiguous
Date:Fri, 3 Mar 2023 14:31:13 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: | Occurrences of "bugno:" and "POSIZ" in the Description have been changed to | "bugid:" and "POSIX", respectively. Thanks. I didn't even notice POSIZ - though that is the kind of typo I would make... kre
Re: [Issue 8 drafts 0001638]: Requirement that TZ "std" and "dst" be 3 chars long (when given) is apparently ambiguous
When submitting that bug, I (obviously) forgot the magic required to refer to another bug. I'd be grateful if someone who can (ie: not me) would change all of the "bugno:" strings in the Description into "bugid:" (or if that is not correct either, into whatever is). There are several... TIA, kre
Re: Minutes of the 6th February 2023 Teleconference
Date:Thu, 9 Feb 2023 09:19:00 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | there was general agreement that executing the partial line | after getting a read error is really not a good thing for | shells to be doing. I'd probably agree that it isn't the ideal approach, but that's irrelevant (or should be) here - it is not what shells do, or have ever done, so is not the standard. This group, just like anyone else, can put the case to shell implementors that the current approach is sub-optimal, and ought to be altered. That would be easier on implementors if the standard makes it conforming to treat a read error as an error (resulting in aborting current processing, etc) as well as the current standard behaviour (treat it as EOF). What cannot be done is to require shells to treat read errors as shell errors rather than EOF, that would be legislating, and that is not what should be happening here. There is a clear de-facto standard here, we either write that down as the standard, or allow it or other behaviour considered better as alternatives, and possibly add a future direction for a possible change in Issue 9 (if shells have altered their behaviour). kre
Re: Minutes of the 6th February 2023 Teleconference
Date:Wed, 8 Feb 2023 10:24:33 + (UTC) From:"Thorsten Glaser via austin-group-l at The Open Group" Message-ID: | However, executing the partial line after getting a read error | can and probably should be treated differently *unless* a read | error is treated as EOF. I agree with that - the error is either an error, which would cause a non-interactive shell to immediately exit with non-zero exit status (with some message on stderr), or an interactive shell to return to the command prompt, issue a new PS1, start a new read, presumably get an error again, and so on. If the read error is treated as EOF, then the shell acts just like any other EOF at that point. I have no problem with specifying the "must be EOF" behaviour (yash could change) but requiring it to be treated as an error, rather than just allowing it, would be a non-starter given that only yash (that we know of) behaves that way. I however don't object to it being unspecified which behaviour will occur - of those two. This is not a case where it needs to simply be unspecified what happens, such that the shell can do anything it likes. kre
Re: Minutes of the 6th February 2023 Teleconference
Date:Tue, 7 Feb 2023 11:45:03 -0500 From:"Chet Ramey via austin-group-l at The Open Group" Message-ID: <26b52c56-89f7-a4a9-e2a1-e754d6387...@case.edu> | The key is that everyone `executes' the partial line after getting EOF, | even yash. This is important, it makes reading from files, and reading from strings, work the same way, which avoids the need for everyone to supply a terminating \n when supplying the command_string arg to sh -c but also for "eval" (and so traps as well) - these strings just have commands which end with an "end of string" (no newline at the end required). Treating "end of string" and "end of file" the same way is the natural thing to do. Read errors being treated differently than EOF would be possible, but isn't what has traditionally been done - at most it should be unspecified whether this is treated as an error, or the same as EOF. kre
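A quick illustration of the point about "end of string" and "end of file" being handled alike; this is only a sketch, and the echoed words are arbitrary. None of the three command texts below ends in a newline, yet each command is expected to run.

```sh
# command_string for sh -c, with no terminating newline.
sh -c 'echo from-sh-c'

# The same for eval (and so for trap actions as well).
eval 'echo from-eval'

# A script on stdin whose last line lacks a newline: the partial line
# is executed when the read reports end of file.
printf 'echo from-stdin' | sh
```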
Re: tv_nsec
Date:Fri, 20 Jan 2023 08:37:44 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | You haven't stated your reasons for wanting to refute it, so that makes | it difficult to know what we can say to persuade you you're wrong. Nick also didn't state what "it" was to be, making it impossible to decide if it should be refuted or not. | In any case, if C23 changes it to nsec_t If that's the proposal, then I agree with you, that's relatively harmless for POSIX code (slightly more difficult for strict C code, and they should certainly be adding a PRI macro to inttypes.h - all invented printable types should have at least one such macro defined). But if the proposal is to change it to int32_t (or worse uint32_t) then that would be a real problem - despite 32 bits clearly being enough to represent a count of nanoseconds within one second (had the tv_nsec field been defined that way originally, that would have been OK, but it cannot be changed into that now). Even worse would be (as suggested in an earlier message) for it to be allowed to be any implementation defined type, then it could be a float type, or (absurdly) a struct, union, or even an array. kre
Re: Security risk in uudecode specification?
Date:Mon, 16 Jan 2023 18:02:47 +0100 From:"Christoph Anton Mitterer via austin-group-l at The Open Group" Message-ID: <3d8cea9121caf4944d2d1b8f6ff0dca4537afe92.ca...@scientia.org> | It's the only portable way to encode/decode stuff to/from base64, I didn't even realise that the standard included the base64 variant, rather than just the original traditional encoding. uu*code isn't high on my list of things to care about. | IMO it should only be removed if replaced by the base64 utility. The deadline for that to happen is definitely past. kre
Re: Security risk in uudecode specification?
Date:Mon, 16 Jan 2023 10:01:48 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | There seems to be some misunderstanding here. The only line we | have drawn is for requests for new features. We will continue to | process bug fix requests for inclusion in Issue 8 for a while yet. Ah, OK, good. I thought from: [This is really Andrew Josey from the minutes of the Jan 12 meeting] austin-group-l@opengroup.org said: | We are planning to produce draft 3 soon. | Once bugs 768, 243 (if accepted), and 1617 (if updated to add -w) have been | applied, we just need updated frontmatter to complete draft 3. | Shortly after the meeting the ISO/IEC ballot got underway to approve the | revision project (a separate activity to approving the draft!) Andrew will | need to form the IEEE ballot group as the first part of the IEEE process. and I recalled earlier mention (which I will never find now) that it was planned that Draft 3 be the final draft (I always assumed subject to typo corrections, editing mistakes, things forgotten which were supposed to happen, etc, if there were any of those, otherwise there'd be no point calling it a draft - but I also assumed nothing substantial would change after it was published). I am happy to learn that is not to be the case. I was slightly surprised to see in that quote that the revision project - ie: all that has been happening for the past several years, is not yet formally approved. What would happen to all of this work, should that fail? (Not that I would anticipate that happening, but one never knows). That means, I guess, that if someone cared enough about changing what the text says about uudecode and its handling of setuid bits, that a new bug report might get that changed. 
It won't be me, a bug report from me would just be to have uuencode/uudecode removed altogether, I think their time to be mandated, or for applications/users to expect to use them, passed quite a while ago - though naturally implementations are still likely to support them for some time yet. kre
Re: Security risk in uudecode specification?
Date:Sat, 14 Jan 2023 09:19:24 -0800 From:Alan Coopersmith Message-ID: <7d6830e3-ab04-2d86-8869-8819283f4...@oracle.com> | We can't compare the command specifications in the standard for tar, So, use pax instead. | as there are none, but if we look at common implementations, they do in | fact protect against issues such as those raised here with the paths: Yes, it can, but the assumption in all of this is that somehow root is being convinced to run the extraction without applying any thought to it (without that, uuencode is no more dangerous than cp). If we're assuming that root can be fooled that way, we may as well assume that root could as easily be convinced that the -P option should be given (tell root that that option preserves the modify times, or something) or perhaps get root to run tar with -C / ... either way would have the same effect. But for tar, overwriting important files isn't the issue that matters, if we can convince root to extract a tar file, we don't need to also get them to add either of those options, we just put a setuid root binary in the tar file, which tar will happily extract as a setuid root binary. As it should. The problem with this isn't the tool, which is doing what it is designed to do, exactly as it is designed to do it, but the root user who doesn't pay any attention to security but just "does as instructed". There are a million ways to take advantage of such a root user, picking uudecode as something to change because of it is pointless. | At the very least here, I thought the standard committee would want to | consider that all of the major implementations of uudecode follow a | defacto standard on removing bits from the permissions that doesn't | seem to be allowed by the current language of the formal standard. Yes, that is an issue that probably should be considered, as what the standard describes doesn't match what implementations actually do. 
But that won't happen until someone submits a bug report in the proper form (ideally complete with new text to update things). There seems now to be no hurry to do that, as the committee (of which I am not part) seems to have drawn a line through the defect reports, and only those which precede it will be attended to in the forthcoming new issue of the standard (which is actually a pity, publishing a whole new version with known defects in it already seems like a poor choice). In any case, it seems as if anything new now will need to wait for at least Issue 8 TC1 (I'd guess 3 or 4 years from now), or perhaps even Issue 9 (maybe 2030 or after - Issue 7 was 2008, Issue 8 might be 2023, or perhaps 2024 - at that rate Issue 9 might be 2040.) kre
Re: Security risk in uudecode specification?
Date:Wed, 11 Jan 2023 13:48:31 -0800 From:"Alan Coopersmith via austin-group-l at The Open Group" Message-ID: | Below is a message sent to the Open Source Security mailing list over | the holidays about a security risk in uudecode, which the GNU maintainer | pointed out was forced by the current language of the standard. The real problem here is that as soon as someone says "security problem" almost everyone simply jumps to "we must find a solution" and no-one ever bothers asking if there really is a security problem or not? That's not an acceptable question, "we must not be seen to be ignoring security issues". But ask yourself, what if the utility in question here was tar, or pax, or cpio (or whatever it is that Solaris uses for system installs and updates)? Is there any material difference to uuencode in how they operate, or what they can do (except that tar (etc) will usually set the setuid bit in extracted files if the archive says to do that - how else would "su" ever get installed correctly?) What's more, it is far more common these days for an e-mail message from some random source to contain a tar file (usually also compressed, but that's irrelevant here) than a uuencoded file - which is the actual bigger security threat? Or is the security threat really the idiot user who simply runs arbitrary commands as root, and then complains when bad things happen? Of course, since we cannot "fix" the users, we keep trying to fix everything else - which is doomed to failure. All of these file container handling utilities do more or less the same thing, they bundle up files, and upon request, unbundle them again. That's what they are designed to do. Using any of them inappropriately can be a security problem, but it isn't the tool that is the problem, but the inappropriate use. And it isn't just the archive format utilities that have issues like this. What do you expect to happen when someone says "make install" ? 
What if a root user is fooled into running that on a makefile that has: install: cp myfile /usr/local/myfile @ (cp /bin/sh .secret-file; chown root .secret-file; chmod u+s .secret-file) >/dev/null 2>&1 Oh no, security problem in make! Really! The unnamed GNU maintainer was just being polite while passing the buck for something they know cannot be "fixed", "forced by the current language of the standard" is simply another way of saying "doing what it is designed to do". You can add new options to (almost) any utility, and have those non-standard options do almost anything, but if you want to remain conformant to the standard (ie: what people who know what they're doing expect) when the option isn't given, the utility must operate as the standard says, at least when operating in a conforming environment (which you get to define, but need to document). If there was anything to do here with uuencode/uudecode it would be to (again) consider removing them from the standard - but not because of security issues, just because they are now essentially obsolete. That doesn't much help implementations though, which will need to keep supporting them essentially forever, because some user might have a script somewhere which uses these things - and because of that keeping them in the standard so the implementations don't drift apart makes sense. kre ps: this exact issue was also raised in NetBSD, very briefly, a while ago - it got dismissed out of hand, and hasn't been heard of there again. The whole thing is bogus.
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Fri, 16 Dec 2022 17:31:03 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | Before I get into detailed responses, please note this is my last | working day before the holiday break, so I won't be contributing to | this discussion further until January. OK. I think this discussion has mostly reached a dead end anyway, nothing (relevant to the topic anyway) is changing in any of the recent messages. | I may have misled you a little in the way I worded a previous email. | It's not the time_t type itself that you can't do arithmetic on, No, not misled, I knew what you meant, and I understand that arithmetic types allow arithmetic, what matters is whether that arithmetic makes sense in context or not. | The description of time() says: | | The time function determines the current calendar time. The | encoding of the value is unspecified. That's what I was missing. I don't know my way around the C standard, and don't know where to ask people to look. | I agree with "related" but not with "corresponds". By saying | "corresponds" you are assuming that the conversion is reversible, | i.e. that it is a one-to-one mapping. It is not. No, no such assumption, since mktime() input can have out of range values, and inverting the time_t that results will never produce that, it is clear that nothing can (necessarily) be reversed. If you think that "corresponds" implied that, then by all means we can pick a different word. But "related" isn't really strong enough either (Wednesday is related to Tuesday, it is the following day, but that doesn't mean that if the input to mktime() specifies a date that is a Tuesday, it is OK to return the following Wednesday instead, just because they're related). The mktime() input must specify precisely what time_t value is to be returned, otherwise the function is useless - calling functions (apart from random number generators) that return results which are not what the input requests be returned is a waste of time. 
| > The first thing to note is that this only applies to UTC times. | Hence the "corrected for timezone and any seasonal time adjustments" | in the preceding mktime() quote. Yes, but we cannot make that correction until we have a UTC time to correct, we don't know what correction to apply until after that is done. This is something of a dilemma, as the input is given in the local timezone, but without enough information to allow that correction to be made, until after we have found the corresponding time_t (UTC) value (in general, and at the very least, until after we have a properly in range, and well defined, local time value). | If the standard meant local time here it would say "local time". | The fact that it instead says "actual time of day" shows that it | does *not* mean local time. I agree it doesn't mean only local time, but recall the actual time of day is local time. If the standard said "local time" it could be read as simply meaning that local time is unspecified (which it largely is) rather than also meaning that the system's clock is not necessarily synchronised with that local time (or UTC), which it is also saying. It means both. It could require synchronised times (for some applications that's needed), but doesn't (and shouldn't in general), but it cannot specify how local time works (which includes how it corresponds to UTC), that's outside of POSIX's jurisdiction. | As quoted above, mktime() first converts the broken-down time to UTC | seconds since the Epoch It can't. And nothing in the standard says that it should, as that would be absurd, one cannot convert a local time into a UTC time (however that is reckoned, here as a count of seconds since the Epoch, but that detail is irrelevant) without knowing the local timezone information first. | and then corrects it for "timezone and any seasonal time adjustments". No, that isn't what it says at all. If it did it would be ridiculous. But it doesn't. 
What it does say (and today I'm quoting from Issue 7 TC2 (2018 edition, ie: c181), not that, other than the page and line numbers, it makes any difference, this part has not been changed (yet anyway)): Page 1331, lines 44305-7: The mktime( ) function shall convert the broken-down time, expressed as local time, in the structure pointed to by timeptr, into a time since the Epoch value with the same encoding as that of the values returned by time( ). So clearly, we have a local time as input. Not disputed I believe. Then, same page, lines 44315-8 The relationship between the tm structure (defined in the <time.h> header) and the time in seconds since the Epoch is that the result shall be as specified in the expression given in the definition of seconds since the Epoch (see XBD Section 4.16, on page 113) corrected for timezone and any seasonal time
Re: strftime %Ou
Date:Fri, 9 Dec 2022 12:11:14 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | It made it to the list, but the lack of an answer probably means | nobody who read it can answer it. Yes... However, I was looking at XRAT (from the current standard) today (for unrelated reasons) and... | > > In draft 2.1 (and the current spec) strftime's %Ou modified spec is described as: | > > | > > %Ou Replaced by the weekday as a number in the locale's alternative representation | > > (Monday=1). | > > | > > Should that say "as a number using the locale's alternative numeric symbols"? | > > Otherwise the definition is circular. came across XRAT A.7.3.5 (LC_TIME) which happens to include this statement: It can be noted that the above example is for illustrative purposes only; the %O modifier is primarily intended to provide for Kanji or Hindi digits in date formats. That's on page 3532 (lines 119660-1 aside from the leading "It" which is on line 119659). I haven't checked Issue 8 Draft 2.1, but I cannot see any reason that section would have changed. I also cannot imagine that only Kanji or Hindi is intended there, just for systems that don't use Arabic digits (0 1 2 ...). kre ps: I agree that this is still largely a WG14 issue.
Re: behavior of the QUIT character (^\) in the shell command line
Date:Mon, 19 Dec 2022 00:17:25 +0100 From:"Vincent Lefevre via austin-group-l at The Open Group" Message-ID: <20221218231725.ga104...@zira.vinc17.org> | Well, so it is not forbidden to bind it to "exit with a core dump" | (e.g. abort()), which is what a SIGQUIT does by default. :-) No, you can bind ctrl-\ to any action your shell allows, definitely not forbidden. Note, that's not SIGQUIT, it is just a character, not a signal. You only get one or the other (or neither sometimes) never both. | Then the requirement from the standard is a bit strange. Not really. | One may still | say that it is useful for a SIGQUIT sent by some process, but I have | the impression that this is an unusual case and that the standard was | more targeting a SIGQUIT generated by the QUIT character. Yes, it is. The point of it is that if you have job control disabled, run some command, and then generate SIGQUIT from the keyboard, you want that command to exit and dump core, but the shell which ran the command to still be running, get the exit status, and tell you about it, rather than also exiting and dumping core. Or most people do anyway. Or at least that's what shell authors (all the way back to the original Thompson sh, and including csh variants, not just posix sh (Bourne sh descendants)) believe you want, and so do. The standard just says what shells actually do. This is less important with job control enabled, as when running some foreground command, the shell and it will be in different process groups, and so a SIGQUIT sent to it will not be received by the shell. But it still matters, as it may happen that you are running some command, which has not finished, is not telling you why, you get bored with waiting, and decide to find out by generating a core file and analysing it. 
So you press the 'send SIGQUIT' keyboard chord, but just while you are doing that, before the keyboard has had time to send the keycode to the system, the command exits, the shell returns from its wait, and returns the tty pgrp to belong to itself again. Then the char you typed arrives, SIGQUIT is (or might be) generated and sent to the shell now. Would you want the shell to exit and core dump? Also, it was obvious from the test results that you provided, that you were testing with command line editing enabled. When that is happening, the shell will have altered the termios settings to whatever it needs to make that work the way it wants (and will restore them before running a command). The example where the quit char was included in the input makes it clear that in that case at least, no SIGQUIT was ever generated, so the question of what that shell would do if one were received is unanswered. The terminal driver will never both queue the "quit" char as input, and send the signal, it is one or the other (the same applies to all other signal generating characters). Other tested shells might be doing something similar, but with that char set to perform some different function, perhaps "ignore me", or "flush current command entered so far" (with or without a new prompt) or something else. Your testing might never have generated a SIGQUIT at all, so what any of those shells might do if that signal were received might never have been examined. Disabling line editing would make it more likely to be generated, but not certain, as like any other program, the shell is permitted to modify the terminal settings however it likes when it is in control of (ie: reading from) the terminal. kre
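The "one or the other" rule above depends on the termios settings in force when the character arrives. As a rough sketch (Python, not anything a shell actually contains), the check below looks at an attribute list in the shape returned by termios.tcgetattr() and reports whether the quit character would raise SIGQUIT. The function name and the vdisable default are assumptions made for illustration; the real "disabled" value is whatever _POSIX_VDISABLE is on the system.

```python
import termios


def quit_char_raises_sigquit(attrs, vdisable=b"\x00"):
    """Return True if typing the quit character would generate SIGQUIT.

    attrs has the layout returned by termios.tcgetattr():
    [iflag, oflag, cflag, lflag, ispeed, ospeed, cc].
    vdisable is an assumed stand-in for _POSIX_VDISABLE.
    """
    lflag = attrs[3]
    cc = attrs[6]
    # With ISIG clear, or with the quit slot disabled, the character is
    # just queued as ordinary input and no signal is sent.
    return bool(lflag & termios.ISIG) and cc[termios.VQUIT] != vdisable
```

On a real terminal one could pass termios.tcgetattr(0) while a line editor is active and again after it has restored the settings, to see which of the two cases applies.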
Re: behavior of the QUIT character (^\) in the shell command line
Date:Sat, 17 Dec 2022 22:20:20 +0100 From:"Vincent Lefevre via austin-group-l at The Open Group" Message-ID: <20221217212020.ga388...@zira.vinc17.org> | What is the behavior of the QUIT character (^\) when typing a command | in an interactive sh shell? As you have seen, it varies, and is not specified anywhere I know of. | https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html just | says that if the shell is interactive, SIGQUIT shall be ignored. That just means that the shell doesn't exit with a core dump when the signal is generated (however it is generated, including via kill(2)). What happens depends upon the terminal settings, and how the shell chooses to implement command line editing (which is specified to exist, at least for vi mode - others are allowed as alternatives - but isn't specified how it works). If the shell isn't doing command line editing, the effects of the quit character on the terminal input buffer still occur (anything pending is flushed), if it is, then it all depends on what (if anything) the Ctrl-\ (or other char that might be set as the quit char in termios) is defined to do in that editor - one can bind it to do almost anything in most shells (typing the character doesn't necessarily generate a SIGQUIT). kre
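To make "SIGQUIT shall be ignored" concrete, here is a minimal sketch in Python (standing in for a shell, which it is not): the process sets SIGQUIT to be ignored, sends the signal to itself with kill(2), and carries on. The function name is hypothetical. It has to be called from the main thread, since Python only allows signal dispositions to be changed there.

```python
import os
import signal


def survives_sigquit():
    """Ignore SIGQUIT, deliver it to ourselves, and report survival."""
    previous = signal.signal(signal.SIGQUIT, signal.SIG_IGN)
    try:
        # With the default disposition this would terminate the process
        # (and possibly dump core); ignored, it has no effect.
        os.kill(os.getpid(), signal.SIGQUIT)
        return True
    finally:
        # Restore whatever was in place before, when Python can express it.
        if previous is not None:
            signal.signal(signal.SIGQUIT, previous)
```

Note that this says nothing about what happens when the quit character is typed: as described above, that depends on the terminal settings and on the line editor.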
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Wed, 14 Dec 2022 08:08:36 +0700 From:"Robert Elz via austin-group-l at The Open Group" Message-ID: <11981.1670980...@jacaranda.noi.kre.to> | Set the input to represent Jan 1, 2023, noon. (Jan 1 just so | working out yday is simple) Which turns out to have been exactly the wrong thing to do for the purposes of the example. The point intended relates to computing tm_yday which depends upon tm_mon and tm_mday (and also tm_year). That is, unless it is simply 0 as above... One cannot compute tm_yday without knowing tm_mday tm_mon and tm_year, and one cannot calculate those without first having normalized tm_sec (which might affect tm_min when adjusted), tm_min (which might affect tm_hour when adjusted) and tm_hour (which might affect tm_mday when adjusted) - and of course, tm_mday, tm_mon, and tm_year have this weird relationship where they all depend upon each other, though in practice, nothing ever needs adjusting more than twice. Still, the result is the same, the normalisation must happen before the XBD 4.17 formula is applied. kre ps: for anyone who cares, here is the bc function I used when testing this... (just so not everyone needs to copy all these magic numbers). Remember the scale needs to be 0 (no fractions permitted). define e() { return ( s + mi*60 + hr*3600 + yd*86400 + (y-70)*31536000 + ((y-69)/4)*86400 - ((y-1)/100)*86400 + ((y+299)/400)*86400 ) } then set s (tm_sec) mi (tm_min) hr (tm_hour) y (tm_year) and calculate yd (tm_yday) from the supplied tm_mon and tm_mday (along with tm_year).
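For anyone who would rather not use bc, the same function can be sketched in Python. This is a straight transcription of the bc function above, with longer parameter names of my own choosing; Python's // floors where bc (scale 0) and C truncate, which gives the same result as long as tm_year is at least 70, the only range the formula is defined for.

```python
def seconds_since_epoch(tm_sec, tm_min, tm_hour, tm_yday, tm_year):
    """Transcription of the bc function e() quoted above.

    All arguments are integers and tm_year is years since 1900.
    Assumes tm_year >= 70 so that // matches C integer division.
    """
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)
```

For Jan 1, 2023, noon, seconds_since_epoch(0, 0, 12, 0, 123) can be compared with calendar.timegm((2023, 1, 1, 12, 0, 0)) from the Python standard library, or with the output of date -u -r.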
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Tue, 13 Dec 2022 16:52:42 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | No, the adjustment to bring struct tm fields into range is done after | the time since the Epoch value has been calculated. Just in case you don't believe my assertion that this does not work, I have a small experiment for you to run. Write a simple bc function (bc just because it is very quick to write, and will have no issues with overflow - set scale 0, so we get simulated C integer arithmetic) which implements the XBD 4.17 formula, exactly as written (but just use bc global vars instead of fields in the struct tm - or any other way you choose) Set the input to represent Jan 1, 2023, noon. (Jan 1 just so working out yday is simple) Run the function, then use date -u -r , -u to simulate UTC, as the bc function will not be adjusting for the local timezone (or you could make that adjustment for your timezone if you want) and -r N to give the time_t value to use instead of "now", if that is not -r, then use whatever facility your date command has to do that - any reasonable one has a way. Confirm that you get Jan 1, 2023, noon. If not revise either the function (fix typos) or the input data, until you have it working. Then increase the year by 4, and run it again. you should get the time_t for Jan 1, 2027, noon (if everything is ok, this just works). Set the year back to 2023, and instead add 48 to the month var (since Jan is month 0, that just means setting the month var to 48). Run the function again. What do you get this time? Note that you could instead add 1461 (the number of days in any 4 year period which does not span the turn of the century) to the mday field. Or 1461*24 to the hour field (etc) - try them all if you like. Examine the formula to understand why. Normalisation must happen first, the formula really only works for in range values. kre
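The month part of that experiment can be sketched in Python. The formula takes tm_yday, and tm_yday can only be derived from tm_mon and tm_mday once the month is in range. The table-driven helper below is my own assumption about how one might compute tm_yday (it is not taken from any implementation); it only handles tm_mon, and tm_mday is assumed to be in range already.

```python
# Days before the first of each month in a non-leap year.
CUMULATIVE_DAYS = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)


def is_leap(year_ad):
    return year_ad % 4 == 0 and (year_ad % 100 != 0 or year_ad % 400 == 0)


def normalise_month(tm_year, tm_mon):
    """Carry whole years out of tm_mon so that 0 <= tm_mon <= 11."""
    return tm_year + tm_mon // 12, tm_mon % 12


def yday(tm_year, tm_mon, tm_mday):
    """tm_yday for in-range fields only; tm_mon = 48 has no table entry."""
    days = CUMULATIVE_DAYS[tm_mon] + tm_mday - 1
    if tm_mon > 1 and is_leap(tm_year + 1900):
        days += 1
    return days


def year_and_yday(tm_year, tm_mon, tm_mday):
    """Normalise the month first, then derive tm_yday."""
    tm_year, tm_mon = normalise_month(tm_year, tm_mon)
    return tm_year, yday(tm_year, tm_mon, tm_mday)
```

Calling yday(123, 48, 1) directly raises IndexError, while year_and_yday(123, 48, 1) yields tm_year 127 and tm_yday 0, which are the values the formula needs to produce Jan 1, 2027.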
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Tue, 13 Dec 2022 16:52:42 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | It is too late to add timegm() in Issue 8. I suspected that would be the case. Pity, as using UTC (or whatever it is that POSIX time is really called, not really UTC, as that has leap seconds) (gmtime(), modify the result, timegm()) is a way that works to adjust a time_t without doing direct arithmetic upon it, as POSIX base time is very regular, no anomalies to deal with. Incidentally, I'd be interested to see a quote from the C standard that specifies time_t with the limitations you expressed in the previous message, all I've been able to find is that it must be an arithmetic type, and that its range and precision aren't specified. That's exactly what is specified for a clock_t as well - however for clock_t it is also explicit that it is possible to divide the value by a constant with meaningful results - ie: normal arithmetic operations are possible. If they're possible on a clock_t I see nothing there (in moderately recent C anyway) which would suggest that they're not possible on a time_t as well. Certainly the unspecified precision (where POSIX specifies "seconds") means that care would need to be taken to add using the correct units, but an implementation could provide a specification to allow programs to discover what the precision actually is. | You are suffering from a misconception that *timeptr somehow "specifies" | a time since the Epoch. It does not! It specifies a broken-down time. No, no misconception there, though sometimes I suppose (as it often is) that my language might be a little loose. 
However I do certainly hope that you agree that the broken-down time is related to the resulting seconds since the Epoch, when I have used "specifies" (loosely perhaps) previously, all I have ever meant is that - that is, that mktime() is not intended to be free to return any random time_t it likes - it must return one that corresponds to the broken-down time passed in. I certainly hope that you're not disagreeing with that. | The standard describes, in detail (in the paragraph beginning "The | relationship between ..."), how this broken-down time is *converted* to | an integer "time since the Epoch" value. That's not "in detail", It says (since the quote contains section and page numbers, this extract is from Issue 8 Draft 2.1, but the substance is unchanged from earlier versions): The relationship between the tm structure (defined in the header) and the time in seconds since the Epoch is that the result shall be as specified in the expression given in the definition of seconds since the Epoch (see XBD Section 4.17, on page 95) corrected for timezone and any seasonal time adjustments For that to mean anything at all, we need to look at XBD 4.17: A value that approximates the number of seconds that have elapsed since the Epoch. A Coordinated Universal Time name (specified in terms of seconds (tm_sec), minutes (tm_min), hours (tm_hour), days since January 1 of the year (tm_yday), and calendar year minus 1900 (tm_year)) is related to a time represented as seconds since the Epoch, according to the expression below. The first thing to note is that this only applies to UTC times. [That's one reason why using gmtime() and timegm() for adjusting time_t values makes much more sense]. If the year is <1970 or the value is negative, the relationship is undefined. If the year is 1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the C-language expression, [...] 
I'm not going to quote the expression here (anyone interested can look it up for themselves) but again it is clear that this applies only to UTC times. It says so. XBD 4.17 goes on to say: The relationship between the actual time of day and the current value for seconds since the Epoch is unspecified. This was discussed briefly before, and you claimed that all this means (paraphrased here by me) is that the system's clock (what time() returns) and the real world precise time of day aren't necessarily the same (ie: there's no promise that systems are running NTP or similar). That's certainly implied by that sentence, but that's not all that it says - it is quite explicit that there is no specified relationship between "actual time of day" (ie: local time) and the "seconds since the epoch" value. Note: not no relationship (obviously there is) just that that relationship isn't specified by the standard. So where exactly is this "in detail" specification of how a local time (with all of its peculiarities) is supposed to be converted to seconds since the epoch? | When the standard says "shall be set to represent the specified time since | the Epoch" it is talking about the integer
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Mon, 12 Dec 2022 12:02:39 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | The above misrepresents my claims in a few respects. | | "the POSIX standard precludes an implementation from returning an error" | | I only claim this for TZ values that do not begin with a colon. But you won't allow an error value for the other cases, so in effect, you're precluding it, whatever the TZ value might be. If an error is ever possible, then it is always possible. Applications can't be written to only work when the TZ value is the one form which is practically useless. They need to work for all possible (correct) TZ values, not just the one useless particularly defined case. | I assume you are basing that on the C committee's response to DR #136. Since that is what they said. Yes. | There is much historical inaccuracy here and your conclusion is wrong. All of that is certainly possible. | Although it requires time_t to be an arithmetic type, the C standard | does not require that it is possible to do arithmetic with time_t (and | this is not changing in C23). Oh. That is a pity. Fortunately, in POSIX, time_t is much more restricted, and arithmetic works (and people use it all the time). | The mktime() and difftime() functions are the only way strictly | conforming C programs can do arithmetic involving time_t. OK. difftime() is fine. mktime() as currently specified is useless. As implemented however, it mostly works, though to use it to do arithmetic on a time_t one needs to be particularly careful, as it doesn't obey the normal rules of arithmetic. C23 is apparently going to have timegm() (the mktime() equivalent for UTC instead of localtime). Using gmtime() modifying the struct tm, and then timegm() to get the time_t back would work much better, at least if the specification of timegm() is better than that of mktime() (I haven't seen it). I know it is getting very late in the process, but perhaps we should also be adding timegm() now. 
| By a strict reading, you may be right, but it is strongly implied by | "shall be set to represent the specified time since the Epoch". That's fine when the specified time (that is, the time passed in in *timeptr) is a time that exists. But there's nothing that says what month 97, mday 312, minute -1234, hour 999, second -23456789, year (anything that doesn't cause time_t overflow for the implementation) tm_isdst anything represents. If you can find something somewhere that specifies what that means, in the C or POSIX standards (or just about any other standard you care to reference) then great. mktime() allows that input, but I see nothing that says which particular time_t value should be returned. You might be imagining how an implementation might deal with this, as can I, the two might even be the same - but it is certainly not specified anywhere. | In any case, it is being clarified by bug 1613. Unless you made more changes there than I thought, no, it isn't. The extra text that was added there just says what the returned struct tm (in *timeptr) must be, in relationship to the time_t returned. It says nothing at all about how that time_t is selected. | This would definitely not meet the requirement "shall be set to | represent the specified time since the Epoch". Of course it could. If the time passed in contains out of range values, there is no defined meaning that can be attributed to them. If you can find somewhere where that's stated, then please, enlighten us. | Already being fixed by bug 1613. No. | > Where in the standard does it even hint at any of those changes being more | > acceptable than any other? [Hint: it doesn't.] | | Of course it does. It requires that a time since the Epoch is calculated | from the supplied broken-down time, Yes, but one cannot calculate a time since the Epoch from out of range values. It simply doesn't work. 
If you are believing that you can simply apply the formula in XBD whatever (which is defined for in range UTC values) then you're mistaken. When you're considering this, do note that the values of the fields of the struct tm passed in might all be MAXINT - if time_t is a 64 bit type (which it usually is these days) and int is 32 bits (also still very common) then that won't actually overflow the time_t, but it would cause overflow for the calculations in that formula. | and then requires (on successful completion) that the fields in the | broken-down time are updated to | "represent the specified time since the Epoch". Yes, this part is not controversial. | Your suggested other adjustments would not represent the time since | the Epoch that is going to be returned. Of course it would, the adjustments are made to create a struct tm that only contains in-range values, and then from that a time_t is produced. The two match perfectly. | Huh? The struct tm
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Thu, 8 Dec 2022 11:22:04 -0800 From:"Don Cragun via austin-group-l at The Open Group" Message-ID: <7fd37609-74ff-42f5-a974-76c7010ee...@sonic.net> | I agree with Geoff. I actually don't think you do, not really. You might not agree with me, or not now, but your argument is nothing like his. Geoff is claiming that the POSIX standard precludes an implementation from returning an error. And further, that remaining compat with the C standard (which does allow an error return) somehow precludes POSIX from also allowing one. Those arguments are nonsense. Yours is much more reasonable, though I believe ultimately reaches the wrong conclusion. You might remember, but perhaps not, that in a postscript to a message I sent on Nov 25, I said: ps: there is more wrong with the mktime() specification than just this issue - this one was supposed to be the simple one, not contentious at all, I expected. I expect much the same for some other problems, but given what happened here, who knows? Most of the other problems are really C specification problems of course, and should really be fixed there (but I have nothing at all to do with that group). You are hitting on two significant issues that that postscript referred to. The first, and more significant one in your argument was identified in: https://austingroupbugs.net/view.php?id=1614#c6032 the (very long, I'll admit) bugnote to issue 1614 which was the precursor of the mailing list discussion (moving this from bugnotes to the list was the right thing to do - even if for no other reason than that sending e-mail is much more enjoyable than dealing with mantis, even when needing to remember to explicitly override the "Reply-To" that the list absurdly adds).
In that note I said: That is, the "other components" (which means all of the relevant ones, just not tm_wday and tm_yday which are irrelevant here) are set to represent the specified time since the Epoch (that is: the time specified by the caller of mktime()) but with any out of range values (according to what is specified in <time.h>) adjusted so that are in range (and while it does not say so, and probably should, I would interpret that to also mean not having 31 days in November, even though 31 is within the range permitted for tm_mday in <time.h>) but it doesn't say that they can be adjusted for any other reason. That is, the de-jure standard clearly allows Feb 29, 2023 as a valid struct tm (as it would Nov 31) but all the implementations know that isn't what is really intended, and are more restrictive than the standard requires - whether by doing so they are actually violating the standard is hard to say. That's the first issue, which you encounter here: | If we accept Robert's argument, then it isn't just gaps in time caused | by a timezone shift that would be affected. Before we continue, go back and re-read what you wrote (accurately I think) about the issue here: Robert & Geoff have been arguing about whether or not giving a struct tm to mktime() that specifies a time in the gap between standard time and daylight time is allowed to be treated as an error I'll admit, that when I submitted the bug report, I thought as you wrote in the following paragraph Robert is arguing that if (after adjusting other fields to bring them into the ranges specified in <time.h>) mktime() should return an error if ... as I really could not (still cannot really) see any possible justification for acting differently, and couldn't imagine an implementation actually behaving otherwise. But since it is now clear that some implementations do simply invent a time_t to return, I have since changed my stance to be more like that in your first paragraph, "allowed to be treated as an error".
I said as much in my most recent (before this) message to the list on this topic (6 Dec): Just agree to add the EINVAL error code, make it a "may fail" if you like, I am no longer expecting an outcome where anyone is required to return the error, just one where the error is possible - and why that entirely fits with your argument you will see soon. | As an example, if I call time() on January 30, 2023 at noon, For your argument, you wouldn't want to do that, you'd want it to be Jan 29. | it will return a struct tm with tm_mon set to 11 (it has a normal range | of 0-11) and tm_mday set to 29 (it has a normal range of 1-31). No it wouldn't, tm_mon would be 0, and (for Jan 30) tm_mday would be 30. But those values don't really matter to the point of your example, so those errors are irrelevant. | If I then add 1 to tm_mon and call mktime() with the resulting struct tm, | I'm asking mktime to give me back a struct tm for noon on February 29, 2023. Indeed, you would be. And this kind of issue (even with the result you're expecting, and which implementations deliver, is exactly why I don't believe
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Tue, 6 Dec 2022 12:01:52 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | You have completely | ignored my earlier email (austin-group-l:archive/latest/35115) where | I stated that for TZ values beginning with a colon, the timezone | information used by mktime() is implementation-defined and therefore | this creates the same loophole that exists in the C standard, I don't agree that it is a loophole, I don't think the C standard views it that way either (or that that is their reason for allowing an error return) - but regardless of any of that, what matters is that mktime() has to work with those implementation defined TZ values (and the new one being added in Issue 8) - and if an error return is agreed as possible in those cases, mktime() should be specifying that possible error, shouldn't it? The more recent part of this discussion follows on from your assertion that to allow an error return would break compatibility with the C standard, and so cannot be done in POSIX. That's utter nonsense, and is relying (from what I can tell) upon your (from what I can tell) unsupported opinion of the reasons that the C committee decided that errors are possible (in both the weird cases). All that really matters is that errors are possible in C, nothing in POSIX forbids an environment in which those errors cannot happen, hence the error must be allowed to occur in POSIX mktime() as well. | This has been a lengthy thread and I have been assuming that if I | quoted something from the standard earlier in the thread I don't need | to quote it again. Sorry, but I, and I expect everyone else, don't have the time to go back and reread all of your previous messages, and try to guess which quote there (when there was one) might be the one you're intending to rely upon now. Just include the text (or an explicit reference to it) - if you're thinking of it when creating a message, you know exactly what that is, and a cut in those circumstances is simple.
| For TZ values that do not begin with colon, You mean Issue 7 TZ values which do not begin with a colon. We're working on Issue 8 now, and that is going to have TZ values not beginning with a colon which are not nearly as precisely defined. But none of that matters, mktime() has to work with any (properly set) TZ string (ie: TZ=/etc/passwd is probably not going to do much useful). Not just the archaic (functionally useless) TZ strings that POSIX has defined all this time. But certainly TZ=:Asia/Singapore with a local time right near the end of 1981 or very early in 1982 (depending upon whether the correct, or erroneous, data is available) must work (as in, there is no such local time, and hence there cannot be a time_t to represent it). That a non-colon archaic TZ definition cannot describe that transition is irrelevant. | the description of TZ in XBD 8.3 gives precise rules for the adjustments: It does. And that creates gaps (in localtime), during which there is no stated offset. That is (using the default one hour for this) at UTC time N, local time is M, and the offset is M-N (or N-M depending upon how you're thinking about it at the time). At UTC time N+1 local time is M+3601, and the offset is M+3600-N (or ).Local time M+2400 simply does not exist, its offset is neither M-N nor M+3600-N and there is nothing, anywhere in the standard, which says it has to be one or the other (which, again, would be absurd, as things which don't exist don't have attributes). [Aside: I know that in that, I am using what is more or less a time_t representation of local time, which doesn't actually exist - but without that the concept of the offset makes no sense at all - the interpretation of a local time_t M is that which would appear if the UTC time_t M was converted to a broken down (struct tm) time, then considered local.] 
| > Lines 43855 to 43858, page 1311, in XSH 3 - mktime(): | > | > A positive or 0 value for tm_isdst shall cause | | This wording is taken from the C standard, Yes, almost all of the wording in mktime() in POSIX is directly from the C standard - about all POSIX changes is "calendar time" to "seconds since the epoch" (which is just different wording, and means the same - though the POSIX version is much better) and the addition of errno. | where it is necessarily vague | because of the implementation-defined nature of local time and DST there. But so it is in POSIX - you cannot assume in mktime() (or localtime(), or any of the others) that only the archaic POSIX TZ string is being used. You certainly wouldn't want to, as just about no-one uses that nonsense any more, it simply doesn't work. But that's also presuming the intent of the C standard authors, and one which I doubt is correct - it is just as likely that it is vague because implementations differed, or perhaps even it isn't really vague at all, and says exactly what is
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Mon, 28 Nov 2022 09:35:25 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | When the standard is silent about something, requirements that | *are* stated still apply. Sure, but only for requirements that are actually stated. Here the things you believe to be stated requirements, don't seem to be nearly as obvious to me ... but it is hard to tell, as you almost never bother quoting the words from the standard that you claim say what you believe to be stated. That makes it hard to refute, as I have no idea what I am expected to argue is different - just some random "the standard says" without any clue what part of it you mean. So, from here on out, unless you actually quote the words that you're relying upon, I am going to ignore your arguments, and ask that everyone else take them with a grain of salt as well. | In this case, it is clear from the use of "any" in "corrected for | timezone and any seasonal time adjustments" that either a seasonal | adjustment is made or the value resulting from the timezone adjustment | is used without making a seasonal adjustment. This is better - we know which words you're relying upon here, and how you have managed to mangle what the standard actually says to fit with your preconceived view of what it should mean. The text you quoted there is from (in Issue 8 D 2.1) on page 1311, lines 43862 to 43863 (in XSH 3 / mktime()). (The same thing is in earlier versions, this just happens to be the version I picked to reference today, perhaps I should have used C181, but too late now...) Now lets analyse what that actually says: "corrected for timezone" which you ignored, but seem to be treating as if it said "modified by the offset from UTC of the timezone", which it does not, if it had meant to say that it could have said that. The only (not very good) definition of "timezone" I can find is in XBD 8.3 where it specifies TZ, which says TZ This variable shall represent timezone information. 
(page 161, line 5613 ... all references in this message will be to I8 D2.1) and then later says (lines 5621-3 same page) If TZ is of the first format (that is, if the first character is a <colon>), the characters following the <colon> are handled in an implementation-defined manner. So the definition of a timezone can be implementation defined - that is, everything about it, can be implementation defined, as the standard doesn't seem to specify anything (which is not really a surprise, as POSIX has no particular influence over lawmakers who get to define how time works within their jurisdiction - but POSIX systems need to be able to work, and show some semblance of what is considered to be the correct local time, whatever those lawmakers deem to be appropriate). OK, next from your quote: "and any seasonal time adjustments" which you then paraphrased as: "either a seasonal adjustment is made or the value resulting from the timezone adjustment is used without making a seasonal adjustment." Don't you see just how myopic that is? In your mindset, you see a nice regular timezone which has a nice fixed offset from UTC, and perhaps at some point a once a year alteration of that offset slightly, and then, also once a year, an adjustment back again. Isn't it clear, even to you, that the "any...adjustments" is plural, and you made it singular "a seasonal adjustment" in your variant of what it says. There is no specification anywhere about how many seasonal adjustments there might be, or what those might look like. That they might not be able to be represented in a traditional (pre issue 8) TZ variable using non implementation defined syntax means nothing. Note that the timezone (when specified with the ':' syntax for TZ, and also in the newer syntax being added in I8) is never "undefined" or "unspecified" - just implementation defined.
mktime() isn't excused from working with an implementation defined timezone specification, it needs to work with those as well, and such a thing does not necessarily have the nice neat form that you're expecting timezones to be like - and that the majority of the world's timezones (today) are nice and neat is irrelevant. (Think to systems using solar time for the local time, where the local time is set based upon sunrise each day - POSIX needs to work in that kind of environment as well, even if there might be none of those left, right now). You go on to claim: For times in the gap, the standard does not say which of these choices to make, so it is unspecified whether a seasonal adjustment is made or not, but those are the only two allowed behaviours. which I hope that you, and everyone else now, can see is absurd. There is not "a seasonal adjustment" that can be applied or not, there are many possible implementation defined adjustments that could be applied,
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Fri, 25 Nov 2022 13:17:36 + From:Harald van Dijk Message-ID: | Does POSIX actually specify the seasonal | adjustment, if applied, has to be 1 hour? No, it doesn't - that's just the default (as it is most common) if an (old style) POSIX TZ string doesn't specify the offset to be applied to summer time. It does specify the 1 hour default though. There's no problem with this part of the TZ specification (other than that TZ strings cannot possibly represent all of the world's timezones - the limit on the offset is 24 hours (ahead or behind UTC), and there's no way to specify an alteration of the zone's offset, other than the seasonal variation (summer time). kre
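As a small illustration of that default, here is a deliberately simplified Python sketch that pulls the two offsets out of an old style POSIX TZ string and applies the one hour default when the summer time name has no offset of its own. The function name and regular expression are mine; quoted <...> names are not handled, and any rule after the comma is accepted but ignored. Offsets are returned in seconds, positive west of Greenwich, as in the TZ syntax.

```python
import re

_OFFSET = r"[+-]?\d{1,2}(?::\d{2}(?::\d{2})?)?"
_TZ_RE = re.compile(
    r"^(?P<std>[A-Za-z]{3,})(?P<stdoff>" + _OFFSET + r")"
    r"(?:(?P<dst>[A-Za-z]{3,})(?P<dstoff>" + _OFFSET + r")?)?"
    r"(?:,.*)?$"
)


def _to_seconds(text):
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("+-").split(":")]
    parts += [0] * (3 - len(parts))
    hours, minutes, seconds = parts
    return sign * (hours * 3600 + minutes * 60 + seconds)


def parse_tz(tz):
    """Return (std, std_offset, dst, dst_offset); dst parts may be None."""
    m = _TZ_RE.match(tz)
    if m is None:
        raise ValueError("not a simple POSIX TZ string: %r" % tz)
    std_off = _to_seconds(m["stdoff"])
    if m["dst"] is None:
        return m["std"], std_off, None, None
    if m["dstoff"]:
        dst_off = _to_seconds(m["dstoff"])
    else:
        # No offset given: summer time is one hour ahead of standard time.
        dst_off = std_off - 3600
    return m["std"], std_off, m["dst"], dst_off
```

So parse_tz("EST5EDT") gives a summer time offset of 14400 (4 hours), while parse_tz("EST5EDT4:30") takes the stated offset instead of the default.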
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Thu, 24 Nov 2022 15:49:49 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | Combining the above with the TZ rules, if TZ=EST5EDT then POSIX requires | that mktime() calculates seconds since the Epoch as specified in XBD 4.16 | then applies a timezone adjustment of 5 hours and (depending on tm_isdst | and the specified date and time) a seasonal adjustment of 1 hour (with | implementation-defined start and end times, but we can eliminate that | by including a rule in the TZ value). There is nothing unspecified | here at all. I could (if I needed) dispute more of your message than this, but there is no need, this is enough for the purpose here. In the case where tm_isdst == -1 (which is the relevant one here) and where the broken down time referenced by timeptr specifies a time in the gap, that is, a time which never existed (or ever will) and so is not summer time, and is not standard time, it is not any kind of local time at all (except erroneous) and the application has not told us which to pretend it should be, where exactly is the specification of which offset is supposed to apply? Don't bother hunting, there is none, and as you have said on various topics many times, that which is not specified is unspecified. Note that it is not unspecified whether it is the standard time offset, or the summer time offset, it is simply unspecified. So, there is something here unspecified, and if the application invokes unspecified behaviour, the implementation is free to produce any result that pleases it, right? Hence an error return is acceptable. And if that is true, an errno value ought to have been assigned.
Further, in the case where tm_isdst == -1 (still the relevant one) and where the broken down time referenced by timeptr specifies a time in the foldback period (ie: a local time which occurs twice (or more perhaps) with different offset values, the application has not told us which they prefer (and in some cases, have no way to achieve that anyway, as both before and after the fold (or gap in the other case) tm_isdst==N (where N is 0 or 1, but the same in both cases) where is it specified which offset is to apply. Again, it isn't. So this is also unspecified, and consequently ... | This could perhaps be the basis for a compromise solution. NetBSD | could return -1 for times in the gap when TZ begins with a colon, I am not interested in making NetBSD conform, that's not the point of this, if the specification is rational, then we will conform as we generally do. When POSIX is irrational, we simply ignore it. What matters here is that the specification makes sense, and conforms with the C specification as much as possible. Requiring implementations to produce erroneous answers would not be a specification which makes sense, so we would simply ignore it if that is what happens here. Hopefully, others in the decision making process will see this issue for what it is, and sanity will prevail. kre ps: there is more wrong with the mktime() specification than just this issue - this one was supposed to be the simple one, not contentious at all, I expected. I expect much the same for some other problems, but given what happened here, who knows? Most of the other problems are really C specification problems of course, and should really be fixed there (but I have nothing at all to do with that group).
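A minimal sketch of how an application might notice that a wall-clock time with tm_isdst == -1 fell in the gap, whichever way the implementation chooses to behave. The name probe_wall_clock() and the enum are hypothetical, the caller is assumed to have set TZ and called tzset() already, and the inputs are assumed to be in range (so any change to the fields is the implementation's doing, not ordinary normalisation).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

enum wall_status {
    WALL_OK = 0,        /* mktime() accepted the fields unchanged */
    WALL_REJECTED = 1,  /* mktime() returned (time_t)-1 */
    WALL_ADJUSTED = 2   /* mktime() silently moved the wall-clock time */
};

/* Probe an in-range local wall-clock time with tm_isdst == -1.
 * Treating (time_t)-1 as an error is only safe for dates far from
 * 1969-12-31 23:59:59 UTC, which is assumed here. */
enum wall_status probe_wall_clock(int year, int mon, int mday,
                                  int hour, int min)
{
    struct tm in = {0};
    struct tm out;

    in.tm_year = year - 1900;
    in.tm_mon = mon - 1;
    in.tm_mday = mday;
    in.tm_hour = hour;
    in.tm_min = min;
    in.tm_isdst = -1;

    out = in;
    if (mktime(&out) == (time_t)-1)
        return WALL_REJECTED;

    /* A time in the gap cannot come back with the same fields. */
    if (out.tm_mon != in.tm_mon || out.tm_mday != in.tm_mday ||
        out.tm_hour != in.tm_hour || out.tm_min != in.tm_min)
        return WALL_ADJUSTED;

    return WALL_OK;
}
```

With TZ=EST5EDT,M3.2.0,M11.1.0, the time 2022-03-13 02:30 is in the gap, so the probe reports either WALL_REJECTED or WALL_ADJUSTED depending on the implementation. A time in the foldback (2022-11-06 01:30) reports WALL_OK, as either of the two candidate answers reproduces the same fields.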
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Tue, 22 Nov 2022 12:49:13 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | Having returned refreshed from my break, I have re-examined this issue | and I now have a clear understanding of why the C standard allows | mktime() to return -1 for times in the gap but POSIX does not. I sent Geoff a much longer reply to this message than this one - but once again neglected to add a cc to the list. He's welcome to forward that message if he feels inclined. It touched upon almost all the points of his message (you will have seen from my earlier reply here today, that the tzdata error will be corrected) - but it really just boils down to this. | Okay, let's examine the text in C89/C90: | | The mktime function converts the broken-down time, expressed as | local time, in the structure pointed to by timeptr into a calendar | time value with the same encoding as that of the values returned | by the time function. | [...] | | Returns | The mktime function returns the specified calendar time encoded as | a value of type time_t. If the calendar time cannot be represented, | the function returns the value (time_t)-1. | | (In C99 and C17 it is the same except for additional parentheses | around "-1"). | | This wording is almost identical to POSIX, except for "shallification", | the use of "time since the Epoch" in POSIX instead of "calendar time" in | C99, and the POSIX requirement to set errno. Yes, they are essentially the same, hence if -1 is allowed from C, it is also allowed for POSIX. | However, there is a big difference in the requirements that arise from | these almost identical wordings, and that is because local time and DST | are implementation-defined in C, but in POSIX they are not. XBD 4.17 The relationship between the actual time of day and the current value for seconds since the Epoch is unspecified. 
POSIX specifies that local time needs to exist, and that summer time is possible, and provides a mechanism to indicate when summer time begins and ends (if it exists), but that's it. Everything else, as it says there, is unspecified. | In order for a non-POSIX implementation of mktime() to return (time_t)-1 | for a time in the gap, all it has to do is define local time and DST in | such a way that times in the gap are converted to a value that cannot be | represented in a time_t. For example, it could say they are converted | to UINT64_MAX if time_t is a signed 64-bit integer type. Then the | requirement in the C standard would kick in, requiring mktime() to | return (time_t)-1 because UINT64_MAX can't be represented in that time_t | type. I very much doubt that's the reasoning they used, but if they did, the exact same reasoning is available in POSIX, with the exact same conclusion, and as POSIX is deferring to the C standard (very explicitly in the case of mktime()) if C says that -1 is an OK return, then -1 is an OK return. | This "loophole" is not present in POSIX because local time and DST are | not implementation-defined. Nonsense. See above. (What they are is explicitly unspecified, which is even looser than implementation defined.) The time_t value cannot be represented, as it does not exist, or is ambiguous, and the implementation has been given no guidance which of several possible values applies. In either case, returning -1 is an entirely reasonable thing to do, and much better than picking some random time_t value and returning that instead. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Tue, 22 Nov 2022 12:49:13 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | Because when the change happened, 1981-12-31 23:30:00 in the old time | zone became 1982-01-01 00:00:00 in the new timezone. That's now been confirmed from other sources, the next tzdata release will contain the fix (with credit to Geoff of course). No updated release just for this, after being incorrect essentially forever (well, really forever) it can wait until a new release is needed for some currently relevant update. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Thu, 10 Nov 2022 12:33:47 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | You are the one requesting a radical change to the standard. Actually, I am not. That's why I am trying to get you to explain where the standard says anything that even permits the behaviour you're claiming it mandates. When I submitted this bug report (#1614) I assumed this one would be a simple no brainer, no controversy at all, unless perhaps someone reported an implementation which returned a different error than EINVAL. If anything, I wondered if the other two bug reports I submitted at around the same time (#1612 and #1613) might have generated some debate (not that I was expecting much resistance there either.) It seems clear to me that the current standard allows an error return in the cases in question - that because the C standard allows an error return POSIX defers to that (except where it says otherwise) and I see nothing in the POSIX version of mktime() which says it is to be different in this area. If anything it is you who is proposing (or perhaps postulating, there's nothing arising to the level of a proposal yet) a radical change to the standard (as in what is published). | If you fail to convince the group to make | your proposed change, then by default the status quo will remain. That would actually not bother me very much. The status quo allows an error return, implementations are permitted to use different error codes when needed, so using EINVAL isn't necessarily wrong, and EOVERFLOW would certainly be acceptable (if not really a very good idea.) | In saying this you have demonstrated that you did in fact lose the | context. The context *was* an example I gave of an application that | calls localtime(), increments tm_mon, sets tm_isdst to -1 and calls | mktime(). Sure, I know that, but you're missing/avoiding the point. That is that mktime() cannot know that. 
All mktime() sees is what is in the struct tm passed to it (and the timezone, but that's a constant for this purpose). The exact same struct tm could have been produced in a case where localtime() returns the following month, the application decrements tm_mon, sets tm_isdst to -1, and calls mktime(). Or the exact same struct tm could have been produced by an application which calls strptime() to initialise the struct tm (even including tm_wday and tm_yday if you insist on that - though mktime() is not permitted to look at those) then sets tm_isdst to -1, and calls mktime(). You seem to be of the opinion that mktime()'s prime purpose is to allow people to increment time fields, and get a time_t back. Almost as if that is its only use. While I can see that as one of the use cases, I doubt it comes close to the number of uses of mktime() being used to generate a time_t from a calendar representation (in some format or other, RFC822 format (mail Date: headers), ISO format, many others) in all of which failing to produce the correct answer (and allowing a time which doesn't exist through without error) is simply wrong. Further down I will show (assuming I remember to include it, by the time I reach the end) an example of the kind of thing that can happen if code is written in the sloppy way that you seem to insist that application code writers write (apparently) large volumes of code - and which you seem to be planning on changing POSIX to explicitly allow (or require) to happen. | This is just another way of stating your preference for your | idealistic notion of correctness over the pragmatic solution that | almost all implementors have chosen. And that will show that the pragmatic solution is broken, at least in some cases. You might (when you see it) claim that no-one actually does things this way - but what exists that would suggest that it is any different in a material way than the examples that you claim that many applications are using? 
But that is for later, I mention it here just to reinforce the point that being correct is important, allowing "close enough" (but wrong) isn't really ever acceptable, anywhere. Or at least not without the application informing the implementation in some way that an approximation is all that matters. | Indeed it is, but mktime() does not have any equivalent requirement | and so both of the valid answers are allowed. Yes, and as I have previously said, the case where there are two valid answers (while I do not much like it) is one where I can accept the implementation simply picking one. The case I object to is the one where there is no valid answer. | Yes, it would mean doing two mktime() calls every time. And the fact | that nobody does it shows that nobody cares if they occasionally get | an answer that is one of two valid answers. I suspect it is more likely that nobody can even conceive of the possibility that the code might be permitted to return different answers (at a whim) to one
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Tue, 8 Nov 2022 18:15:20 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | We are going round in circles. You already asked that (probably in | different words) and I already answered it. Which implies that your answer did not convince me. Just saying "already answered" doesn't help if the answer wasn't sufficient. | You snipped too much and lost the context. The "it" here was | referring to "the wall clock time", i.e. tm_hour, tm_min and tm_sec No, I know all that, no context lost. What you might be missing is that tm_isdst is also such a field. The standard just refers to the components of the structure in <time.h> (in the existing standard there are 9 of those) and then excludes 2 of them. The remaining 7 are all treated identically by the standard, there are no favourites. Any which are out of range are fixed. That certainly will include tm_isdst. The two relevant sections of the standard for this are: The original values of the tm_wday and tm_yday components of the structure shall be ignored, no issues with that part and the original values of the other components shall not be restricted to the ranges described in <time.h>. And that is all the other 7 - including tm_isdst. And: Upon successful completion, the values of the tm_wday and tm_yday components of the structure shall be set appropriately, again fine, no issues there and the other components which still includes tm_isdst shall be set to represent the specified time since the Epoch, but with their values forced to the ranges indicated in the <time.h> entry; (the remaining clause isn't relevant). That says that tm_isdst should be set to 0 if the "specified time since the epoch" represents standard time, and 1 for summer time, it cannot mean anything else for that field. | (which came from a localtime() call and therefore are in range | when passed to mktime().) Perhaps in some example that you're imagining that might be true, but mktime() cannot assume that.
It has no idea how the struct tm was constructed, or what kinds of values it might contain. mktime() allows anything. | You can't read the EOVERFLOW description in isolation; it needs the | RETURN VALUE section for context, which says: | | The mktime() function shall return the specified time since the | Epoch encoded as a value of type time_t. If the time since the | Epoch cannot be represented, the function shall return the value | (time_t)-1 and set errno to indicate the error. | | So it is talking about the "time since the Epoch" not being representable | in a time_t. It does not apply to a broken-down time (struct tm) not | being able to be converted to a time since the Epoch. But for a time that cannot be represented, just like a NaN in floats, we would need a value to put in the time_t to indicate that no normal value exists - but time_t has no such value (someone could have defined (-MAXINT - 1) (where MAXINT should really be MAXTIME_T but I don't recall ever seeing one of those) as a time_t value representing "invalid", but that's never been done). Lacking that we have a situation where the appropriate value cannot be represented as any time_t value. To me that would entirely fit within the specification of EOVERFLOW (though it would be a pity were we forced to go that route). | In fact the phrase "the specified time since the Epoch" carries with | it the implication that the information passed to mktime() (in a | struct tm) always specifies a time since the Epoch, i.e. it can always | be converted to an appropriate numeric value. I don't see that implication - but if it was there, the effect would be to outlaw all timezone variations (no seasonal changes permitted, no zone ever permitted to alter its offset) as that's the only way to guarantee that every "broken down" (ie: wallclock time in a struct tm) that can possibly exist (once normalised) represents a valid time.
| The vast majority of implementations do not return -1 for times in the gap, | including the libc implementations on all of the most used POSIX/UNIX | systems. So you keep saying. Over and over again. I don't care. What I care about is what the standard, as written now, actually requires of an implementation. Until you can quote the words from the standard which support your position, then we're getting nowhere. Should it be decided to tear up the current mktime specification, and start all over again, then at that point it might be appropriate to look at what implementations do, and what applications want. We are not at that point yet. It is now quite clear that the C standard allows error returns (because of the input value being a time that cannot exist, or which would result in an ambiguous result, when there is no guidance as to which is wanted provided in the call) - and there is nothing I can see
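Since time_t has no value reserved to mean "no such time", an application cannot tell a failed mktime() from a successful one by the return value alone: (time_t)-1 is also the valid result for one second before the Epoch. A minimal sketch of one common workaround follows, relying on the requirement quoted above that tm_wday is set on successful completion. The name checked_mktime() is hypothetical, and the fallback to EOVERFLOW when errno was left untouched is an assumption of this sketch, not something the standard says.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Call mktime() and report success or failure out of band.
 * Returns 0 and stores the result in *out on success; otherwise
 * returns an errno-style code and leaves *out untouched. */
int checked_mktime(struct tm *tm, time_t *out)
{
    time_t t;

    tm->tm_wday = -1;   /* sentinel: a successful call overwrites it */
    errno = 0;
    t = mktime(tm);

    if (t == (time_t)-1 && tm->tm_wday == -1)
        return errno != 0 ? errno : EOVERFLOW;  /* assumed fallback */

    *out = t;
    return 0;
}
```

A caller can then pass in any struct tm and inspect the returned code (EOVERFLOW, EINVAL, or whatever the implementation chose) without guessing from the time_t value.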
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Tue, 8 Nov 2022 15:24:21 + From:Austin Group Bug Tracker Message-ID: <60fe2d9d8f9a9039da59e45877c42...@austingroupbugs.net> | Here's where we disagree. As you say, negative tm_isdst means DST | information is "not available"; however, there is nothing in the normative | text that says how mktime() must behave when it is told that DST | information is not available. The footnote is what does that, but it's | non-normative. That could perhaps be the reason that the C committee apparently agrees that it is acceptable for the implementation to return -1 in cases where it is necessary to be told whether summer time is to be treated as applying or not. There's no normative (or other) text in POSIX that says how mktime() must behave when it is told that summer time information is not available either. Hence, in line with the C determination, it should be possible for a POSIX compliant implementation to return -1 in these cases. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
[Forwarded on the request of kre.] Date:Mon, 7 Nov 2022 12:31:33 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | It should behave the same as for tm_isdst=0 or for tm_isdst=1, whichever | it deems the most appropriate. And if it decides that it is not its job to make that decision? What would be its basis for choice, and why? And where in the standard (C or POSIX) is there anything which actually says this is supposed to happen? All it says about mktime() with tm_isdst = -1 is that it attempts to work out whether summer time applies or not (except that it insists on calling it by the idiotic US label). | With tm_isdst=-1 it will only change on the rare occasions when | the calculated time is in the gap. Nonsense. mktime() allows for out of range values for all of the (then existing) fields of struct tm except tm_wday and tm_yday, which it ignores. tm_isdst is such a field. If I set tm_isdst=1 TZ="UTC" set suitable (for this purpose, entirely in range, say specifying 1970-01-01 00:00:00) values in the other 6 fields that struct tm contains (which mktime() uses) then the result MUST contain tm_isdst = 0 as the "other fields" (not being tm_wday and tm_yday) are required to be adjusted (forced) to be in range (and everyone knows, that means, to suit the time/date actually represented, not just the stated limits in <time.h> - we don't allow Feb 31 to be returned, ever). The result from mktime() of that struct tm should be 0 (it is after all the Epoch) and should have tm_isdst == 0. You already stated as much, as you agreed that the results of localtime() on the time returned (assuming no errors) and the struct tm that mktime() requires must be the same (actually, to be correct, you stated, even demanded, that, and I agreed, but never mind) - and localtime() applied in a TZ="UTC" environment, to the Epoch time, must return tm_isdst == 0.
For anything other than ambiguous/impossible settings (ones which do not represent a single fixed time_t) the value in tm_isdst on entry to mktime() is irrelevant. It only does anything useful at all in the hard cases. The easiest of these is the "fold back" - when summer time ends. There if tm_isdst == 0 on entry, we select the time_t of the two which would produce the rest of the fields in the struct tm which also produces isdst=0; similarly if tm_isdst == 1. If tm_isdst == -1 then we have no way to guess which of the two was intended. The C standard is apparently clear that an implementation can return an error in that case, and there is absolutely nothing in POSIX to contradict it (except the missing error code, which is all this bug report was intended to fix). If the struct tm's 6 basic time/date fields represent a time "in the gap" then that's a time that simply doesn't exist. It isn't summer time, it isn't not summer time. It simply isn't. It is no different than asking whether "The fourth of Never" is summer time or not? A completely meaningless question. In this case however, to allow the use of struct tm as a way to perform time addition or subtraction, allowing tm_isdst to be used to inform the implementation what might have happened in that case is not unreasonable, and while there is nothing I can see, anywhere at all, in POSIX currently (don't know about the C standard) which says this should work (it is just kind of hinted at, very imprecisely, by the wording allowing input values in the incoming struct tm to be out of range) that specifies that this is intended to work, I have no problem allowing it. | If tm_isdst is >=0 then a time_t can always be produced (unless it | overflows). A time_t can always be produced, regardless. The question is whether it is the correct one. A mktime() that was simply time_t mktime(struct tm *tm) { /* normalise the fields of struct tm, code omitted */ return (time_t)0; } is returning a time_t ... but not the right one.
Returning the right one is surely the most important criterion here. Simply "return something because the test suite says you must" -- POSIX doesn't, as after all, all it says about EOVERFLOW is The mktime( ) function shall fail if: [EOVERFLOW] The result cannot be represented. Since a time_t does not have a value which allows it to represent the result "that time never existed nor ever will", it would be perfectly OK, according to the spec, for mktime() to return -1 with errno == EOVERFLOW in that case. There's really no question about that, and if you believe it is incorrect, please quote the language in the standard which contradicts it. The reason for the bug (#1614) is that implementations don't return EOVERFLOW in this case, they return EINVAL instead. The standard should reflect what (some) implementations actually do. | The application has no way of knowing whether the | specified time was in the gap. With tm_isdst=-1,
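A minimal sketch of the "fold back" case discussed above, where tm_isdst on entry is the only thing that separates two equally valid answers. It assumes an implementation that honours a non-negative tm_isdst when the wall-clock time is ambiguous; the name fold_time(), the TZ rule and the date are choices made for this illustration, and the function changes TZ for the whole process.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Convert the wall-clock time 2022-11-06 01:30:00, which occurs twice
 * under the rule below (once as EDT, then again as EST), using the
 * given tm_isdst value as the caller's hint. */
time_t fold_time(int isdst_hint)
{
    struct tm tm = {0};

    tm.tm_year = 2022 - 1900;
    tm.tm_mon = 10;            /* November */
    tm.tm_mday = 6;
    tm.tm_hour = 1;
    tm.tm_min = 30;
    tm.tm_isdst = isdst_hint;

    setenv("TZ", "EST5EDT,M3.2.0,M11.1.0", 1);
    tzset();
    return mktime(&tm);
}
```

fold_time(1) selects the earlier (summer time) occurrence and fold_time(0) the later (standard time) one, an hour apart. What fold_time(-1) returns is the point under dispute: one of those two, or an error.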
Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should
Date:Mon, 7 Nov 2022 12:31:33 + From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: I sent a long reply to Geoff (forgot to add a cc to the list) which I am hoping he will eventually forward here. One of the topics covered came from this: | > If you are suggesting that passing the return value of mktime() to | > localtime() could produce different struct tm member values than those | > returned by mktime(), then that can never happen, since mktime() is | > required to set them the same way localtime() does. In my not-yet-seen-on-the-list reply, I pointed out that there is nothing at all in the POSIX standard which says this (but I agreed that it should). I have since been sent a copy of the C standard for mktime (from what is claimed to be a very late draft of C99 - more or less identical to the final text, it is claimed). While a large section of the POSIX text is close enough to identical to the C standard, for its origins to be obvious (including one CX section which is word for word what the C standard says, though in a footnote, and thus which clearly should not be CX shaded, it is not any kind of extension or variation of the C standard), the POSIX text is completely missing this paragraph, which appears in the C standard: [#3] If the call is successful, a second call to the mktime function with the resulting struct tm value shall always leave it unchanged and return the same value as the first call. Furthermore, if the normalized time is exactly representable as a time_t value, then the normalized broken-down time and the broken-down time generated by converting the result of the mktime function by a call to localtime shall be identical. That should clearly be added. It has no real bearing upon our current discussions, as we were agreeing that it ought to be like this anyway, and still differing on other issues, but it is an obvious defect in POSIX that should be corrected. kre
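The C99 paragraph quoted above can be turned into a small self-check. This is a minimal sketch under stated assumptions: the names same_fields() and mktime_roundtrip_ok() are hypothetical, only the nine struct tm members named by the standard are compared, and (time_t)-1 is treated as failure, which is only safe away from one second before the Epoch.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Compare the nine standard members of two broken-down times. */
static int same_fields(const struct tm *a, const struct tm *b)
{
    return a->tm_sec == b->tm_sec && a->tm_min == b->tm_min &&
           a->tm_hour == b->tm_hour && a->tm_mday == b->tm_mday &&
           a->tm_mon == b->tm_mon && a->tm_year == b->tm_year &&
           a->tm_wday == b->tm_wday && a->tm_yday == b->tm_yday &&
           a->tm_isdst == b->tm_isdst;
}

/* Returns 1 if the property holds for this input, 0 if it is
 * violated, and -1 if mktime() or localtime_r() failed. */
int mktime_roundtrip_ok(struct tm input)
{
    struct tm first = input;
    struct tm second;
    struct tm back;
    time_t t1, t2;

    t1 = mktime(&first);            /* normalises "first" */
    if (t1 == (time_t)-1)
        return -1;

    second = first;
    t2 = mktime(&second);           /* must change nothing */

    if (localtime_r(&t1, &back) == NULL)
        return -1;

    return t2 == t1 && same_fields(&first, &second) &&
           same_fields(&first, &back);
}
```

Feeding it deliberately out of range fields (tm_mon == 12, tm_mday == 32 and so on) exercises the normalisation as well as the round trip.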
Re: [1003.1(2008)/Issue 7 0000375]: Extend test/[...] conditionals: ==, <, >, -nt, -ot, -ef
Date:Mon, 31 Oct 2022 19:03:53 + From:"Stephane Chazelas via austin-group-l at The Open Group" Message-ID: <20221031190353.ar33l2s6dwkor...@chazelas.org> | [ is perfectly fine after we deprecate -a, -o binary operators | and "(", ")". Which was done ages ago. test (or its '[' synonym) is just fine now. Wrt the current issue, I support the new operators being added to '[' that Chet Ramey mentioned in his message or note about that (those listed in the Subject of the bug) - with the exception of == which is just a meaningless frill that adds nothing useful at all (and isn't supported in many test implementations, unlike the others proposed, which are). As best I can tell there is no intent to add anything about the [[ extension that some shells have, so discussing that on this list isn't really appropriate. kre ps: note that not adding == (or any other proposed test operators) doesn't mean that any implementations that support them need remove that support, just that applications cannot rely upon those things working everywhere.
Re: [Issue 8 drafts 0001611]: exit status from fg is either badly specified or is simply wrong
Date:Mon, 31 Oct 2022 16:39:06 + From:Austin Group Bug Tracker Message-ID: | This is already being fixed by bug | https://austingroupbugs.net/view.php?id=1254, OK, thanks, that is fine. If I trusted my ability to conduct a search in mantis, I might have even found it, but that never seems to work for me. I just thought it better to ensure this wasn't just forgotten (or never even considered). Sorry for the noise. Feel free to close 1611. kre
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Date:Wed, 19 Oct 2022 08:26:46 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: | I can't see anything "a few lines earlier" that implies quotation-mark | needs to be escaped. Please give the exact wording change you would | like to see. I think Steffen is referring to: \" yields a (double-quote) character. the first bullet point in the (new) section 2.2.4, and that all he means to change would be to add to that sentence something like: , but note that the double-quote character is not required to be escaped to be included (just before the '.' that ends the existing sentence). kre
Re: [1003.1(2008)/Issue 7 0000767]: Add built-in "local"
Date:Mon, 08 Aug 2022 17:24:59 +0200 From:"Christoph Anton Mitterer via austin-group-l at The Open Group" Message-ID: <708410359c03bc0cfb89bfc29baaa9000b0d00b1.ca...@scientia.org> | Just wondered, whether it was ever considered to "simply" specify a new | keyword (e.g. "loc" or something more generic similar to bash's | declare),.. It isn't the keyword that is the problem, it is the desired behaviour, which depends upon the model for variables that the shell implements (or desires to implement). Until we can agree on what the objective is, there's no chance of unifying anything else. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1) utility
Date:Fri, 22 Jul 2022 07:58:47 -0500 From:"Eric Blake via austin-group-l at The Open Group" Message-ID: <20220722125847.tidcrt7a6ntvy...@redhat.com> | [If readlink is implemented as a shell builtin, then you could have an | extension where: | | readlink -v var -n -- "$name" If something like that were implemented, the -n would be a waste of space (there) the variable would always be assigned the value of the symlink, the -n is only to suppress the \n that is printed after that when writing it to stdout. The uses in cmdsubs you dissected are clearly not what -n is intended for (though I wonder if perhaps something similar in csh, if that ability is there - it has been so long since I looked at that - might have a different outcome). Aside from that possibility the only reason would seem to be the same as why echo (real ones) have -n (and trashy ones have \c) and why printf(1) needs a \n to print one ... there are times that it is useful to write a partial line to stdout (or wherever) and there's no reason that the output of readlink could not be intended to be a part of such a gradually constructed output line. kre
Re: [1003.1(2013)/Issue7+TC1 0001068]: Binding to a system-assigned port.
Date:Fri, 22 Jul 2022 09:20:55 +0800 From:"DannyNiu via austin-group-l at The Open Group" Message-ID: | Might I ask how did we resolve this? Just for the sake of record. | Or the next minute will contain these info? It probably will, but the message you're replying to contained a URL to the accepted text ... but as the URLs in the message you included in this reply are mangled beyond recognition, I can only assume that some protection from dangerous spam/phishing messages in your e-mail system is stopping you getting them. The URL was, with the https colon slash slash stuff stripped off, so that should not be a problem (except you will need to add that back): austingroupbugs.net/view.php?id=1068#c5902 But it is more or less (standards wording applied) exactly what you requested be done, bind to port 0, and the system picks a port for you (which is what systems actually do). kre
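A minimal sketch of the behaviour described above: bind an IPv4 TCP socket to port 0 and ask the system which port it picked. The function name bind_ephemeral_port() is hypothetical, the loopback address is used only to keep the example local, and the socket is closed straight away, so the port is released again when the function returns.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind to port 0 on the loopback address and return the port the
 * system assigned (host byte order), or -1 on failure. */
int bind_ephemeral_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);          /* 0: let the system choose */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);   /* the port actually assigned */

    close(fd);
    return port;
}
```

A real server would keep the descriptor open and call listen() on it; getsockname() is how it learns the port number to advertise.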
Re: [Issue 8 drafts 0001592]: Add %n$ support to the printf utility
Once again, mantis bit... what's in the e-mail is only a half complete, and fully unedited, version of what now appears in the note. Did I ever say what I think of mantis? kre
Re: [1003.1(2016/18)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files
Date:Wed, 13 Jul 2022 15:34:39 + Re: https://austingroupbugs.net/view.php?id=1273#c5885 | Does anyone know if any implementation has made changes to glob() in the | last three years? The last change to NetBSD's glob() was late May 2019, so not here. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001538]: what -s is poorly described, uses the word "quit"
Date:Tue, 21 Jun 2022 09:16:15 + From:Austin Group Bug Tracker Message-ID: <5c79b6e05af68bfbeaebf987e9c80...@austingroupbugs.net> | -- | (0005857) geoffclare (manager) - 2022-06-21 09:16 | https://austingroupbugs.net/view.php?id=1538#c5857 | -- | Suggested new resolution (note that bug | https://austingroupbugs.net/view.php?id=1563 already fixed STDOUT, so | this should just be about -s) ... Looks OK to me. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001538]: what -s is poorly described, uses the word "quit"
Date:Mon, 20 Jun 2022 15:02:55 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: WRT: | A NOTE has been added to this issue. | == | https://austingroupbugs.net/view.php?id=1538 | == | (0005854) kre (reporter) - 2022-06-20 15:02 | https://austingroupbugs.net/view.php?id=1538#c5854 | -- | Re https://austingroupbugs.net/view.php?id=1538#c5821 | | Apologies for the delay of this response. Also apologies for that - please ignore the e-mail, mantis decided to steal my note when I was half way through entering it. I updated the note (but mantis doesn't have the good manners to forward edited notes ... I can see delaying several (tens of perhaps) minutes after an edit, in case there's another correction immediately after, but it would be really nice to see updates to the text on the list, so we don't have to go fight with mantis quite so much. Would that be possible? Anyway, for this one, if you care (for most readers of this, I doubt that's true, "what" is a relatively insignificant command) you'll need to look at what's in mantis - nothing of substance made it into the part of the note that was e-mailed. kre
Re: POSIX gettext(): lifetime of returned values
Date:Wed, 25 May 2022 02:57:52 +0200 From:"Bruno Haible via austin-group-l at The Open Group" Message-ID: <5462894.CAdn2TfLgq@omega> | IMO, it's useful to distinguish bounded and unbounded memory leaks: | - A _bounded_ memory leak is one where the amount of leaked memory is | bounded by an a-priori computable constant. | - An _unbounded_ memory leak is one where such a bound does not exist. Personally I would first determine whether there is a memory leak at all. For this I like to imagine that we are using a garbage collecting memory allocator (no equivalent of free()) and ask whether such a system would reclaim any memory that has not been subject to free() while using C memory management. Alternately, can the memory be reached by following pointers from some visible starting point (whether in the app, or some library does not matter). If so, it is not leaked, even if never free()'d. If those tests do show a leak, then the above tests can help determine if it matters or not. But from your description, I'd assume (guess perhaps) that there is no leak at all in what you have described, in which case that classification scheme is irrelevant. I would also guess that a side effect of the way it was described is that changes to the on disc backing store (the .mo file, or whatever) will not be detected while the application remains running, and that aside from execing itself to restart clean there is no way for an application designed to run forever to ever see updated data. If that's not the case, then given the guarantees you seem to be making about the lifetime of returned pointers, it looks like a memory leak would be unavoidable. Consider one thread which does gettext() after which you have no idea when it might use that pointer again, while another keeps changing, and then causing to be loaded, the altered data. Forever. kre
Re: When can shells remove "known" process IDs from the list?
Chet and I can continue this conversation off list, what is being discussed now has nothing at all to do with anything related to posix. kre
Re: When can shells remove "known" process IDs from the list?
Date:Sat, 14 May 2022 03:56:32 +0700 From:"Robert Elz via austin-group-l at The Open Group" Message-ID: <2459.1652475...@jinx.noi.kre.to> | | Show your work. | I no longer remember the exact command I used (cannot even locate the | message you're quoting from), I finally did ... This is what I see:

bash5 $ echo $BASH_VERSION
5.1.16(1)-release
bash5 $ jobs
bash5 $ set +m
bash5 $ sleep 20 | sleep 20 & sleep 30 | sleep 30 & jobs -l; ps jT
[1] 1868
[2] 1847
[1]- 29632 Running sleep 20
      1868 | sleep 20 &
[2]+ 2715 Running sleep 30
      1847 | sleep 30 &
USER PID PPID PGID SESS JOBC STAT TTY TIME COMMAND
kre 355 1847 5699 d0d6d70 S+ pts/26 0:00.00 sleep 30
kre 410 29632 5699 d0d6d70 S+ pts/26 0:00.00 sleep 20
kre 1687 1868 5699 d0d6d70 S+ pts/26 0:00.00 sleep 20
kre 1847 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
kre 1868 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
kre 2715 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
kre 4319 2715 5699 d0d6d70 R+ pts/26 0:00.00 sleep 30 (bash)
kre 5333 5699 5699 d0d6d70 O+ pts/26 0:00.00 ps -jT
kre 5699 3620 5699 d0d6d70 Ss+ pts/26 0:00.03 -bash
kre 29632 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
bash5 $ echo $$
5699
bash5 $

Note that pids 29632 and 1868 (which jobs claims are "sleep") are actually bash, the sleep processes are 410 and 1687. Similarly for job 2. Everything is in process group 5699 (the interactive shell's pid). When one kills %1 processes 29632 and 1868 get killed, processes 410 and 1687 do not. You can decide whether the extra interposed bash processes are intentional or not, as I said in the previous message, that is not wrong. The inability to signal the (unknown) grandchildren is expected (the same kind of thing would happen if the command were "make" and there's a whole tree of make, compiler, linker, ... processes running - this is unavoidable). kre
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 13 May 2022 11:22:20 -0400 From:Chet Ramey Message-ID: | Show your work. | | I tested this on macOS 12 and RHEL 7, using interactive shells with job | control enabled, That is likely the difference. The question was about what happens when job control is not enabled. When job control is enabled, the kill kills that job's process group, and all of it gets signalled. Without job control, that's not possible, the shell can only kill its known children, their children (absent relaying of the signal down the tree) never see it. I no longer remember the exact command I used (cannot even locate the message you're quoting from), which caused bash to fork a sub-shell, in which to run the pipeline, rather than running it directly from the parent - but that's not really the point, doing that was not wrong, whatever provoked it, it simply meant that the parent shell did not know the actual processes running in the pipe, so could not do anything to them. kre
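A minimal sketch of the point above, assuming a POSIX sh run non-interactively (so job control is off): the outer shell can signal only the child it knows about, and the grandchild it never saw keeps running. The temporary file used to pass the grandchild's pid back is purely an illustration device, and all the names are invented; none of this is from the thread.

```shell
#!/bin/sh
# Job control is off by default in a non-interactive shell.
pidfile=$(mktemp)

# The child is an intermediate sh. It starts its own background sleep
# (the grandchild), records that pid, then waits for it.
sh -c 'sleep 30 & echo "$!" > "$1"; wait' sh "$pidfile" &
child=$!

# Wait until the grandchild's pid has been written.
while [ ! -s "$pidfile" ]; do sleep 1; done
grandchild=$(cat "$pidfile")

# Signal the only process the outer shell knows about, then reap it.
kill "$child"
wait "$child" 2>/dev/null || true

# The grandchild was never signalled, so it should still be there.
if kill -0 "$grandchild" 2>/dev/null; then
    survived=yes
else
    survived=no
fi
echo "child=$child grandchild=$grandchild survived=$survived"

# Clean up the stray sleep and the temporary file.
kill "$grandchild" 2>/dev/null
rm -f "$pidfile"
```

With job control enabled the child would have been placed in its own process group, and a kill of the job would have reached both processes.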
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 13 May 2022 10:20:49 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220513092049.GB17043@localhost> | [Robert Cc'ed this to austin-grou...@netbsd.org which presumably bounced. | I'm taking that as indication that he intended it to go to this list, | and am quoting it in full.] Oops. And yes, I did, and thanks. Didn't even notice that this one hadn't appeared on the list (I ignore bounce messages). | However, what the standard requires here does not match existing | practice in some shells and so the standard should change. OK, let's just agree on that, whatever our opinions of what it currently says. | It's not clear at all, and I would say the opposite is implied. | The definition of "Job" is: | | A set of processes, comprising a shell pipeline, and any processes | descended from it, that are all in the same process group. | | Notice it says "that are all in the same process group". Yes, I did. | In the case of a background command started with job control disabled, | the processes all have the same process group Exactly. That meets the definition, doesn't it? | as the parent shell. Not relevant. | By a strict reading, this counts as a job, but I don't think that | was intended. Intended or not, that's what the standard says. It also largely matches what is implemented. | In any case we already know that the current definition of "job" is | very wrong, so using it to support either position is futile. "very wrong" I think is too much - it is very close to the implementations. But given the last clause, we probably need to wait upon proposed new definitions, and specs for the relevant usages, to see if those are a closer fit to reality. kre
Re: wait and stopped processes (was: When can shells remove "known" process IDs from the list?)
Date:Wed, 11 May 2022 09:58:38 -0400 From:"Chet Ramey via austin-group-l at The Open Group" Message-ID: <4d0598b4-efb3-d5c2-1267-b8a807399...@case.edu> | > It is already what the standard requires, and with good reason. | | Sure. It simply isn't what many (most) shells do. You're right about that, given this test (in an interactive shell, with set -m) date; sleep 30 & X=$! ; ( sleep 5; kill -STOP $X) & echo sleep=$X kill=$!; wait $X; jobs -l; date (which I entered on one line, but wrapped here for e-mail convenience) All shells but FreeBSD and zsh (--emulate sh) finished in 5 seconds, leaving a stopped sleep job running. (We can ignore the NetBSD sh for this, it is definitely broken - what happens depends upon that "sleep 5", as the wait behaves differently if the waited upon process is already stopped, vs if it stops while waiting). The FreeBSD and zsh shells didn't terminate that command until a SIGCONT was directed at the sleep process (rather more than 30 seconds after all of this started). | Maybe. And yet I can't recall ever receiving a bug about this. That is most likely because users generally don't wait in interactive shells, and in non-interactive shells, 99.9% of the time if a job stops, its parent shell stops along with it - when they are resumed, they both resume, and simply continue from where they left off. The circumstances to provoke a problem need to be contrived. kre
Re: When can shells remove "known" process IDs from the list?
Date:Wed, 11 May 2022 09:17:15 -0400 From:"Chet Ramey via austin-group-l at The Open Group" Message-ID: <573bc015-dd85-f86e-b89d-33a0bcc4b...@case.edu> Again, apologies, still very little time for any of this. | For neither the first nor the last time. Including now. | > I think they should remain independent. | Sure, I agree. I don't. I cannot think of a single reason why the shell should be forced to maintain two separate lists of its child processes. The jobs table needs to have them, so processes in the job can be identified as they finish. Duplicating that in another table, for no particular reason I can imagine makes no sense to me. Still, if others want to implement it that way, I don't object - but the standard has never required that, and should not, absent some very good reason, be changed to require it now. In a later message Chet said: | > The normative text relating to creation of job numbers/IDs is all | > conditional on job control being enabled. | Where is that? It's not in the definition of Job ID, it's not in 2.9.3 | Asynchronous Lists, it's not in the `jobs' description, it's not part of the | definition of Background Job or Foreground Job, it's not in any of fg/bg/kill/ | wait. I feel like I'm missing something obvious here. Again, I disagree. You're missing nothing. There has not been anything like Geoff is postulating - there might be in his unpublished new draft text, but there is no reason I can imagine that such a change should be adopted. kre
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 29 Apr 2022 20:11:55 +0100 From:"Harald van Dijk via austin-group-l at The Open Group" Message-ID: | >| It also appears that dash still implements remove-before-prompting. | | busybox ash and my shell do as well, but both are derived from dash and | have merely retained dash's behaviour. All ash derived shells work that way. | > Does anyone not? | | bash does not. bosh does not. ksh does not. mksh does not. posh does | not. yash does not. zsh does not. I did a test (not the same one you did) after I sent the mail, and saw that bosh and yash don't. For the other shells, it is not nearly as clearcut what is happening. | You can test this by doing | |true & | |wait $!; echo $? | | This should print 0. Then do the same, except with the first command | changed to false &. That should print 1. Yes, in the shells you mention it does, indicating that something different is happening. It is interesting that in bash you can do that wait over and over again, and it keeps returning the 0 status (until one does a plain "wait" command, even the "jobs" command doesn't remove it, though the standard requires that it do so). bash is the only shell that acts like that, whether it is intentional or not I have no idea. But try a different test true & X=$! (the assignment to X is just in case there is a shell which implements that "no need to retain" stuff when $! is not referenced). Then repeat that line over and over. (Consecutive lines). In ash derived shells (and pdksh) the first will report job 1 starting (assuming you had none already running), the 2nd line will report job 2 starting, and before prompting for the 3rd, report job 1 has finished. The third will be job 1 again, and report job 2 has finished, and that continues over and over again. This is all consistent with how we know that they work. In bosh and yash, the job number just keeps on climbing, even though they report the previous job finished as each subsequent one is started. 
That's also consistent with how they operate. A simple "wait N" for one of the jobs removes that one from the list, then more true& commands add more jobs. A simple "wait" clears up everything. In yash "jobs" reports them all finished and clears everything, as it should. In bosh "jobs" reports them all finished, but clears nothing (the jobs command can be repeated over and over and keeps reporting all the completed jobs). That's clearly broken. zsh does something different, once a job has been reported as finished at a prompt, it is removed from the jobs table, and you can no longer do "wait %3" for it, but the pid and status seem to be remembered somewhere else, and wait gets the status from the job. That seems odd to me, it should be possible to use either form to wait on a job. (I should note that there is something odd about my zsh install - I tend to need to type two newlines after a command to get it executed, both are seen by the shell. Most of the time that's just mildly annoying, when I forget the 2nd, nothing happens, and I have to wake up and remember that zsh is waiting for the 2nd before it will do anything with the command - but in testing like this, where the newlines generate prompts, and accompanying the prompt is an action we care about, it kind of ruins the test.) ksh93 is similar (without the double newline issue). mksh is almost similar, but in it I saw internal error: j_async: bad nzombie (161) twice (once, then more testing, then again), which does not look good. I don't know what the 161 represents, it was not the same each time, but is not a pid of any of the jobs started. A count? In that one, with this sequence, there are only ever 2 jobs (as in job numbers) assigned, as each is started, the previous one is reported finished, and removed from the jobs table.
It is possible to wait %n for the job number most recently started, but only that one (were the commands to run for longer, then presumably it would be possible to wait on any not completed and reported as completed). bash is different again, it counts up the job numbers, like bosh and yash, but as it reports each earlier one finished, removes it from the jobs table, so the "jobs" command only ever shows (and then removes) the last one started. It still allows wait N to return the status, as many times as you want to do that command, but not wait %n for any but the most recently created one. | I consider the dash behaviour a bug, but do not want to | fix it in a way that introduces another bug. While removing jobs that have been reported (ie: removing them as soon as possible) might reduce the risk of getting duplicate pids, it doesn't actually solve the problem. In particular, the removal only happens in interactive shells (ones which prompt) so does nothing at all for scripts, which have the same issue. It can also happen in an interactive
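The true/false check quoted earlier in this message can be written as a small script. This is only a sketch for a non-interactive POSIX sh, where the remove-before-prompting question does not arise because nothing is ever prompted for; the variable names are made up for illustration.

```shell
#!/bin/sh
# Start a job that succeeds, then collect its status by process ID.
true &
wait "$!"
status_true=$?

# Same again with a job that fails.
false &
wait "$!"
status_false=$?

echo "true: $status_true, false: $status_false"
```

In an interactive shell the interesting part is whether the wait still finds the pid after a prompt has been printed, and that cannot be shown from a script.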
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 29 Apr 2022 15:39:23 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220429143923.GA22521@localhost> Sorry, been too busy to participate here much recently, will catch up someday soon (I hope). | However, today it threw a last curve ball when I was working on an | update to the description of set -b ... How many shells actually implement that? | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs | remain known until: | | 1. The command terminates and the application waits for the process ID. | | 2. Another asynchronous list is invoked before "$!" (corresponding to | the previous asynchronous list) is expanded in the current execution | environment. Does anyone implement that bit (#2) at all? In a non-interactive shell it might almost be possible, but in an interactive shell, if the job isn't in the list (whether $! has been referenced or not - usually it will not have been) because it has been removed, what is the shell supposed to do if the job stops? Further, users (even in scripts) are allowed to use % %- %1 etc to refer to jobs, $! isn't the only way to reference one ("wait %2" should work). I'd suggest that #2 should simply be removed. But do note that the definition of the jobs command says: When jobs reports the termination status of a job, the shell shall remove its process ID from the list of those ``known in the current shell execution environment''; see Section 2.9.3.1 (on page 2338). (quote from I8 Draft 2.1 -- but that text has been there forever, or seemingly). So that's another way that an entry is removed, and this one is "shall remove" whereas "remain known until" puts a minimum on how long the job is supposed to remain known, but doesn't actually require removal.
For #2 that's obvious, shells aren't required to make that optimisation (that's some academic view of what was thought should be possible - but isn't in practice), but for #1 if the job isn't removed (when wait happens) then it could still be there, again, and again, forever - even if the system uses the same pid later (days, weeks, months later perhaps) for another job started by the same shell -- against which there is no protection of any kind currently, though a shell could do WNOWAIT waits so zombies remain in the process table, even though the shell has already collected the exit status - but that's difficult to actually code correctly, especially given the definition of how SIGCHLD works, which as best I can tell has to be used as the only thing that would make it even conceivable to use WNOWAIT. Without that, when the shell acts like I believe most, or all do, and cleans up zombies ASAP, just keeping the job in its jobs table, marked terminated, with the status ready to give back when requested, the kernel is free to assign the reclaimed pid to any new process it likes, whenever it likes. | My initial reaction to this was that the above quote from set -b is | likely a left-over from before the decision to disallow the historical | remove-before-prompting behaviour was made. I doubt that -b is particularly relevant to this, other than that it provides an alternate time at which termination status of a process can be shown. | However, then I spotted that the text from wait, which seems to be an | attempt to justify that decision, first says it was historical | behaviour for *interactive* shells but then talks about the problems | it could cause for *scripts*. So it seems to me that the | justification does not stand up to scrutiny. 
The justification doesn't, but for scripts I don't recall there ever really being an issue - the removal happens when the status of jobs which have changed status is reported just before PS1 is written, and non-interactive shells (scripts) don't do that. On the other hand, users of interactive shells are not in the habit of issuing wait commands (even jobs commands, without some reason to do so). They expect to be told when a background job has finished (without -b both working, and set, that might require causing new prompts to appear from time to time) and simply expect that when a job has been reported as done, it is done, and no longer exists. | It also appears that dash still implements remove-before-prompting. Does anyone not? | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to | add a third list item (for interactive shells only) and deleting the | above quoted text from the wait page. This is necessary, we would be making use of the shell too difficult for interactive users otherwise. But there is no particular need for an "interactive only" here, scripts can (though usually don't) use the jobs command as well (it is a convenient way to get rid of any jobs from the table that have finished, without knowing what they are, and without potentially hanging waiting for something
Re: [Issue 8 drafts 0001564]: clariy on what (character/byte) strings pattern matching notation should work
Date:Thu, 14 Apr 2022 09:42:37 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220414084237.GA15370@localhost> | That is how things are at present. The suggested changes just make it | explicit. Yes, I know, but that's what I am suggesting that we not do in this one case. | Do you have an alternative proposal? Only to the extent of "do nothing". I am certainly not suggesting that we attempt to solve the problem. Except perhaps it might be worth adding something to the Rationale (but about what, ie: where there, I have no idea) along the lines of: It is often unclear whether a string is to be interpreted as characters in some locale, or as an arbitrary byte string. While it would have been possible to arbitrarily make the various cases more explicit, or explicitly unspecified, it was considered better, in this version of the standard, to make no changes, as it is believed that much additional work is required to make a standards-worthy specification possible. This work is beyond the scope of this standard. The problem I see, is that any specification at all of any of this, allows implementors to just say "that is what posix requires" and do nothing at all, where we really need some innovation, by someone who actually understands the issues and how to deal with them in a rational way - or at least who can come up with some kind of plan, and without any possibility of being considered a non-conformant implementation because of it. | The application can document that it requires pathnames to be in the | same encoding as the user's locale. That's not sufficient. Try encoding a find command to look for pathnames containing currency symbols. It should be just a simple find -name '*[ABCD]*' type operation, with appropriate substitutions for the ABCD chars. No problem if not all the world's currency symbols are encoded, if we find one that has been forgotten, it can simply be added.
Currency symbols are things like the $ sign, British pound, Euro, Yen, Baht, ... (there are a whole bunch of them). If there were a [:currency:] class, it would be easy (and I'd need to come up with a different example). But there isn't. If we cannot do something this simple, and expect it to work reliably, everywhere, then what we have is useless, and needs to be replaced or reworked. That's not a standards' body type task. But we should be doing nothing to interfere with the production of a solution. | The C locale is specified as containing 256 single-byte characters. | Thus in the C locale all pathnames are valid character strings. Sure, understood. | > Even worse perhaps, ???.doc which should match 7 char | > names that end in ".doc" (or is that 7 byte names?) (not counting the \0). | | It would match 7-byte names. Yes, in the C locale it would. But do you believe that is what the user would have intended? Are they to be required to work out how many bytes their local filenames are encoded as, and enter the appropriate number of '?' chars? kre
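To make the ???.doc point concrete, here is a sketch. It assumes the filename is encoded in UTF-8 and that the shell honours LC_ALL=C when matching case patterns; the name and the variable names are invented for the example. The name reads as six characters, but it is seven bytes long because the e-acute takes two.

```shell
#!/bin/sh
# "n", e-acute (two bytes in UTF-8: octal 303 251), then ".doc".
name=$(printf 'n\303\251.doc')

LC_ALL=C
export LC_ALL

# Byte count of the name.
bytes=$(printf '%s' "$name" | wc -c | tr -d '[:space:]')

# In the C locale each ? matches one byte, so three ? cover "n" plus
# the two bytes of the e-acute.
case $name in
    ???.doc) matched=yes ;;
    *)       matched=no ;;
esac

echo "bytes=$bytes matched=$matched"
```

So the pattern matches a name that its owner would count as two characters plus ".doc", which is the mismatch between bytes and characters being objected to here.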
Re: [Issue 8 drafts 0001564]: clariy on what (character/byte) strings pattern matching notation should work
Date:Tue, 12 Apr 2022 08:51:51 + From:"Austin Group Bug Tracker via austin-group-l at The Open Group" Message-ID: <1541e949d4c9cd28467acf6033bfd...@austingroupbugs.net> That is, Geoff Clare: | 1. The vast majority of apps will never need to do that because they know | (or can assume) that the pathnames they handle either always use the | portable filename character set or use the user's locale. The latter, perhaps, the former, certainly not in an international context. The point was that, at least as I read the proposed text, you're defining things like '*' to only work (reliably as specified) when the locale is POSIX (aka C). In the user's locale, who knows what happens? | I.e. the pathnames are not arbitrary (a word I was careful to | include in the proposed changes). Sure, the problem is that when dealing with user input (as in, for example, the command line args) the application cannot assume that the pathnames are not arbitrary. They're anything that's OK for the user. | 2. In apps that truly do need to do matching or expansion on arbitrary | pathnames, a C program can call uselocale() before and after calls to | fnmatch(), glob(), and wordexp(). A shell script can set LC_ALL=C before | handling pathnames (and unset it or restore it afterwards). But how does that help *.doc (in a defined way, as opposed to "of course that works in all glob implementations") match a filename that isn't entirely ascii (by which I mean, using characters only from the portable character set)? Even worse perhaps, ???.doc which should match 7 char names that end in ".doc" (or is that 7 byte names?) (not counting the \0). Anyone from outside the English speaking world is likely to encounter many of those. kre
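The shell-script half of that quoted suggestion (set LC_ALL=C, then unset or restore it) might look like the sketch below. The function name and its mb_ variables are hypothetical, and the only subtle part is telling an unset LC_ALL apart from one set to the empty string.

```shell
#!/bin/sh
# match_bytes PATTERN STRING
# Match STRING against PATTERN in the C locale, then put LC_ALL back
# exactly as it was (unset stays unset, empty stays empty).
match_bytes() {
    # ${LC_ALL+set} expands to "set" only if LC_ALL is set at all.
    mb_was_set=${LC_ALL+set}
    mb_saved=${LC_ALL-}
    LC_ALL=C

    # The unquoted $1 is used as a pattern, not as a literal string.
    case $2 in
        $1) mb_result=0 ;;
        *)  mb_result=1 ;;
    esac

    if [ -n "$mb_was_set" ]; then
        LC_ALL=$mb_saved
    else
        unset LC_ALL
    fi
    return "$mb_result"
}

unset LC_ALL
# A name with two e-acutes, written as UTF-8 bytes.
if match_bytes '*.doc' "$(printf 'r\303\251sum\303\251.doc')"; then
    doc_match=yes
else
    doc_match=no
fi

# LC_ALL should still be unset here.
after=${LC_ALL+set}
echo "doc_match=$doc_match LC_ALL-after=${after:-unset}"
```

This shows the mechanics only. It does not answer the objection raised in the reply, since bytewise matching is not what a user counting characters expects.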
Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed
Date:Thu, 07 Apr 2022 18:15:55 +0700 From:"Robert Elz via austin-group-l at The Open Group" Message-ID: <5473.1649330...@jinx.noi.kre.to> | | e.g. adding: | | | | For example, the context address "\.[.][0-9]." is equivalent | | to "/\.[0-9]/". | | Looks good to me. Actually, to make things even clearer, you might want to add to that: , however with "\.\.[0-9]." it is unspecified which of "/\.[0-9]/" or "/.[0-9]/" is its equivalent form. kre
Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed
Date:Thu, 7 Apr 2022 10:37:06 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220407093706.GA7005@localhost> | The new definition in bug 1546 is specific to regular expressions | (since it talks about the backslash not being in a bracket expression), Yes, of course. | e.g. adding: | | For example, the context address "\.[.][0-9]." is equivalent | to "/\.[0-9]/". Looks good to me. kre
Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed
Date:Tue, 5 Apr 2022 15:54:40 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220405145440.GB6489@localhost> | Okay, I'll see what I can do. It may make sense to use the new | definition of "escape sequence" from bug 1546. | It won't be possible in the y command, as that doesn't use an RE (so | would need its own definition of "escape character"). I wasn't paying attention to just where any of this was to be placed in the final doc, but couldn't the definition of "escape sequence" (and those related to it) be somewhere generic? It might even be worth (since it is so common) defining a "backslash escape sequence" in XBD - but for that allow it to have an application specified following sequence (one or more following characters, as defined by the application), and then for REs just define only the case for a single following char. Or just define "escape sequence" and leave it for the application also to define what the escape character is (are there many that don't use \ though?) | What matters is that the delimiter can only be escaped with an | _unescaped_ backslash, and that it doesn't end the RE when it is in a | bracket expression. I believe my proposal makes both of those things | clear. I suspect that the point was more related to when 2-pass parsing is used, and an escaped delimiter is seen, does the second pass still see the escaped delimiter, or is it now unescaped. I'm no sed expert (I use it a lot, but have never really looked into an implementation, and don't push the wacky boundary cases in my uses) - but I believe this is to be explicitly unspecified (that is, implementations can do either, and applications must not depend upon which is done). | It really is hardly any limitation on applications if they need to | avoid using special RE characters as delimiters in order to be portable I agree. Not just portable, but sane. Only a moron would actually use . ? * [ ( ... 
as a delimiter, there are plenty of perfectly good alternatives available when good old / isn't the best choice (which it often isn't when manipulating path names). Personally I'm quite fond of ascii BEL (^G) as the delimiter in the cases when neither / nor ; (my 2 favourites) are really available (while BEL probably isn't technically portable, it always works in my experience). It still needs to be clear that it is possible to be a moron if one wants, but in such cases, some things just might not be possible. | It might be worth altering this somehow, but "literal" is wrong | (specifically if the delimiter is '^' or '-', or things like ':' in | [[:alpha:]]). That depends upon the context of the word "literal" there - I just took it to mean that the character would mean the same thing as it would if it were not also the delimiter, not that it would be deprived of any other magic properties it might gain by such use. | > => And perhaps something like "should put it inside a bracket | > expression __with no other characters__" to make clear, that one | > cannot re-use one e.g. 'sX\X[0-9]XfooX' can NOT be written as | > 'sX[X0-9]XfooX' but only as 'sX[X][0-9]XfooX'. | | Incorrect, sX[X0-9]XfooX is required to "work" I think the point there was that it doesn't mean the same thing, in that one a single char is being substituted, in the others, it is a 2 char sequence, the delimiter, followed by a digit, not either the delimiter or a digit. kre
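As an illustration of picking a delimiter that does not collide with the data, here is a sketch using ';' for both a substitution and a context address on pathnames. The sample paths are invented, and nothing here depends on the unspecified corner cases discussed above.

```shell
#!/bin/sh
# With / as the delimiter every slash in the path would need escaping:
#   sed 's/\/usr\/local/\/opt/'
# With ; nothing needs escaping.
rewritten=$(printf '%s\n' /usr/local/bin/tool | sed 's;/usr/local;/opt;')

# A context address can use another delimiter too, introduced by a
# backslash in front of the first delimiter.
selected=$(printf '%s\n' /usr/bin/a /opt/bin/b | sed -n '\;^/usr/;p')

echo "rewritten=$rewritten selected=$selected"
```

Because ';' has no special meaning inside an RE, the question of what an escaped delimiter means never comes up.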
Re: [1003.1(2016/18)/Issue7+TC2 0001546]: BREs: reserve \? \+ and \|
Date:Tue, 5 Apr 2022 09:41:26 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220405084126.GA6489@localhost> | > > An escape sequence is defined as the escape character followed | > > by any single character. The escape character is a <backslash> | > > that is neither in a bracket expression nor itself escaped. | Okay, I'll propose that wording in Thursday's teleconference. Actually, if this (or something to the same intent) doesn't already exist, then it might be worth adding a third sentence: A character is considered "escaped" if it appears as the second character in an escape sequence. I was first going to suggest that you switch from "nor itself escaped" to the way I originally worded it ("nor the 2nd char of an escape seq") but I realised it would be better to explicitly define "escaped" instead, so that can be used elsewhere, and be properly defined (not just rely upon being what is obvious). Whether this new sentence goes 2nd (between the existing two) or 3rd (after them) I don't think matters -- but a slight preference for 2nd, in which case it could also just be an additional clause on the first sentence ", that character is escaped." or something like that, perhaps "which is thereby escaped". kre
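The three cases in that definition (an escape sequence, an escaped backslash, and a backslash inside a bracket expression) can be seen with grep and BREs. This is a sketch with made-up sample lines and variable names; the behaviour described in the comments is what the definition as quoted implies.

```shell
#!/bin/sh
# Four sample lines; %s keeps the backslashes literal.
sample=$(printf '%s\n' 'a.b' 'axb' 'a\b' 'a\xb')

# \. is an escape sequence: the dot is escaped, so only "a.b" matches.
n_escaped_dot=$(printf '%s\n' "$sample" | grep -c 'a\.b')

# \\ is an escape sequence whose second character is a backslash. The
# following dot is not escaped and matches any character: only "a\xb".
n_escaped_backslash=$(printf '%s\n' "$sample" | grep -c 'a\\.b')

# Inside a bracket expression the backslash is not an escape character,
# so [\.] matches a backslash or a dot: "a.b" and "a\b".
n_bracket=$(printf '%s\n' "$sample" | grep -c 'a[\.]b')

echo "$n_escaped_dot $n_escaped_backslash $n_bracket"
```

The second pattern is the case the proposed "escaped" sentence pins down: the second backslash is escaped, so it cannot itself act as an escape character for the dot.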
Re: [1003.1(2016/18)/Issue7+TC2 0001546]: BREs: reserve \? \+ and \|
Date:Mon, 4 Apr 2022 15:24:25 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220404142425.GA23024@localhost> | I don't see a need for an xref to XBD 6.1, That's fine too, I just suggested that as a replacement, just in case... | A minimal fix to the current proposed text would be something like this: [...] | or it could be split into two sentences along the lines of your original | suggestion: Either would work. I (kind of obviously) slightly prefer the 2nd, I think it is slightly clearer (easier to follow), but the version that's closer to what is currently there would also work. kre
Re: [1003.1(2016/18)/Issue7+TC2 0001546]: BREs: reserve \? \+ and \|
Date:Mon, 4 Apr 2022 08:46:56 + From:Austin Group Bug Tracker Message-ID: That is, really from Geoff Clare: | Personally I don't see that there is a problem with the current wording. It is almost OK, and if you consider the readers must be able to interpret the words in a rational, obvious, way, would be. The problem is that an escape character cannot be escaped, if it is, it isn't an escape character (so there is a contradiction). the escape character ('\\'), when neither [...] nor itself escaped, There are plenty of ways to rewrite this to make the point that it is an unescaped backslash (rather than an unescaped escape char) which becomes the escape char, my suggestion was just one possibility. The same issue applies to being within a bracket expression, an escape char cannot be there, so it makes no real sense to exclude it - though it does to say that a backslash that is in there is not an escape char. kre ps: I'm also not greatly in favour of writing the backslash character as a C character constant, rather than just as a character (as in a sh quoted string for example) as '\'. Since there will always be people who will object to either of those, I wouldn't give the character's glyph form at all, but rather refer to XBD 6.1 where it is presented without the quotes, and so there's no problem. So " ([xref XBD 6.1])".