from:"Robert Elz"

Re: Request RFC3339 format option for date utility

2024-05-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Sun, 12 May 2024 18:59:51 -0500
From:"Andrew Pennebaker via austin-group-l at The Open Group" 

Message-ID:  

I do not send mail to @gmail addresses, so I will reply just to the list.

  | I would like the standard POSIX date utility to receive an option to format
  | timestamps with modern RFC3339 format.

This is not the appropriate forum to make that happen - you'd need
to get the various implementations to agree on a new option, which
once common, could then be proposed to be added to the standard.

I don't really see the need for that; 3339 format is trivial to
produce already...

jacaranda$ date -u +%Y-%m-%dT%H:%M:%SZ
2024-05-13T07:20:06Z

(That isn't GNU date, but I'd be a little surprised if it couldn't
do the same).
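
As a minimal sketch, the same command can be wrapped for reuse. The
function name rfc3339_utc is hypothetical, not part of any standard or
implementation; only the -u option and conversions the date utility
already defines are used:

```shell
# Hypothetical helper: print the current time as an RFC 3339 UTC timestamp.
# Uses only the -u option and the %Y %m %d %H %M %S conversions.
rfc3339_utc() {
    date -u +%Y-%m-%dT%H:%M:%SZ
}

rfc3339_utc
```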

  | The GNU date utility seems to do this poorly, using the overly elaborate
  | pattern "...+00:00" instead of "...Z" for UTC timezone.

That's allowed by 3339, if you don't like it, or want an option to
change just that, you should take that up with the maintainers of
GNU date, not here.

kre

Re: [Issue 8 drafts 0001798]: Must posix_getdents remember file offsets across exec?

2024-03-22 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 22 Mar 2024 09:48:37 +
From:"Austin Group Bug Tracker via austin-group-l at The Open 
Group" 
Message-ID:  

  | A NOTE has been added to this issue.

This comment doesn't need to be an attached note I don't think...

  | If we reword in terms of directory entries, I think no explicit statement
  | about renaming will be needed.

Agreed, that was what I meant when I made the comment about rename wrt
posix_getdents() - and I agree "directory entry" is better than the
"file name" I suggested.

kre

Re: Austin Group WEBEX +1-408-792-6300 PIN 668 216 233

2024-03-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 12 Mar 2024 14:56:27 +
From:Jonathan Wakely 
Message-ID:  

Thanks:

  | The emails have Content-type: multipart/mixed and the text/html part
  | includes the meeting time:

Well, technically, the multipart/mixed is the body, and the calendar info.

The body is multipart/alternative and has text/plain and text/html.

As you may have surmised, I only ever read text/plain if that is present
(in a multipart/alternative the content of the parts is supposed to be
the same, with only the presentation varying - though this is not even
close to the worst breach of that rule I have seen).

  | Strangely, that's missing from the text/plain part. Maybe because it's
  | in a  that can't easily be converted to plain text, so it's
  | just omitted by whatever software generates the email.

The info could be added to both, in the free text part; then it wouldn't
be an issue that the table at the head of the html version got omitted.

kre

ps: if nothing changes, I'll keep reading the text/plain and using the
calendar attachment for the date & time.   And I still think the date/time
should be given in UTC rather than America/New_York which requires anyone
not in North America to know their summer time rules to translate - if
given in UTC one only needs to know one's own (perhaps varying) zone offset.

Re: Austin Group WEBEX +1-408-792-6300 PIN 668 216 233

2024-03-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 12 Mar 2024 08:16:51 -0400 (EDT)
From:"Single UNIX Specification via austin-group-l at The Open 
Group" 
Message-ID:  <202403120816.c9d2d0b3357afd28622ef410caf1f...@opengroup.org>

Something I've been meaning to ask about for ages (not
exactly exciting, or I hope controversial).

Twice a week (or more sometimes) messages like this are sent out:

  | Topic: Austin Group teleconference
  | ---
  | Audio conference information
  | ---
  |
  | You are invited to a WEBEX meeting. <<<
  |
  | Andrew Josey's Personal Room
  | https://opengroupevents.webex.com/meet/a.josey | 668216233
  |
  | Join by video system

followed by a whole bunch of details about how to join, etc.

But nowhere in the body of the message does it say when the meeting
to which everyone is invited is to be held.   That seems kind of
lacking in an invitation.

It is in the attached calendar info, if one either adds that to a
calendar, or just reads it, but wouldn't it be nicer if it said
something like:

You are invited to a WEBEX meeting on 18-Mar-2024 at 11:00 America/New_York

(I cut/pasted the actual date & time, for this particular invitation, from
the calendar info.)

Or even better if it gave the time in UTC, so "at 15:00 UTC" - or whatever
it is this week.   That the calls are anchored to US Eastern time (about
which I have always wondered, most of the regular participants seem to be
outside the US, but never mind) is irrelevant for this - a particular meeting
(which is what this is about) is always at some specific UTC time, regardless
of why that particular time was chosen.

Whether the info goes on that line, or somewhere else, isn't important,
just that the date & time of the invitation gets included in the message
body, somewhere.

Could that be made to happen?

kre

Re: sh 'continue' shenanigans: negating

2024-02-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 14 Feb 2024 20:15:59 -0800 (PST)
From:"Roger Marquis via austin-group-l at The Open Group" 

Message-ID:  <6sn184nr-6299-838p-qpro-03qs07401...@mx.roble.com>

  | Never seen a script use "!" in this way.  Is it undocumented?

No.   That particular usage is bizarre however, and it is no surprise
you've never seen it, I doubt anyone has in a real script.

  | Another question about this code is whether the return value would be
  | from "! continue" or "done".

"done" is a reserved word, not any kind of command, it has no exit status
(saying "return value" only makes any sense at all in functions where
"return" works, and it doesn't even make much sense there).

The exit status of a for (or while or until) loop (which is what the "done"
is the end of) is defined to be the exit status of the last command executed
in the body of the loop (the part between "do" and "done") (or 0 if no
commands were ever executed in the body).

In these examples ! continue (or ! break in the more recent one) is
the last command executed in the body, as it was the only command there
(so the only one which could possibly be executed).   As long as the loop
body is executed at least once (which it must be when it is "for x in y ...")
then the exit status of the for command is the exit status of that ! continue
(or ! break).   And as the ! inverts the (logical) status of the following
pipeline, and both break and continue (unless they fail for some usage error)
always have an exit status of 0, the exit status of ! continue (or ! break)
must be 1.
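
A small sketch of those rules, using commands whose status is not in
doubt (the variable names are made up for illustration). The last two
lines repeat the construct under discussion; by the reasoning above its
status should be 1, but no output is claimed for it here:

```shell
# The status of a loop is that of the last command executed in its body.
for x in y; do false; done
last_false=$?                 # 1: "false" was the last command run

for x in y; do ! true; done
negated=$?                    # 1: "!" inverts the 0 from "true"

# The body is never executed here, so the loop's status is 0.
while false; do :; done
never_ran=$?                  # 0

printf '%s %s %s\n' "$last_false" "$negated" "$never_ran"

# The construct from the thread; prints whatever this shell produces.
for x in y; do ! continue; done
printf '%s\n' "$?"
```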

kre

Re: sh 'continue' shenanigans: negating

2024-02-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 15 Feb 2024 00:40:24 +0100
From:"Christoph Anton Mitterer via austin-group-l at The Open 
Group" 
Message-ID:  <9e56d4028f077e0d5dcc2ec2448de62b400a69a3.ca...@scientia.org>

  | If so, then IMO strictly speaking, it doesn't say whose $? shall be set
  | that way.

That makes no sense, there is just one '?' special parameter ($? is just
the syntax by which it is accessed, not a thing itself).

I suspect you're confusing exit status and the ? special param - they're
not the same thing.  Every utility, and the shell compound commands, have
an exit status.   What actually appears in ? is specified, somewhere,
but it certainly is not every exit status of every command run (not even
in the shell environment in which they're invoked).

Return (and exit) are kind of special in how they're defined to set '?'.

kre

Re: [1003.1(2008)/Issue 7 0001219]: snprintf requirement to fail when n > INT_MAX conflicts with C

2024-01-17 Thread Robert Elz via austin-group-l at The Open Group

Actually, apologies - forget my previous reply - the change to the
fwprintf() page (for swprintf()) did happen as the resolution of that
bug specified.

No idea how I looked at that (I had the page still open when I went
back to it just now) and failed to see that the text had been changed.
But I did.

What made you believe that nothing had been done there?

kre

Re: [1003.1(2008)/Issue 7 0001219]: snprintf requirement to fail when n > INT_MAX conflicts with C

2024-01-17 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 17 Jan 2024 17:54:23 -0500
From:"Rich Felker via austin-group-l at The Open Group" 

Message-ID:  <20240117225423.gb24...@brightrain.aerifal.cx>

  | I went to apply the resolution of this issue to musl libc and noticed
  | that the corresponding issue in swprintf was never brought up or
  | addressed. Should I open a new issue for it or can it be fixed along
  | with this?

Actually, I think it was, the accepted resolution contains:

Change page 990 line 33924 in D2.1 from:

The value of n is greater than {INT_MAX}.

to:

The number of wide characters requested to be written was n or more.

Page 990 is in the fwprintf() page in D2.1, and line 33924 is the one which
says the "from" above in the paragraph:

   The swprintf( ) shall fail if:
CX [EOVERFLOW] The value of n is greater than {INT_MAX}.

So, I think it was intended that the change be applied, and it
simply didn't happen.   Now it has been pointed out, no more
action should be required - that one should simply get fixed.

The change for snprintf() simply deleted that whole error, that is
the:
Delete lines 30917-30918 in D2.1 (page 904).

part of Note 5895 in that mantis issue.   That one happened.

kre

Re: sh: set -o pipefail by default

2024-01-15 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 15 Jan 2024 00:13:47 -0600
From:"Daniel Santos via austin-group-l at The Open Group" 

Message-ID:  <08afc6b7-e88f-698a-c9ad-5bdce60a7...@pobox.com>

I agree with what you say, but beware:

  | Otherwise, I myfn() { local shell_restore=$(set +o | grep 'pipefail$'); 
  | set -o pipefail; ; eval "$shell_restore"; }

that needless optimisation attempt is not guaranteed to work.
'set +o' generates an implementation defined string which when
executed will restore any options altered between the set, and
executing the output from it, to their values at the time the
set was executed.   That's what you want there.

However nothing guarantees that you can extract a line from
that output string, and execute that - in fact the shell might
just output one long set command with lots of +o and -o options
in it (and -x or +x for any which have no long names, or just
any which have a 1 letter equiv, just to make the string shorter).

Or all kinds of other techniques.   The NetBSD shell does it
like this...

$ set +o
set -o default -o promptcmds -o vi -o xlock -o xtrace
$ set -o pipefail
$ set +o
set -o default -o pipefail -o promptcmds -o vi -o xlock -o xtrace

eval'ing the output from the first set +o would restore
things to how they were before, that's the magic "set -o default"
which returns all options to their shell startup values.
(There's a spec for what that means, but it isn't relevant
here).

But if you grep for pipefail you won't find it, so your eval
would end up executing nothing.   So forget that attempted
optimisation and just save, and then eval, the entire output
from set +o - that's what is specified to work, what isn't
specified is how.
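
A minimal sketch of that save-everything approach. The function name
with_option and the _wo_ variables are hypothetical; noglob is used in
the usage lines only because every POSIX shell has it, and pipefail can
be substituted where the shell supports it:

```shell
# Hypothetical helper: run a command with one option turned on, then put
# every option back the way it was.  The saved string is treated as opaque:
# it is never parsed or filtered, only passed to eval.
with_option() {
    _wo_saved=$(set +o)     # complete restore command(s), format unspecified
    set -o "$1"
    shift
    "$@"
    _wo_status=$?
    eval "$_wo_saved"       # restore all options
    unset _wo_saved
    return "$_wo_status"
}

# Helper for the demonstration: show the current single-letter option flags.
show_flags() { printf '%s\n' "$-"; }

# Assuming noglob was off to begin with:
with_option noglob show_flags   # flags include "f" while the command runs
show_flags                      # "f" is gone again afterwards
```

The helper uses plain (global) variables rather than "local", which is
not standard, so it is not safe to nest calls to it.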

If your plan is to allow some other option to be changed by
 and persist to the caller, then you simply have to
change that option after the eval (perhaps again, if the 
also needs the effect of the change).   That's rare however.

If you don't care about portability, then some shells offer
simpler mechanisms that are much easier to use, and have
the same effect.   But definitely not standard mechanisms.

kre

Re: IANA TZ / NetBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z

2024-01-04 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 04 Jan 2024 23:24:26 +0100
From:Steffen Nurpmeso 
Message-ID:  <20240104222426.ai7_3Mvo@steffen%sdaoden.eu>

  | I was hoping for the draft; the selection list does not offer
  | anything but ..TC2 and it.

If you want, you can submit a bug now, using any base standard
that is in some way still current.   It just won't get processed
at all (beyond random notes being added) until the next standard
is being worked on, so submitting now is kind of pointless.

On the other hand, delaying may lead to a much better proposal.
I in particular would like to see "struct tm" given a complete
overhaul - resulting in a struct with a different name of
course.   And then, naturally, the interface routines that
manipulate it all need redesigning (and renaming).

That would be the perfect opportunity to make all the new ones
thread safe, and just allow what is there now to wither away.

Of course, this is not the place to do that design (and implementation) -
that needs to happen elsewhere, and then be spread amongst the
various systems first - only then should anything happen in the
standards universe.

kre

Re: IANA TZ / NetBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z

2024-01-03 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 04 Jan 2024 00:21:45 +0100
From:"Steffen Nurpmeso via austin-group-l at The Open Group" 

Message-ID:  <20240103232145.6dAnvvQf@steffen%sdaoden.eu>

  | My question: against which standard should an issue be opened?

The next one, after it is issued (ie: just wait, and send in the
request after the next standard is published, which is probably
this year sometime) - it is far too late for new interfaces in the
one currently being developed (the cutoff for those was back in
August or something like that).

That means, issue 9 is the earliest any new interfaces can be added.

kre

Re: Fwd: Bug 1778 in Minutes of the 27th November 2023 Teleconference

2023-12-08 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 8 Dec 2023 07:11:17 +
From:"Andrew Josey via austin-group-l at The Open Group" 

Message-ID:  

  | > In edited post-d3 line 111861:
  | >
  | >literal value of a following  *and* shall prevent a
  | >
  | > should this *and* be /or/?
  | >
  | > Using *and* seems to imply that you would need to specify:
  | >
  | >   \\
  | >
  | > to use it, while /or/ should more clearly indicate the intended
  | > alternatives:

I don't agree, it was intended to specify that the \ does both of
those things - it escapes the following char - or if that char is
a newline, it makes the pair vanish.   That is, implementations
don't get to choose which of those it should implement, and ignore
the other.
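
To illustrate the two effects, here is a sketch that assumes the text
under discussion is the description of backslash handling by read
without -r (bug 1778 concerns the read utility). The function names are
invented for the example:

```shell
# Effect 1: a backslash escapes the following character, so the escaped
# space below does not act as a field separator.
demo_escape() {
    printf '%s\n' 'a\ b c' | {
        read first rest
        printf '[%s] [%s]\n' "$first" "$rest"
    }
}

# Effect 2: a backslash followed by a newline makes the pair vanish, so
# read continues onto the next input line.
demo_continuation() {
    printf '%s\n' 'one\' 'two' | {
        read line
        printf '[%s]\n' "$line"
    }
}

demo_escape          # prints: [a b] [c]
demo_continuation    # prints: [onetwo]
```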

If the simple wording leaves that ambiguous in some way (I'm not
convinced it does) then the whole sentence should be reworded (made
more explicit) - just changing "and" to "or" wouldn't do it.

kre

Re: A philosophical question regarding shell vars & shell built-in utilities

2023-10-27 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 23 Oct 2023 11:02:10 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | do a search for "unexported" in the subject field
  | (which produces 114 results).

I have now read all 114 - I will admit with some trepidation, as Stephane
indicated that there were some messages from me, and I wondered if the
state of my knowledge about all of this 7 and a half years ago was sufficient 
that I wouldn't now be appalled at some of what I may have written.

Fortunately, with one exception not germane to anything related to the topic
of my more recent message re-introducing this topic (not having remembered
it was ever considered before) I couldn't find anything I said back then
which I disagree with now.

Unfortunately, of the 114 messages in that thread, only the first few (I
didn't count how many, probably not more than a dozen or so) had any
relationship at all with the topic of my recent message, before the
thread branched off into a long discussion about the resolution of bug 854
(PATH searching for builtins) and then even further afield (bugs reported,
or not reported, about GNU utilities).

The 854 discussion is where I no longer agree with what I said then; then
I indicated I could almost understand the dumb "find builtins via path
searching" nonsense - now, with more appreciation of the issues, I don't
accept that at all - it is a completely absurd way to specify things.

I could expand upon why shells should simply consider all built-in
commands to be intrinsic (or if you prefer, to always find a built-in
command before going anywhere near a PATH search) another time, that's
not the current issue.

OK, now back to the real thing ... those messages did touch upon the
issue of whether or not the built-in utilities (or perhaps just the
intrinsic ones) can access unexported shell variables - but I didn't
see any definite conclusion reached during that discussion (rather a
difference of opinions) - and I certainly did not see any reference
to anything in the standard which is intended to specify which answer
is correct.

But that was only the first of the questions I asked in my message of
(the early hours of) Fri Oct 20 (it was still Thu Oct 19 most places).
And that question was just preparatory lead up to the real issue I
was seeking an answer to, one example of which was given by the
example command sequence

pwd; OLDPWD=/foo; OLDPWD=/bar cd /tmp; echo $OLDPWD

where the question is what should be output by that final "echo"
(and for this, let's all just assume that OLDPWD never contains
anything which might cause different versions of echo to produce
different results, replace that final 'echo $OLDPWD' by
' printf %s\\n "$OLDPWD" ' if you prefer, the two are intended to
produce identical results here).

That is, should that final echo output the same thing as the pwd
command printed, or something different, and if different, what
should that be, and why?   That's first just a philosophical
question (but by all means read the definition of what cd is
required to do with OLDPWD to assist with that).

Then whatever you believe should be done here, where in the standard
is there any language that supports (or contradicts) your interpretation.

Nothing related to this was in that earlier thread.   Apparently there
was an even earlier 2009 thread, which was much before my time, so I can't
say what was discussed in that one.

There's a related issue (a slight complication of that one) which applies,
or not, depending upon the answer to the first question, and this one,
which is where a built-in utility is required to modify a shell variable,
which it also uses as part of its operation - if the consensus is that
built-ins should only be able to access exported variables (as if the
built-in were not built-in) and the variable that is being modified in
the shell is not exported in the current environment - after the variable
has been modified by the built-in utility, if its value is to be used
again, should the value to be used be the one that was modified (which
according to the assumed rule is not accessible) or the original (perhaps
a default) value ?

While pondering this message, I have also realised there's another problem
with how getopts is specified (related to all of this, in a sense) which
I will add as a note to bug 1784.

kre

Re: system(NULL) overly restrictive?

2023-10-26 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 23 Oct 2023 18:37:40 -0700
From:"enh via austin-group-l at The Open Group" 

Message-ID:  

  | i'm assuming the intention here was "you're not a POSIX system without
  | a shell, so it's not possible for system(NULL) to fail to report that
  | a command processor is available" ... but is that true? what does
  | "available" mean?

POSIX requires that there be a shell which can execute commands.  If there
isn't one, it isn't a POSIX conforming environment.   That doesn't mean
that the environment is useless, or that it cannot still be very similar to
a POSIX environment when that makes sense - but does mean that arbitrary
applications cannot assume that what POSIX says will work will always work.

Beyond that there is nothing more the standard can, or should, say.

It would be ludicrous for the standard to attempt to say how an implementation
should indicate it is non-conforming, as an implementation that conforms needs
no such method, and one that doesn't is non-conforming, and so would not have
any particular reason to implement such a method (the standard is essentially
irrelevant, one either conforms or does not).

If implementations like to agree upon some common method that applications
can use to check for specific (common) non-conformance issues, that's fine,
and they can do that - but nothing about that is rational in the standard.

Nor is it really appropriate to discuss here how to do that, this list is
concerned with what happens (or should happen) in conforming environments.

If you want to suggest changing the standard so some requirement is no longer
required, that's fine, you can do that, but that's about the limit.   That is
here, if you wanted to suggest that to be conforming no shell should be
needed, you could ask for that change (but not probably expect it to be
accepted, is my guess) - but if you accept that a shell is needed for a
posix conforming system, there's no point asking for a standard way to
say "in the current environment there is no shell".

kre

Re: A philosophical question regarding shell vars & shell built-in utilities

2023-10-21 Thread Robert Elz via austin-group-l at The Open Group

Date:Sat, 21 Oct 2023 17:42:50 +0100
From:Stephane Chazelas 
Message-ID:  <20231021164250.tfuborbgdf64e...@chazelas.org>

  | See
  | news://news.gmane.io/gmane.comp.standards.posix.austin.general/12491
  | from May 2016.
  |
  | (with lynx for instance) and ensuing (long) discussion, to which
  | you participated I believe.

Too far back for me to remember.   I can't access that with any browser
I have installed, which doesn't include lynx or anything similar, and
haven't done anything usenet related in decades...

kre

A philosophical question regarding shell vars & shell built-in utilities

2023-10-19 Thread Robert Elz via austin-group-l at The Open Group

While generating

https://www.austingroupbugs.net/view.php?id=1778#c6550

   (note 6550 to bug 1778, mostly about field splitting with the read utility,
   and in particular whether reading into some vars should have unspecified
   effects if changes to those variables could affect the field splitting
   behaviour - reading into, and hence changing, IFS is an obvious example) 

and even earlier, I started to consider what the relationship should be
between shell variables, and shell built-in utilities.

Utilities like read (also getopts, cd, ...) which (almost) must be built
in as they are specified to alter shell variables are something of a special
case, so I'll defer discussion of those until later in this message.  [Aside:
just "almost must be built in" for some of these, as an implementation could
have some other method to allow a utility to interact with the shell, and use
that to allow designated utilities to alter shell variables, or other aspects
of the shell environment.]

So, for now, let's just consider the "often" built in utilities, like
printf, echo, test (aka '[') etc.

With those, if a shell does something like

unset LANG LC_ALL LC_CTYPE LC_COLLATE LC_MONETARY LC_TIME 
LANG=weird
printf format arg arg arg

Is printf allowed, required, or prohibited from doing its output as
if LANG==weird ?   Note that LANG here is not exported (that was part
of the point of the unset) and if printf were not built in, it would
have no access to the shell's internal LANG variable.   But if it is
builtin, it does.
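
The non-built-in half of that is easy to demonstrate. In this sketch
DEMO_VAR is a made-up variable (used instead of LANG so the locale is
left alone) and a child sh stands in for a utility that is not built in:

```shell
# Command run by the child: report DEMO_VAR as that process sees it.
probe='printf "%s\n" "${DEMO_VAR-<not in environment>}"'

unset DEMO_VAR
DEMO_VAR=weird                                  # shell variable, not exported
unexported=$(sh -c "$probe")                    # the child cannot see it

per_command=$(DEMO_VAR=weird sh -c "$probe")    # exported for this command only

export DEMO_VAR
exported=$(sh -c "$probe")                      # now part of the environment

printf '%s\n' "$unexported" "$per_command" "$exported"
```

The first line printed is the placeholder text, the other two are
"weird". What a built-in may do in the first case is the open question.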

Is there any language in the current (or forthcoming) standard that
is intended to specify this?  (If anyone knows of some, please reference
or quote it.)

Similarly with test, and the collating sequence for the weird LANG.

Note that if we were instead to do

export LANG=weird
printf format arg arg arg

or

LANG=weird printf format arg arg arg

then it is clear that the exported LANG is intended (required) for printf
to use (and similarly for any other utilities, built-in or not).

Now we get to the issue of those utilities which are required to alter
shell variables, where for consistency I think some of the answers will
depend upon the answer to the question above.

Let's take a particularly simple (and now clear) example first

X=whatever
X=something unset X

In the forthcoming standard, it is clear that when this completes, X must
be unset, and not have either "whatever" or "something" as its value, and
must not be exported.   That applies to any special built-in utility which
modifies shell variables.

Now let's look at a closely related (but much more complex)
case

X=whatever
X=something . script

and assume the script does

X=newvalue

as one of its commands (whole command, not a var-assign for something else),
and that that is the sole mention of X in "script" (or perhaps it is expanded
as well, but that doesn't affect its value).

Since '.' is a special builtin, I believe the same rule applies, and that
when the dot script completes, the shell environment should have X=newvalue
as part of it, though it is less clear to me what the requirement is wrt
X's export status (must be, must not be, unspecified whether ...).

If we had instead

unset X; X=newvalue

in the script, then I think it would be clear, when the script is complete
the shell environment must have X=newvalue and X must not be exported.

[Aside: for anyone wanting to make exceptions in case X is readonly, then
we know here it cannot be, as we are making assignments to X before running
the dot script.]

To make this less abstract, a more likely example perhaps

PATH=/where/my/script/lives . script

and "script" sets PATH to whatever I really want it to be.  That might be
all it does, script might be a single line containing
PATH=/bin:/usr/bin
(or something).   There'd be no question if I instead did

. /where/my/script/lives/script

but I didn't, I chose to find the script using the temporary exported PATH.

All of this is now (will be in POSIX Issue 8) specified for special built
in utilities.   In the PATH example, in both invocations, PATH must end
up being what the script set it to, not whatever it had previously held, and
not the value exported into the script in the first invocation (though that
would be what it would be required to be if the script did not set PATH).


But all that doesn't cover other utilities that are built in, which are
not special built-in, like read, cd and getopts, but which do set variables.
It would (or could) also cover extensions in various shells, like bash's
printf's -v option (write the output into a shell variable) or its %n
format specifier (next arg is a var name, which gets set to the number of
bytes (or maybe chars, doesn't matter here) which have been output before
that format specifier, just like printf(3)).

OK, first question here, and

Re: [Issue 8 drafts 0001778]: The read utility needs field splitting updates/corrections (and a little more)

2023-10-02 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 2 Oct 2023 16:20:50 +
From:Austin Group Bug Tracker 
Message-ID:  

  | -- 
  |  (0006507) geoffclare (manager) - 2023-10-02 16:20
  |  https://austingroupbugs.net/view.php?id=1778#c6507 
  | -- 
  | Re https://austingroupbugs.net/view.php?id=1778#c6503
  | I have changed this to be an Issue 8 draft 3 bug, as requested. 

Thanks, and for adding the link between 1778 and 1649.

kre

Re: [Issue 8 drafts 0001649]: Field splitting is woefully under specified, and in places, simply wrong

2023-10-02 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 2 Oct 2023 14:17:29 +
From:"Austin Group Bug Tracker via austin-group-l at The Open 
Group" 
Message-ID:  <924a973badc1b5dcc1d92d7095978...@www.austingroupbugs.net>

  | A NOTE has been added to this issue. 
  | == 
  | https://www.austingroupbugs.net/view.php?id=1649 
[...]
  | -- 
  |  (0006501) kre (reporter) - 2023-10-02 14:17
  |  https://www.austingroupbugs.net/view.php?id=1649#c6501 
  | -- 
  | Re https://www.austingroupbugs.net/view.php?id=1649#c6498 (a note added to
  | bug:1649), where it says:

Apologies for that, I added that note (this one mentioned there) to 1649
when I meant to add it to 1778 instead, so I deleted this (you'll no longer
find it attached to 1649) and added a new note to 1778.

However if you read this stuff as delivered via e-mail, rather than from
the web interface to mantis, then you should read this one (note 6501)
rather than the later message (about note 6502) which was supposed to be
identical - but I totally botched the way I transferred the content of
the note from 6501 to 6502, so the e-mail about 6502 has a total nonsense
version of the test script I used (most of the rest should be the same).

The actual note (6502, on bug 1778) has been edited to correct it now,
but editing of notes doesn't get reported to the mailing list (nor does
the removal of a note).

kre

Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-03 Thread Robert Elz via austin-group-l at The Open Group

Date:Sun, 3 Sep 2023 07:36:59 +0100
From:Stephane Chazelas 
Message-ID:  <20230903063659.mzyfen4evyrnz...@chazelas.org>

  | though has the same limitation as my bash echo -e "$*\n\c"

Yes, I know, though as nothing anywhere says what echo is supposed
to do with a lone trailing \ (or in fact, a \ that is not followed
by one of the defined escape sequences), I treat that as unspecified,
and so anything that is produced should be acceptable - I doubt that
real applications would ever do that (the way to output a \, in a
version of echo that handles the escape sequences at all, is to write \\).

  | $ LC_ALL=zh_TW luit
  | $ locale title charmap
  | Chinese locale for Taiwan R.O.C.
  | BIG5
  | $ echo() { printf '%b ' "$@"\\n\\c; }
  | $ echo 'α'
  | αn%

That one is a different issue, and seems to me to be a simple
implementation bug (and no, I am not claiming that NetBSD wouldn't
act just like that) - characters ought to be fully formed before
testing their values.  That the encoding of some of them might happen
to include a bit sequence, which in other environments, would represent
a backslash, should be irrelevant.

kre

Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-02 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 1 Sep 2023 07:15:14 -0500
From:"Eric Blake via austin-group-l at The Open Group" 

Message-ID:  

  | > That is dependant on the current value of $IFS. You'd need:
  | > 
  | > xsi_echo() (
  | >   IFS=' '
  | >   printf '%b\n' "$*"
  | > )
  |
  | So yes, the standard does mention the requirement to have a sane IFS,

The SysIII echo (abomination) can be done using printf %b independent
of IFS:

echo() { printf '%b ' "$@"\\n\\c; }

works.   But there is no point in defining such a function unless
it is called 'echo' (the suggestion of calling it something else, then
using an alias to map that to echo is simply farcical IMO) - the only
point of doing this is for use in a script which is assuming echo
works like that, when run on a system where it probably doesn't.
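
For reference, a sketch of why that one-liner works whatever IFS holds:
printf reuses the '%b ' format once per argument, so nothing is ever
joined using IFS, and the \n\c appended to the last argument emits the
newline and then stops all output before the trailing space. The sample
arguments are arbitrary:

```shell
echo() { printf '%b ' "$@"\\n\\c; }

IFS=:                       # deliberately not a space
echo 'a\tb' c               # prints a, TAB, b, space, c, newline
echo                        # no arguments: prints just a newline
unset IFS                   # back to default field splitting
```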

Implementing unix (as in 6th edn, 7th edn, ...) echo using printf
is harder, without depending upon IFS.  It can be done, but is a
bit messy (requires more than just one printf).

kre

Re: [Issue 8 drafts 0001771]: support or reserve %q as printf-utility format specifier

2023-09-02 Thread Robert Elz via austin-group-l at The Open Group

Date:Sat, 2 Sep 2023 09:01:06 +
From:"Austin Group Bug Tracker via austin-group-l at The Open 
Group" 
Message-ID:  

  | If we don't deprecate %b now, the alternative is to deprecate it in Issue 9

Why?   I don't mean why is that not a consequent of the condition, but
why is it the only one?   Why not "don't deprecate %b in printf(1) at all" ??

  | Issue 9 will have an inconsistency between the printf() function and the
  | printf utility.

Yes.   And exactly why is that a problem?   Has anyone seen any demand for
the printf utility (printf(1)) to output binary in the 0b format?
I haven't.

  | and add %#s in draft 4. There is already a patch for coreutils
  | printf, but I think we would need buy-in from at least one other printf
  | implementation to even consider doing that. 

I looked at our implementation, and while it would take more code
than has been described as required for the coreutils version, it
would not be a significant amount.   One issue is that our code does
not look at the printf(3) flags at all, simply skips them - then passes
the format string to printf(3) (except for %b and one or two other
weird cases that need special handling) as it was given to printf(1).
Any handling of '#' (and "'" which we already support as much as our
rather limited locale handling allows - that is, if it works for a C
program, it will work for a sh script using printf(1) as well) is all
currently done by printf(3).   Not a huge change, all we need to do
is actually look for it, and then in the %s case, do %b handling
instead of %s handling if the # was present, but it isn't just nothing.
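
For anyone unsure what distinguishes the two conversions in printf(1)
today, a minimal illustration (nothing here is specific to any one
implementation, as far as I know):

```shell
# %s copies the argument as given; the backslash and "t" stay literal.
printf '%s\n' 'one\ttwo'

# %b processes backslash escapes found in the argument, so \t
# becomes a tab character in the output.
printf '%b\n' 'one\ttwo'
```

A %#s, if it were added, would be expected to behave like the second
line.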

I didn't already add it, as whatever we do with %#s I cannot see a
time when %b in our printf(1) ever means anything different than it
does today, whatever the standard requires.   I suspect that might be
true of most other implementations as well - there is simply too much
application code using it to expect it to ever be changed, unless we
were to force it - and as long as %b keeps on working for applications,
they have no real reason to ever want to change, hence I don't really
foresee a time when almost anything would use %#s if we did add it.

It is different when superior functionality is replacing something
inferior (like printf and echo, or fgets() and gets()) but when we
would be just offering the exact same thing, with a different name,
and the old one still works anyway ???

Further, I suspect it is more likely that some future version of C
will find a need to define a meaning for %#s (and %S, and almost
anything else they haven't already defined) than there will ever be
a demand for 0b output from printf(1) via a dedicated conversion
character - a more general form allowing multiple bases perhaps, but
not just that.   If we had to pick something as a replacement for %b,
I'd be choosing %p - ignoring its printf(3) usage, which makes no
sense at all in printf(1), it is more natural ("print") IMO than even
%b was, and has zero chance of being usurped by the C committee (and
would be easier for me to implement).

kre

ps: while I'm here (first time on the list for a while) apologies for
my absence, my system broke, and for a whole set of weird reasons, took
a long time (close to 2 months) to get repaired, so I haven't been
following anything of what has been happening here until the back end
of this past week (not what has been happening in NetBSD either).
All my e-mail accumulated on munnari, so nothing was lost, but I am
nowhere near caught up.

Re: Access to the nightly draft

2023-06-27 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 27 Jun 2023 10:36:32 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | To get hold of the latest build you need gitlab access.

I suspect Roland was asking for something a bit less than that (though
he might accept that as an access method - I wouldn't).

Better would be much more restricted than the current drafts are,
generated PDF files - not the sources that make it (nor info on the
number of intermediate updates that are actually made to achieve
the desired changes).  Not necessarily daily, just whenever a batch
of changes have been applied, and are considered complete.

Getting a whole new draft, with hundreds, or even thousands, of changes
dumped upon us makes reviewing difficult - there's just too much to
attempt (I haven't found time to even really start on draft 3 yet).
But having a draft having just the past couple of days worth of changes,
along with the messages on the list which indicate which changes have
been applied, would make that far easier - there would be a much more
limited set of pages that actually need reading, the whole thing would
be nicely spread over a much longer period (further, utilities to diff
PDF files exist, and are usable, as long as the set of changes is not
too large - once there start to be a lot, almost every
page can have "changes" (perhaps just page numbers) and that method of
seeing exactly what altered, quickly, is lost).

kre

Re: out-of-bounds numbers in shell utility arguments

2023-06-27 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 27 Jun 2023 09:41:02 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | Yes, via XCU 1.1.2; the C standard allows it for signed long, so it's
  | allowed for anything that 1.1.2 requires to be "equivalent to the
  | ISO C standard signed long data type".

And of course, that means that even though the >> operator is in Table 1-2
as one that must be supported, it cannot actually work, as >> is unspecified
(or even undefined, I forget) on signed values, and POSIX sh arithmetic only
allows for signed values.   << may have similar issues (at least some compilers
are starting to complain about the use of << with a signed left operand, which
I am guessing means at least some version of the C standard has made that be
unspecified/undefined as well).

The implementation I work with ignores that, and when an operation works
better with unsigned operands, it simply treats them as unsigned instead
of signed.   I suspect other shells might do the same.
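
A small illustration of the distinction, as a sketch: the first two
results should be the same everywhere, while the last line is the kind
of case where the result depends on how the shell (and the C
implementation under it) treats a negative left operand, so no expected
value is given for it.

```shell
# Non-negative operands, result in range: no portability question.
echo $(( 16 >> 2 ))
echo $(( 1 << 4 ))

# Negative left operand: the value printed here is whatever the
# implementation chooses to do, so a script should not rely on it.
echo $(( -16 >> 2 ))
```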

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox

2023-06-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 14 Jun 2023 12:56:16 -0500
From:"G. Branden Robinson via austin-group-l at The Open Group" 

Message-ID:  <20230614175616.ilqpqzpbeiipu7s7@illithid>

  | The question is, did thread A receive SIGINT or not?

No, that isn't the question at all.   That's a simple race, and
irrelevant to the current discussion.

  | Is the current draft language therefore redundant?  Can lines 59787-8 be
  | deleted without damaging anything?

Those line numbers in which draft?   In the current draft (the most recent
available one, Issue 8 draft 3) those are the (whole) DESCRIPTION section of 
pthread_setspecific() - and something tells me that's not what you're
proposing removing.

In e-mail, it is generally better to quote the lines, than line numbers,
that's something everyone can understand, and can know exactly which text
is in question - at the minute I'm not sure what you're referring to.

For large sections, unless some specific wording therein is important,
it's OK to just quote the first part and the ending, we can find the
whole thing in the draft that way.   But do always make it clear which
section (for XSH 3 and XCU 3 give the function/utility name, elsewhere
the section number, and ideally its title, as numbers sometimes alter).

  | Thanks for emphasizing the narrow scope.  I've tried to direct my reply
  | accordingly.

Except the narrow scope related to what happens when the signal mask is
changed to unblock signals that were blocked, and in particular, when one
(or more) of those signals are pending.   You concentrated on the exact
opposite case, when blocking a signal, which has no particular issues at all.

The issue here is that the current standard contains language which while
clear enough about its intent, is logically absurd (it requires something
to be done after, and at the same time before, something else).   We could
just leave it alone - no-one is going to doubt what it means.   But fixing
it would be better - and we now have language that does that which works.

Beyond that, an APPLICATION USAGE section is being added (technically, it
is already there, but just says "None" - that "None" is being replaced by
other text) to explain to application writers what can happen, to avoid
misunderstandings.   The wording of that is the most recent topic of
discussion, but that's settled now too.

In both cases, naturally, unless someone else sees a problem with them.

There never really was anything substantive here, it is all just wording
things properly.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox

2023-06-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 13 Jun 2023 16:38:54 -0500
From:"G. Branden Robinson via austin-group-l at The Open Group" 

Message-ID:  <20230613213854.hk3z6zzpkhdiunsk@illithid>

  | I apologize for the possibly academic recapitulation of multitasking,
  | but the key point is that the foregoing model does not require the
  | process to "enter the kernel" to service the signal.

"enter" was perhaps a poor choice of words, for the general case, "be in"
might be better.

But Geoff's message, to which I was replying, said:

is currently running, and is executing user code.

A process in that state was the one to which I was referring.   Such
a process must actually enter the kernel (since it is not there at the
time) in order for the kernel to deliver a signal to it.

And yes, there is something about kernel design being assumed here, but
that's not particularly important to anything.  The issue was that even
in that case, the signal is pending for some period (usually a very short
period, perhaps just a few microseconds, but also possibly longer).
It is never true, even in the simplest case:
kill(getpid(), SIGwhatever)
that the signal is not pending for some period.   Even when that system
call results in no context switch, and "immediately" invokes the application's
SIGwhatever handler, there is a brief period, between when the signal is
posted, and the handler is invoked.   During that period, the signal is
pending, as that is defined.   This one, on a fast processor, may be for
considerably less than a microsecond, but it is never zero.

  | There's arguably not much difference between your presentation and mine;
  | in mine, something special and kernelly _might_ need to happen when
  | returning from the signal handler,

returning from the signal handler isn't the issue, it is calling it in
the first place.   Signals are kernel events, for the application handler
to be invoked (in application space, running application code) the kernel
needs to be running, to set up the application environment.   While the
application could have something resembling signal handlers which operate
entirely without kernel assistance, those would not be actual signal
handlers.

  | No mode switch is necessary.

[when returning from a signal handler] - that's true in most cases, but
not if the signal handler blocked signals as part of its invocation.
In that case, some kind of call to the kernel is needed to return the
signal mask to its state before the handler was invoked.

  | I guess the question from the POSIX perspective is whether a signal can
  | be pending if a process cannot observe it to be.

I don't think that matters.   The notion is used as a mechanism to allow
the existence of signals which do not get immediately delivered to the
process.

  | That's good.  I surmise, then, that "signal
  | pendingness" is not a trait that POSIX needs to define, or even employ.

Perhaps not, but it does.

  | The standard should avoid the term if using it--even just for expository
  | purposes--is going to provoke controversy among highly seasoned Unix
  | kernel engineers who are accustomed to using it with a more
  | implementation-specific meaning.

No, there has been no controversy here about what that means (with the
slight glitch when I didn't bother to look at the POSIX definition, and
thought that possibly only unblocked signals were considered pending,
but that was just my laziness coming through).   The recent discussion
has been entirely about how to write down the notion that a signal might
be delivered to a process while it is executing the function that unblocks
(other) signals.

kre

ps: SIGQUIT is not an "un-handleable signal" - the only signals that cannot
be caught, are SIGKILL and SIGSTOP.   There's nothing particularly special
about SIGQUIT at all.
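
That is easy to check from sh, as a sketch: the inner shell installs a
trap for QUIT, sends the signal to itself, and carries on.   This
assumes SIGQUIT was not already being ignored when the inner shell
started (a trap cannot be set for it in that case).

```shell
# Run in a child shell so the trap does not affect the caller.
out=$(sh -c '
    trap "echo caught SIGQUIT" QUIT
    kill -QUIT "$$"
    echo "still running"
')
printf '%s\n' "$out"
```

Trying the same with KILL or STOP gets nowhere, those two are the
only ones that cannot be caught.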

Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox

2023-06-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 13 Jun 2023 09:29:52 +
From:Austin Group Bug Tracker 
Message-ID:  <5a1cedd82cfb7ca6b01a38e53243a...@austingroupbugs.net>

  | You don't seem to have considered the case where the thread that receives
  | the signal is currently running, and is executing user code.

No, that's one of the shortish pending cases.

  | Then there is
  | nothing to delay the delivery - it can happen immediately after generation,

No, it can't in that case, it needs to wait until the process enters the
kernel for some reason.   Typically if a signal is delivered to a process
while it is in application mode, it will be the result of a kill() from
another process running on a different CPU (anything the process does to
itself, including traps, results in the process being in the kernel when
the signal is delivered - the kernel side of the process is posting the
signal to itself).   When that happens, the other CPU (the one running the
application) needs to be notified that there's an event it needs to process,
which will result in that process being (temporarily) suspended and the
kernel taking over.   When that is done (which may be immediate, or may
be later if the cpu in consideration switches to some other process) and
the kernel is returning control to the application, is when the signal is
delivered.   In the interim period (which may be very short, or may be
lengthy (in computer terms anyway)) the signal is pending.   But not blocked.
(Of course, that is assuming it wasn't being blocked).

  | Having said that, it would make sense to reword to avoid any subtle
  | distinctions about exactly when a signal becomes pending. I will try to
  | come up with something that merges parts of your suggestion with parts of
  | my previous attempt. 

Your new version looks OK to me.   Still not sure the APPLICATION USAGE
section is needed however.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001731]: pthread_sigmask() pending signal requirement time paradox

2023-06-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 12 Jun 2023 15:36:31 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | Yes, I assumed some context rather than stating it. I should have
  | said "In particular, when a signal is generated it can become pending
  | for reasons other than being blocked".

That's no better, particularly in light of your explanation of what
"pending" means in the standard (which I was too lazy to go and check).

Signals never become pending because they're blocked, nor do they become
pending upon being unblocked.

  | I'll edit the note to make that change.

Please make a somewhat different change, which just makes it clear that
some other signal may have become pending while this function is running,
and omit mentions of being blocked, which seem to be confusing things.
That, or don't add the application usage text at all.   Maybe something
like "An unrelated signal may have become pending while..."

kre

Re: make: -j documentation consistency enhancements

2023-04-21 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 20 Apr 2023 19:26:00 -0500
From:"Andrew Pennebaker via austin-group-l at The Open Group" 

Message-ID:  

I hope you're on the list, as gmail refuses mail from me, so I cannot
reply directly (I'd suggest getting a more rational e-mail provider).

  | As an aside, can we please generate embedded page numbers to align more
  | closely with the logical PDF page counter? In any case, back to the make
  | utility.

That's unlikely to happen - what would be nice would be (if it is possible)
to make the PDF page numbering match the actual file, but I have no idea
how PDF files handle page numbers i ii iii (etc) which precede page 1.

The page number to quote is always the one on the page itself, ignore what
your PDF reader thinks it might be.

When using PDF files with any recent version (issue 7 onwards) I just use
the PDF index, so to find make, select XCU, then section 3 (Utilities) then
just pick "make"...

  | First, the -j option is uniquely missing from the SYNOPSIS section.

I think that was already noticed, and will be fixed.

  | There at line 104481, we have:
  | -f *makefile*
  | But at line 104488, we have:
  | -j
  | That is, no value.

I am not sure anyone noticed that one, but yes, that is a defect and
should be fixed.

kre

Re: $? behaviour after comsub in same command

2023-04-10 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 10 Apr 2023 10:30:08 -0400
From:Chet Ramey 
Message-ID:  <78038281-f431-775e-6d60-a44126d1d...@case.edu>

  | The different semantics are that the standard specifies the status of the
  | simple command in terms of the command substitution that's part of the
  | assignment statement, so you have to hang onto it for a while.

I suspect that's because you are treating the assignments (more or less)
as statements of their own, and expanding and then assigning each, one by one,
left to right as you encounter them.

If you treated several var assigns just like they were args to commands,
expanded them all (for this purpose, left to right) and then run the
command - which involves putting the values to be assigned from var-assigns
into the environment of the command to be run ... in this case, the null
command, so that means the assignments affect the current shell environment,
then there is no issue, and no real need to "hang onto it for a while".

In the case where there's no command, the exit status of the last cmdsub
is simply there, for the next command to use (not this one, because there
are no more expansions to be made) - in the case where there is a command
the command execution comes next, and the exit status from that overrides
the exit status from the command substitution, before there is any possibility
of the cmdsub status (for the one that might matter, or any earlier ones
that might also have been executed, which are already lost) becoming visible
to anything, as no more expansions are happening at this point.

But because the standard doesn't actually say which order these things need
to be evaluated, but does say how $? is supposed to be affected, the
implementations can get messy to handle all of this properly, if the
implementation chooses a different way of handling the unspecified part
(which really, is unspecified just because some early implementations did that).

Note, that before we do any of this (var-assign, and redirect, processing)
we have already expanded all the rest of the command line, the words that are
not related to redirects or var-assigns, we have the command name (if any)
and know if it is there at all (or not, a null command) and if it is a
built-in of some kind (so whether or not we shall fork() ... create a new
shell environment) or not - and if we want, when there is to be one, most
of the rest (redirects and var-assigns) can be expanded in that new
environment (in the child process).  Or not.   That's all just implementation
detail (provided we don't leave any inappropriate results in the parent
shell environment .. which means special care if the fork() is implemented
using vfork()).

kre

Re: $? behaviour after comsub in same command

2023-04-06 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 7 Apr 2023 05:38:16 +0300
From:=?UTF-8?B?T8SfdXo=?= 
Message-ID:  

  | a=${b#prefix} a=${a%suffix}
  |
  | is common enough a pattern to consider despite having no benefit other than
  | looking organized. Most shells interpret it the way average user would
  | expect too

Most might, but it is still unspecified (and is not something I think I
have ever encountered).   It is trivial to fix by putting a ';' or newline
between the two assignments, then it works everywhere.  Why wouldn't you?
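
For example (the value given to b here is made up, purely for
illustration):

```shell
b=prefix-body-suffix

# Two separate null commands, each with one var-assign: the second
# expansion always sees the result of the first assignment.
a=${b#prefix-}; a=${a%-suffix}

echo "$a"
```

With the ';' there the order is no longer up to the shell.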

And what's more, tell anyone else making this mistake that
it is unspecified, and how simple it is to fix.

kre

ps: replying only to the list, as gmail simply bounces any messages I
send to its users directly.

Re: $? behaviour after comsub in same command

2023-04-06 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 07 Apr 2023 03:14:47 +0200
From:Steffen Nurpmeso 
Message-ID:  <20230407011447.ptyvc%stef...@sdaoden.eu>

  | There i say

I'll omit the quotes from the standard...

  |   So everything should be handled sequentially, making it a bug.

From where do you get sequentially?  I don't see that anywhere.

And sequential what?  And where do you see whatever that is specified?

  | And that is true, no?  If expansion has to take place, and the
  | assignment has been performed, .. it has been performed?

Sure, but the normal way to evaluate any command (omitting
irrelevant aspects here, like redirects, etc) is to evaluate
all the words (perform expansions) and then execute it.

Why would evaluating var assigns be any different?  Expand all the
words, then execute (assign).  Seems to me like the obvious (and
correct) way.

  | So maybe null command and that is not a bug?

No, I don't think it is.

  | But all shells except FreeBSD do this; also from the report:

NetBSD too, and according to reports, dash only just changed.

My guess (no more than that) is that sometimes it is easier to
give in to the desires of the masses rather than maintain the
correct approach.

To people who don't understand sh syntax,

a=1 b=2 c=3

kind of looks like 3 commands that should be executed in order
as written, just like

a=1
b=2
c=3

would be.  But the first form isn't 3 commands, it is one.
There is nothing there (except the final newline) which is
a command terminator.

Note here that I am not claiming that shells which do it the
"other" way are non-conforming, about all the standard says
is that the words need to be expanded before the assignment
is performed - it doesn't say to expand all the words, then
do assignments, it doesn't say expand each word and then
assign, and then go on to the next, and it doesn't say which
order to do the expansions or assignments (left to right, right
to left, or random).

That means that all of that is unspecified, and shells can
do it in whatever order makes sense to them.  I have my own
views on what is best here, and won't be changing the NetBSD
sh from how it behaves in this area.  I hope FreeBSD don't
change either.

It also means that applications that use any of this unspecified
behaviour, expecting some particular result, are broken, and
cannot legitimately complain when some shell doesn't work the
way they expect.  It doesn't mean they won't, unfortunately.

kre

Re: $? behaviour after comsub in same command

2023-04-06 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 5 Apr 2023 10:35:58 -0400
From:"Chet Ramey via austin-group-l at The Open Group" 

Message-ID:  

  | A variant with slightly different semantics:
  |
  | (exit 8)
  | a=4 b=$(exit 42) c=$?
  | echo status:$? c=$c
  |
  | The standard is clear about what $? should be for the echo, but should it
  | be set from the command substitution for the assignment to c?

It isn't really different semantics, it is the same thing.   The exit status
from the command substitution in that case is used as the exit status for the
empty command that is line 2 (you're right, that is clear).  But that command
doesn't get to set an exit status until it finishes, and it can't do that
until its associated var assigns have all been performed, which (even
leaving aside the question of the order in which they, and the args for
them, are processed) cannot possibly be before c=$? is expanded and assigned.

Needless to say, the same (exact) set of shells which produced N:N in the
example in my previous message, set c to 42, and all the rest (including
the older ksh93) set c to 8 (which really is what it should be - the other
possibility here would be "unspecified" as even if the exit status were to
become available in the middle of evaluating the args for a command, here
we don't know whether c= or b= will be evaluated first).
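
To try that against a particular shell, something like the following
sketch can be used (replace "sh" with whichever shell is of interest;
the variable names script and out are just for this illustration).
The status reported should be 42 everywhere, only the value of c is
expected to differ, being 8 or 42 depending on the shell.

```shell
# The three lines under test, kept in a variable so that they can be
# handed unchanged to any shell.
script='(exit 8)
a=4 b=$(exit 42) c=$?
echo "status:$? c=$c"'

out=$(sh -c "$script")
echo "$out"
```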

All the standard actually says is:

4. Each variable assignment shall be expanded for tilde expansion,
   parameter expansion, command substitution, arithmetic expansion,
   and quote removal prior to assigning the value.

There's nothing there about the order in which they're processed (unlike,
for example, redirects, which are required to be processed left to right)
which makes the order implicitly unspecified.   Anything is possible.

But as, in any sane implementation, assigning the values to the variables
should not in any way affect the values assigned to other variables in the
same set of var assigns, it really should not matter the order in which
they're processed, unless someone is idiotic enough to write

   a=1 a=2 a=3

in which case what value gets left in a is anyone's guess, and they get
what they deserve.

kre

Re: $? behaviour after comsub in same command

2023-04-06 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 6 Apr 2023 11:17:43 -0400
From:"Chet Ramey via austin-group-l at The Open Group" 

Message-ID:  <023c0028-e682-e1b6-99db-c8a596cdf...@case.edu>

  | My question is why they would choose something other than
  | what the so-called reference implementations (SVR4 sh, ksh88) did.

Not that I was participating at the time, so I have no actual knowledge
of any of this, but I get the impression that back then it was considered
OK to change things if the group believed it "made it better for the users".

Hence we got that absurd PATH search rule for builtins, that no shell of
the time did anything like, "because a user might want to override a
builtin with a version in their own bin directory, earlier in PATH than
where the standard version of the command exists", or the even stupider
(and fortunately, going to be gone) rule that all the normal built-in
commands needed to be available in the file system (not so much so the
preceding PATH rule would allow them to be overridden - that didn't work
in practice anyway, but in case someone wants to "nohup cd" or
"find ... -exec umask whatever".   Nonsense.)

This is likely more of the same - but in this case I actually agree
with it - $? only gets updated when a command finishes, and only one
in the current execution environment.   That's clear, simple, and
easy to use - otherwise using $? other than as S=$? immediately after
the command whose status is of interest, becomes a total crap shoot.

That's reinforced in this case, by wording that makes it clear that the
only way to ever observe the exit status from a command in a command
substitution (other than the command there writing the value of its $?
somewhere) is to run it with a null command (just a var-assign, or
redirect).   That is, when there is no command, the status of the last
command substitution (if any) becomes $?.   That's the only way.

Otherwise things like

return $(true)

would need to work (as an equiv of return 0 - and return $(false) for
return 1 - and the standard has never required that).

Given:
 $SHELL -c 'f() { return $( exit $1 ); }; e() { for A; do f "$A"; echo "$A:$?"; done; }; e 0 1 2 3 99'

which I will unwrap to make it easier to read:

 $SHELL -c '
	f() {
		return $( exit $1 );
	};
	e() {
		for A;
		do
			f "$A";
			echo "$A:$?";
		done;
	};
	e 0 1 2 3 99
 '

bash, zsh, and a current ksh93 (Version AJM 93u+m/1.0.4 2022-10-22)
actually print N:N for all of the output lines, whereas everything else
I tested, including an older ksh93 (Version AJM 93u+ 2012-08-01) and
ancient pdksh, prints N:0 for everything.   Since the return is effectively
"return" (the command substitution doesn't output anything - if it were
$( echo $1 )
instead of exit $1 things would be different) it should return with the
status of the last command to finish - which here is always either 0,
from the status set by the function definition for e (the very first time)
or the result from the "echo" after the previous iteration, every other time.
Since echo's status is (generally, and always here) 0, the return should
always be "return 0".
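
If the intent really is for f to return the status of the command
substitution, the way that should work in any shell is to go via a null
command, as described above.   A sketch (the variable names unused and
s are arbitrary):

```shell
f() {
    # Null command: nothing but a var-assign, so the status of the
    # command substitution becomes $? ...
    unused=$(exit "$1")
    # ... and can then be returned explicitly.
    return $?
}

for n in 0 1 2 3 99; do
    s=0
    f "$n" || s=$?
    echo "$n:$s"
done
```

That prints N:N for every line, regardless of which of the two groups
of shells is running it.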

kre

ps: all this is really esoteric, and makes no real difference to any
sane application.

Re: $? behaviour after comsub in same command

2023-04-05 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 5 Apr 2023 18:25:32 +0300
From:"=?UTF-8?B?T8SfdXo=?= via austin-group-l at The Open Group" 

Message-ID:  

  | Outliers are ash based shells; they apply
  | assignments concurrently but it isn't useful at all.

No we don't, to do it concurrently we'd need to run multiple threads,
and synchronise them carefully, and we don't...

What we do is separate the process of doing the expansions from doing
the execution of any command.   The expansions happen first, the rest
comes after.

The issue here is that people tend to think of
a=1
as a command.   It isn't (not as people think of it anyway).
But with that mindset they treat
a=1 b=$a c=$b
as 3 commands, one after the other.   It isn't.

The simple
a=1
case is a null command, with a var-assign prepended.
The other case
a=1 b=$a c=$b
is also a (single) null command, this time with 3 var-assigns
prepended.

If you want sequential execution, that's easy to achieve, just
change
a=1 b=$a c=$b
into
a=1;b=$a;c=$b
then you have 3 null commands, each with a single var-assign,
and those will be executed, in order, one at a time, just like
you want the other one to be, in any shell that isn't completely
broken.
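
Spelled out, as a trivial sketch:

```shell
unset a b c

# Three null commands, executed one after the other, so each
# expansion sees the assignment made by the command before it.
a=1; b=$a; c=$b

echo "a=$a b=$b c=$c"
```

Every value there ends up as 1, whichever shell runs it.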

As reported in a later message, the "isn't useful at all" is
wrong, as doing the expansions first, and then the assignments
later, when it is all part of the same command (whether a null
command or just var-assigns preceding any other command) means
that
a=$b b=$a
does work to swap a and b, and doesn't require creating a new
var, which in a case like

t=$a a=$b b=$t command

would result in placing t into the environment for command, which
might be harmless, or might not be if you happened to accidentally
pick the wrong temporary name to use.

a=$b b=$a command

doesn't do that, it just puts a and b (as desired) in the environment,
and works sensibly.
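
The difference in what reaches the command can be seen with a sketch
like this one, where check is just a made-up name for a little script
that reports whether t is present in its environment:

```shell
check='if [ "${t+x}" ]; then echo "t leaked"; else echo "t absent"; fi'

a=one b=two
unset t

# With a temporary, t is exported to the command along with a and b.
t=$a a=$b b=$t sh -c "$check"

# Without one, only a and b are placed in the environment.
a=$b b=$a sh -c "$check"
```

What values a and b have inside the command in the second case is, of
course, the unspecified part being discussed, only the presence or
absence of t is being shown here.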

That it works exactly the same way when command is missing, would, I
would have thought, be expected.   It is the right way.

We don't need two different ways to achieve a=1;b=$a;c=$b - that one
is quite sufficient.   Just use it.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1), realpath(1) utility

2023-03-22 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 22 Mar 2023 14:31:16 +
From:"Austin Group Bug Tracker via austin-group-l at The Open Group" 
Message-ID:  <9b820bbcf17033e4b2b83a4cd13eb...@www.austingroupbugs.net>

  | A NOTE has been added to this issue. 
  | == 
  | https://www.austingroupbugs.net/view.php?id=1457 

This issue is in a state that doesn't allow ordinary mortals to add notes,
so this e-mail instead.

Adding -v/-q (which BSD readlink has as well, -v is the default) wouldn't
help anything here (or not by itself).

Changing the standard to allow an error return (since readlink is a "provide
information", not a "test" utility, non-zero status is an error)
without a diagnostic seems unlikely, even if -v/-q were added, -v is likely
to remain the default.

Changing coreutils seems like the sane solution - users who use -q
are going outside the standard, and then the lack of an err message
is acceptable.

While it isn't up to me, I would have thought that getting an error
message (by default) when readlink fails is more to be expected than
the other way, so I wouldn't have the change depend upon POSIXLY_CORRECT.
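A quick way to see what a given readlink does on failure is to capture its exit status and any diagnostic separately. This is only a probe sketch: it assumes a readlink utility is installed, and the path used is a hypothetical one chosen not to exist.

```shell
# Probe: does readlink report an error, and does it print a diagnostic?
target=/nonexistent/path.$$
err=$(readlink "$target" 2>&1 >/dev/null)   # keep stderr only
status=$?
printf 'exit status: %s\n' "$status"
printf 'diagnostic:  %s\n' "${err:-<none>}"
```

A non-zero status with an empty diagnostic is the combination being discussed here.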

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001640]: The rationale given for retaining "true" is nonsense.

2023-03-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 14 Mar 2023 10:31:52 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | I think this and some other differences between ":" and "true" are
  | worth mentioning in the standard.

I don't think that would do any harm, or is incorrect, but I'm
not sure it is necessary either.   Some of us recognise that
true and : are (in many uses) more or less interchangeable.
That doesn't mean that we need to explain why both exist, or
what the differences are.

It is often possible to replace grep with sed - the standard does
not need to say that, or explain how, or what grep can do that
is not so easy using sed.   Same here.
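For the simple case, the grep/sed equivalence mentioned above looks like this (the sample input and pattern are made up for illustration):

```shell
# Selecting matching lines with grep, and the same thing with sed.
input=$(printf 'alpha\nbeta\ngamma\n')
with_grep=$(printf '%s\n' "$input" | grep 'et')
with_sed=$(printf '%s\n' "$input" | sed -n '/et/p')
printf 'grep: %s\nsed:  %s\n' "$with_grep" "$with_sed"
```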

Just removing the Rationale would be enough, but I don't mind
if you really believe the rest of this is needed.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001640]: The rationale given for retaining "true" is nonsense.

2023-03-12 Thread Robert Elz via austin-group-l at The Open Group

Thanks, I think that's all we needed to know about what it does.

kre

Re: Syntax error with "command . file" (was: [1003.1(2016/18)/Issue7+TC2 0001629]: Shell vs. read(2) errors on the script)

2023-03-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 10 Mar 2023 23:40:18 +
From:"Harald van Dijk via austin-group-l at The Open Group" 

Message-ID:  

  | Based on past experiences, I am assuming the e-mail this is a reply to 
  | was meant to be sent to the list and I am quoting it in full and 
  | replying on the list for that reason.

Thanks, and yes, it was - my MUA absolutely believes in the one true
meaning of Reply-To (where the author of the message to which the reply
is being sent requests that replies be sent -- to addresses in that field,
and no others).   I need to manually override it when I choose to ignore
that request and send to different addresses (which is allowed, but in
general, done only with proper consideration of why).   This list always
directs that all replies go only to the author of the message, and never
to the list itself.   Irritating...

  | Sourcing arbitrary script fragments and having assurance that they do 
  | not exit the shell is not reasonable, as the arbitrary script fragment 
  | could contain an 'exit' command.

Of course, deliberate exits aren't the issue, only accidental ones.

  | Beyond shell options and variable assignments not persisting in the 
  | parent shell, are there any other issues you see with running them in a 
  | subshell?

The whole point of many . scripts is to alter the shell's environment,
if they were just arbitrary commands, not intended to affect the current
shell, they'd just be sh scripts, and run the normal way.   The very act
of using the '.' command more or less means "must run in the current shell".

"(. file)" is silly, "file" would accomplish the same thing (if executable,
otherwise "sh < file" after finding the path to file) in a more obvious way.

Apart from options and variables, . files often define functions, change
the umask and perhaps ulimit, and may alter the current directory, set exit
(or other) traps, ...  anything in fact.
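The difference between ". file" and "(. file)" can be sketched as follows. The fragment contents, the variable and function names, and the temporary file location are all invented for this example.

```shell
# A small script fragment that alters the shell's environment.
frag=${TMPDIR:-/tmp}/dot-demo.$$
printf '%s\n' 'demo_var=set-by-fragment' \
              'demo_func() { echo "hello from demo_func"; }' > "$frag"

unset demo_var
( . "$frag" )                        # subshell: the changes are discarded
after_subshell=${demo_var-unset}

. "$frag"                            # current shell: the changes persist
after_dot=${demo_var-unset}
func_output=$(demo_func)

rm -f "$frag"
printf 'after (. file): %s\nafter . file:   %s\ndemo_func says: %s\n' \
    "$after_subshell" "$after_dot" "$func_output"
```

Only the second form leaves the variable and the function defined afterwards, which is the whole reason for using '.' in the first place.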

As an example, consider what you might put in your .profile or $ENV
file - those are run in more or less the same way as a '.' file (just
without the PATH search to locate the file).   XRAT C.2.5.3 says almost
exactly that about ENV.   (Strangely though, even though .profile is
mentioned several times as a place where things can be set, it doesn't
appear in the standard (as something that shells process) at all - which
is kind of odd really, since it is considerably older than ENV, and as best
I can tell, supported by everything.   The closest that we get is a mention
in XRAT that "some shells" run it at startup of a login shell.   Which are
the other shells?  That is, the ones that don't run .profile?   And I don't
mean in situations like bash, which prefers .bash_profile if it exists.)

I doubt that you'd want those scripts run in a subshell environment,
I also doubt that you want the shell to exit if there's an error in
one of them.   How would you ever be able to log in (and start a shell)
if it exited before you ever had a chance to run a command?   If you can't
log in, because your shell won't start, how would you ever fix the problem?

As best I can tell (I have done very limited testing of this) shells tend
to simply abort processing one of those scripts upon encountering an error
(like a syntax error, etc - not executing "exit" - that should exit) and
just go on to the next step of initialising the shell.   They don't just
exit because there's a syntax error - most shells report the error (not all),
but I couldn't find one which exits.

  | You have left out bash 4 here.

For the same reason I didn't include ancient versions of all the
other shells either.   That's obsolete, not going to change in the
future, and has been replaced.   [And because I happen not to have
a binary of it at the minute - I could make one, I do have sources,
just don't really see the need.]

  | I do not expect bosh to have a large user base (even if it will be wider 
  | than mine), but as I am sure Jörg would have pointed out, the shell has 
  | historical significance in that it is a descendant of the Bourne shell 
  | from which POSIX shell language is also derived.

So is/was ksh88, and then ksh93 ... they were just modified more.

  | (Although I wouldn't be 
  | opposed to a change to POSIX to *allow* something different.)

As I hinted in the note in bugid:1629 which spawned this discussion
(bugnote:6200) I expect this part might need to move to "may exit" rather
than "shall not exit" (away from "shall exit" which it is now, in the
cases in question, not all) for a release cycle (or two) - but then again
given the number, and popularity, of the shells which already don't exit
in these circumstances, perhaps that won't be needed.  That should be
discussed further.

The reason that read errors are different in this regard (at least in
the main script, not in . files -- not sure it is possible to have an
equivalent to a read error in "eval" - perhaps an EILSEQ (bad char encoding)
in the string might count? -- and that

Re: [1003.1(2016/18)/Issue7+TC2 0001640]: The rationale given for retaining "true" is nonsense.

2023-03-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Sun, 12 Mar 2023 16:54:34 +
From:Austin Group Bug Tracker 
Message-ID:  <0a945390fc5d0c6c366071bcd2d29...@austingroupbugs.net>

  | A NOTE has been added to this issue.

I don't think this discussion needs to be in notes, or not unless
something relevant to the actual issue itself is revealed.

  | GNU true accepts some --version, --help options.

That is perhaps not surprising - weird though, as if true was going
to need multiple versions to get it right, or add features, or that
anyone needs help writing "true" ...   But OK, apart from one
potential issue (later).

  | I don't have access to ksh93 just now but I'd expect its true to supports
  | those as well as --author --man --usage and many more in that vein like
  | most of its builtins do. 

Not that I can see, it appears to ignore any operands to true, just
as (almost) everyone else's (and all sane) versions do.

With the GNU version, what would be more interesting to know, is what
it does when run as
true --nonsense
true --
true '--:)  (-;'
(and similar).   What's the exit status, is there any output, and if
so, to stdout or stderr?

I'm also assuming that the --version and --help (to be meaningfully "accepted"
rather than just ignored - everyone's true allows and ignores those, and
any other args given) actually produce some output.  stdout or stderr?
What happens (exit status etc) if there's a write error while writing that
output?
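The questions above can be answered for any particular "true" with a small probe like the one below. It is a sketch only: the function name is made up, and as written it runs whatever "true" the shell finds first (usually a built-in), so testing an external version would mean substituting its path.

```shell
# Run true with the given operands; record status, stdout and stderr.
probe_true() {
    out=$(true "$@" 2>/dev/null); status=$?
    err=$(true "$@" 2>&1 >/dev/null)
    printf 'true %s -> status=%s stdout=[%s] stderr=[%s]\n' \
        "$*" "$status" "$out" "$err"
}

probe_true --nonsense
probe_true --
probe_true '--:)  (-;'
```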

kre

Re: Syntax error with "command . file" (was: [1003.1(2016/18)/Issue7+TC2 0001629]: Shell vs. read(2) errors on the script)

2023-03-10 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 10 Mar 2023 18:13:00 +
From:"Harald van Dijk via austin-group-l at The Open Group" 

Message-ID:  

  | Other shells that exit are bosh, yash, and my own. It's both what POSIX 
  | currently requires (contrary to what kre wrote on the bug)

That's not how I intended what I wrote to be interpreted, I meant exactly
what you said - when I wrote "most shells are doing what shells always have,
and what the standard requires", I meant "exiting".

But as I wrote in my previous message, I was actually testing "command eval"
rather than "command ." which I would normally expect to work about the same
way in this regard, but it turns out that not all shells do.   Further, and
subsequent to when I sent that last message, I went and looked at my tests
again - the way I do these is by (for tests like this one) composing a
command line as input to one shell (each has its own xterm in my shell
testing root window page - they're all tiled), then pasting it into the
windows for all the others - then I can see the results from all of them,
at the same time, and easily compare what happened (the command always
starts $SHELL kind of like the example Geoff showed, except I do not quote
that, because sometimes SHELL needs to be "bash -o posix" or similar, and
I want that field split, not treated as a quoted word).

For this, I tested both without, and with, "command" present ... but it turns
out that somehow, for some of the shells, instead of running both tests, I
managed to paste the wrong command, and ran the one without "command" twice,
without noticing.   That even included the NetBSD sh test, which contrary
to what I said before, turns out does do the same thing for "." and "eval"
in both cases (exit without command, not exit with it) which is what I had
expected, before I saw the results of the incorrect test - before I noticed
it was incorrect.

  | and what I think is probably the right thing for shells to do.

I don't.   I want to be able to source arbitrary script fragments, and
eval arbitrary strings (there are no security issues here, the fragments
and strings, are all provided by the user running the shell - anything that
could be done buried in one of those other ways, could simply be done as a
command without subterfuge) without risking the shell exiting.  Sometimes
running them in a subshell works, but only sometimes.

  | Whether bug 1629 should introduce a significant shell consistency issue 
  | is not separate from bug 1629.

Perhaps that one, and some new one, yet to be submitted, should be
considered together, but resolving 1629 the right way should not be
held hostage by other ancient weirdness that might not be so easy
to alter.

But perhaps after all, it might be - if it is only yash, bosh and your
shell not already continuing after "command . file" fails because of
a syntax error, then those might not matter, and if it is those, plus, I
think, mksh and ancient pdksh (and consequently, probably ksh88 as well)
for "command eval 'gibberish<;)'" failing the same way, then I'd guess
mksh can get changed, and the others also no longer really matter.

  | Bug 1629 started as trying to see what 
  | shell authors are willing to implement.

No, it started because read errors were not being handled in a rational
way.   A proposed solution depended upon what shell authors are willing
to implement.

  | and I know bosh sadly isn't going to see an update anyway,

Really?   I thought some group of people had taken over Schilling's stuff.
Whether they consider bosh worth continuing with I am not sure (it still
has more important issues than this remaining in it, and I don't believe is
used much, if at all).

  | but I would hope that authors 
  | of the other shells also have the good sense to implement something that 
  | makes sense to them and keep it internally consistent,

There is so much in the shell already which is not internally consistent,
that one more thing (particularly in an area rarely seen) would hardly be
noticed, but I very much doubt that there will not be at least an attempt
to alter what happens when an error which is "shall exit" is detected
when running a special built-in as a sub-command of "command".
Syntax errors aren't the only one,
command eval 'shift 0 >/'
is another (redirection errors are also "shall exit" when used with
a special built-in as is being done here).
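A rough way to compare shells on these cases is to run each fragment under "command eval" in a fresh shell and see whether a following command still runs. This is a sketch with a made-up helper name; the shell to test is deliberately left unquoted so that something like "bash -o posix" is field split, and no particular outcome is assumed.

```shell
# probe SHELL FRAGMENT: does SHELL carry on after "command eval FRAGMENT"?
probe() {
    result=$($1 -c 'command eval "$1"; echo survived' probe "$2" 2>/dev/null)
    case $result in
        *survived*) echo "continues" ;;
        *)          echo "exits" ;;
    esac
}

for frag in 'gibberish<;)' 'shift 0 >/'; do
    printf '%-16s %s\n' "$frag" "$(probe sh "$frag")"
done
```

Replacing sh in the loop with other shells gives a quick side by side comparison.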

I expect that will probably succeed, even if we all need to make some
more changes to almost never encountered parts of the shell, and most
probably it won't be "we all" in any case.

kre

Re: Syntax error with "command . file" (was: [1003.1(2016/18)/Issue7+TC2 0001629]: Shell vs. read(2) errors on the script)

2023-03-10 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 10 Mar 2023 17:12:50 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | All of bash, ksh88, ksh93, dash and mksh reported the syntax error and
  | then executed the echo command (which output a non-zero number).

yash and bosh don't, they simply exit.

But you caught me ... the tests I did yesterday were of "command eval"
which I assumed would be treated the same (I see no reason why there
should be a difference), but apparently isn't in many shells (including
the NetBSD one).

kre

Re: [Issue 8 drafts 0001639]: Clarify minimun length requirement of "quoted" std and dst names in POSIX TZ string.

2023-03-05 Thread Robert Elz via austin-group-l at The Open Group

Date:Sun, 5 Mar 2023 07:44:38 +
From:"Austin Group Bug Tracker via austin-group-l at The Open 
Group" 
Message-ID:  

  | The mismatched < em > has been replaced by < /blockquote >.

Thanks.

  | The matched < em > < /em > pairs have been replaced by the more
  | common < i > < /i > pairs. 

There I was just copying what I had seen elsewhere, and it
seemed to work!

kre

TZ setting of "std" and "dst" allowed characters (minor question)

2023-03-04 Thread Robert Elz via austin-group-l at The Open Group

This is just something I am wondering about, rather than any kind of
problem, but I'd like to make sure I'm not missing something.

In XBD 8.3, in the section on the TZ variable, in the case of what we
generally call a "POSIX TZ string" (though in the D2.1 it is just the
form that doesn't start with a ':', and in D3 will be the 2nd format)
the text in D2.1 says of "std" and "dst"

-- In the unquoted form, all characters in these fields shall be
alphabetic characters from the portable character set in the
current locale.

And similarly in the quoted form (I'll just cut the relevant phrase)

alphanumeric characters from the portable character set in
the current locale, [...]

My question is why those two say "in the current locale" - what information
or restriction or special meaning is implied by those 4 words?

XBD 6.1 says a lot about the Portable Character Set: those characters must
be present in every locale (their encodings may vary, but they must all be
one byte values, and in a char variable must have positive (or 0 for nul)
values).

I'd have thought "alphabetic characters from the portable character set"
(in the first case, and similar in the second) would be enough.

I would point out that the quoted form (which is actually first in the
text, though I put it second in this message) continues:

the <plus-sign> ('+') character, or the <hyphen-minus> ('-') character.

Those ones don't say "the <plus-sign> ('+') character in the current locale"
(or similar for '-'), which I would have thought they would need to, if those
extra 4 words actually mean something.
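For reference, the two forms of "std" under discussion look like this in use. The zone names are invented for the example, and it assumes a date utility that supports %Z; "-3" here means three hours east of UTC.

```shell
# Unquoted form: alphabetic characters only.
unquoted=$(TZ='XYZ-3' date +%Z)
# Quoted form: alphanumerics plus '+' and '-', enclosed in '<' and '>'.
quoted=$(TZ='<+03>-3' date +%Z)
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```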

If those words have no purpose, then I'll submit a mantis issue to have
them removed (they're just wasting space, and causing confusion - mine if
no-one else's).   On the other hand, if they are needed for something then
perhaps we also need to add them with the '+' and '-' chars that are allowed
in the quoted case.

About the only possibilities I can think of for this, are first if
locales, while being required to include the portable character set,
were permitted to not include the ascii letters as "alpha" - but XBD 7.3.1
seems to prohibit that.   What's left is the possibility that a locale adds
some other character, from the portable character set, as type upper or
lower (and hence, alpha) eg: ESC perhaps, or maybe '<' and '>' which would
make the spec ambiguous, as then one of those fields might be the quoted
form or the unquoted form.   I don't see that as being forbidden, as long as
ESC (or whatever is added) is not included in class cntrl (or punct, or blank)
- but even assuming that a locale is allowed to do that, is it really intended
that if a locale were to define things that way, then ESC (etc) would become a
permitted char in "std" or "dst" - is that why those words are there?

If it is, I suspect we should consider changing that...

kre

Re: [Issue 8 drafts 0001638]: Requirement that TZ "std" and "dst" be 3 chars long (when given) is apparently ambiguous

2023-03-03 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 3 Mar 2023 14:31:13 +
From:"Austin Group Bug Tracker via austin-group-l at The Open 
Group" 
Message-ID:  

  | Occurrences of "bugno:" and "POSIZ" in the Description have been changed to
  | "bugid:" and "POSIX", respectively. 

Thanks.   I didn't even notice POSIZ - though that is the kind of typo
I would make...

kre

Re: [Issue 8 drafts 0001638]: Requirement that TZ "std" and "dst" be 3 chars long (when given) is apparently ambiguous

2023-03-03 Thread Robert Elz via austin-group-l at The Open Group

When submitting that bug, I (obviously) forgot the magic required
to refer to another bug.

I'd be grateful if someone who can (ie: not me) would change
all of the "bugno:" strings in the Description into "bugid:"
(or if that is not correct either, into whatever is).

There are several...

TIA,

kre

Re: Minutes of the 6th February 2023 Teleconference

2023-02-09 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 9 Feb 2023 09:19:00 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | there was general agreement that executing the partial line
  | after getting a read error is really not a good thing for
  | shells to be doing.

I'd probably agree that it isn't the ideal approach, but
that's irrelevant (or should be) here - it is not what
shells do, or have ever done, so is not the standard.

This group, just like anyone else, can put the case to
shell implementors that the current approach is sub-optimal,
and ought be altered.   That would be easier on implementors
if the standard makes it conforming to treat a read error
as an error (resulting in aborting current processing, etc)
as well as the current standard behaviour (treat it as EOF).

What cannot be done is to require shells to treat read errors
as shell errors rather than EOF, that would be legislating,
and that is not what should be happening here.   There is
a clear de-facto standard here, we either write that down as
the standard, or allow it or other behaviour considered better
as alternatives, and possibly add a future directions for a
possible change in Issue 9 (if shells have altered their
behaviour).

kre

Re: Minutes of the 6th February 2023 Teleconference

2023-02-08 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 8 Feb 2023 10:24:33 + (UTC)
From:"Thorsten Glaser via austin-group-l at The Open Group" 

Message-ID:  

  | However, executing the partial line after getting a read error
  | can and probably should be treated differently *unless* a read
  | error is treated as EOF.

I agree with that - the error is either an error, which would cause
a non-interactive shell to immediately exit with non-zero exit status
(with some message on stderr), or an interactive shell to return to the
command prompt, issue a new PS1, start a new read, presumably get an
error again, ...

If the read error is treated as EOF, then the shell acts just like any
other EOF at that point.

I have no problem with specifying the "must be EOF" behaviour (yash could
change) but requiring it to be treated as an error, rather than just allowing
it, would be a non-starter given that only yash (that we know of) behaves
that way.   I however don't object to it being unspecified which behaviour
will occur - of those two.   This is not a case where it needs to simply be
unspecified what happens, such that the shell can do anything it likes.

kre

Re: Minutes of the 6th February 2023 Teleconference

2023-02-07 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 7 Feb 2023 11:45:03 -0500
From:"Chet Ramey via austin-group-l at The Open Group" 

Message-ID:  <26b52c56-89f7-a4a9-e2a1-e754d6387...@case.edu>

  | The key is that everyone `executes' the partial line after getting EOF,
  | even yash.

This is important, it makes reading from files, and reading from
strings, work the same way, which avoids the need for everyone to
supply a terminating \n when supplying the command_string arg to sh -c
but also for "eval" (and so traps as well) - these strings just have
commands which end with an "end of string" (no newline at the end
required).   Treating "end of string" and "end of file" the same way
is the natural thing to do.
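The three cases mentioned above can be sketched like this; none of the command strings ends in a newline, and the variable names are only for illustration.

```shell
# "End of string" for sh -c and eval, "end of file" for a script on stdin:
# in each case the final, unterminated line is still executed.
from_c=$(sh -c 'echo from-dash-c')
eval 'from_eval=from-eval'
from_stdin=$(printf 'echo from-stdin' | sh)
printf '%s\n' "$from_c" "$from_eval" "$from_stdin"
```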

Read errors being treated differently than EOF would be possible, but
isn't what has traditionally been done - at most it should be unspecified
whether this is treated as an error, or the same as EOF.

kre

Re: tv_nsec

2023-01-20 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 20 Jan 2023 08:37:44 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | You haven't stated your reasons for wanting to refute it, so that makes
  | it difficult to know what we can say to persuade you you're wrong.

Nick also didn't state what "it" was to be, making it impossible to
decide if it should be refuted or not.

  | In any case, if C23 changes it to nsec_t

If that's the proposal, then I agree with you, that's relatively
harmless for POSIX code (slightly more difficult for strict C code,
and they should certainly be adding a PRI macro to inttypes.h - all
invented printable types should have at least one such macro defined).

But if the proposal is to change it to int32_t (or worse uint32_t)
then that would be a real problem - despite 32 bits clearly being
enough to represent a count of nanoseconds within one second (had
the tv_nsec field been defined that way originally, that would have
been OK, but it cannot be changed into that now).

Even worse would be (as suggested in an earlier message) for it to
be allowed to be any implementation defined type, then it could be
a float type, or (absurdly) a struct, union, or even an array.

kre

Re: Security risk in uudecode specification?

2023-01-16 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 16 Jan 2023 18:02:47 +0100
From:"Christoph Anton Mitterer via austin-group-l at The Open 
Group" 
Message-ID:  <3d8cea9121caf4944d2d1b8f6ff0dca4537afe92.ca...@scientia.org>

  | It's the only portable way to encode/decode stuff to/from base64,

I didn't even realise that the standard included the base64 variant,
rather than just the original traditional encoding.   uu*code isn't
high on my list of things to care about.

  | IMO it should only be removed if replaced by the base64 utility.

The deadline for that to happen is definitely past.

kre

Re: Security risk in uudecode specification?

2023-01-16 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 16 Jan 2023 10:01:48 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | There seems to be some misunderstanding here. The only line we
  | have drawn is for requests for new features. We will continue to
  | process bug fix requests for inclusion in Issue 8 for a while yet.

Ah, OK, good.   I thought from:

[This is really Andrew Josey from the minutes of the Jan 12 meeting]
austin-group-l@opengroup.org said:
| We are planning to produce draft 3 soon. 
| Once bugs 768, 243 (if accepted), and 1617 (if updated to add -w) have been
| applied, we just need updated frontmatter to complete draft 3.

| Shortly after the meeting the ISO/IEC ballot got underway to approve the
| revision project (a separate activity to approving the draft!) Andrew will
| need to form the IEEE ballot group as the first part  of the IEEE process.

and I recalled earlier mention (which I will never find now) that it was
planned that Draft 3 be the final draft (I always assumed subject to typo
corrections, editing mistakes, things forgotten which were supposed to
happen, etc, if there were any of those, otherwise there'd be no point
calling it a draft - but I also assumed nothing substantial would change
after it was published).

I am happy to learn that is not to be the case.   I was slightly surprised
to see in that quote that the revision project - ie: all that has been 
happening for the past several years, is not yet formally approved.  What
would happen to all of this work, should that fail?   (Not that I would
anticipate that happening, but one never knows).

That means, I guess, that if someone cared enough about changing what the
text says about uudecode and its handling of setuid bits, that a new bug
report might get that changed.   It won't be me, a bug report from me would
just be to have uuencode/uudecode removed altogether, I think their time to
be mandated, or for applications/users to expect to use them, passed quite
a while ago - though naturally implementations are still likely to support
them for some time yet.

kre

Re: Security risk in uudecode specification?

2023-01-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Sat, 14 Jan 2023 09:19:24 -0800
From:Alan Coopersmith 
Message-ID:  <7d6830e3-ab04-2d86-8869-8819283f4...@oracle.com>

  | We can't compare the command specifications in the standard for tar,

So, use pax instead.

  | as there are none, but if we look at common implementations, they do in
  | fact protect against issues such as those raised here with the paths:

Yes, it can, but the assumption in all of this is that somehow root is
being convinced to run the extraction without applying any thought to it
(without that, uuencode is no more dangerous than cp).   If we're assuming
that root can be fooled that way, we may as well assume that root could as
easily be convinced that the -P option should be given (tell root that that
option preserves the modify times, or something) or perhaps get root to
run tar with -C /  ... either way would have the same effect.

But for tar, overwriting important files isn't the issue that matters,
if we can convince root to extract a tar file, we don't need to also
get them to add either of those options, we just put a setuid root
binary in the tar file, which tar will happily extract as a setuid
root binary.   As it should.

The problem with this isn't the tool, which is doing what it is designed
to do, exactly as it is designed to do it, but the root user who doesn't
pay any attention to security but just "does as instructed".   There are
a million ways to take advantage of such a root user, picking uudecode
as something to change because of it is pointless.

  | At the very least here, I thought the standard committee would want to
  | consider that all of the major implementations of uudecode follow a
  | defacto standard on removing bits from the permissions that doesn't
  | seem to be allowed by the current language of the formal standard.

Yes, that is an issue that probably should be considered, as what the
standard describes doesn't match what implementations actually do.
But that won't happen until someone submits a bug report in the proper
form (ideally complete with new text to update things).

There seems now to be no hurry to do that, as the committee (of which
I am not part) seems to have drawn a line through the defect reports,
and only those which precede it will be attended to in the forthcoming
new issue of the standard (which is actually a pity, publishing a whole
new version with known defects in it already seems like a poor choice).

In any case, it seems as if anything new now will need to wait for
at least Issue 8 TC1 (I'd guess 3 or 4 years from now), or perhaps
even Issue 9 (maybe 2030 or after - Issue 7 was 2008, Issue 8 might
be 2023, or perhaps 2024 - at that rate Issue 9 might be 2040.)

kre

Re: Security risk in uudecode specification?

2023-01-11 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 11 Jan 2023 13:48:31 -0800
From:"Alan Coopersmith via austin-group-l at The Open Group" 

Message-ID:  

  | Below is a message sent to the Open Source Security mailing list over
  | the holidays about a security risk in uudecode, which the GNU maintainer
  | pointed out was forced by the current language of the standard.

The real problem here is that as soon as someone says "security problem"
almost everyone simply jumps to "we must find a solution" and no-one ever
bothers asking if there really is a security problem or not?   That's not
an acceptable question, "we must not be seen to be ignoring security issues".

But ask yourself, what if the utility in question here was tar, or pax,
or cpio (or whatever it is that Solaris uses for system installs and
updates)?   Is there any material difference to uuencode in how they
operate, or what they can do (except that tar (etc) will usually set the
setuid bit in extracted files if the archive says to do that - how else
would "su" ever get installed correctly?)

What's more, it is far more common these days for an e-mail message from
some random source to contain a tar file (usually also compressed, but
that's irrelevant here) than a uuencoded file - which is the actual bigger
security threat?   Or is the security threat really the idiot user who
simply runs arbitrary commands as root, and then complains when bad things
happen?   Of course, since we cannot "fix" the users, we keep trying to fix
everything else - which is doomed to failure.

All of these file container handling utilities do more or less the same
thing, they bundle up files, and upon request, unbundle them again.
That's what they are designed to do.   Using any of them inappropriately
can be a security problem, but it isn't the tool that is the problem, but
the inappropriate use.

And it isn't just the archive format utilities that have issues like this.
What do you expect to happen when someone says "make install" ?   What if
a root user is fooled into running that on a makefile that has:

install:
cp myfile /usr/local/myfile
@ (cp /bin/sh .secret-file; chown root .secret-file; chmod u+s .secret-file) >/dev/null 2>&1

Oh no, security problem in make!   Really!

The unnamed GNU maintainer was just being polite while passing the buck for
something they know cannot be "fixed", "forced by the current language of
the standard" is simply another way of saying "doing what it is designed
to do".

You can add new options to (almost) any utility, and have those non-standard
options do almost anything, but if you want to remain conformant to the
standard (ie: what people who know what they're doing expect) when the option
isn't given, the utility must operate as the standard says, at least when
operating in a conforming environment (which you get to define, but need to
document).

If there was anything to do here with uuencode/uudecode it would be to
(again) consider removing them from the standard - but not because of
security issues, just because they are now essentially obsolete.   That
doesn't much help implementations though, which will need to keep supporting
them essentially forever, because some user might have a script somewhere
which uses these things - and because of that keeping them in the standard
so the implementations don't drift apart makes sense.

kre

ps: this exact issue was also raised in NetBSD, very briefly, a while
ago - it got dismissed out of hand, and hasn't been heard of there again.
The whole thing is bogus.

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-20 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 16 Dec 2022 17:31:03 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | Before I get into detailed responses, please note this is my last
  | working day before the holiday break, so I won't be contributing to
  | this discussion further until January.

OK.   I think this discussion has mostly reached a dead end anyway,
nothing (relevant to the topic anyway) is changing in any of the
recent messages.

  | I may have misled you a little in the way I worded a previous email.
  | It's not the time_t type itself that you can't do arithmetic on,

No, not misled, I knew what you meant, and I understand that arithmetic
types allow arithmetic, what matters is whether that arithmetic makes
sense in context or not.

  | The description of time() says:
  |
  | The time function determines the current calendar time. The
  | encoding of the value is unspecified.

That's what I was missing.   I don't know my way around the C standard,
and don't know where to ask people to look.

  | I agree with "related" but not with "corresponds".  By saying
  | "corresponds" you are assuming that the conversion is reversible,
  | i.e. that it is a one-to-one mapping.  It is not.

No, no such assumption, since mktime() input can have out of range
values, and inverting the time_t that results will never produce that, it
is clear that nothing can (necessarily) be reversed.   If you think that
"corresponds" implied that, then by all means we can pick a different word.

But "related" isn't really strong enough either (Wednesday is related
to Tuesday, it is the following day, but that doesn't mean that if the
input to mktime() specifies a date that is a Tuesday, it is OK to return
the following Wednesday instead, just because they're related).

The mktime() input must specify precisely what time_t value is to be
returned, otherwise the function is useless - calling functions (apart
from random number generators) that return results which are not what
the input requests be returned is a waste of time.

  | > The first thing to note is that this only applies to UTC times.
  | Hence the "corrected for timezone and any seasonal time adjustments"
  | in the preceding mktime() quote.

Yes, but we cannot make that correction until we have a UTC time to
correct, we don't know what correction to apply until after that is
done.   This is something of a dilemma, as the input is given in the
local timezone, but without enough information to allow that correction
to be made, until after we have found the corresponding time_t (UTC)
value (in general, and at the very least, until after we have a properly
in range, and well defined, local time value).

  | If the standard meant local time here it would say "local time".
  | The fact that it instead says "actual time of day" shows that it
  | does *not* mean local time.

I agree it doesn't mean only local time, but recall the actual time of day
is local time.  If the standard said "local time" it could be read as
simply meaning that local time is unspecified (which it largely is)
rather than also meaning that the system's clock is not necessarily
synchronised with that local time (or UTC), which it is also saying.
It means both.   It could require synchronised times (for some applications
that's needed), but doesn't (and shouldn't in general), but it cannot
specify how local time works (which includes how it corresponds to UTC),
that's outside of POSIX's jurisdiction.

  | As quoted above, mktime() first converts the broken-down time to UTC
  | seconds since the Epoch

It can't.   And nothing in the standard says that it should, as that
would be absurd, one cannot convert a local time into a UTC time (however
that is reckoned, here as a count of seconds since the Epoch, but that
detail is irrelevant) without knowing the local timezone information
first.

  | and then corrects it for "timezone and any seasonal time adjustments".

No, that isn't what it says at all.   If it did it would be ridiculous.
But it doesn't.   What it does say (and today I'm quoting from Issue 7 TC2,
2018 edition, ie: c181; not that it makes any difference other than the
page and line numbers, as this part has not been changed, yet anyway):

Page 1331, lines 44305-7:

The mktime( ) function shall convert the broken-down time,
expressed as local time, in the structure pointed to by timeptr,
into a time since the Epoch value with the same encoding as that
of the values returned by time( ).

So clearly, we have a local time as input.   Not disputed I believe.

Then, same page, lines 44315-8

The relationship between the tm structure (defined in the <time.h>
header) and the time in seconds since the Epoch is that the result
shall be as specified in the expression given in the definition of
seconds since the Epoch (see XBD Section 4.16, on page 113) corrected 
for timezone and any seasonal time

Re: strftime %Ou

2022-12-20 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 9 Dec 2022 12:11:14 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | It made it to the list, but the lack of an answer probably means
  | nobody who read it can answer it.

Yes...

However, I was looking at XRAT (from the current standard)
today (for unrelated reasons) and...

  | > > In draft 2.1 (and the current spec) strftime's %Ou modified spec is
  | > > described as:
  | > >
  | > > %Ou  Replaced by the weekday as a number in the locale's alternative
  | > > representation (Monday=1).
  | > >
  | > > Should that say "as a number using the locale's alternative numeric
  | > > symbols"?
  | > > Otherwise the definition is circular.

came across XRAT A.7.3.5 (LC_TIME) which happens to include this
statement:

It can be noted that the above example is for illustrative purposes only;
the %O modifier is primarily intended to provide for Kanji or Hindi digits
in date formats.

That's on page 3532 (lines 119660-1 aside from the leading "It" which is
on line 119659).   I haven't checked Issue 8 Draft 2.1, but I cannot see
any reason that section would have changed.

I also cannot imagine that only Kanji or Hindi is intended there, just for
systems that don't use arabic digits (0 1 2 ...).

kre

ps: I agree that this is still largely a WG14 issue.

Re: behavior of the QUIT character (^\) in the shell command line

2022-12-18 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 19 Dec 2022 00:17:25 +0100
From:"Vincent Lefevre via austin-group-l at The Open Group" 

Message-ID:  <20221218231725.ga104...@zira.vinc17.org>

  | Well, so it is not forbidden to bind it to "exit with a core dump"
  | (e.g. abort()), which is what a SIGQUIT does by default. :-)

No, you can bind ctrl-\ to any action your shell allows, definitely
not forbidden.   Note, that's not SIGQUIT, it is just a character,
not a signal.   You only get one or the other (or neither sometimes)
never both.

  | Then the requirement from the standard is a bit strange.

Not really.

  | One may still
  | say that it is useful for a SIGQUIT sent by some process, but I have
  | the impression that this is an unusual case and that the standard was
  | more targeting a SIGQUIT generated by the QUIT character.

Yes, it is.   The point of it is that if you have job control
disabled, run some command, and then generate SIGQUIT from the
keyboard, you want that command to exit and dump core, but the
shell which ran the command to still be running, get the exit
status, and tell you about it, rather than also exiting and dumping
core.  Or most people do anyway.
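
As a rough illustration of that arrangement (a sketch only, not how any
particular shell is written): the parent below ignores SIGQUIT the way an
interactive shell does, while the child puts the default action back before
being hit by the signal.   The function name is made up, and zeroing
RLIMIT_CORE in the child is my own addition so that the sketch doesn't
leave a core file lying around.

```python
import os
import resource
import signal


def run_child_and_quit_it():
    """Parent ignores SIGQUIT (like an interactive shell); child does not."""
    signal.signal(signal.SIGQUIT, signal.SIG_IGN)
    pid = os.fork()
    if pid == 0:
        # Child: restore the default action, as a shell does for commands.
        signal.signal(signal.SIGQUIT, signal.SIG_DFL)
        # Assumption for the sketch only: suppress the actual core file.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        os.kill(os.getpid(), signal.SIGQUIT)
        os._exit(0)  # not reached if the signal terminates the child
    # Parent: the same signal is simply ignored here.
    os.kill(os.getpid(), signal.SIGQUIT)
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    status = run_child_and_quit_it()
    print("child killed by signal:", os.WIFSIGNALED(status))
```

The parent survives to collect and report the child's status, which is the
whole point of the shell ignoring the signal.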

Or at least that's what shell authors (all the way back to the original
Thompson sh, and including csh variants, not just posix sh (Bourne
sh descendants)) believe you want, and so do.  The standard just
says what shells actually do.

This is less important with job control enabled, as when running
some foreground command, the shell and it will be in different
process groups, and so a SIGQUIT sent to it will not be received
by the shell.  But it still matters, as it may happen that you are
running some command, which has not finished, is not telling
you why, you get bored with waiting, and decide to find out
by generating a core file and analysing it.  So you press the
'send SIGQUIT' keyboard chord, but just while you are doing that,
before the keyboard has had time to send the keycode to the system,
the command exits, the shell returns from its wait, and returns
the tty pgrp to belong to itself again.  Then the char you typed
arrives, SIGQUIT is (or might be) generated and sent to the shell
now.  Would you want the shell to exit and core dump?

Also, it was obvious from the test results that you provided, that
you were testing with command line editing enabled.  When that is
happening, the shell will have altered the termios settings to
whatever it needs to make that work the way it wants (and will
restore them before running a command).  The example where the
quit char was included in the input makes it clear that in that
case at least, no SIGQUIT was ever generated, so the question of
what that shell would do if one were received is unanswered.  The
terminal driver will never both queue the "quit" char as input,
and send the signal, it is one or the other (the same applies to
all other signal generating characters).  Other tested shells
might be doing something similar, but with that char set to perform
some different function, perhaps "ignore me", or "flush current
command entered so far" (with or without a new prompt) or 

Your testing might never have generated a SIGQUIT at all, so what
any of those shells might do if that signal were received might
never have been examined.   Disabling line editing would make it
more likely to be generated, but not certain, as like any other
program, the shell is permitted to modify the terminal settings
however it likes when it is in control of (ie: reading from) the
terminal.
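
To make the "one or the other" point concrete, here is a small sketch using
a pseudo-terminal (assuming a Unix-like system where pty.openpty() is
available; the function name is invented for the example).   With ISIG
cleared, the way a line editor might clear it, the quit character just
arrives as ordinary input and no signal is generated.

```python
import os
import pty
import termios


def quit_char_as_input():
    """Return (quit char, byte read) with ISIG turned off on a fresh pty."""
    master, slave = pty.openpty()
    try:
        attrs = termios.tcgetattr(slave)
        quit_char = attrs[6][termios.VQUIT]  # usually b'\x1c' (ctrl-\)
        # attrs[3] is c_lflag: no signals, no line buffering, no echo.
        attrs[3] &= ~(termios.ISIG | termios.ICANON | termios.ECHO)
        attrs[6][termios.VMIN] = 1   # read() returns after one byte
        attrs[6][termios.VTIME] = 0
        termios.tcsetattr(slave, termios.TCSANOW, attrs)
        os.write(master, quit_char)  # "type" the quit character
        return quit_char, os.read(slave, 1)
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    typed, received = quit_char_as_input()
    print("typed", typed, "read back", received)
```

With ISIG left set, the same byte would instead be consumed by the terminal
driver and turned into a SIGQUIT for the foreground process group.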

kre

Re: behavior of the QUIT character (^\) in the shell command line

2022-12-17 Thread Robert Elz via austin-group-l at The Open Group

Date:Sat, 17 Dec 2022 22:20:20 +0100
From:"Vincent Lefevre via austin-group-l at The Open Group" 

Message-ID:  <20221217212020.ga388...@zira.vinc17.org>

  | What is the behavior of the QUIT character (^\) when typing a command
  | in an interactive sh shell?

As you have seen, it varies, and is not specified anywhere I know of.

  | https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html just
  | says that if the shell is interactive, SIGQUIT shall be ignored.

That just means that the shell doesn't exit with a core dump when the
signal is generated (however it is generated, including via kill(2)).

What happens depends upon the terminal settings, and how the shell
chooses to implement command line editing (which is specified to exist,
at least for vi mode - others are allowed as alternatives - but isn't
specified how it works).

If the shell isn't doing command line editing, the effects of the
quit character on the terminal input buffer still occur (anything
pending is flushed), if it is, then it all depends on what (if anything)
the Ctrl-\ (or other char that might be set as the quit char in termios)
is defined to do in that editor - one can bind it to do almost
anything in most shells (typing the character doesn't necessarily
generate a SIGQUIT).

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 14 Dec 2022 08:08:36 +0700
From:"Robert Elz via austin-group-l at The Open Group" 

Message-ID:  <11981.1670980...@jacaranda.noi.kre.to>

  | Set the input to represent Jan 1, 2023, noon.  (Jan 1 just so
  | working out yday is simple)

Which turns out to have been exactly the wrong thing to do for the
purposes of the example.

The point intended relates to computing tm_yday which depends upon
tm_mon and tm_mday (and also tm_year).   That is, unless it is simply
0 as above...

One cannot compute tm_yday without knowing tm_mday tm_mon and tm_year,
and one cannot calculate those without first having normalized tm_sec
(which might affect tm_min when adjusted), tm_min (which might affect
tm_hour when adjusted) and tm_hour (which might affect tm_mday when
adjusted) - and of course, tm_mday, tm_mon, and tm_year have this weird
relationship where they all depend upon each other, though in practice,
nothing ever needs adjusting more than twice.

Still, the result is the same, the normalisation must happen before the
XBD 4.17 formula is applied.
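
The dependency chain just described can be sketched as follows.   This is
one conventional carry order, not something the standard spells out; the
function name is made up, mon is 0-based like tm_mon, and year is the full
calendar year rather than tm_year.

```python
import calendar


def normalise(year, mon, mday, hour, minute, sec):
    """Bring broken-down fields into range and derive yday (0-based)."""
    # Carry upwards from the smallest unit.  Python's // and % floor,
    # so negative inputs borrow from the next field correctly.
    minute += sec // 60
    sec %= 60
    hour += minute // 60
    minute %= 60
    mday += hour // 24
    hour %= 24
    year += mon // 12
    mon %= 12
    # mday depends on the month length, which depends on month and year.
    while mday < 1:
        mon -= 1
        if mon < 0:
            mon, year = 11, year - 1
        mday += calendar.monthrange(year, mon + 1)[1]
    while mday > calendar.monthrange(year, mon + 1)[1]:
        mday -= calendar.monthrange(year, mon + 1)[1]
        mon += 1
        if mon > 11:
            mon, year = 0, year + 1
    # Only now can yday be computed.
    yday = sum(calendar.monthrange(year, m + 1)[1] for m in range(mon))
    yday += mday - 1
    return year, mon, mday, hour, minute, sec, yday


if __name__ == "__main__":
    print(normalise(2023, 1, 29, 12, 0, 0))   # "Feb 29, 2023", noon
    print(normalise(2023, 48, 1, 12, 0, 0))   # month 48 of 2023
```

Only after all of that is there a tm_yday (and tm_year) fit to be fed to the
formula.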

kre

ps: for anyone who cares, here is the bc function I used when testing
this... (just so not everyone needs to copy all these magic numbers).
Remember the scale needs to be 0 (no fractions permitted).

define e() {
    return ( s + mi*60 + hr*3600 + yd*86400 + (y-70)*31536000 + \
        ((y-69)/4)*86400 - ((y-1)/100)*86400 + ((y+299)/400)*86400 )
}

then set s (tm_sec) mi (tm_min) hr (tm_hour) y (tm_year) and
calculate yd (tm_yday) from the supplied tm_mon and tm_mday (along
with tm_year).
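
For anyone without bc to hand, the same expression can be sketched in
Python.   The function name is mine; the arguments are the tm_* fields the
formula uses, and // stands in for C integer division (the two agree here
because every dividend is non-negative for years from 1970 on).

```python
import calendar


def seconds_since_epoch(tm_sec, tm_min, tm_hour, tm_yday, tm_year):
    """The XBD 'Seconds Since the Epoch' expression, for in-range UTC input."""
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)


if __name__ == "__main__":
    # Jan 1, 2023, noon UTC: tm_year = 123, tm_yday = 0.
    ours = seconds_since_epoch(0, 0, 12, 0, 123)
    theirs = calendar.timegm((2023, 1, 1, 12, 0, 0))
    print(ours, theirs)
```

Note that tm_mon and tm_mday do not appear anywhere in it, only tm_yday.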

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 13 Dec 2022 16:52:42 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | No, the adjustment to bring struct tm fields into range is done after
  | the time since the Epoch value has been calculated.

Just in case you don't believe my assertion that this does not
work, I have a small experiment for you to run.

Write a simple bc function (bc just because it is very quick to
write, and will have no issues with overflow  - set scale 0, so we
get simulated C integer arithmetic) which implements the XBD 4.17
formula, exactly as written (but just use bc global vars instead
of fields in the struct tm - or any other way you choose)

Set the input to represent Jan 1, 2023, noon.  (Jan 1 just so
working out yday is simple)

Run the function, then use date -u -r N, -u to simulate UTC, as
the bc function will not be adjusting for the local timezone (or
you could make that adjustment for your timezone if you want)
and -r N to give the time_t value to use instead of "now", if
that is not -r, then use whatever facility your date command
has to do that - any reasonable one has a way.

Confirm that you get Jan 1, 2023, noon.  If not revise either
the function (fix typos) or the input data, until you have
it working.

Then increase the year by 4, and run it again.  You should
get the time_t for Jan 1, 2027, noon (if everything is ok,
this just works).

Set the year back to 2023, and instead add 48 to the month
var (since Jan is month 0, that just means setting the
month var to 48).  Run the function again.   What do you
get this time?

Note that you could instead add 1461 (the number of days in
any 4 year period which does not span the turn of the
century) to the mday field.   Or 1461*24 to the hour field
(etc) - try them all if you like.

Examine the formula to understand why.   Normalisation must
happen first, the formula really only works for in range
values.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 13 Dec 2022 16:52:42 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | It is too late to add timegm() in Issue 8.

I suspected that would be the case.   Pity, as using UTC (or
whatever it is that POSIX time is really called, not really UTC,
as that has leap seconds) (gmtime(), modify the result, timegm())
is a way that works to adjust a time_t without doing direct arithmetic
upon it, as POSIX base time is very regular, no anomalies to deal with.
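
A minimal sketch of that gmtime(), modify, timegm() pattern, using Python's
time.gmtime() and calendar.timegm() as stand-ins for the C functions (the
helper name is made up, and only the year field is touched so that every
field stays in range):

```python
import calendar
import time


def add_years_utc(t, years):
    """Adjust a time_t-like value via its UTC broken-down form."""
    tm = list(time.gmtime(t))   # break down as UTC
    tm[0] += years              # modify a field (index 0 is the year)
    return calendar.timegm(tuple(tm))  # convert back, still as UTC


if __name__ == "__main__":
    noon_2023 = calendar.timegm((2023, 1, 1, 12, 0, 0))
    print(add_years_utc(noon_2023, 4))
```

No timezone rules are consulted at any point, which is why this is so much
better behaved than the localtime()/mktime() pairing.   What should happen
when the modified fields go out of range (Feb 29 plus one year, say) is a
separate question that this sketch does not try to answer.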

Incidentally, I'd be interested to see a quote from the C standard that
specifies time_t with the limitations you expressed in the previous
message, all I've been able to find is that it must be an arithmetic
type, and that its range and precision aren't specified.   That's exactly
what is specified for a clock_t as well - however for clock_t it is also
explicit that it is possible to divide the value by a constant with
meaningful results - ie: normal arithmetic operations are possible.
If they're possible on a clock_t I see nothing there (in moderately
recent C anyway) which would suggest that they're not possible on a time_t
as well.   Certainly the unspecified precision (where POSIX specifies
"seconds") means that care would need to be taken to add using the
correct units, but an implementation could provide a specification to
allow programs to discover what the precision actually is.

  | You are suffering from a misconception that *timeptr somehow "specifies"
  | a time since the Epoch.  It does not!  It specifies a broken-down time.

No, no misconception there, though sometimes I suppose (as it often is)
that my language might be a little loose.

However I do certainly hope that you agree that the broken-down time
is related to the resulting seconds since the Epoch, when I have used
"specifies" (loosely perhaps) previously, all I have ever meant is
that - that is, that mktime() is not intended to be free to return
any random time_t it likes - it must return one that corresponds to
the broken-down time passed in.

I certainly hope that you're not disagreeing with that.

  | The standard describes, in detail (in the paragraph beginning "The
  | relationship between ..."), how this broken-down time is *converted* to
  | an integer "time since the Epoch" value.

That's not "in detail", It says (since the quote contains section and
page numbers, this extract is from Issue 8 Draft 2.1, but the substance
is unchanged from earlier versions):

The relationship between the tm structure (defined in the <time.h>
header) and the time in seconds since the Epoch is that the result
shall be as specified in the expression given in the definition of
seconds since the Epoch (see XBD Section 4.17, on page 95) corrected
for timezone and any seasonal time adjustments

For that to mean anything at all, we need to look at XBD 4.17:

A value that approximates the number of seconds that have elapsed
since the Epoch. A Coordinated Universal Time name (specified in
terms of seconds (tm_sec), minutes (tm_min), hours (tm_hour), days
since January 1 of the year (tm_yday), and calendar year minus 1900
(tm_year)) is related to a time represented as seconds since the
Epoch, according to the expression below.

The first thing to note is that this only applies to UTC times.   [That's
one reason why using gmtime() and timegm() for adjusting time_t values
makes much more sense].

If the year is <1970 or the value is negative, the relationship is
undefined. If the year is >=1970 and the value is non-negative, the
value is related to a Coordinated Universal Time name according
to the C-language expression, [...]

I'm not going to quote the expression here (anyone interested can look
it up for themselves) but again it is clear that this applies only to
UTC times.   It says so.

XBD 4.17 goes on to say:

The relationship between the actual time of day and the current
value for seconds since the Epoch is unspecified.

This was discussed briefly before, and you claimed that all this means
(paraphrased here by me) is that the system's clock (what time() returns)
and the real world precise time of day aren't necessarily the same
(ie: there's no promise that systems are running NTP or similar).

That's certainly implied by that sentence, but that's not all that it
says - it is quite explicit that there is no specified relationship
between "actual time of day" (ie: local time) and the "seconds since
the epoch" value.

Note: not no relationship (obviously there is) just that that relationship
isn't specified by the standard.

So where exactly is this "in detail" specification of how a local time
(with all of its peculiarities) is supposed to be converted to seconds
since the epoch?

  | When the standard says "shall be set to represent the specified time since
  | the Epoch" it is talking about the integer

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 12 Dec 2022 12:02:39 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | The above misrepresents my claims in a few respects.
  |
  | "the POSIX standard precludes an implementation from returning an error"
  |
  | I only claim this for TZ values that do not begin with a colon.

But you won't allow an error value for the other cases, so in effect,
you're precluding it, whatever the TZ value might be.   If an error is
ever possible, then it is always possible.   Applications can't be
written to only work when the TZ value is the one form which is practically
useless.   They need to work for all possible (correct) TZ values, not
just the one useless particularly defined case.

  | I assume you are basing that on the C committee's response to DR #136.

Since that is what they said.  Yes.

  | There is much historical inaccuracy here and your conclusion is wrong.

All of that is certainly possible.

  | Although it requires time_t to be an arithmetic type, the C standard
  | does not require that it is possible to do arithmetic with time_t (and
  | this is not changing in C23).

Oh.   That is a pity.   Fortunately, in POSIX, time_t is much more
restricted, and arithmetic works (and people use it all the time).

  | The mktime() and difftime() functions are the only way strictly
  | conforming C programs can do arithmetic involving time_t.

OK.   difftime() is fine.   mktime() as currently specified is useless.
As implemented however, it mostly works, though to use it to do
arithmetic on a time_t one needs to be particularly careful, as it
doesn't obey the normal rules of arithmetic.

C23 is apparently going to have timegm() (the mktime() equivalent for UTC
instead of localtime).   Using gmtime() modifying the struct tm, and then
timegm() to get the time_t back would work much better, at least if the
specification of timegm() is better than that of mktime() (I haven't
seen it).   I know it is getting very late in the process, but perhaps
we should also be adding timegm() now.

  | By a strict reading, you may be right, but it is strongly implied by
  | "shall be set to represent the specified time since the Epoch".

That's fine when the specified time (that is, the time passed in in *timeptr)
is a time that exists.  But there's nothing that says what month 97,
mday 312, minute -1234, hour 999, second -23456789, year (anything that
doesn't cause time_t overflow for the implementation) tm_isdst anything
represents.   If you can find something somewhere that specifies what
that means, in the C or POSIX standards (or just about any other standard
you care to reference) then great.   mktime() allows that input, but I
see nothing that says which particular time_t value should be returned.

You might be imagining how an implementation might deal with this, as can
I, the two might even be the same - but it is certainly not specified
anywhere.

  | In any case, it is being clarified by bug 1613.

Unless you made more changes there than I thought, no, it isn't.
The extra text that was added there just says what the returned
struct tm (in *timeptr) must be, in relationship to the time_t
returned.   It says nothing at all about how that time_t is selected.

  | This would definitely not meet the requirement "shall be set to
  | represent the specified time since the Epoch".

Of course it could.   If the time passed in contains out of range
values, there is no defined meaning that can be attributed to them.
If you can find somewhere where that's stated, then please, enlighten us.

  | Already being fixed by bug 1613.

No.

  | > Where in the standard does it even hint at any of those changes being more
  | > acceptable than any other?   [Hint: it doesn't.]
  |
  | Of course it does.  It requires that a time since the Epoch is calculated
  | from the supplied broken-down time,

Yes, but one cannot calculate a time since the Epoch from out of
range values.   It simply doesn't work.   If you are believing that
you can simply apply the formula in XBD whatever (which is defined for
in range UTC values) then you're mistaken.

When you're considering this, do note that the values of the fields
of the struct tm passed in might all be MAXINT - if time_t is a 64
bit type (which it usually is these days) and int is 32 bits (also
still very common) then that won't actually overflow the time_t, but
it would cause overflow for the calculations in that formula.
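
The arithmetic behind that claim can be checked with a few lines of Python,
whose integers don't overflow.   This is only a sketch: the constants
assume a 32 bit int and a 64 bit time_t as described above, and the function
name is mine.

```python
INT_MAX = 2**31 - 1      # assumed 32 bit int
INT64_MAX = 2**63 - 1    # assumed 64 bit time_t


def seconds_since_epoch(tm_sec, tm_min, tm_hour, tm_yday, tm_year):
    """The XBD expression, evaluated with unbounded integers."""
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)


if __name__ == "__main__":
    total = seconds_since_epoch(INT_MAX, INT_MAX, INT_MAX, INT_MAX, INT_MAX)
    print("result fits in 64 bits:", total <= INT64_MAX)
    # In C, tm_yday*86400 is int*int, so this term alone is already too big.
    print("tm_yday*86400 fits in int:", INT_MAX * 86400 <= INT_MAX)
```

So the final value is representable, but an implementation that evaluated
the expression literally, in int, would have overflowed long before getting
there.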

  | and then requires (on successful completion) that the fields in the
  | broken-down time are updated to
  | "represent the specified time since the Epoch".

Yes, this part is not controversial.

  | Your suggested other adjustments would not represent the time since
  | the Epoch that is going to be returned.

Of course it would, the adjustments are made to create a struct tm
that only contains in-range values, and then from that a time_t is
produced.   The two match perfectly.

  | Huh?  The struct tm

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 8 Dec 2022 11:22:04 -0800
From:"Don Cragun via austin-group-l at The Open Group" 

Message-ID:  <7fd37609-74ff-42f5-a974-76c7010ee...@sonic.net>

  | I agree with Geoff.

I actually don't think you do, not really.   You might not agree with
me, or not now, but your argument is nothing like his.

Geoff is claiming that the POSIX standard precludes an implementation from
returning an error.   And further, that remaining compat with the C standard
(which does allow an error return) somehow precludes POSIX from also allowing
one.   Those arguments are nonsense.   Yours is much more reasonable, though
I believe ultimately reaches the wrong conclusion.

You might remember, but perhaps not, that in a postscript to a message
I sent on Nov 25, I said:

ps: there is more wrong with the mktime() specification than just
this issue - this one was supposed to be the simple one, not contentious
at all, I expected.   I expect much the same for some other problems,
but given what happened here, who knows?   Most of the other problems
are really C specification problems of course, and should really be fixed
there (but I have nothing at all to do with that group).

You are hitting on two significant issues that this referred to.   The first,
and more significant one in your argument was identified in:

   https://austingroupbugs.net/view.php?id=1614#c6032

the (very long, I'll admit) bugnote to issue 1614 which was the precursor
of the mailing list discussion (moving this from bugnotes to the list was
the right thing to do - even if for no other reason than that sending e-mail
is much more enjoyable than dealing with mantis, even when needing to remember
to explicitly override the "Reply-To" that the list absurdly adds).

In that note I said:

   That is, the "other components" (which means all of the relevant ones,
   just not tm_wday and tm_yday which are irrelevant here) are set to
   represent the specified time since the Epoch (that is: the time specified
   by the caller of mktime()) but with any out of range values (according
   to what is specified in <time.h>) adjusted so that they are in range (and
   while it does not say so, and probably should, I would interpret that to
   also mean not having 31 days in November, even though 31 is within the
   range permitted for tm_mday in <time.h>) but it doesn't say that they can be
   adjusted for any other reason.

That is, the de-jure standard clearly allows Feb 29, 2023 as a valid
struct tm (as it would Nov 31) but all the implementations know that isn't
what is really intended, and are more restrictive than the standard
requires - whether by doing so they are actually violating the standard
is hard to say.

That's the first issue, which you encounter here:

  | If we accept Robert's argument, then it isn't just gaps in time caused
  | by a timezone shift that would be affected.

Before we continue, go back and re-read what you wrote (accurately I think)
about the issue here:

Robert & Geoff have been arguing about whether or not giving a
struct tm to mktime() that specifies a time in the gap between
standard time and daylight time is allowed to be treated as an error

I'll admit, that when I submitted the bug report, I thought as you
wrote in the following paragraph

Robert is arguing that if (after adjusting other fields to
bring them into the ranges specified in <time.h>) mktime()
should return an error if ...

as I really could not (still cannot really) see any possible justification
for acting differently, and couldn't imagine an implementation actually
behaving otherwise.   But since it is now clear that some implementations
do simply invent a time_t to return, I have since changed my stance to be more
like that in your first paragraph, "allowed to be treated as an error".
I said as much in my most recent (before this) message to the list on this
topic (6 Dec):

   Just agree to add the EINVAL error code, make it a "may fail" if you like,

I am no longer expecting an outcome where anyone is required to return
the error, just one where the error is possible - and why that entirely
fits with your argument you will see soon.

  | As an example, if I call time() on January 30, 2023 at noon,

For your argument, you wouldn't want to do that, you'd want it to be Jan 29.

  | it will return a struct tm with tm_mon set to 11 (it has a normal range
  | of 0-11) and tm_mday set to 29 (it has a normal range of 1-31).

No it wouldn't, tm_mon would be 0, and (for Jan 30) tm_mday would be 30.
But those values don't really matter to the point of your example, so
those errors are irrelevant.

  | If I then add 1 to tm_mon and call mktime() with the resulting struct tm,
  | I'm asking mktime to give me back a struct tm for noon on February 29, 2023.

Indeed, you would be.   And this kind of issue (even with the result
you're expecting, and which implementations deliver, is exactly why I
don't believe

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-06 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 6 Dec 2022 12:01:52 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | You have completely
  | ignored my earlier email (austin-group-l:archive/latest/35115) where
  | I stated that for TZ values beginning with a colon, the timezone
  | information used by mktime() is implementation-defined and therefore
  | this creates the same loophole that exists in the C standard,

I don't agree that it is a loophole, I don't think the C standard
views it that way either (or that that is their reason for allowing
an error return) - but regardless of any of that, what matters is
that mktime() has to work with those implementation defined TZ values
(and the new one being added in Issue 8) - and if an error return is
agreed as possible in those cases, mktime() should be specifying that
possible error, shouldn't it?

The more recent part of this discussion follows on from your assertion
that to allow an error return would break compatibility with the C
standard, and so cannot be done in POSIX.  That's utter nonsense, and
is relying (from what I can tell) upon your unsupported opinion of the
reasons that the C committee decided that errors are possible (in both
the weird cases).   All that really matters is that errors are possible
in C, nothing in POSIX forbids an environment in which those errors can
happen, hence the error must be allowed to occur in POSIX mktime() as well.

  | This has been a lengthy thread and I have been assuming that if I
  | quoted something from the standard earlier in the thread I don't need
  | to quote it again.

Sorry, but I, and I expect everyone else, don't have the time to
go back and reread all of your previous messages, and try to guess
which quote there (when there was one) might be the one you're intending
to rely upon now.   Just include the text (or an explicit reference to it)
- if you're thinking of it when creating a message, you know exactly what
that is, and a cut in those circumstances is simple.

  | For TZ values that do not begin with colon,

You mean Issue 7 TZ values which do not begin with a colon.   We're working
on Issue 8 now, and that is going to have TZ values not beginning with a
colon which are not nearly as precisely defined.

But none of that matters, mktime() has to work with any (properly set)
TZ string (ie: TZ=/etc/passwd  is probably not going to do much useful).
Not just the archaic (functionally useless) TZ strings that POSIX has
defined all this time.   But certainly TZ=:Asia/Singapore with a local
time right near the end of 1981 or very early in 1982 (depending upon whether
the correct, or erroneous, data is available) must work (as in, there is
no such local time, and hence there cannot be a time_t to represent it).
That a non-colon archaic TZ definition cannot describe that transition
is irrelevant.

  | the description of TZ in XBD 8.3 gives precise rules for the adjustments:

It does.   And that creates gaps (in localtime), during which
there is no stated offset.   That is (using the default one hour
for this) at UTC time N, local time is M, and the offset is M-N
(or N-M depending upon how you're thinking about it at the time).
At UTC time N+1 local time is M+3601, and the offset is M+3600-N
(or N-M-3600).   Local time M+2400 simply does not exist, its offset
is neither M-N nor M+3600-N and there is nothing, anywhere in the
standard, which says it has to be one or the other (which, again,
would be absurd, as things which don't exist don't have attributes).

[Aside: I know that in that, I am using what is more or less a time_t
representation of local time, which doesn't actually exist - but without
that the concept of the offset makes no sense at all  - the interpretation
of a local time_t M is that which would appear if the UTC time_t M
was converted to a broken down (struct tm) time, then considered local.]
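
The arithmetic above can be checked with a toy model. This is only a sketch under the assumptions stated in the comments (a single one hour forward step, and local time treated as a plain number of seconds); local_of() and local_exists() are hypothetical names, nothing from the standard.

```c
#include <stdbool.h>

/*
 * Toy model (an assumption for illustration): local "seconds" for UTC
 * time u, where base is the offset M-N before the change and the clock
 * steps forward by 3600 immediately after UTC time n.
 */
long local_of(long u, long n, long base)
{
    return u + base + (u > n ? 3600 : 0);
}

/* Does any UTC second within two hours of n map to the local value 'want'? */
bool local_exists(long want, long n, long base)
{
    for (long u = n - 7200; u <= n + 7200; u++)
        if (local_of(u, n, base) == want)
            return true;
    return false;
}
```

With M = n + base, local_exists() finds M and M+3601 but nothing for M+2400, and in this model no offset is ever attached to a local value that no UTC second produces.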

  | > Lines 43855 to 43858, page 1311, in XSH 3 - mktime():
  | > 
  | >   A positive or 0 value for tm_isdst shall cause
  |
  | This wording is taken from the C standard,

Yes, almost all of the wording in mktime() in POSIX is directly
from the C standard - about all POSIX changes is "calendar time"
to "seconds since the epoch" (which is just different wording, and
means the same - though the POSIX version is much better) and the
addition of errno.

  | where it is necessarily vague
  | because of the implementation-defined nature of local time and DST there.

But so it is in POSIX - you cannot assume in mktime() (or localtime(), or
any of the others) that only the archaic POSIX TZ string is being used.
You certainly wouldn't want to, as just about no-one uses that nonsense
any more, it simply doesn't work.

But that's also presuming the intent of the C standard authors, and one
which I doubt is correct - it is just as likely that it is vague because
implementations differed, or perhaps even it isn't really vague at all,
and says exactly what is

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-12-03 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 28 Nov 2022 09:35:25 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | When the standard is silent about something, requirements that
  | *are* stated still apply.

Sure, but only for requirements that are actually stated.   Here the
things you believe to be stated requirements, don't seem to be nearly
as obvious to me ... but it is hard to tell, as you almost never bother
quoting the words from the standard that you claim say what you believe
to be stated.   That makes it hard to refute, as I have no idea what
I am expected to argue is different - just some random "the standard
says" without any clue what part of it you mean.

So, from here on out, unless you actually quote the words that you're
relying upon, I am going to ignore your arguments, and ask that everyone
else take them with a grain of salt as well.

  | In this case, it is clear from the use of "any" in "corrected for
  | timezone and any seasonal time adjustments" that either a seasonal
  | adjustment is made or the value resulting from the timezone adjustment
  | is used without making a seasonal adjustment.

This is better - we know which words you're relying upon here, and how
you have managed to mangle what the standard actually says to fit with
your preconceived view of what it should mean.

The text you quoted there is from (in Issue 8 D 2.1) on page 1311, lines 43862
to 43863 (in XSH 3 / mktime()).   (The same thing is in earlier versions,
this just happens to be the version I picked to reference today, perhaps I
should have used C181, but too late now...)

Now let's analyse what that actually says:

"corrected for timezone"

which you ignored, but seem to be treating as if it said "modified by the
offset from UTC of the timezone", which it does not, if it had meant to
say that it could have said that.

The only (not very good) definition of "timezone" I can find is in XBD 8.3
where it specifies TZ, which says

TZ This variable shall represent timezone information.

(page 161, line 5613 ... all references in this message will be to I8 D2.1)

and then later says (lines 5621-3 same page)

If TZ is of the first format (that is, if the first character
is a <colon>), the characters following the <colon> are handled
in an implementation-defined manner.

So the definition of a timezone can be implementation defined - that is,
everything about it, can be implementation defined, as the standard doesn't
seem to specify anything (which is not really a surprise, as POSIX has
no particular influence over lawmakers who get to define how time works
within their jurisdiction - but POSIX systems need to be able to work, and
show some semblance of what is considered to be the correct local time,
whatever those lawmakers deem to be appropriate).

OK, next from your quote:

"and any seasonal time adjustments"

which you then paraphrased as:

"either a seasonal adjustment is made or the value resulting
 from the timezone adjustment is used without making a seasonal
 adjustment."

Don't you see just how myopic that is?   In your mindset, you see a nice
regular timezone which has a nice fixed offset from UTC, and perhaps at
some point a once a year alteration of that offset slightly, and then,
also once a year, an adjustment back again.   Isn't it clear, even to
you, that the "any...adjustments" is plural, and you made it singular
"a seasonal adjustment" in your variant of what it says.

There is no specification anywhere about how many seasonal adjustments
there might be, or what those might look like.   That they might not be
able to be represented in a traditional (pre issue 8) TZ variable using
non implementation defined syntax means nothing.   Note that the timezone
(when specified with the ':' syntax for TZ, and also in the newer syntax
being added in I8) is never "undefined" or "unspecified" - just implementation
defined.

mktime() isn't excused from working with an implementation defined timezone
specification, it needs to work with those as well, and such a thing does
not necessarily have the nice neat form that you're expecting timezones to
be like - and that the majority of the world's timezones (today) are nice
and neat is irrelevant.   (Think of systems using solar time for the local
time, where the local time is set based upon sunrise each day - POSIX needs
to work in that kind of environment as well, even if there might be none of
those left, right now).

You go on to claim:

For times in the gap,
the standard does not say which of these choices to make, so it is
unspecified whether a seasonal adjustment is made or not, but those
are the only two allowed behaviours.

which I hope that you, and everyone else now, can see is absurd.   There
is not "a seasonal adjustment" that can be applied or not, there are many
possible implementation defined adjustments that could be applied,

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-25 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 25 Nov 2022 13:17:36 +
From:Harald van Dijk 
Message-ID:  

  | Does POSIX actually specify the seasonal 
  | adjustment, if applied, has to be 1 hour?

No, it doesn't - that's just the default (as it is most common) if an
(old style) POSIX TZ string doesn't specify the offset to be applied to
summer time.

It does specify the 1 hour default though.   There's no problem with this
part of the TZ specification (other than that TZ strings cannot possibly
represent all of the world's timezones - the limit on the offset is 24
hours (ahead or behind UTC), and there's no way to specify an alteration
of the zone's offset, other than the seasonal variation (summer time)).
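
To make the default concrete, here is a small sketch. It assumes an implementation that accepts old style POSIX TZ strings with an explicit rule; the zone names AAA and BBB are invented, the rule shown is just the current US one, and local_hour() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: local hour of UTC time t under the given TZ string. */
int local_hour(const char *tz, time_t t)
{
    struct tm *tm;

    if (setenv("TZ", tz, 1) != 0)
        return -1;
    tzset();                    /* pick up the new TZ value */
    tm = localtime(&t);
    return tm != NULL ? tm->tm_hour : -1;
}
```

For a UTC time in July such as 2022-07-01 12:00:00 (time_t 1656676800), "AAA5BBB,M3.2.0,M11.1.0" should give hour 8 (the default one hour ahead of standard time), while "AAA5BBB3,M3.2.0,M11.1.0" states the summer offset explicitly and should give hour 9.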

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-24 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 24 Nov 2022 15:49:49 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | Combining the above with the TZ rules, if TZ=EST5EDT then POSIX requires
  | that mktime() calculates seconds since the Epoch as specified in XBD 4.16
  | then applies a timezone adjustment of 5 hours and (depending on tm_isdst
  | and the specified date and time) a seasonal adjustment of 1 hour (with
  | implementation-defined start and end times, but we can eliminate that
  | by including a rule in the TZ value).  There is nothing unspecified
  | here at all.

I could (if I needed) dispute more of your message than this, but there
is no need, this is enough for the purpose here.

In the case where tm_isdst == -1 (which is the relevant one here)
and where the broken down time referenced by timeptr specifies a
time in the gap, that is, a time which never existed (or ever will)
and so is not summer time, and is not standard time, it is not any
kind of local time at all (except erroneous) and the application
has not told us which to pretend it should be, where exactly is the
specification of which offset is supposed to apply?

Don't bother hunting, there is none, and as you have said on various
topics many times, that which is not specified is unspecified.
Note that it is not unspecified whether it is the standard time
offset, or the summer time offset, it is simply unspecified.

So, there is something here unspecified, and if the application invokes
unspecified behaviour, the implementation is free to produce any result
that pleases it, right?   Hence an error return is acceptable.
And if that is true, an errno value ought to have been assigned.

Further, in the case where tm_isdst == -1 (still the relevant one)
and where the broken down time referenced by timeptr specifies a
time in the foldback period (ie: a local time which occurs twice (or
more perhaps) with different offset values), the application has not
told us which they prefer (and in some cases, has no way to achieve
that anyway, as both before and after the fold (or gap in the other
case) tm_isdst==N (where N is 0 or 1, but the same in both cases)), so
where is it specified which offset is to apply?   Again, it isn't.   So this
is also unspecified, and consequently ...
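
One way an application might see both situations for itself is sketched below. It is not anything the standard provides: count_candidates() is a hypothetical helper, it assumes the six basic fields in *want are already in range, and it only handles the simple case of a single summer time rule (one gap, one fold).

```c
#include <time.h>

/* Do the six basic date/time fields of a and b agree? */
static int same_wall_clock(const struct tm *a, const struct tm *b)
{
    return a->tm_year == b->tm_year && a->tm_mon == b->tm_mon &&
           a->tm_mday == b->tm_mday && a->tm_hour == b->tm_hour &&
           a->tm_min == b->tm_min && a->tm_sec == b->tm_sec;
}

/*
 * Hypothetical helper: how many distinct time_t values correspond to the
 * wall clock time in *want.  It tries tm_isdst 0 and 1 and keeps only the
 * results that mktime() did not have to move.
 * Returns 0 (gap), 1 (ordinary time) or 2 (fold); candidates go in out[].
 */
int count_candidates(const struct tm *want, time_t out[2])
{
    int n = 0;

    for (int isdst = 0; isdst <= 1; isdst++) {
        struct tm probe = *want;
        time_t t;

        probe.tm_isdst = isdst;
        probe.tm_wday = -1;         /* stays -1 only if mktime() fails */
        t = mktime(&probe);
        if (probe.tm_wday == -1 || !same_wall_clock(&probe, want))
            continue;
        if (n == 1 && out[0] == t)
            continue;               /* both guesses gave the same instant */
        out[n++] = t;
    }
    return n;
}
```

With TZ=EST5EDT,M3.2.0,M11.1.0 this should report 2 for 2022-11-06 01:30:00 (the two candidates an hour apart), 1 for an ordinary time, and 0 for 2022-03-13 02:30:00. With tm_isdst == -1 the caller has told mktime() nothing that selects among the candidates.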

  | This could perhaps be the basis for a compromise solution.  NetBSD
  | could return -1 for times in the gap when TZ begins with a colon,

I am not interested in making NetBSD conform, that's not the point of
this, if the specification is rational, then we will conform as we
generally do.   When POSIX is irrational, we simply ignore it.

What matters here is that the specification makes sense, and conforms
with the C specification as much as possible.   Requiring implementations
to produce erroneous answers would not be a specification which makes
sense, so we would simply ignore it if that is what happens here.

Hopefully, others in the decision making process will see this issue
for what it is, and sanity will prevail.

kre

ps: there is more wrong with the mktime() specification than just
this issue - this one was supposed to be the simple one, not contentious
at all, I expected.   I expect much the same for some other problems,
but given what happened here, who knows?   Most of the other problems
are really C specification problems of course, and should really be fixed
there (but I have nothing at all to do with that group).

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-23 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 22 Nov 2022 12:49:13 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | Having returned refreshed from my break, I have re-examined this issue
  | and I now have a clear understanding of why the C standard allows
  | mktime() to return -1 for times in the gap but POSIX does not.

I sent Geoff a much longer reply to this message than this one - but once
again neglected to add a cc to the list.   He's welcome to forward that
message if he feels inclined.   It touched upon almost all the points of his
message (you will have seen from my earlier reply here today, that the
tzdata error will be corrected) - but it really just boils down to this.

  | Okay, let's examine the text in C89/C90:
  |
  | The mktime function converts the broken-down time, expressed as
  | local time, in the structure pointed to by timeptr into a calendar
  | time value with the same encoding as that of the values returned
  | by the time function.
  | [...]
  |
  | Returns
  | The mktime function returns the specified calendar time encoded as
  | a value of type time_t. If the calendar time cannot be represented,
  | the function returns the value (time_t)-1.
  |
  | (In C99 and C17 it is the same except for additional parentheses
  | around "-1").
  |
  | This wording is almost identical to POSIX, except for "shallification",
  | the use of "time since the Epoch" in POSIX instead of "calendar time" in
  | C99, and the POSIX requirement to set errno.

Yes, they are essentially the same, hence if -1 is allowed from C,
it is also allowed for POSIX.

  | However, there is a big difference in the requirements that arise from
  | these almost identical wordings, and that is because local time and DST
  | are implementation-defined in C, but in POSIX they are not.

XBD 4.17

The relationship between the actual time of day and the current
value for seconds since the Epoch is unspecified.

POSIX specifies that local time needs to exist, and that summer time
is possible, and provides a mechanism to indicate when summer time
begins and ends (if it exists), but that's it.   Everything else, as
it says there is unspecified.

  | In order for a non-POSIX implementation of mktime() to return (time_t)-1
  | for a time in the gap, all it has to do is define local time and DST in
  | such a way that times in the gap are converted to a value that cannot be
  | represented in a time_t.  For example, it could say they are converted
  | to UINT64_MAX if time_t is a signed 64-bit integer type.  Then the
  | requirement in the C standard would kick in, requiring mktime() to
  | return (time_t)-1 because UINT64_MAX can't be represented in that time_t
  | type.

I very much doubt that's the reasoning they used, but if they did, the
exact same reasoning is available in POSIX, with the exact same conclusion,
and as POSIX is deferring to the C standard (very explicitly in the case
of mktime()) if C says that -1 is an OK return, then -1 is an OK return.

  | This "loophole" is not present in POSIX because local time and DST are
  | not implementation-defined.

Nonsense.   See above.   (What they are is explicitly unspecified, which
is even looser than implementation defined.)

The time_t value cannot be represented, as it does not exist, or is
ambiguous, and the implementation has been given no guidance which of
several possible values applies.   In either case, returning -1 is an
entirely reasonable thing to do, and much better than picking some
random time_t value and returning that instead.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-23 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 22 Nov 2022 12:49:13 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | Because when the change happened, 1981-12-31 23:30:00 in the old time
  | zone became 1982-01-01 00:00:00 in the new timezone.

That's now been confirmed from other sources, the next tzdata release
will contain the fix (with credit to Geoff of course).   No updated
release just for this, after being incorrect essentially forever (well,
really forever) it can wait until a new release is needed for some
currently relevant update.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-10 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 10 Nov 2022 12:33:47 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | You are the one requesting a radical change to the standard.

Actually, I am not.   That's why I am trying to get you to explain
where the standard says anything that even permits the behaviour
you're claiming it mandates.

When I submitted this bug report (#1614) I assumed this one would be
a simple no brainer, no controversy at all, unless perhaps someone
reported an implementation which returned a different error than EINVAL.
If anything, I wondered if the other two bug reports I submitted at
around the same time (#1612 and #1613) might have generated some debate
(not that I was expecting much resistance there either.)

It seems clear to me that the current standard allows an error return
in the cases in question - that because the C standard allows an error
return POSIX defers to that (except where it says otherwise) and I
see nothing in the POSIX version of mktime() which says it is to be
different in this area.

If anything it is you who is proposing (or perhaps postulating, there's
nothing arising to the level of a proposal yet) a radical change to
the standard (as in what is published).

  | If you fail to convince the group to make
  | your proposed change, then by default the status quo will remain.

That would actually not bother me very much.  The status quo allows
an error return, implementations are permitted to use different error
codes when needed, so using EINVAL isn't necessarily wrong, and EOVERFLOW
would certainly be acceptable (if not really a very good idea.)

  | In saying this you have demonstrated that you did in fact lose the
  | context.  The context *was* an example I gave of an application that
  | calls localtime(), increments tm_mon, sets tm_isdst to -1 and calls
  | mktime().

Sure, I know that, but you're missing/avoiding the point.  That is that
mktime() cannot know that.   All mktime() sees is what is in the struct
tm passed to it (and the timezone, but that's a constant for this purpose).

The exact same struct tm could have been produced in a case where localtime()
returns the following month, the application decrements tm_mon, sets tm_isdst
to -1, and calls mktime().

Or the exact same struct tm could have been produced by an application which
calls strptime() to initialise the struct tm (even including tm_wday and
tm_yday if you insist on that - though mktime() is not permitted to look
at those) then sets tm_isdst to -1, and calls mktime().

You seem to be of the opinion that mktime()'s prime purpose is to allow
people to increment time fields, and get a time_t back.   Almost as if
that is its only use.

While I can see that as one of the use cases, I doubt it comes close to
the number of uses of mktime() being used to generate a time_t from a
calendar representation (in some format or other, RFC822 format (mail
Date: headers), ISO format, many others) in all of which failing to
produce the correct answer (and allowing a time which doesn't exist
through without error) is simply wrong.
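
As a rough sketch of that kind of use, assuming a fixed "YYYY-MM-DD HH:MM:SS" input format and a hypothetical function name parse_local_strict(), an application that wants an error for a time which does not exist has to check for itself on implementations that do not report one:

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: convert "YYYY-MM-DD HH:MM:SS" (local time) to a
 * time_t.  Returns 0 on success, -1 if the text is malformed, if mktime()
 * fails, or if mktime() had to alter any of the six fields (for example
 * because the time is inside a gap, or the date was February 30).
 */
int parse_local_strict(const char *text, time_t *out)
{
    struct tm asked = {0}, got;
    int year, mon;
    char extra;

    if (sscanf(text, "%d-%d-%d %d:%d:%d%c", &year, &mon, &asked.tm_mday,
               &asked.tm_hour, &asked.tm_min, &asked.tm_sec, &extra) != 6)
        return -1;
    asked.tm_year = year - 1900;
    asked.tm_mon = mon - 1;

    got = asked;
    got.tm_isdst = -1;          /* we have no idea which offset applies */
    got.tm_wday = -1;           /* stays -1 only if mktime() fails */
    *out = mktime(&got);
    if (got.tm_wday == -1)
        return -1;
    if (got.tm_year != asked.tm_year || got.tm_mon != asked.tm_mon ||
        got.tm_mday != asked.tm_mday || got.tm_hour != asked.tm_hour ||
        got.tm_min != asked.tm_min || got.tm_sec != asked.tm_sec)
        return -1;
    return 0;
}
```

Note that this does nothing about the ambiguous (fold) case; it accepts whichever of the two valid answers mktime() happened to pick.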

Further down I will show (assuming I remember to include it, by the time
I reach the end) an example of the kind of thing that can happen if code
is written in the sloppy way that you seem to insist that application code
writers write (apparently) large volumes of code - and which you seem to
be planning on changing POSIX to explicitly allow (or require) to happen.

  | This is just another way of stating your preference for your
  | idealistic notion of correctness over the pragmatic solution that
  | almost all implementors have chosen.

And that will show that the pragmatic solution is broken, at least in
some cases.   You might (when you see it) claim that no-one actually
does things this way - but what exists that would suggest that it is
any different in a material way than the examples that you claim that
many applications are using?

But that is for later, I mention it here just to reinforce the point
that being correct is important, allowing "close enough" (but wrong)
isn't really ever acceptable, anywhere.   Or at least not without the
application informing the implementation in some way that an approximation
is all that matters.

  | Indeed it is, but mktime() does not have any equivalent requirement
  | and so both of the valid answers are allowed.

Yes, and as I have previously said, the case where there are two valid
answers (while I do not much like it) is one where I can accept the
implementation simply picking one.   The case I object to is the one
where there is no valid answer.

  | Yes, it would mean doing two mktime() calls every time.  And the fact
  | that nobody does it shows that nobody cares if they occasionally get
  | an answer that is one of two valid answers.

I suspect it is more likely that nobody can even conceive of the possibility
that the code might be permitted to return different answers (at a whim) to
one

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-08 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 8 Nov 2022 18:15:20 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | We are going round in circles. You already asked that (probably in
  | different words) and I already answered it.

Which implies that your answer did not convince me.   Just saying
"already answered" doesn't help if the answer wasn't sufficient.

  | You snipped too much and lost the context. The "it" here was
  | referring to "the wall clock time", i.e. tm_hour, tm_min and tm_sec

No, I know all that, no context lost.   What you might be missing is
that tm_isdst is also such a field.   The standard just refers to the
components of the structure in <time.h> (in the existing standard there
are 9 of those) and then excludes 2 of them.   The remaining 7 are
all treated identically by the standard, there are no favourites.
Any which are out of range are fixed.   That certainly will include
tm_isdst.

The two relevant sections of the standard for this are:

   The original values of the tm_wday and tm_yday components of
   the structure shall be ignored,

no issues with that part

   and the original values of the other components shall not be
   restricted to the ranges described in <time.h>.

And that is all the other 7 - including tm_isdst.

And:

Upon successful completion, the values of the tm_wday and tm_yday
components of the structure shall be set appropriately,

again fine, no issues there

and the other components

which still includes tm_isdst

shall be set to represent the specified time since the Epoch,
but with their values forced to the ranges indicated in the
<time.h> entry;

(the remaining clause isn't relevant).   That says that tm_isdst should
be set to 0 if the "specified time since the epoch" represents standard time,
and 1 for summer time, it cannot mean anything else for that field.

  | (which came from a localtime() call and therefore are in range
  | when passed to mktime().)

Perhaps in some example that you're imagining that might be true,
but mktime() cannot assume that.  It has no idea how the struct tm
was constructed, or what kinds of values it might contain.  mktime()
allows anything.

  | You can't read the EOVERFLOW description in isolation; it needs the
  | RETURN VALUE section for context, which says:
  |
  | The mktime() function shall return the specified time since the
  | Epoch encoded as a value of type time_t. If the time since the
  | Epoch cannot be represented, the function shall return the value
  | (time_t)-1 and set errno to indicate the error.
  |
  | So it is talking about the "time since the Epoch" not being representable
  | in a time_t.  It does not apply to a broken-down time (struct tm) not
  | being able to be converted to a time since the Epoch.

But for a time that cannot be represented, just like a NaN in floats, we
would need a value to put in the time_t to indicate that no normal value
exists - but time_t has no such value (someone could have defined
(-MAXINT - 1) (where MAXINT should really be MAXTIME_T but I don't recall
ever seeing one of those) as a time_t value representing "invalid", but
that's never been done).

Lacking that we have a situation where the appropriate value cannot be
represented as any time_t value.   To me that would entirely fit within
the specification of EOVERFLOW (though it would be a pity were we forced
to go that route).

  | In fact the phrase "the specified time since the Epoch" carries with
  | it the implication that the information passed to mktime() (in a
  | struct tm) always specifies a time since the Epoch, i.e it can always
  | be converted to an appropriate numeric value.

I don't see that implication - but if it was there, the effect would be
to outlaw all timezone variations (no seasonal changes permitted, no
zone ever permitted to alter its offset) as that's the only way to guarantee
that every "broken down" (ie: wallclock time in a struct tm) that can
possibly exist (once normalised) represents a valid time.

  | The vast majority of implementations do not return -1 for times in the gap,
  | including the libc implementations on all of the most used POSIX/UNIX
  | systems.

So you keep saying.   Over and over again.   I don't care.

What I care about is what the standard, as written now, actually
requires of an implementation.

Until you can quote the words from the standard which support your
position, then we're getting nowhere.

Should it be decided to tear up the current mktime specification, and
start all over again, then at that point it might be appropriate to
look at what implementations do, and what applications want.   We are
not at that point yet.

It is now quite clear that the C standard allows error returns (because
of the input value being a time that cannot exist, or which would result
in an ambiguous result, when there is no guidance as to which is wanted
provided in the call) - and there is nothing I can see

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-08 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 8 Nov 2022 15:24:21 +
From:Austin Group Bug Tracker 
Message-ID:  <60fe2d9d8f9a9039da59e45877c42...@austingroupbugs.net>

  | Here's where we disagree. As you say, negative tm_isdst means DST
  | information is "not available"; however, there is nothing in the normative
  | text that says how mktime() must behave when it is told that DST
  | information is not available. The footnote is what does that, but it's
  | non-normative. 

That could perhaps be the reason that the C committee apparently agrees
that it is acceptable for the implementation to return -1 in cases where
it is necessary to be told whether summer time is to be treated as
applying or not.

There's no normative (or other) text in POSIX that says how mktime() must
behave when it is told that summer time information is not available either.

Hence, in line with the C determination, it should be possible for a POSIX
compliant implementation to return -1 in these cases.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-08 Thread Robert Elz via austin-group-l at The Open Group

[Forwarded on the request of kre.]

Date:Mon, 7 Nov 2022 12:31:33 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | It should behave the same as for tm_isdst=0 or for tm_isdst=1, whichever
  | it deems the most appropriate.

And if it decides that it is not its job to make that decision?
What would be its basis for choice, and why?   And where in the
standard (C or POSIX) is there anything which actually says this
is supposed to happen.   All it says about mktime() with tm_isdst = -1
is that it attempts to work out whether summer time applies or
not (except that it insists on calling it by the idiotic US label).

  | With tm_isdst=-1 it will only change on the rare occasions when
  | the calculated time is in the gap.

Nonsense.   mktime() allows for out of range values for all of the
(then existing) fields of <time.h> except tm_wday and tm_yday,
which it ignores.

tm_isdst is such a field.

If I set tm_isdst=1 and TZ="UTC", and set suitable (for this purpose, entirely
in range, say specifying 1970-01-01 00:00:00) values in the other 6
fields that struct tm contains (which mktime() uses) then the result
MUST contain tm_isdst = 0 as the "other fields" (not being tm_wday and
tm_yday) are required to be adjusted (forced) to be in range (and
everyone knows, that means, to suit the time/date actually represented,
not just the stated limits in <time.h> - we don't allow Feb 31 to be
returned, ever).

The result from mktime() of that struct tm should be 0 (it is after all
the Epoch) and should have tm_isdst == 0.

You already stated as much, as you agreed that the results of localtime()
on the time returned (assuming no errors) and the struct tm that mktime()
requires must be the same (actually, to be correct, you stated, even
demanded, that, and I agreed, but never mind) - and localtime() applied
in a TZ="UTC" environment, to the Epoch time, must return tm_isdst == 0.

For anything other than ambiguous/impossible settings (ones which do not
represent a single fixed time_t) the value in tm_isdst on entry to mktime()
is irrelevant.   It only does anything useful at all in the hard cases.

The easiest of these is the "fold back" - when summer time ends.   There
is tm_isdst == 0 on entry, we select the time_t of the two which would
produce the rest of the fields in the struct tm which also produces isdst=0;
Similarly if tm_isdst == 1.   If tm_isdst == -1 then we have no way to
guess which of the two was intended.   The C standard is apparently clear
that an implementation can return an error in that case, and there is
absolutely nothing in POSIX to contradict it (except the missing error
code, which is all this bug report was intended to fix).

If the struct tm 6 basic time/date fields represent a time "in the gap"
then that's a time that simply doesn't exist.   It isn't summer time, it
isn't not summer time.   It simply isn't.   It is no different than asking
whether "The fourth of Never" is summer time or not?   A completely
meaningless question.

In this case however, to allow the use of struct tm as a way to perform
time addition or subtraction, allowing tm_isdst to be used to inform the
implementation what might have happened in that case is not unreasonable.
While there is nothing I can see, anywhere at all, in POSIX currently
(don't know about the C standard) which specifies that this is intended
to work (it is just kind of hinted at, very imprecisely, by the wording
allowing input values in the incoming struct tm to be out of range),
I have no problem allowing it.

  | If tm_isdst is >=0 then a time_t can always be produced (unless it
  | overflows).

A time_t can always be produced, regardless.   The question is whether
it is the correct one.   A mktime() that was simply

time_t
mktime(struct tm *tm)
{
/* normalise the fields of struct tm, code omitted */
return (time_t)0;
}

is returning a time_t ... but not the right one.   Returning the right
one is surely the most important criterion here.   Simply "return something
because the test suite says you must" -- POSIX doesn't, as after all, all it
says about EOVERFLOW is

The mktime( ) function shall fail if:
[EOVERFLOW] The result cannot be represented.

Since a time_t does not have a value which allows it to represent
the result "that time never existed nor ever will", it would be
perfectly OK, according to the spec, for mktime() to return -1 with
errno == EOVERFLOW in that case.

There's really no question about that, and if you believe it is incorrect,
please quote the language in the standard which contradicts it.

The reason for the bug (#1614) is that implementations don't return
EOVERFLOW in this case, they return EINVAL instead.  The standard should
reflect what (some) implementations actually do.

  | The application has no way of knowing whether the
  | specified time was in the gap.  With tm_isdst=-1,

Re: [1003.1(2016/18)/Issue7+TC2 0001614]: XSH 3/mktime does not specify EINVAL and should

2022-11-07 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 7 Nov 2022 12:31:33 +
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

I sent a long reply to Geoff (forgot to add a cc to the list) which
I am hoping he will eventually forward here.

One of the topics covered came from this:

  | >   If you are suggesting that passing the return value of mktime() to
  | >   localtime() could produce different struct tm member values than those 
  | >   returned by mktime(), then that can never happen, since mktime() is
  | >   required to set them the same way localtime() does.

In my not-yet-seen-on-the-list reply, I pointed out that there is nothing
at all in the POSIX standard which says this (but I agreed that it should).

I have since been sent a copy of the C standard for mktime (from what is
claimed to be a very late draft of C99 - more or less identical to the
final text, it is claimed).

While a large section of the POSIX text is close enough to identical
to the C standard, for its origins to be obvious (including one CX
section which is word for word what the C standard says, though in
a footnote, and thus which clearly should not be CX shaded, it is not
any kind of extension or variation of the C standard), the POSIX
text is completely missing this paragraph, which appears in the C
standard:

   [#3] If the call is successful, a second call to the  mktime
   function  with  the  resulting  struct tm value shall always
   leave it unchanged and return the same value  as  the  first
   call.   Furthermore,  if  the  normalized  time  is  exactly
   representable as a time_t value, then the normalized broken-
   down  time  and the broken-down time generated by converting
   the result of the mktime function by  a  call  to  localtime
   shall be identical.

That should clearly be added.   It has no real bearing upon our
current discussions, as we were agreeing that it ought to be like
this anyway, and still differing on other issues, but it is an
obvious defect in POSIX that should be corrected.

kre

Re: [1003.1(2008)/Issue 7 0000375]: Extend test/[...] conditionals: ==, <, >, -nt, -ot, -ef

2022-10-31 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 31 Oct 2022 19:03:53 +
From:"Stephane Chazelas via austin-group-l at The Open Group" 

Message-ID:  <20221031190353.ar33l2s6dwkor...@chazelas.org>

  | [ is perfectly fine after we deprecate -a, -o binary operators
  | and "(", ")".

Which was done ages ago.   test (or its '[' synonym) is just fine now.

Wrt the current issue, I support the new operators being added to '['
that Chet Ramey mentioned in his message or note about that (those listed
in the Subject of the bug) - with the exception of == which is just a
meaningless frill that adds nothing useful at all (and isn't supported
in many test implementations, unlike the others proposed, which are).
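
To make the portability point concrete, here is a small sketch. The file
names and the use of mktemp -d are assumptions made purely for illustration.
It uses = for string equality, which every test implementation accepts, and
then tries -nt, -ot and -ef, which many (but not necessarily all)
implementations provide.

```shell
# Hypothetical scratch files; mktemp -d is assumed to be available.
dir=$(mktemp -d) || exit 1
touch -t 202201010000 "$dir/old"    # older modification time
touch -t 202301010000 "$dir/new"    # newer modification time
ln "$dir/old" "$dir/alias"          # hard link: same file, second name

# '=' is the portable spelling of string equality; '==' adds nothing to it.
if [ "abc" = "abc" ]; then eq=yes; else eq=no; fi

# The operators from the bug's Subject: widespread, but applications
# cannot rely on them until they are actually in the standard.
if [ "$dir/new" -nt "$dir/old" ]; then nt=yes; else nt=no; fi
if [ "$dir/old" -ot "$dir/new" ]; then ot=yes; else ot=no; fi
if [ "$dir/old" -ef "$dir/alias" ]; then ef=yes; else ef=no; fi

echo "eq=$eq nt=$nt ot=$ot ef=$ef"
rm -rf "$dir"
```

On a test implementation that lacks one of the three file operators, the
corresponding variable would come out as "no" (with a diagnostic on stderr)
rather than the script failing outright.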

As best I can tell there is no intent to add anything about the [[
extension that some shells have, so discussing that on this list isn't
really appropriate.

kre

ps: note that not adding == (or any other proposed test operators) doesn't
mean that any implementations that support them need remove that support,
just that applications cannot rely upon those things working everywhere.

Re: [Issue 8 drafts 0001611]: exit status from fg is either badly specified or is simply wrong

2022-10-31 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 31 Oct 2022 16:39:06 +
From:Austin Group Bug Tracker 
Message-ID:  

  | This is already being fixed by bug
  | https://austingroupbugs.net/view.php?id=1254,

OK, thanks, that is fine.  If I trusted my ability to conduct a
search in mantis, I might have even found it, but that never seems
to work for me.   I just thought it better to ensure this wasn't
just forgotten (or never even considered).   Sorry for the noise.

Feel free to close 1611.

kre

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2022-10-19 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 19 Oct 2022 08:26:46 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  

  | I can't see anything "a few lines earlier" that implies quotation-mark
  | needs to be escaped.  Please give the exact wording change you would
  | like to see.

I think Steffen is referring to:

   \" yields a  (double-quote) character.

the first bullet point in the (new) section 2.2.4, and that all he
means to change would be to add to that sentence something like:

, but note that the double-quote character is not required to be
escaped to be included

(just before the '.' that ends the existing sentence).

kre

Re: [1003.1(2008)/Issue 7 0000767]: Add built-in "local"

2022-08-08 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 08 Aug 2022 17:24:59 +0200
From:"Christoph Anton Mitterer via austin-group-l at The Open 
Group" 
Message-ID:  <708410359c03bc0cfb89bfc29baaa9000b0d00b1.ca...@scientia.org>

  | Just wondered, whether it was ever considered to "simply" specify a new
  | keyword (e.g. "loc" or something more generic similar to bash's
  | declare),..

It isn't the keyword that is the problem, it is the desired behaviour,
which depends upon the model for variables that the shell implements
(or desires to implement).

Until we can agree on what the objective is, there's no chance of unifying
anything else.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1) utility

2022-07-22 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 22 Jul 2022 07:58:47 -0500
From:"Eric Blake via austin-group-l at The Open Group" 

Message-ID:  <20220722125847.tidcrt7a6ntvy...@redhat.com>

  | [If readlink is implemented as a shell builtin, then you could have an
  | extension where:
  |
  |   readlink -v var -n -- "$name"

If something like that were implemented, the -n would be a waste of
space (there): the variable would always be assigned the value of the
symlink, as the -n is only to suppress the \n that is printed after that
when writing it to stdout.

The uses in cmdsubs you dissected are clearly not what -n is intended
for (though I wonder if perhaps something similar in csh, if that
ability is there - it has been so long since I looked at that - might
have a different outcome).

Aside from that possibility the only reason would seem to be the same
as why echo (real ones) have -n (and trashy ones have \c) and why
printf(1) needs a \n to print one ... there are times that it is useful
to write a partial line to stdout (or wherever) and there's no reason
that the output of readlink could not be intended to be a part of such
a gradually constructed output line.
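
A rough sketch of both points follows. It assumes a readlink utility that
accepts -n and a mktemp that accepts -d; the scratch directory and the
symlink names are invented for the example.

```shell
# Hypothetical scratch symlink; readlink -n and mktemp -d are assumed to exist.
dir=$(mktemp -d) || exit 1
ln -s target "$dir/link"

# In a command substitution the trailing newline is stripped anyway,
# so -n makes no difference to what ends up in the variable.
with_nl=$(readlink "$dir/link")
without_nl=$(readlink -n "$dir/link")

# Building one output line piece by piece is where -n is actually useful.
line=$(printf 'link -> '; readlink -n "$dir/link"; printf ' (done)\n')
echo "$line"

rm -rf "$dir"
```

Here with_nl and without_nl hold the same string, and the three writes in
the last command substitution combine into a single line of output.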

kre

Re: [1003.1(2013)/Issue7+TC1 0001068]: Binding to a system-assigned port.

2022-07-22 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 22 Jul 2022 09:20:55 +0800
From:"DannyNiu via austin-group-l at The Open Group" 

Message-ID:  

  | Might I ask how did we resolve this? Just for the sake of record. 
  | Or the next minute will contain these info?

It probably will, but the message you're replying to contained a URL to
the accepted text ... but as the URLs in the message you included in
this reply are mangled beyond recognition, I can only assume that some
protection from dangerous spam/phishing messages in your e-mail system is
stopping you getting them.  The URL was, with the https colon slash slash
stuff stripped off, so that should not be a problem (except you will need
to add that back):

austingroupbugs.net/view.php?id=1068#c5902

But it is more or less (standards wording applied) exactly what you
requested be done, bind to port 0, and the system picks a port for you
(which is what systems actually do).

kre

Re: [Issue 8 drafts 0001592]: Add %n$ support to the printf utility

2022-07-16 Thread Robert Elz via austin-group-l at The Open Group

Once again, mantis bit...   what's in the e-mail is only a half complete,
and fully unedited, version of what now appears in the note.

Did I ever say what I think of mantis?

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001273]: glob()'s GLOB_ERR/errfunc and non-directory files

2022-07-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 13 Jul 2022 15:34:39 +

Re:  https://austingroupbugs.net/view.php?id=1273#c5885 

  | Does anyone know if any implementation has made changes to glob() in the
  | last three years? 

The last change to NetBSD's glob() was late May 2019, so not here.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001538]: what -s is poorly described, uses the word "quit"

2022-06-21 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 21 Jun 2022 09:16:15 +
From:Austin Group Bug Tracker 
Message-ID:  <5c79b6e05af68bfbeaebf987e9c80...@austingroupbugs.net>


  | -- 
  |  (0005857) geoffclare (manager) - 2022-06-21 09:16
  |  https://austingroupbugs.net/view.php?id=1538#c5857 
  | -- 
  | Suggested new resolution (note that bug
  | https://austingroupbugs.net/view.php?id=1563 already fixed STDOUT, so
  | this should just be about -s) ...


Looks OK to me.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001538]: what -s is poorly described, uses the word "quit"

2022-06-20 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 20 Jun 2022 15:02:55 +
From:"Austin Group Bug Tracker via austin-group-l at The Open 
Group" 
Message-ID:  

WRT:

  | A NOTE has been added to this issue. 
  | == 
  | https://austingroupbugs.net/view.php?id=1538 
  | == 

  |  (0005854) kre (reporter) - 2022-06-20 15:02
  |  https://austingroupbugs.net/view.php?id=1538#c5854 
  | -- 
  | Re https://austingroupbugs.net/view.php?id=1538#c5821
  |
  | Apologies for the delay of this response.

Also apologies for that - please ignore the e-mail, mantis decided to steal
my note when I was half way through entering it.   I updated the note (but
mantis doesn't have the good manners to forward edited notes).   I can see
delaying several (tens of, perhaps) minutes after an edit, in case there's
another correction immediately after, but it would be really nice to see
updates to the text on the list, so we don't have to go fight with mantis
quite so much.  Would that be possible?

Anyway, for this one, if you care (for most readers of this, I doubt that's
true, "what" is a relatively insignificant command) you'll need to look at
what's in mantis - nothing of substance made it into the part of the note that
was e-mailed.

kre

Re: POSIX gettext(): lifetime of returned values

2022-05-25 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 25 May 2022 02:57:52 +0200
From:"Bruno Haible via austin-group-l at The Open Group" 

Message-ID:  <5462894.CAdn2TfLgq@omega>

  | IMO, it's useful to distinguish bounded and unbounded memory leaks:
  |   - A _bounded_ memory leak is one where the amount of leaked memory is
  | bounded by an a-priori computable constant.
  |   - An _unbounded_ memory leak is one where such a bound does not exist.

Personally I would first determine whether there is a memory leak
at all.  For this I like to imagine that we are using a garbage
collecting memory allocator ( no equivalent of free() ) and ask
whether such a system would reclaim any memory that has not
been subject to free() while using C memory management.

Alternately, can the memory be reached by following pointers
from some visible starting point (whether in the app, or some
library does not matter).  If so, it is not leaked, even if
never free()'d.

If those tests do show a leak, then the above tests can help
determine if it matters or not.

But from your description, I'd assume (guess perhaps) that there
is no leak at all in what you have described, in which case that
classification scheme is irrelevant.

I would also guess that a side effect of the way it was described
is that changes to the on disc backing store (the .mo file, or
whatever) will not be detected while the application remains
running, and that aside from execing itself to restart clean
there is no way for an application designed to run forever
to ever see updated data.

If that's not the case, then given the guarantees you seem to
be making about the lifetime of returned pointers, it looks
like a memory leak would be unavoidable.   Consider one thread
which does gettext() after which you have no idea when it
might use that pointer again, while another keeps changing,
and then causing to be loaded, the altered data.  Forever.

kre

Re: When can shells remove "known" process IDs from the list?

2022-05-16 Thread Robert Elz via austin-group-l at The Open Group

Chet and I can continue this conversation off list, what is
being discussed now has nothing at all to do with anything
related to posix.

kre

Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Sat, 14 May 2022 03:56:32 +0700
From:"Robert Elz via austin-group-l at The Open Group" 

Message-ID:  <2459.1652475...@jinx.noi.kre.to>

  |   | Show your work.

  | I no longer remember the exact command I used (cannot even locate the
  | message you're quoting from),

I finally did ...

This is what I see:

bash5 $ echo $BASH_VERSION
5.1.16(1)-release
bash5 $ jobs
bash5 $ set +m
bash5 $ sleep 20 | sleep 20 & sleep 30 | sleep 30 & jobs -l; ps jT
[1] 1868
[2] 1847
[1]- 29632 Running sleep 20
  1868   | sleep 20 &
[2]+  2715 Running sleep 30
  1847   | sleep 30 &
USER   PID  PPID PGID SESS JOBC STAT TTY    TIME    COMMAND
kre    355  1847 5699 d0d6d70   S+   pts/26 0:00.00 sleep 30
kre    410 29632 5699 d0d6d70   S+   pts/26 0:00.00 sleep 20
kre   1687  1868 5699 d0d6d70   S+   pts/26 0:00.00 sleep 20
kre   1847  5699 5699 d0d6d70   S+   pts/26 0:00.00 -bash
kre   1868  5699 5699 d0d6d70   S+   pts/26 0:00.00 -bash
kre   2715  5699 5699 d0d6d70   S+   pts/26 0:00.00 -bash
kre   4319  2715 5699 d0d6d70   R+   pts/26 0:00.00 sleep 30 (bash)
kre   5333  5699 5699 d0d6d70   O+   pts/26 0:00.00 ps -jT
kre   5699  3620 5699 d0d6d70   Ss+  pts/26 0:00.03 -bash
kre  29632  5699 5699 d0d6d70   S+   pts/26 0:00.00 -bash
bash5 $ echo $$
5699
bash5 $ 

Note that pids 29632 and 1868 (which jobs claims are "sleep") are actually
bash, the sleep processes are 410 and 1687.   Similarly for job 2. Everything
is in process group 5699 (the interactive shell's pid).

When one kills %1, processes 29632 and 1868 get killed, processes 410 and 1687
do not.

You can decide whether the extra interposed bash processes are intentional or
not, as I said in the previous message, that is not wrong.  The inability to
signal the (unknown) grandchildren is expected (the same kind of thing would
happen if the command were "make" and there's a whole tree of make, compiler,
linker, ... processes running - this is unavoidable).

kre

Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 13 May 2022 11:22:20 -0400
From:Chet Ramey 
Message-ID:  

  | Show your work.
  |
  | I tested this on macOS 12 and RHEL 7, using interactive shells with job
  | control enabled,

That is likely the difference.   The question was about what happens when
job control is not enabled.

When job control is enabled, the kill kills that job's process group, and
all of it gets signalled.   Without job control, that's not possible, the
shell can only kill its known children, their children (absent relaying of
the signal down the tree) never see it.
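
The no-job-control case can be sketched in a script, where job control is
off by default. This is only an illustration under stated assumptions: the
pid file, the variable names and the use of mktemp are all invented here,
and the sub-shell stands in for whatever extra process the shell happened
to fork.

```shell
# Hypothetical scratch file used to learn the grandchild's pid.
pidfile=$(mktemp) || exit 1

# The sub-shell is the only child this shell knows; the sleep is its child.
( sleep 30 >/dev/null 2>&1 & echo $! >"$pidfile"; wait ) &
child=$!

# Wait until the sub-shell has recorded the grandchild's pid.
while [ ! -s "$pidfile" ]; do sleep 1; done
grandchild=$(cat "$pidfile")

kill "$child"                       # signals the known child only
wait "$child" 2>/dev/null || :      # reap it; its status reflects the signal

# Nothing relayed the signal, so the grandchild is still running.
if kill -0 "$grandchild" 2>/dev/null; then survived=yes; else survived=no; fi
echo "grandchild survived: $survived"

kill "$grandchild" 2>/dev/null || : # tidy up the leftover sleep
rm -f "$pidfile"
```

With job control enabled the kill would instead be directed at the job's
process group, and the sleep would be signalled along with the sub-shell.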

I no longer remember the exact command I used (cannot even locate the message
you're quoting from), which caused bash to fork a sub-shell, in which to
run the pipeline, rather than running it directly from the parent - but
that's not really the point, doing that was not wrong, whatever provoked it,
it simply meant that the parent shell did not know the actual processes
running in the pipe, so could not do anything to them.

kre

Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 13 May 2022 10:20:49 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220513092049.GB17043@localhost>

  | [Robert Cc'ed this to austin-grou...@netbsd.org which presumably bounced.
  | I'm taking that as indication that he intended it to go to this list,
  | and am quoting it in full.]

Oops.   And yes, I did, and thanks.   Didn't even notice that this one
hadn't appeared on the list (I ignore bounce messages).

  | However, what the standard requires here does not match existing
  | practice in some shells and so the standard should change.

OK, let's just agree on that, whatever our opinions of what it
currently says.

  | It's not clear at all, and I would say the opposite is implied.
  | The definition of "Job" is:
  |
  | A set of processes, comprising a shell pipeline, and any processes
  | descended from it, that are all in the same process group.
  |
  | Notice it says "that are all in the same process group".

Yes, I did.

  | In the case of a background command started with job control disabled,
  | the processes all have the same process group

Exactly.   That meets the definition, doesn't it?

  | as the parent shell.

Not relevant.

  | By a strict reading, this counts as a job, but I don't think that
  | was intended.

Intended or not, that's what the standard says.   It also largely matches
what is implemented.

  | In any case we already know that the current definition of "job" is
  | very wrong, so using it to support either position is futile.

"very wrong" I think is too much - it is very close to the implementations.

But given the last clause, we probably need to wait upon proposed new
definitions, and specs for the relevant usages, to see if those are a
closer fit to reality.

kre

Re: wait and stopped processes (was: When can shells remove "known" process IDs from the list?)

2022-05-11 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 11 May 2022 09:58:38 -0400
From:"Chet Ramey via austin-group-l at The Open Group" 

Message-ID:  <4d0598b4-efb3-d5c2-1267-b8a807399...@case.edu>

  | > It is already what the standard requires, and with good reason.
  |
  | Sure. It simply isn't what many (most) shells do.

You're right about that, given this test (in an interactive shell, with set -m)

date; sleep 30 & X=$! ; ( sleep 5;  kill -STOP $X) &
echo sleep=$X kill=$!; wait $X; jobs -l; date

(which I entered on one line, but wrapped here for e-mail convenience)

All shells but FreeBSD and zsh (--emulate sh) finished in 5 seconds, leaving
a stopped sleep job running.   (We can ignore the NetBSD sh for this, it
is definitely broken - what happens depends upon that "sleep 5", as the
wait behaves differently if the waited upon process is already stopped,
vs if it stops while waiting).

The FreeBSD and zsh shells didn't terminate that command until a SIGCONT
was directed at the sleep process (rather more than 30 seconds after all
of this started).

  | Maybe. And yet I can't recall ever receiving a bug about this.

That is most likely because users generally don't wait in interactive
shells, and in non-interactive shells, 99.9% of the time if a job stops,
its parent shell stops along with it - when they are resumed, they
both resume, and simply continue from where they left off.

The circumstances to provoke a problem need to be contrived.

kre

Re: When can shells remove "known" process IDs from the list?

2022-05-11 Thread Robert Elz via austin-group-l at The Open Group

Date:Wed, 11 May 2022 09:17:15 -0400
From:"Chet Ramey via austin-group-l at The Open Group" 

Message-ID:  <573bc015-dd85-f86e-b89d-33a0bcc4b...@case.edu>

Again, apologies, still very little time for any of this.

  | For neither the first nor the last time.

Including now.

  | > I think they should remain independent.
  | Sure, I agree.

I don't.  I cannot think of a single reason why the shell should be
forced to maintain two separate lists of its child processes.  The jobs
table needs to have them, so processes in the job can be identified as
they finish.  Duplicating that in another table, for no particular reason
I can imagine, makes no sense to me.   Still, if others want to implement
it that way, I don't object - but the standard has never required that,
and should not, absent some very good reason, be changed to require it now.

In a later message Chet said:
| > The normative text relating to creation of job numbers/IDs is all
| > conditional on job control being enabled.

| Where is that? It's not in the definition of Job ID, it's not in 2.9.3
| Asynchronous Lists, it's not in the `jobs' description, it's not part of the
| definition of Background Job or Foreground Job, it's not in any of fg/bg/kill/
| wait. I feel like I'm missing something obvious here. 

Again, I disagree.   You're missing nothing.   There has not been anything
like Geoff is postulating - there might be in his unpublished new draft text,
but there is no reason I can imagine that such a change should be adopted.

kre

Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 29 Apr 2022 20:11:55 +0100
From:"Harald van Dijk via austin-group-l at The Open Group" 

Message-ID:  

  | >| It also appears that dash still implements remove-before-prompting.
  |
  | busybox ash and my shell do as well, but both are derived from dash and 
  | have merely retained dash's behaviour.

All ash derived shells work that way.

  | > Does anyone not?
  |
  | bash does not. bosh does not. ksh does not. mksh does not. posh does 
  | not. yash does not. zsh does not.

I did a test (not the same one you did) after I sent the mail, and saw
that bosh and yash don't.   For the other shells, it is not nearly as
clearcut what is happening.

  | You can test this by doing
  |
  |true &
  |
  |wait $!; echo $?
  |
  | This should print 0. Then do the same, except with the first command 
  | changed to false &. That should print 1.

Yes, in the shells you mention it does, indicating that something different
is happening.   It is interesting that in bash you can do that wait over and
over again, and it keeps returning the 0 status (until one does a plain "wait"
command, even the "jobs" command doesn't remove it, though the standard
requires that it do so).   bash is the only shell that acts like that, whether
it is intentional or not I have no idea.
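
For reference, the quoted test can also be run from a script, as in this
minimal sketch (the variable names are invented for the example). It only
covers the first wait on each pid, which is the part every shell is
expected to get right.

```shell
# Run the quoted test non-interactively; variable names are illustrative only.
true &
wait $!; first=$?       # exit status of the background "true"

false &
wait $!; second=$?      # exit status of the background "false"

echo "true: $first, false: $second"
```

This should print "true: 0, false: 1". What a second wait on the same pid
returns is exactly where the shells described above differ, so the sketch
does not attempt it.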

But try a different test

true & X=$!

(the assignment to X is just in case there is a shell which implements that
"no need to retain" stuff when $! is not referenced).

Then repeat that line over and over. (Consecutive lines).

In ash derived shells (and pdksh) the first will report job 1 starting
(assuming you had none already running), the 2nd line will report job 2
starting, and before prompting for the 3rd, report job 1 has finished.
The third will be job 1 again, and report job 2 has finished, and that
continues over and over again.

This is all consistent with how we know that they work.

In bosh and yash, the job number just keeps on climbing, even though they
report the previous job finished as each subsequent one is started.  That's
also consistent with how they operate.   A simple "wait N" for one of the
jobs removes that one from the list, then more true& commands add more jobs.
A simple "wait" clears up everything.   In yash "jobs" reports them all 
finished and clears everything, as it should.  In bosh "jobs" reports them all
finished, but clears nothing (the jobs command can be repeated over and over
and keeps reporting all the completed jobs).   That's clearly broken.

zsh does something different, once a job has been reported as finished
at a prompt, it is removed from the jobs table, and you can no longer do
"wait %3" for it, but the pid and status seem to be remembered somewhere
else, and wait  gets the status from the job.   That seems odd to me,
it should be possible to use either form to wait on a job.   (I should note
that there is something odd about my zsh install - I tend to need to type
two newlines after a command to get it executed, both are seen by the shell.
Most of the time that's just mildly annoying, when I forget the 2nd, nothing
happens, and I have to wake up and remember that zsh is waiting for the 2nd
before it will do anything with the command - but in testing like this, where
the newlines generate prompts, and accompanying the prompt is an action
we care about, it kind of ruins the test.)

ksh93 is similar (without the double newline issue).

mksh is almost similar, but in it I saw
internal error: j_async: bad nzombie (161)
twice (once, then more testing, then again), which does not look good.
I don't know what the 161 represents, it was not the same each time, but
is not a pid of any of the jobs started.  A count?

In that one, with this sequence, there are only ever 2 jobs (as in job
numbers) assigned, as each is started, the previous one is reported finished,
and removed from the jobs table.  It is possible to wait %n for the job
number most recently started, but only that one (were the commands to run
for longer, then presumably it would be possible to wait on any not completed
and reported as completed).

bash is different again, it counts up the job numbers, like bosh and
yash, but as it reports each earlier one finished, removes it from the
jobs table, so the "jobs" command only ever shows (and then removes) the
last one started.   It still allows wait N to return the status, as many
times as you want to do that command, but not wait %n for any but the
most recently created one.

  | I consider the dash behaviour a bug, but do not want to 
  | fix it in a way that introduces another bug.

While removing jobs that have been reported (ie: removing them as
soon as possible) might reduce the risk of getting duplicate pids,
it doesn't actually solve the problem.   In particular, the removal
only happens in interactive shells (ones which prompt) so does nothing
at all for scripts, which have the same issue.   It can also happen in
an interactive

Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread Robert Elz via austin-group-l at The Open Group

Date:Fri, 29 Apr 2022 15:39:23 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220429143923.GA22521@localhost>

Sorry, been too busy to participate here much recently, will catch up
someday soon (I hope).

  | However, today it threw a last curve ball when I was working on an
  | update to the description of set -b ...

How many shells actually implement that?

  | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
  | remain known until:
  |
  |  1. The command terminates and the application waits for the process ID.
  |
  |  2. Another asynchronous list is invoked before "$!" (corresponding to
  | the previous asynchronous list) is expanded in the current execution
  | environment.

Does anyone implement that bit (#2) at all?  In a non-interactive shell it
might almost be possible, but in an interactive shell, if the job isn't in
the list (whether $! has been referenced or not - usually it will not have
been) because it has been removed, what is the shell supposed to do if the
job stops?   Further, users (even in scripts) are allowed to use % %- %1
etc to refer to jobs, $! isn't the only way to reference one ("wait %2" should
work).   I'd suggest that #2 should simply be removed.

But do note that the definition of the jobs command says:

When jobs reports the termination status of a job, the shell shall
remove its process ID from the list of those ``known in the current
shell execution environment''; see Section 2.9.3.1 (on page 2338).

(quote from I8 Draft 2.1 -- but that text has been there forever, or seemingly).

So that's another way that an entry is removed, and this one is "shall remove"
whereas "remain known until" puts a minimum on how long the job is supposed
to remain known, but doesn't actually require removal.   For #2 that's obvious,
shells aren't required to make that optimisation (that's some academic view of
what was thought should be possible - but isn't in practice), but for #1 if
the job isn't removed (when wait happens) then it could still be there, again,
and again, forever - even if the system uses the same pid later (days, weeks,
months later perhaps) for another job started by the same shell -- against which
there is no protection of any kind currently, though a shell could do WNOWAIT
waits so zombies remain in the process table, even though the shell has 
already collected the exit status - but that's difficult to actually
code correctly, especially given the definition of how SIGCHLD works, which
as best I can tell has to be used as the only thing that would make it
even conceivable to use WNOWAIT.   Without that, when the shell acts like
I believe most, or all do, and cleans up zombies ASAP, just keeping the
job in its jobs table, marked terminated, with the status ready to give
back when requested, the kernel is free to assign the reclaimed pid to any
new process it likes, whenever it likes.

  | My initial reaction to this was that the above quote from set -b is
  | likely a left-over from before the decision to disallow the historical
  | remove-before-prompting behaviour was made.

I doubt that -b is particularly relevant to this, other than that it provides
an alternate time at which termination status of a process can be shown.

  | However, then I spotted that the text from wait, which seems to be an
  | attempt to justify that decision, first says it was historical
  | behaviour for *interactive* shells but then talks about the problems
  | it could cause for *scripts*.  So it seems to me that the
  | justification does not stand up to scrutiny.

The justification doesn't, but for scripts I don't recall there ever
really being an issue - the removal happens when the status of jobs which
have changed status is reported just before PS1 is written, and
non-interactive shells (scripts) don't do that.

On the other hand, users of interactive shells are not in the habit of
issuing wait commands (even jobs commands, without some reason to do so).
They expect to be told when a background job has finished (without -b both
working, and set, that might require causing new prompts to appear from time
to time) and simply expect that when a job has been reported as done, it is
done, and no longer exists.

  | It also appears that dash still implements remove-before-prompting.

Does anyone not?

  | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
  | add a third list item (for interactive shells only) and deleting the
  | above quoted text from the wait page.

This is necessary, we would be making use of the shell too difficult for
interactive users otherwise.   But there is no particular need for an
"interactive only" here, scripts can (though usually don't) use the jobs
command as well (it is a convenient way to get rid of any jobs from the
table that have finished, without knowing what they are, and without
potentially hanging waiting for something

Re: [Issue 8 drafts 0001564]: clariy on what (character/byte) strings pattern matching notation should work

2022-04-14 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 14 Apr 2022 09:42:37 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220414084237.GA15370@localhost>

  | That is how things are at present. The suggested changes just make it
  | explicit.

Yes, I know, but that's what I am suggesting that we not do in this one case.

  | Do you have an alternative proposal?

Only to the extent of "do nothing".   I am certainly not suggesting that
we attempt to solve the problem.

Except perhaps it might be worth adding something to the Rationale (but
about what, ie: where there, I have no idea) along the lines of:

It is often unclear whether a string is to be interpreted as
characters in some locale, or as an arbitrary byte string.
While it would have been possible to arbitrarily make the various
cases more explicit, or explicitly unspecified, it was considered
better, in this version of  to
make no changes, as it is believed that much additional work is
required to make a standards-worthy specification possible.
This work is beyond the scope of this standard.

The problem I see, is that any specification at all of any of this,
allows implementors to just say "that is what posix requires" and do
nothing at all, where we really need some innovation, by someone who
actually understands the issues and how to deal with them in a rational
way - or at least who can come up with some kind of plan, and without
any possibility of being considered a non-conformant implementation
because of it.

  | The application can document that it requires pathnames to be in the
  | same encoding as the user's locale.

That's not sufficient.   Try encoding a find command to look for pathnames
containing currency symbols.   It should be just a simple find -name '*[ABCD]*'
type operation, with appropriate substitutions for the ABCD chars.

No problem if not all the world's currency symbols are encoded, if we find
one that has been forgotten, it can simply be added.  Currency symbols are
things like the $ sign, British pound, Euro, Yen, Baht, ... (there are a
whole bunch of them).   If there were a [:currency:] class, it would be easy
(and I'd need to come up with a different example).   But there isn't.
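
To make that concrete, here is a sketch of the kind of command meant.  The
file names, the temporary directory, and the choice of three symbols are
invented for illustration, and the octal escapes assume the names are
encoded in UTF-8.  Whether the bracket expression is treated as a set of
characters or as a set of bytes depends on the locale find runs in, which
is exactly the problem:

```shell
# Build the symbols from UTF-8 octal escapes, so the script itself is ASCII.
pound=$(printf '\302\243')      # U+00A3 POUND SIGN
euro=$(printf '\342\202\254')   # U+20AC EURO SIGN

dir=$(mktemp -d)
: > "$dir/cost-\$5.txt"
: > "$dir/cost-${euro}5.txt"
: > "$dir/notes.txt"

# The bracket expression lists the currency symbols being looked for.
matches=$(find "$dir" -type f -name "*[\$${pound}${euro}]*" | wc -l)
echo "files with a currency symbol: $(( matches ))"

rm -r "$dir"
```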

If we cannot do something this simple, and expect it to work reliably,
everywhere, then what we have is useless, and needs to be replaced or
reworked.   That's not a standards' body type task.   But we should be
doing nothing to interfere with the production of a solution.

  | The C locale is specified as containing 256 single-byte characters.
  | Thus in the C locale all pathnames are valid character strings.

Sure, understood.

  | > Even worse perhaps, ???.doc which should match 7 char
  | > names that end in ".doc" (or is that 7 byte names?) (not counting the \0).
  |
  | It would match 7-byte names.

Yes, in the C locale it would.   But do you believe that is what the user
would have intended?   Are they to be required to work out how many bytes
their local filenames are encoded as, and enter the appropriate number of '?'
chars?
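
A small sketch of that mismatch, assuming a UTF-8 encoded name (the name is
made up, and is built with octal escapes so that the script stays ASCII).
In the C locale each '?' matches one byte, so the pattern that works is the
one with five '?' for a three character stem:

```shell
# "a", n-tilde, o-acute, then ".doc": 7 characters, 9 bytes in UTF-8.
name=$(printf 'a\303\261\303\263.doc')
bytes=$(( $(printf '%s' "$name" | wc -c) ))

# Match in a subshell so the caller's locale is left alone.
result=$(
    LC_ALL=C
    case $name in
        (???.doc)   echo 'matched ???.doc' ;;
        (?????.doc) echo 'matched ?????.doc' ;;
        (*)         echo 'no match' ;;
    esac
)
echo "$bytes bytes, in the C locale: $result"
```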

kre

Re: [Issue 8 drafts 0001564]: clariy on what (character/byte) strings pattern matching notation should work

2022-04-12 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 12 Apr 2022 08:51:51 +
From:"Austin Group Bug Tracker via austin-group-l at The Open Group" 
Message-ID:  <1541e949d4c9cd28467acf6033bfd...@austingroupbugs.net>

That is, Geoff Clare:

  | 1. The vast majority of apps will never need to do that because they know
  | (or can assume) that the pathnames they handle either always use the
  | portable filename character set or use the user's locale.

The latter, perhaps, the former, certainly not in an international context.
The point was that, at least as I read the proposed text, you're defining
things like '*' to only work (reliably as specified) when the locale is
POSIX (aka C).   In the user's locale, who knows what happens?

  | I.e. the pathnames are not arbitrary (a word I was careful to
  | include in the proposed changes).

Sure, the problem is that when dealing with user input (as in, for example,
the command line args) the application cannot assume that the pathnames are
not arbitrary.   They're anything that's OK for the user.

  | 2. In apps that truly do need to do matching or expansion on arbitrary
  | pathnames, a C program can call uselocale() before and after calls to
  | fnmatch(), glob(), and wordexp(). A shell script can set LC_ALL=C before
  | handling pathnames (and unset it or restore it afterwards). 

But how does that help *.doc (in a defined way, as opposed to "of course
that works in all glob implementations") match a filename that isn't
entirely ascii (by which I mean, using characters only from the portable
character set)?   Even worse perhaps, ???.doc which should match 7 char
names that end in ".doc" (or is that 7 byte names?) (not counting the \0).

Anyone from outside the English speaking world is likely to encounter many
of those.
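
For reference, the workaround quoted above (set LC_ALL=C, then unset or
restore it) might look like the sketch below in a script.  The variable
names and the sample name are made up, and only the shell's own case
matching is shown; the assignment would have to be exported to affect
utilities such as find.  It lets '*' accept any bytes, but it does nothing
for the question of how many '?' are needed.

```shell
# Remember whether LC_ALL was set at all, and its value if so.
lc_all_was_set=${LC_ALL+yes}
saved_lc_all=${LC_ALL-}

LC_ALL=C
# "resume.doc" with both e's accented, encoded as UTF-8.
name=$(printf 'r\303\251sum\303\251.doc')
case $name in
    (*.doc) matched=yes ;;
    (*)     matched=no ;;
esac

# Put the locale back exactly as it was found.
if [ -n "$lc_all_was_set" ]; then
    LC_ALL=$saved_lc_all
else
    unset LC_ALL
fi
echo "matched: $matched"
```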

kre

Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-07 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 07 Apr 2022 18:15:55 +0700
From:"Robert Elz via austin-group-l at The Open Group" 

Message-ID:  <5473.1649330...@jinx.noi.kre.to>

  |   | e.g. adding:
  |   |
  |   | For example, the context address "\.[.][0-9]." is equivalent
  |   | to "/\.[0-9]/".
  |
  | Looks good to me.

Actually, to make things even clearer, you might want to add to that:

, however with "\.\.[0-9]." it is unspecified which of
"/\.[0-9]/" or "/.[0-9]/" is its equivalent form.

kre

Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-07 Thread Robert Elz via austin-group-l at The Open Group

Date:Thu, 7 Apr 2022 10:37:06 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220407093706.GA7005@localhost>

  | The new definition in bug 1546 is specific to regular expressions
  | (since it talks about the backslash not being in a bracket expression),

Yes, of course.

  | e.g. adding:
  |
  | For example, the context address "\.[.][0-9]." is equivalent
  | to "/\.[0-9]/".

Looks good to me.
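
A quick way to check that equivalence against a given sed, as a sketch
with made-up input lines:

```shell
input=$(printf '%s\n' 'a.1' 'ab1' 'x.y')

# Delimiter is '.', so the literal dot has to go in a bracket expression.
dot_delim=$(printf '%s\n' "$input" | sed -n '\.[.][0-9].p')

# The same RE written with the default '/' delimiter.
slash_delim=$(printf '%s\n' "$input" | sed -n '/\.[0-9]/p')

echo "dot delimiter:   $dot_delim"
echo "slash delimiter: $slash_delim"
```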

kre

Re: [Issue 8 drafts 0001550]: clarifications/ambiguities in the description of context addresses and their delimiters for sed

2022-04-05 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 5 Apr 2022 15:54:40 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220405145440.GB6489@localhost>

  | Okay, I'll see what I can do.  It may make sense to use the new
  | definition of "escape sequence" from bug 1546.

  | It won't be possible in the y command, as that doesn't use an RE (so
  | would need its own definition of "escape character").

I wasn't paying attention to just where any of this was to be placed in
the final doc, but couldn't the definition of "escape sequence" (and those
related to it) be somewhere generic?   It might even be worth (since it
is so common) defining a "backslash escape sequence" in XBD - but for
that allow it to have an application specified following sequence (one or
more following characters, as defined by the application), and then for REs
just define only the case for a single following char.

Or just define "escape sequence" and leave it for the application also to
define what the escape character is (are there many that don't use \ though?)

  | What matters is that the delimiter can only be escaped with an
  | _unescaped_ backslash, and that it doesn't end the RE when it is in a
  | bracket expression. I believe my proposal makes both of those things
  | clear.

I suspect that the point was more related to when 2-pass parsing is used,
and an escaped delimiter is seen, does the second pass still see the escaped
delimiter, or is it now unescaped.   I'm no sed expert (I use it a lot,
but have never really looked into an implementation, and don't push the
wacky boundary cases in my uses) - but I believe this is to be explicitly
unspecified (that is, implementations can do either, and applications must
not depend upon which is done).

  | It really is hardly any limitation on applications if they need to
  | avoid using special RE characters as delimiters in order to be portable

I agree.

Not just portable, but sane.   Only a moron would actually use . ? * [ ( ...
as a delimiter, there are plenty of perfectly good alternatives available
when good old / isn't the best choice (which it often isn't when
manipulating path names).   Personally I'm quite fond of ascii BEL (^G)
as the delimiter in the cases when neither / nor ; (my 2 favourites)
are really available (while BEL probably isn't technically portable,
it always works in my experience).

It still needs to be clear that it is possible to be a moron if one
wants, but in such cases, some things just might not be possible.
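
For example, when editing path names, something like the following sketch
avoids both the escaped slashes and any RE-special delimiter.  The paths
are made up; the BEL variant relies on sed accepting a control character
as the delimiter, which, as noted, may not be strictly portable:

```shell
path=/usr/local/bin

# ';' as the delimiter: no need to escape the slashes in the path.
with_semicolon=$(printf '%s\n' "$path" | sed 's;/usr/local;/opt;')

# ASCII BEL (octal 007) as the delimiter.
bel=$(printf '\007')
with_bel=$(printf '%s\n' "$path" | sed "s${bel}/usr/local${bel}/opt${bel}")

echo "$with_semicolon"
echo "$with_bel"
```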

  | It might be worth altering this somehow, but "literal" is wrong
  | (specifically if the delimiter is '^' or '-', or things like ':' in
  | [[:alpha:]]).

That depends upon the context of the word "literal" there - I just took
it to mean that the character would mean the same thing as it would if it
were not also the delimiter, not that it would be deprived of any other
magic properties it might gain by such use.

  | > => And perhaps something like "should put it inside a bracket
  | > expression __with no other characters__" to make clear, that one
  | > cannot re-use one e.g. 'sX\X[0-9]XfooX' can NOT be written as
  | > 'sX[X0-9]XfooX' but only as 'sX[X][0-9]XfooX'.
  |
  | Incorrect, sX[X0-9]XfooX is required to "work"

I think the point there was that it doesn't mean the same thing, in that
one a single char is being substituted, in the others, it is a 2 char sequence,
the delimiter, followed by a digit, not either the delimiter or a digit.
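
The difference is easy to see with a sketch like this one (the input line
"X1" is made up, and it assumes a sed that, as required, does not end the
RE at a delimiter inside a bracket expression):

```shell
# Escaped delimiter, then a digit: matches the two characters "X1".
escaped=$(printf 'X1\n' | sed 'sX\X[0-9]XfooX')

# Delimiter alone in a bracket expression, then a digit: also matches "X1".
separate=$(printf 'X1\n' | sed 'sX[X][0-9]XfooX')

# One bracket expression: matches a single character, an X or a digit.
merged=$(printf 'X1\n' | sed 'sX[X0-9]XfooX')

echo "escaped=$escaped separate=$separate merged=$merged"
```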

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001546]: BREs: reserve \? \+ and \|

2022-04-05 Thread Robert Elz via austin-group-l at The Open Group

Date:Tue, 5 Apr 2022 09:41:26 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220405084126.GA6489@localhost>

  | > > An escape sequence is defined as the escape character followed
  | > > by any single character.  The escape character is a <backslash>
  | > > that is neither in a bracket expression nor itself escaped.

  | Okay, I'll propose that wording in Thursday's teleconference.

Actually, if this (or something to the same intent) doesn't already exist,
then it might be worth adding a third sentence:

A character is considered "escaped" if it appears as the second
character in an escape sequence.

I was first going to suggest that you switch from "nor itself escaped"
to the way I originally worded it ("nor the 2nd char of an escape seq")
but I realised it would be better to explicitly define "escaped" instead,
so that can be used elsewhere, and be properly defined (not just rely upon
being what is obvious).
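
To illustrate the definitions with grep, which uses BREs by default (the
sample lines are made up): in 'a\.b' the backslash is an escape character
and the '.' is escaped; in 'a\\.b' the second backslash is itself escaped,
so it is not an escape character, and the '.' that follows it is an
ordinary unescaped '.'.

```shell
lines=$(printf '%s\n' 'a.b' 'axb' 'a\b' 'a\xb')

# \. is an escape sequence: the dot is escaped, so it matches only a dot.
escaped_dot=$(printf '%s\n' "$lines" | grep 'a\.b')

# \\ is an escape sequence: the second backslash is escaped (a literal
# backslash), and the following dot is unescaped (any character).
escaped_backslash=$(printf '%s\n' "$lines" | grep 'a\\.b')

printf '%s\n' "$escaped_dot"
printf '%s\n' "$escaped_backslash"
```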

Whether this new sentence goes 2nd (between the existing two) or 3rd
(after them) I don't think matters -- but a slight preference for 2nd,
in which case it could also just be an additional clause on the first
sentence ", that character is escaped." or something like that, perhaps
"which is thereby escaped".

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001546]: BREs: reserve \? \+ and \|

2022-04-04 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 4 Apr 2022 15:24:25 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220404142425.GA23024@localhost>

  | I don't see a need for an xref to XBD 6.1,

That's fine too, I just suggested that as a replacement, just in case...

  | A minimal fix to the current proposed text would be something like this:
[...]
  | or it could be split into two sentences along the lines of your original
  | suggestion:

Either would work.   I (kind of obviously) slightly prefer the 2nd, I
think it is slightly clearer (easier to follow), but the version that's
closer to what is currently there would also work.

kre

Re: [1003.1(2016/18)/Issue7+TC2 0001546]: BREs: reserve \? \+ and \|

2022-04-04 Thread Robert Elz via austin-group-l at The Open Group

Date:Mon, 4 Apr 2022 08:46:56 +
From:Austin Group Bug Tracker 
Message-ID:  

That is, really from Geoff Clare:

  | Personally I don't see that there is a problem with the current wording.

It is almost OK, and if you consider the readers must be able to
interpret the words in a rational, obvious, way, would be.

The problem is that an escape character cannot be escaped, if it is,
it isn't an escape character (so there is a contradiction).

the escape character <backslash> ('\\'),
when neither [...] nor itself escaped,

There are plenty of ways to rewrite this to make the point that it
is an unescaped backslash (rather than an unescaped escape char) which
becomes the escape char, my suggestion was just one possibility.

The same issue applies to being within a bracket expression, an escape
char cannot be there, so it makes no real sense to exclude it - though
it does to say that a backslash that is in there is not an escape char.
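
A sketch of the bracket expression case using grep (made-up input lines):
inside the brackets the backslash is just a backslash, so the '.' after it
is not escaped, and the expression matches either of the two characters.

```shell
lines=$(printf '%s\n' 'a.b' 'axb' 'a\b')

# [\.] is a bracket expression holding two literal characters, '\' and '.'.
count=$(printf '%s\n' "$lines" | grep -c 'a[\.]b')

printf 'lines matched: %s\n' "$count"
```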

kre

ps: I'm also not greatly in favour of writing the backslash character as
a C character constant, rather than just as a character (as in a sh quoted
string for example) as '\'.   Since there will always be people who will object
to either of those, I wouldn't give the character's glyph form at all, but
rather refer to XBD 6.1 where it is presented without the quotes, and so
there's no problem.   So "<backslash> ([xref XBD 6.1])".
