Re: Thread queue position after unlocking PRIO_PROTECT mutex

2022-10-11 Thread shwaresyst via austin-group-l at The Open Group
Re: the last bit: I think that's to account for the case where an implementation 
may asynchronously allow an app thread to modify the base priority while 
another thread is blocked on it, not that the example sched_setparam() has to 
be in the same thread. While just adding it to the tail of the new queue's list 
may be the fastest to accomplish, I think doing an insertion, at the head or at 
a spot where the relative time to reach the head of the queue is nearest to 
that of the old queue position, is more the desired behavior.
 
 
  On Tue, Oct 11, 2022 at 4:41 AM, Geoff Clare via austin-group-l at The Open Group wrote:

I wrote, on 10 Oct 2022:
>
> I'm trying to understand the second sentence in this paragraph on the
> pthread_mutexattr_getprotocol() page:
> 
>    While a thread is holding a mutex which has been initialized
>    with the PTHREAD_PRIO_INHERIT or PTHREAD_PRIO_PROTECT protocol
>    attributes, it shall not be subject to being moved to the tail
>    of the scheduling queue at its priority in the event that its
>    original priority is changed, such as by a call to sched_setparam().
>    Likewise, when a thread unlocks a mutex that has been initialized
>    with the PTHREAD_PRIO_INHERIT or PTHREAD_PRIO_PROTECT protocol
>    attributes, it shall not be subject to being moved to the tail of
>    the scheduling queue at its priority in the event that its original
>    priority is changed.
> 
> The first sentence is no problem. It's pointing out that items 7 and 8a
> in the description of SCHED_FIFO don't apply to this change of the
> thread's normal priority (since it isn't currently executing at that
> priority).
> 
> But when a thread unlocks a PRIO_PROTECT mutex, in the simple case
> where locking the mutex caused its priority to be raised and unlocking
> it causes its priority to revert to its original value, it has to be
> moved from the queue for the higher priority to the queue for its
> original priority, so it doesn't make any sense to me that the text
> above talks about moving within a queue, and why does it say "in the
> event that its original priority is changed"?

After further reading, I'm not sure it is in a queue at all when it
unlocks the mutex. The rationale implies that it is:

    The process at the front of the ready list is executed until it
    exits or becomes blocked, at which point it is removed from the list.

(it says "process" not "thread", but I think that's just because it is
out of date compared to the normative text it is commenting on).
However, the normative text says the queue is "a thread list that is
ordered by the time its threads have been on the list without being
executed".  The use of "without being executed" here implies that the
thread at the head of the list is removed from the list when it starts
execution, not left there until it blocks or exits.

> The only way I can get any sense out of it is to take it as meaning
> that when the thread moves from the queue for the higher priority to
> the queue for its original priority, it should be placed at the head
> not the tail, which seems reasonable, but it's very unclear.

If the thread is not on a queue at all while it is running, then the
point of the second sentence in the paragraph I originally quoted is
presumably to stop item 7 in the SCHED_FIFO description from requiring
that the thread is placed on a queue when it unlocks the mutex.  I.e. it
keeps running, but now at its original priority (unless unlocking the
mutex makes a higher priority thread runnable, in which case it would be
pre-empted by the higher priority thread).

However, the last bit "in the event that its original priority is changed"
still makes no sense.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

  


Re: Can struct sockaddr_un.sun_path be a flexible array member?

2022-07-16 Thread shwaresyst via austin-group-l at The Open Group
Short answer, no. It was erroneously specified as such in the <sys/un.h> header 
because there wasn't an agreed upon symbolic constant for the size and I 
believe this notation was the convention before the C standard adopted flexible 
arrays. While an implementation should declare a symbolic constant, some have 
just used an integer constant instead so it's left unspecified.
 
 
  On Sat, Jul 16, 2022 at 1:13 PM, John Scott via austin-group-l at The Open Group wrote:

Hi list,

I do not represent any implementations, I ask this merely as an
application developer who has asked around.

Can .sun_path be a flexible array member? The standard says it has
unspecified size, but also normatively says
"The sockaddr_storage structure defined in <sys/socket.h> shall be large
enough to accommodate a sockaddr_un structure." This doesn't clear
things up unless we have a notion of whether "size of a structure"
includes its flexible array member, and even if that is true, whether
including a flexible array member on sockaddr_storage (albeit one which
a portable application wouldn't know how to access) would satisfy this.

The example for bind() uses sizeof() on .sun_path, suggesting the answer
to my question is "no," but examples aren't normative.

If the standard could say whether this is permitted more clearly, that
would make me happy.

Thanks for your attention to my inquiry,
John
  


Re: POSIX msgfmt: effect of LC_CTYPE on PO file parsing

2022-05-11 Thread shwaresyst via austin-group-l at The Open Group
My understanding is that this is for files that do not specify a separate 
codeset at all, and for interpreting a file that does specify one up to the 
line with the codeset directive, so it needs to be there (for now, maybe not in 
future). It should maybe be more explicit that a codeset change takes effect 
with the next directive, and does not apply to a rewind/reread of the whole file.
 
 
  On Wed, May 11, 2022 at 7:30 PM, Bruno Haible via austin-group-l at The Open Group wrote:

https://posix.rhansen.org/p/gettext_draft
Line 960

"Do we need to say this isn't used for message strings, only for parsing
 the .po file?"

The .po file format has a mechanism for specifying the codeset of the
PO file. See line 1009. Therefore LC_CTYPE is *not used* for the
interpretation of the input .po file, only for producing diagnostics
(in combination with the LC_MESSAGES category).
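
As a rough illustration of the mechanism mentioned above, the sketch below
pulls the declared codeset out of a PO header line. It assumes the codeset is
declared in a GNU gettext style Content-Type header entry; the variable names
are hypothetical, and the sed expression only approximates what a real parser
does.

```shell
# Assumption: the codeset is declared in a GNU gettext style header entry.
po_header='"Content-Type: text/plain; charset=UTF-8\n"'

# Extract everything after "charset=" up to the next backslash or quote.
charset=$(printf '%s\n' "$po_header" | sed -n 's/.*charset=\([^\\"]*\).*/\1/p')

echo "declared codeset: $charset"
```

Everything in the file before such a line has to be read without knowing the
declared codeset, which is the case the reply above refers to.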



  


Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread shwaresyst via austin-group-l at The Open Group
It appears to me the set -b wording needs updating, to clarify "may remove the 
job's process ID" is intended to exclude the blocking circumstances listed, and 
since it's a "may", not "shall", whether those exclusions are handled properly 
now is more a quality of implementation than conformance issue.
 
 
  On Fri, Apr 29, 2022 at 10:40 AM, Geoff Clare via austin-group-l at The Open Group wrote:

I've been gradually making progress on bug 1254 as a background task.
However, today it threw a last curve ball when I was working on an
update to the description of set -b ...

That description includes this near the end:

    When the shell notifies the user a job has been completed, it may
    remove the job's process ID from the list of those known in the
    current shell execution environment

This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
remain known until:

 1. The command terminates and the application waits for the process ID.

 2. Another asynchronous list is invoked before "$!" (corresponding to
    the previous asynchronous list) is expanded in the current execution
    environment.

Then there is the following in the APPLICATION USAGE for wait:

    Historical implementations of interactive shells have discarded
    the exit status of terminated background processes before each
    shell prompt. Therefore, the status of background processes was
    usually lost unless it terminated while wait was waiting for it.
    This could be a serious problem when a job that was expected to
    run for a long time actually terminated quickly with a syntax or
    initialization error because the exit status returned was usually
    zero if the requested process ID was not found. This volume of
    POSIX.1-202x requires the implementation to keep the status of
    terminated jobs available until the status is requested, so that
    scripts like:
    [...]
    work without losing status on any of the jobs.

My initial reaction to this was that the above quote from set -b is
likely a left-over from before the decision to disallow the historical
remove-before-prompting behaviour was made.

However, then I spotted that the text from wait, which seems to be an
attempt to justify that decision, first says it was historical
behaviour for *interactive* shells but then talks about the problems
it could cause for *scripts*.  So it seems to me that the
justification does not stand up to scrutiny.
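
To make the scenario in the wait text concrete, here is a minimal sketch of a
script whose background job fails quickly and whose status is collected later.
The job and its exit status of 7 are made up for illustration; this is not the
example elided from the quoted text.

```shell
# errexit off: the nonzero status below is expected and inspected by hand.
set +e

# Hypothetical job that terminates almost immediately with status 7.
sh -c 'exit 7' &
pid=$!

# Other work happens here; the job has very likely finished by now.
sleep 1

# The process ID is still known, so wait reports the job's own status.
wait "$pid"
status=$?
echo "job $pid exited with status $status"
```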

It also appears that dash still implements remove-before-prompting.

There would seem to be two options to resolve this:

A. Uphold the decision to disallow remove-before-prompting.  This
would mean removing the conflicting text from set -b and updating the
justification on the wait page to something that holds water.
(And dash would need to change in order to conform.)

B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
add a third list item (for interactive shells only) and deleting the
above quoted text from the wait page.

I'm particularly interested to get the opinions of shell authors on
this.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

  


Re: how do to cmd subst with trailing newlines portable

2022-02-21 Thread shwaresyst via austin-group-l at The Open Group
The compliance factor for locales is more of documentation than exclusionary, 
so Thorsten is correct, the standard allows what he does. Just the fact 
localedef can use any encoding model via suitable charmap data makes this 
somewhat obvious. An implementation may provide locales using any character 
encoding as long as it says which of these have the same requirements of the C 
locale's encoding for the portable character set, and these all map to the same 
wide character encoding. That subset, effectively embodying the POSIX 
'universe' the standard references as "all locales provided", can be expected 
to work with portable code, any others are in the much wider 'universe' of 
unspecified behavior. That subset can be just the POSIX locale too, and the 
implementation is conforming.
 
  On Mon, Feb 21, 2022 at 12:30 AM, Christoph Anton Mitterer via austin-group-l at The Open Group wrote:

On Fri, 2022-02-18 at 00:35 +, Thorsten Glaser wrote:
> You can have nōn-POSIX locales. For example, in mksh, I have a UTF-8
> mode, but I specify that only the "C" locale attempts POSIX
> conformance.

But that sounds like a violation of POSIX, e.g. if you had a locale 'C'
which would encode '.' as 0x2E and another one which encodes the same
as something else - you couldn't just say that only the other locale is
non-POSIX, but then your whole implementation wouldn't be compliant.

Same if any of the other hard rules are broken... like chars from the
portable charset being just one byte long, etc..



> Switching the locale during shell runtime is not allowed to change
> the way the script is parsed, so the variables etc. are all that is
> permitted to “change”, by means of reinterpretation.

Yes I found that now in the standard and I guess it's more or less
clear when a locale change does apply and when not:

- foo=x
  => clear,... all lexical, change doesn't apply

- printf '%s' 'foo'
  => clear, printf is like a command.. so printf would use the changed
    locale, depending on what the format actually is (which is also a
    bit ambiguous, see https://www.austingroupbugs.net/view.php?id=1562)

- ${var%foo}
  => well,.. semi-clear

- expanding $# or $?
  => in principle, POSIX seems to allow locales that have different
    encoding for the chars from the portable charset
    So in principle, one could ask whether $# and $? gives the digits
    from the new locale, when it was changed internally.
    However, POSIX also says, when the portable charset chars are
    encoded differently, and such locales are used, the results are
    unspecified.
    So it doesn't really matter.


> But here you’re lucky again that  has to have the exact same
> encoding across *all* locales supported in one POSIX “universe”, and
> that it must not occur as part of a multibyte encoding in a supported
> locale on the same universe.

Despite that.. and despite the solution that had been discussed here
before, ... having spent quite some thought (and hopefully learned a
bit) about it... I'm still unsure about how to make the command
substitution with trailing newlines really 100% portable in any
situation (i.e. locale) allowed by POSIX and especially with any shell
conforming to POSIX.

... (below)



> > But at least, it should still work portably, when doing the
> > LC_ALL=C
> 
> No, absolutely not.
> 
> In all supporta̲b̲l̲e̲ scenarios (i.e. those in which you’re not
> entering
> unspecified behaviour already anyway), you’ll be safe with:
> 
> x=$(command; echo .); x=${x%.}
> 
> (Or a variant that carries over $?, of course.)

...

I know you've said earlier, that you considered using '.' enough but
Chet Ramey, Geoff Clare and others still said the LC_ALL=C switch would
be necessary.
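
For reference, the sentinel idiom under discussion, in a variant that also
carries over $?, can be sketched as follows. The produce function is a
hypothetical stand-in for the real command, and whether the final strip also
needs the LC_ALL=C switch is exactly the open question here.

```shell
# errexit off: the nonzero status below is expected and inspected by hand.
set +e

# Hypothetical command: output ends in two newlines, exit status is 3.
produce() {
    printf 'line one\n\n'
    return 3
}

# The appended '.' keeps the newlines from being trailing, so they survive;
# the subshell then exits with the command's own status.
out=$(produce; rc=$?; printf .; exit "$rc")
status=$?

# Strip the sentinel again.
out=${out%.}

printf 'status=%s length=%s\n' "$status" "${#out}"
```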

It was brought up before that an implementation would be allowed to not
handle it gracefully, if the string was say: "."... and even if that couldn't form a new character because
of the special properties of '.' ... it could still fail to be
stripped off properly.

Your argument was, that any shell that fails to do that would have a
bug... but it's unclear whether that's really mandated by the standard.


That's why I've asked before:
> I tried to find out in the standard, what POSIX actually says that
> "${tmp%∈}" operates on: bytes or characters.
> 
> And that seems a bit ambiguous (well, to me at least).
> 
> - In some earlier discussion it was pointed out that shell variables
>  should be strings (of bytes, other than NUL)

If variables are byte strings... (which is also disputed, btw.)...

> 
> - 2.6.2 Parameter Expansion
>  doesn't seem to say, what the #, ##, % and %% special forms of
>  expansion work on: bytes or characters
> 
> - 2.13. Pattern Matching Notation says:
>  "The pattern matching notation described in this section is used to
>  specify patterns for matching strings in the shell."
>  => strings... would mean bytes

... and pattern matching notation works on strings (=bytes)...

> 
> - 2.13.1 Patterns Matching a Single 

Re: POSIX gettext() and uselocale()

2022-01-16 Thread shwaresyst via austin-group-l at The Open Group
Historically, gettext domains are process wide, making use in multi-threaded 
apps problematic to begin with. The *_l versions only partially address this. 
The uselocale() interface is included there for the cases where a locale is 
used by both a uselocale() and one or more of the *_l versions, in that a 
second uselocale() call after the retrievals, with a different locale, may 
cause the memory mapping many implementations use for .mo files to be released 
on the next *_l call. Yes, it is not the call itself that causes these 
releases, or shouldn't, but as the root reason, imho, it should stay in the 
list. 
 
  On Sun, Jan 16, 2022 at 4:11 PM, Bruno Haible via austin-group-l at The Open Group wrote:

[First sent on 2021-05-03. Resending because it has not been handled.]

https://posix.rhansen.org/p/gettext_draft
says (line 358):

  "The returned string may be invalidated by a subsequent call to
  bind_textdomain_codeset(), bindtextdomain(), setlocale(),
  textdomain(), or uselocale()."

While in most programs setlocale(), textdomain(), bindtextdomain(),
bind_textdomain_codeset() are being called at the beginning of the
program execution, before any call to gettext(), the situation is
very different for uselocale().

1) uselocale() is meant to have effects ONLY on the thread in which it
  is called.

2) uselocale() is a helper function to implement *_l functions where
  the POSIX standard does not specify them or the system does not have
  them.
  For example, when a program wants to have a function to parse
  a number, recognizing only the ASCII digits and only '.' as decimal
  separator, a reliable way to implement such a function is by calling
  uselocale of the "C" locale, strtod(), and then uselocale() again
  to switch the thread back to the previous locale.

  If POSIX did not have uselocale(), it would need to provide many
  more *_l functions.

If the gettext() result may be invalidated by a uselocale() call (in
any other thread!), this would mean that

  ** Programs can use gettext() or uselocale() but not both. **

and - more or less -

  ** Multithreaded programs that use libraries (that may use uselocale())
    cannot use gettext(). **

I think that specifying gettext() to be so restricted is not useful.
It would make more sense to allow concurrent uselocale() calls.

Proposed wording:

  "The returned string may be invalidated by a subsequent call to
  bind_textdomain_codeset(), bindtextdomain(), setlocale(),
  or textdomain()."



  


Re: Future of locale, will there be POSIX.utf-8, what will it bring?

2022-01-07 Thread shwaresyst via austin-group-l at The Open Group
C11 tried to add the minimal support for UTF-8, with the u8 string constant 
prefix, but in a broken manner. C2x provides what can be considered a fix for 
this, but does it in a mostly unusable way from the aspect of supporting 
multiple locale languages. That is why you don't see anything about a Unicode 
enabled locale; the fix enables the C locale to stay unchanged.
Because POSIX is adding the <uchar.h> header, 16 and 32 bit encodings will be 
supported, separate from wchar_t, via the char16_t and char32_t types. How 
UCS-2 and UCS-4, as encodings, map to the wide character set used by a platform 
is left as a quality of implementation issue for the interfaces in that header, 
so how wchar_t is encoded is considered a non-issue.
What's there is adequate to say minimal support for the 3 primary encoding 
forms has been added, imo. While more aspects of Unicode could be considered in 
scope of the C standard, I think a lot has been left out so implementations, or 
standards like POSIX, aren't locked into having to provide things that their 
end users will rarely, if ever, need. 
 
  On Fri, Jan 7, 2022 at 1:46 PM, Steffen Nurpmeso wrote:

Hello.

shwaresyst wrote in
 <1494661216.220561.1641574109...@mail.yahoo.com>:
[i resort a bit]
 |  On Thu, Jan 6, 2022 at 3:40 PM, Steffen Nurpmeso via austin-group-l \
 |  at The Open Group wrote:  Hello!
 |
 |I wonder about POSIX.utf-?8, i tried to remember any statement
 |i had read, and Mantis did not show up results.
 |
 |In particular i am interested in whether LC_CTYPE results will
 |bring true Unicode support or not, the reason i am asking is that
 |the upcoming version of my work-box GNU LibC-based (2.34) Linux
 |distribution will provide it like
 |
 |  localedef -i POSIX -f UTF-8 $PKG/usr/lib/locale/C.UTF-8 2> /dev/null \
 ||| true
 |
 |and then this thing is detected as an UTF-8 locale, but causes
 |three test failures of the MUA i maintain because character set
 |conversion behaves differently.
 |
 |My personal opinion was that POSIX.utf8 will bring the complete
 |range of Unicode characters to at least LC_CTYPE, i wonder about
 |LC_COLLATE, as language matching is, hm, very language specific.
 |The rest not (maybe LC_MESSAGES going for UTF-8 though).
 |
 |Is that approximately correct?

 |The first Issue 8 draft is focusing, afaik, on adding the C1x changes \
 |and Mantis Issue 8 tagged items. The changes to XBD 6, 7, etc., that \
 |will formally add a POSIX UTF8 locale are to be part of the second, \
 |maybe third, draft. This is why you don't see them yet.
 |For maximum compatibility with existing practice the required base \
 |repertoire for this will likely be some subset of UCS-2, plus ISO-6429 \

16-bit characters i do not see in POSIX, going that route would
make impossible implementations which use specific bit patterns in
wchar_t, which, if i recall correctly from 2014 or when i was
looking into the issue, is used by at least the Citrus
implementation of the mb* and w* series for at least some asian
languages.  And more .. but that was not the issue i am concerned
about at the moment anyhow, i personally would assume 8-bit aka
UTF-8 character strings to be predominant in Unix based systems,
they surely are in the predominant ones.  (Even though, i have to
say, UTF-16 aka 16-bit characters do have their value for the
majority of the massively declining number of human languages, and
the older i get the more i think using that as a base is a good
decision.)

 |in full, not the complete range. I've hopes this will be significantly \
 |more than the minimal repertoire of C2x, but it may not as a matter \

That made me look for and download a 2020 draft of ISO C2X, i did
not have a look until now.

 |of deferral to the C standard. It should be left up to implementations \
 |still, in my opinion, how much of the range beyond this base they want \
 |to support as extensions, including UTF16 as an encoding. How the LC_* \
 |categories will be extended to fully support that base repertoire accord\
 |ing to the Unicode requirements hasn't been determined yet either, \
 |but this is the nominal goal. 

And from a glance i do not see anything Unicode-enabled-locale
wise.  UTF-16 specifically i do not see ... as you will have to
convert on input and on output in order to use it in your program,
and then you can very well convert to the transparent wchar_t, or
use the wide I/O series which gives it to you.  Minimizing the
tremendous deficiency that many traditional Unix programs have to
face because the historic string interfaces do not provide proper
functionality to deal with human languages is out of scope is it?

At least it seems as if ISO C2X introduces support for UTF-8 as
a native string representation ... in practice it seems Unix
people use GNU libunicode (which explicitly supports UTF-(32|16|8)
i think) as well as ICU (which i think used UTF-16 internally but
offered improved UTF-8 interface performance by then), so the ISO
standard people 

Re: Future of locale, will there be POSIX.utf-8, what will it bring?

2022-01-07 Thread shwaresyst via austin-group-l at The Open Group
The first Issue 8 draft is focusing, afaik, on adding the C1x changes and 
Mantis Issue 8 tagged items. The changes to XBD 6, 7, etc., that will formally 
add a POSIX UTF8 locale are to be part of the second, maybe third, draft. This 
is why you don't see them yet.
For maximum compatibility with existing practice the required base repertoire 
for this will likely be some subset of UCS-2, plus ISO-6429 in full, not the 
complete range. I've hopes this will be significantly more than the minimal 
repertoire of C2x, but it may not as a matter of deferral to the C standard. It 
should be left up to implementations still, in my opinion, how much of the 
range beyond this base they want to support as extensions, including UTF16 as 
an encoding. How the LC_* categories will be extended to fully support that 
base repertoire according to the Unicode requirements hasn't been determined 
yet either, but this is the nominal goal. 
 
  On Thu, Jan 6, 2022 at 3:40 PM, Steffen Nurpmeso via austin-group-l at The Open Group wrote:

Hello!

I wonder about POSIX.utf-?8, i tried to remember any statement
i had read, and Mantis did not show up results.

In particular i am interested in whether LC_CTYPE results will
bring true Unicode support or not, the reason i am asking is that
the upcoming version of my work-box GNU LibC-based (2.34) Linux
distribution will provide it like

  localedef -i POSIX -f UTF-8 $PKG/usr/lib/locale/C.UTF-8 2> /dev/null || true

and then this thing is detected as an UTF-8 locale, but causes
three test failures of the MUA i maintain because character set
conversion behaves differently.

My personal opinion was that POSIX.utf8 will bring the complete
range of Unicode characters to at least LC_CTYPE, i wonder about
LC_COLLATE, as language matching is, hm, very language specific.
The rest not (maybe LC_MESSAGES going for UTF-8 though).

Is that approximately correct?

Thanks and Ciao! from Germany,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

  


Re: cut -DF

2021-12-04 Thread shwaresyst via austin-group-l at The Open Group
Yes, there's a path; file an Enhancement Request in Mantis. However, if toybox 
wants to be more POSIX conforming it'll have to add an awk implementation 
anyways, eventually, so not sure such a request would get much traction for 
sponsorship. Those with awk already might not want to add it to their version 
of cut, as unnecessary duplication of functionality. 
 
  On Sat, Dec 4, 2021 at 9:37 AM, Rob Landley via austin-group-l at The Open Group wrote:

Since toybox doesn't have its own awk yet (and thus awk '{print $3 $4 $5}'),
back in 2017 toybox added the -D, -F, and -O options to cut:

    -D  Don't sort/collate selections or match -fF lines without delimiter
    -F  Select fields separated by DELIM regex
    -O  Output delimiter (default one space for -F, input delim for -f)

-O is -d for output, -F is a regex version of -f, and -D says to show the raw
matches in the order requested (and ONLY those matches, not passing through
lines with no matches).

This lets you do:

  $ echo one two three four five six seven eight nine | cut -DF 7,1-3,2
  seven one two three two
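
For comparison, the same selection can be written with awk. This is only a
sketch of the equivalent for this one input, not a general replacement for the
-D and -F options.

```shell
# Print fields 7, 1 to 3, and 2 again, in the order requested.
out=$(echo one two three four five six seven eight nine |
    awk '{ print $7, $1, $2, $3, $2 }')
echo "$out"
```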

Elliott Hughes (the Android base OS maintainer) asked if I could get the feature
more widely adopted:

  http://lists.landley.net/pipermail/toybox-landley.net/2021-June/012453.html

> your non-POSIX cut(1) extension covers 80% of the in-the-wild use of awk
> anyway :-) if you still talk to any of the busybox folks, we should suggest
> they copy that --- it would be nice for it to be a de facto standard so we
> can get it into POSIX sometime around the 2040s... (and have made lives
> better for the folks who don't care about standards and just want to "get
> things done" in the intervening decades!)

So I offered to implement it in busybox:

  http://lists.busybox.net/pipermail/busybox/2021-June/06.html

And the busybox maintainer merged it here:

  https://git.busybox.net/busybox/commit/?id=0068ce2fa0e3

Is there a path to try to get this option set into posix?

Rob

  


Re: Interpretation starting for a 30 day review (1440)

2021-10-29 Thread shwaresyst via austin-group-l at The Open Group
This is felt to be required to get POSIX accurately describing what the C 
standard version of system() requires, taking into account where sh differs 
from the minimal requirements of the command shell in that standard. POSIX is 
as it is because it was assumed that no programmer would use an option switch 
character as the first character of a utility name, as recommended, and so the 
"--" was superfluous. The vast majority don't, but the C standard requires this 
because it permits any character, besides NUL, as the first character of a 
command name. So, the standard is more precise with it than without it.
Because the use of "--" is in an "shall behave as if" clause it is expository, 
not a coding requirement. Some libraries use posix_spawn() to implement 
system(), for example. Some may only add "--" if a check of the string 
determines it is necessary, as also allowed by the standard. 
 
  On Fri, Oct 29, 2021 at 7:50 AM, Robert Elz via austin-group-l at The Open Group wrote:

    Date:        Fri, 29 Oct 2021 09:51:09 +0100
    From:        "Andrew Josey via austin-group-l at The Open Group"
    Message-ID:  <5bf8909a-6cc2-4089-87c1-5fac762fa...@opengroup.org>

  | The following interpretation is starting a 30 day review 
  |
  | 0001440: System Interfaces Calling `system("-some-tool")` fails (although it is a valid `sh` command)
  |
  | Comments are due back no later than November 29 2021.

I object to this one.

In the recent added note (5510) the following appears at the start of
the Rationale for this change:

    There is nothing known that applications can usefully do if the "--"
    is omitted,

That's true, in fact, it is almost possible to prove it (and maybe it
even is).

But
    therefore there is no reason that the standard should not require
    the "--".

that does not follow.  What one could conclude is that no applications
will be broken by adding the "--", but that does not mean the standard
should specify it.

If the standard specifies that the "--" appears, then usages like the
one in the Subject ( system("-some-tool") ) would be expected to
work, and we know that with current implementations, they do not.

What should be done here, is to advise implementations that

    sh -c -- cmd

is exactly "as if"

    sh -c cmd

and so that adding the "--" does no harm, and is acceptable (the standard
does not require the "--" be omitted, not even by its old wording).  And
that doing this makes things work better, so it is a good idea for
implementations to do that.

It might even be, in fact probably is, worthy of a "Future directions"
stating that the "--" might be required by a future revision of the standard.

But that needs to wait until implementations actually do it.

The problem is that with current implementations, if the cmd is going to
start with '-' or '+' it will be misinterpreted, so to work if a cmd like
that is possible, the application must protect that character, usually by
including some white space before it (though in particular situations there
are other possibilities).
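
The difference can be sketched with sh directly. The tool name below is
hypothetical and is assumed not to exist, so exit status 127 (command not
found) is taken as the sign that the string was looked up as a command name
rather than parsed as options.

```shell
# errexit off: the nonzero statuses below are expected and inspected by hand.
set +e

# Hypothetical command name beginning with '-'; assumed not to be installed.
cmd='-some-tool'

# With "--" the string is taken as the command to run.
sh -c -- "$cmd" 2>/dev/null
with_dashes=$?

# The workaround for current implementations: protect the '-' with white space.
sh -c " $cmd" 2>/dev/null
with_space=$?

echo "with --: $with_dashes, with leading space: $with_space"
```

Running sh -c "$cmd" with neither the "--" nor the leading space is the case
that is misinterpreted; what it reports depends on the shell.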

So, please do not approve this interpretation, return it to the group with
instructions that the group not attempt to act as a legislature, deciding
what it feels is good for the world, but as a standards body, correctly
documenting how this can be expected to work, and what a new implementation
needs to do to be compatible with what exists now.

kre

ps: as it happens, I am (or should be if I was not wasting time replying
to this) testing a change to NetBSD that adds the "--" in both system()
and popen() ... but that we will (I expect) have an implementation that
would conform with the proposed text does not mean that it is the correct
thing for POSIX to specify.

  


Re: What string representations of "zero" expr should consider as "zero"?

2021-07-02 Thread shwaresyst via austin-group-l at The Open Group
To the extent XBD 11.1, #6 applies and 2's complement notation is the internal 
representation required, the standard is pretty clear. The first 3 cases all 
evaluate to numeric 0, whether specified in paired quotes or not since the 
shell does quote removal, the +0 case is always a string since + is disallowed 
as a sign character. For the -0 case, since 2's complement does not have a 
representation for it, the practice is it is treated as equivalent to 0. XBD 
11.1 permits leading zeroes, including on a 0 value, for the 00 case, since the 
interpretation is always as decimal. For $'\0' this is effectively a zero 
length string, not a number, even if 2 NUL chars get stored as the 
argument. Similar to +0 is $'\r0': <carriage-return> is not a permitted sign 
char, so that's a string. 
Now if more implementations than not are treating a single argument that might 
be a number as an implied "= 0" test, despite it being pretty clear the 
argument chars in this case have to be considered a string, then perhaps the 
Exit Status needs to reflect that as the predominant practice.
 
  On Fri, Jul 2, 2021 at 4:31 AM, Geoff Clare via austin-group-l at The Open Group wrote:

Stephane Chazelas wrote, on 01 Jul 2021:
>
> BTW, for "expr", what is "zero" meant to be?
> 
> I see some variation in behaviour for "00", " 0", "-0", "+0",
> $'\r0', which some (but not all) also treat as zero.

> Also 0,000 or 0,000,000 in locales where "," is a thousand
> separator with ast-open expr (also the builtin expr of ksh93 if
> built as part of ast-open).

I would say the standard is unclear.  To me the most reasonable
interpretation of "The expression evaluates to null or zero" is
that it evaluates to either a null string or a zero-valued integer.
However, that would require "expr 0" to exit with status 0 (because
the 0 argument is treated as a string in this case), which does not
match existing practice.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

  


Re: Minutes of the 14th June 2021 Teleconference

2021-06-15 Thread shwaresyst via austin-group-l at The Open Group
That looks like a typo, 723 for 713. Correct link is: 
https://austingroupbugs.net/view.php?id=713


 
  On Tue, Jun 15, 2021 at 4:45 PM, Fred J. Tydeman via austin-group-l at The
Open Group wrote:
On Tue, 15 Jun 2021 18:13:35 +0100 Andrew Josey via austin-group-l at The Open Group wrote:
>
>The floating
>point sub-committee will discuss bug 723
>(https://austingroupbugs.net/view.php?id=723 remquo) and advise us
>on what to do.

That link takes me to 723: time is not allowed to write error messages to 
stderr


---
Fred J. Tydeman        Tydeman Consulting
tyde...@tybor.com      Testing, numerics, programming
+1 (702) 608-6093      Vice-chair of PL22.11 (ANSI "C")
Sample C99+FPCE tests: http://www.tybor.com
Savers sleep well, investors eat well, spenders work forever.

  


Re: behavior of printf '\x61'

2021-04-15 Thread shwaresyst via austin-group-l at The Open Group
It is covered in Item 7 of those 11 exceptions, 'x' falling under the blanket
"every character not specified is unspecified". Portable code is expected to
use the work-alike octal escape, not hex codes.
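
As a short illustration of the octal work-alike (a sketch, assuming a POSIX printf utility): 0x61 is octal 141, and the escape is spelled differently in the format operand and in a %b argument.

```shell
# In the format operand, the escape is \ddd (one to three octal digits).
printf '\141\n'

# In a %b argument, the escape is \0ddd (a zero, then up to three octal digits).
printf '%b\n' '\0141'
```

Both commands write the letter a followed by a newline.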
 
  On Fri, Apr 16, 2021 at 12:05 AM, Philip Guenther via austin-group-l at The
Open Group wrote:
The general question is what requirements the standard puts on the printf
utility when the format argument contains a \x or other unspecified backslash
escape, but the example in the subject is a nice concrete example: what's
required for or about the output of
        printf '\x61'
?

1003.1-2016 describes the handling of the format argument like this:
-
The format operand shall be used as the format string described in XBD Chapter 5
(on page 121) with the following exceptions:
-

...followed by a list of 11 exceptions that do not cover \x.  So, let's look at 
XBD Chapter 5:
-
The format is a character string that contains three types of objects
defined below:
   1. Characters that are not "escape sequences" or "conversion
      specifications", as described below, shall be copied to the output.

   2. Escape Sequences represent non-graphic characters and the
      escape character (<backslash>).

   3. Conversion Specifications specify the output format of each
      argument; see below.
-

Okay, so if it's not an escape sequence or conversion specification, it _shall_ 
be copied to the output.  To jump forward to conversion specifications:
-
Each conversion specification is introduced by the <percent-sign> character
('%').
-

Okay, so \x61 isn't a conversion specification.  Is it an escape sequence?
Well, there's just a table for those, which lists the following: \\ \a \b \f \n
\r \t and \v.  There's no "other sequences starting with <backslash> are
unspecified" statement that I can find.

It therefore appears to me that
        printf '\x61'

is required by the standard to output
        \x61

without a following newline.  Unfortunately, the systems I've tested (CentOS 6 
and 7, MacOS, FreeBSD 12, and OpenBSD 6.9) all output an ascii 'a' without a 
following newline.

Did I miss a statement about <backslash> somewhere that renders this behavior
unspecified?


If a wording tweak is deemed to be in order, please note that it should be 
placed or duplicated such that it also applies to the argument interpreted by 
the %b format conversion, because the same "apparently specified but no one 
behaves that way" is true of this:
        printf %b '\x61'


Philip Guenther
  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread shwaresyst via austin-group-l at The Open Group
Then those are conformance bugs in those kernels, to me, in that files of this
type are not the load images, usable with dl*(), that exec() is to handle. The
allowance is for magics differentiating formats of that nature, as I see the
intent, not one bypassing what the shell is supposed to determine and in the 
process making illegal what the shell description asserts is required to be 
possible. The way to get shebang processing is as I outlined by adding to set, 
not trying to take advantage of the current language of exec() being too 
permissive.

 
 
  On Mon, Apr 12, 2021 at 9:04 AM, Joerg Schilling via austin-group-l at The
Open Group wrote:
"shwaresyst via austin-group-l at The Open Group" wrote:

> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword or 
> command continues. This precludes "#!" being recognized as any of those. 
> There is NO allowance for '!' being the second character as reserved for 
> implementation extensions.

#!/bad of course is a normal comment from the view of a normal shell.
An exception is my old "bsh" (not bosh) on a historic UNIX without support for
#! in the kernel.

On all recent platforms, #! is just another *magic number* that is handled by 
the kernel only.

POSIX of course does not limit what magics are recognised by the kernel.

Jörg

-- 
EMail:jo...@schily.net                  Jörg Schilling D-13353 Berlin
                    Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/

  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
We are talking about the shell, not some bastardization of execve(), that sees 
it's not a directly loadable process image so treats it as a script. For those 
shells implementing shebang as an extension it is still them piping the body of 
the script after the shebang line, without any token expansion, to an alternate 
interpreter via an exec() of some sort. Second, conforming applications can not 
rely on unspecified behaviors, so having a use beyond that specified makes the 
shell nonconforming. Calling it out like that simply acknowledges a lot of 
shell implementations choose to make themselves nonconforming, I do not see it 
as an endorsement or allowance. The requirement explicitly specified behavior 
shall be implemented as specified takes priority. Some conforming script 
authors may simply want the first line to be a # IMPORTANT USAGE NOTE
headline, or similar, not want a utility named "!!!" to be exec'd.

What the standard does allow as an extension, and I would support adding to the 
standard, is adding an option to turn off token expansion in here-doc bodies, 
and back on, via set. This allows the effect of shebang to be accomplished 
anywhere in a script, at the expense of a few extra characters for the here 
delimiter and set commands, without any other changes to tokenizing or the 
grammar. 
 
  On Sun, Apr 11, 2021 at 12:15 PM, Harald van Dijk wrote:
On 11/04/2021 17:09, shwaresyst via austin-group-l at The Open Group wrote:
> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword 
> or command continues. This precludes "#!" being recognized as any of 
> those. There is NO allowance for '!' being the second character as 
> reserved for implementation extensions.

This is wrong on two counts. The first is that you're assuming that this 
will be interpreted by a shell. If execve() succeeds (and the #! line 
does not name a shell), it will not be interpreted by a shell at all, 
and the shell syntax for comments is irrelevant. The second is about 
what happens when it does get interpreted by a shell: POSIX allows 
shells to treat files starting with "#!" specially: "If the first line 
of a file of shell commands starts with the characters "#!", the results 
are unspecified."

Cheers,
Harald van Dijk
  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
No, it's not nonsense. The definition of comment has all characters, including 
'!', shall be ignored until newline or end-of-file being conforming. Then 
tokenization which might discover an operator, keyword or command continues. 
This precludes "#!" being recognized as any of those. There is NO allowance for 
'!' being the second character as reserved for implementation extensions.

 
 
  On Sun, Apr 11, 2021 at 11:37 AM, Robert Elz wrote:
    Date:        Sun, 11 Apr 2021 10:46:48 +0000 (UTC)
    From:        shwaresyst 
    Message-ID:  <1413127944.766378.1618138008...@mail.yahoo.com>

  | That's bugs in those shells for POSIX mode then, that I see.

That's nonsense.

  | The conforming behavior is /usr/gcc is found and succeeds at doing nothing,

Nonsense.

That would be a conforming behaviour, it is not "the" conforming behaviour.

POSIX does not define what format a file must be to succeed in being
exec'd by one of the exec*() commands.  The system can have a thousand
different types that work, if it wants, and #! executables are one of
those.  That they're not required to work by POSIX doesn't mean they're
not allowed to work.

For the rest of your message, the reply I just made to Harald's message
applies.

kre

  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
That's bugs in those shells for POSIX mode then, that I see. The conforming 
behavior is /usr/gcc is found and succeeds at doing nothing, since it contains 
just a comment line. Other elements of path never get checked. Even in 
non-POSIX mode, trying to process it as a shebang with "/bad" as a ENOEXEC 
because not present, or other reason, does not imply the rest of the path 
should be searched, it should simply return a failure code.
 
 
  On Sun, Apr 11, 2021 at 6:07 AM, Harald van Dijk via austin-group-l at The
Open Group wrote:
On 10/04/2021 17:08, Robert Elz via austin-group-l at The Open Group wrote:
>      Date:        Sat, 10 Apr 2021 11:54:34 +0200
>      From:        "Jan Hafer via austin-group-l at The Open Group" 
>
>      Message-ID:  <15c15a5b-2808-3c14-7218-885e704cc...@rwth-aachen.de>
> 
>    | my inquiry is a question about the potential unexpected behavior of the
>    | shell execution environment on names. It is related to shortcomings of
>    | the command utility.
> 
> I'm not sure I understand.  I read the rest of the message, and I
> couldn't find anything really about any shortcomings, other than perhaps
> some mistakes in interpretation, and usage.

If they are mistakes, they are widespread mistakes. As hinted in the 
links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing 
as files with execute permission, but /bin/gcc as a text file containing 
#!/bad so that any attempt to execute it will fail, there are a lot of 
shells where command -v gcc returns /bin/gcc, but running gcc actually 
executes /usr/bin/gcc instead without reporting any error: this 
behaviour is common to bosh, dash and variants (including mine), ksh, 
and zsh.
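
The scenario can be reproduced with a sketch like the following. Everything here is hypothetical scaffolding: the command name demo_tool and the directory names stand in for gcc, /bin and /usr/bin, and mktemp -d is assumed to be available. What the second step prints, and its exit status, depends on the shell.

```shell
# Build two PATH directories; the first holds a script whose interpreter is missing.
demo_dir=$(mktemp -d) || exit 1
mkdir "$demo_dir/first" "$demo_dir/second"
printf '#!/bad\n' > "$demo_dir/first/demo_tool"
printf '#!/bin/sh\necho "second/demo_tool ran"\n' > "$demo_dir/second/demo_tool"
chmod +x "$demo_dir/first/demo_tool" "$demo_dir/second/demo_tool"

# What the shell says it would run...
found=$(PATH="$demo_dir/first:$demo_dir/second:$PATH"; command -v demo_tool)
echo "command -v reports: $found"

# ...versus what actually happens when the name is invoked (varies by shell).
(PATH="$demo_dir/first:$demo_dir/second:$PATH"; demo_tool)
echo "exit status: $?"

rm -rf "$demo_dir"
```

A shell that falls through to the second directory prints the "ran" line with status 0; one that stops at the first match reports an error instead.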

Cheers,
Harald van Dijk

  


Re: SIGSTKSZ is now a run-time variable

2021-03-09 Thread shwaresyst via austin-group-l at The Open Group

Yes, it's not something an application would expect to need to keep increasing, 
just that's the part of  I'd move it to. The definition could also be 
the max required by a processor family, with sysconf() reporting a possible 
lower value for a particular processor stepping. At least that way the 
application that doesn't use sysconf() won't be getting SIGSEGV faults.

Additionally, I believe the definition can be calculated at compile time as a 
multiple of ( sizeof(ucontext_t)+sizeof(overhead_struct(s)) ), whatever other 
overhead applies, so I don't see any real need to use sysconf(). This may mean 
having to munge a  by configure, based on config.guess, but that's 
not the standard's headache.


The CS, SC, and PC constants are not in the XSH 2.2.2 table deliberately, from 
Issue 6 TC1, as adding any also requires a bump in POSIX_VERSION or 
POSIX2_VERSION, and often XSI_VERSION. This is so each usage of a constant 
doesn't need individual #ifdefs to test option group availability. The previous
text allowed an implementation that didn't support an option group to skip
including the related constants in <unistd.h>. A simple check of 
VERSION at the top of a source C file suffices now to indicate those constants 
shall be available.
On Tuesday, March 9, 2021 Eric Blake  wrote:
On 3/9/21 10:14 AM, shwaresyst wrote:
> 
> To me that looks like a conformance violation and should be reverted. There 
> is no _SC_SIGSTKSZ defined in <unistd.h> by the standard, to begin with, so 
> that use of sysconf() is a non-portable extension on its own.

Portable apps can't use _SC_SIGSTKSZ, but the standard generally permits
implementations to define further constants.  Then again, re-reading XSH
2.2.2:

" Implementations may add symbols to the headers shown in the following
table, provided the identifiers for those symbols either:

    Begin with the corresponding reserved prefixes in the table, or
..."

but the table lacks a row for <unistd.h> with _CS_* and _SC_* constants.
Looks like you found an independent defect.

> 
> I could see the definition of SIGSTKSZ being changed to the static minimum a 
> particular processor requires, or is initially allocated as a 'safe' amount, 
> rather than static "default size", and moving SIGSTKSZ to . This 
> would contrast to MINSIGSTKSZ as the lowest value for a platform for all 
> supported processors. Then an application could use sysconf() to query for 
> the maximum size the configuration supports if it wants to use more than 
> that, as a runtime increasable limit.

As I understand it, the concern in glibc is less about runtime
increasability, so much as ABI compatibility with applications compiled
against older headers at a time when the kernel had less state
information to store during a context switch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.          +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: SIGSTKSZ is now a run-time variable

2021-03-09 Thread shwaresyst via austin-group-l at The Open Group

To me that looks like a conformance violation and should be reverted. There is 
no _SC_SIGSTKSZ defined in <unistd.h> by the standard, to begin with, so that 
use of sysconf() is a non-portable extension on its own.

I could see the definition of SIGSTKSZ being changed to the static minimum a 
particular processor requires, or is initially allocated as a 'safe' amount, 
rather than static "default size", and moving SIGSTKSZ to . This 
would contrast to MINSIGSTKSZ as the lowest value for a platform for all 
supported processors. Then an application could use sysconf() to query for the 
maximum size the configuration supports if it wants to use more than that, as a 
runtime increasable limit.
On Tuesday, March 9, 2021 Eric Blake via austin-group-l at The Open Group wrote:
[adding glibc and Austin group lists]

On 3/6/21 12:50 PM, Bruno Haible wrote:
> Hi,
> 
> Carol Bouchard wrote in 
> :
>> A change that was introduced is the
>> #define SIGSTKSZ is no longer a statically defined variable.  Its value can
>> only be determined at run time.
>>
>> # define SIGSTKSZ sysconf (_SC_SIGSTKSZ)
> 
> This is invalid. POSIX:2018 [1] defines two lists of macros:
> 
>  1) "The <signal.h> header shall define the following macros which shall
>      expand to integer constant expressions that need not be usable in
>      #if preprocessing directives:"
> 
>  2) "The <signal.h> header shall also define the following symbolic
>      constants:"
> 
> SIGSTKSZ is in the second list. This implies that it must expand to a constant
> and that it must be usable in #if preprocessing directives.

The question becomes whether glibc is in violation of POSIX for having
made the change, or whether POSIX needs to be amended to allow SIGSTKSZ
to be non-preprocessor-safe and/or non-constant.

> 
> Besides being invalid, it is also not needed. The alternate signal stack
> needs to be dimensioned according to the CPU and ABI that is in use. For
> example, SPARC processors tend to use much more stack space than x86 per
> function invocation. Similarly, 64-bit execution on a bi-arch CPU tends to
> use more stack space than 32-bit execution, because return addresses and
> other pointers are 64-bit vs. 32-bit large. But once you have fixed the CPU
> and the ABI, there is no ambiguity any more.
> 
>> This affects m4 code since the code assumes a statically defined variable
>> which can be determined at preprocessor time.
> 
> POSIX guarantees this assumption.
> 
>> Please advise how I can get past this.
> 
> Fix your .

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=6c57d320484988e87e446e2e60ce42816bf51d53
shows where glibc made the change, and I've now seen reports of several
projects failing to build when using glibc with this change included.

> 
> Bruno
> 
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.          +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [1003.1(2016/18)/Issue7+TC2 0001454]: Conflict between "case" description and grammar

2021-02-19 Thread shwaresyst via austin-group-l at The Open Group

At that point in the grammar TOKEN is "esac)" or "(esac)", from which the WORD 
"esac" is extracted, not converted to Esac, as right paren is not an operator 
character that terminates token recognition. Rule 4 applies to "esac ;" or 
"esac" linebreak, no right paren discovered on lookahead, that I see. Same with 
the '|' char, it does not terminate the TOKEN. It could be more explicit that 
the pattern production is subcontext delimited by the ')', I suppose.
On Friday, February 19, 2021 Chet Ramey via austin-group-l at The Open Group wrote:
On 2/19/21 11:21 AM, Geoff Clare via austin-group-l at The Open Group wrote:

>> There is no way to apply rule 4 to produce "a token identifier acceptable at
>> that point in the grammar". The only token identifier acceptable at that
>> point in the grammar is WORD, and rule 4 does not produce WORD. Rule 4
>> reads:
>>
>>    When the TOKEN is exactly the reserved word esac, the token identifier
>>    for esac shall result. Otherwise, the token WORD shall be returned.
>>
>> Here, the TOKEN is exactly the reserved word esac, and you agree that this
>> rule is applied. This therefore produces the token identifier for esac.
>> There is nothing else that turns it into WORD, which is needed to parse it
>> as a pattern.
> 
> I see your point.  The wording of rule 4 itself does not yield WORD in
> this case; it's only when read in combination with the introductory text
> from 2.10.1 that it becomes apparent that this is the intention.

So "acceptable at that point in the grammar" is indeed carrying a heavy
load here. You might want to add the qualifying language you suggested.


> Incidentally, bash 3 on macOS gets the '|' case wrong, e.g.:
> 
> case esac in foo|esac) echo match;; esac
> 
> whereas bash 5 accepts that.  So it would appear that Chet fixed the
> preceded-by-'|' case at some point but not the preceded-by-'(' case.

It's just another special case in the grammar that lexical analysis
has to handle.
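
For script authors, a sketch of a spelling that sidesteps the special case entirely: a word with any quoted character is never recognized as a reserved word, so a quoted "esac" can only be a pattern. The helper name match_esac is hypothetical. The unquoted form is run in a child sh so that a parse failure, as reported for some shells above, cannot abort the script.

```shell
# Quoting the word keeps it from being recognized as the reserved word esac.
match_esac() {
    case $1 in
        foo|"esac") echo match ;;
        *) echo "no match" ;;
    esac
}
match_esac esac

# The unquoted form under discussion, isolated in a child shell.
sh -c 'case esac in foo|esac) echo match;; esac' 2>/dev/null ||
    echo "unquoted form rejected by this sh"
```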

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
        ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



RE: clarification needed: shell 'exec' + function (builtin, …)

2020-12-09 Thread shwaresyst via austin-group-l at The Open Group

I agree more clarification is desirable. The reason I see as why the function 
isn't executed is it may be treating it as an invoke of "sh -c ls", because ls 
is a function, but this new sh does not inherit that definition so it looks on 
path instead and finds the utility.
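
Until this is clarified, a script that wants the function's behaviour can avoid exec altogether. A minimal sketch, assuming any POSIX sh: the first command is the disputed form (its output depends on the shell, so it is trimmed with head), the second calls the function and then exits with its status.

```shell
# Disputed form: shells disagree on whether the function or the ls utility runs.
sh -c 'ls() { echo meow; }; exec ls' | head -n 1

# Unambiguous alternative: run the function, then exit with its status.
sh -c 'ls() { echo meow; }; ls; exit $?'
```

The second command prints meow in every shell, since a plain command name always finds the function before searching PATH.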
On Wednesday, December 9, 2020 Thorsten Glaser via austin-group-l at The Open Group wrote:
Hi *,

I’ve got a report in IRC by a user who spotted a cross-shell difference.

In my opinion, the invocation…

    sh -c 'ls() { echo meow; }; exec ls'

… is supposed to output "meow\n" and return to the caller with a zero
errorlevel.

Some shells execve() the ls(1) binary instead.
In particular, this was ksh88 behaviour, according to the comments
found in the pdksh-originating mksh source code.

My reading of this is:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#exec

⇒ exec is specified with 'command'
⇒ it will replace the shell with 'command' and never return to the shell

(note this does NOT mandate an actual execve(2) syscall or something)

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09

  A command is one of the following:
    * Simple command (see Simple Commands)
    * Pipeline (see Pipelines)
    * List compound-list (see Lists)
    * Compound command (see Compound Commands)
    * Function definition (see Function Definition Command)

In the subsequent section 2.9.1 Simple Commands, Command Search and Execution,
step 1.c. finds the function.

Therefore, I believe that exec shall invoke the function, then terminate
the shell with the function’s $? as exit status.

(For builtins, 1.a. and 1.d. and 1.e.i.a. will find them.)

Thanks in advance,
//mirabilos
-- 
(gnutls can also be used, but if you are compiling lynx for your own use,
there is no reason to consider using that package)
    -- Thomas E. Dickey on the Lynx mailing list, about OpenSSL



Re: [1003.1(2016/18)/Issue7+TC2 0001346]: Require support for CLOCK_MONOTONIC

2020-12-03 Thread shwaresyst via austin-group-l at The Open Group

It's my understanding the practice predates Issue 6 (I just used that as 
example) and stems from a desire to not break code similar to:
#include <unistd.h>
#if defined(POSIX_OPT) && POSIX_OPT == _POSIX_VERSION
... Add code that presumes option availability ...
#endif

or at runtime:
#ifdef POSIX_OPT
if (sysconf(_SC_POSIX_OPT) == _POSIX_VERSION) {
... Use code that takes advantage of option ...
}
else
#endif
{ ... Use code that doesn't or checks for earlier definition in platform
defined manner... }

as the standard leaves fairly unspecified how a vendor is to support multiple 
versions of the standard with one runtime and set of headers.
On Thursday, December 3, 2020 Robert Elz  wrote:
    Date:        Thu, 3 Dec 2020 18:11:51 +0000 (UTC)
    From:        shwaresyst 
    Message-ID:  <684426419.4103424.1607019111...@mail.yahoo.com>

  | The 20yymmL shall be replaced with the value specific to Issue 8 when that
  | is finalized, not that an implementation may choose an arbitrary value
  | after 2000. It's a placeholder to indicate this for the bug report only.

Yes, that's what I assumed, and said in my message:

austin-group-l@opengroup.org (that was me...) said:
  | (I read the latter as meaning that it will become the actual date of the
  | standard, not yet known).

Back to quote from shwares...@aol.com:

  | The other 200809L values all get a blanket change eventually too,

If that is the standard procedure, then sorry, but that's insane.

  | consistent with the changes from Issue 6 to Issue 7.

If the reason that NetBSD has 200112L and the standard (Issue 7) now
requires 200809L, is solely that (ie: there were no other changes to
the CLOCK_MONOTONIC specification between whichever version 200112L
identifies, and Issue 7) then that's a defect in the standard, and
should be fixed.

Making arbitrary changes that render all implementations non-conforming
and break applications that relied upon the earlier specification is
totally bizarre behaviour.

kre



Re: [1003.1(2016/18)/Issue7+TC2 0001346]: Require support for CLOCK_MONOTONIC

2020-12-03 Thread shwaresyst via austin-group-l at The Open Group

The 20yymmL shall be replaced with the value specific to Issue 8 when that is 
finalized, not that an implementation may choose an arbitrary value after 2000. 
It's a placeholder to indicate this for the bug report only. The other 200809L 
values all get a blanket change eventually too, consistent with the changes 
from Issue 6 to Issue 7.
On Thursday, December 3, 2020 Robert Elz via austin-group-l at The Open Group wrote:
    Date:        Thu, 3 Dec 2020 17:21:47 +
    From:        "Austin Group Bug Tracker via austin-group-l at The Open 
Group" 
    Message-ID:  

  | A NOTE has been added to this issue

The issue is now closed, so I cannot append a new note [Aside:
adding proposed text, and immediately closing the bug report is not
a good way to operate - even if the issue is regarded as finalised,
there can be wording issues that are worthy of discussion].

So...


  | On page 436 lines 14851 - 14854,
  | change
  |   _POSIX_MONOTONIC_CLOCK
  |     The implementation supports the Monotonic Clock option. If this
  |     symbol is defined in <unistd.h>, it shall be defined to be -1, 0,
  |     or 200809L. The value of this symbol reported by sysconf( ) shall
  |     either be -1 or 200809L.
  | to
  |   _POSIX_MONOTONIC_CLOCK
  |     The implementation supports a monotonic clock. This symbol shall
  |     always be set to the value 20yymmL.
  | and remove the [MON] shading.

Why the change from 200809L to 20yymmL ?  (I read the latter as meaning
that it will become the actual date of the standard, not yet known).

As best I can see, for implementations that already support the (previously
optional) CLOCK_MONOTONIC nothing changes - except that they will apparently
be required to alter the definition of _POSIX_MONOTONIC_CLOCK.  Why?

What's more, applications which believed the previous text, and actually
test for 200809L will no longer find it, even though nothing else changed.

To me that makes no sense.

In NetBSD, we have:
    #define      _POSIX_MONOTONIC_CLOCK          200112L
which seems to indicate that we support some older version of the
standard - but I haven't looked to see whether there are actual
changes of substance between that version and the 200809L version.

In general, the values of these "This is supported" constants should
only ever change if there is a feature difference between one version
and the next (then the different values can be used to determine what
support is to be expected - though that's a very crude mechanism).
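
The difference between testing for one exact value and testing for "supported at all" can be sketched from the shell. This assumes getconf accepts the name _POSIX_MONOTONIC_CLOCK on the system at hand; option_state is a hypothetical helper, and the classification it applies is only one possible policy.

```shell
# Hypothetical helper: classify the value reported for an option symbol.
# Any positive version value counts as supported, rather than one exact date.
option_state() {
    case $1 in
        ''|undefined|-1) echo unsupported ;;
        *[!0-9]*)        echo unrecognized ;;
        *)               echo "supported (version $1)" ;;
    esac
}

# getconf writes the sysconf() value in decimal, or "undefined".
option_state "$(getconf _POSIX_MONOTONIC_CLOCK 2>/dev/null)"
```

A check written this way treats 200112 and 200809 alike, so it would keep working if the required value changes again.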

kre




Re: make(1) parallelization, but especially .WAITing

2020-11-03 Thread shwaresyst via austin-group-l at The Open Group

I agree that's the probable intent, but like other undefined things, what isn't 
precluded is a spot where a conformance distinction can't be drawn. There how 
the identifier ends isn't specified, it's left implied implementors will only 
use  after the prefix that is specified.
On Tuesday, November 3, 2020 Paul Smith via austin-group-l at The Open Group wrote:
On Mon, 2020-11-02 at 15:44 +, shwaresyst via austin-group-l at The
Open Group wrote:
> With that phrasing  is also reserved, since it
> is not " followed ONLY by uppercase". Using ".NO_parallel"
> would be similarly conforming, it could be argued.

I don't agree.  By saying "names consisting of" the standard requires
that the entire name must consist of those characters, not just the
first part of the name.

> (The last sentence before the "Macros" heading says "Targets with
> names consisting of a leading <period> followed by one or more
> uppercase letters are reserved for implementation extensions.")




Re: make(1) parallelization, but especially .WAITing

2020-11-02 Thread shwaresyst via austin-group-l at The Open Group

With that phrasing  is also reserved, since it is not 
" followed ONLY by uppercase". Using ".NO_parallel" would be similarly 
conforming, it could be argued.
On Monday, November 2, 2020 Geoff Clare via austin-group-l at The Open Group wrote:
Joerg Schilling wrote, on 31 Oct 2020:
>
> Well this is true. As long as POSIX does not mention parallel builds at all, 
> it makes no sense for .WAIT to appear in a POSIX standard - except as a 
> reserved special target.

It's already in the reserved namespace, so no need to reserve it
explicitly.  (The last sentence before the "Macros" heading says
"Targets with names consisting of a leading <period> followed by one
or more uppercase letters are reserved for implementation extensions.")

> Now it would be nice to have support for .NO_PARALLEL:  and for 
[...]

That name isn't reserved, because it has an underscore.

However, SunPro make seems to have several special targets with an
underscore, so it's possible underscore was left out of the reserved
name space by mistake.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: printf (the utility) expected range of integer values

2020-10-24 Thread shwaresyst via austin-group-l at The Open Group

> Could an implementor represent integers as an internal
> form with 0 bits (in which the only value that doesn't overflow is 0)
> and hence always print 0 for any %d (%u/%x/%d) conversion, with an error
> message about overflow for any value with any bits set?

No, the standard requires the internal representation to be two's complement 
for conforming applications; other internal format use is considered 
unspecified behavior. While a utility may support other formats, it is implicit 
by default they support two's complement also for interaction with those 
applications. This ties into the ranges that can be expected to be output are 
between the *_MAX and *_MIN values from the  used to compile the 
utility, and supposedly the implementation as a whole. If something to this 
effect really needs to be added it would go in XBD 2 as an implementation 
conformance requirement, I'd think. 

The last value to be output on error, nominally, is the one before a multiply 
by 10 or add of next digit causes the overflow, is how I'd construe it. For a 
short %d, I'd expect "32769" to output "3276", as the most digits capable of 
fitting in a 16 bit 2's comp. internal format as an actual value.
On Saturday, October 24, 2020 Robert Elz  wrote:
    Date:        Sat, 24 Oct 2020 16:47:41 +0000 (UTC)
    From:        shwaresyst 
    Message-ID:  <160402159.2963847.1603558061...@mail.yahoo.com>

  | The text relevant to all this I see is the paragraph at line 104150, page
  | 3114, c181.pdf,

That is the text I quoted in the previous message (I got it from 202x d1.1
but that's irrelevant, the page & line numbers have changed, but the words
are the same).  For reference, here it is again:

    If an argument operand cannot be completely converted into an internal
    value appropriate to the corresponding conversion specification, a
    diagnostic message shall be written to standard error and the utility
    shall not exit with a zero exit status, but shall continue processing
    any remaining operands and shall write the value accumulated at the
    time the error was detected to standard output.

  | which limits outputs to the internal representation range of
  | the format characters used, converted back to text.

Yes.  But what does that actually mean to someone who wants to use
printf (the utility) and wants to be sure it will be able to print the
numbers needed?  Could an implementor represent integers as an internal
form with 0 bits (in which the only value that doesn't overflow is 0)
and hence always print 0 for any %d (%u/%x/%d) conversion, with an error
message about overflow for any value with any bits set?

If not, what text in the standard prohibits that?    We know it can't happen
for printf(3) (XSH.3.fprintf) as the minimum size of a C int (in POSIX)
is 32 bits.  But where is the required range of printf(1) (XCU.3.printf)
integers stated?  Surely not nowhere?

  | This should probably be explicit that the conversion shall detect
  | overflows,

It is, particularly when combined with what is in the APPLICATION USAGE
section.  In c181 see page 3115, the paragraph that starts at line 104190:

    If an argument cannot be parsed correctly for the corresponding
    conversion specification, the printf utility is required to report
    an error. Thus, overflow and extraneous characters at the end
    of an argument being used for a numeric conversion shall be reported
    as errors.

This part isn't a problem, or an issue, this is quite clear (and, aside
from ksh93, which is obviously broken) is what everything I tested does.

Now back to the questions from the original message, neither of which did
you even attempt to answer.

Where, if anywhere, is it stated what range of integers is required to be
supported by printf the utility?  Or in other words, is there a smallest
value which is permitted to generate an overflow (for present purposes just
consider positive numbers, we can all easily extrapolate to negative when
appropriate.)  Further, and related, is there any value which is required
to be treated as overflow (perhaps related to something in <limits.h> rather
than an absolute constant in the printf page)?  And if so, where is that
stated?

For this, remember that printf the utility has no length modifiers for the
numeric conversions (at least the integer ones, the floats aren't required
at all, so obviously nothing is there to distinguish float from double, etc).
That is, there is only one "kind" of integer that it is able to print, a
simple %d (or %u %x %o), there is no %ld %jd %zd %lld ...

And second, when an overflow does occur, and an error message is printed to
stderr (and the eventual exit status from printf when it completes is set to
something greater than 0) then, as required, printf is still required to
print a value for the conversion that overflowed.  What value should be
printed - the maximum that could be handled, which is the common result
(presumably 

RE: printf (the utility) expected range of integer values

2020-10-24 Thread shwaresyst via austin-group-l at The Open Group

The text relevant to all this I see is the paragraph at line 104150, page 3114, 
c181.pdf, which limits outputs to the internal representation range of the 
format characters used, converted back to text. This should probably be 
explicit that the conversion shall detect overflows, positive or negative, when 
converting input text, and to treat this as an error. While the C standard 
permits silent overflows in converting C source this makes the utility 
non-portable.
On Saturday, October 24, 2020 Robert Elz via austin-group-l at The Open Group 
 wrote:
Is there somewhere, anywhere, where it is possible to infer what
range of values printf (the utility, not the C library function)
is expected to handle?

I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also
not in XBD 12, which would be another plausible place).  There doesn't
seem to be anything about integers at all in XBD 3.

XBD 14.limits.h gives the minimum allowed value for the maximum value
of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can
find nothing that says explicitly that that applies to printf the utility.

Is there some expected minimum integer size for printf (the utility)
that is actually specified somewhere?

Further, since printf (the utility) is really just converting text
strings from one format to another, there's really no reason that there
needs to be any limit at all - there's no particular reason that integers
thousands of digits long couldn't be handled.  The standard does say that
if overflow occurs, an error message, and non-zero exit status, must
occur, but it doesn't ever say that overflow must occur.

Second question - if overflow does occur (at whatever point) what is the
value that must be printed (in addition to the error message) from a
numeric conversion.

Given a printf that uses 64 bit integers (which seems to be a very common
choice) then what should be printed from

    printf '%d\n' 0xc000

?

(This is the example that made me think about all of this - we (NetBSD)
have been offered a patch to make the error message go away, and the
result be:
    -70368744177664
That is, treating the value as a bit pattern for the 64 bits, which then
has the sign bit set, and so prints as a negative value.

We will not be doing that.

But what should we print?  (In addition to the error).

Every shell I tested (with 2 exceptions) does:

printf '%d\n' 0xc000
-bash: printf: warning: 0xc000: Result too large or too small
9223372036854775807

That one, obviously, is from bash.  Note that the "every shell" for this
is not all that meaningful, many don't have printf built in, and so are
simply running the NetBSD filesystem printf utility .. so it isn't then
surprising that they all do the exact same thing as that does!  But it
is obvious that at least the NetBSD sh, bash, bosh, zsh, and ksh93 have
a builtin printf (the error messages differ...)

But that value might not be what the standard calls for (even though it
is what almost everyone does), what the standard says is:

    If an argument operand cannot be completely converted into an internal
    value appropriate to the corresponding conversion specification, a
    diagnostic message shall be written to standard error and the utility
    shall not exit with a zero exit status, but shall continue processing
    any remaining operands and shall write the value accumulated at the
    time the error was detected to standard output.

The question is, what is "the value accumulated at the time the error was
detected".

What zsh does is:

    zsh $ printf '%d\n' 0xc000
    zsh: number truncated after 15 digits: c000
    1152917106560335872

which makes some sense to me, I had been thinking this might be the
correct value, before I started testing to see what was produced.
That is, after the first 15 hex digits are consumed, that is the value
(0xc00 in decimal) and then when an attempt is made to
add one more zero, we detect the overflow, and so the value that had
been accumulated when the overflow was detected was 1152917106560335872
(when printed via %d).

The value "everybody" else prints, 9223372036854775807, is simply 2^63-1
(the max possible value) which most likely was never actually encountered
during the conversion, but is just what strtoll() returns as its value.
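
To illustrate the strtoll() behaviour described above, here is a small C
sketch. The helper name convert_with_strtoll and the struct conv_result are
hypothetical, and the test input is an arbitrary out-of-range decimal string,
not the (elided) hex value from the example. It assumes a 64-bit long long.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result type for the sketch. */
struct conv_result {
    long long value;   /* what strtoll() returned */
    int range_error;   /* 1 if strtoll() reported ERANGE, else 0 */
};

/* Convert text with strtoll() (base 0 accepts decimal, octal and hex)
 * and record whether the value was out of range. */
struct conv_result convert_with_strtoll(const char *text)
{
    struct conv_result r;

    assert(text != NULL);
    errno = 0;                          /* strtoll() only sets errno on error */
    r.value = strtoll(text, NULL, 0);
    r.range_error = (errno == ERANGE);
    return r;
}
```

For an input such as "99999999999999999999", strtoll() returns LLONG_MAX and
sets errno to ERANGE, so a printf built this way prints the saturated maximum
rather than any value that was reached while accumulating digits.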

kre

ps: the other shell which didn't produce 9223372036854775807 was ksh93,
which actually does
    ksh93 $ printf '%d\n' 0xc000
    -70368744177664
Sad that.  Good thing that we don't use ksh as the basis of the standard!




RE: Overflow conditions for read() and fread() (was: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function)

2020-10-07 Thread shwaresyst via austin-group-l at The Open Group

The C standard leaves it undefined for fread() because it doesn't require 
EOVERFLOW in <errno.h>, that I see, or presumes size_t will always be a short 
or int type. Since POSIX does have it and does not presume a limited width I 
feel this is a place where a CX extension is warranted as a portability 
consideration.
On Wednesday, October 7, 2020 Geoff Clare via austin-group-l at The Open Group 
 wrote:
> -- 
>  (0005036) shware_systems (reporter) - 2020-10-07 14:28
>  https://austingroupbugs.net/view.php?id=697#c5036 
> -- 
> That is an error in read(), and fread() as well; that these should have
> that case also as a may fail type.

The above was in reply to my note about posix_getdents() EOVERFLOW
that said:

    This set me thinking about why that part of the EOVERFLOW error is
    there at all. There is no equivalent EOVERFLOW for read(), nor
    should there be.

I continue to believe that for read() there should not be an EOVERFLOW
error.  There is absolutely no reason for read() to fail when it could
instead successfully return SSIZE_MAX bytes.  Perhaps we should add a
statement:

    If nbyte is greater than SSIZE_MAX, read() shall
    behave as if nbyte had the value SSIZE_MAX.

For fread(), the return type is size_t not ssize_t, so it doesn't
have quite the same problem. The question is what should happen if
the mathematical product of the size and nitems arguments is greater
than SIZE_MAX.  POSIX defers to the C standard on this and there is no
reason for us to state anything specific about it.  (The C standard
is silent on the matter, so the behaviour is implicitly undefined.)
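
A hedged sketch, in C, of what an application can do today on its own side of
the interface. Both helper names are hypothetical. The first emulates the
suggested read() wording by clamping the request before the call; the second
detects the fread() case where size times nitems cannot be represented in
size_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>     /* SSIZE_MAX */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>     /* SIZE_MAX */

/* Hypothetical helper: limit a read() request so the byte count that
 * read() returns can always be represented in ssize_t. */
size_t clamp_read_request(size_t nbyte)
{
    return nbyte > (size_t)SSIZE_MAX ? (size_t)SSIZE_MAX : nbyte;
}

/* Hypothetical helper: true when the mathematical product size * nitems
 * is greater than SIZE_MAX, checked without performing the multiply. */
bool fread_product_overflows(size_t size, size_t nitems)
{
    return size != 0 && nitems > SIZE_MAX / size;
}
```

A caller would pass clamp_read_request(n) as the nbyte argument of read(), and
would refuse or split an fread() request for which fread_product_overflows()
returns true.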

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



RE: [1003.1(2016/18)/Issue7+TC2 0001406]: clarification of SEEK_END when current pointer doesn't match buffer size

2020-09-28 Thread shwaresyst via austin-group-l at The Open Group

As I read it, file size and *seek(SEEK_END, 0) will still be 16, reflecting how 
many bytes were written to the buffer and which had to be malloc'd. The rewind 
overwrites the first bytes and a flush, close reflects the size of data 
considered to be valid after the rewind, since there is no guarantee such a 
write maintained alignment with whatever data was written to expand it to the 
16 bytes. Maybe it was 2 doubles, for example, and the rewrite trashes the 
first half of the original value. It is the application's responsibility to do 
a SEEK_END after such a rewrite if it knows it is simply modifying a same size 
and type value, so flush and close include the rest of the data area.
On Monday, September 28, 2020 Austin Group Bug Tracker via austin-group-l at 
The Open Group  wrote:

The following issue has been SUBMITTED. 
== 
https://www.austingroupbugs.net/view.php?id=1406 
== 
Reported By:                djdelorie
Assigned To:                
== 
Project:                    1003.1(2016/18)/Issue7+TC2
Issue ID:                  1406
Category:                  Base Definitions and Headers
Type:                      Clarification Requested
Severity:                  Editorial
Priority:                  normal
Status:                    New
Name:                      DJ Delorie 
Organization:              Red Hat Inc 
User Reference:              
Section:                    open_memstream 
Page Number:              
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html 
Line Number:                n/a 
Interp Status:              --- 
Final Accepted Text:        
== 
Date Submitted:            2020-09-28 21:26 UTC
Last Modified:              2020-09-28 21:26 UTC
== 
Summary:                    clarification of SEEK_END when current pointer
doesn't match buffer size
Description: 
Consider a stream created by open_memstream(), where 16 bytes are written,
fseek(0,SEEK_POS) to rewind, then write 4 bytes, and fflush().  At this
point, the value pointed to by the sizep argument to open_memstream()
should be 4 (please confirm).
At this point in the state of the stream, what are the semantics of
SEEK_END?  What will be the "file size" if you fclose() at this point?
The example explicitly SEEK_SETs to the buffer size before fclose(),
eliding the issue.
Desired Action: 
Please clarify if SEEK_END is relative to the current position or the
current buffer length, and if it's changed by a call to fflush() at that
time.
Please clarify if a SEEK_SET to set the current pointer less than the
current buffer size, itself (without read/write), changes the SEEK_END
semantics, or the value stored in *sizep after fflush().
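
The scenario in the description can be reproduced with a short C probe. This is
only a sketch: the function name is hypothetical, and it reports whatever the
local implementation stores through sizep after the fflush(). The current
open_memstream() text says that value is the smaller of the buffer length and
the number of bytes between the start of the buffer and the file position.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical probe: write 16 bytes, rewind with SEEK_SET, write 4 bytes,
 * fflush(), and return the size the implementation reports at that point.
 * Returns -1 if any step fails. */
long memstream_size_after_rewrite(void)
{
    char *buf = NULL;
    size_t size = 0;
    long result = -1;
    FILE *fp = open_memstream(&buf, &size);

    if (fp == NULL)
        return -1;
    if (fwrite("0123456789abcdef", 1, 16, fp) == 16 &&
        fseek(fp, 0L, SEEK_SET) == 0 &&
        fwrite("WXYZ", 1, 4, fp) == 4 &&
        fflush(fp) == 0)
        result = (long)size;     /* size is only valid after fflush() */

    fclose(fp);
    free(buf);                   /* the caller owns the buffer after fclose() */
    return result;
}
```

On implementations that follow that wording the probe is expected to report 4,
which is the value the submitter asks to have confirmed.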

== 

Issue History 
Date Modified    Username      Field                    Change              
== 
2020-09-28 21:26 djdelorie      New Issue                                    
2020-09-28 21:26 djdelorie      Name                      => DJ Delorie      
2020-09-28 21:26 djdelorie      Organization              => Red Hat Inc    
2020-09-28 21:26 djdelorie      Section                  => open_memstream  
2020-09-28 21:26 djdelorie      Page Number              =>
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html
2020-09-28 21:26 djdelorie      Line Number              => n/a            
==




Re: Proposal to update reference to POSIX in the ISO C++ standard

2020-09-28 Thread shwaresyst via austin-group-l at The Open Group

It's my understanding ISO/IEC was to bump their distribution also, to keep in 
synch. Nick S. would be more conversant with the details of that, though.
On Monday, September 28, 2020 Jonathan Wakely  wrote:
On 28/09/20 14:36 +, shwaresyst wrote:
>
>The 2018 edition is the latest ISO/IEC/IEEE version, in that it was balloted 
>and approved to keep the IEEE "current standard" clock from timing out. The 
>2008 edition plus TCs is now the prior version, in the formal sense.

Is that not in the ISO store?

I don't see an update to https://www.iso.org/standard/50516.html
except for the corrigenda.



Re: Proposal to update reference to POSIX in the ISO C++ standard

2020-09-28 Thread shwaresyst via austin-group-l at The Open Group

The 2018 edition is the latest ISO/IEC/IEEE version, in that it was balloted 
and approved to keep the IEEE "current standard" clock from timing out. The 
2008 edition plus TCs is now the prior version, in the formal sense.
On Thursday, September 24, 2020 Jonathan Wakely via austin-group-l at The Open 
Group  wrote:
On 24/09/20 08:23 -0700, Nick Stoughton wrote:
>ISO/IEC 9945:2009 including Corrigenda 1 (2013) and Corrigenda 2
>(2017) is the current latest approved ISO standard. The Austin Group
>is in the process of revising this, with a publication date in 2022
>expected. You state "Since the TCs are just lists of changes, not a
>complete document, ..." which is technically true for ballot purposes,
>but The Open Group and IEEE publish a fully amended version, and this
>is what most people see when they try to obtain a copy of the latest.

Yes, I use the 2018 version from the Open Group for my own purposes,
but as C++ is an ISO/IEC standard I believe we're supposed to refer to
the ISO/IEC/IEEE version of POSIX, which means 9945:2008 rather than
the fully amended documents available elsewhere.

But we could add the two TCs to the references as well. The C++14
standard referred to C that way:

— ISO/IEC 9899:1999/Cor.1:2001(E), Programming languages — C, Technical 
Corrigendum 1
— ISO/IEC 9899:1999/Cor.2:2004(E), Programming languages — C, Technical 
Corrigendum 2
— ISO/IEC 9899:1999/Cor.3:2007(E), Programming languages — C, Technical 
Corrigendum 3

So I'll propose changing the current reference to:

ISO/IEC/IEEE 9945:2009, Information Technology — Portable Operating System 
Interface (POSIX)
ISO/IEC/IEEE 9945:2009/Cor 1:2013, Information Technology — Portable Operating 
System Interface (POSIX), Technical Corrigendum 1
ISO/IEC/IEEE 9945:2009/Cor 2:2017, Information Technology — Portable Operating 
System Interface (POSIX), Technical Corrigendum 2

Thanks!


>-- 
>Nick
>
>On Thu, Sep 24, 2020 at 7:42 AM Jonathan Wakely via austin-group-l at
>The Open Group  wrote:
>>
>> On 24/09/20 15:28 +0100, Jonathan Wakely via austin-group-l at The Open 
>> Group wrote:
>> >Hello,
>> >
>> >I am writing a proposal for the ISO C++ standard committee (WG21) to
>> >update the reference to the POSIX standard in the C++ International
>> >Standard. My colleague Eric Blake suggested I ask on this list whether
>> >anybody here has any comments on the proposal.
>> >
>> >The draft is at https://kayari.org/tmp/posix.html
>> >
>> >The abstract is:
>> >
>> >  The C++ standard has a normative reference to ISO/IEC 9945:2003 (aka
>> >  POSIX.1-2001 aka The Single UNIX Specification, version 3). However,
>> >  the C++ standard library refers to POSIX functions and macros which
>> >  are not defined in that document, as they weren't added until
>> >  ISO/IEC/IEEE 9945:2009 (aka POSIX.1-2008 aka SUSv4). The C++
>> >  standard should update its reference.
>>
>> Ugh, sorry for the borked indentation.
>>
>> >If you see any errors or incorrect claims from an Austin Group
>> >perspective, I'd be very grateful for your feedback.
>> >
>> >Thanks in advance to anybody who makes time to read through it,
>> >Jonathan
>> >
>>
>



Re: behaviour of pthread_attr_[sg]etguardsize with thread maintained stack

2020-09-22 Thread shwaresyst via austin-group-l at The Open Group

It will not be used by the implementation in managing the thread, and a 
guardsize value might not even be stored in the thread_t data if setstack() has 
been called as there is no pthread_getguardsize() interface; it is just stored 
in the attribute then for possible, not required, use by the application.
On Tuesday, September 22, 2020 Robert Elz  wrote:
    Date:        Tue, 22 Sep 2020 14:38:07 + (UTC)
    From:        shwaresyst 
    Message-ID:  <32911555.5186984.1600785487...@mail.yahoo.com>

  | Yes, it is no longer a factor,

I would have guessed that is what "not used" means, but:

  | and no, it will return what last setting was, be it from init()
  | or a setguardsize()

How is that "not used" ?

  | A set only affects that one attr object, not all of them,

Not the issue.

kre




Re: behaviour of pthread_attr_[sg]etguardsize with thread maintained stack

2020-09-22 Thread shwaresyst via austin-group-l at The Open Group

Does that include calculating the amount of available stack space,
and or the return value of a later getguardsize() ?

Yes, it is no longer a factor, but may be a value the application code uses to 
simulate what the implementation does with memory it manages; and no, it will 
return what last setting was, be it from init() or a setguardsize() call. A set 
only affects that one attr object, not all of them, or any thread the attr was 
used to initialize. The standard expects all relevant attr values to be copied 
into the thread_t or sigev structure being initialized, not store only a 
pointer to the attr object.
On Tuesday, September 22, 2020 Robert Elz  wrote:
    Date:        Tue, 22 Sep 2020 11:05:05 + (UTC)
    From:        "shwaresyst via austin-group-l at The Open Group" 

    Message-ID:  <1248402378.5117076.1600772705...@mail.yahoo.com>


  | Once pthread_attr_init() successfully completes the guardsize should be
  | set to the default value and may be examined by pthread_attr_getguardsize(),
  | that I see.

Fine.  Not the issue.

  | A call to setguardsize() should store the value and be returned
  | by subsequent getguardsize() calls,

Fine, still not the issue.  Again, except as a workaround to what might
be a NetBSD bug (or might just be unspecified behaviour), nothing is calling
setguardsize();

  | even though it is not used after pthread_attr_setstack() is called.

That is closer to the issue.  What does "not used" mean here?

Does that include calculating the amount of available stack space,
and or the return value of a later getguardsize() ?

  | Once setstack() is called the standard provides only
  | pthread_attr_destroy() followed by an init() as the portable means of
  | reenabling the use of the default guardsize.

Not the issue.

  | It is left unspecified, not even directly mentioned, that an implementation
  | may provide a special stackaddr value for use with setstack() that says
  | next time allocate an arbitrary stack area that does take the current
  | guardsize, and stacksize if that was set, into account

I don't much like "not even directly mentioned" - though that scenario
is perhaps so far outside what might be expected of an implementation that
it is a reasonable thing to have omitted.

But that's not the issue.

  | It is left implied by a getstack() before any setstack() being unspecified
  | behavior as to result;

Also not the issue, but again "left implied" is not nice.

kre



RE: behaviour of pthread_attr_[sg]etguardsize with thread maintained stack

2020-09-22 Thread shwaresyst via austin-group-l at The Open Group

Once pthread_attr_init() successfully completes the guardsize should be set to 
the default value and may be examined by pthread_attr_getguardsize(), that I 
see. A call to setguardsize() should store the value and be returned by 
subsequent getguardsize() calls, even though it is not used after 
pthread_attr_setstack() is called. Once setstack() is called the standard 
provides only pthread_attr_destroy() followed by an init() as the portable means 
of reenabling the use of the default guardsize.

It is left unspecified, not even directly mentioned, that an implementation may 
provide a special stackaddr value for use with setstack() that says next time 
allocate an arbitrary stack area that does take the current guardsize, and 
stacksize if that was set, into account without needing to call destroy(). It 
is left implied by a getstack() before any setstack() being unspecified 
behavior as to result; an implementation using such a value would be expected 
to set stackaddr to it during an init() call as the default value, and which 
getstack() would then succeed in returning.
On Tuesday, September 22, 2020 Robert Elz via austin-group-l at The Open Group 
 wrote:
Note this is forwarding a NetBSD query ... I claim no knowledge
about any of this ...  but I can relay any replies (and I have included
Thomas Klausner in the Reply-To so he doesn't need to wait for me, and/or
so you can ask him for more details if needed ... I'm not sure if this list
allows contributions from non-subscribers though).

In XSH/pthread_attr_getguardsize it is stated:

    If the stackaddr attribute has been set (that is, the caller is
    allocating and managing its own thread stacks), the guardsize attribute
    shall be ignored and no protection shall be provided by the
    implementation. It is the responsibility of the application to
    manage stack overflow along with stack allocation and management
    in this case.

In the 202x Draft 1 version that is on page 1494, lines 49730-3 but this
hasn't changed from the current published std, in TC2 it is page 1568
lines 51425-8.

The question (I think) is: when an application uses a user-provided stack,
should the guard size (the default, or that set by pthread_attr_setguardsize())
get used by the implementation for anything at all, including when
pthread_attr_getguardsize() is called?

In case I don't have the scenario quite right, you can see the original
(currently quite brief) discussion at:
    https://mail-index.netbsd.org/current-users/2020/09/21/msg039578.html

At the very least the standard doesn't appear to say anything about what
should be returned by pthread_attr_getguardsize() when an application has
set the stackaddr attribute.
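
For anyone wanting to see what a given implementation does, a small C probe
along these lines may help. It is a sketch only: the function name and the
1 MiB stack size are assumptions (the size is simply chosen to be comfortably
above PTHREAD_STACK_MIN), and the value it reports is whatever the local
implementation returns, not something the standard currently pins down.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical probe: set a guardsize, then supply an application-managed
 * stack, then ask for the guardsize again.  Returns the reported guardsize,
 * or -1 if any step fails. */
long guardsize_after_setstack(size_t requested)
{
    pthread_attr_t attr;
    void *stack = NULL;
    size_t stacksize = 1024 * 1024;   /* assumed to exceed PTHREAD_STACK_MIN */
    size_t reported = 0;
    long page = sysconf(_SC_PAGESIZE);
    int rc;

    /* Page-aligned memory satisfies the usual stack alignment rules. */
    if (page <= 0 || posix_memalign(&stack, (size_t)page, stacksize) != 0)
        return -1;
    if (pthread_attr_init(&attr) != 0) {
        free(stack);
        return -1;
    }
    rc = pthread_attr_setguardsize(&attr, requested);
    if (rc == 0)
        rc = pthread_attr_setstack(&attr, stack, stacksize);
    if (rc == 0)
        rc = pthread_attr_getguardsize(&attr, &reported);

    pthread_attr_destroy(&attr);
    free(stack);                      /* no thread was ever created on it */
    return rc == 0 ? (long)reported : -1;
}
```

Comparing the result of guardsize_after_setstack(8192) with 8192 shows whether
the implementation keeps returning the stored attribute value once the
stackaddr attribute has been set.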

Does anyone know what is intended to happen here?

kre



RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread shwaresyst via austin-group-l at The Open Group

No, it does not need to be aligned to a multiple of 4, except on some lame RISC 
architectures. The logical model is unaligned accesses are always permitted; 
aligned accesses are the exception, not the rule. This is why the language is 
padding bytes may be added, not shall be added. The standard expects 
applications to use int_fastN_t or int_leastN_t types if it wants to take 
advantage of platform specific alignment optimizations. The allocation 
functions only recently added the only alignment requirement, namely any 
pointer returned be aligned for an access to an intmax_t value, and the region 
be minimally sizeof(intmax_t) in length.
On Wednesday, September 2, 2020 Wojtek Lerch  wrote:
Yes I made the flexible member a "short" on purpose -- I wanted that byte of 
padding before the flexible array.
 
  
 
No, the sizeof can't be 5 or 6 unless the implementation is okay with unaligned 
access.  If I declare an array of these structs, the int32 inside each element 
needs to be aligned to a multiple of 4 -- therefore the size of the struct must 
be a multiple of 4 as well.  The same applies to a struct without a flexible 
member.
 
  
 
No, the requirements on sizeof have nothing to do with how many flex members 
are "present".  All that is required is that the sizeof is either the same as 
it would be for a struct without the flexible member (which is still 8, on any 
implementation that requires alignment), or greater, if the struct requires 
more padding (presumably also for alignment).  Apart from that, the C standard 
says nothing about whether there's enough room between the offsetof and the 
sizeof for one or more elements of the flexible array.
 
  
 
What you described with malloc() has nothing to do with what the C standard 
refers to as “padding”.
 
  
 
Also, while I understand the need to page-align data structures in some 
situations, I still don’t see its relevance to a discussion of the C standard’s 
requirements regarding padding in struct types and how it’s affected by 
flexible arrays.
 
  
 
From: shwaresyst  
Sent: September 2, 2020 1:58 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 
That example still has a byte of added padding, or the offsetof would be 5. The 
sizeof value is just incorrect, as it assumes one flex member is present. It 
should be 5 or 6, and which is the required value is what is ambiguous.
 
As you say, these are used most often with malloc(). Padding after the array is 
usually an artifact of this operation. You do a malloc(12) and you may get 16 
or 32 bytes actually allocated. Mapping this as a short s[] an application can 
safely access s[5], but a compiler may not block an access to s[7] too, in that 
the memory for it is allocated. You map a long long l[] and you can only access 
l[0] safely, the remaining 4 bytes out of the 12 plus what malloc adds are tail 
padding, but a compiler may allow an l[1] access because the total allocated 
permits it.
 
I mentioned page aligned because when you are buffering multiple sectors 
directly from media the malloc()s for these will usually be in multiples of 
pages, and efficient managem

RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread shwaresyst via austin-group-l at The Open Group

That example still has a byte of added padding, or the offsetof would be 5. The 
sizeof value is just incorrect, as it assumes one flex member is present. It 
should be 5 or 6, and which is the required value is what is ambiguous.


As you say, these are used most often with malloc(). Padding after the array is 
usually an artifact of this operation. You do a malloc(12) and you may get 16 
or 32 bytes actually allocated. Mapping this as a short s[] an application can 
safely access s[5], but a compiler may not block an access to s[7] too, in that 
the memory for it is allocated. You map a long long l[] and you can only access 
l[0] safely, the remaining 4 bytes out of the 12 plus what malloc adds are tail 
padding, but a compiler may allow an l[1] access because the total allocated 
permits it.

I mentioned page aligned because when you are buffering multiple sectors 
directly from media the malloc()s for these will usually be in multiples of 
pages, and efficient management of these happens when these don't straddle 
pages so are page aligned too. Such isn't required by the standard, but it's 
common enough as desirable aligned_alloc() was added. As I've seen no one use 
FLA as an acronym for flexible array, I consider VLA as applying to any array 
of indeterminate size, sorry if this confuses anyone.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
My understanding is that they meant to allow an implementation where  “struct a 
{ int32_t x; char y; short flex[]; }”  produces  sizeof(struct a)==8  but  
offsetof(struct a,flex)==6.
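
A short C sketch of that layout question follows. The struct is the one quoted
above; the helper names are hypothetical. The concrete numbers (size 8, offset
6) are what common ABIs with a 4-byte aligned int32_t and a 2-byte short
produce, and they are not required by the C standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct a {
    int32_t x;
    char y;
    short flex[];    /* flexible array member: contributes no elements to sizeof */
};

/* Size of the struct as the compiler sees it, including any tail padding. */
size_t a_size(void)
{
    return sizeof(struct a);
}

/* Offset at which flex[0] would live; may be smaller than a_size(). */
size_t a_flex_offset(void)
{
    return offsetof(struct a, flex);
}

/* Hypothetical helper: allocate a struct a with room for n elements of
 * flex, using the conventional sizeof-based size and refusing requests
 * whose size computation would wrap around. */
struct a *alloc_a(size_t n)
{
    if (n > (SIZE_MAX - sizeof(struct a)) / sizeof(short))
        return NULL;
    return malloc(sizeof(struct a) + n * sizeof(short));
}
```

Where a_flex_offset() is less than a_size(), the bytes in between are the
padding that overlaps the start of the array, which is the situation the
corrected wording allows.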
 
  
 
I don’t like that they talk about padding “after” the flexible member – since 
the flexible array has a flexible size, rather than a zero size, that padding 
really overlaps the beginning of the array.
 
  
 
Personally I think that the standard could be made clearer if a structure with 
a flexible member were considered an incomplete type.  You wouldn’t be allowed 
to apply sizeof to it at all, and you wouldn’t be able to declare objects whose 
type is the structure, but you could still use pointers to it and dereference 
members – since the main purpose of such structures is to allocate them via 
malloc(), I don’t think anybody would mind those restrictions.
 
  
 
Also, I don’t understand why struct s would need to be page aligned or why you 
mention a VLA.  A flexible array is not a VLA, in the sense C uses the term.
 
  
 
From: shwaresyst  
Sent: September 1, 2020 4:55 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 
What that refers to, it looks, is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[] } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. There would be 2 padding regions, however, is what that 
change forces.
 
  
 
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
 
Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:
 
 
 
… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.
 
 
 
But this was reported as a defect, and corrected in TC2.
 
 
 
Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing

Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

No, that is not what I would want, nor would anyone else. NAME_MAX doesn't 
guarantee that no d_name will ever be longer than this value; what it says is 
that all drivers for file systems provided by the implementation are capable of 
processing names up to that length. Some of those may support much longer names 
too, which the standard leaves open. Because of this latter possibility, no 
compile-time constant suitable for use in a macro guarantees EINVAL won't 
occur. Something that examines the media at runtime is required, which a macro 
might be an alias for, as a wrapper, but something still needs to be 
implemented to be wrapped.
On Tuesday, September 1, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <1739483391.1543785.1598977118...@mail.yahoo.com>:
 |No, it couldn't introduce such a macro, because such would have to \
 |assume all d_name entries are the same length. Adding an option to \

Well it has to go for NAME_MAX + the_size_of_posix_dent for each
and every entry, this is what you want here?  Except for what
Philip Guenther said, of course.  But if it would be left
implementation defined then even that could be covered by the
macro, better than by anything else.

I for one feel you are very brave to apply sizeof() to anything
with a "flexible array member", i would not dare that for portable
code.  (But my code has to work with ISO C89 too, so i have to use
macros to switch between [a-number] and [] as applicable, and also
to SIZEOF these types.)

Really, you are very brave!  Just the bugs i had to work around
since 2018 or what for a really tiny set of primitive tools!
(Like some gregarious animal not inlining for -Os, and another
huge one requiring explicit this-> to find superclass fields in
one class, but not the other.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

What that refers to, it looks, is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[] } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. There would be 2 padding regions, however, is what that 
change forces.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:
 
  
 
… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.
 
  
 
But this was reported as a defect, and corrected in TC2.
 
  
 
Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing implementations.  
We do not believe this was the intent of the C99 specification.
 
Details
 
If a struct contains a flexible array member and also requires padding for 
alignment, then the current C99 specification requires the implementation to 
put this padding before the flexible array member.  However, existing 
implementations, including at least GNU C, Compaq C, and Sun C, put the 
padding after the flexible array member.
 
The layout used by existing implementations can be more efficient. Furthermore, 
requiring these existing implementations to change their layout would break 
binary backwards compatibility with previous versions.
 
  
 
See DR282 for more details:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_282.htm
 
  
 
  
 
From: shwaresyst  
Sent: September 1, 2020 2:27 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 
I agree some additional clarity might be useful there, in the C standard. I'm 
reading it as the intent being sizeof is equivalent to offsetof the VLA in 
accordance with the restrictions placed on it by use of the . or -> operators, 
which may not need extra bytes (so >vla == ( + sizeof(s)) is a truism, in 
other words) but it is not that specific.
 
  
 
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
 
That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)
 
 
 
The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).
 
 
 
From: austin-group-l@opengroup.org 
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
 
 
It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.
 
This transmission (including any attachments) may contain confidential 
information, privileged material (including material protected by the 
solicitor-client or other applicable privileges), or constitute non-public 
information. Any use of this information by anyone other than the intended 
recipient is p

RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

I agree some additional clarity might be useful there, in the C standard. I'm 
reading it as the intent being sizeof is equivalent to offsetof the VLA in 
accordance with the restrictions placed on it by use of the . or -> operators, 
which may not need extra bytes (so >vla == ( + sizeof(s)) is a truism, in 
other words) but it is not that specific.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)
 
  
 
The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).
 

 
  
 
From: austin-group-l@opengroup.org 
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 

It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.


Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

No, it couldn't introduce such a macro, because such would have to assume all 
d_name entries are the same length. Adding an option to the interface to do a 
count, as a vararg parameter, and directly malloc the necessary space, returned 
via my suggested change to buf as a **, is plausible. Since we are merging 
common behaviors with this interface introduction, not describing a single 
reference implementation, such changes are permitted if someone commits to 
doing an implementation, afaik.
On Tuesday, September 1, 2020 Steffen Nurpmeso via austin-group-l at The Open 
Group  wrote:
Geoff Clare via austin-group-l at The Open Group wrote in
 <20200901143300.GB24606@localhost>:
 |> -- 
 |>  (0004953) philip-guenther (reporter) - 2020-08-28 22:52
 |>  https://www.austingroupbugs.net/view.php?id=697#c4953 
 |> -- 
 |> I think the unspecified nature of the d_name member in the new posix_dent
 |> makes writing portable software more difficult while providing only \
 |> minimal
 |> benefit to programs that don't care.  I would support requiring it \
 |> to be a
 |> flexible array member and thus eliminating the error of declaring \
 |> an array
 |> and trying to walk it via indexing instead of by advancing a char pointer
 |> by d_reclen.
 |
 |I think we should keep the requirements for d_name the same between
 |struct dirent and struct posix_dent.  Some implementations of
 |getdents() and getdirentries() use struct dirent and they should be
 |able to make posix_getdents() a synonym (or a light wrapper) for the
 |existing function by making struct posix_dent be identical to struct
 |dirent.  We can't require d_name in struct dirent to be a VLA since
 |there are implementations where it is not.

The standard could also introduce a macro which could be used to
space a buffer accordingly, something like (very ugly)
POSIX_GETDENTS_BYTES_FOR_DENTS(number-of-desired-dents), and use
it in the example.
Like that any possible errors with buffer space allocation would
not even be introduced (except for possible integer overflows,
maybe).

--steffen



Re: Pseudoterminal terminology in POSIX

2020-08-05 Thread shwaresyst via austin-group-l at The Open Group

The slave side is ancillary to the master, sorry, as physical terminals are 
ancillary to the processor hardware, imo. Inverting the relationship makes it 
look like it is the intent of the slave side to source the majority of the 
data, when more often it is only monitoring output data sourced by the master, 
or producer/processing, side with relatively infrequent input required to be 
sourced by the monitoring side. For a full duplex connection, the producer side 
is doing echoes of everything the monitoring side sources along with what it 
sources unilaterally, so it is the primary user of the connection.
On Wednesday, August 5, 2020 Geoff Clare via austin-group-l at The Open Group 
 wrote:
Steffen Nurpmeso wrote, on 05 Aug 2020:
>
> Michael Kerrisk via austin-group-l at The Open Group wrote in
>  :
>  |Elliot Hughes and I both noticed a point from "Minutes of the 3rd August \
>  |2020
>  |Teleconference":
>  ..
>  |On Tue, Aug 4, 2020 at 5:52 PM Andrew Josey  wrote:
>  ...
>  |> * General news
>  |>
>  |> We discussed terminology usage, in particular terms such as
>  |> master/slave, blacklist/whitelist.  It was agreed some terminology
>  |> for pseudo-terminals could be better described using more functionally
>  |> descriptive terms, but the details of this are left to a future bug
>  |> report.  Andrew and Geoff took an action to investigate further
>  |> and come back with an analysis.
>  ...
>  |The essence of the idea is simple. Let's not invent completely new
>  |terms, but rather rework existing (familiar) terminology a little, as
>  |follows:
>  |
>  |    pseudoterminal (device) ==> "pseudoterminal device pair"

I'm okay with that, but ...

>  |
>  |  slave ==> "terminal device"

many other things are also terminal devices, so this doesn't work unless ...

>  |          (or "terminal end of the pseudoterminal device pair")

you use this cumbersome phrasing every time you refer to it.

>  |
>  |    master ==> "pseudoterminal device"
>  |          (or "pseudoterminal end of the pseudoterminal device pair")

This makes no sense to me.  Given the phrase "pseudoterminal device pair",
I would naturally expect "pseudoterminal device" could be used to refer
to either of the individual devices in the pair, rather than one and not
the other.

> How about ancillary or accessory terminal device for the slave.

I think ancillary would actually be more applicable to the master.

> 
>  |The resulting language (as it appears in the proposed changes for the
>  |Linux manual pages) is reasonably clear, albeit a little clunky in
>  |places (wordings like "the (pseudo)terminal end of the pseudoterminal
>  |device pair" are clear, but a little verbose).
> 
> Yes.  It is terrible and absolutely unclear (to me).  And
> presumably i would become dazed if i would read an entire manual
> with the above terms.

I agree, it's too cumbersome.

My own thoughts up to now had been that, since the slave side is the
side that is intended to be used as a terminal in the normal way, the
slave should be called the "primary" device.  I hadn't come up with a
word for the master side, but Steffen's suggestion of "ancillary" works
quite well (I just saw a dictionary definition that said "providing
necessary support to the primary ...").

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: ${unset_var=~:~user} (was: A question about interpretation)

2020-07-31 Thread shwaresyst

Robert has a point that assignments, as represented by the ASSIGNMENT_WORD 
token in the grammar, should be represented by a production of the form:
ASSIGNMENT_WORD :: name EQ WORD

For 2.6.1 that is the assignment form referenced where word expansion, of the 
WORD in that production, is relevant to supporting multiple '~' expansions, and 
even then it could be limited to assigning PATH or NLSPATH as name. For 2.6.2 
imo these should be characterized as substitutions or setting of values, with 
the same net effect as assignments, not that those forms actually do 
assignments, if multiple expansions are not desired.
On Friday, July 31, 2020 Robert Elz  wrote:
    Date:        Fri, 31 Jul 2020 10:21:34 +0100
    From:        Geoff Clare 
    Message-ID:  <20200731092134.GA3453@localhost>

  | I think the intent here is clear, that 2.6.1's use of "In an assignment"
  | is not supposed to apply to this case - hence the reference to 4.23 to 
  | explain what it intends by "assignment".

This is exactly why I asked the original question in the way I did.

Nick made it quite clear (and I agree it is the correct way) that
the xref is only informative, hence not part of the standard, and so
cannot alter the meaning of "assignment" that way.

  | I'm not sure I buy your argument that the use of "assignment" instead
  | of "variable assignment" in 2.6.1 means it is possible to interpret
  | the text the other way, since the phrasing in 2.6.2 (which you
  | quoted earlier, but I'll repeat it here):
  |
  |    ${parameter:=[word]} Assign Default Values. If parameter is unset
  |        or null, quote removal shall be performed on the expansion
  |        of word and the result (or an empty string if word is omitted)
  |        shall be assigned to parameter. [...]
  |
  | is such that the expansion of "word" and the assignment of the result
  | of that expansion to the parameter are two separate operations - so it
  | doesn't really satisfy the "In" in 2.6.1.

This actually opens a whole other can of worms...

But to start with just this, once we're "in an assignment" (which is what
we are when we're expanding an "assign default values" parameter expansion),
we're in it, and that we're not yet in the process of actually making the
assignment happen is irrelevant.

Note that we have to be "in" the "assign default values" expansion, and
have determined that the parameter is unset (or null when allowed) before
we even start expanding "word" - and by that time we know we're doing
an assignment to parameter, we're "in an assignment".  (This is required
as the different operators that can be used expand the word based upon
different criteria - we have to know which one we're in before we can
correctly determine whether or not the word should be expanded at all).

But then we get to how var assignments are (or perhaps, are not) specified
to work...

From XCU 2.9.1.1 ...

    4. Each variable assignment shall be expanded for tilde expansion,
      parameter expansion, command substitution, arithmetic expansion,
      and quote removal prior to assigning the value.

That's simple enough, one would think.

A variable assignment is a "word", that word is expanded as specified.

So now we look at XCU 2.6.1 again

    A ``tilde-prefix'' consists of an unquoted <tilde> character
    at the beginning of a word, [...]
    In an assignment (see XBD Section 4.23), multiple tilde-prefixes
    can be used: one at the beginning of the word (that is, following
    the <equals-sign> of the assignment), [...]

How is "following the equals-sign of the assignment" possibly the
"beginning of the word", the word is var=~foo and the beginning of the
word is the 'v'.  There is no '~' there.  Somehow this is magically
redefining what the word is that is being expanded, or some other magic
is assumed to apply.

I'd always assumed that somewhere variable-assignments were defined
to work with something like
    name=word
where the word would be subject to the various expansions that apply,
rather than the whole thing.  (The name= part can't contain any '$'
chars, so no param expansions can occur, nor command subs or arith;
no part of that can be quoted, hence quote removal is irrelevant
there; it can't contain any glob chars either, hence even if filename
expansion was specified to happen (it isn't), it wouldn't affect that
part (or it would, and that would mess it up, but who cares).
Fortunately field splitting doesn't happen either, because that one
might mess things up if we were applying the expansions to the
whole (original) word.)  But for ~ expansion the spec gets messy
as that one depends upon that "first character" (along with anything
following a ':' since this is an assignment).  This is just broken.

On the other hand, in an assignment parameter expansion, the word being
expanded is just ~foo which does begin with a '~' and since we're in
an assignment (though not a variable assignment as defined) a ':' if
present ends the tilde-prefix and a '~' after the 

Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-31 Thread shwaresyst

Yes, right-assoc does exist too, and the standard supports the DBCS varieties 
in charmaps as an election, even. There's also medials, which have left and 
right associativity, and a number of other types, depending on primary script 
family. I agree with your last point that such sequence conversions are 
plausible; it's just that how to do them has no portable specification 
currently, it is left unspecified.
On Friday, July 31, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <1371185781.9853799.1596158030...@mail.yahoo.com>:
 |It is not "some sensible \u sequences" alone. First off, there's little \
 |agreement on what constitutes 'sensible'. Just the headache of the \
 |U300 diacritics adds to XBD6 significantly, if they're to be supported, \
 |as one example. The 'sensible' present solution is to not support them \
 |at all; others will argue the 'sensible' thing is to support them because \
 |Unicode does include these code points. The headache stems from it \
 |is not simply arbitrarily saying let's have the utility support these \
 |in $'', it's ensuring there are interfaces for the utilities to be \
 |written in that understand left-associative combining sequences, and \

i think right-associative does also exist.  I have long not worked
with this stuff.

 |these interfaces are portable because requirements in XBD add that support.

Please look at my former message.  It stands that \Uu is ISO
10646, and that does not represent characters but codepoints,
multiple of which may be necessary to represent one real
character, which then may be a valid character in the locale
encoding.

--steffen


Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread shwaresyst

It is not "some sensible \u sequences" alone. First off, there's little 
agreement on what constitutes 'sensible'. Just the headache of the U300 
diacritics adds to XBD6 significantly, if they're to be supported, as one 
example. The 'sensible' present solution is to not support them at all; others 
will argue the 'sensible' thing is to support them because Unicode does include 
these code points. The headache stems from it is not simply arbitrarily saying 
let's have the utility support these in $'', it's ensuring there are interfaces 
for the utilities to be written in that understand left-associative combining 
sequences, and these interfaces are portable because requirements in XBD add 
that support.
On Thursday, July 30, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <1127836834.9524758.1596121054...@mail.yahoo.com>:
 |Yes, the additions necessary still for even limited Unicode support \
 |above the broken bandaids C11+ provide are one of those issues. Where \
 |Unicode is incompatible with POSIX, and is therefore (by design) broken \
 |too needs addressing also. The white papers detailing most of these \
 |changes have yet to be written, or published if some have been.

Hmm, the ISO C reference is of course true.  But then this is
about Unix/POSIX shells, and then adding some sensible \u
sequences and defining their conversion to locale charset can only
be an improvement, i think.

--steffen


Re: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread shwaresyst

Yes, the additions necessary still for even limited Unicode support above the 
broken bandaids C11+ provide are one of those issues. Where Unicode is 
incompatible with POSIX, and is therefore (by design) broken too needs 
addressing also. The white papers detailing most of these changes have yet to 
be written, or published if some have been.
On Thursday, July 30, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <311169368.9432836.1596108598...@mail.yahoo.com>:
 |On Thursday, July 30, 2020 Geoff Clare  wrote:
 |Robert Elz  wrote, on 29 Jul 2020:
 |>
 |> Speaking of which, what is the current holdup with resolving
 |> whichever bug it is (I hate searching in mantis, so I won't
 |> try here) which specifies $'...' ?  Perhaps whatever the
 |> problem was (before my time) with the specification of that
 |> is no longer a problem?
 |
 |It's bug 249. It was reopened in Oct 2015 and several notes were
 |added to the bug after that, starting with 
 |
 |https://austingroupbugs.net/view.php?id=249#c2893
 |
 |My guess is the conference calls postponed returning to it because
 |there was ongoing discussion, but by the time the discussion ended
 |it had "gone off the radar".
 ...
 |Also, as something new, its inclusion is part of a later draft of Issue \
 |8. Additional issues it depends on need to be addressed first, specified \
 |fully, and incorporated. This is more why it went on the back burner, \
 |that I recall. Various other bugs are in similar state; the prerequisites \
 |to finish specifying them so they can be considered portable aren't \
 |done yet either.

The problem being that what is in the wild does not work out for
many languages.  The in-use shell quote pattern consisting of
small, isolated parts which depend on which kind of escaping and
expanding is necessary just does not work out for many languages.
Period.

I (the mailer i maintain, using POSIX-incompatible sh(1)ell-style
command line input) for example claim

  ? echo 'Quotes '${HOME}' and 'tokens" differ!"# no comment
  ? echo Quotes ${HOME} and tokens differ! # comment
  ? echo Don"'"t you worry$'\x21' The sun shines on us. $'\u263A'

The latter is what i mean.  There are many languages on this world
where these \u expansions do not work out that way, but where the
"entire sentence must be interpreted as a unity" in order to get
the iconv(3) conversion to nl_langinfo(CODESET) correctly, aka
the way it is _desired_.  Of course you can move it all to the
twilight zone of "undefined behaviour", but if you do not, then
quoting must extend to the largest possible extent, and be
interpreted as a unity.

And for that it would be tremendous if $'' would be defined so
that it can be used as the sole quoting mechanism, and that would
then also include expansion of $VAR (i use \$VAR or \${VAR} in my
mailer).  But to know exactly how problematic splitting of quotes
is for many languages of the world, including right-to-left
direction and shift state changes etc., and changing of meaning as
such if the sentence cannot be interpreted as a unity, a real
expert had to be asked.  Anyhow, the Unicode effort mandates
processing of entire strings and denotes isolated treatment as
a complete error.

--steffen

RE: Status of $'...' addition (was: ksh93 job control behaviour)

2020-07-30 Thread shwaresyst

Also, as something new, its inclusion is part of a later draft of Issue 8. 
Additional issues it depends on need to be addressed first, specified fully, 
and incorporated. This is more why it went on the back burner, that I recall. 
Various other bugs are in similar state; the prerequisites to finish 
specifying them so they can be considered portable aren't done yet either.
On Thursday, July 30, 2020 Geoff Clare  wrote:
Robert Elz  wrote, on 29 Jul 2020:
>
> Speaking of which, what is the current holdup with resolving
> whichever bug it is (I hate searching in mantis, so I won't
> try here) which specifies $'...' ?  Perhaps whatever the
> problem was (before my time) with the specification of that
> is no longer a problem?

It's bug 249. It was reopened in Oct 2015 and several notes were
added to the bug after that, starting with 

https://austingroupbugs.net/view.php?id=249#c2893

My guess is the conference calls postponed returning to it because
there was ongoing discussion, but by the time the discussion ended
it had "gone off the radar".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



RE: Should POSIX/SUS advise against MSG_OOB in recognition of RFC 6093?

2020-07-25 Thread shwaresyst

That the semantics are unspecified already disqualifies them from use by 
portable applications. If it was implementation-defined then such an addition 
might be warranted, that I see.
On Saturday, July 25, 2020 Danny Niu  wrote:
RFC 6093 "On the Implementation of the TCP Urgent Mechanism"
surveys the then-existing implementations and use of the TCP "URG"
flag and recommends that new applications not use it.

In POSIX, it is said that "Support for an out-of-band data 
transmission facility is protocol-specific"; and unlike
textbooks such as "Unix Network Programming" (vol 1 ch.24)
semantics of OOB IOs are unspecified in the standard.
These all diminish the usefulness of using that flag
in portable applications. 

Should we, in recognizing these, recommend against the
use of MSG_OOB flag in new applications, by referencing 
RFC 6093?




Re: [1003.1(2016)/Issue7+TC2 0001345]: date(1) default format

2020-07-21 Thread shwaresyst

While %Z is part of strftime() in the c2x draft (strptime() is not present at 
all), its specification is left as implementation-defined, and is therefore 
non-portable enough to still be considered unspecified for the "C" locale. 
Additionally, the format that %c represents for the "C", and by extension 
"POSIX", locale is unchanged from C99, and has not added %Z. This would apply 
to its usage by both strftime() and strptime(), to maintain symmetry, I would 
think.
On Tuesday, July 21, 2020 J William Piggott  wrote:


On Sun, 12 Jul 2020, shwaresyst wrote:

Thank you for replying.

> The reason for the disconnect, that I see, is because the %Z modifier 
> references the TZ environment variable, not a value in a struct tm, adding it 
> to d_t_fmt would disqualify the definition from being __STD_C__ conforming. 
> This is one of the areas where the C standard can be considered broken, 
> leaving Time Zone handling unspecified. As a backwards compatibility matter 
> changing the tm struct to accommodate time zone info properly is not 
> indicated, by either standard. It breaks too much code that relies on the 
> current definition of the tm_year and tm_isdst fields. The people that can do 
> such a change is the C committee, not POSIX to keep deferring to it, is what 
> I see more as Geoff's point.

The time zone (%Z) isn't unspecified; it's implementation-defined,
because it comes from the runtime environment which the C standard
avoids. It is expected though, for example, localtime() and friends
require time zone and dst information.

The only intrastandard reason I can find for %Z to have been excluded
from d_t_fmt is that strptime() specified %c, but not %Z. Meaning it
would have failed.

If that was the only reason for the disconnect, then that roadblock has
been removed in the current draft, as strptime() now includes %Z.

> On Sunday, July 12, 2020 J William Piggott  wrote:
>
>
> On Mon, 6 Jul 2020, Geoff Clare wrote:
>
>> J William Piggott  wrote, on 05 Jul 2020:
>>>
>>>> --
>>>> (0004893) geoffclare (manager) - 2020-07-02 10:57
>>>> https://austingroupbugs.net/view.php?id=1345#c4893
>>>> --
>>>> The POSIX locale default date format:
>>>>     %a %b %e %H:%M:%S %Z %Y
>>>> contains two pieces of information beyond the minimum of date and
>>>> time required for other locales: the day name (%a) and the time zone (%Z).
>>>> Most implementations include these in the default date output for their
>>>> implementation-provided locales.
>>>>
>>>> Since many users would expect to see day name and time zone information in
>>>> the default date output (particularly if they are used to the traditional
>>>> behaviour that was standardised for the POSIX locale), as a minimum we
>>>> should add something to APPLICATION USAGE about this.  We should also
>>>> consider recommending that implementations include them (either via
>>>> "should" in normative text or a statement in RATIONALE).  Here is a
>>>> suggested set of changes that does the latter...
>>>>
>>>> On page 2638 line 85823 section date, add a paragraph to APPLICATION
>>>> USAGE:
>>>> Since the default date format for locales other than the
>>>> POSIX or C locale is not required to include anything beyond the date and
>>>> time, whereas for the POSIX or C locale it also includes the day name and
>>>> time zone, it may be necessary to specify a format (or override the
>>>> locale-selection environment variables) to ensure this information is
>>>> included when desired.
>>>>
>>>> On page 2640 line 85914 section date, add these paragraphs to
>>>> RATIONALE:
>>>> Although this standard only requires the default date
>>>> format, for locales other than the POSIX or C locale, to include the date
>>>> and time, it is common for implementations to include day name and time
>>>> zone information as well.  (For the POSIX locale this is required, with the
>>>> day name in %a format at the beginning and the time zone in %Z format
>>>> before the year.)  Implementations are encouraged to include the day name
>>>> (in %a or %A format) and the time zone (in %Z or %z format) in the default
>>>> date format for all of the locales they provide.
>>>
>>> If that rationale were applied to the POSIX locale date and time format
>>> string the objection would be resolved.
>>
>> There is no way we are going to change the required d_t_fmt value for
>> the POSIX locale.
>
> Why?
>
> Has it been discussed with 'we'? Would any of them like to comment on
> this please?
>
> Does anyone have the historical rationale for the disconnect between the
> POSIX locale's date-and-time format string and the date utility's
> default date-and-time format string? That is, why the date(1) default
> output contains %Z and LC_TIME d_t_fmt does not. Was this intentional?
> If yes, for what reason?
>
>> --
>> Geoff Clare 
>> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
>>
>>
>
>

Re: [Issue 8 drafts 0001349]: Where to obtain ISO/IEC standards (footnote)

2020-07-19 Thread shwaresyst

That reference was left as-is because the Issue 8 draft targets C17 (or 18), 
but there's a chance c2x may get ratified first so that may affect later drafts.
On Sunday, July 19, 2020 Quentin Rameau  wrote:
Hello,

> == 
> Summary:                    Where to obtain ISO/IEC standards (footnote)
> == 

The C standard specified by current (and next) POSIX is C99, but this
standard doesn't seem to be available at all from the ISO/IEC, at
least from their website which shows it as withdrawn, same for C11,
in favor of C18, which is the only one accessible.

What would then be the correct way to access such standard (c99)?
Should that then be added to the footnote?

Thanks for any clarification!



Re: [1003.1(2016)/Issue7+TC2 0001345]: date(1) default format

2020-07-12 Thread shwaresyst

The reason for the disconnect, that I see, is because the %Z modifier 
references the TZ environment variable, not a value in a struct tm, adding it 
to d_t_fmt would disqualify the definition from being __STD_C__ conforming. 
This is one of the areas where the C standard can be considered broken, leaving 
Time Zone handling unspecified. As a backwards compatibility matter changing 
the tm struct to accommodate time zone info properly is not indicated, by 
either standard. It breaks too much code that relies on the current definition 
of the tm_year and tm_isdst fields. The people that can do such a change is the 
C committee, not POSIX to keep deferring to it, is what I see more as Geoff's 
point.
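
The dependence of %Z on the TZ environment variable, rather than on anything stored in a struct tm, can be seen from the shell. This is only a minimal sketch, assuming a date utility that accepts POSIX-style TZ strings; the values EST5 and UTC0 are chosen here purely for illustration and do not come from the thread.

```sh
# %Z is derived from TZ in the environment at run time.
# Both TZ strings below have no DST rule, so the zone name printed
# does not depend on the current date.
zone_est=$(TZ=EST5 date '+%Z')
zone_utc=$(TZ=UTC0 date '+%Z')
printf 'TZ=EST5 -> %s\nTZ=UTC0 -> %s\n' "$zone_est" "$zone_utc"
```

The same process, with the same clock, reports a different zone name purely because the environment changed.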
On Sunday, July 12, 2020 J William Piggott  wrote:


On Mon, 6 Jul 2020, Geoff Clare wrote:

> J William Piggott  wrote, on 05 Jul 2020:
>>
>>> --
>>> (0004893) geoffclare (manager) - 2020-07-02 10:57
>>> https://austingroupbugs.net/view.php?id=1345#c4893
>>> --
>>> The POSIX locale default date format:
>>>     %a %b %e %H:%M:%S %Z %Y
>>> contains two pieces of information beyond the minimum of date and
>>> time required for other locales: the day name (%a) and the time zone (%Z).
>>> Most implementations include these in the default date output for their
>>> implementation-provided locales.
>>>
>>> Since many users would expect to see day name and time zone information in
>>> the default date output (particularly if they are used to the traditional
>>> behaviour that was standardised for the POSIX locale), as a minimum we
>>> should add something to APPLICATION USAGE about this.  We should also
>>> consider recommending that implementations include them (either via
>>> "should" in normative text or a statement in RATIONALE).  Here is a
>>> suggested set of changes that does the latter...
>>>
>>> On page 2638 line 85823 section date, add a paragraph to APPLICATION
>>> USAGE:
>>> Since the default date format for locales other than the
>>> POSIX or C locale is not required to include anything beyond the date and
>>> time, whereas for the POSIX or C locale it also includes the day name and
>>> time zone, it may be necessary to specify a format (or override the
>>> locale-selection environment variables) to ensure this information is
>>> included when desired.
>>>
>>> On page 2640 line 85914 section date, add these paragraphs to
>>> RATIONALE:
>>> Although this standard only requires the default date
>>> format, for locales other than the POSIX or C locale, to include the date
>>> and time, it is common for implementations to include day name and time
>>> zone information as well.  (For the POSIX locale this is required, with the
>>> day name in %a format at the beginning and the time zone in %Z format
>>> before the year.)  Implementations are encouraged to include the day name
>>> (in %a or %A format) and the time zone (in %Z or %z format) in the default
>>> date format for all of the locales they provide.
>>
>> If that rationale were applied to the POSIX locale date and time format string
>> the objection would be resolved.
>
> There is no way we are going to change the required d_t_fmt value for
> the POSIX locale.

Why?

Has it been discussed with 'we'? Would any of them like to comment on
this please?

Does anyone have the historical rationale for the disconnect between the
POSIX locale's date-and-time format string and the date utility's
default date-and-time format string? That is, why the date(1) default
output contains %Z and LC_TIME d_t_fmt does not. Was this intentional?
If yes, for what reason?

> --
> Geoff Clare 
> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
>
>



Re: [Issue 8 drafts] Error in XCU 2.9.4.5

2020-07-08 Thread shwaresyst

That still has some_other_command never getting executed, so I think just 
adding the printf is sufficient.
On Wednesday, July 8, 2020 Nick Stoughton  wrote:
The whole point of this example is to show that the exit status *of the while 
loop* will be zero if the loop does not execute, and if you care about exit 
statuses of commands in the first compound-list (the one that is being tested), 
you must store them as you evaluate that condition.
Possibly it would be helpful to show more by extending the example to something 
like:

        while some_command; st=$?; false; do some_other_command; done
        printf "while loop status: %d; some_command status: %d\n" $? $st

If folks agree that this would help I will file a Mantis bug for you.
-- Nick
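
The point about the two exit statuses can be checked with a short script. This is a sketch only: some_command and counted_command are hypothetical stand-ins defined as shell functions, and the exit status 3 is arbitrary. The second loop uses the condition suggested in the original report.

```sh
# some_command and its exit status 3 are hypothetical stand-ins.
some_command() { return 3; }

body_ran=no
while some_command; st=$?; false; do
    body_ran=yes
done
loop_status=$?
printf 'while loop status: %d; some_command status: %d\n' "$loop_status" "$st"

# Variant from the original report: keep looping while the saved status is 0.
# counted_command is hypothetical: it succeeds twice, then fails.
calls=0
counted_command() { calls=$((calls + 1)); [ "$calls" -le 2 ]; }

iterations=0
while counted_command; st2=$?; [ "$st2" -eq 0 ]; do
    iterations=$((iterations + 1))
done
printf 'iterations: %d; last status: %d\n' "$iterations" "$st2"
```

In the first loop the body never runs and the loop itself reports 0, while st still holds 3. In the second, the body runs twice and st2 keeps the failing status that ended the loop.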
On Wed, Jul 8, 2020 at 2:55 AM Wilhelm Mueller  wrote:

I can't log in with the bug tracker, so I'll try to send this directly
to the list.

Section 2.9.4.5
Page 2302

To retain the final status of "compound-list-1", the note gives the
following example:

Line 74477
        while some_command; st=$?; false; do ...

This will retain the status, but with 'false' always failing by
definition, the loop will never be run.

My suggestion is to replace it by the following:

        while some_command; st=$?; [ $st -eq 0 ]; do ...

Yours,
Wilhelm

-- 

  fixed pitch fonts! **
  Wilhelm Müller          mu...@gmx.net             (o_
                                          (o_  (o_  //\
  1024D/2048g  5E6E CF83 B15E C7ED 1A31   (/)_ (/)_ V_/_
   F9435BF6    E9F3 F509 FD7B F943 5BF6             © N.Smith




RE: Why does %#x omit the 0x prefix for a zero value?

2020-07-06 Thread shwaresyst

The necessity for "0x" is to disambiguate from octal numbers with their leading 
'0', or decimal for a context allowing leading zeroes, but since a 0 is the 
same in all radices I suspect the decision was not to require it to keep field 
width minimal for delimited formats like CSV.


As to the 2nd, "#.8x" forces a 10 char output for non-zero values, the "0x" 
followed by the 8 digits for the explicit precision; for zero values and the 
other format it stays at 8, as the "0x" is considered part of the width, I 
would think. This could be more explicit, but I think it matches existing 
practice for how many spaces get inserted to do a right justify in a field width.
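
The widths described above can be compared directly with the shell's printf utility. This sketch assumes the utility handles the '#' flag, precision and zero-padded width for %x the same way the C function does, which is an assumption about the implementation rather than something stated in the thread; 255 is an arbitrary non-zero value.

```sh
# Alternative form (#) with hexadecimal conversion.
zero_alt=$(printf '%#x' 0)           # no prefix is added for zero
nonzero_alt=$(printf '%#x' 255)      # prefix added for non-zero
prec_nonzero=$(printf '%#.8x' 255)   # precision: prefix plus 8 digits
width_nonzero=$(printf '%#08x' 255)  # width: prefix counted within the 8
prec_zero=$(printf '%#.8x' 0)        # zero value: 8 digits, no prefix
width_zero=$(printf '%#08x' 0)       # zero value: 8 digits, no prefix
printf '%s\n' "$zero_alt" "$nonzero_alt" "$prec_nonzero" \
    "$width_nonzero" "$prec_zero" "$width_zero"
```

So the precision form grows to 10 characters for non-zero values, while the zero-flag form keeps the prefix inside the 8-character field.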
On Monday, July 6, 2020 Schwarz, Konrad  wrote:

Sorry, this isn’t really a POSIX or a standards question, but does anyone know 
why this was defined this way?   Was it just codification of “historical 
practice” (i.e., a non-fatal bug)?
 
  
 
While we’re at it: when print formatting integers, are there any disadvantages 
of using a precision specification over a zero flag followed with a field 
width, i.e., “%#.8x” vs. “%#08x”?
 

Re: LC_CTYPE=UTF-8

2020-06-25 Thread shwaresyst

There are plans for this, having a POSIX.UTF-8 locale as an XSI base 
requirement. There may be POSIX.UTF-E and UTF-I locales too; same features, 
simply the different charmaps. As options there may even be, albeit this is 
unlikely as no platform I'm aware of fully supports ISO-6429 now, a POSIX.ISO-7 
and POSIX.ISO-8 specification as well. Because c11 and c17 are fundamentally 
broken, with only a minimal partial fix slated for c2x, there are no viable 
plans for a C.UTF-8 or C.UTF-E proposal that I've ever seen. 

However, the way the standard is written now only the repertoire that 
transforms to a single byte encoding may be used, and is what the c2x fix 
limits itself to. This is effectively normative support only of ASCII-68, not 
ISO-646 or 10646. Expanding support to include some of the 2 byte graphic 
repertoire is already permitted by POSIX, but not required. Making allowances 
for most of the UCS2 repertoire is fairly easy, including its 3 byte UTF-8 
representations, but the text for this, and the significant changes for the 4 
byte form needed for full UCS-4 and UTF-16 support, is still to be proposed.

The point is it is still too early, in my opinion, to say what additional 
capabilities these locales will provide to applications to ease multi-lingual 
portability. Of the four choices I see the second or third as the minimum 
desirable. The industry as a whole needs to communicate how much of Unicode 
they want to be supported in Issue 8 or they will be stuck with the minimum 
represented by ASCII-68. Whatever is decided upon, bug fixes and breaking 
changes to non-portable aspects of existing implementations to be conforming to 
the final formal specification of the locale are to be expected.
On Thursday, June 25, 2020 Ingo Schwarze  wrote:
Hi Alan,

Alan Coopersmith wrote on Thu, Jun 25, 2020 at 07:59:39AM -0700:
> On 6/25/20 6:33 AM, Hans Aberg wrote:

>> Perhaps there should be a default UTF-8 locale: It seems that the
>> current construct does not apply so well to it.

> If the goal is to standardize existing behavior the standard could define
> the C.UTF-8 locale (or perhaps a POSIX.UTF-8 locale) that a number of
> systems already have, which is the standard C/POSIX locale with just the
> character set changed to UTF-8 instead.

This idea makes a lot of sense to me.

If the Austin Group decides that it wants to go into that direction,
i would make sure that both OpenBSD and the software i publish use
that name for a locale with these properties and consistently
recommend using that name.  Both already support a locale with these
properties and select it if the user asks for C.UTF-8 or POSIX.UTF-8,
but so far, they recommend that users specify en_US.UTF-8 (for
historical reasons), which is a bit unfortunate because it looks
like requesting cultural conventions for a particular country, which
is not the intention.

Whether to standardize only C.UTF-8 or both C.UTF-8 and POSIX.UTF-8
as synonyms looks a bit like asking for the best colour of a bikeshed.
Given that the standard already contains the redundancy of requiring
both "C" and "POSIX", maybe it is more consistent to also require
both "C.UTF-8" and "POSIX.UTF-8", but i don't think that matters
greatly.

Yours,
  Ingo



Re: LC_CTYPE=UTF-8

2020-06-25 Thread shwaresyst

The locale requirements specified in the C standard are what is applicable for 
implementations that limit their character encoding to the basic source and 
execution character sets. POSIX requires implementations to support, in at 
least one provided charmap, the superset of the basic sets represented by the 
portable character set. The C standard makes allowance for this, with extended 
character sets, as also being conforming so the use of "C" as synonym for 
"POSIX" is permitted. The use is required only when a platform is configured to 
operate in a POSIX conforming mode, as well, so an implementation electing to 
have separate C and POSIX definitions is plausible still. If the C standard 
ever does make a new requirement that conflicts with what is specified now for 
the POSIX locale then the likelihood is 2 separate locales will be a new 
requirement in a future Issue, no longer an election, to retain backwards 
compatibility. 
On Thursday, June 25, 2020 Martijn Dekker  wrote:
Op 25-06-20 om 21:13 schreef Alan Coopersmith:
> The only thought I had along those lines was that I thought the "C" 
> locale came from the C standard, and might be best left to the C 
> committee to standardize, while this group controls the "POSIX" 
> locale definition.

Actually, as far as POSIX is concerned, the two are synonymous.

XBD 7.2 "POSIX Locale":
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_02
| Conforming systems shall provide a POSIX locale, also known as the C
| locale. In POSIX.1 the requirements for the POSIX locale are more
| extensive than the requirements for the C locale as specified in the ISO
| C standard. However, in a conforming POSIX implementation, the POSIX
| locale and the C locale are identical.
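
One rough way to observe that the two names select the same locale is to compare what the locale utility reports for each. This is a sketch under the assumption that a locale utility is installed; it only compares the charmap, so it illustrates the equivalence rather than proving it.

```sh
# Compare what the two locale names resolve to, if the locale utility exists.
if command -v locale >/dev/null 2>&1; then
    charmap_c=$(LC_ALL=C locale charmap 2>/dev/null) || charmap_c=unknown
    charmap_posix=$(LC_ALL=POSIX locale charmap 2>/dev/null) || charmap_posix=unknown
else
    charmap_c=unavailable
    charmap_posix=unavailable
fi
printf 'C: %s\nPOSIX: %s\n' "$charmap_c" "$charmap_posix"
```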


-- 
||    modernish -- harness the shell
||    https://github.com/modernish/modernish
||
||    KornShell lives!
||    https://github.com/ksh93/ksh



Re: [1003.1(2016)/Issue7+TC2 0001345]: date(1) default format

2020-06-21 Thread shwaresyst

Re: Are not the examples demonstrating relevant date utility specification
requirements as follows:

No, they are examples of how the various specified elements can produce output 
reflecting various locale LC_TIME settings, that's all. The actual format 
string is still an unspecified implementation election, not requirement. The 
Description leaves open that an implementation can elect to use %c for all 
locales, but to keep backwards compatibility it can't make this a requirement.

What can be done, if someone wants to implement it, is add a switch, say "-c", 
that does use %c instead of any default, and then portable code can be written 
that examines LC_TIME data for guiding a parse of the utility's output. Then an 
enhancement request to require that new switch can be drafted.
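
Until such a switch exists, the effect can be approximated with an explicit format operand. The sketch below fixes the locale and time zone (LC_ALL=POSIX and TZ=UTC0 are choices made here for repeatability) and prints the default output, the format required for the POSIX locale written out explicitly, and the %c output. The exact %c result depends on the implementation's d_t_fmt, so no particular value is claimed for it.

```sh
# Fix locale and time zone so that only the format differs between calls.
default_out=$(LC_ALL=POSIX TZ=UTC0 date)
explicit_out=$(LC_ALL=POSIX TZ=UTC0 date '+%a %b %e %H:%M:%S %Z %Y')
locale_out=$(LC_ALL=POSIX TZ=UTC0 date '+%c')
printf 'default : %s\nexplicit: %s\n%%c      : %s\n' \
    "$default_out" "$explicit_out" "$locale_out"
```

The first two lines should carry the zone name before the year; whether the third does is exactly the disconnect under discussion.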
On Sunday, June 21, 2020 J William Piggott  wrote:


On Fri, 5 Jun 2020, Geoff Clare wrote:

> J William Piggott  wrote, on 05 Jun 2020:
> >
> > On Tue, 26 May 2020, Geoff Clare wrote:
> >
> > >>==
> > >>https://www.austingroupbugs.net/view.php?id=1345
> > >>==
> > >
> > >>Summary:                    date(1) default format
> >
> >  ... >8
> >
> > >
> > >>  The several date(1) EXAMPLES seem to support this position.
> > >
> > >None of the examples show what the output from "date +%c" would be,
> > >so I don't see how they support making the default the same as %c.
> > >
> >
> > Let's address one step at a time as to why I suggest that.
> >
> > What are these examples demonstrating that is relevant to the standards'
> > definition?:
> >
> > 85867 $ date
> > 85868 Tue Jun 26 09:58:10 PDT 1990
>
> The required output, when no format is specified, in the POSIX locale
> (presumably with TZ=PST8PDT)
>
> > 85875 $ LANG=da_DK.iso_8859-1 date
> > 85876 ons 02 okt 1991 15:03:32 CET
>
> As per the line above it, this is example output with a locale for
> Denmark, when no format is specified, on an implementation where the
> default date and time format for the selected locale is
> "%a %d %b %Y %T %Z".
>
> It does not imply that this is the correct or expected output when
> using a locale for Denmark, only that this is what you would likely
> get *if* the default format for the locale is "%a %d %b %Y %T %Z".
>
> Nor does it imply anything about how the implementation decides
> what default format to use for that locale.
>
> > 85882 $ LANG=De_DE.88591 date
> > 85883 Mi 02.Okt.1991, 15:01:21 MEZ
>
> Likewise for Germany if the default format is "%a %d. %h. %Y, %T %Z".
>
> > 85888 $ LANG=Fr_FR.88591 date
> > 85889 Mer 02 oct 1991 MET 15:03:32
>
> Likewise for France if the default format is "%a %d %h %Y %Z %T".

So, for examples 2, 3, and 4 your answer to my question is: nothing?

Are not the examples demonstrating relevant date utility specification
requirements as follows:

85674 By default, the current date and time shall be written.
85785 The following environment variables shall affect the execution of date:
85786 LANG


> --
> Geoff Clare 
> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
>
>

RE: Is ksh93's default alias command='command ' POSIX compliant?

2020-06-14 Thread shwaresyst

The command alias is nominally conforming, I believe, in that recursive alias 
expansion isn't permitted so looking for a utility named command still occurs. 
However, the implementation of various utilities as aliases changes what 
'command -v', or '-V', reports: they are shown as aliases and not as the actual 
utilities a user might expect. Such a change may be allowed, but I don't see 
it as intended for the utilities the standard requires.

The definition of times only as an alias, in the manner used, is not 
conforming; the use of curly braces, being keywords, turns it into a compound 
command the command utility is not expected to see as an argument list, and 
defeats the recognition of times as a special built-in per XCU 2.9.1, as time 
is a regular utility. There still needs to be an implementation that is 
accessed when "\times", to disable expansion as a possibility, is specified as 
well, that I see.
On Sunday, June 14, 2020 Martijn Dekker  wrote:
I am now the maintainer of what is currently, to the best of my 
knowledge, the only actively developed fork of AT&T ksh93. It is based 
on the last stable AST version, 93u+ 2012-08-01. Along with a few others 
I have been fixing a bunch of bugs. See https://github.com/ksh93/ksh for 
the current activity and some history/rationale in the README.md.

One issue is ksh93's default alias
    command='command '
which continues alias substitution after 'command', defeating its 
ability to bypass aliases.
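
The effect of the trailing blank can be sketched with any shell that expands aliases in scripts. The example assumes that sh is such a shell (for instance dash, or bash invoked as sh); greet_demo is a hypothetical alias name used only for illustration, and eval is used so that the alias definitions are in place before the command line is parsed.

```sh
# 'greet_demo' is a hypothetical alias name used only for illustration.
without_alias='alias greet_demo="echo hello"
eval "command greet_demo"'
with_alias='alias greet_demo="echo hello"
alias command="command "
eval "command greet_demo"'

# Without the extra alias, command bypasses alias lookup and finds nothing.
out_without=$(sh -c "$without_alias" 2>/dev/null) || out_without="not found"
# With it, the trailing blank makes the next word eligible for alias expansion.
out_with=$(sh -c "$with_alias" 2>/dev/null) || out_with="not found"
printf 'without: %s\nwith: %s\n' "$out_without" "$out_with"
```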

I think that doing this by default violates POSIX, because 'command' is 
specified as a regular builtin, not as an alias. So I have removed it:

    https://github.com/ksh93/ksh/commit/61d9bca5

There is some disagreement about that, however -- as you can see in the 
comment under that commit (scroll all the way down). And of course it is 
possible that I am wrong. So I would like to ask the Austin Group's 
opinions. Does this alias, by virtue of being default, violate POSIX?

One issue in particular I would note: on 93u+, 'times' is defined as a 
default alias
    times='{ { time;} 2>&1;}'
which not only does not produce POSIX compliant output, but also does 
not combine with the default 'command' alias: 'command times' is a 
syntax error. And that is a perfectly valid POSIX idiom, so at the very 
least, that seems like a straight-up standards violation. However, I 
have already replaced the 'times' alias by a proper POSIX compliant 
builtin on my version. This does demonstrate a problem, however; since 
aliases can contain arbitrary shell grammar, continuing alias expansion 
after 'command' seems problematic even if it doesn't violate POSIX.

    [Possibly, this alias was added to make 'command' work with some
    POSIX commands that are also defined as default aliases, i.e. 'fc',
    'hash', 'times', and 'type' -- as well as some ksh-specific
    scripting-relevant commands: 'autoload', 'compound', 'float',
    'functions', 'integer', 'nameref', 'redirect'. I think defining
    any of these as aliases is a bug in itself, as 'unalias -a' removes
    all of them, so it is impossible for a script to start with a clean
    alias slate without losing essential commands. That's why I'm in the
    process of converting all of these to proper builtin commands.]

- Martijn

-- 
||    modernish -- harness the shell
||    https://github.com/modernish/modernish
||
||    KornShell lives!
||    https://github.com/ksh93/ksh



Re: [1003.1(2016)/Issue7+TC2 0001345]: date(1) default format

2020-06-10 Thread shwaresyst

There is also the POSIX locale is a superset of the C locale as defined by the 
C standard, because it requires support of the Portable Character Set, which 
has more chars than C requires, and has the LC_MESSAGES category; as primary 
differences.
On Wednesday, June 10, 2020 Joerg Schilling 
 wrote:
"Schwarz, Konrad"  wrote:

> Hmm, isn't it also so that applications can make more assumptions about the 
> input/output of utilities?  Defining a POSIX locale
> for the sole purpose of enabling testing the compliance of said locale seems 
> very redundant.  And actually, I'm pretty
> sure the  POSIX locale was defined to behave as traditional, non-locale 
> enabled Unix, to make it possible to have locale support
> as a differentiating feature.

In the late 1990s, many US Solaris users were confused when the US English 
locale was introduced after the European locales and behaved differently from
the C locale they had been used to before.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001347]: stderr access mode - "is expected to be" is not defined

2020-05-29 Thread shwaresyst

No, I did have the "respectively" in there, for 3 separate possible file 
descriptions. It is also, near as I can tell, a requirement of the C standard 
those FILE records exist and be usable at program startup, whether <stdio.h> is 
included or not, for referencing either an already open stream or be in a 
closed state. The C11 standard does not say stdin, out, or err can be NULL 
pointers, anyways.
On Friday, May 29, 2020 Stephane Chazelas  wrote:
2020-05-30 02:16:18 +0700, Robert Elz:
>    Date:        Fri, 29 May 2020 18:02:24 +0100
>    From:        Stephane Chazelas 
>    Message-ID:  <20200529170224.ul27advxyfyeh...@chazelas.org>
> 
>  | Why?
> 
> It is kind of the definition...  the numbered fd's are all that
> is passed from process to process (via exec), not the FILE * objects,
> yet the FILE * objects are what are mostly used to access them.
[...]


Sorry, that was my misreading of the initial message. I thought
it was saying that fd 0, 1 and 2 were meant to point to the same
open file description (as they are in processes started by
login/terminal emulators...).

-- 
Stephane


RE: [1003.1(2016)/Issue7+TC2 0001347]: stderr access mode - "is expected to be" is not defined

2020-05-29 Thread shwaresyst

Re: matter in connection with
C's stderr but only for file descriptor 2 (as inherited by the login
shell). 

Isn't there something about stdin, stdout, and stderr being required to 
reference the same open file descriptions as fd's 0, 1, and 2, respectively, 
with inheritance? An application can choose not to reference them via the FILE 
*, but I do not see this lessens any requirement for a fork() to set them up 
this way.
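
At the file descriptor level, the inheritance in question can be observed from the shell. This is a sketch only: it shows fd 2 being shared between a process and its child, and says nothing about the FILE objects themselves. It assumes mktemp is available; the temporary file name is generated, not taken from the thread.

```sh
log=$(mktemp)
# The outer sh writes to fd 2, then starts a child that also writes to fd 2.
# Both use the open file description set up by the redirection, so the
# second write lands after the first instead of overwriting it.
sh -c 'echo first >&2; sh -c "echo second >&2"' 2>"$log"
inherited=$(cat "$log")
rm -f "$log"
printf '%s\n' "$inherited"
```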

Another reason I see any of these 3 may need to be bidirectional, or full 
duplex, is when they are implemented over serial connections that make use of 
soft flow control protocols, e.g. makes use of "Are you ready?" ENQ ACK/NAK 
pairs behind the scenes to bracket data sends.
On Friday, May 29, 2020 Austin Group Bug Tracker  wrote:

A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1347 
== 
Reported By:                dalias
Assigned To:                
== 
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                  1347
Category:                  System Interfaces
Type:                      Clarification Requested
Severity:                  Editorial
Priority:                  normal
Status:                    New
Name:                      Rich Felker 
Organization:              musl libc 
User Reference:              
Section:                    stdin 
Page Number:                2017 
Line Number:                64733 
Interp Status:              --- 
Final Accepted Text:        
== 
Date Submitted:            2020-05-28 19:50 UTC
Last Modified:              2020-05-29 10:53 UTC
== 
Summary:                    stderr access mode - "is expected to be" is not
defined
== 

-- 
 (0004880) geoffclare (manager) - 2020-05-29 10:53
 https://austingroupbugs.net/view.php?id=1347#c4880 
-- 
After a bit of digging (with help from Andrew J) it appears that this
wording arose as a result of ERN 40 against XSH6 draft 1, which can be seen
here:

http://www.opengroup.org/austin/docs/austin_34r1.txt

and says:

 Problem:
 How is stderr opened: input only (not likely), output only (of course)
 or both (there's the rub)?

 Action:
 My preference: "stderr is opened for input and output, although it is
 expected that under normal circumstances it will not be used for input".
 I'm OK with "stderr is opened for output.  Implementations may open it
 for input as well, but a conforming application should not expect that."

It was submitted by someone called Donn with an Interix email address,
which I assume is Donn Terry. If Donn reads this perhaps he can remember
what led him to raise this.

My initial guess was that it is related to the more utility reading
input from "standard error", but since more doesn't have to be implemented
in C there would have been no need to raise the matter in connection with
C's stderr but only for file descriptor 2 (as inherited by the login
shell). 

Issue History 
Date Modified    Username      Field                    Change              
== 
2020-05-28 19:50 dalias        New Issue                                    
2020-05-28 19:50 dalias        Name                      => Rich Felker    
2020-05-28 19:50 dalias        Organization              => musl libc      
2020-05-28 19:50 dalias        Section                  => stderr          
2020-05-28 19:50 dalias        Page Number              => unknown        
2020-05-28 19:50 dalias        Line Number              => unknown        
2020-05-28 23:00 Don Cragun    Section                  stderr => stdin    
2020-05-28 23:00 Don Cragun    Page Number              unknown => 2017    
2020-05-28 23:00 Don Cragun    Line Number              unknown => 64733    
2020-05-28 23:00 Don Cragun    Interp Status            => ---            
2020-05-29 10:53 geoffclare    Note Added: 0004880                          
==




Re: Help request and introduction

2020-05-18 Thread shwaresyst

Personally, the offer is appreciated, but I would not find scans of the 
documents wrapped as a pdf all that useful, if this is the intent. Using the 
original troff, if this is still available, to generate copy and searchable 
versions, with TOC and xrefs, would be nicer.
On Monday, May 18, 2020 J William Piggott  wrote:


On Mon, 18 May 2020, Geoff Clare wrote:

> J William Piggott  wrote, on 17 May 2020:
>> 1) I am unable to log in to Mantis. My credentials work on other
>>    opengroup.org assets, but Mantis fails with:

>
> I have created an account for you and told Mantis to send you a password
> reset email. (We disabled self sign-up because of spammers.)

Thank you very much; log in was successful.

>
>> 2) Could anyone offer links to, or send me, the xpg3 and xpg4 standards?
>
> They were never released as electronic documents, only paper.

Is there any interest in converting them? I would be willing to do the
work (I cannot say how long it would take though). Did The Open Group
inherit the copyrights?

>
>
> --
> Geoff Clare 
> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
>



Re: [1003.1(2016)/Issue7+TC2 0001341]: The resolution of bugid:1208 as amended by bugnote:4830 is incorrect

2020-05-09 Thread shwaresyst
Just floating this out there... While a posix_spawnp() kernel implementation 
doing PATH searches relative to the parent's pwd is well-defined behavior, 
should there be an additional file_action specifying that the search for the 
executable happens after the chdir action; for the case where PATH, as 
inherited by the child environment, has "./" or "../" as initial element of a 
PATH entry?
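
Why the order matters for relative PATH entries can be illustrated with a shell analogy. This is not posix_spawnp() itself, only a sketch of the same lookup rule: a "./bin" entry is resolved against whatever the working directory is when the search happens. The directory layout and the name spawn_demo_tool are hypothetical, and mktemp -d is assumed to be available.

```sh
demo_dir=$(mktemp -d)
mkdir "$demo_dir/bin" "$demo_dir/elsewhere"
printf '#!/bin/sh\necho found\n' > "$demo_dir/bin/spawn_demo_tool"
chmod +x "$demo_dir/bin/spawn_demo_tool"

# Search performed from a directory that has no ./bin: the lookup fails.
before=$( { cd "$demo_dir/elsewhere" && PATH="./bin:$PATH" && spawn_demo_tool; } 2>/dev/null ) || before="not found"

# Search performed after changing to the directory that holds ./bin.
after=$( { cd "$demo_dir" && PATH="./bin:$PATH" && spawn_demo_tool; } 2>/dev/null ) || after="not found"

rm -rf "$demo_dir"
printf 'search before chdir: %s\nsearch after chdir: %s\n' "$before" "$after"
```

The same PATH value finds the executable in one case and not the other, which is the situation an "after chdir" search action would address.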


-Original Message-
From: Austin Group Bug Tracker 
To: austin-group-l 
Sent: Thu, May 7, 2020 04:21 AM
Subject: [1003.1(2016)/Issue7+TC2 0001341]: The resolution of bugid:1208 as 
amended by bugnote:4830 is incorrect


A NOTE has been added to this issue. 
== 
https://austingroupbugs.net/view.php?id=1341 
== 
Reported By:                kre
Assigned To:                
== 
Project:                    1003.1(2016)/Issue7+TC2
Issue ID:                   1341
Category:                   System Interfaces
Type:                       Error
Severity:                   Objection
Priority:                   normal
Status:                     New
Name:                       Robert Elz
Organization:               
User Reference:             
Section:                    posix_spawn
Page Number:                1452 ff.
Line Number:                48227 ff.
Interp Status:              ---
Final Accepted Text:        
== 
Date Submitted:             2020-05-05 04:53 UTC
Last Modified:              2020-05-07 08:18 UTC
== 
Summary:                    The resolution of
https://austingroupbugs.net/view.php?id=1208 as amended by
https://austingroupbugs.net/view.php?id=1208#c4830 is incorrect
== 

-- 
(0004867) geoffclare (manager) - 2020-05-07 08:18 
https://austingroupbugs.net/view.php?id=1341#c4867 
-- 
Re https://austingroupbugs.net/view.php?id=1341#c4858 "could someone tell me if 
there is actually an implementation of posix_spawnp() that actually processes 
the chdir option [...] in the way it is intended here?"

Casper Dik reported on the mailing list:

The Solaris posix_spawnp() system call implementation does have an action to 
change chdir and it will actually search the PATH in the kernel.

While the kernel has the ability to execute the file actions after the exec 
as we are still in control, the native syscall posix_spawn implementation 
uses the exact order as a libc implementation is forced to obey.

The PATH being searched is not necessarily the same PATH in the child 
process but that of the parent.

Issue History 
Date Modified    Username       Field                    Change              
== 
2020-05-05 04:53 kre            New Issue                                    
2020-05-05 04:53 kre            Name                     => Robert Elz      
2020-05-05 04:53 kre            Section                  => posix_spawn     
2020-05-05 04:53 kre            Page Number              => 1452 ff.        
2020-05-05 04:53 kre            Line Number              => 48227 ff.       
2020-05-05 05:07 kre            Note Added: 0004852                          
2020-05-05 05:09 kre            Note Deleted: 0004852                        
2020-05-05 08:18 geoffclare     Note Added: 0004853                          
2020-05-05 15:28 kre            Note Added: 0004858                          
2020-05-05 15:30 kre            Note Edited: 0004858                         
2020-05-05 17:36 kre            Note Edited: 0004858                         
2020-05-06 08:47 geoffclare     Note Added: 0004865                          
2020-05-07 08:18 geoffclare     Note Added: 0004867                          
==



Re: [1003.1(2016)/Issue7+TC2 0001341]: The resolution of bugid:1208 as amended by bugnote:4830 is incorrect

2020-05-09 Thread shwaresyst

No, I did not misunderstand. His 'not necessarily' refers to PATH resolutions, 
with respect to relative paths, being resolved against either the parent's pwd 
or the child's pwd after the chdir file action. Before the action, or when no 
action is added, as with posix_spawn(), those are the same; after it they are 
usually different. As a usage example where this would be relevant, various 
out-of-tree build scenarios may need to search for just-built utilities 
relative to an output tree, not relative to the source tree directory in which 
the parent's build script is running.
On Saturday, May 9, 2020 Robert Elz  wrote:
    Date:        Sat, 9 May 2020 13:00:54 + (UTC)
    From:        shwaresyst 
    Message-ID:  <600759678.614717.1589029254...@mail.yahoo.com>

  | Just floating this out there... While a posix_spawnp() kernel imp doing
  | PATH searches relative to the parent's pwd is a well defined behavior,

I think you misunderstood:

casper@oracle.com said:
  | The PATH being searched is not necessarily the same PATH in the child
  | process but that of the parent.

What that refers to is the value of $PATH from the parent, not the
parent's PWD.  If it was intended to work as you suggest there, there
would be much less of an issue (the chdir fileactions would be irrelevant,
and posix_spawnp with them would be the same as without them).

  | should there be an additional file_action that search for the
  | executable happens after the chdir action;

Since the file actions happen before the PATH search for the exec,
that is the natural (and I believe, expected) behaviour of the
current chdir and fchdir file actions.

That's what makes them evil to implement in a kernel implementation
of posix_spawn where the path search for posix_spawnp is to be in
user code (where it belongs).

  | for the case where PATH, as inherited by the child environment,

The PATH in the child environment seems to be irrelevant (other
than that is what the child, once it is running, sees).

  | has "./" or "../" as initial element of a PATH entry?

You mean any relative path (anything that does not start with a '/').
There doesn't have to be a '.' to get a PWD relative path (though that
is certainly one way).
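
A minimal sketch of that last point, assuming a POSIX sh and the common
mktemp utility (not required by Issue 7); the directory layout and the name
spawn_demo_tool are made up for illustration. A PATH entry with no leading
'/' is looked up against whatever the working directory is at the time of
the search, which is why it matters whether the search runs before or after
a chdir:

```shell
# Hypothetical layout: two trees, each with its own bin/spawn_demo_tool.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/bin" "$tmp/out/bin"
printf '#!/bin/sh\necho src-tool\n' > "$tmp/src/bin/spawn_demo_tool"
printf '#!/bin/sh\necho out-tool\n' > "$tmp/out/bin/spawn_demo_tool"
chmod +x "$tmp/src/bin/spawn_demo_tool" "$tmp/out/bin/spawn_demo_tool"

# "bin" does not start with '/', so it is resolved against the working
# directory in effect when the command search is performed.
from_src=$(cd "$tmp/src" && PATH="bin:$PATH" && spawn_demo_tool)
from_out=$(cd "$tmp/out" && PATH="bin:$PATH" && spawn_demo_tool)
echo "$from_src $from_out"
rm -rf "$tmp"
```

The same PATH value selects a different executable in each subshell purely
because the working directory differs.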

kre



Re: sh: aliases in command substitutions

2020-04-22 Thread shwaresyst

I never said this was expected to be clean, or even easy to do, just that it is 
plausible for the feature set desired. What mucks it up is things that change 
how lexical elements are expected to be recognized; case conditions should use 
something like , with left angles being optional, to 
indicate end of pattern, not ')', but these don't become part of the base PCS 
until Issue 9.
On Wednesday, April 22, 2020 Joerg Schilling 
 wrote:
shwaresyst  wrote:

> No, you only lex up to the newline or EOF in the first pass, whether the 
> ending ')' or other context delimiter is found or not, because after the 
> newline may be an io_here body. This is the recognition phase that skips 
> aliases, and the grammar currently has no ambiguities in this regard so it 
> can be lexed. It doesn't even matter if the newline is supposed to be 
> retained in the token because a single quote didn't have its ending quote 
> yet, the recognition stops there. There probably should be a numbered item 
> about how various keywords establish lexical sub contexts too, but it is 
> accounted for in Item 5 by 'any recursion necessary'.

Did you ever try to write a working shell using lexical parsing for $(...) only?

I did write a working shell based on a clean recursive parsing method, and if 
you look at https://www.in-ulm.de/~mascheck/various/cmd-subst/ you'll see that 
bosh/mksh/ksh93, which all use a clean recursive parser, are closest to the 
expected behavior. 

Shells that try the lexical-only parsing way for $(...) fail miserably.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'


Re: sh: aliases in command substitutions

2020-04-22 Thread shwaresyst

No, you only lex up to the newline or EOF in the first pass, whether the ending 
')' or other context delimiter is found or not, because after the newline may 
be an io_here body. This is the recognition phase that skips aliases, and the 
grammar currently has no ambiguities in this regard so it can be lexed. It 
doesn't even matter if the newline is supposed to be retained in the token 
because a single quote didn't have its ending quote yet, the recognition stops 
there. There probably should be a numbered item about how various keywords 
establish lexical sub contexts too, but it is accounted for in Item 5 by 'any 
recursion necessary'.
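
As a small illustration of why the first pass stops at the newline (a sketch,
assuming only that a POSIX sh is on PATH): the rest of the line after the
here-document operator is tokenized first, and the body is read only from the
following line onward.

```shell
# The "; echo after" part belongs to the command line, not to the body;
# the here-document body starts on the next line and ends at EOF.
out=$(sh -c 'cat <<EOF; echo after
body line
EOF
')
printf '%s\n' "$out"
```

Here cat prints the body and the second command then runs, even though it
appears in the input before the body does.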

Then evaluations begin from the beginning of the line to check syntax is 
appropriate for the operators and keywords noted as controlling required 
recursions for the first pass up to that newline, most importantly to see if 
any io_here redirects emerge as a result of any substitutions or alias 
expansions that are balanced in their delimiters. It doesn't matter whether a 
line is one command or many, separated by ';'; if one of these appears before 
the newline is again scanned, it affects how the following line is recognized: 
as here body, new simple or compound command, or continuation of a substitution 
or compound command where the closing delimiter is still to be discovered. 
These may be quotes, a ')' or '}', or a keyword like 'esac'.


I agree because of these checks for keyword delimiters it looks on first read 
like recognition and evaluation are to occur at the same time, but to allow 
things like here documents and aliases requires the two phases when quoting or 
substitutions are present. If the grammar had a production like 
alias-or-command in addition to cmd-name that specified alias names were to be 
checked for first lexically, then you'd have a stronger case, but it never has 
that I've seen.
On Wednesday, April 22, 2020 Joerg Schilling 
 wrote:
shwaresyst  wrote:

> When you are evaluating substitutions, yes, expansion is required, but not on 
> the first pass recognizing them. This is the effect of, in Item 5, "The 
> characters found from the beginning of the substitution to its end, allowing 
> for any recursion necessary to recognize embedded constructs, shall be 
> included unmodified in the result token, including any embedded or enclosing 
> substitution operators or quotes." Alias expansions are modifications, 
> replacing the name with the alias body, and therefore precluded until the 
> token is evaluated. Unquoted keyword recognition is included if the keyword 
> grammar overloads how delimiter recognition of embedded constructs are to be 
> scanned, as a sub context of a command substitution, implicitly in of 'any 
> recursion necessary'.

You still make the mistake of assuming that it is possible to find the closing 
')' by just using a lexer-based solution.

This does not work, and since you need to call the full parser recursively to 
find the end of a $(...) command substitution, the shells use that, which 
implies applying alias substitution.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'


Re: sh: aliases in command substitutions

2020-04-21 Thread shwaresyst

When you are evaluating substitutions, yes, expansion is required, but not on 
the first pass recognizing them. This is the effect of, in Item 5, "The 
characters found from the beginning of the substitution to its end, allowing 
for any recursion necessary to recognize embedded constructs, shall be included 
unmodified in the result token, including any embedded or enclosing 
substitution operators or quotes." Alias expansions are modifications, 
replacing the name with the alias body, and therefore precluded until the token 
is evaluated. Unquoted keyword recognition is included if the keyword grammar 
overloads how delimiter recognition of embedded constructs are to be scanned, 
as a sub context of a command substitution, implicitly in of 'any recursion 
necessary'.

This was discussed ad nauseam last time we looked at aliases, because of the 
effects of opening a string with single or double quotes in an alias body, 
expecting users of an alias to supply a matching closing quote, or ending a 
body with the first character of a redirection operator, how this might affect 
reclassification of following characters; and it was touched on in the 
discussion of adding $' ' type quoting, that during recognition the new escape 
sequences were not to be converted so discovery of the closing single quote was 
reliable. We missed that a few keywords may cause similar recognition 
ambiguities when they're in alias bodies, it appears.


I am not trying to claim it is impossible to have an interpreter that expands 
alias bodies first, simply that sh does not do it that way due to the possible 
ambiguities.
On Tuesday, April 21, 2020 Joerg Schilling 
 wrote:
shwaresyst  wrote:

> No, those are attempts at speed optimizations; the description before the 
> numbered list of XCU 2.3 has line delimiting comes first as the logical model 
> to determine tokenizing mode. This is continued in list items 4. and 5., that 
> substitutions shall not occur during recognition.
>
> This makes it a requirement that a secondary pass, as the logical model, may 
> be necessary to fully evaluate a token according to the grammar that applies 
> for determining whether an alias name should be looked up. This model takes 
> into account the result of a substitution may need to be classified as an 
> assignment word or redirection when the grammar says a command prefix or 
> keyword is the legal tokens, not a command or alias name only.

This is not correct.

When David Korn introduced $(cmd), he obviously first believed that parsing 
the string inside $(...) could be done at the lexical level, but this does not work.
For this reason, it is important to do alias expansion already when just trying 
to find the closing ')' that matches '$('.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'


Re: sh: aliases in command substitutions

2020-04-20 Thread shwaresyst

Yes, I do have an idea, since I was on those phone calls. It is your comments 
that are ill founded. The first unquoted newline terminates the recognition 
phase and lookaheads mentioned. Substitutions occur afterwards to determine 
final token classifications, not during this initial pass. That many 
substitutions can safely occur during this initial pass for various parser 
algorithms does not make them part of the model. Alias replacements occur when 
the left to right scan of substitutions establishes that a token, after 
evaluation, is not a keyword or other non-name token, and the context according 
to the grammar to that point permits it to still be a command name, not an 
argument operand. Your question was 'is the standard really requiring that?' 
and imo, due to the above, the answer is 'yes', whether you want to believe it 
or not.
On Monday, April 20, 2020 Robert Elz  wrote:
    Date:        Mon, 20 Apr 2020 21:17:12 + (UTC)
    From:        shwaresyst 
    Message-ID:  <1050536090.3716059.1587417432...@mail.yahoo.com>

  | No, those are attempts at speed optimizations;

I'm sad to have to reply like this, but do you have any idea at all
what you're talking about?

  | the description before the numbered list of XCU 2.3 has line
  | delimiting comes first as the logical model to determine tokenizing mode.

Yes, it does.  Now go read it.  Really read it.  That distinction is
to separate parsing tokens for the grammar, from here docs.  Newlines
appear at the switches from one mode to another.  That's it.

  | This is continued in list items 4. and 5.

4's sole mention of newlines is that newline joining results in the
\newline combination being completely deleted from the input.  All 4
is saying is that a quoted string is (part of) a single token, and
nothing in it ends the token.

5 doesn't mention newlines at all.  That one just says that the
various word expansions, once started, continue until they end, and
the whole thing is (part of) one token.

Neither quoted strings nor word expansions (or words containing word
expansions) can be aliases, so neither 4 nor 5 is in any way relevant
to alias processing.  (Parsing the command inside a command substitution
means recursive processing of everything - so for that the whole process
starts over.)

  | that substitutions shall not occur during recognition.

That's correct, they don't.  But aliases are not that.

None of the rest of what you say has anything to do with aliases either.
Parameter expansions are not aliases (in ${CC} CC is not an alias).
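
A short sketch of the distinction, assuming a POSIX sh on PATH; demo_tool and
CMD are hypothetical names used only for illustration. A literal word in
command position is checked against the alias table, while a word produced by
a parameter expansion is not, so the second invocation falls through to the
ordinary function lookup:

```shell
# The function is defined before the alias so that its own definition line
# is not subject to alias replacement.
out=$(sh -c '
demo_tool() { echo via-function; }
alias demo_tool="echo via-alias"
CMD=demo_tool
demo_tool
$CMD
')
printf '%s\n' "$out"
```

The first call is replaced by the alias value during token recognition; the
second is only known to name demo_tool after expansion, by which point alias
checking is over.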

Please read 2.3.1 properly.  In particular, where it says:

    After a token has been delimited, but before applying the grammatical
    rules in Section 2.10, a resulting word that is identified to be the
    command name word of a simple command shall be examined to determine
    whether it is an unquoted, valid alias name.
  [...various conditions omitted, not relevant here]
    the word shall be replaced by the value of the alias

It isn't 100% clear from that (but I believe it is in updated text
that some bug number or other applies to this) that "replaced by"
means that the word (which was detected to be an alias) is deleted,
and the value of the alias is treated as replacement input, and put
through the tokeniser as if it had been in the original input stream.

This is also why aliases cannot be defined and used "close together" - the
alias command has to have been executed before the use of it is parsed, for
it to be effective.  (unalias too).
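
A minimal sketch of this "close together" effect, assuming a POSIX sh on PATH;
gr_demo_hi is a made-up alias name. When the definition and the use are parsed
as one list, the use has already been tokenized before the alias command runs:

```shell
# Same line: the whole list is parsed before "alias" is executed, so the
# second word is looked up as an ordinary command and is not found.
same_line=$(sh -c 'alias gr_demo_hi="echo aliased"; gr_demo_hi' 2>/dev/null)
same_status=$?

# Separate lines: the alias command has been executed before the next
# line is parsed, so the replacement takes effect.
next_line=$(sh -c '
alias gr_demo_hi="echo aliased"
gr_demo_hi
')
echo "same line: status $same_status, output '$same_line'"
echo "next line: output '$next_line'"
```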

None of this is in dispute (there are some issues with technical details of
how things get processed in some obscure cases, but none of that is relevant
here).

And once again, none of this is in any way even slightly relevant to the
question I asked.

kre




Re: sh: aliases in command substitutions

2020-04-20 Thread shwaresyst

No, those are attempts at speed optimizations; the description before the 
numbered list of XCU 2.3 has line delimiting comes first as the logical model 
to determine tokenizing mode. This is continued in list items 4. and 5., that 
substitutions shall not occur during recognition. 

This makes it a requirement that a secondary pass, as the logical model, may be 
necessary to fully evaluate a token according to the grammar that applies for 
determining whether an alias name should be looked up. This model takes into 
account the result of a substitution may need to be classified as an assignment 
word or redirection when the grammar says a command prefix or keyword is the 
legal tokens, not a command or alias name only.

This isn't obvious, but there are many scripts that rely on $CC to provide the 
command name for a compiler, as an example. This can't be checked whether it 
holds an actual name until recognition of the line as a whole has been 
completed.
On Monday, April 20, 2020 Robert Elz  wrote:
    Date:        Mon, 20 Apr 2020 18:01:49 + (UTC)
    From:        shwaresyst 
    Message-ID:  <1837359500.1041757.1587405709...@mail.yahoo.com>

  | It seems to me that what is missing, in XCU 2.3.1, is a statement that use
  | of keywords in alias bodies is unspecified behavior.

That isn't "missing" because it isn't unspecified.  What's more there is
no dispute at all that this works, and works in all shells.

  | Alias expansion occurs after this line is identified,

No, it doesn't.  It occurs immediately after a word has been recognised
in the command position - just the same as keyword recognition - and when
a previous alias expansion has caused the next word to be a potential alias.
Alias expansion (XCU 2.3.1) is in the Token recognition (XCU 2.3) section
of the standard for a reason, it is not a word expansion (XCU 2.6)).

But this is a general discussion of aliases, which is also not the point
of my query (unless this turns into a "remove aliases entirely" discussion)
which was very specific to alias recognition in command substitutions that
are quoted.

Joseph's message helps provide context, and it may be that now the
"historically shells have not done this" is no longer true, and the
standard should revert to its earlier form.

kre



Re: sh: aliases in command substitutions

2020-04-20 Thread shwaresyst

It seems to me that what is missing, in XCU 2.3.1, is a statement that use of 
keywords in alias bodies is unspecified behavior. 

Even outside double quotes an initial scan collecting tokens to form a logical 
line distinct from a potential io-here body will have to treat an alias name as 
a command name and following arguments. Alias expansion occurs after this line 
is identified, in the context of seeing whether this line has multiple commands 
separated by semi-colons. For this to be reliable keywords establishing 
contexts where the meaning of overloaded operators such as ')' need to be 
disambiguated need to be recognizable as such on that initial pass, not only on 
a subsequent one.
On Monday, April 20, 2020 Joerg Schilling  
wrote:
Robert Elz  wrote:

> (lines 74718-22, Issue 7 TC2 - 2018 edition) says ...
>
>    The input characters within the quoted string that are also enclosed
>    between "$(" and the matching ')' shall not be affected by the
>    double-quotes, but rather shall define that command whose output
>    replaces the "$(...)" when the word is expanded. The tokenizing rules
>    in Section 2.3, not including the alias substitutions in Section 2.3.1,
>    shall be applied recursively to find the matching ')'.
...

> Not even the broken pdksh, which seems to match that ')' after the second
> "foo" as terminating the command substitution, but then processes the
> alias anyway (later) and cannot find a valid case statement within the
> truncated command substitution, so generates a syntax error.
>
> But perhaps that is actually what the standard says must happen - we
> don't use the alias for finding the matching ')', but then do when
> parsing the command inside.    That would be a recipe for disaster,
> but if it is what old versions of ksh did/do then perhaps the standard
> really is requiring that?  If so, it is time for a change, as nothing
> relevant acts like that any more (not mksh, not ksh93, not bosh, ...)

I believe that David Korn at some time believed that he could write a simple 
parser for $(cmd) and introduced the opening '(' in case for simple counting 
symmetry.
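
For illustration, a sketch assuming a POSIX sh on PATH. The optional leading
'(' in a case pattern keeps the parentheses inside $(...) balanced, so even a
scanner that only counts parentheses finds the right end. The second form has
an unmatched ')' in the pattern; no result is claimed for it here, since
shells have historically differed on it.

```shell
# Balanced form: "(x)" pairs up, so the final ')' closes the substitution.
balanced=$(sh -c 'echo "$(case x in (x) echo matched;; esac)"')
echo "balanced form: $balanced"

# Unbalanced form: a pure parenthesis counter would stop at "x)".
sh -c 'echo "$(case x in x) echo matched;; esac)"' ||
    echo "this sh could not locate the closing parenthesis"
```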

This however does not work for other reasons.

ksh93 uses a recursive parser, and Thorsten Glaser rewrote pdksh into mksh and, 
while fixing plenty of bugs, he also started to use a recursive parser for 
$(cmd).

bosh also uses a recursive parser, and it would be of interest whether anyone 
has succeeded in implementing $(cmd) without using a recursive parser.

ksh93 still uses a different method than bosh/mksh:

-    ksh93 recursively calls the parser to stop at the first superfluous ')'
    and records all characters read during this attempt.

-    bosh and mksh recursively call the parser and tell it to stop at a
    superfluous ')' and then translate the binary syntax tree created by
    the parser back into a command text.

Whether it works to use the alias switch=case is a different thing.

If you want this to work, you need to have a lexer that expands aliases before
detecting keywords. "bash" does not seem to do this.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



RE: Bug 1016: is anybody working on adding O_NOCLOBBER?

2020-04-16 Thread shwaresyst

Yes, it is being worked on, but not as a high priority, so Option 2 appears to 
be a no-go. It may be better to revisit the bug and add a test for its 
availability to sysconf()/getconf as a formal option, in a Coming Attractions 
sense, so code can be written portably now based on using Option 1 or existing 
practice.
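
One piece of existing practice that can be relied on today is the shell's own
noclobber option. The following is only a sketch of that shell-level
behaviour, assuming a POSIX sh and the common mktemp utility; it does not use
or demonstrate the proposed open() flag, and it says nothing about whether a
given shell performs the existence check atomically.

```shell
tmp=$(mktemp -d)
f="$tmp/target"
echo first > "$f"

# With noclobber set, ">" refuses to overwrite an existing regular file.
sh -c 'set -C; echo second > "$1"' sh "$f" 2>/dev/null
clobber_status=$?
kept=$(cat "$f")

# ">|" explicitly overrides noclobber.
sh -c 'set -C; echo third >| "$1"' sh "$f"
forced=$(cat "$f")

echo "status=$clobber_status kept=$kept forced=$forced"
rm -rf "$tmp"
```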
On Thursday, April 16, 2020 Geoff Clare  wrote:
The resolution for bug 1016 says:

    Make the changes in Note: 0003485, choosing between option 1 and
    option 2 during work on the Issue 8 drafts.

Option 1 encourages O_NOCLOBBER, option 2 requires it.

We resolved it that way in Nov 2016 in the hope that someone would
implement O_NOCLOBBER in the not too distant future, which would mean
we could then require it in Issue 8.

A web search does not turn up any work on it, but perhaps there is
some activity by developers that has not yet been made public. Does
anyone here know of any such activity?

We need to decide what changes, if any, to make for Issue 8 draft 1.
I think appropriate choices would be either do nothing (deferring the
decision to a later draft), or apply option 1 and submit a bug
against Issue 8 draft 1 suggesting that a switch to option 2 could
be made in a later draft if someone implements O_NOCLOBBER in the
meantime.

Switching from option 1 to option 2 would involve a lot of extra
editing work on the source, so I would prefer not to apply option 1
for draft 1 if there is a reasonable chance that we will end up
switching to option 2 in a later draft.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2013)/Issue7+TC1 0000871]: Missing potential error code, EOVERFLOW

2020-04-02 Thread shwaresyst

There was other text that got deleted, so I agree it may look redundant, but 
the reasons for possible failure differ so the error code should appear as both 
"may fail" and "shall fail" cases, not the single "shall fail" case. Other 
interfaces have similar practice.
On Thursday, April 2, 2020 Geoff Clare  wrote:
shwaresyst  wrote, on 02 Apr 2020:
> 
> This is not a duplicate, bug 315 covers only one case, this points out it 
> applies to other places. If anything it is a child, not duplicate, that's why 
> it was Accepted.

It was not "Accepted", it was "Accepted As Marked" and the resolution
only adds EOVERFLOW to sem_post(), which is the same thing the resolution
to bug 315 did.

> On Thursday, April 2, 2020 Austin Group Bug Tracker  
> wrote:
> 
> The following issue has been set as DUPLICATE OF issue 315. 
> == 
> https://austingroupbugs.net/view.php?id=871 

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



RE: [1003.1(2013)/Issue7+TC1 0000871]: Missing potential error code, EOVERFLOW

2020-04-02 Thread shwaresyst

This is not a duplicate, bug 315 covers only one case, this points out it 
applies to other places. If anything it is a child, not duplicate, that's why 
it was Accepted.
On Thursday, April 2, 2020 Austin Group Bug Tracker  
wrote:

The following issue has been set as DUPLICATE OF issue 315. 
== 
https://austingroupbugs.net/view.php?id=871 
== 
Reported By:                shware_systems
Assigned To:                
== 
Project:                    1003.1(2013)/Issue7+TC1
Issue ID:                  871
Category:                  System Interfaces
Type:                      Omission
Severity:                  Comment
Priority:                  normal
Status:                    Resolved
Name:                      Mark Ziegast 
Organization:              SHware Systems Development 
User Reference:              
Section:                    sem_getvalue, sem_post, sem_*wait 
Page Number:                many 
Line Number:                58951, 59170, others 
Interp Status:              --- 
Final Accepted Text:        See
https://austingroupbugs.net/view.php?id=871#c2400. 
Resolution:                Accepted As Marked
Fixed in Version:          
== 
Date Submitted:            2014-08-26 16:51 UTC
Last Modified:              2020-04-02 09:14 UTC
== 
Summary:                    Missing potential error code, EOVERFLOW
==
Relationships      ID      Summary
--
duplicate of        315 sem_post maximum number of semaphores
== 

Issue History 
Date Modified    Username      Field                    Change              
== 
2014-08-26 16:51 shware_systems New Issue                                    
2014-08-26 16:51 shware_systems Name                      => Mark Ziegast    
2014-08-26 16:51 shware_systems Organization              => SHware Systems
Development
2014-08-26 16:51 shware_systems Section                  => sem_getvalue,
sem_post, sem_*wait
2014-08-26 16:51 shware_systems Page Number              => many            
2014-08-26 16:51 shware_systems Line Number              => 58951, 59170,
others
2014-08-27 00:35 dalias        Note Added: 0002365                          
2014-10-02 17:00 Don Cragun    Interp Status            => ---            
2014-10-02 17:00 Don Cragun    Note Added: 0002400                          
2014-10-02 17:00 Don Cragun    Status                  New => Resolved    
2014-10-02 17:00 Don Cragun    Resolution              Open => Accepted As
Marked
2014-10-02 17:01 Don Cragun    Final Accepted Text      => See
https://austingroupbugs.net/view.php?id=871#c2400.
2014-10-02 17:02 Don Cragun    Tag Attached: issue8                        
2014-10-02 17:02 Don Cragun    Note Edited: 0002400                        
2014-10-02 17:06 Don Cragun    Note Edited: 0002400                        
2014-10-02 17:07 Don Cragun    Note Edited: 0002400                        
2014-10-02 17:10 shware_systems Note Added: 0002401                          
2014-10-02 17:30 Don Cragun    Note Edited: 0002400                        
2014-10-02 17:31 Don Cragun    Note Added: 0002402                          
2014-10-02 17:40 Don Cragun    Note Added: 0002403                          
2014-10-02 17:41 Don Cragun    Note Edited: 0002400                        
2014-10-02 17:42 Don Cragun    Note Edited: 0002403                        
2020-04-02 09:14 geoffclare    Note Added: 0004810                          
2020-04-02 09:14 geoffclare    Relationship added      duplicate of 315
==




Re: Mantis project for Issue8 bug reports?

2020-03-29 Thread shwaresyst

Um, I did not say that people should get a draft whenever they want one, simply 
that the other list is being underutilized for discussion of issues arising 
from Mantis, not any drafts, until we have a release we feel is ready for 
public review. I wholly agree that trying to reference non-releases would be 
more confusing than helpful. I also feel the list, not Mantis, is the place to 
discuss issues like those Donn T. suggests: what can be done to fix things 
where the consensus is that they are broken but a better invention hasn't 
yet appeared.
On Sunday, March 29, 2020 Don Cragun  wrote:
You can't file a bug against 1003.1(20xx)/Issue 8 because there is no such thing 
until it is approved by the balloting group at IEEE, ISO, and The Open Group.  
Trying to file bugs against a draft of the standard would lead to hundreds of 
different Category values (and hundreds of large PDF files we would all have 
to keep in case a bug was filed against it).  Note that each note that you see 
in the mailing list that has Status Applied could indicate that a new draft has 
been created.

Despite Mark's belief that we should all be able to get a new draft whenever we 
want one, trying to keep track of what draft we're talking about in a situation 
like that would be horrendous.  (Nonetheless, it would be nice to be able to 
look at the current status of the draft to make it easier to see how all of the 
changes for related bugs fit together.)

If you believe that there is a problem in the resolution of a bug that you can 
no longer add a comment to, I would suggest creating a new bug as follows:
1.  Use the same Category as was in the bug that has been resolved.
2.  In the Summary say something like "Problem in resolution of bugid:" 
(where "bugid:" is literal text and "" is replaced by the bug number that 
you believe was incorrectly resolved).
3.  In the Description say what you believe the problem is in the resolution to 
bugid: as specified in bugnote: (where ) is the number of the note 
in that bug that you believe should change.
4.  In the Desired Action tell us exactly what text in that bug note needs to 
be changed and give us new text that you believe will solve the problem.
5.  If the changes needed are about new text that was added in the resolution 
you can leave the Page Number and Line Number fields blank and in the 
Description say something like paragraph added after Paaa, Lbbb-ccc (where aaa, 
bbb, and ccc specify where the text was added relative to the version of the 
standard given in the Category).  If the changes needed are about replacement 
text for existing text in the old standard, specify the page and line numbers 
in that standard that you believe were incorrectly resolved.

The above suggestion is a mild abuse of Mantis, but it should allow you to do 
what you need to do and should allow us to figure out exactly what changes you 
think were incorrectly resolved.

Hope this helps,
Don

> On Mar 29, 2020, at 8:47 AM, Robert Elz  wrote:
> 
> Is it time already for a new Project to be added to Mantis in
> which to file bugs on the new text which is not in Issue7, but
> which has been added (bug report closed and marked "Applied") ?
> 
> I have been sending brief summaries of what is being changed for
> Issue8 to the NetBSD mailing list (for changes I don't already
> know are irrelevant to NetBSD - eg: I didn't bother with the 'i'
> (ignore case) flag added to sed 's' commands as I know we already
> have that) so people can be aware of things that might need to
> change (for some the answer is "nothing needed" in that I just
> didn't know that NetBSD already did whatever).
> 
> One of our developers/users found some (real) errors in one of the
> new applied sections of text (and no, not one I have complained
> about here...)  One of the errors is fairly simple, and could probably
> be fixed editorially by just sending a message to this list, but the
> other is a real issue that is going to need extra definition, and
> a decision (there are two ways it could be resolved, and they are
> quite different, and I personally have no idea which is right).
> 
> I know I'm being obscure - that's because I'm leaving it for the
> person who noticed the problem to report it.
> 
> So the actual issue isn't relevant to this message - but the correct
> form to use in Mantis to report a bug in text that has been edited
> into what will be Issue8 is unclear to me.  Is it still Issue7-TC2 ?
> The relevant text with the problem doesn't appear in it however (so
> there are no page/line numbers, etc).
> 
> kre
> 




RE: Mantis project for Issue8 bug reports?

2020-03-29 Thread shwaresyst

This has been suggested, but the ORs feel the time for such is when Issue 8 is 
formally released, not before. The austin-futures-l list is more the intended 
vehicle for issues of that nature, as to what's in place already, but it does 
not have its own Mantis, or other browser based, interface.
On Sunday, March 29, 2020 Robert Elz  wrote:
Is it time already for a new Project to be added to Mantis in
which to file bugs on the new text which is not in Issue7, but
which has been added (bug report closed and marked "Applied") ?

I have been sending brief summaries of what is being changed for
Issue8 to the NetBSD mailing list (for changes I don't already
know are irrelevant to NetBSD - eg: I didn't bother with the 'i'
(ignore case) flag added to sed 's' commands as I know we already
have that) so people can be aware of things that might need to
change (for some the answer is "nothing needed" in that I just
didn't know that NetBSD already did whatever).

One of our developers/users found some (real) errors in one of the
new applied sections of text (and no, not one I have complained
about here...)  One of the errors is fairly simple, and could probably
be fixed editorially by just sending a message to this list, but the
other is a real issue that is going to need extra definition, and
a decision (there are two ways it could be resolved, and they are
quite different, and I personally have no idea which is right).

I know I'm being obscure - that's because I'm leaving it for the
person who noticed the problem to report it.

So the actual issue isn't relevant to this message - but the correct
form to use in Mantis to report a bug in text that has been edited
into what will be Issue8 is unclear to me.  Is it still Issue7-TC2 ?
The relevant text with the problem doesn't appear in it however (so
there are no page/line numbers, etc).

kre



Re: sed '\n\nnd'

2020-03-26 Thread shwaresyst

Sorry, I was confusing the 'd' command, thinking 'display', for the 'l' 
command. Since stdout is supposed to echo after processing, I did have it 
reversed; if the GNU version echoes the 'n' it probably wasn't 'delete'd, and 
for the others it likely was. 
On Wednesday, March 25, 2020 Harald van Dijk  wrote:
On 25/03/2020 23:30, shwaresyst wrote:
> yes, without them the argument would be "nnnd", after quote removal by 
> the shell. The reasoning in first reply was meant to show that the 
> non-GNU versions might be erroneously treating the second '\' as "do 
> control alias processing always", ignoring that its use as delimiter 
> overrides that interpretation, to get the results observed.

Again, it's the BSD version that treats the second \n as a literal n, treating 
the backslash in there as just escaping the delimiter character. You 
have it backwards. The GNU version is the one that treats the second \n 
as <newline>.

> 
> On Wednesday, March 25, 2020 Harald van Dijk  wrote:
> 
> On 25/03/2020 21:09, shwaresyst wrote:
>  > If it wasn't in single quotes, then that might be plausible, but I don't
>  > see it as the intent since no other aliases are excluded as
>  > possibilities for after the '/'. The initial "\n" makes 'n' the
>  > delimiter, the 2nd overrides it as being the BRE terminator, and the
>  > following 'n' is the terminator, before the 'd' command. Should there be
>  > something explicit about aliases not being usable when repurposed as
>  > delimiter, maybe.
> 
> This reply makes no sense to me, sorry. The single quotes are processed
> at the shell level. Without single quotes, there would be no backslash
> for sed to process.
> 
> Regardless, the only thing I wrote was that you simultaneously
> considered the GNU version more correct and explained it in a way that
> led me to believe you actually consider the BSD version more correct. I
> wrote absolutely nothing about what the standard says or intends to say.
> 
>  > ----
>  > On Wednesday, March 25, 2020 Harald van Dijk wrote:
>  >
>  > On 25/03/2020 19:44, shwaresyst wrote:
>  >  > The GNU version is more correct, in my opinion, in that the use of n as
>  >  > a delimiter should take precedence over its use as control character
>  >  > alias with the wording as is. The other versions appear to consider the
>  >  > BRE as <newline> so does not match 'n'.
>  >
>  > You have that backwards, don't you? The GNU version lets the use of \n
>  > as a control character take precedence over its use as a delimiter.
>  > That's why n gets printed: \n\nn is treated as /\n/, which can never
>  > match any single-line string, so nothing gets deleted.
>  >
>  > Likewise,
>  >
>  >    echo n | sed '\n[^\n]nd'
>  >
>  > prints nothing with GNU sed, but prints n with FreeBSD sed for the same
>  > reason: 'n' does contain a character that is not <newline>, but does not
>  > contain any character that is not n.
>  >
>  >
>  >  >
>  >  > On Wednesday, March 25, 2020 Oğuz wrote:
>  >  >
>  >  >      echo n | sed '\n\nnd'
>  >  >
>  >  > Above command returns 'n' with GNU sed, and nothing with BSD seds and
>  >  > OmniOS sed. [...]


Re: sed '\n\nnd'

2020-03-25 Thread shwaresyst

yes, without them the argument would be "nnnd", after quote removal by the 
shell. The reasoning in first reply was meant to show that the non-GNU versions 
might be erroneously treating the second '\' as "do control alias processing 
always", ignoring that its use as delimiter overrides that interpretation, to 
get the results observed.
On Wednesday, March 25, 2020 Harald van Dijk  wrote:
On 25/03/2020 21:09, shwaresyst wrote:
> If it wasn't in single quotes, then that might be plausible, but I don't 
> see it as the intent since no other aliases are excluded as 
> possibilities for after the '/'. The initial "\n" makes 'n' the 
> delimiter, the 2nd overrides it as being the BRE terminator, and the 
> following 'n' is the terminator, before the 'd' command. Should there be 
> something explicit about aliases not being usable when repurposed as 
> delimiter, maybe.

This reply makes no sense to me, sorry. The single quotes are processed 
at the shell level. Without single quotes, there would be no backslash 
for sed to process.

Regardless, the only thing I wrote was that you simultaneously 
considered the GNU version more correct and explained it in a way that 
led me to believe you actually consider the BSD version more correct. I 
wrote absolutely nothing about what the standard says or intends to say.

> 
> On Wednesday, March 25, 2020 Harald van Dijk  wrote:
> 
> On 25/03/2020 19:44, shwaresyst wrote:
>  > The GNU version is more correct, in my opinion, in that the use of n as
>  > a delimiter should take precedence over its use as control character
>  > alias with the wording as is. The other versions appear to consider the
>  > BRE as <newline> so does not match 'n'.
> 
> You have that backwards, don't you? The GNU version lets the use of \n
> as a control character take precedence over its use as a delimiter.
> That's why n gets printed: \n\nn is treated as /\n/, which can never
> match any single-line string, so nothing gets deleted.
> 
> Likewise,
> 
>    echo n | sed '\n[^\n]nd'
> 
> prints nothing with GNU sed, but prints n with FreeBSD sed for the same
> reason: 'n' does contain a character that is not <newline>, but does not
> contain any character that is not n.
> 
> 
>  > 
>  > On Wednesday, March 25, 2020 Oğuz wrote:
>  >
>  >      echo n | sed '\n\nnd'
>  >
>  > Above command returns 'n' with GNU sed, and nothing with BSD seds and
>  > OmniOS sed. [...]


Re: sed '\n\nnd'

2020-03-25 Thread shwaresyst

If it wasn't in single quotes, then that might be plausible, but I don't see it 
as the intent since no other aliases are excluded as possibilities for after 
the '/'. The initial "\n" makes 'n' the delimiter, the 2nd overrides it as 
being the BRE terminator, and the following 'n' is the terminator, before the 
'd' command. Should there be something explicit about aliases not being usable 
when repurposed as delimiter, maybe.
On Wednesday, March 25, 2020 Harald van Dijk  wrote:
On 25/03/2020 19:44, shwaresyst wrote:
> The GNU version is more correct, in my opinion, in that the use of n as 
> a delimiter should take precedence over its use as control character 
> alias with the wording as is. The other versions appear to consider the 
> BRE as <newline> so does not match 'n'.

You have that backwards, don't you? The GNU version lets the use of \n 
as a control character take precedence over its use as a delimiter. 
That's why n gets printed: \n\nn is treated as /\n/, which can never 
match any single-line string, so nothing gets deleted.

Likewise,

  echo n | sed '\n[^\n]nd'

prints nothing with GNU sed, but prints n with FreeBSD sed for the same 
reason: 'n' does contain a character that is not <newline>, but does not 
contain any character that is not n.

> 
> On Wednesday, March 25, 2020 Oğuz  wrote:
> 
>      echo n | sed '\n\nnd'
> 
> Above command returns 'n' with GNU sed, and nothing with BSD seds and 
> OmniOS sed. [...]


RE: sed '\n\nnd'

2020-03-25 Thread shwaresyst

The GNU version is more correct, in my opinion, in that the use of n as a 
delimiter should take precedence over its use as control character alias with 
the wording as is. The other versions appear to consider the BRE as <newline> 
so does not match 'n'.
On Wednesday, March 25, 2020 Oğuz  wrote:
    echo n | sed '\n\nnd'
Above command returns 'n' with GNU sed, and nothing with BSD seds and OmniOS 
sed. The standard says 
   
   -
In a context address, the construction "\cBREc", where c is any character other 
than <backslash> or <newline>, shall be identical to "/BRE/". If the character 
designated by c appears following a <backslash>, then it shall be considered to 
be that literal character, which shall not terminate the BRE. For example, in 
the context address "\xabc\xdefx", the second x stands for itself, so that the 
BRE is "abcxdef".

   -
The escape sequence '\n' shall match a <newline> embedded in the pattern space. 
A literal <newline> shall not be used in the BRE of a context address or in the 
substitute function.


but this is not clear at all. Which is the correct behavior here?
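
For comparison, the uncontroversial half of the quoted rule can be sketched with a delimiter that has no escape meaning of its own. This is only an illustration, not text from the standard: select_comma is a hypothetical helper name, and ',' is chosen on the assumption that "\," has no other meaning in a BRE. The disputed case is the one where the delimiter is 'n', because "\n" is also the newline escape.

```shell
# Hypothetical helper: print only the lines selected by a context address
# that uses ',' as the delimiter instead of '/'.
select_comma() {
    # \,a\,b,  ->  delimiter is ',' and the BRE is "a,b":
    # the escaped ',' stands for itself and does not terminate the BRE.
    sed -n '\,a\,b,p'
}

printf 'a,b\nab\n' | select_comma

# The ambiguous case from this thread (output differs between GNU sed and
# the BSD seds, as reported above), so it is left as a comment only:
#   echo n | sed '\n\nnd'
```

With ',' both readings of the rule agree, so only the line "a,b" is selected.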


-- 
Oğuz


Re: [1003.1(2008)/Issue 7 0000518]: Allow make's "include" to include multiple files

2020-03-19 Thread shwaresyst

I can't reopen the bug to add a note, so like we did 1318 on top of 411, it 
probably should be reopened and discussed. Straightforward or not, what you put 
in is not what is marked as to be put in.
On Thursday, March 19, 2020 Geoff Clare  wrote:
shwaresyst  wrote, on 18 Mar 2020:
>
>> -- 
>>  (0004798) geoffclare (manager) - 2020-03-18 15:21
>>  https://austingroupbugs.net/view.php?id=518#c4798 
>> -- 
>> Note that when applying the changes from this bug, I noted that other
>> changes had been made to the Include Lines subsection since the 2008
>> edition, and therefore I did not follow the instruction to replace that
>> section with the new text, but instead identified the changes between the
>> 2008 edition text and the text given here and then applied equivalent
>> changes to the current text.
>> 
>> Some of the intervening changes were from bug
>> https://austingroupbugs.net/view.php?id=333 adding
>> -include ("silent includes"), but there were also some TC changes. 
> 
> Note 1190 does not include mention of -include, so imo a separate note should 
> be added showing what was committed.

I think a separate note with the details is unnecessary; the way to
merge the changes was obvious and straightforward. If it had not been,
then I would have reopened the bug.

If you really think such a note is needed then go ahead and add one.
(You can copy from the latest Base.pdf in the Issue8 branch in
gitlab.)

> As example, the text about "file not found" always invoking unspecified 
> behavior is not true for -include, it is now explicit these are ignored as 
> the specified behavior.

Sounds like you think a needed change was missed in bug 333, in which
case you should submit a new bug to propose that change.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: XCU: 'exit' trap condition

2020-03-16 Thread shwaresyst

The wording can be construed to mean that the EXIT trap is always expected to 
be called, with a SIGEXIT delivered to the context of the subshell and not the 
parent, and otherwise to the parent for performing the trap in its context and 
terminating the parent. While a subshell context is created as a duplicate of 
the parent, by the time an exit call is processed it may differ significantly 
and so should be its own target; it is not on the parent to attempt any 
operations on behalf of the subshell, that I see.
On Monday, March 16, 2020 Joerg Schilling wrote:
Dirk Fieldhouse  wrote:

> On 15/03/20 16:43, Harald van Dijk wrote:
> >...>  >
> > "Before the shell terminates" is not limited to "before the top level
> > shell terminates". If a shell terminates, even if it is a subshell that
> > terminates, any EXIT trap action should be run.
>
> Sure, that is the intended interpretation, but this requirement in the
> DESCRIPTION of 'exit'
>
> "... If the current execution environment is a subshell environment, the
> shell shall exit from the subshell environment with the specified exit
> status and continue in the environment from which that subshell
> environment was invoked; otherwise, the shell utility shall terminate
> with the specified exit status. ..."
>

--->
The environment in which the shell executes a trap on EXIT shall be identical 
to the environment immediately after the last command executed before the trap 
on EXIT was taken.
<---

This implies that the exit trap is called from within exit(1), but return(1) 
does not call exit(1) nor is it an alias for exit(1).
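
A small sketch of the reading Harald describes, where an EXIT trap set inside a subshell runs when that subshell exits and the parent then carries on. run_subshell_with_trap is a hypothetical name, and the sketch assumes a shell that follows that reading; it does not settle what the standard requires.

```shell
# Hypothetical helper: set an EXIT trap inside a subshell, then exit from it.
run_subshell_with_trap() {
    status=0
    (
        # The trap belongs to the subshell environment only.
        trap 'echo "EXIT trap ran in subshell"' EXIT
        exit 3
    ) || status=$?
    # Back in the invoking environment: only the exit status came across.
    echo "parent continues, subshell status=$status"
}

run_subshell_with_trap
```

The explicit 'exit' is deliberate: the question raised above is whether 'return' gets the same treatment, which this sketch does not exercise.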

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2008)/Issue 7 0000411]: adding atomic FD_CLOEXEC support

2020-03-12 Thread shwaresyst

Yw... Also, while change is minor, just putting out for next meeting, we 
probably want to take a look at the Related: bugs with Interpretations, see if 
any of those need a timer restart to reflect the change properly.
On Thursday, March 12, 2020 Eric Blake  wrote:
On 3/12/20 1:02 PM, shwaresyst wrote:
> 
> Fyi,  the Last updated: date at top wasn't changed.
> On Thursday, March 12, 2020 Austin Group Bug Tracker  
> wrote:
> 
> A NOTE has been added to this issue.

> --
>  (0004796) eblake (manager) - 2020-03-12 16:35
>  https://www.austingroupbugs.net/view.php?id=411#c4796
> --
> minor tweak to the attached files to fix an instance of O_CLOEXEC that
> should be SOCK_CLOEXEC in relation to accept4().

Thanks. I'll re-upload with that additional date tweak.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.          +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


RE: [1003.1(2008)/Issue 7 0000411]: adding atomic FD_CLOEXEC support

2020-03-12 Thread shwaresyst

Fyi,  the Last updated: date at top wasn't changed.
On Thursday, March 12, 2020 Austin Group Bug Tracker  
wrote:

A NOTE has been added to this issue. 
== 
https://www.austingroupbugs.net/view.php?id=411 
== 
Reported By:                eblake
Assigned To:                ajosey
== 
Project:                    1003.1(2008)/Issue 7
Issue ID:                  411
Category:                  System Interfaces
Type:                      Enhancement Request
Severity:                  Objection
Priority:                  normal
Status:                    Resolved
Name:                      Eric Blake 
Organization:              Red Hat 
User Reference:            ebb.cloexec 
Section:                    various - see desired action 
Page Number:                various - see desired action 
Line Number:                various - see desired action 
Interp Status:              --- 
Final Accepted Text:        See desired action section in attached file
bug411_atomic_CLOEXEC.pdf  2014-05-03 16:45 
Resolution:                Accepted As Marked
Fixed in Version:          
== 
Date Submitted:            2011-04-20 17:01 UTC
Last Modified:              2020-03-12 16:35 UTC
== 
Summary:                    adding atomic FD_CLOEXEC support
==
Relationships      ID      Summary
--
related to          149 Add fdwalk system interface
related to          368 Hidden file descriptors should be requi...
related to          0001208 calling chdir as part of posix_spawn
parent of          598 OH shading and new interfaces
parent of          833 SOCK_* flags in getaddrinfo hints-a...
has duplicate      331 Add 'x' mode to fopen and freopen to fo...
related to          456 mandate binary mode of fmemopen
related to          590 dup2 and signals
related to          591 No reason for OH margins in the synopse...
related to          593 posix_typed_mem_open requires the use o...
related to          662 Clarify or add file descriptor preserva...
related to          0001302 Alignment with C17
related to          0001318 Define close-on-fork flag
== 

-- 
 (0004796) eblake (manager) - 2020-03-12 16:35
 https://www.austingroupbugs.net/view.php?id=411#c4796 
-- 
minor tweak to the attached files to fix an instance of O_CLOEXEC that
should be SOCK_CLOEXEC in relation to accept4(). 

Issue History 
Date Modified    Username      Field                    Change              
== 
2011-04-20 17:01 eblake        New Issue                                    
2011-04-20 17:01 eblake        Status                  New => Under Review 
2011-04-20 17:01 eblake        Assigned To              => ajosey          
2011-04-20 17:01 eblake        Name                      => Eric Blake      
2011-04-20 17:01 eblake        Organization              => Red Hat        
2011-04-20 17:01 eblake        User Reference            => ebb.cloexec    
2011-04-20 17:01 eblake        Section                  => various - see desired action
2011-04-20 17:01 eblake        Page Number              => various - see desired action
2011-04-20 17:01 eblake        Line Number              => various - see desired action
2011-04-20 17:01 eblake        Interp Status            => ---            
2011-04-20 17:02 eblake        Tag Attached: issue8                        
2011-04-20 17:03 eblake        Relationship added      related to 149  
2011-04-20 17:03 eblake        Relationship added      related to 368  
2011-04-28 16:21 eblake        Description Updated                          
2011-04-28 16:21 eblake        Desired Action Updated                      
2011-04-28 16:23 eblake        Note Added: 773                          
2011-04-28 18:41 eblake        Note Added: 774                          
2011-05-03 11:04 drepper        Note Added: 776                          
2011-06-02 19:13 eblake        Relationship added      related to 456  
2011-08-03 21:38 eblake        Note Added: 917                          
2011-08-04 15:42 nick          Final Accepted Text      => See https://www.austingroupbugs.net/view.php?id=411#c917 
2011-08-04 15:42 nick          Status                  Under Review => Resolution Proposed
2011-08-04 15:42 nick          Resolution              

Re: XCU: 'return' from subshell

2020-03-11 Thread shwaresyst

I agree this is something suitable for a TC type resolution. However, given the 
amount of existing practice, I do not see we should leave anything unspecified 
for Issue 8, even if some implementations break as a result. 

Either it always stops function execution or dot file processing, which can be 
implemented with process shared mutexes or semaphores for synchronous 
subshells, or they are ignored in a subshell context to the extent 
interpretation continues after the closing ')'. As the shell stays in the 
foreground after initiating any asynchronous subshells, interpretation inside 
the definition or dot file is already required to continue. It may not get far 
past before the async job is permitted to get to the foreground, but this is 
still past. It may need to be explicit, for that context, that return is also 
required to raise SIGCHLD the same as a normal exit would.
On Wednesday, March 11, 2020 Don Cragun  wrote:
Would this issue be resolved if we change the last sentence of the description 
section of the return Special Built-In Utility from:
    If the shell is not currently executing a function
    or dot script, the results are unspecified.
to:
    If the shell is not currently executing a function
    or dot script running in the same shell execution
    environment as the command that invoked the function
    or dot script, the results are unspecified.
?

Dirk, notice that I said "in the same shell execution environment as the 
command that invoked ..."; not "in the same shell execution environment as the 
function's defining command".  (If a function is invoked in a subshell of the 
environment that defined the function that should not affect the behavior of 
return as long as the function doesn't invoke return in a new shell 
execution environment that it created.)

Since current shells do not all treat return in a subshell as exit, I think we 
should leave that behavior unspecified.  (I see no reason why a conforming 
shell should not be able to report that return was invoked in a shell execution 
environment that is not the same execution environment as the command that 
invoked the function or dot script.)
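
The distinction between the invoking environment and the defining environment can be sketched as follows. This is an illustration only; f is a hypothetical function, defined in the main environment but invoked from a subshell, with 'return' executed in the same environment that invoked it.

```shell
# Hypothetical function, defined in the main shell environment.
f() {
    echo "in f"
    return 7            # runs in the environment that invoked f
    echo "not reached"
}

# f is invoked inside a subshell of the defining environment. The 'return'
# is still in the same execution environment as the invoking command, so
# under the proposed wording its behaviour stays specified.
( f || echo "f returned $?" )
```

The case the proposed wording leaves unspecified is different: a 'return' inside a subshell that the function itself created.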

Cheers,
Don

> On Mar 11, 2020, at 10:07 AM, Dirk Fieldhouse  wrote:
> 
> On 11/03/20 15:23, Chet Ramey wrote:
>> ...>
>> What does a `return from the execution environment' mean, exactly? ...
> 
> To clarify, what I wrote was shorthand for "return from the function if
> the 'return' is executed in the same execution environment as" the
> function's defining command, or otherwise (ii) exit or (iii) unspecified
> behaviour.
> 
> So I think your agreement with case (i) means that you would be looking
> for some text like that in my originally (10/03/20 15:22) proposed
> Resolution (a), and therefore (with the clarification above) a
> definition of when a 'return' is "in" a function for case (i) that would
> make it equivalent to case (iii).
> 
> Then the remaining question is whether (case (ii)) there is benefit in
> codifying, normatively or not, the 'exit'-like behaviour when a 'return'
> that is textually contained in a function definition is run in a
> 'separate execution environment' from the function's defining command,
> rather than leaving all other cases unspecified.
> 
> The case (naively) of 'return' used textually outside a function
> definition would be covered by existing 2.14 text "If the shell is not
> currently executing a function or dot script", except that it is widely
> held that "not currently executing a function" includes the situation of
> the previous paragraph. Whatever the wording, it appears controversial
> whether the 'exit'-like behaviour should apply here.
> 
> Without wishing to extend the scope of discussion, the model against
> which readers are likely to interpret the standard's text on functions
> is that of a programming language like C, where a function foo() is said
> to be executing while any statements textually contained in the function
> are executing, and possibly more than once if it performed a successful
> fork(). Any revised wording needs to invalidate such a mental model, if
> that is what the standard intends.
> 
> /df
> 
> --
> London SW6
> UK
> 




Re: XCU: 'return' from subshell

2020-03-10 Thread shwaresyst

After some thought, I believe I'd be in favor more of: a) adding explicitly 
that '{' and '(' introduce a new lexical scope, balanced by '}' and ')' 
respectively; b) when such scopes are asynchronous return shall function the 
same as exit, with, as App Usage, it is up to any trap action for CHLD to store 
a return or exit parameter value for that result to be available to any 
monitoring loop in the main script; and c) something like "return &;" shall be 
equivalent to "{ return &; }" in effect, for that corner case.
On Tuesday, March 10, 2020 Dirk Fieldhouse  wrote:
On 10/03/20 18:25, shwaresyst wrote:
 >
 > I basically agree this is an issue - I see return as more for being
 > interpreted as a lexical scope abort, whatever the execution context,
 > and exit an execution scope abort, such as a subshell or separate script
 > utility environment, as their basic intent. ...

As this is what the reader with no shell source code would take from the
standard, the problem (one of them) is how to converge the spec and
implementations with the least overall pain.

 > ... Further complicating things, I don't see that the standard or
 > the fixes proposed here adequately addresses the expectations when such
 > a return is part of a brace group or subshell executed as an
 > asynchronous command, using '&'.
 > ...

The issue of asynchronous lists came to me, obviously, walking the dog
after posting.

If 'return' is a lexical scope abort, an asynchronous list is not a
meaningful place for it to be used. So one's expectations based on the
standard should be for such usage to invoke unspecified behaviour, such
as ignoring the 'return' with/out a warning or generating an error.

The proposed wording in my Resolution (a) to restrict the specification
of 'return' to cases where it is "executing in the same execution
environment as the function's defining compound-command" would in fact
cover asynchronous lists. But it would leave this unspecified:

f() {
  ( echo "What will happen next is Unspecified"; return )
  echo "This probably should not be reached, but typically is"
}

New wording would be required in my Resolution (c), eg in 2.14 'return'
(inside_ _):

"... If the shell is not currently executing a function or dot script,
_or if the current execution context is asynchronous with respect to the
execution context of the function's defining command or the caller of
the dot script,_ the results are unspecified."

Unsurprisingly, the tested implementations are consistent in treating
'return' in an asynchronous context like 'exit' as well.
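
A minimal sketch of the asynchronous case, for anyone who wants to try it. g is a hypothetical function; the status collected by 'wait' is deliberately not shown, since under the proposed wording the results would be unspecified.

```shell
# Hypothetical function, used only to illustrate the asynchronous case.
g() {
    { return 5; } &        # 'return' runs in an asynchronous environment
    wait "$!" || true      # status ignored here: it would be unspecified
    echo "g continues after the asynchronous list"
}

g
```

Whatever the asynchronous list does with the 'return', it cannot end g in the invoking environment, so the final echo still runs.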

/df

--
London SW6
UK

RE: XCU: 'return' from subshell

2020-03-10 Thread shwaresyst

I basically agree this is an issue - I see return as more for being interpreted 
as a lexical scope abort, whatever the execution context, and exit an execution 
scope abort, such as a subshell or separate script utility environment, as 
their basic intent. Further complicating things, I don't see that the standard 
or the fixes proposed here adequately addresses the expectations when such a 
return is part of a brace group or subshell executed as an asynchronous 
command, using '&'.
On Tuesday, March 10, 2020 Dirk Fieldhouse  wrote:
1    Summary

XCU Ch2.14 states that 'return' shall cause the shell to leave the
current function or dot script, if any. Ch2.9.5 says that execution shall
continue with the next command after the function call. Implementations
that claim conformance consistently contradict this specification, if
the function has created a subshell. They can't both be right. As the
specification was in part intended to codify existing practice, how did
this contradiction arise?

2    Description

2.1    Relevant specifications

POSIX Vol3 (XCU) Ch2.14 'return' says, and has since at least 2004:

> The return utility shall cause the shell to stop executing the current 
> function or dot script. If the shell is not currently executing a function or 
> dot script, the results are unspecified.

In case this wasn't clear enough, Vol3 Ch2.9.5 says:

> A function is a user-defined name that is used as a simple command to call a 
> compound command with new positional parameters. ...
>
> The compound-command shall be executed whenever the function name is 
> specified as the name of a simple command... If the special built-in return 
> ... is executed in the compound-command, the function completes and execution 
> shall resume with the next command after the function call.

So not with some other command inside the function, you'd imagine.

2.2    The issue

Implementations consistently contradict the above wording when the
'return' is in a 'subshell' (probably any 'separate shell execution
context'), treating this use of return as if the script used 'exit'. Yet
suppliers have been claiming conformance to the existing wording.

Are they all wrong, or is there an adaptation or interpretation of
the specification that would align it with reality?

2.3    Example

A simple test case is

foo() {
    ( if [ "$1" = fum ]; then echo EQ; return 0; fi )
    echo NE; return 1
}

with expected result

$ foo fum || echo WTF
EQ
$

but in the several modern shells (dash-0.5.9.1 and earlier,
bash-4.3-14ubuntu1.4, busybox-static-1:1.22.0-15ubuntu1.4) tested, we get

$ foo fum || echo WTF
EQ
NE
WTF
$
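
For readers who need the expected result from the shells tested above, one possible workaround is sketched here. It is not part of the report: the subshell only signals its outcome through its exit status, and the 'return' is issued from the function's own execution environment. The name foo_fixed is made up for illustration.

```shell
# Hypothetical rewrite of foo: the subshell reports success or failure,
# and 'return' runs outside the subshell, in the function itself.
foo_fixed() {
    if ( [ "$1" = fum ] && echo EQ ); then
        return 0
    fi
    echo NE; return 1
}

foo_fixed fum || echo WTF
```

Written this way, the call on the last line prints only EQ, because the function returns 0 before reaching the NE branch.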

2.4    Interpretations

One oracle has said:

> In the subshell, the shell should not be considered to still be executing a 
> function or dot script. As such, the results should be unspecified, and any 
> behaviour should be valid. The standard may be underspecified here, but any 
> other interpretation is not reasonable.

But if you read the standard without having knowledge of existing shell
internals, it's entirely reasonable (and IMO desirable) to consider that
a shell function is a lexical group, like a script file, which is being
executed as long as any command within the function's defining
compound-command is running; as the spec refers to subshells explicitly
elsewhere (eg 'exit') the reader would have to believe that "subshell"
was accidentally omitted from the list of contexts that 'return' should
return from, to interpret the text as quoted above.

The resolution of a related rejected Defect Report 1042 says:

> ... the results of using return when you are not in a shell execution 
> environment running a function or a dot script is unspecified.

But this is restating the wording of the standard, unless "in a shell
execution environment" means "in a shell execution environment, and not
in a subshell environment thereof", which, as argued above, is
additional to that wording.

Under DR 842, clarifications have been made on the scope of the 'break' and
'continue' special utilities so that the expected behaviour matches the
specification; 'return' did not receive such attention.

3    Resolution

I considered these possible resolutions, though others may exist:

a)    the existing 2.14 and 2.9.5 text means to permit the interpretation
that 'return' from a function when in a subshell context may just exit
the subshell and not return from the function;

b)    the existing 2.14 text is consistent with the observed behaviour but
the 2.9.5 text specifying function definition must be changed to restrict
the types of command that can be used in the definition;

c)    the text of 2.14 and 2.9.5 means what it appears to say.

3.1    

Re: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-02-25 Thread shwaresyst

No keyword, on its own, presently in the standard is required to produce output 
that may be redirected. What redirections are permitted apply to utilities used 
with those keywords, not the keywords themselves, and this is where the 
conflict appears - does a stderr redirect with time as a keyword apply only to 
the output of time, which is desired but not implemented; only whatever 
command(s) are being processed, as is currently implemented; or both as with 
the utility.

What I proposed could perhaps be grafted onto existing utility implementations,
but making it robust is then much harder, as far as I can see: for one type of
implementation you have to guard against date being used to modify the clock
somehow. It would be less efficient also, as the shell could not apply 
various optimizations it may have to bypass portions of setting up a utility 
environment for special built-ins. The differences are minor in theory between 
regular and special, I agree, but they do affect how utility implementations 
are designed.

As to getting it invented, roughly 80% of the bash implementation for posix 
mode can be reused, I estimate, and changing from keyword to built-in as a new 
wrapper for that is straightforward, it appears. The backend for the new 
switches, as new code, is not as straightforward but shouldn't have any 
insurmountable difficulties; I'd need to spelunk the codebase more to verify 
that. The point is dumb can be fixed, without sacrificing backwards 
compatibility or needing any new keywords. 
On Tuesday, February 25, 2020 Robert Elz  wrote:
    Date:        Tue, 25 Feb 2020 17:47:23 + (UTC)
    From:        shwaresyst 
    Message-ID:  <1701159045.321204.1582652843...@mail.yahoo.com>

  | The thing is, various shells have implemented time as keyword, so this bug
  | is trying to get the standard to reflect actual practice that ignored that
  | Rationale.

The first part of that I understand, and what's more, that was predicted, even
known, at the time, as the implementations (at least one) already existed.

Further it was intended to be permitted - that much is clear, they just
didn't realise all they needed to make unspecified in order for that to work.

  | The debate is more some things allowed to utilities aren't allowed
  | for keywords, and vice versa, so there are advantages to both.

Then that's a pretty silly debate, as absolutely anything can be allowed
for a keyword - part of defining a keyword (which is a grammar token) is
defining the grammar rules that use it, and those can be just about anything
that's desired, in any form that's desired.  It just needs to be consistent
(ie: not cause a conflict, I don't mean be similar to) with the rest of the
grammar.

More likely the real problem here is how to define the grammar for time as
a reserved word to make it match the actual implementations correctly.
That I haven't considered, as I don't think it should even be invented.

We should not be inventing new reserved words that are not in the standard.
An unspecified reserved word (one which might be reserved, or might not be)
is as far as is reasonable to go,  and even that is a huge push at the limits.

  | As another option, if an approach is to be invented, the utility could
  | change from being a regular utility to a special built-in,

That doesn't help - the only differences between them are some execution
time weirdnesses (error consequences, effects of var assigns, etc).

Further, that must be the case, as the relevant part of the script will have
been parsed before we even know what kind of command it is (whether built in
or not, and if built in, whether special or not).  It is way too late to
make syntax decisions, of any kind, by then.

  | This method adds capability not available to keywords
  | such as redirections

keywords can have redirections, almost all of them do, or at least the
structures they define do: all of for/while/until/case/()/{} can be redirected
(subshells aren't implemented with keywords, but with operators, but the
effect is the same for that particular operator pair in this regard).
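
The point about redirecting the structures themselves can be seen in a short sketch. The function name redir_demo and the scratch file path are made up for illustration; everything else is ordinary shell syntax.

```shell
# Each compound command below carries its own redirection; no utility
# name appears at the point where the redirection is attached.
redir_demo() {
    out=${TMPDIR:-/tmp}/redir-demo.$$     # hypothetical scratch file

    for i in 1 2; do echo "for $i"; done > "$out"
    { echo "brace group"; } >> "$out"
    ( echo "subshell" ) >> "$out"
    n=0
    while [ "$n" -lt 1 ]; do echo "while"; n=$((n + 1)); done >> "$out"
    case x in x) echo "case" ;; esac >> "$out"

    cat "$out"
    rm -f "$out"
}

redir_demo
```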

  | and multiple utilities,

Sorry, don't understand what that means.

  | pipelined or not

Where a program structure introduced by a keyword fits in the grammar
is entirely up to the grammar to determine - and provided it isn't made
ambiguous by so doing, it can have different structures in different
circumstances if desired.

But a utility can only ever be an element in a pipeline, we have no idea
what utility is to be invoked until after the pipeline structure has been
parsed.

  | and requires no grammar changes.

That is certainly correct.

I suspect a large part of the problem might be because of the limits of
the current implemented time reserved word.  That's horrid, and most likely
only in the form it is because it was copying csh, and for some reason,
apparently a need was felt to make the syntax compatible with csh's syntax,
which was a patently 

Re: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-02-25 Thread shwaresyst

The thing is, various shells have implemented time as keyword, so this bug is 
trying to get the standard to reflect actual practice that ignored that 
Rationale. The debate is more some things allowed to utilities aren't allowed 
for keywords, and vice versa, so there are advantages to both.

As another option, if an approach is to be invented, the utility could change 
from being a regular utility to a special built-in, because then it is fairly 
simple to implement -b(egin) and -e(nd) switches so use of the utility can 
bracket arbitrary amounts of script, with the shell being tasked to maintain an 
internal variable for the start or elapsed time, like the read utility can set 
visible variables. This method adds capability not available to keywords, such 
as redirections and multiple utilities, pipelined or not, and requires no 
grammar changes. A time format could be an argument to the -b switch, getting 
rid of any dependence on a shell variable being set or not too. It has the 
advantages of both other approaches, in other words, and more.
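
To make the bracketing idea concrete, it can be roughly approximated with ordinary functions. This is only a sketch under stated assumptions: timer_begin and timer_end are made-up names, not the proposed -b and -e switches; "date +%s" is assumed to be supported (it is widespread, but was not required by the standard at the time of this thread); and only whole-second wall-clock time is reported, with no protection against the clock being changed in between.

```shell
# Hypothetical stand-ins for the proposed 'time -b' / 'time -e'.
timer_begin() {
    _timer_start=$(date +%s)    # seconds since the Epoch (assumed supported)
}

timer_end() {
    _timer_now=$(date +%s)
    # Mimic the 'real' line of 'time -p', whole seconds only.
    echo "real $((_timer_now - _timer_start))" >&2
}

timer_begin
for f in a b c; do : "$f"; done | cat    # any amount of script, pipelines included
timer_end 2>&1 | sed 's/^/timing: /'     # the report itself can be redirected
```

A built-in would keep the start time in shell-internal state rather than in a visible variable such as _timer_start, which is the part an emulation like this cannot reproduce.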
On Tuesday, February 25, 2020 Robert Elz  wrote:
    Date:        Tue, 25 Feb 2020 10:08:53 +
    From:        Geoff Clare 
    Message-ID:  <20200225100853.GA7003@lt2.masqnet>

  | In particular, ripping out most of the pipeline changes

Yes, that part is obvious, and I am not sure you need "most", all of
it should simply go.  No grammar changes are required at all.

But:
  | and adding the time reserved
  | word on top of the pipefail changes instead.

That is what cannot be done.  We cannot at this late stage reverse the
decision of the original standard not to make time a required reserved
word - making it even an unspecified reserved word will be a substantial
reversal of the earlier decision, and if there was any way that we could
implement the original intent (or most of it) of the original decision
without doing that, I wouldn't be in favour of that either.  But right
now I don't see one.

All that is needed to resolve the issues is to make time unspecified
reserved (what I have been calling reserved reserved - a word without
defined meaning that scripts should not use it), which means that it
cannot be safely used as an alias or function name (which is the right
thing to do as those don't work in shells where time is a reserved
word).

That would be sufficient, though the time utility should probably also
be dropped if it was done that simply, as there would be no defined way
to use it anyway.  Which of course doesn't mean anyone would delete it
from their distributions.

Better would be to specify just enough of the time reserved word, with
language like

    If the unspecified reserved word time is implemented as a
    reserved word, then when used with a simple command, and without
    any redirections (including pipelines) it must operate as specified
    for the time utility.

    When redirections are present, or when used as an element of
    a pipeline, the results are unspecified.

There can also be added app notes, etc, showing how to use the time
utility in ways that seem to be unspecified, by quoting, by embedding it
in { }, ... and making clear some ways that are not specified to give
any particular result.    And there should be plenty about this mess
added to the rationale.

We absolutely do not, in fact, must not, specify anything at all about
how a time reserved word should work.  That means there is no need to
waste time discussing nonsense like which format characters in TIMEFORMAT
mean what, as TIMEFORMAT won't even be mentioned.

None of this depends upon there being a proposal for some new way to
do what the time reserved word does in shells that have it.  They will
still have it after all (just be conformant in this area after the changes,
if we get it all correct), and scripts will still not be able to rely upon
it existing, as other shells don't have a reserved "time" - all that is the
correct result, as this is the current state of the world.

We are not required to provide such a solution - it would be nice if
one can be worked out, and agreed, and then implemented enough to be
standardised, but all of that will take a long time - no way it is going
to be ready for issue 8.

  | It's a work in progress but you can see what we have so far in the
  | etherpad at https://posix.rhansen.org/p/bug267

Yes, I know, I have been watching (sporadically).  That is what I have
been commenting on.  Sorry if that was not clear.  I haven't looked again
since you said (or the minutes said, or something) that this issue was
being delayed for a month, as I am assuming that it isn't currently being
worked upon (in the conferences).

kre



Re: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-02-24 Thread shwaresyst

I disagree with the notion of using names with colons, especially for new 
utilities, mainly because colon is the character used as path separator in PATH 
and there may be usage ambiguities resulting from this overload. I'd be more in 
favor of implementing labels with another character entirely as distinguishing 
definer, such as caret, and reserving that instead.
On Monday, February 24, 2020 Robert Elz  wrote:
    Date:        Mon, 24 Feb 2020 15:27:22 -0800
    From:        Nick Stoughton 
    Message-ID:  


  | XRAT states "The restriction on ending a name with a <colon> is to allow
  | future implementations that support
  | named labels for flow control; see the RATIONALE for the break built-in
  | utility.",

Ah.  Thanks for that.  I missed that part, and certainly never thought to
look at the pages for break ...

But we can even keep that possibility (as unimplemented as it is) alive
(I guess it sounded like a good idea at the time ... it still does, just
not so good as to be worth spending any effort implementing), by only allowing
such labels to be a NAME with an appended ':'.  Since that is all new
(fictional) usage, we can restrict it however we like...

Then we have the entire remaining name space (anything that is not a NAME)
with a colon appended to use to invent new reserved words as needed, by
making them something that is not a NAME (anything really) with a ':' appended
(it just needs to not require quoting).

One obvious example would be to end with two colons eg: statz::  or we
could do away with the s/s/z/ stuff as no longer being needed, and just
use  stats::  (I still prefer not to imply only timing related statistics
should be available by using anything suggestive of that).

kre



Re: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-02-24 Thread shwaresyst

If you're going to go with 'z' suffix, "statz" is an option too; for 
session/process statistics or "state of the machine".
On Monday, February 24, 2020 Scott Lurndal  wrote:
On Mon, Feb 24, 2020 at 09:23:16AM +, Geoff Clare wrote:
> Chet Ramey  wrote, on 16 Feb 2020:
> >
> > On 2/15/20 9:42 PM, Robert Elz wrote:
> > 
> > > Let's design & implement something better, and then have that (eventually)
> > > standardised, rather than compromising the standard with this horror.  
> > > There
> > > is of course, no reason that the shells that have chosen to copy ksh's
> > > extension cannot continue to do so, no matter how ugly and difficult it
> > > is to do.
> > 
> > OK. Let's start with a proposal. The existing syntax will have to stick
> > around for backwards compatibility, of course.
> 
> In case you two don't read the teleconference minutes, in Thursday's
> meeting we decided to postpone further work on bug 267 for a month to
> allow you time to work on this new proposal.
> 
> > (The simplest proposal is to replace the ksh/bash/mksh/zsh(?) `time' with
> > `ptime' and go from there.)
> 
> The new name needs to be something that is unlikely to be in use by
> applications or users already.  I think ptime is not a good candidate,
> as it is a likely choice for an existing script, function or alias name.
> 

System V had 'timex';  perhaps 'timez' would be a good candidate name?



RE: [Online Pubs 0001326]: Superfluous punctuations

2020-02-23 Thread shwaresyst

The parentheses are there to indicate all those are interfaces, without the 
wordier "last msgsnd interface call", or similar, in those lines.
On Sunday, February 23, 2020 Austin Group Bug Tracker  
wrote:

The following issue has been SUBMITTED. 
== 
https://www.austingroupbugs.net/view.php?id=1326 
== 
Reported By:                dannyniu
Assigned To:                
== 
Project:                    Online Pubs
Issue ID:                  1326
Category:                  Base Definitions
Type:                      Error
Severity:                  Editorial
Priority:                  normal
Status:                    New
Name:                      DannyNiu/NJF 
Organization:                
User Reference:              
URL:                        basedefs/sys_msg.h.html 
Section:                     
== 
Date Submitted:            2020-02-24 04:10 UTC
Last Modified:              2020-02-24 04:10 UTC
== 
Summary:                    Superfluous punctuations
Description: 
In the descriptions for the msqid_ds structure, there are several instances
of "()." occurring, one for each member of the structure. These are probably
errors in Troff source code or macro packages. 
Desired Action: 
Change


pid_t          msg_lspid  Process ID of last msgsnd().
pid_t          msg_lrpid  Process ID of last msgrcv().
time_t          msg_stime  Time of last msgsnd().
time_t          msg_rtime  Time of last msgrcv().


to 


pid_t          msg_lspid  Process ID of last msgsnd.
pid_t          msg_lrpid  Process ID of last msgrcv.
time_t          msg_stime  Time of last msgsnd.
time_t          msg_rtime  Time of last msgrcv.
== 

Issue History 
Date Modified    Username      Field                    Change              
== 
2020-02-24 04:10 dannyniu      New Issue                                    
2020-02-24 04:10 dannyniu      Name                      => DannyNiu/NJF    
2020-02-24 04:10 dannyniu      URL                      =>
basedefs/sys_msg.h.html
2020-02-24 04:10 dannyniu      Section                  =>     
==




Re: Re: Re: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-02-18 Thread shwaresyst
Strictly from the syntax, yes, if what is before the lparen is a valid NAME. 
Practically, I'd expect a 'keyword redefinition attempt' message rather than 
'syntax error', because "time()" does meet the requirements of the syntax, but 
the standard doesn't require this, so silently storing the definition, and 
confusing the user, is the more conforming behavior.


-Original Message-
From: Joerg Schilling 
To: shwaresyst ; kre 
Cc: gwc ; austin-group-l 
Sent: Tue, Feb 18, 2020 12:08 PM
Subject: Re: Re: [1003.1(2004)/Issue 6 267]: time (keyword)


shwaresyst wrote:

> Don't see why, lparen, like "=", is not a char that stops collection of a
> token body; the function name is extracted, supposedly, when the parser
> finds the lparen rparen pair; it is used to start a background job only
> when it is the first character of the token body.

Do you expect any function name to be accepted when a function is defined?

I am not sure whether this is a good idea, as a function that uses the name
of a reserved word will never be executed and this may cause more amazement
to the user than an error message that flags the function definition.

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/



Re: Re: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-02-18 Thread shwaresyst


Don't see why, lparen, like "=", is not a char that stops collection of a token 
body; the function name is extracted, supposedly, when the parser finds the 
lparen rparen pair; it is used to start a background job only when it is the 
first character of the token body.

-Original Message-
From: Robert Elz 
To: shwaresyst 
Cc: Joerg.Schilling ; gwc 
; austin-group-l 
Sent: Tue, Feb 18, 2020 11:34 AM
Subject: Re: [1003.1(2004)/Issue 6 267]: time (keyword)


    Date:        Tue, 18 Feb 2020 15:26:43 + (UTC)
    From:        shwaresyst
    Message-ID:  <2085395192.4579856.1582039603...@mail.yahoo.com>

  | It shouldn't matter whether a token prefix matches a reserved word or not;
  | because the lparen is not separated by a  function definition should
  | be the "best, longest" match production

I'm not quite sure what to say to that, other than that that interpretation
of how things work must be from some other shell entirely, as it is not
even close to how POSIX shells work.

kre



Re: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-02-18 Thread shwaresyst

It shouldn't matter whether a token prefix matches a reserved word or not; 
because the lparen is not separated by a  function definition should be 
the "best, longest" match production that it is interpreted as, and so wouldn't 
be a syntax error. It's on use that keyword recognition would hide whatever 
definition was done, and which might confuse end users, but it appears those 
shells are buggy in not allowing it.
On Tuesday, February 18, 2020 Robert Elz  wrote:
    Date:        Tue, 18 Feb 2020 11:31:54 +0100
    From:        Joerg Schilling 
    Message-ID:  <5e4bbd1a.ixp7vvhraxdcl4i4%joerg.schill...@fokus.fraunhofer.de>


  | There is an official and documented way already, call:
  |
  |     time -p command1 | command2

That's nice, but in a POSIX compatible shell (the FreeBSD shell in
this case)

fbsh $ time() { command time -p "$@"; }
fbsh $ time sleep 1
real        1.00
user        0.00
sys          0.00
fbsh $ 

Also works in dash, yash, nbsh, and zsh

But in ksh93, mksh, bosh, pdksh, and bash, we get (variations of)

bosh $ time() { command time -p "$@"; }
syntax error: `)' unexpected
bosh $ 

That's because time there is a reserved word, and as written that
looks to be an attempt to time the empty sub-shell (which is a
syntax error, correctly).
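
A script that has to run on both groups of shells could probe for this difference before relying on it. The following is a sketch only; the function name probe_time_word is made up, and the probe simply checks whether a function definition named time is accepted.

```shell
# Report how this shell treats the word 'time' in a function definition.
# The probe runs in a subshell so that a syntax error raised by eval
# cannot terminate the calling shell.
probe_time_word() {
    if ( eval 'time() { :; }' ) 2>/dev/null; then
        echo function
    else
        echo reserved
    fi
}

probe_time_word
```

On the shells listed above as accepting the definition this should report "function"; on those that treat time as a reserved word it should report "reserved".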

  | The time utility in POSIX requires to use the -p option to be specified.
  | Otherwise, the results are unspecified.

As written above, the time utility was called with the -p option.

kre




RE: Is there an atomic compare-swap function/routine in the standard?

2020-02-14 Thread shwaresyst

Presently, no. There are operations, such as with semaphores, where this sort 
of functionality might be used but has been left unspecified as many CPUs don't 
support exchanges; for most older ones only loads and stores are atomic, and 
the volatile modifier and sig_atomic_t type are geared to be used this way. 
Without direct hardware support, on those systems some aspects of C11's 
stdatomic interfaces get a performance and code size penalty to emulate 
robustly; possibly why it's an optional header entirely.
On Friday, February 14, 2020 Danny Niu  wrote:
As asked, is there a CMPXCHG-like function in the standard? 

I tried looking for keywords such as cmp, comp, ch, but 
nothing turned up in the system interfaces list. 

The C language introduced atomic functions in C11 though. 




Re: Environment of expansions and visibility of side-effect assignments

2017-07-25 Thread SHwareSyst
By P2366, L75534, redirections are performed in a subshell when no command
name results, such as one affecting an entire compound command... It goes
on that redirections otherwise may fail as part of the current execution
environment, not the pending command environment. Maybe it needs to be more
explicit, but from that I construe it as when there is a command name
redirects are expected to modify the current execution environment and the
utility environment inherits them.
 
 
In a message dated 7/25/2017 11:38:56 A.M. Eastern Daylight Time,  
k...@munnari.oz.au writes:

Date:Tue, 25 Jul 2017 17:12:28  +0200
From:Joerg Schilling  
Message-ID:   
<59775fdc.mbda4ty+ekbd7zlj%joerg.schill...@fokus.fraunhofer.de>


| Given that Robert came up with this kind of tests:
| 
|  unset X 
| cat <<-EOF 
| ${X=2} 
| EOF  
| echo "${X-1}" 

That was because I rewrote the way the NetBSD sh does here doc expansions
a year or so ago, and I made those happen in the sub-shell environ, which
I believed then (and still believe is correct now) is where all redirections
should be performed - which currently includes the expansion part (according
to the spec, as it is currently - I am not sure I see any particularly
good reason why it must be that way however.)
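
The test quoted above can be wrapped as a small probe to see where a given shell performs the here-document expansion. This is a sketch; the function name heredoc_probe is made up, and <<EOF is used instead of <<-EOF only to avoid depending on leading tabs.

```shell
# First output line: the here-document body after expansion.
# Second output line: whether the ${X=2} assignment survived into the
# shell that runs the following command.
heredoc_probe() {
    unset X
    cat <<EOF
${X=2}
EOF
    echo "${X-1}"
}

heredoc_probe
```

The first line is 2 either way. A second line of 2 means the expansion ran in the current execution environment; a second line of 1 means it ran in a subshell environment, as described for the NetBSD sh above.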

So we have tests for that, in our test suite ... I ran that against bosh,
and sent the (relevant) results to Joerg (many of the tests check NetBSD
specified, POSIX unspecified, behaviour, like ${} and more, that those
behaved differently than we expect, I just ignored, similarly for tests of
NetBSD extensions.)

And of course, when any application behaves differently when it uses vfork()
than it does using fork, perhaps just compiled with -Dvfork=fork then I
think that's always a bug.

kre





Re: Environment of expansions and visibility of side-effect assignments

2017-07-25 Thread SHwareSyst
I disagree about $M, as I see those sub-bullets applying during evaluation
of that command or assignments only, in that order of evaluation isn't
guaranteed except as a consequence of expansion nesting, and so a reference
may get either a pre- or post-assignment value; but by the definition of the
${=} and ${:=} expansions the assignments should all be visible when
evaluation of the next command begins, as the specified behavior, since no
nesting of command substitutions is done in the example.
 
 
In a message dated 7/25/2017 10:12:37 A.M. Eastern Daylight Time,  
g...@opengroup.org writes:

Robert  Elz  wrote, on 25 Jul 2017:
>
> Given the  following command sequence ...
> 
> VAR=${M=bla}  /bin/echo ${N=bla} > /tmp/JUNK-${Q=bla}
> echo  $M:$N:$VAR:$Q
> 
> what is the "echo" on the 2nd line supposed to  print?
> 
> It is clear (I think) that expanding VAR='' and N=bla  is correct there,
> and all shells I tested did that,
>  
[...]
> 
> The most common result is
> 
>bla:bla::bla
> 
> which is at least consistent, but to me  incorrect, I believe from reading
> the spec, that the var-assigns and  redirects are supposed to be evaluated
> in the context of the command  about to be executed, which would mean that
> the side-effects from  evaluating them would not be visible in the parent
> shell (which later  runs the 2nd command - at least not when, as in this
> case, the command  is clearly not a builtin utility or function.)

The $M value is  explicitly unspecified (2.9.1, second sub-bullet of the
second bullet  item).

For $Q, I can't find any clear requirement about which  environment the
expansions within the "word" of a redirection associated  with a
non-built-in utility have to be done in.  The file descriptor  operations
must not affect the current shell environment, but I don't think  anything
forbids the expansions affecting it.  However, in the  no-command-name
case (and without using "exec") 2.9.1 clearly says  redirections "shall be
performed in a subshell environment".  If any  of the shells which set Q
in the above command also set it when just  executing:

> /tmp/JUNK-${Q=bla}

then that's a bug (or at  least a non-conformance) in those shells.

-- 
Geoff Clare  
The Open Group, Apex Plaza, Forbury Road,  Reading, RG1 1AX, England




Re: Environment of expansions and visibility of side-effect assignments

2017-07-25 Thread SHwareSyst
Only the explicit assignment preceding the command_word is evaluated in the
new context, and uses the value from the caller's context temporarily. The
expansions all occur in, and assignments affect, the caller's context
first. So I agree VAR should stay an empty string or unset, but M, N, and Q
getting set to bla is the conforming behavior, and this is what the most
common result reflects. This is one of the reasons, afaik, that redirect
evaluation can be swapped with assignment evaluation in XCU 2.9.1, because
actual value assignment may need to be deferred to the new context after
expansions related to the redirects are evaluated in the caller's context.

If you add $VAR before the '>', it should put bla bla into JUNK-bla, not a
single bla; if you want to check which value the first echo sees.
 
 
In a message dated 7/24/2017 1:51:37 P.M. Eastern Daylight Time,  
k...@munnari.oz.au writes:

Given  the following command sequence ...

VAR=${M=bla} /bin/echo  ${N=bla} > /tmp/JUNK-${Q=bla}
echo  $M:$N:$VAR:$Q

what is the "echo" on the 2nd line supposed to  print?

It is clear (I think) that expanding VAR='' and N=bla is correct  there,
and all shells I tested did that,

Beyond that, little consistency - there was a clear winning result, but
not one that is in accordance with the way I read the spec.

The NetBSD sh (the one I work on) is in that group - I have long had an
item on my todo list to fix this, as I consider us to be broken there.

From my reading of the spec, what I expect the output is intended to be
is
:bla::

Of the shells I tested, only zsh managed that (zsh with no options, other
than zsh -c '...'). That would make zsh (based just on this one test)
the only possible candidate (among the shells I tested) as being posix
conformant!

The most common result is

bla:bla::bla

which is at least consistent, but to me incorrect, I believe from reading
the spec, that the var-assigns and redirects are supposed to be evaluated
in the context of the command about to be executed, which would mean that
the side-effects from evaluating them would not be visible in the parent
shell (which later runs the 2nd command - at least not when, as in this
case, the command is clearly not a builtin utility or function.)

The result
bla:bla::
was also observed, which looks closer to correct, but inconsistent, I cannot
find any way to read the spec in which that can be the result.
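
To repeat the comparison on another shell without leaving files behind in /tmp, the command sequence can be wrapped as below. This is a sketch: the function name assign_probe is made up, and mktemp -d is assumed to be available (it is common but not required by the standard).

```shell
# Runs the command sequence from this message in a scratch directory and
# prints the resulting M:N:VAR:Q string.
assign_probe() {
    unset M N Q VAR
    dir=$(mktemp -d) || return 1    # assumption: mktemp -d exists
    VAR=${M=bla} /bin/echo ${N=bla} > "$dir/JUNK-${Q=bla}"
    echo "$M:$N:$VAR:$Q"
    rm -rf "$dir"
}

assign_probe
```

The first and last fields of the printed string are the ones that differ between the results reported above.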

I appreciate that this kind of thing is not common (which is why it is
just an entry on my todo list, not even elevated to the status of being
a current project to consider - just something for the future when I run
out of more useful things to do) but we should really know what should
happen in cases like this.

kre




Re: Should "exec" run a shell function?

2017-07-20 Thread SHwareSyst
The grammar has 'program' now too, so it doesn't necessarily follow that a
script is precluded, I'd think... The exec() would be again of a shell
instance, or equivalent. The wording has been, to me, more "replace with what
is specified by simple_command as operand", whether that is a binary,
separate script, built-in, or function body - all the identifiers valid when
simple_command is valid for starting a subshell or not that normally would
assign a value to $? after they finish in a subshell. The intent may have been
limiting it to utility binaries, by some, as originally even many keywords
were paged in externals, but I wouldn't draw a conformance distinction on
any broader interpretation.
 
 
In a message dated 7/20/2017 5:31:33 A.M. Eastern Daylight Time,  
g...@opengroup.org writes:

Martijn  Dekker  wrote, on 16 Jul 2017:
>
> A test  script:
> 
> #! /bin/sh
> true() {
>  echo "exec runs function"
> }
> exec true
> 
> On zsh,  pdksh and mksh, 'exec' runs the shell function. On all other
> POSIX  shells (at least bash, dash, yash, ksh93, FreeBSD sh), it runs the
>  external 'true' utility.
> 
> Which behaviour is correct?

I  just noticed something not touched on in the previous discussion.

The EXIT STATUS section for exec says:

    If command is specified, exec shall not return to the shell;
    rather, the exit status of the process shall be the exit status of
    the program implementing command, which overlaid the shell.

The use of "program" and "overlaid the shell" here means that the
standard clearly does not allow the execution of built-in utilities and
functions.

Of  course, we could still choose to treat this as a defect in  the
standard.
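
For anyone wanting to check which of the two behaviours a given shell has, the test script quoted above can be wrapped in a subshell so that 'exec' replaces only the subshell and the result can be captured. This is only a sketch; the names probe_exec and result are made up for illustration, and it assumes an external 'true' utility is reachable via PATH.

```sh
#!/bin/sh
# Sketch: report whether 'exec' runs a shell function that shadows
# an external utility. probe_exec and result are illustrative names.

probe_exec() {
    # The subshell confines both the function definition and the
    # effect of 'exec' (which replaces only the subshell).
    (
        true() {
            echo "exec runs function"
        }
        exec true
    )
}

# The external 'true' writes nothing, so any captured output must
# have come from the function body.
result=$(probe_exec)

if [ -n "$result" ]; then
    echo "exec ran the shell function"
else
    echo "exec ran the external utility (or found none)"
fi
```

Run under each shell of interest, e.g. with "dash probe.sh" or "mksh probe.sh", where probe.sh is whatever file name the sketch is saved under.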

-- 
Geoff Clare  
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England




Re: quoting in shell parameter expressions

2017-06-20 Thread SHwareSyst
I thought we also touched upon this when discussing the addition of $'...',
with the resolution that the arguments were WORDs in the grammatical sense
and were to be scanned that way with regard to quoting. This would supersede
that clause in 221, with the standard choosing between conflicting
interpretations instead of leaving it unspecified.
 
 
In a message dated 6/20/2017 9:37:42 A.M. Eastern Daylight Time,  
g...@opengroup.org writes:

Joerg Schilling wrote, on 20 Jun 2017:
>
> I would like to get a confirmation on how this expression:
> 
> "${xxx-"a b c"}"
> 
> is to be understood.

This was discussed when we worked on bug 221, and is made explicitly
unspecified by the resolution of that bug (to be applied in Issue 8).

See http://austingroupbugs.net/view.php?id=221#c399
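
As a rough way to see what a particular shell does with the expression, the number of fields it produces can be counted and compared with the form that has no nested double-quotes, which expands to a single field. This is a sketch only; count_args, ambiguous and portable are names invented for the example, and the first count is deliberately not predicted here since the result is unspecified.

```sh
#!/bin/sh
# Sketch: count the fields produced by the nested-quote form and by
# the form without inner quotes. All names here are illustrative.

count_args() {
    echo "$#"
}

unset xxx

# Unspecified per the bug 221 resolution: shells may differ here.
ambiguous=$(count_args "${xxx-"a b c"}")

# No inner double-quotes: the whole expansion is quoted, one field.
portable=$(count_args "${xxx-a b c}")

echo "nested quotes:    $ambiguous field(s)"
echo "no nested quotes: $portable field(s)"
```

If a script needs the default to be a single field, the second form avoids the question entirely.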

-- 
Geoff Clare  
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England




Re: IEEE 1003.1b-1993 standard

2017-06-15 Thread SHwareSyst
Thanks, Joe; that will be useful.
 
 
 
In a message dated 6/15/2017 4:44:49 P.M. Eastern Daylight Time,  
gw...@raytheon.com writes:

 
Joe M.

I asked the IEEE if they could provide 1003.1b-1993 to The Austin Group,
to support the revision of 1003.1-2008, and the IEEE have provided
1003.1b-1993 as a huge PDF, too large to mail. It is 616 pages, and weighs in
at 40.9 Mbytes.

Here is the Google Drive link: P1003.1b-1993.pdf
(https://drive.google.com/a/ieee.org/file/d/0By0ZpscE3nBvNlFtMXFZSjBqZTg/view?usp=drive_web)

Use the Download button in the upper right corner of the screen to get the
actual PDF.

All thanks to the IEEE.

Joe G


