Re: Thread queue position after unlocking PRIO_PROTECT mutex

2022-10-11 Thread shwaresyst via austin-group-l at The Open Group
Re: the last bit, I think that's to account for the case where an 
implementation may asynchronously allow an application thread to modify the base 
priority while another thread is blocked on it; it doesn't mean the example 
sched_setparam() call has to be in the same thread. While just adding the thread 
to the tail of the new queue's list may be fastest to accomplish, I think 
inserting it at the head, or at a spot where the relative time to reach the head 
of the queue is nearest to that of its old queue position, is more the desired 
behavior.
 
 
On Tue, Oct 11, 2022 at 4:41 AM, Geoff Clare via austin-group-l at The Open Group wrote:

I wrote, on 10 Oct 2022:
>
> I'm trying to understand the second sentence in this paragraph on the
> pthread_mutexattr_getprotocol() page:
> 
>    While a thread is holding a mutex which has been initialized
>    with the PTHREAD_PRIO_INHERIT or PTHREAD_PRIO_PROTECT protocol
>    attributes, it shall not be subject to being moved to the tail
>    of the scheduling queue at its priority in the event that its
>    original priority is changed, such as by a call to sched_setparam().
>    Likewise, when a thread unlocks a mutex that has been initialized
>    with the PTHREAD_PRIO_INHERIT or PTHREAD_PRIO_PROTECT protocol
>    attributes, it shall not be subject to being moved to the tail of
>    the scheduling queue at its priority in the event that its original
>    priority is changed.
> 
> The first sentence is no problem. It's pointing out that items 7 and 8a
> in the description of SCHED_FIFO don't apply to this change of the
> thread's normal priority (since it isn't currently executing at that
> priority).
> 
> But when a thread unlocks a PRIO_PROTECT mutex, in the simple case
> where locking the mutex caused its priority to be raised and unlocking
> it causes its priority to revert to its original value, it has to be
> moved from the queue for the higher priority to the queue for its
> original priority, so it doesn't make any sense to me that the text
> above talks about moving within a queue, and why does it say "in the
> event that its original priority is changed"?

After further reading, I'm not sure it is in a queue at all when it
unlocks the mutex. The rationale implies that it is:

    The process at the front of the ready list is executed until it
    exits or becomes blocked, at which point it is removed from the list.

(it says "process" not "thread", but I think that's just because it is
out of date compared to the normative text it is commenting on).
However, the normative text says the queue is "a thread list that is
ordered by the time its threads have been on the list without being
executed".  The use of "without being executed" here implies that the
thread at the head of the list is removed from the list when it starts
execution, not left there until it blocks or exits.

> The only way I can get any sense out of it is to take it as meaning
> that when the thread moves from the queue for the higher priority to
> the queue for its original priority, it should be placed at the head
> not the tail, which seems reasonable, but it's very unclear.

If the thread is not on a queue at all while it is running, then the
point of the second sentence in the paragraph I originally quoted is
presumably to stop item 7 in the SCHED_FIFO description from requiring
that the thread is placed on a queue when it unlocks the mutex.  I.e. it
keeps running, but now at its original priority (unless unlocking the
mutex makes a higher priority thread runnable, in which case it would be
pre-empted by the higher priority thread).

However, the last bit "in the event that its original priority is changed"
still makes no sense.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

  


Re: Can struct sockaddr_un.sun_path be a flexible array member?

2022-07-16 Thread shwaresyst via austin-group-l at The Open Group
Short answer, no. It was specified that way, erroneously, in the <sys/un.h> 
header because there wasn't an agreed-upon symbolic constant for the size, and I 
believe that notation was the convention before the C standard adopted flexible 
array members. While an implementation should declare a symbolic constant, some 
have just used an integer constant instead, so the size is left unspecified.
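Because sun_path is in practice a fixed-size array, an application can size-check a path against it with sizeof, as the bind() example does (sizeof cannot be applied to a flexible array member at all). A minimal C sketch; the helper name make_unix_addr is hypothetical, and the static assertion simply encodes the normative sockaddr_storage requirement quoted below.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* The standard requires sockaddr_storage to accommodate sockaddr_un. */
_Static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_un),
               "sockaddr_storage must be large enough for sockaddr_un");

/* Hypothetical helper: fill *addr for an AF_UNIX socket named by path.
 * sizeof addr->sun_path is the array's real capacity because sun_path
 * is a fixed-size member. Returns 0, or -1 with errno set to
 * ENAMETOOLONG if path plus its terminating NUL would not fit. */
int make_unix_addr(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    if (len >= sizeof addr->sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(addr->sun_path, path, len + 1);   /* copy including the NUL */
    return 0;
}
```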
 
 
On Sat, Jul 16, 2022 at 1:13 PM, John Scott via austin-group-l at The Open Group wrote:

Hi list,

I do not represent any implementations, I ask this merely as an
application developer who has asked around.

Can .sun_path be a flexible array member? The standard says it has
unspecified size, but also normatively says
"The sockaddr_storage structure defined in <sys/socket.h> shall be large
enough to accommodate a sockaddr_un structure." This doesn't clear
things up unless we have a notion of whether "size of a structure"
includes its flexible array member, and even if that is true, whether
including a flexible array member on sockaddr_storage (albeit one which
a portable application wouldn't know how to access) would satisfy this.

The example for bind() uses sizeof() on .sun_path, suggesting the answer
to my question is "no," but examples aren't normative.

If the standard could say whether this is permitted more clearly, that
would make me happy.

Thanks for your attention to my inquiry,
John
  


Re: POSIX msgfmt: effect of LC_CTYPE on PO file parsing

2022-05-11 Thread shwaresyst via austin-group-l at The Open Group
My understanding is that this is for files that do not specify a separate 
codeset at all, and for interpreting a file that does specify one up to the 
line with the codeset directive, so it needs to be there (for now; maybe not in 
future). Perhaps it should be made more explicit that a codeset change takes 
effect starting with the next directive, rather than applying to a rewind and 
reread of the whole file.
 
 
On Wed, May 11, 2022 at 7:30 PM, Bruno Haible via austin-group-l at The Open Group wrote:

https://posix.rhansen.org/p/gettext_draft
Line 960

"Do we need to say this isn't used for message strings, only for parsing
 the .po file?"

The .po file format has a mechanism for specifying the codeset of the
PO file. See line 1009. Therefore LC_CTYPE is *not used* for the
interpretation of the input .po file, only for producing diagnostics
(in combination with the LC_MESSAGES category).
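To make the in-file codeset mechanism concrete, here is a rough C sketch that pulls the codeset name out of a PO header entry. It assumes the GNU gettext header convention ("Content-Type: text/plain; charset=..."); whether the POSIX draft's mechanism at line 1009 uses exactly this form is an assumption, and the helper name po_header_charset is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the header entry of a PO file (the msgstr
 * of the empty msgid), copy the value after "charset=" on its
 * Content-Type line into buf. Assumes the GNU gettext header format.
 * Returns 0 on success, -1 if no charset is found or buf is too small. */
int po_header_charset(const char *header, char *buf, size_t bufsize)
{
    const char *ct = strstr(header, "Content-Type:");
    if (ct == NULL)
        return -1;

    const char *eol = strchr(ct, '\n');
    const char *cs = strstr(ct, "charset=");
    if (cs == NULL || (eol != NULL && cs > eol))
        return -1;                      /* charset not on this line */

    cs += strlen("charset=");
    size_t n = strcspn(cs, " \t;\n");   /* value ends at a separator */
    if (n == 0 || n >= bufsize)
        return -1;
    memcpy(buf, cs, n);
    buf[n] = '\0';
    return 0;
}
```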



  


Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread shwaresyst via austin-group-l at The Open Group
It appears to me the set -b wording needs updating, to clarify that "may remove 
the job's process ID" is intended to exclude the blocking circumstances listed. 
Since it's a "may", not a "shall", whether those exclusions are handled properly 
now is more a quality-of-implementation issue than a conformance issue.
 
 
On Fri, Apr 29, 2022 at 10:40 AM, Geoff Clare via austin-group-l at The Open Group wrote:

I've been gradually making progress on bug 1254 as a background task.
However, today it threw a last curve ball when I was working on an
update to the description of set -b ...

That description includes this near the end:

    When the shell notifies the user a job has been completed, it may
    remove the job's process ID from the list of those known in the
    current shell execution environment

This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
remain known until:

 1. The command terminates and the application waits for the process ID.

 2. Another asynchronous list is invoked before "$!" (corresponding to
    the previous asynchronous list) is expanded in the current execution
    environment.

Then there is the following in the APPLICATION USAGE for wait:

    Historical implementations of interactive shells have discarded
    the exit status of terminated background processes before each
    shell prompt. Therefore, the status of background processes was
    usually lost unless it terminated while wait was waiting for it.
    This could be a serious problem when a job that was expected to
    run for a long time actually terminated quickly with a syntax or
    initialization error because the exit status returned was usually
    zero if the requested process ID was not found. This volume of
    POSIX.1-202x requires the implementation to keep the status of
    terminated jobs available until the status is requested, so that
    scripts like:
    [...]
    work without losing status on any of the jobs.
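At the system-interface level, the status a shell keeps for its jobs comes from the wait family of functions, and a terminated child's status stays retrievable until the parent waits for it. The following C sketch (not taken from any shell) illustrates that: the child exits immediately, the parent asks much later, and the status is still there. It assumes SIGCHLD is not set to SIG_IGN, since in that case terminated children are not kept for waiting.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code` at once, let it terminate well
 * before the status is requested, then collect it. Assumes SIGCHLD is
 * not ignored (otherwise children are reaped automatically).
 * Returns the child's exit status, or -1 on error. */
int exit_status_after_delay(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);              /* child terminates immediately */

    sleep(1);                     /* the child has long since exited */

    int status;
    pid_t r;
    do
        r = waitpid(pid, &status, 0);
    while (r == -1 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```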

My initial reaction to this was that the above quote from set -b is
likely a left-over from before the decision to disallow the historical
remove-before-prompting behaviour was made.

However, then I spotted that the text from wait, which seems to be an
attempt to justify that decision, first says it was historical
behaviour for *interactive* shells but then talks about the problems
it could cause for *scripts*.  So it seems to me that the
justification does not stand up to scrutiny.

It also appears that dash still implements remove-before-prompting.

There would seem to be two options to resolve this:

A. Uphold the decision to disallow remove-before-prompting.  This
would mean removing the conflicting text from set -b and updating the
justification on the wait page to something that holds water.
(And dash would need to change in order to conform.)

B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
add a third list item (for interactive shells only) and deleting the
above quoted text from the wait page.

I'm particularly interested to get the opinions of shell authors on
this.

-- 
Geoff Clare 

  


Re: how do to cmd subst with trailing newlines portable

2022-02-21 Thread shwaresyst via austin-group-l at The Open Group
The compliance requirement for locales is more about documentation than 
exclusion, so Thorsten is correct: the standard allows what he does. The mere 
fact that localedef can use any encoding model, via suitable charmap data, makes 
this somewhat obvious. An implementation may provide locales using any character 
encoding as long as it documents which of them meet the same requirements as 
the C locale's encoding for the portable character set, and these all map to 
the same wide-character encoding. That subset, effectively embodying the POSIX 
'universe' the standard refers to as "all locales provided", can be expected 
to work with portable code; any others are in the much wider 'universe' of 
unspecified behavior. That subset can be just the POSIX locale, too, and the 
implementation is still conforming.
 
On Mon, Feb 21, 2022 at 12:30 AM, Christoph Anton Mitterer via austin-group-l at The Open Group wrote:

On Fri, 2022-02-18 at 00:35 +, Thorsten Glaser wrote:
> You can have nōn-POSIX locales. For example, in mksh, I have a UTF-8
> mode, but I specify that only the "C" locale attempts POSIX
> conformance.

But that sounds like a violation of POSIX: e.g. if you had a locale 'C'
which encoded '.' as 0x2E and another one which encoded the same
character as something else, you couldn't just say that only the other
locale is non-POSIX; rather, your whole implementation wouldn't be
compliant.

Same if any of the other hard rules are broken... like chars from the
portable charset being just one byte long, etc.



> Switching the locale during shell runtime is not allowed to change
> the way the script is parsed, so the variables etc. are all that is
> permitted to “change”, by means of reinterpretation.

Yes I found that now in the standard and I guess it's more or less
clear when a locale change does apply and when not:

- foo=x
  => clear,... all lexical, change doesn't apply

- printf '%s' 'foo'
  => clear, printf is like a command.. so printf would use the changed
    locale, depending on what the format actually is (which is also a
    bit ambiguous, see https://www.austingroupbugs.net/view.php?id=1562)

- ${var%foo}
  => well,.. semi-clear

- expanding $# or $?
  => in principle, POSIX seems to allow locales that have different
    encoding for the chars from the portable charset
    So in principle, one could ask whether $# and $? give the digits
    from the new locale, when it was changed internally.
    However, POSIX also says, when the portable charset chars are
    encoded differently, and such locales are used, the results are
    unspecified.
    So it doesn't really matter.


> But here you’re lucky again that <newline> has to have the exact same
> encoding across *all* locales supported in one POSIX “universe”, and
> that it must not occur as part of a multibyte encoding in a supported
> locale on the same universe.

Despite that.. and despite the solution that had been discussed here
before, ... having spent quite some thought (and hopefully learned a
bit) about it... I'm still unsure about how to make command
substitution with trailing newlines really 100% portable in any
situation (i.e. locale) allowed by POSIX and especially with any shell
conforming to POSIX.

... (below)



> > But at least, it should still work portably, when doing the
> > LC_ALL=C
> 
> No, absolutely not.
> 
> In all supporta̲b̲l̲e̲ scenarios (i.e. those in which you’re not
> entering unspecified behaviour already anyway), you’ll be safe with:
> 
> x=$(command; echo .); x=${x%.}
> 
> (Or a variant that carries over $?, of course.)
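To make the sentinel trick concrete, the C sketch below models it byte by byte: $(...) removes every trailing newline, so appending "." first protects the real trailing newlines, and ${x%.} then removes the sentinel. This is a model of the shell semantics under the assumption that the value is a plain byte string (exactly the point questioned further down), not how any shell implements them; both function names are hypothetical.

```c
#include <string.h>

/* Model of what $(...) does to captured output: remove all trailing
 * newlines, in place. */
void strip_trailing_newlines(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == '\n')
        s[--n] = '\0';
}

/* Model of x=$(command; echo .); x=${x%.} where `out` holds the bytes
 * written by `command`. `buf` must hold strlen(out) + 3 bytes. */
void capture_with_sentinel(const char *out, char *buf)
{
    strcpy(buf, out);
    strcat(buf, ".\n");               /* echo . writes ".\n" */
    strip_trailing_newlines(buf);     /* only the "\n" after "." goes */
    size_t n = strlen(buf);
    if (n > 0 && buf[n - 1] == '.')   /* ${x%.} drops the sentinel */
        buf[n - 1] = '\0';
}
```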

...

I know you said earlier that you considered using '.' to be enough, but
Chet Ramey, Geoff Clare and others still said the LC_ALL=C switch would
be necessary.

It was brought up before that an implementation would be allowed not to
handle it gracefully if the string was, say: "."... and even if that couldn't form a new character because
of the special properties of '.' ... it could still fail to be
stripped properly.

Your argument was that any shell that fails to do that would have a
bug... but it's unclear whether that's really mandated by the standard.


That's why I've asked before:
> I tried to find out in the standard, what POSIX actually says that
> "${tmp%∈}" operates on: bytes or characters.
> 
> And that seems a bit ambiguous (well, to me at least).
> 
> - In some earlier discussion it was pointed out that shell variables
>  should be strings (of bytes, other than NUL)

If variables are byte strings... (which is also disputed, btw.)...

> 
> - 2.6.2 Parameter Expansion
>  doesn't seem to say what the #, ##, % and %% special forms of
>  expansion work on: bytes or characters
> 
> - 2.13. Pattern Matching Notation says:
>  "The pattern matching notation described in this section is used to
>  specify patterns for matching strings in the shell."
>  => strings... would mean bytes

... and pattern matching notation works on strings (=bytes)...

> 
> - 2.13.1 Patterns Matching a Single 

Re: POSIX gettext() and uselocale()

2022-01-16 Thread shwaresyst via austin-group-l at The Open Group
Historically, gettext domains are process-wide, which makes their use in 
multithreaded applications problematic to begin with. The *_l versions only 
partially address this. The uselocale() interface is included in that list for 
cases where a locale is used both via uselocale() and via one or more of the *_l 
versions: a second uselocale() call with a different locale, made after the 
retrievals, may cause the memory mapping that many implementations use for .mo 
files to be released on the next *_l call. Yes, it is not the call itself that 
causes these releases (or it shouldn't be), but as the root cause, IMHO, it 
should stay in the list.
 
On Sun, Jan 16, 2022 at 4:11 PM, Bruno Haible via austin-group-l at The Open Group wrote:

[First sent on 2021-05-03. Resending because it has not been handled.]

https://posix.rhansen.org/p/gettext_draft
says (line 358):

  "The returned string may be invalidated by a subsequent call to
  bind_textdomain_codeset(), bindtextdomain(), setlocale(),
  textdomain(), or uselocale()."

While in most programs setlocale(), textdomain(), bindtextdomain(),
bind_textdomain_codeset() are being called at the beginning of the
program execution, before any call to gettext(), the situation is
very different for uselocale().

1) uselocale() is meant to have effects ONLY on the thread in which it
  is called.

2) uselocale() is a helper function to implement *_l functions where
  the POSIX standard does not specify them or the system does not have
  them.
  For example, when a program wants to have a function to parse
  a number, recognizing only the ASCII digits and only '.' as decimal
  separator, a reliable way to implement such a function is by calling
  uselocale() with the "C" locale, strtod(), and then uselocale() again
  to switch the thread back to the previous locale.

  If POSIX did not have uselocale(), it would need to provide many
  more *_l functions.
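The strtod() pattern described in item 2 can be sketched in C as follows. The function name parse_c_double is hypothetical; newlocale(), uselocale() and freelocale() are the POSIX.1-2008 interfaces, and only the calling thread's locale is switched and then restored.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Parse a number with "C" locale conventions ('.' as the radix
 * character) regardless of the thread's current locale, by switching
 * this thread to a "C" locale object and back again.
 * Returns 0 on success, -1 on failure or if no digits were parsed. */
int parse_c_double(const char *s, double *out)
{
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    locale_t prev = uselocale(c_loc);   /* affects this thread only */
    if (prev == (locale_t)0) {
        freelocale(c_loc);
        return -1;
    }

    char *end;
    double v = strtod(s, &end);
    uselocale(prev);                    /* switch the thread back */
    freelocale(c_loc);                  /* safe: no longer in use */

    if (end == s)
        return -1;
    *out = v;
    return 0;
}
```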

If the gettext() result may be invalidated by a uselocale() call (in
any other thread!), this would mean that

  ** Programs can use gettext() or uselocale() but not both. **

and - more or less -

  ** Multithreaded programs that use libraries (that may use uselocale())
    cannot use gettext(). **

I think that specifying gettext() to be so restricted is not useful.
It would make more sense to allow concurrent uselocale() calls.

Proposed wording:

  "The returned string may be invalidated by a subsequent call to
  bind_textdomain_codeset(), bindtextdomain(), setlocale(),
  or textdomain()."



  


Re: Future of locale, will there be POSIX.utf-8, what will it bring?

2022-01-07 Thread shwaresyst via austin-group-l at The Open Group
C11 tried to add the minimal support for UTF-8, with the u8 string constant 
prefix, but in a broken manner. C2x provides what can be considered a fix for 
this, but does it in a mostly unusable way from the aspect of supporting 
multiple locale languages. That is why you don't see anything about a Unicode 
enabled locale; the fix enables the C locale to stay unchanged.
Because POSIX is adding the <uchar.h> header, 16- and 32-bit encodings will be 
supported, separate from wchar_t, via the char16_t and char32_t types. How 
UCS-2 and UCS-4, as encodings, map to the wide-character set used by a platform 
is left as a quality-of-implementation issue for the interfaces in that header, 
so how wchar_t is encoded is considered a non-issue.
What's there is adequate to say minimal support for the 3 primary encoding 
forms has been added, imo. While more aspects of Unicode could be considered in 
scope of the C standard, I think a lot has been left out so implementations, or 
standards like POSIX, aren't locked into having to provide things that their 
end users will rarely, if ever, need. 
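As an illustration of the <uchar.h> interfaces mentioned above, this C sketch converts a multibyte string in the current locale's encoding to char32_t units with the C11 function mbrtoc32(). The helper name to_utf32 is hypothetical, and whether char32_t values are actually UTF-32 is indicated by the __STDC_UTF_32__ macro rather than guaranteed.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: convert multibyte string s (current locale's
 * encoding) to char32_t units, writing at most max units to out.
 * Returns the number written, or (size_t)-1 on an encoding error. */
size_t to_utf32(const char *s, char32_t *out, size_t max)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);        /* initial conversion state */
    size_t len = strlen(s), n = 0;

    while (len > 0 && n < max) {
        size_t r = mbrtoc32(&out[n], s, len, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;        /* invalid or truncated sequence */
        if (r == (size_t)-3) {        /* unit stored from earlier input */
            n++;
            continue;
        }
        if (r == 0)                   /* converted a NUL: stop */
            break;
        s += r;
        len -= r;
        n++;
    }
    return n;
}
```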
 
On Fri, Jan 7, 2022 at 1:46 PM, Steffen Nurpmeso wrote:

Hello.

shwaresyst wrote in
 <1494661216.220561.1641574109...@mail.yahoo.com>:
[i resort a bit]
 |  On Thu, Jan 6, 2022 at 3:40 PM, Steffen Nurpmeso via austin-group-l \
 |  at The Open Group wrote:  Hello!
 |
 |I wonder about POSIX.utf-?8, i tried to remember any statement
 |i had read, and Mantis did not show up results.
 |
 |In particular i am interested in whether LC_CTYPE results will
 |bring true Unicode support or not, the reason i am asking is that
 |the upcoming version of my work-box GNU LibC-based (2.34) Linux
 |distribution will provide it like
 |
 |  localedef -i POSIX -f UTF-8 $PKG/usr/lib/locale/C.UTF-8 2> /dev/null || true
 |
 |and then this thing is detected as an UTF-8 locale, but causes
 |three test failures of the MUA i maintain because character set
 |conversion behaves differently.
 |
 |My personal opinion was that POSIX.utf8 will bring the complete
 |range of Unicode characters to at least LC_CTYPE, i wonder about
 |LC_COLLATE, as language matching is, hm, very language specific.
 |The rest not (maybe LC_MESSAGES going for UTF-8 though).
 |
 |Is that approximately correct?

 |The first Issue 8 draft is focusing, afaik, on adding the C1x changes \
 |and Mantis Issue 8 tagged items. The changes to XBD 6, 7, etc., that \
 |will formally add a POSIX UTF8 locale are to be part of the second, \
 |maybe third, draft. This is why you don't see them yet.
 |For maximum compatibility with existing practice the required base \
 |repertoire for this will likely be some subset of UCS-2, plus ISO-6429 \

16-bit characters i do not see in POSIX, going that route would
make impossible implementations which use specific bit patterns in
wchar_t, which, if i recall correctly from 2014 or when i was
looking into the issue, is used by at least the Citrus
implementation of the mb* and w* series for at least some asian
languages.  And more .. but that was not the issue i am concerned
about at the moment anyhow, i personally would assume 8-bit aka
UTF-8 character strings to be predominant in Unix based systems,
they surely are in the predominant ones.  (Even though, i have to
say, UTF-16 aka 16-bit characters do have their value for the
majority of the massively declining number of human languages, and
the older i get the more i think using that as a base is a good
decision.)

 |in full, not the complete range. I've hopes this will be significantly \
 |more than the minimal repertoire of C2x, but it may not as a matter \

That made me look for and download a 2020 draft of ISO C2X, i did
not have a look until now.

 |of deferral to the C standard. It should be left up to implementations \
 |still, in my opinion, how much of the range beyond this base they want \
 |to support as extensions, including UTF16 as an encoding. How the LC_* \
 |categories will be extended to fully support that base repertoire accord\
 |ing to the Unicode requirements hasn't been determined yet either, \
 |but this is the nominal goal. 

And from a glance i do not see anything Unicode-enabled-locale
wise.  UTF-16 specifically i do not see ... as you will have to
convert on input and on output in order to use it in your program,
and then you can very well convert to the transparent wchar_t, or
use the wide I/O series which gives it to you.  Minimizing the
tremendous deficiency that many traditional Unix programs have to
face because the historic string interfaces do not provide proper
functionality to deal with human languages is out of scope is it?

At least it seems as if ISO C2X introduces support for UTF-8 as
a native string representation ... in practice it seems Unix
people use GNU libunicode (which explicitly supports UTF-(32|16|8)
i think) as well as ICU (which i think used UTF-16 internally but
offered improved UTF-8 interface performance by then), so the ISO
standard people were able to 

Re: Future of locale, will there be POSIX.utf-8, what will it bring?

2022-01-07 Thread shwaresyst via austin-group-l at The Open Group
The first Issue 8 draft is focusing, afaik, on adding the C1x changes and 
Mantis Issue 8 tagged items. The changes to XBD 6, 7, etc., that will formally 
add a POSIX UTF8 locale are to be part of the second, maybe third, draft. This 
is why you don't see them yet.
For maximum compatibility with existing practice the required base repertoire 
for this will likely be some subset of UCS-2, plus ISO-6429 in full, not the 
complete range. I've hopes this will be significantly more than the minimal 
repertoire of C2x, but it may not as a matter of deferral to the C standard. It 
should be left up to implementations still, in my opinion, how much of the 
range beyond this base they want to support as extensions, including UTF16 as 
an encoding. How the LC_* categories will be extended to fully support that 
base repertoire according to the Unicode requirements hasn't been determined 
yet either, but this is the nominal goal. 
 
On Thu, Jan 6, 2022 at 3:40 PM, Steffen Nurpmeso via austin-group-l at The Open Group wrote:

Hello!

I wonder about POSIX.utf-?8, i tried to remember any statement
i had read, and Mantis did not show up results.

In particular i am interested in whether LC_CTYPE results will
bring true Unicode support or not, the reason i am asking is that
the upcoming version of my work-box GNU LibC-based (2.34) Linux
distribution will provide it like

  localedef -i POSIX -f UTF-8 $PKG/usr/lib/locale/C.UTF-8 2> /dev/null || true

and then this thing is detected as an UTF-8 locale, but causes
three test failures of the MUA i maintain because character set
conversion behaves differently.

My personal opinion was that POSIX.utf8 will bring the complete
range of Unicode characters to at least LC_CTYPE, i wonder about
LC_COLLATE, as language matching is, hm, very language specific.
The rest not (maybe LC_MESSAGES going for UTF-8 though).

Is that approximately correct?

Thanks and Ciao! from Germany,

--steffen

  


Re: cut -DF

2021-12-04 Thread shwaresyst via austin-group-l at The Open Group
Yes, there's a path: file an Enhancement Request in Mantis. However, if toybox 
wants to be more POSIX-conforming it will eventually have to add an awk 
implementation anyway, so I'm not sure such a request would get much traction 
for sponsorship. Implementations that already have awk might not want to add it 
to their version of cut, as unnecessary duplication of functionality.
 
On Sat, Dec 4, 2021 at 9:37 AM, Rob Landley via austin-group-l at The Open Group wrote:

Since toybox doesn't have its own awk yet (and thus awk '{print $3 $4 $5}'),
back in 2017 toybox added the -D, -F, and -O options to cut:

    -D  Don't sort/collate selections or match -fF lines without delimiter
    -F  Select fields separated by DELIM regex
    -O    Output delimiter (default one space for -F, input delim for -f)

-O is -d for output, -F is a regex version of -f, and -D says to show the raw
matches in the order requested (and ONLY those matches, not passing through
lines with no matches).

This lets you do:

  $ echo one two three four five six seven eight nine | cut -DF 7,1-3,2
  seven one two three two
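The selection logic behind that example can be sketched in C with the POSIX regcomp()/regexec() interfaces. This is a hypothetical simplification, not toybox's or busybox's code: cut_df handles a single line, takes the field list already expanded (7,1-3,2 becomes 7,1,2,3,2), caps the number of fields at an arbitrary 64, and assumes " +" as the delimiter regex for this input.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

enum { CUT_MAX_FIELDS = 64 };   /* arbitrary cap for this sketch */

/* Split `line` on matches of the extended regex `delim`, then write the
 * requested 1-based fields, in the order given (repeats allowed, no
 * sorting), to `out` joined by `odelim`. Missing fields are skipped.
 * Returns 0, or -1 on a regex error or if `out` is too small. */
int cut_df(const char *line, const char *delim, const int *fields,
           size_t nfields, const char *odelim, char *out, size_t outsz)
{
    const char *start[CUT_MAX_FIELDS];
    size_t len[CUT_MAX_FIELDS];
    size_t nf = 0, used = 0, emitted = 0, dlen = strlen(odelim);
    const char *p = line;
    regmatch_t m;
    regex_t re;

    if (outsz == 0 || regcomp(&re, delim, REG_EXTENDED) != 0)
        return -1;

    /* Split: each non-empty delimiter match ends one field. */
    while (nf < CUT_MAX_FIELDS) {
        start[nf] = p;
        if (regexec(&re, p, 1, &m, p == line ? 0 : REG_NOTBOL) != 0
            || m.rm_so == m.rm_eo) {
            len[nf++] = strlen(p);      /* last field: rest of line */
            break;
        }
        len[nf++] = (size_t)m.rm_so;
        p += m.rm_eo;
    }
    regfree(&re);

    /* Select: emit fields in the requested order. */
    for (size_t i = 0; i < nfields; i++) {
        if (fields[i] < 1 || (size_t)fields[i] > nf)
            continue;
        size_t sep = emitted ? dlen : 0;
        size_t fl = len[fields[i] - 1];
        if (used + sep + fl >= outsz)   /* keep room for the NUL */
            return -1;
        memcpy(out + used, odelim, sep);
        used += sep;
        memcpy(out + used, start[fields[i] - 1], fl);
        used += fl;
        emitted++;
    }
    out[used] = '\0';
    return 0;
}
```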

Elliott Hughes (the Android base OS maintainer) asked if I could get the feature
more widely adopted:

  http://lists.landley.net/pipermail/toybox-landley.net/2021-June/012453.html

> your non-POSIX cut(1) extension covers 80% of the in-the-wild use of awk
> anyway :-) if you still talk to any of the busybox folks, we should suggest
> they copy that --- it would be nice for it to be a de facto standard so we
> can get it into POSIX sometime around the 2040s... (and have made lives
> better for the folks who don't care about standards and just want to "get
> things done" in the intervening decades!)

So I offered to implement it in busybox:

  http://lists.busybox.net/pipermail/busybox/2021-June/06.html

And the busybox maintainer merged it here:

  https://git.busybox.net/busybox/commit/?id=0068ce2fa0e3

Is there a path to try to get this option set into posix?

Rob

  


Re: Interpretation starting for a 30 day review (1440)

2021-10-29 Thread shwaresyst via austin-group-l at The Open Group
This is considered necessary for POSIX to describe accurately what the C 
standard version of system() requires, taking into account where sh differs from 
the minimal requirements of the command shell in that standard. POSIX is as it 
is because it was assumed that no programmer would use an option-switch 
character as the first character of a utility name, as recommended, so the "--" 
was superfluous; and the vast majority don't. But the C standard requires it, 
since it permits any character other than NUL as the first character of a 
command name. So the standard is more precise with it than without it.
Because the use of "--" is in a "shall behave as if" clause, it is expository, 
not a coding requirement. Some libraries use posix_spawn() to implement 
system(), for example. Some may add "--" only if a check of the string 
determines it is necessary, which the standard also allows.
 
On Fri, Oct 29, 2021 at 7:50 AM, Robert Elz via austin-group-l at The Open Group wrote:

    Date:        Fri, 29 Oct 2021 09:51:09 +0100
    From:        "Andrew Josey via austin-group-l at The Open Group" 

    Message-ID:  <5bf8909a-6cc2-4089-87c1-5fac762fa...@opengroup.org>

  | The following interpretation is starting a 30 day review 
  |
  | 0001440: System Interfaces Calling `system("-some-tool")` fails (although it is a valid `sh` command)
  |
  | Comments are due back no later than November 29 2021.

I object to this one.

In the recent added note (5510) the following appears at the start of
the Rationale for this change:

    There is nothing known that applications can usefully do if the "--"
    is omitted,

That's true, in fact, it is almost possible to prove it (and maybe it
even is).

But
    therefore there is no reason that the standard should not require
    the "--".

that does not follow.  What one could conclude is that no applications
will be broken by adding the "--", but that does not mean the standard
should specify it.

If the standard specifies that the "--" appears, then usages like
the one in the Subject ( system("-some-tool") ) would be expected to
work, and we know that with current implementations, they do not.

What should be done here, is to advise implementations that

    sh -c -- cmd

is exactly "as if"

    sh -c cmd

and so that adding the "--" does no harm, and is acceptable (the standard
does not require the "--" be omitted, not even by its old wording).  And
that doing this makes things work better, so it is a good idea for
implementations to do that.

It might even be, in fact probably is, worthy of a "Future directions"
stating that the "--" might be required by a future revision of the standard.

But that needs to wait until implementations actually do it.

The problem is that with current implementations, if the cmd is going to
start with '-' or '+' it will be misinterpreted, so to work if a cmd like
that is possible, the application must protect that character, usually by
including some white space before it (though in particular situations there
are other possibilities).

So, please do not approve this interpretation, return it to the group with
instructions that the group not attempt to act as a legislature, deciding
what it feels is good for the world, but as a standards body, correctly
documenting how this can be expected to work, and what a new implementation
needs to do to be compatible with what exists now.

kre

ps: as it happens, I am (or should be if I was not wasting time replying
to this) testing a change to NetBSD that adds the "--" in both system()
and popen() ... but that we will (I expect) have an implementation that
would conform with the proposed text does not mean that it is the correct
thing for POSIX to specify.

  


Re: What string representations of "zero" expr should consider as "zero"?

2021-07-02 Thread shwaresyst via austin-group-l at The Open Group
To the extent that XBD 11.1 item #6 applies and two's-complement notation is the 
required internal representation, the standard is pretty clear. The first three 
cases all evaluate to numeric 0, whether or not they are quoted, since the shell 
performs quote removal; the +0 case is always a string, since + is disallowed as 
a sign character. For the -0 case, since two's complement has no representation 
for negative zero, the practice is to treat it as equivalent to 0. XBD 11.1 
permits leading zeros, including on a 0 value, which covers the 00 case, since 
the interpretation is always decimal. $'\0' is effectively a zero-length string, 
not a number, even if two NUL characters get stored as the argument. Similar to 
+0 is $'\r0': <carriage-return> is not a permitted sign character, so that's a 
string.
Now, if more implementations than not treat a single argument that might be a 
number as an implied "= 0" test, despite it being pretty clear that the argument 
characters in this case have to be considered a string, then perhaps the Exit 
Status section needs to reflect that as the predominant practice.
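The reading above can be summarized as a small classification rule. The C sketch below encodes that one interpretation (an integer is an optional '-' followed by one or more decimal digits and nothing else); it is not normative text, it leaves the disputed " 0" case aside, and the function name expr_operand_zero is hypothetical.

```c
/* Classify an expr operand under the interpretation described above:
 * an integer is an optional '-' followed by one or more decimal digits;
 * leading zeros are allowed and "-0" counts as zero. Anything else,
 * including "+0", "\r0" and the null string, is a string.
 * Returns 1 for a zero integer, 0 for a non-zero integer, -1 for a string. */
int expr_operand_zero(const char *s)
{
    int zero = 1;

    if (*s == '-')
        s++;                          /* '-' is the only permitted sign */
    if (*s == '\0')
        return -1;                    /* "" or "-": not an integer */
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;                /* any other character: a string */
        if (*s != '0')
            zero = 0;
    }
    return zero;
}
```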
 
On Fri, Jul 2, 2021 at 4:31 AM, Geoff Clare via austin-group-l at The Open Group wrote:

Stephane Chazelas wrote, on 01 Jul 2021:
>
> BTW, for "expr", what is "zero" meant to be?
> 
> I see some variation in behaviour for "00", " 0", "-0", "+0",
> $'\r0', which some (but not all) also treat as zero.

> Also 0,000 or 0,000,000 in locales where "," is a thousand
> separator with ast-open expr (also the builtin expr of ksh93 if
> built as part of ast-open).

I would say the standard is unclear.  To me the most reasonable
interpretation of "The expression evaluates to null or zero" is
that it evaluates to either a null string or a zero-valued integer.
However, that would require "expr 0" to exit with status 0 (because
the 0 argument is treated as a string in this case), which does not
match existing practice.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

  


Re: Minutes of the 14th June 2021 Teleconference

2021-06-15 Thread shwaresyst via austin-group-l at The Open Group
It looks like that was a typo: 723 for 713. The correct link is:
https://austingroupbugs.net/view.php?id=713


 
On Tue, Jun 15, 2021 at 4:45 PM, Fred J. Tydeman via austin-group-l at The Open Group wrote:
On Tue, 15 Jun 2021 18:13:35 +0100 Andrew Josey via austin-group-l at The Open Group wrote:
>
>The floating
>point sub-committee will discuss bug 723
>(https://austingroupbugs.net/view.php?id=723 remquo) and advise us
>on what to do.

That link takes me to 723: time is not allowed to write error messages to 
stderr


---
Fred J. Tydeman        Tydeman Consulting
tyde...@tybor.com      Testing, numerics, programming
+1 (702) 608-6093      Vice-chair of PL22.11 (ANSI "C")
Sample C99+FPCE tests: http://www.tybor.com
Savers sleep well, investors eat well, spenders work forever.

  


Re: behavior of printf '\x61'

2021-04-15 Thread shwaresyst via austin-group-l at The Open Group
It is covered by item 7 of those 11 exceptions, with 'x' falling under the
blanket "every character not specified is unspecified". Portable code is
expected to use the work-alike octal escape, not hex codes.
 
On Fri, Apr 16, 2021 at 12:05 AM, Philip Guenther via austin-group-l at The Open Group wrote:
The general question is what requirements the standard puts on the printf
utility when the format argument contains a \x or other unspecified backslash
escape, but the example in the subject is a nice concrete example: what's
required for or about the output of
        printf '\x61'
?

1003.1-2016 describes the handling of the format argument like this:
-
The format operand shall be used as the format string described in XBD Chapter 5
(on page 121) with the following exceptions:
-

...followed by a list of 11 exceptions that do not cover \x.  So, let's look at 
XBD Chapter 5:
-
The format is a character string that contains three types of objects
defined below:
   1. Characters that are not "escape sequences" or "conversion
      specifications", as described below, shall be copied to the output.

   2. Escape Sequences represent non-graphic characters and the
      escape character (<backslash>).

   3. Conversion Specifications specify the output format of each
      argument; see below.
-

Okay, so if it's not an escape sequence or conversion specification, it _shall_ 
be copied to the output.  To jump forward to conversion specifications:
-
Each conversion specification is introduced by the <percent-sign> character
('%').
-

Okay, so \x61 isn't a conversion specification.  Is it an escape sequence?  
Well, there's just a table for those, which lists the following: \\ \a \b \f \n
\r \t and \v.  There's no "other sequences starting with <backslash> are
unspecified" statement that I can find.

It therefore appears to me that
        printf '\x61'

is required by the standard to output
        \x61

without a following newline.  Unfortunately, the systems I've tested (CentOS 6
and 7, MacOS, FreeBSD 12, and OpenBSD 6.9) all output an ASCII 'a' without a
following newline.

Did I miss a statement about <backslash> somewhere that renders this behavior
unspecified?


If a wording tweak is deemed to be in order, please note that it should be 
placed or duplicated such that it also applies to the argument interpreted by 
the %b format conversion, because the same "apparently specified but no one 
behaves that way" is true of this:
        printf %b '\x61'


Philip Guenther
  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread shwaresyst via austin-group-l at The Open Group
Then those are conformance bugs in those kernels, to me, in that files of this
type are not load images of the kind exec() is meant to handle, i.e. ones
usable with dl*(). As I see its intent, the allowance is for magic numbers that
distinguish formats of that nature, not for one that bypasses what the shell is
supposed to determine and, in the process, makes illegal what the shell
description asserts is required to be possible. The way to get shebang
processing is, as I outlined, by adding an option to set, not by trying to take
advantage of the current language of exec() being too permissive.

 
 
On Mon, Apr 12, 2021 at 9:04 AM, Joerg Schilling via austin-group-l at The Open Group wrote:
"shwaresyst via austin-group-l at The Open Group" wrote:

> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword or 
> command continues. This precludes "#!" being recognized as any of those. 
> There is NO allowance for '!' being the second character as reserved for 
> implementation extensions.

#!/bad of course is a normal comment from the view of a normal shell.
An exception is my old "bsh" (not bosh) on a historic UNIX without support for
#! in the kernel.

On all recent platforms, #! is just another *magic number* that is handled by 
the kernel only.

POSIX of course does not limit what magics are recognised by the kernel.

Jörg

-- 
EMail:jo...@schily.net                  Jörg Schilling D-13353 Berlin
                    Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/

  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
We are talking about the shell, not some bastardization of execve() that sees
the file is not a directly loadable process image and so treats it as a script.
For those shells implementing shebang as an extension, it is still them piping
the body of the script after the shebang line, without any token expansion, to
an alternate interpreter via an exec() of some sort. Second, conforming
applications cannot rely on unspecified behaviors, so having a use beyond that
specified makes the shell nonconforming. Calling it out like that simply
acknowledges that a lot of shell implementations choose to make themselves
nonconforming; I do not see it as an endorsement or allowance. The requirement
that explicitly specified behavior shall be implemented as specified takes
priority. Some conforming script authors may simply want the first line to be a
"#!!! IMPORTANT USAGE NOTE" headline, or similar, and not want a utility named
"!!!" to be exec'd.
What the standard does allow as an extension, and what I would support adding
to the standard, is an option to set that turns token expansion in
here-document bodies off, and back on. This allows the effect of shebang to be
accomplished anywhere in a script, at the expense of a few extra characters for
the here-document delimiter and set commands, without any other changes to
tokenizing or the grammar.
 
On Sun, Apr 11, 2021 at 12:15 PM, Harald van Dijk wrote:
On 11/04/2021 17:09, shwaresyst via austin-group-l at The Open Group wrote:
> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword 
> or command continues. This precludes "#!" being recognized as any of 
> those. There is NO allowance for '!' being the second character as 
> reserved for implementation extensions.

This is wrong on two counts. The first is that you're assuming that this 
will be interpreted by a shell. If execve() succeeds (and the #! line 
does not name a shell), it will not be interpreted by a shell at all, 
and the shell syntax for comments is irrelevant. The second is about 
what happens when it does get interpreted by a shell: POSIX allows 
shells to treat files starting with "#!" specially: "If the first line 
of a file of shell commands starts with the characters "#!", the results 
are unspecified."

Cheers,
Harald van Dijk
  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
No, it's not nonsense. By the definition of a comment, all characters,
including '!', up to the newline or end-of-file shall be ignored for the shell
to be conforming. Only then does tokenization, which might discover an
operator, keyword or command, continue. This precludes "#!" being recognized as
any of those. There is NO allowance for '!' as the second character being
reserved for implementation extensions.

 
 
On Sun, Apr 11, 2021 at 11:37 AM, Robert Elz wrote:
    Date:        Sun, 11 Apr 2021 10:46:48 +0000 (UTC)
    From:        shwaresyst 
    Message-ID:  <1413127944.766378.1618138008...@mail.yahoo.com>

  | That's bugs in those shells for POSIX mode then, that I see.

That's nonsense.

  | The conforming behavior is /usr/gcc is found and succeeds at doing nothing,

Nonsense.

That would be a conforming behaviour, it is not "the" conforming behaviour.

POSIX does not define what format a file must be to succeed in being
exec'd by one of the exec*() commands.  The system can have a thousand
different types that work, if it wants, and #! executables are one of
those.  That they're not required to work by POSIX doesn't mean they're
not allowed to work.

For the rest of your message, the reply I just made to Harald's message
applies.

kre

  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
Those are bugs in those shells' POSIX mode, as I see it. The conforming
behavior is that /usr/gcc is found and succeeds at doing nothing, since it
contains just a comment line; other elements of the path never get checked.
Even in non-POSIX mode, an attempt to process it as a shebang line naming
"/bad" that fails with ENOEXEC, because "/bad" is not present or for some other
reason, does not imply the rest of the path should be searched; it should
simply return a failure code.
 
 
On Sun, Apr 11, 2021 at 6:07 AM, Harald van Dijk via austin-group-l at The Open Group wrote:
On 10/04/2021 17:08, Robert Elz via austin-group-l at The Open Group wrote:
>      Date:        Sat, 10 Apr 2021 11:54:34 +0200
>      From:        "Jan Hafer via austin-group-l at The Open Group" 
>
>      Message-ID:  <15c15a5b-2808-3c14-7218-885e704cc...@rwth-aachen.de>
> 
>    | my inquiry is a question about the potential unexpected behavior of the
>    | shell execution environment on names. It is related to shortcomings of
>    | the command utility.
> 
> I'm not sure I understand.  I read the rest of the message, and I
> couldn't find anything really about any shortcomings, other than perhaps
> some mistakes in interpretation, and usage.

If they are mistakes, they are widespread mistakes. As hinted in the 
links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing 
as files with execute permission, but /bin/gcc as a text file containing 
#!/bad so that any attempt to execute it will fail, there are a lot of 
shells where command -v gcc returns /bin/gcc, but running gcc actually 
executes /usr/bin/gcc instead without reporting any error: this 
behaviour is common to bosh, dash and variants (including mine), ksh, 
and zsh.

Cheers,
Harald van Dijk

  


Re: SIGSTKSZ is now a run-time variable

2021-03-09 Thread shwaresyst via austin-group-l at The Open Group

Yes, it's not something an application would expect to need to keep
increasing; it's just that that's the part of <limits.h> I'd move it to. The
definition could also be the maximum required by a processor family, with
sysconf() reporting a possibly lower value for a particular processor stepping.
At least that way an application that doesn't use sysconf() won't get SIGSEGV
faults.

Additionally, I believe the definition can be calculated at compile time as a 
multiple of ( sizeof(ucontext_t)+sizeof(overhead_struct(s)) ), whatever other 
overhead applies, so I don't see any real need to use sysconf(). This may mean 
having to munge a  by configure, based on config.guess, but that's 
not the standard's headache.


The CS, SC, and PC constants are not in the XSH 2.2.2 table deliberately, from 
Issue 6 TC1, as adding any also requires a bump in POSIX_VERSION or 
POSIX2_VERSION, and often XSI_VERSION. This is so each usage of a constant 
doesn't need individual #ifdefs to test option group availability. The previous
text allowed an implementation that wasn't supporting an option group to skip
including the related constants in <unistd.h>. A simple check of the VERSION
values at the top of a C source file now suffices to indicate those constants
shall be available.
On Tuesday, March 9, 2021 Eric Blake  wrote:
On 3/9/21 10:14 AM, shwaresyst wrote:
> 
> To me that looks like a conformance violation and should be reverted. There 
> is no _SC_SIGSTKSZ defined in <unistd.h> by the standard, to begin with, so
> that use of sysconf() is a non-portable extension on its own.

Portable apps can't use _SC_SIGSTKSZ, but the standard generally permits
implementations to define further constants.  Then again, re-reading XSH
2.2.2:

" Implementations may add symbols to the headers shown in the following
table, provided the identifiers for those symbols either:

    Begin with the corresponding reserved prefixes in the table, or
..."

but the table lacks a row for <unistd.h> with _CS_* and _SC_* constants.
 Looks like you found an independent defect.

> 
> I could see the definition of SIGSTKSZ being changed to the static minimum a 
> particular processor requires, or is initially allocated as a 'safe' amount, 
> rather than static "default size", and moving SIGSTKSZ to <limits.h>. This
> would contrast to MINSIGSTKSZ as the lowest value for a platform for all 
> supported processors. Then an application could use sysconf() to query for 
> the maximum size the configuration supports if it wants to use more than 
> that, as a runtime increasable limit.

As I understand it, the concern in glibc is less about runtime
increasability, so much as ABI compatibility with applications compiled
against older headers at a time when the kernel had less state
information to store during a context switch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.          +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: SIGSTKSZ is now a run-time variable

2021-03-09 Thread shwaresyst via austin-group-l at The Open Group

To me that looks like a conformance violation and should be reverted. There is 
no _SC_SIGSTKSZ defined in <unistd.h> by the standard, to begin with, so that 
use of sysconf() is a non-portable extension on its own.

I could see the definition of SIGSTKSZ being changed to the static minimum a 
particular processor requires, or is initially allocated as a 'safe' amount, 
rather than static "default size", and moving SIGSTKSZ to <limits.h>. This
would contrast to MINSIGSTKSZ as the lowest value for a platform for all 
supported processors. Then an application could use sysconf() to query for the 
maximum size the configuration supports if it wants to use more than that, as a 
runtime increasable limit.
On Tuesday, March 9, 2021 Eric Blake via austin-group-l at The Open Group wrote:
[adding glibc and Austin group lists]

On 3/6/21 12:50 PM, Bruno Haible wrote:
> Hi,
> 
> Carol Bouchard wrote in 
> :
>> A change that was introduced is the
>> #define SIGSTKSZ is no longer a statically defined variable.  It's value can
>> only be determined at run time.
>>
>> # define SIGSTKSZ sysconf (_SC_SIGSTKSZ)
> 
> This is invalid. POSIX:2018 [1] defines two lists of macros:
> 
>  1) "The <signal.h> header shall define the following macros which shall
>      expand to integer constant expressions that need not be usable in
>      #if preprocessing directives:"
> 
>  2) "The <signal.h> header shall also define the following symbolic
>      constants:"
> 
> SIGSTKSZ is in the second list. This implies that it must expand to a constant
> and that it must be usable in #if preprocessing directives.

The question becomes whether glibc is in violation of POSIX for having
made the change, or whether POSIX needs to be amended to allow SIGSTKSZ
to be non-preprocessor-safe and/or non-constant.

> 
> Besides being invalid, it is also not needed. The alternate signal stack
> needs to be dimensioned according to the CPU and ABI that is in use. For 
> example,
> SPARC processors tend to use much more stack space than x86 per function
> invocation. Similarly, 64-bit execution on a bi-arch CPU tends to use more 
> stack
> space than 32-bit execution, because return addresses and other pointers are
> 64-bit vs. 32-bit large. But once you have fixed the CPU and the ABI, there is
> no ambiguity any more.
> 
>> This affects m4 code since the code assumes a statically defined variable 
>> which
>> can be determined at preprocessor time.
> 
> POSIX guarantees this assumption.
> 
>> Please advise how I can get past this.
> 
> Fix your <signal.h>.

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=6c57d320484988e87e446e2e60ce42816bf51d53
shows where glibc made the change, and I've now seen reports of several
projects failing to build when using glibc with this change included.

> 
> Bruno
> 
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.          +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



Re: [1003.1(2016/18)/Issue7+TC2 0001454]: Conflict between "case" description and grammar

2021-02-19 Thread shwaresyst via austin-group-l at The Open Group

At that point in the grammar TOKEN is "esac)" or "(esac)", from which the WORD 
"esac" is extracted, not converted to Esac, as right paren is not an operator 
character that terminates token recognition. Rule 4 applies to "esac ;" or 
"esac" linebreak, no right paren discovered on lookahead, that I see. Same with 
the '|' char, it does not terminate the TOKEN. It could be more explicit that 
the pattern production is subcontext delimited by the ')', I suppose.
On Friday, February 19, 2021 Chet Ramey via austin-group-l at The Open Group wrote:
On 2/19/21 11:21 AM, Geoff Clare via austin-group-l at The Open Group wrote:

>> There is no way to apply rule 4 to produce "a token identifier acceptable at
>> that point in the grammar". The only token identifier acceptable at that
>> point in the grammar is WORD, and rule 4 does not produce WORD. Rule 4
>> reads:
>>
>>    When the TOKEN is exactly the reserved word esac, the token identifier
>>    for esac shall result. Otherwise, the token WORD shall be returned.
>>
>> Here, the TOKEN is exactly the reserved word esac, and you agree that this
>> rule is applied. This therefore produces the token identifier for esac.
>> There is nothing else that turns it into WORD, which is needed to parse it
>> as a pattern.
> 
> I see your point.  The wording of rule 4 itself does not yield WORD in
> this case; it's only when read in combination with the introductory text
> from 2.10.1 that it becomes apparent that this is the intention.

So "acceptable at that point in the grammar" is indeed carrying a heavy
load here. You might want to add the qualifying language you suggested.


> Incidentally, bash 3 on macOS gets the '|' case wrong, e.g.:
> 
> case esac in foo|esac) echo match;; esac
> 
> whereas bash 5 accepts that.  So it would appear that Chet fixed the
> preceded-by-'|' case at some point but not the preceded-by-'(' case.

It's just another special case in the grammar that lexical analysis
has to handle.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
        ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



RE: clarification needed: shell 'exec' + function (builtin, …)

2020-12-09 Thread shwaresyst via austin-group-l at The Open Group

I agree more clarification is desirable. The reason I see for the function not
being executed is that the shell may be treating it as an invocation of
"sh -c ls", because ls is a function; but this new sh does not inherit that
definition, so it looks in PATH instead and finds the utility.
On Wednesday, December 9, 2020 Thorsten Glaser via austin-group-l at The Open Group wrote:
Hi *,

I’ve got a report in IRC by a user who spotted a cross-shell difference.

In my opinion, the invocation…

    sh -c 'ls() { echo meow; }; exec ls'

… is supposed to output "meow\n" and return to the caller with a zero
errorlevel.

Some shells execve() the ls(1) binary instead.
In particular, this was ksh88 behaviour, according to the comments
found in the pdksh-originating mksh source code.

My reading of this is:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#exec

⇒ exec is specified with 'command'
⇒ it will replace the shell with 'command' and never return to the shell

(note this does NOT mandate an actual execve(2) syscall or something)

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09

  A command is one of the following:
    * Simple command (see [134]Simple Commands)
    * Pipeline (see [135]Pipelines)
    * List compound-list (see [136]Lists)
    * Compound command (see [137]Compound Commands)
    * Function definition (see [138]Function Definition Command)

In the subsequent section 2.9.1 Simple Commands, Command Search and Execution,
step 1.c. finds the function.

Therefore, I believe that exec shall invoke the function, then terminate
the shell with the function’s $? as exit status.

(For builtins, 1.a. and 1.d. and 1.e.i.a. will find them.)

Thanks in advance,
//mirabilos
-- 
(gnutls can also be used, but if you are compiling lynx for your own use,
there is no reason to consider using that package)
    -- Thomas E. Dickey on the Lynx mailing list, about OpenSSL



Re: [1003.1(2016/18)/Issue7+TC2 0001346]: Require support for CLOCK_MONOTONIC

2020-12-03 Thread shwaresyst via austin-group-l at The Open Group

It's my understanding the practice predates Issue 6 (I just used that as an
example) and stems from a desire not to break code similar to:
#include <unistd.h>
#if defined(POSIX_OPT) && POSIX_OPT == _POSIX_VERSION
... Add code that presumes option availability ...
#endif

or at runtime:
#ifdef POSIX_OPT
if (sysconf(_SC_POSIX_OPT) == _POSIX_VERSION) {
... Use code that takes advantage of option ...
}
else
#endif
{ ... Use code that doesn't, or that checks for an earlier definition in a
platform-defined manner ... }

as the standard leaves fairly unspecified how a vendor is to support multiple 
versions of the standard with one runtime and set of headers.
On Thursday, December 3, 2020 Robert Elz  wrote:
    Date:        Thu, 3 Dec 2020 18:11:51 +0000 (UTC)
    From:        shwaresyst 
    Message-ID:  <684426419.4103424.1607019111...@mail.yahoo.com>

  | The 20yymmL shall be replaced with the value specific to Issue 8 when that
  | is finalized, not that an implementation may choose an arbitrary value
  | after 2000. It's a placeholder to indicate this for the bug report only.

Yes, that's what I assumed, and said in my message:

austin-group-l@opengroup.org (that was me...) said:
  | (I read the latter as meaning that it will become the actual date of the
  | standard, not yet known).

Back to quote from shwares...@aol.com:

  | The other 200809L values all get a blanket change eventually too,

If that is the standard procedure, then sorry, but that's insane.

  | consistent with the changes from Issue 6 to Issue 7.

If the reason that NetBSD has 200112L and the standard (Issue 7) now
requires 200809L, is solely that (i.e., there were no other changes to
the CLOCK_MONOTONIC specification between whichever version 200112L
identifies, and Issue 7) then that's a defect in the standard, and
should be fixed.

Making arbitrary changes that render all implementations non-conforming
and break applications that relied upon the earlier specification is
totally bizarre behaviour.

kre



Re: [1003.1(2016/18)/Issue7+TC2 0001346]: Require support for CLOCK_MONOTONIC

2020-12-03 Thread shwaresyst via austin-group-l at The Open Group

The 20yymmL shall be replaced with the value specific to Issue 8 when that is 
finalized, not that an implementation may choose an arbitrary value after 2000. 
It's a placeholder to indicate this for the bug report only. The other 200809L 
values all get a blanket change eventually too, consistent with the changes 
from Issue 6 to Issue 7.
On Thursday, December 3, 2020 Robert Elz via austin-group-l at The Open Group wrote:
    Date:        Thu, 3 Dec 2020 17:21:47 +0000
    From:        "Austin Group Bug Tracker via austin-group-l at The Open Group"
    Message-ID:  

  | A NOTE has been added to this issue

The issue is now closed, so I cannot append a new note [Aside:
adding proposed text, and immediately closing the bug report is not
a good way to operate - even if the issue is regarded as finalised,
there can be wording issues that are worthy of discussion].

So...


  | On page 436 lines 14851 - 14854,
  | change
  |     _POSIX_MONOTONIC_CLOCK
  |         The implementation supports the Monotonic Clock option. If this
  |         symbol is defined in <unistd.h>, it shall be defined to be -1, 0,
  |         or 200809L. The value of this symbol reported by sysconf( ) shall
  |         either be -1 or 200809L.
  | 
  | to
  |     _POSIX_MONOTONIC_CLOCK
  |         The implementation supports a monotonic clock. This symbol shall
  |         always be set to the value 20yymmL.
  | and remove the [MON] shading.

Why the change from 200809L to 20yymmL ?  (I read the latter as meaning
that it will become the actual date of the standard, not yet known).

As best I can see, for implementations that already support the (previously
optional) CLOCK_MONOTONIC nothing changes - except that they will apparently
be required to alter the definition of _POSIX_MONOTONIC_CLOCK.  Why?

What's more, applications which believed the previous text, and actually
test for 200809L will no longer find it, even though nothing else changed.

To me that makes no sense.

In NetBSD, we have:
    #define      _POSIX_MONOTONIC_CLOCK          200112L
which seems to indicate that we support some older version of the
standard - but I haven't looked to see whether there are actual
changes of substance between that version and the 200809L version.

In general, the values of these "This is supported" constants should
only ever change if there is a feature difference between one version
and the next (then the different values can be used to determine what
support is to be expected - though that's a very crude mechanism).

kre




Re: make(1) parallelization, but especially .WAITing

2020-11-03 Thread shwaresyst via austin-group-l at The Open Group

I agree that's the probable intent, but, like other undefined things, what
isn't precluded is a spot where a conformance distinction can't be drawn.
There, how the identifier ends isn't specified; it's left implied that
implementors will only use uppercase letters after the specified prefix.
On Tuesday, November 3, 2020 Paul Smith via austin-group-l at The Open Group wrote:
On Mon, 2020-11-02 at 15:44 +, shwaresyst via austin-group-l at The
Open Group wrote:
> With that phrasing .NO_PARALLEL is also reserved, since it
> is not "<period> followed ONLY by uppercase". Using ".NO_parallel"
> would be similarly conforming, it could be argued.

I don't agree.  By saying "names consisting of" the standard requires
that the entire name must consist of those characters, not just the
first part of the name.

> (The last sentence before the "Macros" heading says "Targets with
> names consisting of a leading <period> followed by one or more
> uppercase letters are reserved for implementation extensions."




Re: make(1) parallelization, but especially .WAITing

2020-11-02 Thread shwaresyst via austin-group-l at The Open Group

With that phrasing .NO_PARALLEL is also reserved, since it is not
"<period> followed ONLY by uppercase". Using ".NO_parallel" would be similarly
conforming, it could be argued.
On Monday, November 2, 2020 Geoff Clare via austin-group-l at The Open Group wrote:
Joerg Schilling wrote, on 31 Oct 2020:
>
> Well this is true. As long as POSIX does not mention parallel builds at all, 
> it makes no sense for .WAIT to appear in a POSIX standard - except as a 
> reserved special target.

It's already in the reserved namespace, so no need to reserve it
explicitly.  (The last sentence before the "Macros" heading says
"Targets with names consisting of a leading <period> followed by one
or more uppercase letters are reserved for implementation extensions."

> Now it would be nice to have support for .NO_PARALLEL:  and for 
[...]

That name isn't reserved, because it has an underscore.

However, SunPro make seems to have several special targets with an
underscore, so it's possible underscore was left out of the reserved
name space by mistake.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: printf (the utility) expected range of integer values

2020-10-24 Thread shwaresyst via austin-group-l at The Open Group

> Could an implementor represent integers as an internal
> form with 0 bits (in which the only value that doesn't overflow is 0)
> and hence always print 0 for any %d (%u/%x/%d) conversion, with an error
> message about overflow for any value with any bits set?

No, the standard requires the internal representation to be two's complement
for conforming applications; use of any other internal format is considered
unspecified behavior. While a utility may support other formats, it is implicit
that by default it also supports two's complement, for interaction with those
applications. This ties into the expectation that output values fall between
the *_MAX and *_MIN values from the <limits.h> used to compile the utility, and
supposedly the implementation as a whole. If something to this effect really
needs to be added, I'd think it would go in XBD 2 as an implementation
conformance requirement.

The value output on error, nominally, is the one accumulated just before a
multiply by 10 or the addition of the next digit causes the overflow; that is
how I'd construe it. For a short %d, I'd expect "32769" to output "3276", as
that is the most digits capable of fitting, as an actual value, in a 16-bit
two's-complement internal format.
On Saturday, October 24, 2020 Robert Elz  wrote:
    Date:        Sat, 24 Oct 2020 16:47:41 +0000 (UTC)
    From:        shwaresyst 
    Message-ID:  <160402159.2963847.1603558061...@mail.yahoo.com>

  | The text relevant to all this I see is the paragraph at line 104150, page
  | 3114, c181.pdf,

That is the text I quoted in the previous message (I got it from 202x d1.1
but that's irrelevant, the page & line numbers have changed, but the words
are the same).  For reference, here it is again:

    If an argument operand cannot be completely converted into an internal
    value appropriate to the corresponding conversion specification, a
    diagnostic message shall be written to standard error and the utility
    shall not exit with a zero exit status, but shall continue processing
    any remaining operands and shall write the value accumulated at the
    time the error was detected to standard output.

  | which limits outputs to the internal representation range of
  | the format characters used, converted back to text.

Yes.  But what does that actually mean to someone who wants to use
printf (the utility) and wants to be sure it will be able to print the
numbers needed?  Could an implementor represent integers as an internal
form with 0 bits (in which the only value that doesn't overflow is 0)
and hence always print 0 for any %d (%u/%x/%o) conversion, with an error
message about overflow for any value with any bits set?

If not, what text in the standard prohibits that?    We know it can't happen
for printf(3) (XSH.3.fprintf) as the minimum size of a C int (in POSIX)
is 32 bits.  But where is the required range of printf(1) (XCU.3.printf)
integers stated?  Surely not nowhere?

  | This should probably be explicit that the conversion shall detect
  | overflows,

It is, particularly when combined with what is in the APPLICATION USAGE
section.  In c181 see page 3115, the paragraph that starts at line 104190:

    If an argument cannot be parsed correctly for the corresponding
    conversion specification, the printf utility is required to report
    an error. Thus, overflow and extraneous characters at the end
    of an argument being used for a numeric conversion shall be reported
    as errors.

This part isn't a problem, or an issue; this is quite clear, and (aside
from ksh93, which is obviously broken) it is what everything I tested does.

Now back to the questions from the original message, neither of which
you even attempted to answer.

Where, if anywhere, is it stated what range of integers is required to be
supported by printf the utility?  Or in other words, is there a smallest
value which is permitted to generate an overflow (for present purposes just
consider positive numbers; we can all easily extrapolate to negative when
appropriate)?  Further, and related, is there any value which is required
to be treated as overflow (perhaps related to something in a header rather
than an absolute constant in the printf page)?  And if so, where is that
stated?

For this, remember that printf the utility has no length modifiers for the
numeric conversions (at least the integer ones, the floats aren't required
at all, so obviously nothing is there to distinguish float from double, etc).
That is, there is only one "kind" of integer that it is able to print, a
simple %d (or %u %x %o), there is no %ld %jd %zd %lld ...

And second, when an overflow does occur, and an error message is printed to
stderr (and the eventual exit status from printf when it completes is set to
something greater than 0) then, as required, printf is still required to
print a value for the conversion that overflowed.  What value should be
printed - the maximum that could be handled, which is the common result
(presumably because almost everyone is using strtoll() to 

RE: printf (the utility) expected range of integer values

2020-10-24 Thread shwaresyst via austin-group-l at The Open Group

The text relevant to all this I see is the paragraph at line 104150, page 3114, 
c181.pdf, which limits outputs to the internal representation range of the 
format characters used, converted back to text. This should probably be 
explicit that the conversion shall detect overflows, positive or negative, when 
converting input text, and to treat this as an error. While the C standard 
permits silent overflows in converting C source this makes the utility 
non-portable.
On Saturday, October 24, 2020 Robert Elz via austin-group-l at The Open Group 
 wrote:
Is there somewhere, anywhere, where it is possible to infer what
range of values printf (the utility, not the C library function)
is expected to handle?

I can find nothing in the XCU 3.printf page, nor in XBD 5 (and also
not in XBD 12, which would be another plausible place).  There doesn't
seem to be anything about integers at all in XBD 3.

XBD 14.limits.h gives the minimum allowed value for the maximum value
of an integer (2^31 - 1) (ie: requires at least 32 bit int), but I can
find nothing that says explicitly that that applies to printf the utility.

Is there some expected minimum integer size for printf (the utility)
that is actually specified somewhere?

Further, since printf (the utility) is really just converting text
strings from one format to another, there's really no reason that there
needs to be any limit at all - there's no particular reason that integers
thousands of digits long couldn't be handled.  The standard does say that
if overflow occurs, an error message, and non-zero exit status, must
occur, but it doesn't ever say that overflow must occur.
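To illustrate the point that a text-to-text conversion needs no fixed integer width, here is a minimal sketch. The function name hex_to_decimal() is hypothetical and not taken from any existing printf; it assumes a non-negative hexadecimal digit string with no "0x" prefix and converts it, at any length, to a decimal string using base-10^9 limbs.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: convert a hex digit string of any length (no "0x") into a
   malloc'd decimal string.  Returns NULL on a non-hex character or on
   allocation failure.  The caller frees the result. */
char *hex_to_decimal(const char *hex)
{
    size_t n = strlen(hex);
    /* Each hex digit adds about 1.2 decimal digits; a limb holds 9. */
    size_t cap = n / 7 + 2, used = 1;
    uint32_t *limb = calloc(cap, sizeof *limb);   /* little-endian, base 1e9 */
    if (limb == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        int c = tolower((unsigned char)hex[i]);
        uint64_t carry;
        if (c >= '0' && c <= '9')
            carry = (uint64_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            carry = (uint64_t)(c - 'a' + 10);
        else {
            free(limb);
            return NULL;
        }
        /* value = value * 16 + digit, limb by limb */
        for (size_t j = 0; j < used; j++) {
            uint64_t v = (uint64_t)limb[j] * 16 + carry;
            limb[j] = (uint32_t)(v % 1000000000u);
            carry = v / 1000000000u;
        }
        if (carry != 0) {
            if (used == cap) {            /* defensive; not expected */
                free(limb);
                return NULL;
            }
            limb[used++] = (uint32_t)carry;
        }
    }

    char *out = malloc(used * 9 + 1);
    if (out != NULL) {
        char *p = out;
        p += sprintf(p, "%u", (unsigned)limb[used - 1]);   /* no zero padding */
        for (size_t j = used - 1; j-- > 0; )
            p += sprintf(p, "%09u", (unsigned)limb[j]);    /* 9 digits each */
    }
    free(limb);
    return out;
}
```

For example, hex_to_decimal("10000000000000000") (that is, 2^64) produces "18446744073709551616" with no overflow at all.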

Second question - if overflow does occur (at whatever point) what is the
value that must be printed (in addition to the error message) from a
numeric conversion.

Given a printf that uses 64 bit integers (which seems to be a very common
choice) then what should be printed from

    printf '%d\n' 0xc000

?

(This is the example that made me think about all of this.)  We (NetBSD)
have been offered a patch to make the error message go away, and the
result be:
    -70368744177664
That is, treating the value as a bit pattern for the 64 bits, which then
has the sign bit set, and so prints as a negative value.

We will not be doing that.

But what should we print?  (In addition to the error).

Every shell I tested (with 2 exceptions) does:

printf '%d\n' 0xc000
-bash: printf: warning: 0xc000: Result too large or too small
9223372036854775807

That one, obviously, is from bash.  Note that the "every shell" for this
is not all that meaningful, many don't have printf built in, and so are
simply running the NetBSD filesystem printf utility .. so it isn't then
surprising that they all do the exact same thing as that does!  But it
is obvious that at least the NetBSD sh, bash, bosh, zsh, and ksh93 have
a builtin printf (the error messages differ...)

But that value might not be what the standard calls for (even though it
is what almost everyone does), what the standard says is:

    If an argument operand cannot be completely converted into an internal
    value appropriate to the corresponding conversion specification, a
    diagnostic message shall be written to standard error and the utility
    shall not exit with a zero exit status, but shall continue processing
    any remaining operands and shall write the value accumulated at the
    time the error was detected to standard output.

The question is, what is "the value accumulated at the time the error was
detected".

What zsh does is:

    zsh $ printf '%d\n' 0xc000
    zsh: number truncated after 15 digits: c000
    1152917106560335872

which makes some sense to me, I had been thinking this might be the
correct value, before I started testing to see what was produced.
That is, after the first 15 hex digits are consumed, that is the value
(0xc00 in decimal) and then when an attempt is made to
add one more zero, we detect the overflow, and so the value that had
been accumulated when the overflow was detected was 1152917106560335872
(when printed via %d).

The value "everybody" else prints, 9223372036854775807, is simply 2^63-1
(the max possible value) which most likely was never actually encountered
during the conversion, but is just what strtoll() returns as its value.
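A small C sketch comparing the three candidate outputs discussed in this thread: the strtoll() clamp, the "value accumulated at the time the error was detected" reading (what zsh prints), and reinterpreting the bits as two's complement (the rejected patch, and what ksh93 prints). The function names are hypothetical, and the argument 0x8000000000000000 (2^63, one past INT64_MAX) is chosen purely for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 1. strtoll(): clamps to LLONG_MAX and sets errno to ERANGE. */
long long via_strtoll(const char *arg, int *range_error)
{
    errno = 0;
    long long v = strtoll(arg, NULL, 0);
    *range_error = (errno == ERANGE);
    return v;
}

/* 2. Value accumulated when the error is detected: consume hex digits
   (assumes a "0x" prefix) until the next one would overflow int64_t. */
int64_t via_accumulation(const char *arg, int *digits_used)
{
    const char *p = arg + 2;
    int64_t acc = 0;
    int n = 0;
    for (; *p != '\0'; p++, n++) {
        int d;
        if (*p >= '0' && *p <= '9') d = *p - '0';
        else if (*p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
        else if (*p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
        else break;
        if (acc > (INT64_MAX - d) / 16)   /* acc * 16 + d would overflow */
            break;
        acc = acc * 16 + d;
    }
    *digits_used = n;
    return acc;
}

/* 3. Reinterpret the unsigned 64-bit result as two's complement.
   memcpy avoids the implementation-defined unsigned-to-signed conversion. */
int64_t via_bit_pattern(const char *arg)
{
    uint64_t bits = (uint64_t)strtoull(arg, NULL, 0);
    int64_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

For "0x8000000000000000" these give 9223372036854775807 (with ERANGE), 576460752303423488 after 15 digits, and -9223372036854775808 respectively.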

kre

ps: the other shell which didn't produce 9223372036854775807 was ksh93,
which actually does
    ksh93 $ printf '%d\n' 0xc000
    -70368744177664
Sad that.  Good thing that we don't use ksh as the basis of the standard!




RE: Overflow conditions for read() and fread() (was: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function)

2020-10-07 Thread shwaresyst via austin-group-l at The Open Group

The C standard leaves it undefined for fread() because, as far as I can see, it 
doesn't require EOVERFLOW in its headers, or it presumes size_t will always be a 
short or int type. Since POSIX does have EOVERFLOW and does not presume a limited 
width, I feel this is a place where a CX extension is warranted as a portability 
consideration.
On Wednesday, October 7, 2020 Geoff Clare via austin-group-l at The Open Group 
 wrote:
> -- 
>  (0005036) shware_systems (reporter) - 2020-10-07 14:28
>  https://austingroupbugs.net/view.php?id=697#c5036 
> -- 
> That is an error in read(), and fread() as well; that these should have
> that case also as a may fail type.

The above was in reply to my note about posix_getdents() EOVERFLOW
that said:

    This set me thinking about why that part of the EOVERFLOW error is
    there at all. There is no equivalent EOVERFLOW for read(), nor
    should there be.

I continue to believe that for read() there should not be an EOVERFLOW
error.  There is absolutely no reason for read() to fail when it could
instead successfully return SSIZE_MAX bytes.  Perhaps we should add a
statement:

    If nbyte is greater than SSIZE_MAX, read() shall
    behave as if nbyte had the value SSIZE_MAX.

For fread(), the return type is size_t not ssize_t, so it doesn't
have quite the same problem. The question is what should happen if
the mathematical product of the size and nitems arguments is greater
than SIZE_MAX.  POSIX defers to the C standard on this and there is no
reason for us to state anything specific about it.  (The C standard
is silent on the matter, so the behaviour is implicitly undefined.)
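If, as stated, the case is implicitly undefined, a portable application can simply avoid it by checking the product before calling fread(). A minimal sketch; fread_checked() and its overflow flag are hypothetical, not part of any standard.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wrapper: refuse (return 0, *overflow = 1) when
   size * nitems cannot be represented in a size_t. */
size_t fread_checked(void *ptr, size_t size, size_t nitems,
                     FILE *stream, int *overflow)
{
    *overflow = (size != 0 && nitems > SIZE_MAX / size);
    if (*overflow)
        return 0;
    return fread(ptr, size, nitems, stream);
}
```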

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



RE: [1003.1(2016/18)/Issue7+TC2 0001406]: clarification of SEEK_END when current pointer doesn't match buffer size

2020-09-28 Thread shwaresyst via austin-group-l at The Open Group

As I read it, file size and *seek(SEEK_END, 0) will still be 16, reflecting how 
many bytes were written to the buffer and which had to be malloc'd. The rewind 
overwrites the first bytes and a flush, close reflects the size of data 
considered to be valid after the rewind, since there is no guarantee such a 
write maintained alignment with whatever data was written to expand it to the 
16 bytes. Maybe it was 2 doubles, for example, and the rewrite trashes the 
first half of the original value. It is the application's responsibility to do 
a SEEK_END after such a rewrite if it knows it is simply modifying a same size 
and type value, so flush and close include the rest of the data area.
On Monday, September 28, 2020 Austin Group Bug Tracker via austin-group-l at 
The Open Group  wrote:

The following issue has been SUBMITTED. 
== 
https://www.austingroupbugs.net/view.php?id=1406 
== 
Reported By:                djdelorie
Assigned To:                
== 
Project:                    1003.1(2016/18)/Issue7+TC2
Issue ID:                  1406
Category:                  Base Definitions and Headers
Type:                      Clarification Requested
Severity:                  Editorial
Priority:                  normal
Status:                    New
Name:                      DJ Delorie 
Organization:              Red Hat Inc 
User Reference:              
Section:                    open_memstream 
Page Number:              
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html 
Line Number:                n/a 
Interp Status:              --- 
Final Accepted Text:        
== 
Date Submitted:            2020-09-28 21:26 UTC
Last Modified:              2020-09-28 21:26 UTC
== 
Summary:                    clarification of SEEK_END when current pointer
doesn't match buffer size
Description: 
Consider a stream created by open_memstream(), where 16 bytes are written,
fseek(0,SEEK_SET) to rewind, then write 4 bytes, and fflush().  At this
point, the value pointed to by the sizep argument to open_memstream()
should be 4 (please confirm).
At this point in the state of the stream, what are the semantics of
SEEK_END?  What will be the "file size" if you fclose() at this point?
The example explicitly SEEK_SETs to the buffer size before fclose(),
eliding the issue.
Desired Action: 
Please clarify if SEEK_END is relative to the current position or the
current buffer length, and if it's changed by a call to fflush() at that
time.
Please clarify if a SEEK_SET to set the current pointer less than the
current buffer size, itself (without read/write), changes the SEEK_END
semantics, or the value stored in *sizep after fflush().
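The scenario in the report can be reproduced with a short program. This is a sketch for observing what a given implementation does; memstream_scenario() is a hypothetical name. It deliberately makes no claim about the SEEK_END result, which is exactly what the report asks to have clarified.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: write 16 bytes, rewind, overwrite 4, fflush(), then report
   *sizep and the offset that SEEK_END resolves to. */
int memstream_scenario(size_t *size_after_flush, long *end_offset)
{
    char *buf = NULL;
    size_t size = 0;
    FILE *f = open_memstream(&buf, &size);
    if (f == NULL)
        return -1;
    fwrite("0123456789abcdef", 1, 16, f);   /* 16 bytes written */
    fseek(f, 0, SEEK_SET);                  /* rewind, no read or write */
    fwrite("WXYZ", 1, 4, f);                /* overwrite the first 4 bytes */
    fflush(f);
    *size_after_flush = size;               /* value stored via sizep */
    fseek(f, 0, SEEK_END);                  /* the case in question */
    *end_offset = ftell(f);
    fclose(f);
    free(buf);
    return 0;
}
```

Printing both values on several implementations shows whether SEEK_END follows the current position (4) or the buffer length (16).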

== 

Issue History 
Date Modified    Username      Field                    Change              
== 
2020-09-28 21:26 djdelorie      New Issue                                    
2020-09-28 21:26 djdelorie      Name                      => DJ Delorie      
2020-09-28 21:26 djdelorie      Organization              => Red Hat Inc    
2020-09-28 21:26 djdelorie      Section                  => open_memstream  
2020-09-28 21:26 djdelorie      Page Number              =>
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html
2020-09-28 21:26 djdelorie      Line Number              => n/a            
==




Re: Proposal to update reference to POSIX in the ISO C++ standard

2020-09-28 Thread shwaresyst via austin-group-l at The Open Group

It's my understanding ISO/IEC was to bump their distribution also, to keep in 
sync. Nick S. would be more conversant with the details of that, though.
On Monday, September 28, 2020 Jonathan Wakely  wrote:
On 28/09/20 14:36 +, shwaresyst wrote:
>
>The 2018 edition is the latest ISO/IEC/IEEE version, in that it was balloted 
>and approved to keep the IEEE "current standard" clock from timing out. The 
>2008 edition plus TCs is now the prior version, in the formal sense.

Is that not in the ISO store?

I don't see an update to https://www.iso.org/standard/50516.html
except for the corrigenda.



Re: Proposal to update reference to POSIX in the ISO C++ standard

2020-09-28 Thread shwaresyst via austin-group-l at The Open Group

The 2018 edition is the latest ISO/IEC/IEEE version, in that it was balloted 
and approved to keep the IEEE "current standard" clock from timing out. The 
2008 edition plus TCs is now the prior version, in the formal sense.
On Thursday, September 24, 2020 Jonathan Wakely via austin-group-l at The Open 
Group  wrote:
On 24/09/20 08:23 -0700, Nick Stoughton wrote:
>ISO/IEC 9945:2009 including Corrigenda 1 (2013) and Corrigenda 2
>(2017) is the current latest approved ISO standard. The Austin Group
>is in the process of revising this, with a publication date in 2022
>expected. You state "Since the TCs are just lists of changes, not a
>complete document, ..." which is technically true for ballot purposes,
>but The Open Group and IEEE publish a fully amended version, and this
>is what most people see when they try to obtain a copy of the latest.

Yes, I use the 2018 version from the Open Group for my own purposes,
but as C++ is an ISO/IEC standard I believe we're supposed to refer to
the ISO/IEC/IEEE version of POSIX, which means 9945:2009 rather than
the fully amended documents available elsewhere.

But we could add the two TCs to the references as well. The C++14
standard referred to C that way:

— ISO/IEC 9899:1999/Cor.1:2001(E), Programming languages — C, Technical 
Corrigendum 1
— ISO/IEC 9899:1999/Cor.2:2004(E), Programming languages — C, Technical 
Corrigendum 2
— ISO/IEC 9899:1999/Cor.3:2007(E), Programming languages — C, Technical 
Corrigendum 3

So I'll propose changing the current reference to:

ISO/IEC/IEEE 9945:2009, Information Technology — Portable Operating System 
Interface (POSIX)
ISO/IEC/IEEE 9945:2009/Cor 1:2013, Information Technology — Portable Operating 
System Interface (POSIX), Technical Corrigendum 1
ISO/IEC/IEEE 9945:2009/Cor 2:2017, Information Technology — Portable Operating 
System Interface (POSIX), Technical Corrigendum 2

Thanks!


>-- 
>Nick
>
>On Thu, Sep 24, 2020 at 7:42 AM Jonathan Wakely via austin-group-l at
>The Open Group  wrote:
>>
>> On 24/09/20 15:28 +0100, Jonathan Wakely via austin-group-l at The Open 
>> Group wrote:
>> >Hello,
>> >
>> >I am writing a proposal for the ISO C++ standard committee (WG21) to
>> >update the reference to the POSIX standard in the C++ International
>> >Standard. My colleague Eric Blake suggested I ask on this list whether
>> >anybody here has any comments on the proposal.
>> >
>> >The draft is at https://kayari.org/tmp/posix.html
>> >
>> >The abstract is:
>> >
>> >  The C++ standard has a normative reference to ISO/IEC 9945:2003 (aka
>> >  POSIX.1-2001 aka The Single UNIX Specification, version 3). However,
>> >  the C++ standard library refers to POSIX functions and macros which
>> >  are not defined in that document, as they weren't added until
>> >  ISO/IEC/IEEE 9945:2009 (aka POSIX.1-2008 aka SUSv4). The C++
>> >  standard should update its reference.
>>
>> Ugh, sorry for the borked indentation.
>>
>> >If you see any errors or incorrect claims from an Austin Group
>> >perspective, I'd be very grateful for your feedback.
>> >
>> >Thanks in advance to anybody who makes time to read through it,
>> >Jonathan
>> >
>>
>



Re: behaviour of pthread_attr_[sg]etguardsize with thread maintained stack

2020-09-22 Thread shwaresyst via austin-group-l at The Open Group

It will not be used by the implementation in managing the thread, and a 
guardsize value might not even be stored in the thread_t data if setstack() has 
been called as there is no pthread_getguardsize() interface; it is just stored 
in the attribute then for possible, not required, use by the application.
On Tuesday, September 22, 2020 Robert Elz  wrote:
    Date:        Tue, 22 Sep 2020 14:38:07 + (UTC)
    From:        shwaresyst 
    Message-ID:  <32911555.5186984.1600785487...@mail.yahoo.com>

  | Yes, it is no longer a factor,

I would have guessed that is what "not used" means, but:

  | and no, it will return what last setting was, be it from init()
  | or a setguardsize()

How is that "not used" ?

  | A set only affects that one attr object, not all of them,

Not the issue.

kre




Re: behaviour of pthread_attr_[sg]etguardsize with thread maintained stack

2020-09-22 Thread shwaresyst via austin-group-l at The Open Group

Does that include calculating the amount of available stack space,
and or the return value of a later getguardsize() ?

Yes, it is no longer a factor, but may be a value the application code uses to 
simulate what the implementation does with memory it manages; and no, it will 
return what last setting was, be it from init() or a setguardsize() call. A set 
only affects that one attr object, not all of them, or any thread the attr was 
used to initialize. The standard expects all relevant attr values to be copied 
into the thread_t or sigev structure being initialized, not store only a 
pointer to the attr object.
On Tuesday, September 22, 2020 Robert Elz  wrote:
    Date:        Tue, 22 Sep 2020 11:05:05 + (UTC)
    From:        "shwaresyst via austin-group-l at The Open Group" 

    Message-ID:  <1248402378.5117076.1600772705...@mail.yahoo.com>


  | Once pthread_attr_init() successfully completes the guardsize should be
  | set to the default value and may be examined by pthread_attr_getguardsize(),
  | that I see.

Fine.  Not the issue.

  | A call to setguardsize() should store the value and be returned
  | by subsequent getguardsize() calls,

Fine, still not the issue.  Again, except as a workaround to what might
be a NetBSD bug (or might just be unspecified behaviour), nothing is calling
setguardsize();

  | even though it is not used after pthread_attr_setstack() is called.

That is closer to the issue.  What does "not used" mean here?

Does that include calculating the amount of available stack space,
and or the return value of a later getguardsize() ?

  | Once setstack() is called the standard provides only
  | pthread_attr_destroy() followed by an init() as the portable means of
  | reenabling the use of the default guardsize.

Not the issue.

  | It is left unspecified, not even directly mentioned, that an implementation
  | may provide a special stackaddr value for use with setstack() that says
  | next time allocate an arbitrary stack area that does take the current
  | guardsize, and stacksize if that was set, into account

I don't much like "not even directly mentioned" - though that scenario
is perhaps so far outside what might be expected of an implementation that
it is a reasonable thing to have omitted.

But that's not the issue.

  | It is left implied by a getstack() before any setstack() being unspecified
  | behavior as to result;

Also not the issue, but again "left implied" is not nice.

kre



RE: behaviour of pthread_attr_[sg]etguardsize with thread maintained stack

2020-09-22 Thread shwaresyst via austin-group-l at The Open Group

Once pthread_attr_init() successfully completes the guardsize should be set to 
the default value and may be examined by pthread_attr_getguardsize(), that I 
see. A call to setguardsize() should store the value and be returned by 
subsequent getguardsize() calls, even though it is not used after 
pthread_attr_setstack() is called. Once setstack() is called the standard 
provides only pthread_attr_destroy() followed by an init() as the portable means 
of reenabling the use of the default guardsize.
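A minimal sketch of that destroy-then-init reset; reset_thread_attr() is a hypothetical name. After the reset, the object reports the same default guardsize as a freshly initialized one, whatever had been set (or whether pthread_attr_setstack() had been called) before.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical: return attr to its defaults, including implementation-
   managed stacks and the default guardsize.  A destroyed attributes
   object may be reinitialized with pthread_attr_init(). */
int reset_thread_attr(pthread_attr_t *attr)
{
    int rc = pthread_attr_destroy(attr);
    return rc != 0 ? rc : pthread_attr_init(attr);
}
```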

It is left unspecified, not even directly mentioned, that an implementation may 
provide a special stackaddr value for use with setstack() that says next time 
allocate an arbitrary stack area that does take the current guardsize, and 
stacksize if that was set, into account without needing to call destroy(). It 
is left implied by a getstack() before any setstack() being unspecified 
behavior as to result; an implementation using such a value would be expected 
to set stackaddr to it during an init() call as the default value, and which 
getstack() would then succeed in returning.
On Tuesday, September 22, 2020 Robert Elz via austin-group-l at The Open Group 
 wrote:
Note this is forwarding a NetBSD query ... I claim no knowledge
about any of this ...  but I can relay any replies (and I have included
Thomas Klausner in the Reply-To so he doesn't need to wait for me, and/or
so you can ask him for more details if needed ... I'm not sure if this list
allows contributions from non-subscribers though).

In XSH/pthread_attr_getguardsize it is stated:

    If the stackaddr attribute has been set (that is, the caller is
    allocating and managing its own thread stacks), the guardsize attribute
    shall be ignored and no protection shall be provided by the
    implementation. It is the responsibility of the application to
    manage stack overflow along with stack allocation and management
    in this case.

In the 202x Draft 1 version that is on page 1494, lines 49730-3 but this
hasn't changed from the current published std, in TC2 it is page 1568
lines 51425-8.

The question (I think) is when an application uses a user-provided stack,
should the guard size (default, or that set by pthread_attr_setguardsize())
get used by the implementation for anything at all, including when
pthread_attr_getguardsize() is called.

In case I don't have the scenario quite right, you can see the original
(currently quite brief) discussion at:
    https://mail-index.netbsd.org/current-users/2020/09/21/msg039578.html

At the very least the standard doesn't appear to say anything about what
should be returned by pthread_attr_getguardsize() when an application has
set the stackaddr attribute.

Does anyone know what is intended to happen here?

kre



RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread shwaresyst via austin-group-l at The Open Group

No, it does not need to be aligned to a multiple of 4, except on some lame RISC 
architectures. The logical model is unaligned accesses are always permitted; 
aligned accesses are the exception, not the rule. This is why the language is 
padding bytes may be added, not shall be added. The standard expects 
applications to use int_fastN_t or int_leastN_t types if they want to take 
advantage of platform specific alignment optimizations. The allocation 
functions only recently added the only alignment requirement, namely any 
pointer returned be aligned for an access to an intmax_t value, and the region 
be minimally sizeof(intmax_t) in length.
On Wednesday, September 2, 2020 Wojtek Lerch  wrote:
Yes I made the flexible member a "short" on purpose -- I wanted that byte of 
padding before the flexible array.
 
  
 
No, the sizeof can't be 5 or 6 unless the implementation is okay with unaligned 
access.  If I declare an array of these structs, the int32 inside each element 
needs to be aligned to a multiple of 4 -- therefore the size of the struct must 
be a multiple of 4 as well.  The same applies to a struct without a flexible 
member.
 
  
 
No, the requirements on sizeof have nothing to do with how many flex members 
are "present".  All that is required is that the sizeof is either the same as 
it would be for a struct without the flexible member (which is still 8, on any 
implementation that requires alignment), or greater, if the struct requires 
more padding (presumably also for alignment).  Apart from that, the C standard 
says nothing about whether there's enough room between the offsetof and the 
sizeof for one or more elements of the flexible array.
 
  
 
What you described with malloc() has nothing to do with what the C standard 
refers to as “padding”.
 
  
 
Also, while I understand the need to page-align data structures in some 
situations, I still don’t see its relevance to a discussion of the C standard’s 
requirements regarding padding in struct types and how it’s affected by 
flexible arrays.
 
  
 
From: shwaresyst  
Sent: September 2, 2020 1:58 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 
That example still has a byte of added padding, or the offsetof would be 5. The 
sizeof value is just incorrect, as it assumes one flex member is present. It 
should be 5 or 6, and which is the required value is what is ambiguous.
 
As you say, these are used most often with malloc(). Padding after the array is 
usually an artifact of this operation. You do a malloc(12) and you may get 16 
or 32 bytes actually allocated. Mapping this as a short s[] an application can 
safely access s[5], but a compiler may not block an access to s[7] too, in that 
the memory for it is allocated. You map a long long l[] and you can only access 
l[0] safely, the remaining 4 bytes out of the 12 plus what malloc adds are tail 
padding, but a compiler may allow an l[1] access because the total allocated 
permits it.
 
I mentioned page aligned because when you are buffering multiple sectors 
directly from media the malloc()s for these will usually be in multiples of 
pages, and efficient management of these happens 

RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-02 Thread shwaresyst via austin-group-l at The Open Group

That example still has a byte of added padding, or the offsetof would be 5. The 
sizeof value is just incorrect, as it assumes one flex member is present. It 
should be 5 or 6, and which is the required value is what is ambiguous.


As you say, these are used most often with malloc(). Padding after the array is 
usually an artifact of this operation. You do a malloc(12) and you may get 16 
or 32 bytes actually allocated. Mapping this as a short s[] an application can 
safely access s[5], but a compiler may not block an access to s[7] too, in that 
the memory for it is allocated. You map a long long l[] and you can only access 
l[0] safely, the remaining 4 bytes out of the 12 plus what malloc adds are tail 
padding, but a compiler may allow an l[1] access because the total allocated 
permits it.

I mentioned page aligned because when you are buffering multiple sectors 
directly from media the malloc()s for these will usually be in multiples of 
pages, and these are managed most efficiently when they don't straddle 
pages, so they are page aligned too. Such alignment isn't required by the 
standard, but it's common enough, and desirable enough, that aligned_alloc() was 
added. As I've seen no one use FLA as an acronym for flexible array, I consider 
VLA as applying to any array of indeterminate size; sorry if this confuses anyone.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
My understanding is that they meant to allow an implementation where  “struct a 
{ int32_t x; char y; short flex[]; }”  produces  sizeof(struct a)==8  but  
offsetof(struct a,flex)==6.
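The struct from this message can be checked directly. On common ABIs (x86-64, for instance) it typically gives offsetof 6 and sizeof 8, so the trailing padding overlaps the start of the array. A sketch of an allocator that copes with either layout by taking the larger of the two sizes; alloc_a() and struct a_noflex are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct a { int32_t x; char y; short flex[]; };   /* from the message */
struct a_noflex { int32_t x; char y; };          /* same, flex omitted */

/* Hypothetical: allocate a struct a with room for n flex elements.
   sizeof(struct a) may exceed offsetof(struct a, flex), so use
   whichever is larger; never allocate less than sizeof(struct a). */
struct a *alloc_a(size_t n)
{
    size_t off = offsetof(struct a, flex);
    if (n > (SIZE_MAX - off) / sizeof(short))
        return NULL;                              /* size would overflow */
    size_t need = off + n * sizeof(short);
    if (need < sizeof(struct a))
        need = sizeof(struct a);
    return malloc(need);
}
```

The size rule in C11 (as if the flexible member were omitted, possibly with extra trailing padding) guarantees only sizeof(struct a) >= sizeof(struct a_noflex).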
 
  
 
I don’t like that they talk about padding “after” the flexible member – since 
the flexible array has a flexible size, rather than a zero size, that padding 
really overlaps the beginning of the array.
 
  
 
Personally I think that the standard could be made clearer if a structure with 
a flexible member were considered an incomplete type.  You wouldn’t be allowed 
to apply sizeof to it at all, and you wouldn’t be able to declare objects whose 
type is the structure, but you could still use pointers to it and dereference 
members – since the main purpose of such structures is to allocate them via 
malloc(), I don’t think anybody would mind those restrictions.
 
  
 
Also, I don’t understand why struct s would need to be page aligned or why you 
mention a VLA.  A flexible array is not a VLA, in the sense C uses the term.
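To make the distinction concrete, here is a minimal sketch. It assumes a C11 (or later) compiler that provides the optional VLA feature, that is, one that does not define __STDC_NO_VLA__. The names struct with_fam and vla_bytes() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A structure with a flexible array member: v[] has no size of its own. */
struct with_fam {
    size_t n;
    double v[];
};

/* sizeof such a structure is an integer constant expression, fixed at
 * compile time however many v[] elements a given allocation provides. */
_Static_assert(sizeof(struct with_fam) >= sizeof(size_t),
               "sizeof(struct with_fam) is a compile-time constant");

/* A variable length array: an ordinary array object whose length is only
 * known at run time, so sizeof is evaluated at run time. */
size_t vla_bytes(size_t n)
{
    assert(n > 0);      /* a VLA's length must be greater than zero */
    double v[n];
    return sizeof v;
}
```

The flexible member changes only how a struct may be allocated and accessed. A VLA is a distinct kind of array object with its own run-time size.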
 
  
 
From: shwaresyst  
Sent: September 1, 2020 4:55 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 
It looks like what that refers to is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[]; } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. What that change forces, however, is that there would be 2 
padding regions.
 
  
 
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
 
Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:
 
 
 
… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.
 
 
 
But this was reported as a defect, and corrected in TC2.
 
 
 
Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing 

Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

No, that is not what I would want, nor would anyone else. NAME_MAX doesn't 
guarantee that no d_name will ever be longer than this value; what it says is 
that all drivers for file systems provided by the implementation are capable of 
processing names up to that length. Some of them may support much longer names 
too; the standard leaves that open. Because of this latter possibility, no 
compile-time constant suitable for use in a macro can guarantee that EINVAL 
won't occur. Something that examines the media at runtime is required, which a 
macro might be an alias for, as a wrapper, but something still needs to be 
implemented to be wrapped.
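A sketch of the kind of run-time query described above, assuming a POSIX system. pathconf() with _PC_NAME_MAX reports the limit for the file system that actually holds a given path. The helper names name_max_for() and dirent_buf_size() are hypothetical, and so is the caller-supplied fallback used when pathconf() gives no answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stddef.h>
#include <unistd.h>

/* Longest filename the file system containing path accepts, queried at run
 * time.  pathconf() returns -1 both on error and when there is no fixed
 * limit; in either case this sketch returns the caller's fallback value. */
long name_max_for(const char *path, long fallback)
{
    long n = pathconf(path, _PC_NAME_MAX);
    return n == -1 ? fallback : n;
}

/* Bytes for one struct dirent able to hold a name of name_max bytes plus
 * the terminating null (name_max must be nonnegative), in the style of the
 * sizing idiom commonly used with readdir_r(). */
size_t dirent_buf_size(long name_max)
{
    size_t len = offsetof(struct dirent, d_name) + (size_t)name_max + 1;
    return len < sizeof(struct dirent) ? sizeof(struct dirent) : len;
}
```

Because the limit can differ per file system, the value is only meaningful for the directory that was queried. That is why a single compile-time macro cannot replace it.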
On Tuesday, September 1, 2020 Steffen Nurpmeso  wrote:
shwaresyst wrote in
 <1739483391.1543785.1598977118...@mail.yahoo.com>:
 |No, it couldn't introduce such a macro, because such would have to \
 |assume all d_name entries are the same length. Adding an option to \

Well, it has to allow for NAME_MAX + the_size_of_posix_dent for each
and every entry; is this what you want here?  Except for what
Philip Guenther said, of course.  But if it were left
implementation-defined then even that could be covered by the
macro, better than by anything else.

I for one feel you are very brave to apply sizeof() to anything
with a "flexible array member", i would not dare that for portable
code.  (But my code has to work with ISO C89 too, so i have to use
macros to switch between [a-number] and [] as applicable, and also
to SIZEOF these types.)
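Macros that switch between [a-number] and [] might look something like the sketch below. The names FLEX_ARRAY_DIM, struct name_rec and NAME_REC_SIZE are hypothetical. The C89 branch relies on the traditional one-element "struct hack", which that standard never formally sanctioned.

```c
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* C99 and later: declare a real flexible array member. */
# define FLEX_ARRAY_DIM
#else
/* C89: fall back to a one-element array (the "struct hack"). */
# define FLEX_ARRAY_DIM 1
#endif

struct name_rec {
    unsigned short len;
    char name[FLEX_ARRAY_DIM];
};

/* Bytes needed for a record whose name[] holds n chars.  Basing this on
 * offsetof() rather than sizeof() gives the same answer under either
 * declaration; in the C89 case sizeof(struct name_rec) would also count
 * the dummy element and any tail padding. */
#define NAME_REC_SIZE(n) (offsetof(struct name_rec, name) + (size_t)(n))
```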

Really, you are very brave!  Just the bugs i had to work around
since 2018 or what for a really tiny set of primitive tools!
(Like some gregarious animal not inlining for -Os, and another
huge one requiring explicit this-> to find superclass fields in
one class, but not the other.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

It looks like what that refers to is any tail padding for the structure as a 
whole. The standard still permits internal padding between individual fields as 
required, e.g. a struct s { short a; double b[]; } might need 6 bytes of this 
padding to align access for b[0]. This would still be needed if b[] only has a 
few members as a VLA but s is being page aligned, and so would reserve a lot of 
tail padding too. What that change forces, however, is that there would be 2 
padding regions.
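The sketch below inspects the two padding regions on a given implementation. It uses the struct s from the message, with the semicolon after b[] supplied. The 6-byte figure assumes double is 8-byte aligned; an ABI that aligns double to 4 bytes inside structures would give 2. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct s {
    short a;
    double b[];
};

/* Padding between the end of a and the start of b[]. */
size_t s_internal_padding(void)
{
    return offsetof(struct s, b) - (offsetof(struct s, a) + sizeof(short));
}

/* Padding that sizeof counts beyond the start of b[]; reported as 0 if
 * sizeof happens not to exceed offsetof. */
size_t s_tail_padding(void)
{
    size_t off = offsetof(struct s, b);
    return sizeof(struct s) > off ? sizeof(struct s) - off : 0;
}

void s_print_layout(void)
{
    printf("offsetof(b) = %zu, internal padding = %zu, "
           "tail padding = %zu, sizeof = %zu\n",
           offsetof(struct s, b), s_internal_padding(),
           s_tail_padding(), sizeof(struct s));
}
```

With GCC or Clang on x86-64, one would expect offsetof(b) to be 8. That gives 6 bytes of internal padding and no tail padding.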
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
Actually the intent was the opposite.  The original C99 did contain a wording 
that matches your interpretation:
 
  
 
… the size of the structure shall be equal to the offset of the last element of 
an otherwise identical structure that replaces the flexible array member with 
an array of unspecified length.
 
  
 
But this was reported as a defect, and corrected in TC2.
 
  
 
Summary
 6.7.2.1 Structure and union specifiers, paragraphs 15 and 16 require that any 
padding for alignment of a structure containing a flexible array member must 
precede the flexible array member.  This contradicts existing implementations.  
We do not believe this was the intent of the C99 specification.
 
Details
 
If a struct contains a flexible array member and also requires padding for 
alignment, then the current C99 specification requires the implementation to 
put this padding before the flexible array member.  However, existing 
implementations, including at least GNU C, Compaq C, and Sun C, put the 
padding after the flexible array member.
 
The layout used by existing implementations can be more efficient. Furthermore, 
requiring these existing implementations to change their layout would break 
binary backwards compatibility with previous versions.
 
  
 
See DR282 for more 
details: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_282.htm
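One way to see which of the two layouts a compiler uses is to pick a structure whose alignment forces padding after its last fixed member. This is only a sketch: struct dr282_example and padding_after_flex() are hypothetical names.

```c
#include <stddef.h>

/* The double forces 8-byte (or similar) alignment, so padding is needed
 * somewhere after the char members. */
struct dr282_example {
    double d;
    char c;
    char flex[];
};

/* Returns 1 if sizeof counts bytes that overlap the start of flex[], i.e.
 * the padding comes after the flexible member's offset (the layout the
 * defect report attributes to GNU C, Compaq C and Sun C); returns 0
 * otherwise, which includes the original C99 requirement that sizeof equal
 * the offset of flex[]. */
int padding_after_flex(void)
{
    return offsetof(struct dr282_example, flex) < sizeof(struct dr282_example);
}
```

With GCC on x86-64 one would expect offsetof(flex) to be 9 and sizeof to be 16, so the function would return 1.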
 
  
 
  
 
From: shwaresyst  
Sent: September 1, 2020 2:27 PM
To: Wojtek Lerch ; austin-group-l@opengroup.org
Subject: RE: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 
I agree some additional clarity might be useful there, in the C standard. I'm 
reading it as the intent being sizeof is equivalent to offsetof the VLA in 
accordance with the restrictions placed on it by use of the . or -> operators, 
which may not need extra bytes (so >vla == ( + sizeof(s)) is a truism, in 
other words) but it is not that specific.
 
  
 
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
 
That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)
 
 
 
The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).
 
 
 
From: austin-group-l@opengroup.org 
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
 
 
It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.
 
This transmission (including any attachments) may contain confidential 
information, privileged material (including material protected by the 
solicitor-client or other applicable privileges), or constitute non-public 
information. Any use of this information by anyone other than the intended 
recipient is prohibited. 

RE: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

I agree some additional clarity might be useful there, in the C standard. I'm 
reading it as the intent being sizeof is equivalent to offsetof the VLA in 
accordance with the restrictions placed on it by use of the . or -> operators, 
which may not need extra bytes (so >vla == ( + sizeof(s)) is a truism, in 
other words) but it is not that specific.
On Tuesday, September 1, 2020 Wojtek Lerch  wrote:
That sounds a little backwards – it’s everything else that works as if the 
flexible (not “variable”) member were not present.  The sizeof operator, as an 
exception, can return a greater value.  (The “.” and “->” operators are another 
exception.)
 
  
 
The standard does not say how much greater the value may be, or promise that it 
must be greater, even if padding is necessary to align the flexible member – as 
far as I can tell, sizeof(structure) can be less than offsetof(structure, 
flexible).
 

 
  
 
From: austin-group-l@opengroup.org 
Sent: September 1, 2020 10:52 AM
To: g...@opengroup.org; austin-group-l@opengroup.org
Subject: Re: [1003.1(2013)/Issue7+TC1 697]: Adding of a getdirentries() 
function
 
  
 

It's my understanding, by C11 6.7.2.1p18, sizeof on a struct with a variable 
array works as if the variable member was not present, but does count any bytes 
added for alignment padding, as this will be a fixed amount for each use of the 
struct. It is up to the application, like with variable argument lists, to 
establish a protocol that allows it to determine the effective size of the 
final member.


Re: [1003.1(2013)/Issue7+TC1 0000697]: Adding of a getdirentries() function

2020-09-01 Thread shwaresyst via austin-group-l at The Open Group

No, it couldn't introduce such a macro, because such would have to assume all 
d_name entries are the same length. Adding an option to the interface to do a 
count, as a vararg parameter, and directly malloc the necessary space, returned 
via my suggested change to buf as a **, is plausible. Since we are merging 
common behaviors with this interface introduction, not describing a single 
reference implementation, such changes are permitted if someone commits to 
doing an implementation, afaik.
On Tuesday, September 1, 2020 Steffen Nurpmeso via austin-group-l at The Open 
Group  wrote:
Geoff Clare via austin-group-l at The Open Group wrote in
 <20200901143300.GB24606@localhost>:
 |> -- 
 |>  (0004953) philip-guenther (reporter) - 2020-08-28 22:52
 |>  https://www.austingroupbugs.net/view.php?id=697#c4953 
 |> -- 
 |> I think the unspecified nature of the d_name member in the new posix_dent
 |> makes writing portable software more difficult while providing only \
 |> minimal
 |> benefit to programs that don't care.  I would support requiring it \
 |> to be a
 |> flexible array member and thus eliminating the error of declaring \
 |> an array
 |> and trying to walk it via indexing instead of by advancing a char pointer
 |> by d_reclen.
 |
 |I think we should keep the requirements for d_name the same between
 |struct dirent and struct posix_dent.  Some implementations of
 |getdents() and getdirentries() use struct dirent and they should be
 |able to make posix_getdents() a synonym (or a light wrapper) for the
 |existing function by making struct posix_dent be identical to struct
 |dirent.  We can't require d_name in struct dirent to be a VLA since
 |there are implementations where it is not.
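Philip Guenther's point about advancing a char pointer by d_reclen, instead of indexing, can be sketched with a stand-in record type. struct rec, rec_append() and rec_count() below are hypothetical. They are not the struct posix_dent definition under discussion, and rec_append() exists only to build example data. The sketch assumes the buffer comes from malloc(), so it is suitably aligned.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>   /* buffers passed in are expected to come from malloc() */
#include <string.h>

struct rec {
    unsigned short reclen;  /* total bytes in this record, padding included */
    char name[];            /* null-terminated name */
};

/* Round x up to the alignment of struct rec so that the next record starts
 * on a suitably aligned address (x is assumed far below SIZE_MAX). */
static size_t rec_align(size_t x)
{
    size_t a = _Alignof(struct rec);
    return (x + a - 1) / a * a;
}

/* Appends a record for name at byte offset off of buf (capacity cap) and
 * returns the offset just past it, or 0 if the record does not fit. */
size_t rec_append(unsigned char *buf, size_t cap, size_t off, const char *name)
{
    size_t namelen = strlen(name) + 1;
    size_t len = rec_align(offsetof(struct rec, name) + namelen);
    if (len > USHRT_MAX || off > cap || len > cap - off)
        return 0;
    struct rec *r = (struct rec *)(buf + off);
    r->reclen = (unsigned short)len;
    memcpy(r->name, name, namelen);
    return off + len;
}

/* Counts the records in the first used bytes of buf by stepping a char
 * pointer forward by reclen, never by indexing an array of struct rec. */
size_t rec_count(const unsigned char *buf, size_t used)
{
    size_t n = 0;
    const unsigned char *p = buf;
    while (p < buf + used) {
        const struct rec *r = (const struct rec *)p;
        /* Stop on a malformed record instead of looping or overrunning. */
        if (r->reclen == 0 || r->reclen > (size_t)(buf + used - p))
            break;
        p += r->reclen;
        n++;
    }
    return n;
}
```

Indexing (struct rec *)buf as if it were an array would step by sizeof(struct rec) and land in the middle of names. Stepping by reclen follows the actual record boundaries.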

The standard could also introduce a macro which could be used to
space a buffer accordingly, something like (very ugly)
POSIX_GETDENTS_BYTES_FOR_DENTS(number-of-desired-dents), and use
it in the example.
Like that any possible errors with buffer space allocation would
not even be introduced (except for possible integer overflows,
maybe).
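The suggested macro, reworked as a function, might look like the sketch below. The name bytes_for_dents() and its parameters are hypothetical. Following the objection elsewhere in this thread that no compile-time constant can bound d_name, it takes name_max as a run-time argument, for example from pathconf(). It also checks for the integer overflows mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-time counterpart of the POSIX_GETDENTS_BYTES_FOR_DENTS(n)
 * idea.  header: bytes before the name in one entry; name_max: longest name
 * accepted; align: entry alignment, assumed to be a nonzero power of two.
 * Returns the worst-case buffer size for n entries, or 0 on overflow
 * (n == 0 also yields 0). */
size_t bytes_for_dents(size_t n, size_t header, size_t name_max, size_t align)
{
    /* One entry: header, longest name, terminating null byte. */
    if (name_max >= SIZE_MAX - header)
        return 0;
    size_t per = header + name_max + 1;

    /* Round the entry up to the alignment. */
    if (per > SIZE_MAX - (align - 1))
        return 0;
    per = (per + align - 1) & ~(align - 1);

    /* n entries, checking the multiplication before doing it. */
    if (n != 0 && per > SIZE_MAX / n)
        return 0;
    return n * per;
}
```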

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter          he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: Pseudoterminal terminology in POSIX

2020-08-05 Thread shwaresyst via austin-group-l at The Open Group

The slave side is ancillary to the master, sorry, as physical terminals are 
ancillary to the processor hardware, imo. Inverting the relationship makes it 
look like it is the intent of the slave side to source the majority of the 
data, when more often it is only monitoring output data sourced by the master, 
or producer/processing, side with relatively infrequent input required to be 
sourced by the monitoring side. For a full duplex connection, the producer side 
is doing echoes of everything the monitoring side sources along with what it 
sources unilaterally, so it is the primary user of the connection.
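For reference, the two ends being named here are what the standard pseudoterminal calls hand back. The sketch below assumes an XSI-conforming system; open_pty_pair() is a hypothetical helper name. It opens both ends of a pair using posix_openpt(), grantpt(), unlockpt() and ptsname().

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Opens both ends of a pseudoterminal pair.  On success, stores the master
 * (controlling) end in *master_fd and the slave (terminal) end in
 * *slave_fd and returns 0; on failure returns -1 with nothing left open. */
int open_pty_pair(int *master_fd, int *slave_fd)
{
    int m = posix_openpt(O_RDWR | O_NOCTTY);
    if (m == -1)
        return -1;
    /* Grant access to, and unlock, the slave device that pairs with m. */
    if (grantpt(m) == -1 || unlockpt(m) == -1) {
        close(m);
        return -1;
    }
    /* ptsname() may return a static buffer, so its result is used at once. */
    const char *name = ptsname(m);
    if (name == NULL) {
        close(m);
        return -1;
    }
    int s = open(name, O_RDWR | O_NOCTTY);
    if (s == -1) {
        close(m);
        return -1;
    }
    *master_fd = m;
    *slave_fd = s;
    return 0;
}
```

Whatever the process holding the master end writes appears as terminal input on the slave end. A shell or other program opens the slave end and uses it as its terminal in the normal way.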
On Wednesday, August 5, 2020 Geoff Clare via austin-group-l at The Open Group 
 wrote:
Steffen Nurpmeso wrote, on 05 Aug 2020:
>
> Michael Kerrisk via austin-group-l at The Open Group wrote in
>  :
>  |Elliot Hughes and I both noticed a point from "Minutes of the 3rd August \
>  |2020
>  |Teleconference":
>  ..
>  |On Tue, Aug 4, 2020 at 5:52 PM Andrew Josey  wrote:
>  ...
>  |> * General news
>  |>
>  |> We discussed terminology usage, in particuler terms such as
>  |> master/slave, blacklist/whitelist.  It was agreed some terminology
>  |> for pseudo-terminals could be better described using more functionally
>  |> descriptive terms, but the details of this are left to a future bug
>  |> report.  Andrew and Geoff took an action to investigate further
>  |> and come back with an analysis.
>  ...
>  |The essence of the idea is simple. Let's not invent completely new
>  |terms, but rather rework existing (familiar) terminology a little, as
>  |follows:
>  |
>  |    pseudoterminal (device) ==> "pseudoterminal device pair"

I'm okay with that, but ...

>  |
>  |  slave ==> "terminal device"

many other things are also terminal devices, so this doesn't work unless ...

>  |          (or "terminal end of the pseudoterminal device pair")

you use this cumbersome phrasing every time you refer to it.

>  |
>  |    master ==> "pseudoterminal device"
>  |          (or "pseudoterminal end of the pseudoterminal device pair")

This makes no sense to me.  Given the phrase "pseudoterminal device pair",
I would naturally expect "pseudoterminal device" could be used to refer
to either of the individual devices in the pair, rather than one and not
the other.

> How about ancillary or accessory terminal device for the slave.

I think ancillary would actually be more applicable to the master.

> 
>  |The resulting language (as it appears in the proposed changes for the
>  |Linux manual pages) is reasonably clear, albeit a little clunky in
>  |places (wordings like "the (pseudo)terminal end of the pseudoterminal
>  |device pair" are clear, but a little verbose).
> 
> Yes.  It is terrible and absolutely unclear (to me).  And
> presumably i would become dazed if i would read an entire manual
> with the above terms.

I agree, it's too cumbersome.

My own thoughts up to now had been that, since the slave side is the
side that is intended to be used as a terminal in the normal way, the
slave should be called the "primary" device.  I hadn't come up with a
word for the master side, but Steffen's suggestion of "ancillary" works
quite well (I just saw a dictionary definition that said "providing
necessary support to the primary ...").

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England