net/if.h not connected to anything else in the spec

2020-12-02 Thread Schwarz, Konrad via austin-group-l at The Open Group
Hi,

after some (moderate) digging, it seems like the interfaces documented in 
net/if.h are not related to anything else defined in POSIX.

To recap, the routines declared there provide a mapping between network
interface names and numeric indexes.  However, I couldn't find any other
interface within POSIX that actually uses either network interface names or
numeric indexes.  This makes the header of little practical use on its own.

However, net/if.h does play a role in real Unix systems, e.g., in discovering
the broadcast address to use on a particular network.  Does anyone know why the
more useful SIOCGIFCONF and related ioctl()s described in the Advanced 4.4 BSD
IPC Tutorial were not included in POSIX?  Would this be something to consider
for specification in a future revision?

--
Konrad Schwarz


RE: Should POSIX/SUS advise against MSG_OOB in recognition of RFC 6093?

2020-07-27 Thread Schwarz, Konrad
> -Original Message-
> From: Danny Niu 
> Subject: Should POSIX/SUS advise against MSG_OOB in recognition of RFC 6093?
> 
> RFC 6093 "On the Implementation of the TCP Urgent Mechanism"
> surveys the then-existing implementations of the TCP "URG" flag and its use,
> and recommends that new applications not use it.
>
> In POSIX, it is said that "Support for an out-of-band data transmission
> facility is protocol-specific"; and unlike in textbooks such as "Unix
> Network Programming" (vol 1 ch.24), the semantics of OOB I/O are unspecified
> in the standard.
> All of this diminishes the usefulness of that flag in portable
> applications.
>
> Should we, in recognition of this, recommend against the use of the MSG_OOB
> flag in new applications, by referencing RFC 6093?
 
I took a look at RFC 6093 and RFC 854 (Telnet), which seems to be the primary
application of the TCP Urgent mechanism, and honestly, I think the criticism in
RFC 6093 is misdirected.

I acknowledge that the TCP Urgent mechanism is poorly specified, but the way
Telnet uses it makes the intent clear: urgent mode is intended for the
application to simply discard unprocessed received data until the Telnet Data
Mark command is found.  It is a way of dealing with head-of-line blocking.

When used in this way, virtually none of the criticisms discussed by RFC 6093
apply.

In any case, the out-of-band mechanism is already specified so abstractly by
POSIX that I think Shwaresyst is correct: portable applications would have a
hard time finding any use for it already.

--
Konrad



RE: [Issue 8 drafts 0001349]: Where to obtain ISO/IEC standards (footnote)

2020-07-20 Thread Schwarz, Konrad
> -Original Message-
> From: Quentin Rameau 
> Sent: Monday, July 20, 2020 0:23
> To: austin-group-l@opengroup.org
> Subject: Re: [Issue 8 drafts 0001349]: Where to obtain ISO/IEC standards 
> (footnote)
>
> Hello,
>
> > ==
> > Summary:Where to obtain ISO/IEC standards (footnote)
> > ==
>
> The C standard specified by current (and next) POSIX is C99, but this
> standard doesn't seem to be available at all from ISO/IEC, at
> least from their website, which shows it as withdrawn; the same goes for C11,
> in favor of C18, which is the only one accessible.
>
> What would then be the correct way to access such a standard (C99)?
> Should that then be added to the footnote?
>
> Thanks for any clarification!

Hmm, I distinctly remember you used to be able to get old versions of standards
from ISO, but I can't find how to do so on the current version of the ISO web
site either.

An alternative would be to get the final draft from the technical committee's
website, but again, I don't know if that still works; the links on the ISO web
site all link back into itself and not to the committee's work area.

Sorry for not being of more help!

--
Konrad


RE: Why does %#x omit the 0x prefix for a zero value?

2020-07-06 Thread Schwarz, Konrad
Yes, but the same logic (0 being unambiguous in all radices) equally applies to 
1.

From: shwaresyst 
Sent: Monday, July 6, 2020 18:35
To: Schwarz, Konrad (CT RDA IOT SES-DE) ; 
austin-group-l@opengroup.org
Subject: RE: Why does %#x omit the 0x prefix for a zero value?


The necessity for "0x" is to disambiguate from octal numbers with their leading 
'0', or decimal for a context allowing leading zeroes, but since a 0 is the 
same in all radices I suspect the decision was not to require it to keep field 
width minimal for delimited formats like CSV.

As to the 2nd, "#.8x" forces a 10 char output for non-zero values, the "0x"
followed by the 8 digits for the explicit precision; for zero values and the
other format it stays at 8, as the "0x" is considered part of the width, I would
think. This could be more explicit, but I think it matches existing practice for
how many spaces get inserted to do a right justify in a field width.

________
On Monday, July 6, 2020 Schwarz, Konrad 
mailto:konrad.schw...@siemens.com>> wrote:

Sorry, this isn’t really a POSIX or a standards question, but does anyone know 
why this was defined this way?   Was it just codification of “historical 
practice” (i.e., a non-fatal bug)?



While we’re at it: when formatting integers for printing, are there any
disadvantages of using a precision specification over a zero flag followed by a
field width, i.e., “%#.8x” vs. “%#08x”?


Why does %#x omit the 0x prefix for a zero value?

2020-07-06 Thread Schwarz, Konrad
Sorry, this isn't really a POSIX or a standards question, but does anyone know 
why this was defined this way?   Was it just codification of "historical 
practice" (i.e., a non-fatal bug)?

While we're at it: when formatting integers for printing, are there any
disadvantages of using a precision specification over a zero flag followed by a
field width, i.e., "%#.8x" vs. "%#08x"?
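
The difference between the two formats can be seen with the printf utility, which takes the same flags as the C function. This is only a small sketch; the helper name `hex_formats` is made up for illustration.

```sh
# Print one value with "%#x", "%#.8x" and "%#08x" side by side.
hex_formats() {
    printf '%#x %#.8x %#08x\n' "$1" "$1" "$1"
}

hex_formats 0      # 0 00000000 00000000
hex_formats 255    # 0xff 0x000000ff 0x0000ff
```

With a precision, the "0x" prefix is added on top of the eight digits (ten characters for a non-zero value); with the zero flag and a field width, the prefix counts toward the eight characters. For zero, neither form prints a prefix.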


RE: LC_CTYPE=UTF-8

2020-06-26 Thread Schwarz, Konrad
> -Original Message-
> From: Ingo Schwarze 
> Sent: Thursday, June 25, 2020 21:25
> To: Alan Coopersmith 
> Cc: Hans Åberg ; Austin Group 
> 
> Subject: Re: LC_CTYPE=UTF-8
> 
> Hi Alan,
> 
> Alan Coopersmith wrote on Thu, Jun 25, 2020 at 12:13:33PM -0700:
> > On 6/25/20 8:31 AM, Ingo Schwarze wrote:
> 
> >> Whether to standardize only C.UTF-8 or both C.UTF-8 and POSIX.UTF-8
> >> as synonyms looks a bit like asking for the best colour of a bikeshed.
> >> Given that the standard already contains the redundancy of requiring
> >> both "C" and "POSIX", maybe it is more consistent to also require
> >> both "C.UTF-8" and "POSIX.UTF-8", but i don't think that matters
> >> greatly.
> 
> > The only thought I had along those lines was that I thought the "C"
> > locale came from the C standard, and might be best left to the C
> > committee to standardize, while this group controls the "POSIX"
> > locale definition.  I suspect those following the POSIX standards
> > would end up implementing both, regardless of which specification
> > defines each.

My impression is that the C standard shied away from all
concrete character-encoding issues, at least originally, when
alternatives such as EBCDIC were still quite relevant.
Although support for multibyte and wide characters was introduced,
this was done in a very abstract way;
I don't recall any mention of explicit encodings such as ASCII.

As such, I think it would be fine for POSIX to standardize
both POSIX.UTF-8 and C.UTF-8; I'd expect little
opposition from the C standard committee to such a move.

(Honestly, I don't know if the Microsoft Visual C library
supports a C.UTF-8 locale at the moment -- I'm pretty
sure their system call level is still UTF-16).

TL;DR: for consistency, I'd prefer POSIX to define C.UTF-8
as well as POSIX.UTF-8, even without explicit blessing by
the C committee.  I don't think they reserved parts
of the locale namespace for themselves.

--
Konrad Schwarz



RE: [1003.1(2016)/Issue7+TC2 0001345]: date(1) default format

2020-06-08 Thread Schwarz, Konrad
> -Original Message-
> From: Larry Dwyer 
> The POSIX locale is mandatory so that a conformance suite can
> be developed for the purpose of testing and branding a
> manufacturer's system for POSIX conformance (period).

Hmm, isn't it also so that applications can make more assumptions about the
input/output of utilities?  Defining a POSIX locale for the sole purpose of
enabling testing the compliance of said locale seems very redundant.  And
actually, I'm pretty sure the POSIX locale was defined to behave as traditional,
non-locale-enabled Unix, to make it possible to have locale support as a
differentiating feature.



RE: aliases in command substitutions

2020-04-21 Thread Schwarz, Konrad
Please ignore, superseded by subsequent discussion.

> -Original Message-
> From: Schwarz, Konrad (CT RDA IOT SES-DE) 
> Sent: Tuesday, April 21, 2020 10:18
> To: Robert Elz 
> Cc: austin-group-l@opengroup.org
> Subject: RE: aliases in command substitutions
> 
> > -Original Message-
> > From: Robert Elz 
> > Sent: Monday, April 20, 2020 15:35
> > To: Schwarz, Konrad (CT RDA IOT SES-DE) 
> > Cc: austin-group-l@opengroup.org
> > Subject: Re: aliases in command substitutions
> >
> >     Date:    Mon, 20 Apr 2020 07:12:03 +
> > From:"Schwarz, Konrad" <mailto:konrad.schw...@siemens.com>
> > Message-ID:  <mailto:38be7e5d52c74c9dac140f7de5105...@siemens.com>
> >
> >   | Not sure if I understand your problem,
> >
> > I suspect probably not.
> >
> >   | but I've always understood the
> >   | case xxx in
> >   | (pattern) ...;;
> >   | esac
> >   |
> >   | (fully parenthesized pattern) syntax to have been invented precisely
> >   | to allow case statements in $() subshell notation,
> >
> > First, $() is command substitution, not a subshell (not really
> > important) and if that was someone's intent, they did a particularly
> > bad job of implementing it, as what the standard says is (XCU 2.6.3)
> >
> > With the $(command) form, all characters following the open
> > parenthesis to the matching closing parenthesis constitute the
> > command. Any valid shell script can be used for command, except a
> > script consisting solely of redirections which produces unspecified
> > results.
> >
> > Note the "any valid shell script" (with that one exception) - a valid shell
> > script certainly includes a case statement where the optional '('
> > is omitted.
>
> Note the "matching closing parenthesis", which means _lexical_ matching,
> i.e., counting the number of open and closing parentheses
> in the source text.
>
> >
> > My guess has always been that the '(' was invented as a sop to
> > parenthesis-balancing editors - to make it possible for those things to
> > assist with balancing parentheses.   But that's mere speculation, I wasn't
> > around at the time.  A workaround for broken shells is another possibility.
>
> What makes more sense: that it was invented for interactive editing (but the
> line-oriented structure of the shell makes this a non-issue -- it is visually
> clear where a case statement starts and ends, and also, at top level, so in
> 99% of uses, the case statement is not enclosed in parentheses, so there is
> nothing for the editor to skip over) or to make it possible for the $() form
> of command substitution to include case statements without requiring full
> parsing of the contents of the $()?
>
> Note that the original form, in backticks, was purely lexical as well, but
> the rules were convoluted (ad hoc) and the Bourne shell was
> broken in regards to nested command substitution anyway.  The $() notation
> was invented precisely to make it easy to use nested
> command substitutions and was defined in such a way as to allow for simple
> lexical scanning (count opening and closing parentheses)
> to determine its end.
>
> [snipping the rest of the message]
> I think you are on a completely wrong tangent: there is no need for subshell
> parsing to occur when looking for the end of command
> substitution; this is purely lexical scanning.
> 




RE: aliases in command substitutions

2020-04-21 Thread Schwarz, Konrad
> -Original Message-
> From: Robert Elz 
> Sent: Monday, April 20, 2020 15:35
> To: Schwarz, Konrad (CT RDA IOT SES-DE) 
> Cc: austin-group-l@opengroup.org
> Subject: Re: aliases in command substitutions
> 
> Date:Mon, 20 Apr 2020 07:12:03 +0000
> From:"Schwarz, Konrad" <mailto:konrad.schw...@siemens.com>
> Message-ID:  <mailto:38be7e5d52c74c9dac140f7de5105...@siemens.com>
> 
>   | Not sure if I understand your problem,
> 
> I suspect probably not.
> 
>   | but I've always understood the
>   | case xxx in
>   | (pattern) ...;;
>   | esac
>   |
>   | (fully parenthesized pattern) syntax to have been invented precisely
>   | to allow case statements in $() subshell notation,
> 
> First, $() is command substitution, not a subshell (not really important) and 
> if that was someone's intent, they did a particularly bad job
> of implementing it, as what the standard says is (XCU 2.6.3)
> 
>   With the $(command) form, all characters following the open
>   parenthesis to the matching closing parenthesis constitute the
>   command. Any valid shell script can be used for command, except a
>   script consisting solely of redirections which produces unspecified
>   results.
> 
> Note the "any valid shell script" (with that one exception) - a valid shell 
> script certainly includes a case statement where the optional '('
> is omitted.

Note the "matching closing parenthesis", which means _lexical_ matching, i.e.,
counting the number of open and closing parentheses
in the source text.

>
> My guess has always been that the '(' was invented as a sop to
> parenthesis-balancing editors - to make it possible for those things to
> assist with balancing parentheses.   But that's mere speculation, I wasn't
> around at the time.  A workaround for broken shells is another possibility.

What makes more sense: that it was invented for interactive editing (but the
line-oriented structure of the shell makes this a non-issue -- it is visually
clear where a case statement starts and ends, and also, at top level, so in 99%
of uses, the case statement is not enclosed in parentheses, so
there is nothing for the editor to skip over) or to make it possible for the
$() form of command substitution to include case statements without
requiring full parsing of the contents of the $()?

Note that the original form, in backticks, was purely lexical as well, but the
rules were convoluted (ad hoc) and the Bourne shell was broken in regards
to nested command substitution anyway.  The $() notation was invented precisely
to make it easy to use nested command substitutions and was
defined in such a way as to allow for simple lexical scanning (count opening
and closing parentheses) to determine its end.

[snipping the rest of the message]
I think you are on a completely wrong tangent: there is no need for subshell
parsing to occur when looking
for the end of command substitution; this is purely lexical scanning.
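
To make the lexical-matching argument concrete, here is a small sketch; the function name `classify` is made up for illustration. Because every pattern is fully parenthesized, each `)` inside the substitution is paired with a `(`, so merely counting parentheses finds the end of the `$( )`.

```sh
# A case statement inside $( ): the (pattern) form keeps parentheses balanced.
classify() {
    kind=$(case $1 in
               (/*) echo absolute ;;
               (*)  echo relative ;;
           esac)
    printf '%s\n' "$kind"
}

classify /etc/passwd   # absolute
classify notes.txt     # relative
```

Whether a shell also copes with the unparenthesized pattern form in this position depends on how its parser locates the end of the substitution, which is exactly the point under discussion.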




RE: Weird possibility with async processes, $!, and long running scripts

2020-03-15 Thread Schwarz, Konrad


> -Original Message-
> From: Robert Elz 
> Sent: Sunday, March 15, 2020 13:47
> To: shwaresyst 
> Cc: austin-group-l@opengroup.org
> Subject: Re: Weird possibility with async processes, $!, and long running 
> scripts
> 
> Date:Sun, 15 Mar 2020 11:39:27 + (UTC)
> From:shwaresyst 
> Message-ID:  <1641208969.3419054.1584272367...@mail.yahoo.com>
> 
>   | For that purpose both still running processes and zombie processes have
>   | to be considered as active where new ID selection would be concerned.
> 
> Yes, the kernel makes that happen - as long as the zombie still exists, there 
> is no problem (making that happen was the object of my
> first two semi-solutions - neither of which is particularly nice, and leaving 
> the process table cluttered with zombies isn't either).
> 
> Harald's suggestion of stopping scripts using pids and requiring using job 
> designators (%1 etc) would certainly help, as then the shell
> knows which is being referred to - wait becomes trivial to do correctly and a 
> built in kill can object to signalling a process (job) that has
> already completed.
> 
> But that would be a mammoth culture change, and only works when processes are 
> referenced using built-in shell commands - the job
> designators mean nothing anywhere else.
> 
> kre

For the sake of completeness:
A similar suggestion would be for a shell caching expired PIDs to renumber 
these for uniqueness internally, and for kill and other built-ins taking PIDs 
to use this renumbering.  Similarly, POSIX utilities using PIDs such as ps 
would need to be made aware of this renumbering, perhaps by turning them into 
built-ins.   Finally, applications would need to be able to make use of this 
renumbering, i.e., by adding new entries to the shell's table, so that when 
they report process IDs, they can use the renumbering.  This leads to requiring 
this renumbering to be globally unique, since there is no requirement for these 
applications (e.g. daemons) to be connected to the session's shell in any way.

Ultimately, this turns into a system functionality, pretty much identical to 
the existing process table.

It seems much better to extend the existing interfaces to be able to leave 
zombie processes in the core process table until they can be removed without 
causing ambiguity.  As always, correctness trumps efficiency.
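
For comparison, the pattern that stays unambiguous with today's interfaces is to save $! at once and hand it only to the shell's own wait, since the shell still remembers that child. A minimal sketch, with `run_and_reap` as a made-up helper name:

```sh
# Start a command asynchronously, then collect its status through the shell.
run_and_reap() {
    "$@" &
    pid=$!          # saved immediately after starting the job
    wait "$pid"     # the shell knows this child; returns its exit status
}
```

The hazard discussed in this thread arises only when a saved PID is used much later, or by something other than the shell that started the process.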

Konrad Schwarz



RE: [1003.1(2004)/Issue 6 0000267]: time (keyword)

2020-03-02 Thread Schwarz, Konrad



> From: Nick Stoughton  
> Sent: Tuesday, February 25, 2020 00:27
> To: Chris F.A. Johnson 
> Cc: austin-group-l@opengroup.org
> Subject: Re: [1003.1(2004)/Issue 6 267]: time (keyword)
> 
> If we are bicycle-shedding around a name for a new utility,
> I like the idea of using the currently reserved namespace of appending a 
> colon ... "time: "
> 
> Since this is already reserved it shouldn't break either existing shells or 
> utilities.
> I know of no shells that implement a goto (which is why it was reserved in 
> the first place).
> 
> XRAT states "The restriction on ending a name with a <colon> is to allow
> future implementations that support
> named labels for flow control; see the RATIONALE for the break built-in
> utility.", and the mentioned break rationale provides an
> example, but I have not heard of any shell that uses it.

I would prefer to retain the option of labeled loops and breaks.



RE: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines

2020-02-04 Thread Schwarz, Konrad
> -Original Message-
> From: Robert Elz 
> Sent: Tuesday, February 4, 2020 1:56 PM
> To: Steffen Nurpmeso 
> Cc: austin-group-l@opengroup.org
> Subject: Re: [1003.1(2008)/Issue 7 252]: dot should follow Utility Syntax 
> Guidelines
> 

> What I am at the very least unclear about, is that the way that this group
> chose to require "--" processing,
> appears to me to also require that any arg (before that "--" or an arg not
> starting with "-") which does start
> with "-" be treated as an option - which since we have no options (not sure
> if any shell does, I haven't
> encountered any) would *require* issuing an "invalid option" error.

I can think of a reason for requiring option processing in the dot command:

I would often like to pass arguments to dot, like so,

. ./myscript arg1 arg2 ...

temporarily setting $0 $1 $2 ..., similar to shell functions.

Currently, this is not permitted by POSIX.  Allowing for option processing
would enable future extension to allow such usage, e.g.,

. -x ./myscript arg1 arg2 ...
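
Until something like this is specified, a similar effect can be approximated by wrapping dot in a function, since the sourced file sees the function's positional parameters. This is only a sketch: the name `source_with_args` is made up, and $0 is not changed by it.

```sh
# Source a file with its own $1 $2 ...; the caller's parameters are untouched.
source_with_args() {
    _swa_script=$1      # path of the file to source (should contain a slash)
    shift               # the remaining arguments become $1 $2 ... for the file
    . "$_swa_script"
}

# Usage: source_with_args ./myscript arg1 arg2
```

Some shells also accept extra arguments to dot directly as an extension, but that is not something a portable script can rely on.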



RE: A question on file flags after fork

2020-01-14 Thread Schwarz, Konrad
> -Original Message-
> From: Ronald F. Guilmette 
> Sent: Tuesday, January 14, 2020 8:16 AM
> To: austin-group-l@opengroup.org
> Subject: Re: A question on file flags after fork
> 
> In message ,
> Shware Systems  wrote:
> 
> >Short answer, because both file descriptors reference the same file 
> >description...
> 
> OK.  I see where I took a wrong turn now, however I must say that I cannot
> blame myself for having done so.  The
> language being used for the base concepts here is exceptionally stilted.  We
> have -descriptors- and then we have
> file -descriptions-.  I get it now, but I cannot help but wish that the
> original drafters, way back when, had
> elected to be a bit less clever and a bit more obvious in their coinage of the
> relevant terminology here.  The
> term "file descriptor" was grandfathered in from the ancient times of UNIX.
> So that was cast in stone and
> could not be reasonably changed.  But I would have been a LOT happier if
> those standard drafters, back in the
> day, had elected to call what is apparently now called a "file description"
> something else... a "purple
> aardvark" or basically anything other than the thing they finally settled on,
> which is extraordinarily subject
> to misinterpretation, being as it is, so close to the term "file descriptor".

I imagine people were reluctant to use "file table entry", as that implied a 
certain implementation (a table).

> Moving ahead, now that my misreading has been corrected, I'd like to just 
> throw out a trial balloon and note
> that it would be pragmatically useful to provide some attributes that are 
> currently associated only with "file
> descriptions" also for file descriptors.  O_NONBLOCK is the one that is most 
> immediately apparent to me, but I
> can readily imagine usefulness also for permitting things like O_APPEND and 
> even O_RDONLY and O_WRONLY to be
> applied selectively to individual file descriptors, rather than to (shared) 
> file descriptions.  I will happily
> elaborate on a real-world scenario in which this would have been most useful 
> to have, if anyone is interested,
> and also the ugly code contortions that had to be applied in order to
> work around this particular non-feature,
> which I am now aware is 100% standard conformant.

The point of O_RDONLY and O_WRONLY being fixed is that file access permission 
checking is done only during the
open(); the resulting file descriptor can be passed on to executables with a 
different (lesser) set of permissions.
Similarly, with O_APPEND, you want to make sure that, e.g., earlier log entries 
cannot be destroyed by later ones
by processes outside of your control.

I agree that the case for O_NONBLOCK is less clear and was surprised that this 
is not stored as part
of the file descriptor (although the name does give it away).
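
The sharing of one file description between parent and child can be observed even from the shell. In this sketch (the function name `shared_offset_demo` and the use of descriptor 3 are arbitrary choices), a subshell writes through an inherited descriptor and the parent then continues at the advanced offset instead of overwriting from the start:

```sh
# Parent and subshell write through the same open file description.
shared_offset_demo() {
    exec 3>"$1"                 # open (and truncate) the file on fd 3
    ( printf 'child\n' >&3 )    # the subshell inherits fd 3
    printf 'parent\n' >&3       # lands after the child's data
    exec 3>&-                   # close fd 3 again
    cat "$1"
}
```

If each process had its own file offset, the parent's write would start at offset 0 and the child's line would be lost.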



RE: system() and pthread_atfork()

2020-01-14 Thread Schwarz, Konrad
> -Original Message-
> From: Robert Elz 
> Sent: Tuesday, January 14, 2020 11:35 AM
> To: Schwarz, Konrad (CT RDA IOT SES-DE) 
> Cc: nate.karst...@garmin.com; austin-group-l@opengroup.org
> Subject: Re: system() and pthread_atfork()


>   | The point I was trying to make with the text you did not quote is that
>   | if the OP had been more judicious in closing sockets/file descriptors,
>   | he would not have run into the problem in the first place.
> 
> The issue (as I understand it, I do not like the threading methodology, and
> do not use it) is that in threaded
> processes, with everything happening in parallel, and no one thread having
> any real idea what any other thread
> might be doing, there is no way to achieve the result you are expecting.
>
> That is, at the exact same instant one thread is doing fork() another
> is doing open() or socket().   If it happens that the open() finishes one
> zeta-second before the fork() starts then the fd from that open will be
> inherited by the child of the fork, but
> because the thread doing the open has not had time to save the fd anywhere
> yet, there is no way for the child
> process (which only contains the thread which forked, not the others,
> including not the one that did the open())
> to ever discover what that fd was, or what it connects to.

But this is benign.  Only one thread is running in the new process,
that thread does not touch any sockets it knows nothing about,
and it will soon exec(), and the close-on-exec flag will do its job.

The mistake the OP was making is to close the listening socket, which is bound
to INADDR_ANY (and a fixed port), in response to an IP address change.  When he
does this and an unrelated fork() occurs, a race ensues: if the exec() does not
happen soon enough, the parent fails to rebind the port, because the old socket
is still open in the child.

Had he simply left the existing listening socket alone, everything would have
worked.

> 
>   | This seriously undermines the case for an all new F_CLOFORK flag
> 
> If it weren't for threading, I would not support it at all.   In any
> non-threaded context it is a stupid idea.   But I can see the need for
> it with our current threading methodology.   The real problem is that
> threads are a horrid misfeature.   Unfortunately, lots of people seem
> to like the evil things.

No use case demonstrating need for this feature has been presented up to now.



RE: system() and pthread_atfork()

2020-01-14 Thread Schwarz, Konrad
> -Original Message-
> From: Robert Elz 
> Sent: Monday, January 13, 2020 1:43 PM
> To: Schwarz, Konrad (CT RDA IOT SES-DE) 
> Cc: nate.karst...@garmin.com; austin-group-l@opengroup.org
> Subject: Re: system() and pthread_atfork()
> 
> Date:Mon, 13 Jan 2020 10:13:04 +0000
> From:"Schwarz, Konrad" <mailto:konrad.schw...@siemens.com>
> Message-ID:  
> <mailto:a45b1767f1002449a37508c2cc6003d7172c9...@defthw99em4msx.ww902.siemens.net>
> 
>   | I actually feel this problem is out-of-scope for POSIX: compliant machines
>   | are not supposed to dynamically change their IP addresses at run-time.
> 
> I have no idea what (if anything) POSIX says about IP networking
> requirements, but I'd expect not much, but that
> (if it were stated somewhere) would be an error.
> 
> [DHCP is allowed to dynamically change addresses]

Captain Obvious here:
A minimal quality of implementation attribute for DHCP daemons is for
these addresses to remain fixed for as long as possible.
Cf. https://kb.isc.org/docs/isc-dhcp-44-manual-pages-dhcpdleases,
which is designed to persist address assignments across restarts,
e.g., because the hosting server needs to reboot.

Or are you suggesting applications must be (re-)coded
such that they are resistant to dynamic changes of IP addresses?
 
> All that said, I agree that anything related to this issue would be out of 
> scope for POSIX, but the more general
> problem of threaded applications (which it must be, that's the only way that 
> the process can be simultaneously
> closing & opening sockets, while also, unknown to itself, also forking)
> and the interactions wrt fork & threads, is a POSIX issue.   That the
> actual problem is networking related is just a side issue, I believe.

The point I was trying to make with the text you did not quote is that
if the OP had been more judicious in closing sockets/file descriptors,
he would not have run into the problem in the first place.

This seriously undermines the case for an all new F_CLOFORK flag and associated
paraphernalia.  Indeed, except for Solaris, no implementation has ever
implemented this, presumably because there is no real-world need for it.




RE: system() and pthread_atfork()

2020-01-13 Thread Schwarz, Konrad
> -Original Message-
> From: Karstens, Nate 
> Sent: Sunday, January 12, 2020 11:52 AM
> To: 'austin-group-l@opengroup.org' 
> Subject: Re: system() and pthread_atfork()

Going back to the original problem,

> We are running Linux on an embedded system. The platform can
> change the IP address either according to a proprietary negotiation scheme
> or a manual setting. The application uses netlink to listen for IP address 
> changes;
> when this occurs the application closes all of its sockets and re-opens them 
> using the new address.
>
> A problem can occur if the application is simultaneously fork/exec-ing a new 
> process.
> The parent process attempts to bind a new socket to a port that it had 
> previously
> bound to (before the IP address change), only to fail because the child 
> process
> continues to hold a socket bound to that port.

I actually feel this problem is out-of-scope for POSIX: compliant machines
are not supposed to dynamically change their IP addresses at run-time.

Even if we accept this premise, I'm not sure I understand the problem:
suppose, on the bound socket, you used the specific IP address;
then, when switching to the new address, you could simply create
a new socket using the new specific address.  Since port numbers
are local to IP addresses, this should not create
a conflict.  (The process should close the old socket to conserve file 
descriptors).

On the other hand, if you were using INADDR_ANY, why not simply
leave the socket open?  I would expect a machine that allows dynamic
changes to the supported internet addresses to route new
connection requests to pre-existing sockets bound to INADDR_ANY.





RE: system() and pthread_atfork()

2020-01-03 Thread Schwarz, Konrad
> -Original Message-
> From: Matthew Dempsky 
> Sent: Friday, January 3, 2020 2:27 AM
> To: Schwarz, Konrad (CT RDA IOT SES-DE) 
> Cc: Karstens, Nate ; austin-group-l@opengroup.org
> Subject: Re: system() and pthread_atfork()
> 
> On Thu, Jan 2, 2020 at 5:01 AM Schwarz, Konrad 
> <mailto:konrad.schw...@siemens.com> wrote:
> > I think the right solution is for POSIX to require system() and popen() to 
> > call pthread_atfork() handlers.
> 
> How would this work for systems where system() is implemented using 
> posix_spawn()? posix_spawn()'s RATIONALE
> explicitly mentions that it can be used to implement system(), but also it's 
> meant to be implementable without
> using fork() (and thus without fork handlers).
> 
> It seems like the requirement should be more nuanced. E.g., that *if*
> system() is implemented using fork(), then it must call at-fork handlers. I'm 
> not sure how to phrase that in
> standardese though.

In my “second attempt”, I wrote

> I think the right solution is for POSIX to require system() and popen() to 
> call pthread_atfork() handlers, if they [i.e., system() and popen()] are not 
> atomic with regards to exec().




RE: system() and pthread_atfork()

2020-01-02 Thread Schwarz, Konrad
> -Original Message-
> From: Karstens, Nate 
> Sent: Thursday, December 19, 2019 12:26 AM
> To: austin-group-l@opengroup.org
> Subject: system() and pthread_atfork()
>
> The current definition of system() does not define if the pthread_atfork()
> handlers are called. We ran into a
> scenario where this caused a problem and wanted to share it with the mailing
> list to better understand why those
> handlers are not required and get some advice on how best to proceed.

I think the right solution is for POSIX to require system() and popen() to call
pthread_atfork() handlers.

I haven't noticed any arguments against such a solution and it clearly fills a
need.

I suggest you open a corresponding defect report on
http://austingroupbugs.net






RE: Are BSDs evidently more fault tolerant than (say) SysV and Linux?

2019-11-26 Thread Schwarz, Konrad
Well, Sun obviously felt differently, witness the switch from SunOS (BSD-derived)
to Solaris (Sys V).

I think that Sys V substantially fixed the signal model and shared library 
support, at least.

> -Original Message-
> From: Danny Niu 
> Sent: Tuesday, November 26, 2019 2:32 AM
> To: Austin Group Mailing List 
> Subject: Are BSDs evidently more fault tolerant than (say) SysV and Linux?
> 
> Actually I've also asked this on Unix.StackExchange.com, but it got closed
> for being too broad, and I can't
> think of a way to make it specific.
> One of those fellows mentioned it's because the way the BSD UFS filesystem
> was implemented made it more fault tolerant.
>
> But I think there must be a more fundamental reason, somewhere deep in the
> kernel, that made the operating system
> family more fault tolerant than others.
>
> Does anyone on the list know where such a mention exist(ed)?
> 
> Thanks.
> 




RE: What is this out-of-scope thing called Record IO?

2019-11-04 Thread Schwarz, Konrad
> From: Donn Terry  
> Sent: Friday, November 1, 2019 3:37 PM
> To: Scott Lurndal 
> Cc: Danny Niu ; Austin Group Mailing List 
> 
> Subject: Re: What is this out-of-scope thing called Record IO?

> That's correct. At the time the spec was written most OSs (and there were a 
> lot more different ones
> then) were loosely modeled on the punch-card systems that preceded them, 
> which means that I/O occurred
> in fixed sized "records".  (Well, everybody but (that time's version of) CDC 
> used that term
> consistently to mean "a big, custom sized, punch card on (typically) tape".) 
> The stream of data we now take for granted was a real innovation in Unix.

> Donn

To expand on this, the dd utility (itself named after a mainframe command used 
to specify record organization) can be used to convert between mainframe-style 
record files and Unix-style byte-stream files.
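
As a rough illustration of what such a conversion involves, here is a sketch in 
C of the transformation that dd's conv=unblock performs: each fixed-length 
record loses its trailing spaces and gains a newline.  The function name 
unblock_records() and its buffer contract are assumptions made for this 
example, not part of any standard interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert fixed-length records of `cbs` bytes into newline-terminated lines.
 * `out` must have room for in_len + in_len / cbs + 1 bytes.
 * Returns the number of bytes written to `out`. */
size_t unblock_records(const char *in, size_t in_len, size_t cbs, char *out)
{
    assert(cbs > 0);
    size_t written = 0;

    for (size_t off = 0; off < in_len; off += cbs) {
        /* The last record may be shorter than cbs. */
        size_t n = in_len - off < cbs ? in_len - off : cbs;

        /* Drop the space padding at the end of the record. */
        while (n > 0 && in[off + n - 1] == ' ')
            n--;

        memcpy(out + written, in + off, n);
        written += n;
        out[written++] = '\n';
    }
    return written;
}
```

With 4-byte records, the 8 input bytes "ab  cd  " become the two lines "ab" and 
"cd".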



RE: [1003.1(2013)/Issue7+TC1 0001045]: Issues with "cd -"

2019-10-23 Thread Schwarz, Konrad
> -Original Message-
> From: Austin Group Bug Tracker 
> Sent: Wednesday, October 23, 2019 3:58 PM
> To: austin-group-l@opengroup.org
> Subject: [1003.1(2013)/Issue7+TC1 0001045]: Issues with "cd -"
> 
> 
> The following issue has a resolution that has been APPLIED.
> ==
> http://austingroupbugs.net/view.php?id=1045

Could the example

case $dir in
(/*) CDPATH= cd -P "$dir";;
("") CDPATH= cd -P "";;
(*) CDPATH= cd -P "./$dir";;
esac

be shortened to

case $dir in
(/*|) CDPATH= cd -P "$dir";;
(*) CDPATH= cd -P "./$dir";;
esac

?

Also, from a usability perspective, I think it would be better if `-' lost its 
special meaning after `--'.  This would make the above code superfluous.

Konrad Schwarz



RE: [1003.1(2013)/Issue7+TC1 0001052]: ${#var} should be decimal I presume.

2019-10-23 Thread Schwarz, Konrad
> -Original Message-
> From: Austin Group Bug Tracker 
> Sent: Wednesday, October 23, 2019 4:02 PM
> To: austin-group-l@opengroup.org
> Subject: [1003.1(2013)/Issue7+TC1 0001052]: ${#var} should be decimal I 
> presume.
> 
> 
> The following issue has a resolution that has been APPLIED.
> ==
> http://austingroupbugs.net/view.php?id=1052
> ==

> Expands to the shortest representation of the decimal ...

Wouldn't this be better phrased in terms of the %d format conversion?
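
To make the suggestion concrete: %d already yields the shortest decimal 
representation, with no leading zeros or padding.  The helper below is only a 
sketch; format_length() is a made-up name, and strlen() counts bytes, so it 
matches ${#var} only under the assumption of a single-byte locale.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the length of `value` into `buf` as plain decimal, the way a
 * %d conversion renders it.  Returns the number of digits produced. */
int format_length(const char *value, char *buf, size_t bufsize)
{
    assert(value != NULL && buf != NULL);
    return snprintf(buf, bufsize, "%d", (int)strlen(value));
}
```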



RE: More issues with pattern matching

2019-09-26 Thread Schwarz, Konrad



> -Original Message-
> From: Harald van Dijk 
> Sent: Thursday, September 26, 2019 4:39 PM
> To: austin-group-l@opengroup.org
> Cc: austin-group-l@opengroup.org
> Subject: Re: More issues with pattern matching
> 
> On 26/09/2019 13:13, Robert Elz wrote:
> > So, if we have
> >
> > [[:alpha]
> >
> > there is absolutely no question but that this is a bracket expr that
> > matches one of the 7 chars
> > [ : a l p h a
> > and is in no way any kind of character class reference, whatever it
> > looks like its author may have intended, and regardless of what comes
> > after it.
> >
> > If the standard says any different, or implies different, or even
> > allows different, it is simply wrong.
> 
> If this is the whole pattern, then agreed, but if this is only part of the 
> pattern, I am not sure. [[:alpha]:]]
> is interpreted by many shells (bash, bosh, mksh, zsh) as a character class 
> containing an invalid character class
> name "alpha]". It may also be treated as such in ksh and yash, but as the 
> whole pattern fails to match anything,
> it is hard to tell how exactly they interpret it. The interpretation as "any 
> of the characters in '[:alpha',
> followed by ':]]', is something I only see in osh and in your shell.

POSIX should disallow `:' and `]' in character class names.





RE: More issues with pattern matching

2019-09-26 Thread Schwarz, Konrad
> -Original Message-
> From: Robert Elz 

> So, is [[:"alpha":]] required to be treated the same as [[:alpha:]] , not 
> allowed to be treated the same,
> explicitly unspecified, or simply never considered (previously) ?

An argument for requiring [[:"alpha":]] to be the same as [[:alpha:]] is that 
it would allow character-class names
with white space, e.g., "title case".

Regards

KAS



editorial mistake in c99 man page: table reference wrong

2019-09-17 Thread Schwarz, Konrad
Hi,

I am unfortunately failing in submitting a bug report via Aardvark (perhaps 
cookies), but anyhow:

In 
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/c99.html#tag_20_11_13_04,
 penultimate paragraph,

   The getconf utility can be used to get flags for the threaded programming 
   environment, as indicated in Programming Environments: Type Sizes.

should read

   The getconf utility can be used to get flags for the threaded programming 
   environment, as indicated in Threaded Programming Environment: c99 Arguments.

Best regards

Konrad Schwarz



RE: Arrays

2019-05-31 Thread Schwarz, Konrad
> -Original Message-
> From: Stephane Chazelas 
> Sent: Thursday, May 30, 2019 10:14 PM
> To: Steven Penny 
> Cc: austin-group-l@opengroup.org
> Subject: Re: Arrays

> 
> In there, instead, a list was stored in a *scalar* variable and split (and 
> globbed) upon expansion when unquoted as in:
> 
>   files='file1 file2'
>   rm -f -- $files # with IFS assumed to contain its default value
> 
> Instead of
> 
>   files=(file1 'file 2')
>   rm -f -- $files
> 
> (or equivalent) in csh, rc, es, fish...
> 
> That was a very bad design probably explained by a desire of Steve Bourne to 
> maintain some level of backward portability with the
> Thompson shell parameter expansion (where it was more like macro expansion). 
> That misfeature is by far the primary source of bugs and
> shell script security vulnerabilities nowadays (see
> https://unix.stackexchange.com/questions/171346/security-implications-of-forgetting-to-quote-a-variable-in-bash-posix-shells)

I think the macro expansion idea came from Louis Pouzin/Multics, as described 
in https://multicians.org/shell.html:

Then in 64 came the Multics design time, in which I was not much 
involved, because I had made it clear I wanted
to return to France in mid 65. However, this idea of using commands 
somehow like a programming language was
still in the back of my mind. Christopher Strachey, a British 
scientist, had visited MIT about that time, and his
macro-generator design appeared to me a very solid base for a command 
language, in particular the techniques
for quoting and passing arguments. Without being invited on the 
subject, I wrote a paper explaining how the Multics
command language could be designed with this objective. And I coined 
the word "shell" to name it. It must have been
at the end of 64 or beginning of 65.

The macro generator in question is GPM, General Purpose Macrogenerator.

Regards
Konrad



RE: Lifecycle of process CPU time clock

2019-03-22 Thread Schwarz, Konrad
> -Original Message-
> From: Yann Droneaud 
> Sent: Wednesday, March 20, 2019 4:58 PM
> To: Schwarz, Konrad (CT RDA IOT SES-DE) ; 
> austin-group-l@opengroup.org
> Subject: Re: Lifecycle of process CPU time clock
> 
> As CPU clocks are per process and per threads in a process, measuring the 
> amount of CPU time used by those process or threads,
> there cannot be a fixed set of clocks (except if there's a fixed set of 
> processes and threads).
> 
> clock_getcpuclock(pid, ) has to return a clock identifier which is 
> different from:
> - CLOCK_REALTIME
> - CLOCK_MONOTONIC
> - CLOCK_THREAD_CPUTIME_ID
> 
> (but could be the same as CLOCK_PROCESS_CPUTIME_ID if pid is equal to 0 or 
> getpid()).
> 
> pthread_getcpuclock(pthread_id, ) has to return a clock identifier which 
> is different from
> 
> - CLOCK_REALTIME
> - CLOCK_MONOTONIC
> - CLOCK_PROCESS_CPUTIME_ID
> 
> (but could be the same as CLOCK_THREAD_CPUTIME_ID if pthread_id is equal to 
> pthread_self()).
> 
> And those process/thread CPU clocks have to be tied somehow to the lifecycle 
> of the process/thread they're related to.

Sorry, I hadn't realized this.

The clock ID returned by clock_getcpuclockid(pid, ) should be specified to 
remain valid at least until the process has terminated and its status 
information has been retrieved with one of the wait functions.

The clock ID returned by pthread_getcpuclockid(pthread_id, ) should be 
specified to remain valid at least until the thread has terminated and (for 
non-detached threads) its result has been retrieved by pthread_join().  (Note 
that the namespace for pthread_ids is the thread's containing process.)
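
For readers unfamiliar with the interfaces, a minimal sketch of obtaining and 
reading both kinds of CPU-time clock for the calling process and thread, where 
the lifetime question does not arise.  read_own_cpu_clocks() is a made-up 
name; the POSIX calls used are clock_getcpuclockid(), pthread_getcpuclockid() 
and clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

/* Read the CPU time consumed so far by the calling process and by the
 * calling thread.  Returns 0 on success, -1 on failure. */
int read_own_cpu_clocks(struct timespec *proc, struct timespec *thr)
{
    clockid_t proc_clock, thread_clock;

    assert(proc != NULL && thr != NULL);

    /* Both functions return an error number rather than setting errno. */
    if (clock_getcpuclockid(getpid(), &proc_clock) != 0)
        return -1;
    if (pthread_getcpuclockid(pthread_self(), &thread_clock) != 0)
        return -1;

    if (clock_gettime(proc_clock, proc) != 0)
        return -1;
    if (clock_gettime(thread_clock, thr) != 0)
        return -1;
    return 0;
}
```

The open question in this thread is what happens when the same calls are made 
with the ID of another process or thread that has since terminated.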



RE: Lifecycle of process CPU time clock

2019-03-20 Thread Schwarz, Konrad
> -Original Message-
> From: Yann Droneaud 
> Sent: Tuesday, March 19, 2019 9:56 PM
> To: austin-group-l@opengroup.org
> Subject: Lifecycle of process CPU time clock
> 
> Hi,
> 
> I have some questions/concerns regarding the lifecycle of a clockid_t 
> returned by clock_getcpuclock() for a process different from self
> (not 0, nor getpid()):
> 
> Given process B, having process identifier pidB;
> 
> - In process A, should clock_getcpuclock(pidB, ); always return the same 
> clockid_t during the process B lifetime ?
> 
> I don't think it is specified, but I expect it, otherwise clock identifiers 
> would accumulate, which might be considered a resource leak.

I think the fundamental misunderstanding here is the assumption that clockid_t 
values are objects with a dynamic lifetime rather than an enumeration of the 
fixed set of clocks defined by the system.



RE: C11/C17 fopen() "wx" and "exclusive access"

2019-03-13 Thread Schwarz, Konrad
> -Original Message-
> From: Geoff Clare 
> Sent: Wednesday, March 13, 2019 4:28 PM
> To: austin-group-l@opengroup.org
> Subject: C11/C17 fopen() "wx" and "exclusive access"
> 
> C11 introduced an "x" character on the end of fopen() mode arguments that 
> begin with "w", with the following requirement (quoted from
> the final C17 draft, but I don't think it changed):
> 
> Opening a file with exclusive mode ('x' as the last character in
> the mode argument) fails if the file already exists or cannot be
> created. Otherwise, the file is created with exclusive (also known
> as non-shared) access to the extent that the underlying system
> supports exclusive access.
> 
> The first part of this just means POSIX systems will need to use the O_EXCL 
> flag when creating the file, in order to conform to C11/C17.
> It is the second part that this mail is about.
> 
> I'm wondering how the C committee intended "to the extent that the underlying 
> system supports exclusive access" to be interpreted when
> the underlying system is a POSIX implementation.  All POSIX systems support a 
> limited form of exclusion related to file access: file
> permissions.  So did they intend that, in the absence of something better 
> (which we'll come to), fopen() should set the permissions bits
> to zero so that no other opens can be done on the file (except from processes 
> with appropriate privileges)?  Apparently not, because the
> description of fopen_s() in Annex K talks about "exclusive access" and "a 
> file permission that prevents other users on the system from
> accessing the file" as two different things.

To me, this seems inspired by MS DOS/Windows file system semantics, similar to 
the t/b flags for text/binary mode flags.
By default, Microsoft C opens files exclusively; the MSVC run-time library 
provides _fsopen(), among others,
to open files non-exclusively (and does not understand the "x" modifier).
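
For the uncontroversial first half of the C11 requirement (fail if the file 
already exists), a minimal sketch; create_exclusively() is a made-up name, and 
the example assumes a C library that implements the C11 "x" modifier.

```c
#include <assert.h>
#include <stdio.h>

/* Create `path` only if it does not exist yet.
 * Returns 0 on success, -1 if the file exists or cannot be created. */
int create_exclusively(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "wx");    /* C11: 'x' makes creation exclusive */
    if (f == NULL)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}
```

A second call with the same path fails, which is the O_EXCL behavior Geoff 
describes.  Nothing here says anything about the "exclusive access" half of 
the wording.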

> Thinking some more about applications that want to create a file and lock it 
> while they write some initial contents to it (just in general, not in 
> connection with fopen() "wx" and MFL), perhaps the race condition here is 
> something we should address anyway.  It is much the same as the 
> problem with FD_CLOEXEC that we solved by adding O_CLOEXEC, although there 
> the race was with other threads in the same process, whereas 
> here it is with other processes.

> So maybe it is worth adding a way of creating a file with a whole-file write 
> lock set anyway, for the same reason we added O_CLOEXEC.
>  (Or at least putting it in Issue 8 as a future direction, with a 
> recommendation of the flag name to use - I would suggest O_WRLCK
> to match F_WRLCK.) If we are going to do that anyway, there would be no 
> prospect of creating a loophole by changing the wording
> in the C standard, and so no need to query the C committee about their 
> intentions (unless we want them to clarify that they didn't
> mean the exclusive access to be just for the calling thread).

I think this proposal has technical merit but I thought POSIX was supposed to 
standardize existing practice only.
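
To illustrate the race being discussed, here is a sketch of what an 
application has to do today: create the file, then lock it in a second step.  
create_then_lock() is a made-up name, and the O_WRLCK flag suggested above 
does not exist in POSIX, so it is not used here.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Create `path` exclusively and take a whole-file write lock on it.
 * Returns the open file descriptor, or -1 on failure. */
int create_then_lock(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Window: between open() and fcntl() another process can already
     * open the new file and see it unlocked and empty. */

    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,          /* 0 means "to end of file" */
    };
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A single open() call that both creates and locks the file would close that 
window, in the same way O_CLOEXEC closed the one for FD_CLOEXEC.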




RE: [1003.1(2016)/Issue7+TC2 0001184]: strftime %C padding character unspecified

2019-02-24 Thread Schwarz, Konrad
> > I believe that it is intended that '-' be included in %C, %Y, and %G
> > for negative years, even without the '+' flag or a field width,
> > although that is perhaps another area that deserves some
> > clarification.  I believe that the "if and only if" in the description
> > of the '+' flag does not match existing practice.
> 
> C99 only specifies %C conversion for the range 00-99, so the results are 
> unspecified for negative years.  (For %G and %Y it doesn't
> specify a range.)
> 
> I thought that we were only adding requirements beyond C99 when the field 
> width and flags are used.
> 
> If we want to require that negative years can be converted with a plain %C 
> then this would require an explicit statement about how this is
> handled, with CX shading, not just a small modification to some unshaded text 
> about the number of characters placed in the array.
> 
> --
> Geoff Clare 
> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

Note that there is no universal definition of negative years,
see https://en.wikipedia.org/wiki/Year_zero -- or does POSIX require 
astronomical year numbering?



RE: pthread_spin_lock_t static initialization

2019-02-05 Thread Schwarz, Konrad
> -Original Message-
> From: Yann Droneaud 
> Sent: Monday, February 4, 2019 6:11 PM
> To: austin-group-l@opengroup.org
> Subject: pthread_spin_lock_t static initialization
> 
> I've recently made use of POSIX thread's spin locks and found there was no 
> static initializer for them in the Open Group specification.

> So could one give me a hint why the OpenGroup specification doesn't have a 
> static initializer for spin lock ?

FYI: I checked Butenhof's book, but that predates PTHREAD_MUTEX_INITIALIZER -- 
it has only PTHREAD_ONCE_INIT -- so remains silent on the issue.

My guess is that spinlocks have much less relevance than mutexes in practice 
and so the practical need for static initialization is much smaller.  The 
absence of PTHREAD_SPINLOCK_INITIALIZER could be pedagogically useful, as 
mutexes should be used in preference to spinlocks in nearly all situations.

With regard to C11 atomics: the atomic_flag type, designed as a basis for 
spin-locks, purposely speaks of abstract "set" and "clear" states instead of, 
say, 1 and 0, because the PA-RISC architecture had an atomic test-and-clear 
operation, but no atomic test-and-set; i.e., the clear state on that processor 
is 1 and the set state is zero.
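
As an aside, atomic_flag does come with a static initializer, so a statically 
initialized spin lock can be sketched on top of it.  All names below are made 
up for illustration, and this is a toy, not a substitute for 
pthread_spin_init() or for a mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Statically initialized to the "clear" (unlocked) state. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static long shared_counter;

static void demo_spin_lock(void)
{
    /* Spin until the flag was previously clear, i.e. we acquired it. */
    while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
        ;
}

static void demo_spin_unlock(void)
{
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        demo_spin_lock();
        shared_counter++;
        demo_spin_unlock();
    }
    return NULL;
}

/* Run two workers that increment the counter under the spin lock
 * and return the final count. */
long run_two_workers(void)
{
    pthread_t a, b;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

Note that the code never looks at the flag's representation, only at "was set" 
versus "was clear", which is what lets it work on a test-and-clear machine.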

Regards

Konrad



RE: bc Suggestions

2019-01-30 Thread Schwarz, Konrad
> -Original Message-
> From: Gavin Howard 
> Sent: Monday, January 28, 2019 7:54 PM
> To: austin-group-l@opengroup.org
> Subject: Re: bc Suggestions
> 
> On Thu, Jan 10, 2019 at 4:19 PM Gavin Howard  wrote:

> If users sometimes start bc to do number conversions, then what the Open 
> Group could do is add an option that more than 1 bc already
> has:
> -e, which allows users to input expressions directly at the command line (see
> https://github.com/gavinhoward/bc/blob/master/manuals/bc.1.ronn).
> Then users could use the alias mechanism to create all sorts of
> aliases:
> 
> alias d2o='bc -e "ibase=A;obase=8"'
> alias h2d='bc -e "ibase=16;obase=A"'
> alias o2b='bc -e "ibase=8;obase=2"'
> ..

For this particular problem, note

printf %o\\n ...

or

echo $((...))

or (non-POSIX)

gdb> p/x ...
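
The same conversions are a one-liner with the C library as well.  A small 
sketch, with hex_to_octal() being a made-up helper and no input validation:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse `hex` as a base-16 number and write it to `buf` in octal.
 * Returns the number of characters produced (as snprintf does). */
int hex_to_octal(const char *hex, char *buf, size_t bufsize)
{
    assert(hex != NULL && buf != NULL);

    unsigned long value = strtoul(hex, NULL, 16);
    return snprintf(buf, bufsize, "%lo", value);
}
```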



RE: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Schwarz, Konrad
> -Original Message-
> Expressly making it defined that
>   alias foo='whatever \ '
> which does end in a space (but otherwise is the exact same thing as the 
> previous one) also does not expand aliases in the following
> word
> seems redundant to me.   Since several shells (but not all) do expand
> aliases in this case, it seems to me the best thing to do is to leave this as 
> unspecified, such that no-one sane will ever use it (if
> something is needed, just use the previous form -- but better is not to use 
> aliases at all.)

Coming from ksh, I've always understood the alias mechanism to work at the 
lexical level (macro expansion with rescanning); the quoting behavior above is 
the most natural in that context.

I think it would reduce confusion if it were explicitly mandated.



RE: [1003.1(2016)/Issue7+TC2 0001197]: Omission from 1108: LONG_MIN must be <= -2147483648

2018-07-31 Thread Schwarz, Konrad



> -Original Message-
> From: Austin Group Bug Tracker [mailto:nore...@msnkbrown.net]
> Sent: Monday, July 30, 2018 9:19 PM
> To: austin-group-l@opengroup.org
> Summary:Omission from 1108: LONG_MIN must be <= 
> -2147483648
> Description:
> In the resolution to 1108, Note 4041, while twos-complement arithmetic is 
> adequate
> to describe how the result arises, it overlooks that from a mathematical and 
> set
> theory standpoint the operation is also a range error for the *_MIN values, 
> in that
> the "correct" result is outside the range *_MIN to *_MAX. This should be 
> indicated
> in errno with ERANGE, raise SIGFPE with Code FPE_INTOVF if not masked, or
> both.

C does not have mathematical fidelity as its overriding goal; an efficient 
mapping
to existing computer architectures is more important.

It makes signed overflow undefined so that machines that trap on overflow
may do so.

Although it is a domain error, abs(INT_MIN) is a case of signed overflow and 
should be treated identically to all other such cases.

Use a machine or compilation environment that traps on signed overflow if you 
find
this feature important.

Konrad Schwarz



RE: About issue 0001108 and abs(INT_MIN)

2018-07-23 Thread Schwarz, Konrad



> -Original Message-
> From: Joerg Schilling [mailto:joerg.schill...@fokus.fraunhofer.de]
> Sent: Thursday, July 19, 2018 4:53 PM
> To: vincent-o...@vinc17.net; austin-group-l@opengroup.org
> Subject: Re: About issue 0001108 and abs(INT_MIN)
> 
> Vincent Lefevre  wrote:
> 
> > The problem is not just the warning. If t is signed,
> >
> >   ((t)(~((t)0) << (sizeof (t)*CHAR_BIT - 1)))
> >
> > will yield undefined behavior due to overflow. This means that
> > compilers may generate code that shows a behavior different from what

> A compiler that creates other than the expected behavior would need to create
> intentionally buggy code.
> 
> The question was to create working code that neither creates a warning with 
> newer
> nor with older compilers. Do you have such code?

I don't think such code (to detect whether an arbitrary type is signed or 
unsigned) exists.

As Vincent correctly wrote, signed arithmetic is allowed to trap;
my understanding is that this was so C could support IBM360-derived 
architectures
(non-trapping signed arithmetic is a recent addition to z-Series).

POSIX acknowledges the inability to discover signedness programmatically by 
stating, for each typedef it specifies (e.g., in sys/types.h), whether the type 
is signed or unsigned.  Where it does not, e.g., for time_t, computation is 
significantly complicated; e.g., difftime() must be used in portable code.

((t) 1 << sizeof (t) * CHAR_BIT - 1) is an expression that evaluates to a 
t-sized word with the most
significant bit set.
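
Wrapped in a macro, this could look as follows.  The sketch deliberately 
restricts itself to unsigned types, for which the shift is well defined; 
MSB_OF is a made-up name.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Value of type t with only the most significant bit set.
 * Intended for unsigned integer types only. */
#define MSB_OF(t) ((t)((t)1 << (sizeof(t) * CHAR_BIT - 1)))

/* Example: the top bit of a 32-bit word. */
uint32_t msb32(void)
{
    return MSB_OF(uint32_t);
}
```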

Regards

Konrad



RE: can [[:digit:]] match something other than 0123456789?

2018-05-24 Thread Schwarz, Konrad
> -Original Message-
> From: Stephane Chazelas [mailto:stephane.chaze...@gmail.com]
> Sent: Sunday, May 20, 2018 10:43 PM
> To: Geoff Clare
> Cc: austin-group-l@opengroup.org
> Subject: Re: can [[:digit:]] match something other than 0123456789?
> 

> Note that having [x-y] be based on collation order would mean that things 
> like [a-z]
> would also match on uppercase letters in the latin script in locales where 
> case is
> not considered in the first weight for sorting (as is typical for English 
> locales for
> instance).
> 
> 
> Now, in a en_GB.UTF-8 locale on GNU/Linux (here ubuntu 16.04) for instance, 
> both
> bash's and ksh93's [0-9] matches on at least
> 142 different characters (see below). That matches on 0123456789 but also 
> digits 0
> (sometime 1) to 8 (sometimes 9 like for U+0669 which sorts the same as 9 
> there!)
> in other scripts, and some other random decimal digits, and some non-digits 
> and
> is far from including all the plethora of other decimal digits in Unicode.
> (unicode --max 0 --regexp 
> 'digit.(one|two|three|four|five|six|seven|eight|nine)\b' |
> grep -c '^U+'
> returns 696 with an old version of unicode, and that doesn't even include 
> things like
> roman numerals).

I'd find [0-9] matching on just "western" digits and [[:digit:]] matching
on the locale's digits the most natural solution.  If someone wanted to match
on Devanagari or whatever digits, she could simply list them in the bracket 
expression, rather than using
"western" digits.  If [0-9] is understood to be [[:digit:]], how could one 
differentiate between "western"
and, say, Devanagari digits (other than listing them each explicitly, 
[0123456789], as Stephane has done)?

Same goes for [a-z]: these should match (or should be) the Roman letters, not 
alphabetic characters
in general.

Also, my feeling is that [[:digit:]] should match just the digits that are 
actually relevant for that locale, e.g.,
just "western" digits for en_GB.  And fractions and superscripts are not digits.

If you really want to match any digit in any language, you could add a 
"Unicode" locale or perhaps region.



RE: Proper way to use an updating FILE on a tty?

2017-10-12 Thread Schwarz, Konrad
> -Original Message-
> From: Geoff Clare [mailto:g...@opengroup.org]
> Sent: Wednesday, October 11, 2017 5:51 PM
> To: austin-group-l@opengroup.org
> Subject: Re: Proper way to use an updating FILE on a tty?
> 
> Nick Stoughton <nickstough...@gmail.com> wrote, on 10 Oct 2017:
> >
> > On Tue, Oct 10, 2017 at 7:46 AM, Schwarz, Konrad
> > <konrad.schw...@siemens.com
> > wrote:
> >
> > > POSIX FILE streams opened for update need either a fflush() or a
> > > file positioning function when switching from writing to reading,
> > > and a file positioning function when switching from reading to
> > > writing.
> > >
> > > The Newlib C library as used in Cygwin fails with perror() reporting
> > > "Illegal seek" when fseek(f, 0, SEEK_CUR) is applied to a serial
> > > device (/dev/ttySx).
> > >
> > > The fseek(f, 0, SEEK_CUR) is my attempt at a null file positioning
> > > function between reading and writing to the device.
> > >
> > > Is this a bug in Newlib?
> > > If not, is there a POSIX-sanctioned file positioning function that
> > > works without fail on serial devices?
> > > If not, does that mean that POSIX requires separate streams for
> > > input and output to a serial device?
> > > - or -
> > > Can the error be safely ignored?  (I think this is the right answer).
> >
> > XBD 3.172 defines a File Offset:
> >
> > 3.172 File Offset
> > The byte position in the file where the next I/O operation begins.
> > Each open file description associated with a regular file, block
> > special file, or directory has a file offset. A character special file
> > that does not refer to a terminal device may have a file offset. There
> > is no file offset specified for a pipe or FIFO.
> >
> >
> > Thus it follows that a character special file that refers to a
> > terminal device does NOT have a File Offset, and the error can be safely
> ignored.
> 
> I agree with the first part (which means the fseek() can fail) but I don't 
> see how you
> can conclude that it is safe to ignore the error.
> 
> The standard says "The behavior of fseek() on devices which are incapable of
> seeking is implementation-defined."  So an implementation could do anything it
> likes in this situation (as long as it documents it).
> 

After I'd formulated the original question, I noticed that the Issue 6 
informative section of fseek() notes that

"The DESCRIPTION is updated to explicitly state that the fseek() sets the 
file-position indicator, and then on error the error indicator is set and 
fseek() fails."

This behavior was not immediately apparent to me on reading the main body of 
the text
but it is indeed worded that way.  I think the reason for this change is 
precisely the
situation I've encountered -- thus it is OK to ignore fseek() reporting a 
failure in this case.

So if any change were warranted, it probably suffices to make this point more 
clear, perhaps
in an example.
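
Such an example might look like the sketch below.  It assumes, as argued 
above, that an ESPIPE failure from the null seek can be ignored on a device 
without a file offset; Geoff's point that the behavior is 
implementation-defined still stands, so treat this as an illustration rather 
than something the standard guarantees.  prepare_to_write() is a made-up name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Call between reading from and writing to a stream opened for update.
 * Returns 0 if the stream may now be written to, -1 on a real error. */
int prepare_to_write(FILE *f)
{
    assert(f != NULL);

    /* Null file positioning operation required between read and write. */
    if (fseek(f, 0L, SEEK_CUR) == 0)
        return 0;

    /* No file offset on this device (e.g. a terminal): assumed harmless. */
    if (errno == ESPIPE) {
        clearerr(f);
        return 0;
    }
    return -1;
}
```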



Proper way to use an updating FILE on a tty?

2017-10-10 Thread Schwarz, Konrad
POSIX FILE streams opened for update need either a fflush() or a file 
positioning function
when switching from writing to reading, and a file positioning function when
switching from reading to writing.

The Newlib C library as used in Cygwin fails with perror() reporting "Illegal 
seek" when
fseek(f, 0, SEEK_CUR) is applied to a serial device (/dev/ttySx).

The fseek(f, 0, SEEK_CUR) is my attempt at a null file positioning function 
between
reading and writing to the device.

Is this a bug in Newlib?
If not, is there a POSIX-sanctioned file positioning function that works 
without fail
on serial devices?
If not, does that mean that POSIX requires separate streams for input and output
to a serial device?
- or -
Can the error be safely ignored?  (I think this is the right answer).



RE: FYI: ksh88 (/usr/xpg4/bin/sh) is not actually POSIX compliant

2017-10-09 Thread Schwarz, Konrad
> -Original Message-
> From: Martijn Dekker [mailto:mart...@inlv.org]
> Subject: Re: FYI: ksh88 (/usr/xpg4/bin/sh) is not actually POSIX compliant
> 
> Op 30-09-17 om 17:35 schreef Alan Coopersmith:
> >> Where/how would I report Solaris bugs?
> >
> > Customers with support contracts can report bugs via Oracle Support.
> 
> I was afraid it might be something like that.

Perhaps your tests could be included in the POSIX conformance test suite?



RE: sh(1): is roundtripping of the positional parameter stack possible? (Was: Re: Shell parameter expansions involving '#")

2017-05-16 Thread Schwarz, Konrad
> -Original Message-
> From: Stephane Chazelas [mailto:stephane.chaze...@gmail.com]
> To: Robert Elz
> Cc: Steffen Nurpmeso; austin-group-l@opengroup.org
> Subject: Re: sh(1): is roundtripping of the positional parameter stack

> Here, I'd fire awk and quote more than one arg at a time:
> 
> quote() {
>   LC_ALL=C awk -v q="'" -v b='\\' '
> function quote(s) {
>   gsub(q, q b q q, s)
>   return q s q
> }
> BEGIN {
>   sep = ""
>   for (i = 1; i < ARGC; i++) {
> printf "%s", sep quote(ARGV[i])
>   sep = " "
>   }
>   if (sep) print ""
> }' "$@"
> }

> Also note that if $IFS was previously unset upon calling your
> quote() (as is common when you want to restore splitting to its default
> behaviour), it would leave it assigned an empty value (which means "no
> splitting"). One common way to address  that is to do:
> 
>_save_IFS=$IFS; ${IFS+":"} unset _save_IFS
>...
>IFS=$_save_IFS; ${_save_IFS+":"} unset IFS

Really excellent work all around -- I'm very impressed.
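
For comparison, the quoting rule used by the awk function above (wrap the 
argument in single quotes and replace each embedded ' with '\'') can be 
sketched in C as well.  shell_quote() and its buffer contract are made up for 
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write a single-quoted, shell-safe copy of `s` into `out`.
 * Returns the length of the result, or -1 if `out` is too small. */
long shell_quote(const char *s, char *out, size_t outsize)
{
    assert(s != NULL && out != NULL);

    /* Two surrounding quotes, four bytes per embedded quote. */
    size_t need = 2;
    for (const char *p = s; *p != '\0'; p++)
        need += (*p == '\'') ? 4 : 1;
    if (need + 1 > outsize)
        return -1;

    char *w = out;
    *w++ = '\'';
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(w, "'\\''", 4);  /* close quote, escaped quote, reopen */
            w += 4;
        } else {
            *w++ = *p;
        }
    }
    *w++ = '\'';
    *w = '\0';
    return (long)(w - out);
}
```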

Konrad



RE: [1003.1(2013)/Issue7+TC1 0001016]: race condition with set -C

2016-11-07 Thread Schwarz, Konrad
> -Original Message-
> From: Geoff Clare [mailto:g...@opengroup.org]
> Sent: Monday, November 07, 2016 5:20 PM
> To: austin-group-l@opengroup.org
> Subject: Re: [1003.1(2013)/Issue7+TC1 0001016]: race condition with set
> -C
> > That seems to be in contradiction with it calling the link() system
> > call.
> >
> > I suspect that text is there for attempts to call "link" on a
> > directory.
> 
> If that were the case, I would have expected it to be worded like
> Joerg's man page quote.  Instead it just says "A user may need
> appropriate privileges to invoke the link utility."

When do you call link(1) in lieu of ln(1)?

BTW: the rationale for ln says:
This volume of POSIX.1-2008 does not allow the ln utility
to unlink existing destination paths by default for the
following reasons:

The ln utility has historically been used to provide locking for
shell applications, a usage that is incompatible with ln unlinking
the destination path by default.  There was no corresponding
technical advantage to adding this functionality.



RE: [1003.1(2013)/Issue7+TC1 0001016]: race condition with set -C

2016-11-02 Thread Schwarz, Konrad
> -Original Message-
> From: Geoff Clare [mailto:g...@opengroup.org]
> Sent: Tuesday, November 01, 2016 11:27 AM
> To: austin-group-l@opengroup.org
> Subject: Re: [1003.1(2013)/Issue7+TC1 0001016]: race condition with set
> -C
> 

> > Why are we bothering to attempt to make > (with -C) atomic just to
> > solve a problem that already has a better solution ?
> 
> The problem is not limited to lock files.  That's just being used as an
> example because it's the case where problems are most likely to occur.

Well, the central argument of http://austingroupbugs.net/view.php?id=1016 is 
locking:

"One common use of set -C is to implement a simple file locking mechanism,
but this is impossible to do safely."

I agree with Robert Elz that the issue should be resolved by referring
to link(2)/ln(1), which has been atomic in Unix for a long time,
and possibly by stating in the standard that set -C may not be atomic.
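
For completeness, a sketch of the link()-based locking idiom at the C level.  
try_lock() and the temporary file template are made up for illustration; the 
temporary file is created in the current directory, so the lock file has to 
live on the same file system.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to take the lock named `lockpath`.
 * Returns 0 if the lock was acquired, -1 otherwise (errno is EEXIST
 * when somebody else already holds it). */
int try_lock(const char *lockpath)
{
    char tmp[] = "lock.XXXXXX";

    assert(lockpath != NULL);

    int fd = mkstemp(tmp);          /* private file to link from */
    if (fd < 0)
        return -1;
    close(fd);

    /* link() either creates lockpath or fails; there is no window in
     * which two callers can both succeed. */
    int rc = link(tmp, lockpath);
    int saved_errno = errno;

    unlink(tmp);                    /* the temporary name is no longer needed */
    errno = saved_errno;
    return rc;
}
```

Releasing the lock is an unlink() of lockpath.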