Re: [Issue 8 drafts 0001798]: Must posix_getdents remember file offsets across exec?
Adding in Corinna Vinschen, one of the Cygwin developers. She had problems trying to post directly on the bug page, so we can use email replies and summarize the results back to the bug. On Mon, Jan 22, 2024 at 03:30:20PM +, Austin Group Bug Tracker via austin-group-l at The Open Group wrote: > > A NOTE has been added to this issue. > == > https://austingroupbugs.net/view.php?id=1798 > == > Reported By:eblake > Assigned To: > == > Project:Issue 8 drafts > Issue ID: 1798 > Category: System Interfaces > Type: Clarification Requested > Severity: Objection > Priority: normal > Status: New > Name: Eric Blake > Organization: Red Hat > User Reference: ebb.posix_getdents > Section:XSH posix_getdents > Page Number:1567 > Line Number:52609 > Final Accepted Text: > == > Date Submitted: 2024-01-22 15:13 UTC > Last Modified: 2024-01-22 15:30 UTC > == > Summary:Must posix_getdents remember file offsets across > exec? > == > > -- > (0006632) eblake (manager) - 2024-01-22 15:30 > https://austingroupbugs.net/view.php?id=1798#c6632 > -- > Correction - I'm told that the attempted Cygwin implementation also has > problems after dup(); it is unclear whether the states should be linked > (reading an entry on one fd, grabbing its offset, then using the other fd > to read entries, it is unclear whether the second fd starts reading from > the point where the fd was at the time of dup() or at the shared point > reached by the first fd, and whether the second fd can safely lseek() to > the offset read by the first fd). Easiest would be to state that dup() has > the same limitations as fork()/exec - namely, that any mid-stream directory > traversal in either side of the split is unspecified, and the only portable > thing is to start a new traversal by lseek'ing back to 0 (at which point, > the implementation no longer has to worry about sharing a half-read DIR* > across fd copies or processes). 
> > Issue History > Date ModifiedUsername FieldChange > == > 2024-01-22 15:13 eblake New Issue > 2024-01-22 15:13 eblake Name => Eric Blake > 2024-01-22 15:13 eblake Organization => Red Hat > 2024-01-22 15:13 eblake User Reference=> > ebb.posix_getdents > 2024-01-22 15:13 eblake Section => XSH > posix_getdents > 2024-01-22 15:13 eblake Page Number => 1567 > 2024-01-22 15:13 eblake Line Number => 52609 > 2024-01-22 15:30 eblake Note Added: 0006632 > == > > -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
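For readers who want to see the "start a new traversal" fallback from the note above in code, here is a minimal C sketch. It is only an illustration under stated assumptions: count_entries_from_start() and open_dir() are hypothetical helper names, and fdopendir()/readdir() stand in for posix_getdents(), which is not yet widely available. The only point is that the copy made by dup() is rewound with lseek() to 0 before any entry is read, so no half-finished traversal state is relied upon.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical convenience wrapper: open a directory for reading. */
int open_dir(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_DIRECTORY);
}

/*
 * Hypothetical helper: count the entries of the directory open on dirfd,
 * always starting a fresh traversal.  Uses fdopendir()/readdir() as a
 * stand-in for posix_getdents().  Returns the count, or -1 on error.
 * (A readdir() failure is not distinguished from end-of-directory here.)
 */
long count_entries_from_start(int dirfd)
{
    /* dup() yields a second descriptor for the same open file
     * description, so both descriptors share one file offset. */
    int copy = dup(dirfd);
    if (copy < 0)
        return -1;

    /* Discard any mid-stream position: seek back to offset 0. */
    if (lseek(copy, 0, SEEK_SET) == (off_t)-1) {
        close(copy);
        return -1;
    }

    DIR *dir = fdopendir(copy);   /* owns copy on success */
    if (dir == NULL) {
        close(copy);
        return -1;
    }

    long n = 0;
    while (readdir(dir) != NULL)
        n++;

    closedir(dir);                /* also closes copy */
    return n;
}
```

Calling the helper twice on the same descriptor should give the same count (barring concurrent changes to the directory), because each call rewinds before reading.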
Re: Questions on strftime vs. POSIX
Hello Paul, The Austin Group meeting today revisited the topic, and came up with some further thoughts on the matter. Would you be available to attend an upcoming Austin Group teleconference meeting (typically Mondays and Thursdays, at 11am Eastern / 8am Pacific) to speed up the back-and-forth resolution of issues that may still arise? For a typical meeting invite, see: https://www.mail-archive.com/austin-group-l@opengroup.org/msg12295.html The leading consensus during today's meeting was that we would very much like mktime() and strftime("%s") to produce the same value within a given implementation, while still warning the user that the presence of time zones means that there are ambiguous cases where different implementations may come up with different results (one using the offset present before crossing the shift point; the other normalizing to the offset present after crossing the shift point). That means we probably need to reword the requirements for mktime() at the same time we pick up your suggested changes for strftime(). The preliminary thought on how to accomplish that was as follows: | Suggested changes ... | | On page 1428 line 47966 section mktime(), change: | | shall calculate the time since the Epoch value using either the offset in effect before the change or the offset in effect after the change. | | to: | | shall calculate the time since the Epoch value using either the offset in effect before the change or the offset in effect after the change; mktime() may use the value of tm_gmtoff to decide which of these two results is the more appropriate to return. 
| | (and add tm_gmtoff to the list of fields strftime %s can use) Note that the suggested wording leans towards only using tm_gmtoff to disambiguate, rather than using it always; but that sounds like it differs from TZDB where your latest patches appear to use tm_gmtoff always and ignore tm_isdst and global environment; and we still want to see if the final wording can allow TZDB's behavior to be considered compliant. Another common thought expressed today is that the sequence strftime("%s",...,gmtime(my_timet)) is, more often than not, likely to be a bug (you even said so in your point (3)), in that it is not obvious whether it will produce a value relative to UTC or the user's current timezone. Although your earlier email expressed a desire to let it follow the principle of least surprise for the user, we are wondering if the standard should call out in APPLICATION USAGE that such a usage is only portable if the user has also done something like setenv("TZ", "UTC0", 1) (and possibly also tzset()) in close proximity; as well as calling out the fact that multi-threaded applications may need to take even more steps to be careful of global environment manipulations. We probably need to amend the standard to state that gmtime() must set tm_gmtoff to 0 (right now, that requirement is not there), along with everything else being touched in the resolution of bug 1797. Then again, if you are going to set TZ to UTC0, localtime() and gmtime() should produce the same values (or am I overlooking a case where they can be different?) - at which point, the standard can be more explicit in recommending that strftime("%s") is best used with localtime() rather than gmtime(). Is there any chance of exploiting a flag character in the format string, such as "%#s" meaning to interpret the struct tm as generated by gmtime() rather than by the local time zone? 
I note that GNU date(1) has already commandeered %:z, %::z, and %:::z as extensions to produce various different formattings of %z, as the reason for considering how such an extension might work. But at this point, that would be too much invention to directly include in the resolution for bug 1797. If nothing else, the mental contortions required to think about the best path forward (whether we need to add even more wording to allow existing implementations to remain compliant while still allowing the best quality-of-implementation to work in the maximum number of scenarios) gave us all the more reason to give more weight to the idea of eventually standardizing tzalloc() and friends (along with a replacement to strftime() that takes an explicit timezone argument) for Issue 9; but first we have to get bug 1797 ready for Issue 8 TC1. https://austingroupbugs.net/view.php?id=1794 For the Austin Group, I will also point out that Paul has recently been active in a current conversation on the bug-gnulib mailing list, where developers are trying to come up with a nicer wrapper function that takes both struct tm and nanoseconds (for a %n specifier), as well as an indication of local vs UTC timezone, and produces a useful time format from a single interface. For example, https://lists.gnu.org/archive/html/bug-gnulib/2024-02/msg00077.html https://lists.gnu.org/archive/html/bug-gnulib/2024-02/msg00064.html Eric Blak
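To make the TZ=UTC0 point from this message concrete, here is a small C sketch. It is an illustration only: utc0_views_agree() is a hypothetical helper, and it uses mktime() rather than strftime("%s") because %s is not available everywhere. It forces TZ to UTC0, then checks that localtime_r() and gmtime_r() produce the same broken-down fields and that mktime() inverts gmtime_r(). Note that setenv() and tzset() change process-wide state, which is exactly the multi-threading hazard mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: with TZ forced to UTC0, report whether
 * localtime_r() and gmtime_r() agree for t (returns 1 if so, else 0),
 * and store in *roundtrip what mktime() computes from the gmtime_r()
 * result.
 */
int utc0_views_agree(time_t t, time_t *roundtrip)
{
    struct tm l, g;

    assert(roundtrip != NULL);
    *roundtrip = (time_t)-1;

    /* Global side effect: every thread sees the new TZ. */
    setenv("TZ", "UTC0", 1);
    tzset();

    if (localtime_r(&t, &l) == NULL || gmtime_r(&t, &g) == NULL)
        return 0;

    int same = l.tm_year == g.tm_year && l.tm_mon == g.tm_mon &&
               l.tm_mday == g.tm_mday && l.tm_hour == g.tm_hour &&
               l.tm_min == g.tm_min && l.tm_sec == g.tm_sec &&
               l.tm_isdst == g.tm_isdst;

    /* Local time is UTC here, so mktime() should invert gmtime_r(). */
    *roundtrip = mktime(&g);
    return same;
}
```

Without the setenv() step, the mktime() result would be interpreted in the user's local zone, which is the ambiguity the APPLICATION USAGE text would warn about.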
Re: Re: Questions on strftime vs. POSIX
Widening the scope of this conversation, with Paul's permission. Context for the Open Group readers: per my Action Item from Monday's meeting, I emailed Paul regarding https://austingroupbugs.net/bug_view_page.php?bug_id=1797 On Mon, Feb 05, 2024 at 10:51:34AM -0800, Paul Eggert wrote: > On 2024-02-05 08:15, Eric Blake wrote: > > > Did you consider the effect of the change on applications that > > populate struct tm directly (and don't currently set tm_gmtoff, except > > perhaps by zeroing the structure)? > > Yes. Very few apps do that. (I looked for some in the GNU code I help > maintain, and found none.) They are greatly outnumbered by the applications > that call localtime/localtime_r/mktime/gmtime/gmtime_r/etc. and pass the > result to strftime, which is what this bug report is about. > > > > Does the latest tzdata code only use tm_gmtoff in the rare cases when > > it is necessary for disambiguation, or is it always used (overriding > > the timezone data)? The bug description implies the former, but the > > desired action would allow the latter. > > The former. That is, TZDB 2024a strftime looks only at tm_gmtoff, tm_year, > tm_mon, tm_day, tm_hour, tm_min, and tm_sec to determine %s, because that's > all you need. > > The desired action allows either the TZDB behavior, or the glibc behavior > which if I recall consults tm_gmtoff only when tm_isdst is ambiguous. The > TZDB behavior is technically better than the glibc behavior for three > reasons: (1) it removes a multithreading bottleneck, (2) even in a > single-threaded platform it's faster because mktime is slower than using > tm_gmtoff, and (3) when user code mistakenly calls gmtime and then strftime > then %s does what the user expects. The bug report that caused TZDB to > behave this way was about (3), but (1) and (2) also play a part. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
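As a rough illustration of computing %s from the broken-down fields plus a UTC offset, without consulting TZ or calling mktime(), here is a C sketch. It is not TZDB's code: epoch_from_fields() and days_from_civil() are hypothetical helpers, the offset is passed as a separate gmtoff parameter (a stand-in for tm_gmtoff, in seconds east of UTC) so that the example does not depend on the platform exposing that member, and the day count uses a standard proleptic-Gregorian formula.

```c
#include <assert.h>
#include <time.h>

/* Days between 1970-01-01 and the civil date y-m-d (m in 1..12). */
static long long days_from_civil(long long y, int m, int d)
{
    y -= m <= 2;                       /* treat Jan/Feb as months 11/12 */
    long long era = (y >= 0 ? y : y - 399) / 400;
    long long yoe = y - era * 400;                                   /* 0..399 */
    long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  /* 0..365 */
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/*
 * Hypothetical helper: seconds since the Epoch for the wall-clock fields
 * in *tm, given that those fields are gmtoff seconds east of UTC.
 * Only tm_year, tm_mon, tm_mday, tm_hour, tm_min and tm_sec are read.
 */
long long epoch_from_fields(const struct tm *tm, long gmtoff)
{
    assert(tm != NULL);

    long long days = days_from_civil(tm->tm_year + 1900LL,
                                     tm->tm_mon + 1, tm->tm_mday);
    long long wall = days * 86400 + tm->tm_hour * 3600LL +
                     tm->tm_min * 60LL + tm->tm_sec;

    /* Wall-clock time is UTC plus the offset, so subtract it. */
    return wall - gmtoff;
}
```

For instance, the fields 2024-02-05 10:51:34 with an offset of -8 hours give 1707159094, with no dependence on the global time zone state.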
Re: Recommendation for POSIX ed consideration
Hello Andrew, I'm forwarding your message on to the full Austin Group. On Sun, Dec 10, 2023 at 11:37:40PM -0500, Andrew L. Moore wrote: > Hi, > I am the author of the original GNU ed and maintain an alternative (and I > might add, much more robust) version at github.com/slewsys/ed. > > One thing that I'd love to see the POSIX committee explore is the exit > status of ed. Per the standard: > > EXIT STATUS > > The following exit values shall be returned: > > 0. Successful completion without any file or command errors. > >0. An error occurred. > > The problem with this behavior is that, in interactive use, it common to > make errors, correct them and then write the corrected file. But by exiting > with an error, even after successfully writing, this prevents ed from being > used as the editor for many utilties, which abort when the editor exits with > a non-zero error code. > > In the version of GNU ed handed over to Antonio, the behavior was that after > a successful write, the error status is reset to zero. This had no impact > on traditional scripting and merely allowed ed to be much more friendly, > e.g., for writing git commits. Unfortunately, Antonio updated GNU ed at some > point to follow POSIX, which is sub-optimal. > -AM > -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: bug#65659: RFC: changing printf(1) behavior on %b
On Fri, Sep 01, 2023 at 07:19:13AM +0200, Phi Debian wrote: > Well after reading yet another thread regarding libc_printf() I got to > admit that even %B is crossed out, (Yet already choosen by ksh93) > > The other thread also speak about libc_printf() documentting %# as > undefined for things other than a, A, e, E, f, F, g, and G, yet the same > thread also talk about a A comming late (citing C99) in the dance, meaning > what is undefined today become defined tomorow, so %#b is no safer. > Caution: The proposal here is for %#s (an alternative string), not %#b (which C2x wants to be similar to %#x, in that it outputs a '0b' prefix for all values except bare '0'). Yes, there is a slight risk that C may decide to define %#s. But as the Austin Group includes a member of WG14, we are able to advise the C committee that such an addition is not wise. > My guess is that printf(1) is now doomed to follow its route, keep its old > format exception, and then may be implement something like c_printf like > printf but the format string follow libc semantic, or may be a -C option to > printf(1)... Adding an option to printf is also a possibility, if there is wide-spread implementation practice to standardize. If someone wants to implement 'printf -C' right now, that could help feed such a future standardization. But it is somewhat orthogonal to the request in this thread, which is how to allow users to still access the old %b behavior even if %b gets repurposed in the future; if we can get multiple implementations to add a %#s alias now, it makes the future decisions easier (even if it is too late for Issue 8 to add any new features, or for that matter, to make any normative changes other than marking %b obsolescent as a way to be able to revisit it in the future for Issue 9). 
> > Well in all case %b can not change semantic in the bash script, since it is > there for so long, even if it depart from python, perl, libc, it is > unfortunate but that's the way it is, nobody want a semantic change, and on > next routers update, see the all internet falling appart :-) How many scripts in the wild actually use %b, though? And if there are such scripts, anything we can do to make it easy to do a drop-in replacement that still preserves the old behavior (such as changing %b to %#s) is going to be easier to audit than the only other currently-portable alternative of actually analyzing the string to see if it uses any octal or \c escapes that have to be re-written to portably function as a printf format argument. POSIX is not mandating %#s at this time, so much as suggesting that if implementations are willing to implement it now, it will make Issue 9 easier to reason about. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
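To give a feel for what "analyzing the string for octal or \c escapes" involves, here is a C sketch of the escape processing that %b (and the proposed %#s) performs on its argument. It is an illustration, not code from any shell or from Coreutils: expand_b_escapes() is a hypothetical helper, and copying unknown escapes through unchanged is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: expand the escapes that printf(1) %b defines:
 * \\ \a \b \f \n \r \t \v, \0ddd (up to three octal digits after the 0),
 * and \c (discard all remaining output).  Writes at most outsz-1 bytes
 * plus a terminating NUL to out, sets *stop to 1 if \c was seen, and
 * returns the number of bytes written.  Unknown escapes are copied
 * through unchanged (an assumption for this sketch).
 */
size_t expand_b_escapes(const char *in, char *out, size_t outsz, int *stop)
{
    static const char names[]  = "abfnrtv\\";
    static const char values[] = "\a\b\f\n\r\t\v\\";
    size_t n = 0;

    assert(in != NULL && out != NULL && outsz > 0 && stop != NULL);
    *stop = 0;

    while (*in != '\0' && n + 1 < outsz) {
        char c = *in++;
        if (c == '\\' && *in != '\0') {
            const char *p = strchr(names, *in);
            if (*in == 'c') {            /* \c: stop all further output */
                *stop = 1;
                break;
            } else if (*in == '0') {     /* \0ddd: octal byte value */
                unsigned v = 0;
                in++;
                for (int i = 0; i < 3 && *in >= '0' && *in <= '7'; i++)
                    v = v * 8 + (unsigned)(*in++ - '0');
                c = (char)v;
            } else if (p != NULL) {      /* single-character escape */
                c = values[p - names];
                in++;
            }
            /* otherwise keep the backslash; the next pass copies *in */
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

A script audit that wants to replace %b with plain %s has to prove that none of these sequences can appear in the argument, which is why a mechanical %b to %#s rename is easier to review.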
Re: bug#65659: RFC: changing printf(1) behavior on %b
On Fri, Sep 01, 2023 at 08:59:19AM +0100, Stephane Chazelas wrote: > 2023-08-31 15:02:22 -0500, Eric Blake via austin-group-l at The Open Group: > [...] > > The current POSIX says that %b was added so that on a non-XSI > > system, you could do: > > > > my_echo() { > > printf %b\\n "$*" > > } > > That is dependant on the current value of $IFS. You'd need: > > xsi_echo() ( > IFS=' ' > printf '%b\n' "$*" > ) Let's read the standard in context (Issue 8 draft 3 page 2793 line 92595): " The printf utility can be used portably to emulate any of the traditional behaviors of the echo utility as follows (assuming that IFS has its standard value or is unset): • The historic System V echo and the requirements on XSI implementations in this volume of POSIX.1-202x are equivalent to: printf "%b\n" "$*" " So yes, the standard does mention the requirement to have a sane IFS, and I failed to include that in my one-off implementation of my_echo(). Thank you for pointing out a more robust version. > > Or the other alternatives listed at > https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo/65819#65819 > > [...] > > Bash already has shopt -s xpg_echo > > Note that in bash, you need both > > shopt -s xpg_echo > set -o posix > > To get a XSI echo. Without the latter, options are still > recognised. You can get a XSI echo without those options with: > > xsi_echo() { > local IFS=' ' - > set +o posix > echo -e "$*\n\c" > } > > The addition of those \n\c (noop) avoids arguments being treated as > options if they start with -. As an extension, Bash (and Coreutils) happen to honor \c always, and not just for %b. But POSIX only requires \c handling for %b. And while Issue 8 has taken steps to allow implementations to support 'echo -e', it is still not standardized behavior; so your xsi_echo() is bash-specific (which is not necessarily a problem, as long as you are aware it is not portable). > [...] 
> > The Austin Group also felt that standardizing bash's behavior of %q/%Q > > for outputting quoted text, while too late for Issue 8, has a good > > chance of success, even though C says %q is reserved for > > standardization by C. Our reasoning there is that lots of libc over > > the years have used %qi as a synonym for %lli, and C would be foolish > > to burn %q for anything that does not match those semantics at the C > > language level; which means it will likely never be claimed by C and > > thus free for use by shell in the way that bash has already done. > [...] > > Note that %q is from ksh93, not bash and is not portable across > implementations and with most including bash's gives an output > that is not safe for reinput in arbitrary locales (as it uses > $'...' in some cases), not sure it's a good idea to add it to > the standard, or at least it should come with fat warnings about > the risk in using it. %q is NOT being added to Issue 8, but $'...' is. Bug 1771 asked if %q could be added to Issue 8, but it came in past the deadline for feature requests, so the best we could do is add a FUTURE DIRECTIONS blurb that mentions the idea. But since FUTURE DIRECTIONS is non-normative, we can always change our mind in Issue 9 and delete that text if it turns out we can't get consensus to standardize some form of %q/%Q after all. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: bug#65659: RFC: changing printf(1) behavior on %b
On Thu, Aug 31, 2023 at 03:10:58PM -0400, Chet Ramey wrote: > On 8/31/23 11:35 AM, Eric Blake wrote: > > In today's Austin Group call, we discussed the fact that printf(1) has > > mandated behavior for %b (escape sequence processing similar to XSI > > echo) that will eventually conflict with C2x's desire to introduce %b > > to printf(3) (to produce 0b000... binary literals). > > > > For POSIX Issue 8, we plan to mark the current semantics of %b in > > printf(1) as obsolescent (it would continue to work, because Issue 8 > > targets C17 where there is no conflict with C2x), but with a Future > > Directions note that for Issue 9, we could remove %b entirely, or > > (more likely) make %b output binary literals just like C. > > I doubt I'd ever remove %b, even in posix mode -- it's already been there > for 25 years. But the longer that printf(3) supports "%b" to output binary values, the more surprised new shell coders will be that printf(1) %b does not behave the same. What's more, other languages have already started using %b for binary output (python, for example), so it is definitely gaining in mindshare. That said, I also agree with your desire to keep the functionality in place. The current POSIX says that %b was added so that on a non-XSI system, you could do: my_echo() { printf %b\\n "$*" } and then call my_echo everywhere that a script used to depend on XSI echo (perhaps by 'alias echo=my_echo' with aliases enabled), for a much quicker portability hack than a tedious search-and-replace of every echo call that requires manual inspection of its arguments for translation of any XSI escape sequences into printf format specifications. In particular, code like [var='...\c'; echo "$var"] cannot be changed to use printf by a mere s/echo/printf %s\\n/. Thus, when printf was invented and standardized for the shell, the solution at the time was to create [printf %b\\n "$var"] as a drop-in replacement for XSI [echo "$var"], even for platforms without XSI echo. 
Nowadays, I personally have not seen very many scripts like this in the wild (for example, autoconf scripts prefer to directly use printf, rather than trying to shoe-horn behavior into echo). But assuming such legacy scripts still exist, it is still much easier to rewrite just the my_echo wrapper to now use %#s\\n instead of %b\\n, than it would be to find every callsite of my_echo. Bash already has shopt -s xpg_echo; I could easily see this being a case where you toggle between the old or new behavior of %b (while keeping %#s always at the old behavior) by either this or some other shopt in bash, so that newer script writers that want binary output for %b can do so with one setting, while scripts that must continue to run under old semantics can likewise do so. > > > But that > > raises the question of whether the escape-sequence processing > > semantics of %b should still remain available under the standard, > > under some other spelling, since relying on XSI echo is still not > > portable. > > > > One of the observations made in the meeting was that currently, both > > the POSIX spec for printf(1) as seen at [1], and the POSIX and C > > standard (including the upcoming C2x standard) for printf(3) as seen > > at [3] state that both the ' and # flag modifiers are currently > > undefined when applied to %s. > > Neither one is a very good choice, but `#' is the better one. It at least > has a passing resemblence to the desired functionality. Indeed, that's what the Austin Group settled on today after I first wrote my initial email, and what I wrote up in a patch to GNU Coreutils (https://debbugs.gnu.org/65659) > > Why not standardize another character, like %B? I suppose I'll have to look > at the etherpad for the discussion. I think that came up on the mailing > list, but I can't remember the details. Yes, https://austingroupbugs.net/view.php?id=1771 has a good discussion of the various ideas. 
%B is out for the same reason as %b: although the current C2x draft wording says that % followed by an uppercase letter is reserved for implementation use, other than [AEFGX] which already have a history of use by C (as it was, when C99 added %A, that caused problems for some folks), it goes on to _highly_ encourage any implementation that adds %b for "0b0" binary output also add %B for "0B0" binary output (to match the x/X dichotomy). Burning %B to retain the old behavior while repurposing %b to output lower-case binary values is thus a non-starter, while burning %#s (which C says is undefined) felt nicer. The Austin Group also felt that standardizing bash's behavior of %q/%Q for outputting quoted text, while too late for Issue 8, has a good chance of success, even though C says %q is reserved for standardization by C. Our reasoning there is tha
RFC: changing printf(1) behavior on %b
In today's Austin Group call, we discussed the fact that printf(1) has mandated behavior for %b (escape sequence processing similar to XSI echo) that will eventually conflict with C2x's desire to introduce %b to printf(3) (to produce 0b000... binary literals). For POSIX Issue 8, we plan to mark the current semantics of %b in printf(1) as obsolescent (it would continue to work, because Issue 8 targets C17 where there is no conflict with C2x), but with a Future Directions note that for Issue 9, we could remove %b entirely, or (more likely) make %b output binary literals just like C. But that raises the question of whether the escape-sequence processing semantics of %b should still remain available under the standard, under some other spelling, since relying on XSI echo is still not portable. One of the observations made in the meeting was that currently, both the POSIX spec for printf(1) as seen at [1], and the POSIX and C standard (including the upcoming C2x standard) for printf(3) as seen at [3] state that both the ' and # flag modifiers are currently undefined when applied to %s. [1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/printf.html "The format operand shall be used as the format string described in XBD File Format Notation[2] with the following exceptions:..." [2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap05.html#tag_05 "The flag characters and their meanings are: ... # The value shall be converted to an alternative form. For c, d, i, u, and s conversion specifiers, the behavior is undefined. [and no mention of ']" [3] https://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html "The flag characters and their meanings are: ' [CX] [Option Start] (The <apostrophe>.) The integer portion of the result of a decimal conversion ( %i, %d, %u, %f, %F, %g, or %G ) shall be formatted with thousands' grouping characters. For other conversions the behavior is undefined. The non-monetary grouping character is used. [Option End] ... 
# Specifies that the value is to be converted to an alternative form. For o conversion, it shall increase the precision, if and only if necessary, to force the first digit of the result to be a zero (if the value and precision are both 0, a single 0 is printed). For x or X conversion specifiers, a non-zero result shall have 0x (or 0X) prefixed to it. For a, A, e, E, f, F, g, and G conversion specifiers, the result shall always contain a radix character, even if no digits follow the radix character. Without this flag, a radix character appears in the result of these conversions only if a digit follows it. For g and G conversion specifiers, trailing zeros shall not be removed from the result as they normally are. For other conversion specifiers, the behavior is undefined." Thus, it appears that both %#s and %'s are available for use for future standardization. Typing-wise, %#s as a synonym for %b is probably going to be easier (less shell escaping needed). Is there any interest in a patch to coreutils or bash that would add such a synonym, to make it easier to leave that functionality in place for POSIX Issue 9 even when %b is repurposed to align with C2x? -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
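For anyone who wants to see the defined uses of the # flag quoted above in action, here is a short C sketch using snprintf(). The helper names uint_formats_as() and double_formats_as() are hypothetical; each formats one value and compares the result with an expected string. It deliberately does not try %#s, since that is exactly the combination the text above describes as undefined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if formatting value with format yields expected. */
int uint_formats_as(const char *format, unsigned value, const char *expected)
{
    char buf[64];

    assert(format != NULL && expected != NULL);
    snprintf(buf, sizeof buf, format, value);
    return strcmp(buf, expected) == 0;
}

/* Same idea for floating-point conversions (hypothetical helper). */
int double_formats_as(const char *format, double value, const char *expected)
{
    char buf[64];

    assert(format != NULL && expected != NULL);
    snprintf(buf, sizeof buf, format, value);
    return strcmp(buf, expected) == 0;
}
```

With these, %#o of 8 gives "010", %#x of 255 gives "0xff" while %#x of 0 stays "0", %#.0f of 3.0 gives "3." and %#g of 1.5 keeps its trailing zeros as "1.50000".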
Re: encoding question
On Sat, Jul 15, 2023 at 10:41:49PM +, Thorsten Glaser via austin-group-l at The Open Group wrote: > Hi, > > I get that the POSIX locale must be a single-byte character locale > where all 256 octets are characters. I’ve got a question about the > wide character representation. > > Assuming my POSIX locale uses ASCII as encoding, I’ve got the whole > portable character set (and then some) in the first 128 codepoints, > which have the ASCII code as both octet SBCS value and wchar_t value. > In this scenario, is it permissible to map the other 128 codepoints > “high” i.e. to wchar_t values > 0x0100? You're not the first to ask this question. Here's a link to a proposed patch to glibc on the same topic just this month, after noting that musl has already dealt with it: https://sourceware.org/pipermail/libc-alpha/2023-July/149588.html https://sourceware.org/pipermail/libc-alpha/2023-July/150021.html https://www.openwall.com/lists/musl/2022/11/10/2 The conclusion in those links appears to be that it is compliant to have the 8-bit characters map to wchar_t codepoints that are not valid Unicode characters, but which are distinct enough to preserve all other properties needed to treat the POSIX locale as a single-byte locale with 256 "characters" and proper collation sequence without encoding errors. Whether the mapping is to the 0xdcXX or 0xdfXX range of reserved codepoints in Unicode is a matter of implementation choice; both choices exist in implementations already out there. > > I’m reading the standard as yes, but not asking already landed me > in trouble in the past so I’d rather… That's a wise course of action. And while maybe the standard could make this easier, the fact that there are already two commonly chosen ranges already in play is not going to make it easy to mandate a specific mapping. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
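One way to see what a given libc currently does is to probe it. The C sketch below is only an illustration: count_roundtrip_bytes() is a hypothetical helper that selects the POSIX locale and counts how many of the 256 byte values survive a btowc()/wctob() round trip. An implementation that treats the POSIX locale as a fully 8-bit-clean single-byte locale would report 256 regardless of which wchar_t range it picks for the high half; one that rejects high bytes would report fewer.

```c
#include <assert.h>
#include <locale.h>
#include <wchar.h>

/*
 * Hypothetical probe: number of byte values b in 0..255 for which
 * btowc(b) succeeds and wctob() maps the result back to b, in the
 * POSIX locale.
 */
int count_roundtrip_bytes(void)
{
    int ok = 0;
    const char *loc = setlocale(LC_ALL, "POSIX");

    assert(loc != NULL);   /* the POSIX locale is always available */

    for (int b = 0; b < 256; b++) {
        wint_t wc = btowc(b);
        if (wc != WEOF && wctob(wc) == b)
            ok++;
    }
    return ok;
}
```

Printing the btowc() result for a high byte alongside this count would also show which reserved range, if any, the implementation chose.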
Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used
On Fri, Apr 21, 2023 at 05:00:14PM +0200, Alejandro Colomar wrote: > > > > The wording I see in <https://austingroupbugs.net/view.php?id=1641#c6262> > > doesn't seem to cover the case of aliasing a sockaddr_storage as a > > protocol-specific address for setting other members. > > > > Aliasing rules don't allow one to declare an object of type > > sockaddr_storage and then fill the structure as if it were another > > structure, even if alignment and size are correct. We would need > > some wording that says something like: > > > > When a pointer to a sockaddr_storage structure is first aliased as a > > pointer to a protocol-specific address structure, the effective type > > of the object will be set to the protocol-specific structure. I'll add that as a comment to the Austin Group page; it seems like a reasonable statement of intent (POSIX already says that struct sockaddr_storage is sufficiently sized and aligned; all that remains is for the compiler to be aware that we intend to use a more-appropriate effective type once we have the storage allocated). > > > > This is similar to what happens when malloc(3) is assigned to a > > non-character type. That's a big hammer, but it does the job. Maybe > > we would need some looser language? I CCd GCC, in case they have > > concerns about this wording. > > > > Cheers, > > Alex > > > >> > >> I quite like this way of putting it. It subsumes both what I wrote and > >> the related potential headache with deciding whether the sa_family_t > >> field is considered an object or just a range of bytes within a larger > >> object. > >> > >> zw > > > > For the man pages, I've rewritten it to the following: > > > $ git diff > diff --git a/man3type/sockaddr.3type b/man3type/sockaddr.3type > index 2fdf56c59..e610aa0f5 100644 > --- a/man3type/sockaddr.3type > +++ b/man3type/sockaddr.3type > @@ -117,6 +117,14 @@ .SH HISTORY > was invented by POSIX. > See also > .BR accept (2). 
> +.PP > +These structures were invented before modern ISO C strict-aliasing rules. > +If aliasing rules are applied strictly, > +these structures would be impossible to use Maybe "extremely difficult" instead of "impossible" to use (if I understand this thread correctly, it is possible to memcpy() from one struct into different storage of a different effective type where the memcpy()'s intermediate aliasing through char* avoids the UB). > +without invoking Undefined Behavior (UB). > +POSIX Issue 8 will fix this by requiring that implementations > +make sure that these structures > +can be safely used as they were designed. > .SH NOTES > .I socklen_t > is also defined in > > > I guess this is simple enough that it should work as documentation. It seems fine from my perspective. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
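To illustrate the memcpy() route mentioned above, here is a C sketch that never accesses a sockaddr_storage object through an lvalue of a different structure type. The helper names storage_set_inet_port() and storage_get_inet_port() are hypothetical; the protocol-specific structure is filled in as its own object and then copied bytewise into (or out of) the generic storage.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: store a loopback IPv4 address with the given port. */
void storage_set_inet_port(struct sockaddr_storage *ss, in_port_t port)
{
    struct sockaddr_in sin;

    assert(ss != NULL);

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    memset(ss, 0, sizeof *ss);
    memcpy(ss, &sin, sizeof sin);    /* byte copy, no aliasing cast */
}

/* Hypothetical helper: port in host byte order, or -1 if not AF_INET. */
long storage_get_inet_port(const struct sockaddr_storage *ss)
{
    struct sockaddr_in sin;

    assert(ss != NULL);

    if (ss->ss_family != AF_INET)    /* member of the declared type */
        return -1;

    memcpy(&sin, ss, sizeof sin);    /* copy out before reading fields */
    return ntohs(sin.sin_port);
}
```

This is more verbose than the traditional cast, which is why "extremely difficult" rather than "impossible" seems the fairer description.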
Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used
On Thu, Apr 06, 2023 at 02:05:15PM -0400, Zack Weinberg wrote: > On Thu, Apr 6, 2023, at 12:31 PM, Alejandro Colomar via Libc-alpha wrote: > > On 4/6/23 18:24, Eric Blake wrote: > >> here's the updated wording that the Austin Group tried today (and we > >> plan on starting a 30-day interpretation feedback window if there are > >> still adjustments to be made to the POSIX wording): > >> > >> https://austingroupbugs.net/view.php?id=1641#c6255 > > > > Thanks! That wording (both paragraphs) LGTM. > > If I could suggest an additional change, the focus on aliasing > _diagnostics_ rather misses the point IMHO. We don't just want the > compiler to _not complain_ about accesses to sa_family_t, we want it to > treat the accesses as _legitimate_. So, instead of > > # Additionally, the structures shall be defined in such a way that > # these casts do not cause the compiler to produce diagnostics about > # aliasing issues in accessing the sa_family_t member of these > # structures when compiling conforming application (xref to XBD section > # 2.2) source files. > > may I suggest wording along the lines of > > # Additionally, the structures shall be defined in such a way that > # the compiler treats an access to the stored value of the sa_family_t > # member of any of these structures, via an lvalue expression whose type > # involves any other one of these structures, as permissible, despite the > # more restrictive rules listed in ISO C section 6.5p7. I like it as an improvement; I've added your suggestion to the POSIX bug report as one of the comments received during the 30-day interpretation window, to see what the other standards developers think. Since Issue 7 is tied to C99, and Issue 8 will be tied to C17, both of which use the same section number despite being a different edition of the C standard, being that specific may work. 
Or, we might try something focusing more on wording instead of document location, as in: Additionally, the structures shall be defined in such a way that the compiler treats an access to the stored value of the sa_family_t member of any of these structures, via an lvalue expression whose type involves any other one of these structures, as permissible even if the types involved would not otherwise be deemed compatible with the effective type of the object ultimately being accessed. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Austin Group questions on iconv()
In today's Austin Group meeting, the folks discussing POSIX had a question for Bruno and/or anyone else with an idea on how the standards should approach a difference in behavior between Solaris and GNU iconv() implementations.

For context, today's meeting minutes:
https://posix.rhansen.org/p/2023-03-09 around line 1635

and the bugs leading to the question:

https://austingroupbugs.net/view.php?id=1635
"0001635: iconv: please be more explicit in input-not-convertible case"
still open - iconv() resulting in EILSEQ not because of input encoding error but because of output being unable to encode the transliteration

https://austingroupbugs.net/view.php?id=1007
"0001007: iconv function not allowed to fail to convert valid sequences"
resolved at https://austingroupbugs.net/view.php?id=1007#c3330, standardizing the //IGNORE, //TRANSLIT, and //NON_IDENTICAL_DISCARD modifiers

It seems that bug 1635 is saying that the Solaris implementation provides a conversion that application writers can use to get reliable output but does not provide some desired features, and the standard should change to acknowledge that the GNU implementation provides some of those desired features. However, the GNU implementation includes some ambiguities that make it unreliable. It seems to ask us to change the standard to allow a modified version of the GNU iconv() function that could be reliably interpreted by an application writer.

For example, overloading EILSEQ to mean either that there was an invalid character in the input stream or that there was no transliteration available in the output codeset to convert that input character makes it impossible for an application to determine which of those two problems caused iconv() to fail.

Can we get an explanation on how an application writer is supposed to write code to reliably use the iconv() in GNU libc, given the above example?

Can we get help in identifying exactly what changes need to be made to POSIX (after bugid:1007 has been integrated) to allow GNU behavior and get reliable results without breaking applications that currently work with the Solaris iconv() interface?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
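To make the ambiguity concrete, here is a rough sketch of what an application sees. The helper name try_utf8_to_ascii() is made up for this example, and the choice of "UTF-8" and "ASCII" as codeset names is an assumption about what the local iconv_open() accepts.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert len bytes of input from UTF-8 to ASCII.
 * Returns 0 on success, or the errno value reported by iconv_open()/iconv(). */
int try_utf8_to_ascii(const char *input, size_t len)
{
    char outbuf[64];
    char *in = (char *)input;   /* iconv() wants char **, input is not modified */
    char *out = outbuf;
    size_t inleft = len;
    size_t outleft = sizeof outbuf;
    int err = 0;

    iconv_t cd = iconv_open("ASCII", "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;
    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1)
        err = errno;
    iconv_close(cd);
    return err;
}
```

Calling this with the single byte "\xff" (not valid UTF-8) is expected to report EILSEQ. On an implementation that behaves as described for GNU above, calling it with "\xc3\xa9" (a valid encoding of e-acute, which has no ASCII equivalent) reports EILSEQ as well, and the caller has only errno to go on.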
Re: [1003.1(2008)/Issue 7 0000561]: NUL-termination of sun_path in Unix sockets
On Wed, Nov 30, 2022 at 08:54:03AM -0600, Eric Blake via austin-group-l at The Open Group wrote: > > ... > > |https://austingroupbugs.net/view.php?id=561 > > First, I chose that wording because 'sizeof(struct > sockaddr_un.sun_path)' doesn't compile. You are right that 'sizeof > NAME.sun_path' does compile, if NAME is an expression of type struct > sockaddr_un, but the sentence becomes longer to introduce some object > named NAME of the correct type just to get to the shorter sizeof > expression. However, we can make that edit if it makes sense. Having written that, I did test that 'sizeof(((struct sockaddr_un*)0)->sun_path)' compiles with gcc, although I'm less certain of whether the C standard permits that (or even if that permission has changed over time) - the expression argument to sizeof is unevaluated, which counters the argument that you can't normally evaluate a dereference of a NULL pointer. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [1003.1(2008)/Issue 7 0000561]: NUL-termination of sun_path in Unix sockets
On Mon, Nov 28, 2022 at 07:30:36PM +0100, Steffen Nurpmeso via austin-group-l at The Open Group wrote: > Austin Group Bug Tracker wrote in > : > ... > |https://austingroupbugs.net/view.php?id=561 > ... > |-- > | (0006085) geoffclare (manager) - 2022-11-28 16:24 > | https://austingroupbugs.net/view.php?id=561#c6085 > |-- > ... > |char sun_path[size] Socket pathname > |storage. > ... > |[.] However, because sun_path is required to be the > |last member of the struct, an application can deduce the size by using > |sizeof(struct sockaddr_un) - offsetof(struct sockaddr_un, > |sun_path). > > I am glued to old habits, but given it is the last field and of > a known fixed size sizeof(NAME.sun_path) should be all that is > necessary. (It definitely is in practice.) > (And all this different to SUN_LEN(), of course.) Two comments in response: First, I chose that wording because 'sizeof(struct sockaddr_un.sun_path)' doesn't compile. You are right that 'sizeof NAME.sun_path' does compile, if NAME is an expression of type struct sockaddr_un, but the sentence becomes longer to introduce some object named NAME of the correct type just to get to the shorter sizeof expression. However, we can make that edit if it makes sense. Second, given alignment issues, a choice of an odd size coupled with other members that require even alignment could permit an implementation where sizeof(struct sockaddr_un) > offsetof(struct sockaddr_un, sun_path) + sizeof(NAME.sun_path) due to padding bytes added for alignment reasons. I don't know of any such implementations in practice (the choice of 92, 104, and 108 as the most common sizes tends to be so that the overall struct sockaddr_un has a size of 128 bytes, which is a nice power-of-two boundary). Then again, intentionally forcing struct sockaddr_un to have a padding byte after sun_path might be an implementation's way of guaranteeing that it can handle a NUL byte even if the application didn't pass one in. 
Therefore, do we need to modify the wording in this proposal to ensure that struct sockaddr_un is not allowed to have padding bytes after sun_path to match existing practice? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
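As a way to check a given implementation, the sizeof spellings discussed in this thread can be compared directly. This is only a sketch; the function names are invented for the example, and it assumes sun_path is the last member of the struct, as the proposal would require.

```c
#include <stddef.h>
#include <sys/un.h>

/* Size of sun_path taken from an object of the right type. */
size_t sun_path_size_from_object(void)
{
    struct sockaddr_un addr;
    return sizeof addr.sun_path;
}

/* Same size without declaring an object; the operand of sizeof is not
 * evaluated, so the null pointer is never dereferenced. */
size_t sun_path_size_from_null(void)
{
    return sizeof(((struct sockaddr_un *)0)->sun_path);
}

/* Bytes from the start of sun_path to the end of the struct. This is larger
 * than the array itself if the implementation adds trailing padding. */
size_t sun_path_span(void)
{
    return sizeof(struct sockaddr_un) - offsetof(struct sockaddr_un, sun_path);
}

/* Number of padding bytes after sun_path (0 if there are none). */
size_t sun_path_trailing_padding(void)
{
    return sun_path_span() - sun_path_size_from_object();
}
```

If sun_path_trailing_padding() returns nonzero anywhere, the subtraction-based wording in the proposal overstates the usable size of sun_path on that system.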
Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1) utility
On Fri, Jul 22, 2022 at 05:04:09PM +0100, Jonathan Wakely wrote:
> On Fri, 22 Jul 2022 at 15:53, Robert Elz via austin-group-l at The Open Group wrote:
> > Aside from that possibility the only reason would seem to be the same
> > as why echo (real ones) have -n (and trashy ones have \c) and why
> > printf(1) needs a \n to print one ... there are times that it is useful
> > to write a partial line to stdout (or wherever) and there's no reason
> > that the output of readlink could not be intended to be a part of such
> > a gradually constructed output line.
>
> But then shouldn't *every* command that prints output have a -n option?
>
> If you need to include the output of readlink in gradually constructed
> output you can do what you have to do with other commands:
>
> printf '%s' "$(readlink foo)"

That strips trailing newlines that may have been important. The link contents $'abc' and $'abc\n' are indistinguishable under your approach of a path through $() and printf. If you are going to output a constructed filename to stdout, you really DO want:

readlink -n foo && echo /newfile

to produce the output "link/content/newfile" when foo contains 'link/content', and still handle the case where foo's content is instead something with a trailing newline.

> The fact that echo and printf have that feature means you don't need
> it everywhere.

You don't need it for utilities that are seldom used in generating partial file names; but for programs like dirname and readlink, providing a simpler way to use the utility in the context of building up a larger file name without losing intermediate trailing newlines that would be eaten by $() is enough of a worry that adding things like -n to make it more useful was worthwhile to the implementors. I'm aware that 'dirname -n' is not common implementation practice, but since 'readlink -n' does appear to be, there's no harm in standardizing it that way.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1) utility
On Fri, Jul 22, 2022 at 09:26:45AM +0200, Quentin Rameau via austin-group-l at The Open Group wrote:
> Hello,
>
> > ==
> > https://austingroupbugs.net/view.php?id=1457
> > ==
>
> > ==
> > Summary: Add readlink(1) utility
> > ==
>
> > -n  Do not output a trailing <newline> character.
>
> Out of curiosity, what's a use-case for that?

Good question. My initial thought was that the construct:

var=$(readlink -- "$name")

will NOT assign var to the correct contents if $name is a symlink that resolves to a string containing trailing newlines, as $() would strip not only the newline added by readlink, but also the newlines from the link contents. But using:

var=$(readlink -n -- "$name")

will not fare any better; it will also strip trailing newlines from the link content. The only reliable way to accurately capture the contents of a symlink in a shell variable is to do something like:

tmp=$(readlink -n -- "$name"; printf .)
var=${tmp%.}

at which point the addition of -n doesn't really help, because you could also do:

tmp=$(readlink -- "$name"; printf .)
var=${tmp%?.}

with fewer characters typed.

So the only actual answer I can come up with is "existing practice in readlink implementations in the wild", where we'd have to ask the program designers why they thought -n was useful.

[If readlink is implemented as a shell builtin, then you could have an extension where:

readlink -v var -n -- "$name"

assigns $var to the full symlink contents, without any extra or stripped newlines, but such an extension is not what we are proposing to standardize]

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Re: Can struct sockaddr_un.sun_path be a flexible array member?
On Sun, Jul 17, 2022 at 03:46:52PM -0700, Nick Stoughton via austin-group-l at The Open Group wrote: > Note that a flexible array member is not the same thing as a variable > length array, and although both entered the standard in C99, previous > versions allowed the FAM to be specified as an array of length 0. > > The C standard notes that: > > In most situations, the flexible array member is ignored. In particular, > the size of the structure is as if the flexible array member were omitted > ... > and "sizeof" does just that (omits the flexible array member). > > The normative text does not seem to preclude the use of a flexible array > member but does not specify any mechanism to obtain the size if it were so. > I believe that it is a bug in the standard that it is not made clearer that > the implementation should define the size somehow. I know of no > implementation that uses a flexible array here. Please feel free to submit > a bug to austingroupbugs.net with this. Or better yet, help with amending the existing bug to propose the desired wording changes: https://www.austingroupbugs.net/view.php?id=561 Based on an earlier meeting, our current thoughts are: - Add requirement that sun_path be last member of struct sockaddr_un, and that it have a constant (although unspecified) size rather than being an open array - Add application usage to functions dealing with sockname to recommend memory > sizeof(struct sockaddr_un) preinitialized to 0 when it is desired to ensure NUL termination - Leave SUN_LEN out of the standard; we don't want variable-length sun_path -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
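To illustrate the second bullet, here is a minimal sketch of an application giving itself one spare, zeroed byte beyond struct sockaddr_un before asking for a socket name. The type un_name_buffer and both function names are invented for this example; nothing like them is being proposed for the standard.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical buffer: one byte larger than struct sockaddr_un, so that a
 * NUL byte always follows sun_path even when the path fills it completely. */
typedef union {
    struct sockaddr_un un;
    char bytes[sizeof(struct sockaddr_un) + 1];
} un_name_buffer;

/* Zero the whole buffer, then fetch the name bound to fd.
 * Returns 0 on success, or -1 with errno set by getsockname(). */
int get_unix_sockname(int fd, un_name_buffer *buf)
{
    /* Only sizeof buf->un is offered to getsockname(), so the extra
     * trailing byte is never overwritten and stays zero. */
    socklen_t len = sizeof buf->un;

    memset(buf, 0, sizeof *buf);
    return getsockname(fd, (struct sockaddr *)&buf->un, &len);
}

/* Pathname as a string that is guaranteed to be NUL-terminated within buf. */
const char *un_name_path(const un_name_buffer *buf)
{
    return buf->bytes + offsetof(struct sockaddr_un, sun_path);
}
```

Reading the path through the char array member avoids running off the end of sun_path itself when the implementation did not store a terminator.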
Re: Latest on POSIX efforts to standardize gettext
On Thu, May 05, 2022 at 09:31:41AM -0500, Eric Blake via austin-group-l at The Open Group wrote:
> Hello GNU and Illumos folks,
>
> The Austin Group (those in charge of the POSIX specification) have
> been working on a draft to incorporate the gettext(3) family of
> functions and related gettext(1) utilities into the next revision of
> POSIX (per https://austingroupbugs.net/view.php?id=1122). After
> several months of near-weekly conference calls, the latest draft of
> the work has finally reached the point where it is ready for more
> thorough analysis by a wider group of readers. You can view the
> current state of the draft here:
>
> https://posix.rhansen.org/p/gettext_draft

Another question came up today (line 1172 in the draft at the time I wrote this email).

Given the following test file test.c:

#include <stdio.h>
#include <libintl.h>
int main(){
  printf("%s\n",dgettext("foobar","test"));
}

Running "xgettext test.c", on Solaris, the resulting .po file is called "foobar.po" and contains the msgid "test". Running it on GNU, the resulting .po file is called "messages.po" and there is no indication that the msgid belongs to "foobar".

According to the Li18nux specification, the Solaris behavior is intended. Why does GNU xgettext deviate? Knowing whether this is considered a bug that future GNU xgettext will fix, vs. intentional behavior that the standard should purposefully not constrain, can impact what wording is chosen for the standard here.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Latest on POSIX efforts to standardize gettext
Hello GNU and Illumos folks,

The Austin Group (those in charge of the POSIX specification) have been working on a draft to incorporate the gettext(3) family of functions and related gettext(1) utilities into the next revision of POSIX (per https://austingroupbugs.net/view.php?id=1122). After several months of near-weekly conference calls, the latest draft of the work has finally reached the point where it is ready for more thorough analysis by a wider group of readers. You can view the current state of the draft here:

https://posix.rhansen.org/p/gettext_draft

In particular, this draft has an action item to me to reach out to you on the following question (currently found at line 1138 of that document, or search for "A.I."):

In the msgfmt(1) utility, there is currently a difference between GNU and Illumos implementations on detecting duplicate msgid strings, and which command line switch(es) make detection of duplicates possible. The question is whether GNU msgfmt would be willing to have the current -c option (--check) gain a mode for erroring out on duplicate msgid strings, or even to add a new command line option (-n appears to be available, for a mnemonic of 'no dupes') to have the duplicate detection available without requiring -c.

In addition to answering that question, any review of the rest of the proposed wording (particularly anything that is still colored and thus represents edits since the last time we asked for review) is still appreciated.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Re: [Issue 8 drafts 0001556]: clarify meaning of \n used in a bracket expression in a sed context address or s-command
Adding bug-...@gnu.org into this conversation. On Mon, Apr 25, 2022 at 02:50:22AM +0200, Christoph Anton Mitterer via austin-group-l at The Open Group wrote: > Hey. > > Geoff, I haven't had time yet to look at your updated proposal of > #1550, not sure whether I manage to do it this night or in the next > days. > But I'll definitely reply, so please be a bit more patient. :-) > > > However, on thing came to my minds again, which I think needs further > discussion... > > > > The current "solution" to a number of previous problems is: > > Inside a bracket expression there cannot be any escape sequences. > Therefore, there cannot be any \n (in the sense of ) nor any > \c (in the sense of "un-delimitering" the delimiter character c). > > > While this is per se perfectly valid (and solves numerous issues), it > has one problem: > > (at least) GNU sed breaks it already! > > > > As you noted yourself in > https://www.austingroupbugs.net/view.php?id=1556#c5621 > > it requires POSIXLY_CORRECT=1 to work as it should. > > $ printf 'a\\b\n' | sed 's/a[\n]b/X/' > a\b > $ printf 'a\nb\n' | sed 's/a[\n]b/X/' > a > b > $ printf 'a\nb\n' | sed -z 's/a[\n]b/X/' > X > $ printf 'anb\n' | sed 's/a[\n]b/X/' > anb > $ export POSIXLY_CORRECT=1 > $ printf 'a\\b\n' | sed 's/a[\n]b/X/' > X > $ printf 'a\nb\n' | sed 's/a[\n]b/X/' > a > b > $ printf 'a\nb\n' | sed -z 's/a[\n]b/X/' > a > b > $ printf 'anb\n' | sed 's/a[\n]b/X/' > X > $ > > > NOT so for GNU's extension of '\s': > '\s' > Matches whitespace characters (spaces and tabs). Newlines > embedded in the pattern/hold spaces will also match... 
> (and I assume neither for any similar such extensions): > > $ printf 'asb\n' | sed 's/a[\s]b/X/' > X > $ printf 'a\\b\n' | sed 's/a[\s]b/X/' > X > $ printf 'a b\n' | sed 's/a[\s]b/X/' > a b > $ export POSIXLY_CORRECT=1 > $ printf 'asb\n' | sed 's/a[\s]b/X/' > X > calestyo@heisenberg:~$ printf 'a\\b\n' | sed 's/a[\s]b/X/' > X > calestyo@heisenberg:~$ printf 'a b\n' | sed 's/a[\s]b/X/' > a b > $ > > > It also works as expected for escaped delimiter characters: > $ printf 'aDb\n' | sed 'sDa[\D]bDXD' > X > $ printf 'a\\b\n' | sed 'sDa[\D]bDXD' > X > > even when the delimiter char has also special meaning when escaped (as > with '\s'): > $ printf 'asb\n' | sed 'ssa[\s]bsXs' > X > $ printf 'a\\b\n' | sed 'ssa[\s]bsXs' > X > $ printf 'a b\n' | sed 'ssa[\s]bsXs' > a b > > > (all the above with GNU sed 4.8). > > > So the only problematic case seems to be '\n'. > > > > I don't want to step on anyone's toes... but GNU sed is probably one of > the (if not the) major implementation of sed, isn't it? > > > And regardless of POSIXLY_CORRECT, the standard describes now a > behaviour (namely that the bracket expression [\n] is the literal > characters '\' or 'n' and *not* )... which is not shared by a > major implementation, at least not with its default settings. > > Anyone who reads the standard would assume that [\n] is not a > . > And of course we could just say "well your implementation is not > compliant" or "look at it's documentation, where it says about > POSIXLY_CORRECT" ... but that doesn't seem so good to me. > > Usually, implementations extend POSIX rather gracefully, but this is a > more serious deviation. > > > I mean should we just leave it at that? > > Or should we add some hint, e.g. indicating that portable applications > should not use '\n' but rather 'n\' ... or perhaps even generally place > '\' last in the bracket expression? 
> > > The best would of course be to get GNU change it's behaviour, though I > have no idea how likely that is ;-) > > I had tried to reach out to GNU and BusyBox sed maintainers before, and > while I got replies from BusyBox' I couldn't get in touch with GNU's. > > Is there anyone who's in contact with these people? The GNU sed developers can be reached at bug-...@gnu.org (per the output of 'sed --help', and as done in this email). So if I'm restating your complaint correctly, you are worried that GNU sed's non-POSIX behavior (what you get by default when POSIXLY_CORRECT is not set) treats the four-byte sequence '[\n]' in an s-command regex as a bracket expression for the single character of a literal newline (that is, interpreting \n as an escape sequence even though it is inside a bracket expression), instead of as a bracket expression for either of a literal backslash or literal n; but concur that its behavior when being POSIX-compliant matches the POSIX rules. POSIX can't control what GNU sed does when in non-POSIX mode
Re: how do to cmd subst with trailing newlines portable (was: does POSIX mandate whether the output…)
filtering out any who really are considered as that. > > That gave quite some matches: > BRF.gz: /x2e BRAILLE PATTERN DOTS-46 > BRF.gz: /x2f BRAILLE PATTERN DOTS-34 > EBCDIC-AT-DE-A.gz: /x2e ACKNOWLEDGE (ACK) > EBCDIC-AT-DE-A.gz: /x2f BELL (BEL) charmaps are useful to iconv in converting file contents between more encodings that are possible than what is permitted in locales. > IBM918.gz: /x2f BELL (BEL) > INIS-CYRILLIC.gz: /x2e RIGHTWARDS ARROW > INIS-CYRILLIC.gz: /x2f INTEGRAL > ISO_10646.gz: /x01/x2ELATIN CAPITAL LETTER I WITH OGONEK > ISO_10646.gz: /x01/x2FLATIN SMALL LETTER I WITH OGONEK > ISO_10646.gz: /x04/x2ECYRILLIC CAPITAL LETTER YU > ISO_10646.gz: /x04/x2FCYRILLIC CAPITAL LETTER YA > ISO_10646.gz: /x06/x2EARABIC LETTER KHAH > ISO_10646.gz: /x06/x2FARABIC LETTER DAL > ISO_10646.gz:/x1E/x2ELATIN CAPITAL LETTER I WITH DIAERESIS > AND ACUTE > ISO_10646.gz:/x1E/x2FLATIN SMALL LETTER I WITH DIAERESIS AND > ACUTE > ISO_10646.gz: /x22/x2ECONTOUR INTEGRAL > ISO_10646.gz:/x25/x2EBOX DRAWINGS RIGHT HEAVY AND LEFT DOWN > LIGHT > ISO_10646.gz:/x25/x2FBOX DRAWINGS DOWN LIGHT AND HORIZONTAL > HEAVY > ISO_11548-1.gz: /x2e BRAILLE PATTERN DOTS-2346 > ISO_11548-1.gz: /x2f BRAILLE PATTERN DOTS-12346 > JIS_C6220-1969-JP.gz: /x2EKATAKANA LETTER > SMALL YO > JIS_C6220-1969-JP.gz: /x2FKATAKANA LETTER > SMALL TU > > Since all these (well except perhaps ISO_10646) use 0x2E and 0x2F for > other characters than . and / ... doesn't that already mean that > they're invalid with respect to POSIX? Not quite. You didn't ALSO check whether those charmaps define as something that overlaps with a multibyte character. But you are right that there are some charmaps which iconv can support but which cannot be used as a locale in a given POSIX environment. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [1003.1(2016/18)/Issue7+TC2 0001440]: Calling `system("-some-tool")` fails (although it is a valid `sh` command)
On Sat, Oct 30, 2021 at 08:21:55PM -0400, Wayne Pollock via austin-group-l at The Open Group wrote: > Is it guaranteed that on conforming systems nohup (and friends) must not > accept or > delete the first "--"? For the example to work, nohup must not discard the > "--". > But might it? I'm not sure why you claim nohup would not work if it discards "--". Just because the standard does not require nohup to accept options does not mean that implementations cannot have options as an extension. > > Section 1.4 "Utility Description Defaults" of the Introduction states > "... Default Behavior: When this section is listed as "None.", it means that > the > implementation need not support any options. Standard utilities that do not > accept > options, but that do accept operands, shall recognize "--" as a first > argument to be > discarded. ..." > > And nohup fits that description; its OPTIONS section is listed as "None". Correct, and that text does not need changing. As you correctly quoted, that means that nohup MUST accept and discard an initial "--", the same as basename (another utility where I have seen the common bug of handling -- incorrectly in some implementations). 
If you want to invoke another app that may begin with "-", or if you want to ensure that a later "--" is passed to the utility itself regardless of whether nohup has the (non-standard) extension of reordering options after arguments, you can always write: nohup -- $utility -- $non_option And a quick test demonstrates that at least GNU Coreutils' nohup is compliant (it supports long options, which are already an extension to the standard, but not short options; but it does honor -- for attempting to execute $utility that may begin with -): $ POSIXLY_CORRECT=1 nohup -- printf -- abc 2>/dev/null | cat abc $ POSIXLY_CORRECT=1 nohup printf -- abc 2>/dev/null | cat abc $ nohup --version | head -n1 nohup (GNU coreutils) 8.32 $ nohup -- --version nohup: ignoring input and appending output to 'nohup.out' nohup: failed to run command '--version': No such file or directory $ rm nohup.out $ > Maybe nohup needs to be among the utilities that do not recognize "--". No. While we are explicit that echo is one of the few apps needing an exception to not recognize "--", that exception does NOT need to apply to nohup. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: Interpretation starting for a 30 day review (1440)
On Sat, Oct 30, 2021 at 12:46:55AM +0700, Robert Elz via austin-group-l at The Open Group wrote: > Date:Fri, 29 Oct 2021 10:00:04 -0700 > From:Nick Stoughton > Message-ID: > > > | Just for reference, the C standard says: > > Thanks, it was a little hard to imagine just how they would be > able to (with a straight face) talk about args to "sh" ... > > | So I agree, we should change the wording here so that for Issue 7 we only > | state what implementations should expect to do when Issue 8 comes out, and > | give application developers strong warnings about how to work around the > | issues caused by the possible (certain?) loss of the '--' in existing > | implementations. > > If there was going to be a new Issue 7 rev, before Issue 8, that would > perhaps be a plausible approach - but unless something has changed, and > Issue8 is not to be the next version released, that doesn't really work. Another thing to consider: if enough implementations fix things NOW to use "--" in system() and popen(), then by the time we actually DO release Issue 8, it will already be common enough practice to standardize it. But I also agree with your argument that at a bare minimum, we owe the reader some Rationale text explaining that older versions of the standard did not require sane behavior for arguments starting with '-' or '+', and that applications can always space-stuff their commands to ensure desired behavior regardless of whether the underlying implementation has Issue7 or Issue8 semantics (if we go ahead and require "--" in Issue8). At any rate, I've now filed a glibc bug, so we'll see what other libc authors think about both the POSIX bug and your reaction about it being premature to standardize a requirement of "--" (vs. just merely recommending it and documenting what portable apps must do in the meantime). https://sourceware.org/bugzilla/show_bug.cgi?id=28519 -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
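For reference, the space-stuffing workaround mentioned above can be sketched like this. The wrapper name system_space_stuffed() is made up for the example; the idea is only that a leading space keeps the string handed to "sh -c" from starting with '-' or '+', whichever semantics the underlying system() has.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>   /* WIFEXITED/WEXITSTATUS, for callers decoding the result */

/* Hypothetical wrapper: run cmd through system() with one leading space
 * prepended. Returns the status from system(), or -1 if memory for the
 * copy could not be allocated. */
int system_space_stuffed(const char *cmd)
{
    size_t len = strlen(cmd);
    char *stuffed = malloc(len + 2);   /* space + cmd + terminating NUL */
    int status;

    if (stuffed == NULL)
        return -1;
    stuffed[0] = ' ';
    memcpy(stuffed + 1, cmd, len + 1); /* copies the terminating NUL too */
    status = system(stuffed);
    free(stuffed);
    return status;
}
```

With this, system_space_stuffed("-some-tool") asks the shell to look up a command named "-some-tool" instead of risking the string being parsed as an option to sh.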
Question regarding gettext behavior on iconv failure
Hello GNU gettext maintainers,

In today's Austin Group meeting, we developed an example of using the proposed POSIX standardization of gettext() and encountered a situation where we felt that GNU gettext may have a bug. For context, the entire example is at:

https://posix.rhansen.org/p/gettext_split

The example in question sets up several .po files and a specific environment to test various pluralization/transcoding fallbacks, and concludes with a snippet where a string with an encoding error in ISO-8859-1 is output in spite of an iconv failure, rather than the string passed in to ngettext():

n_recipients = 1;
// The following outputs "1 Empfänger" encoded in UTF-8:
printf("%s\n", ngettext("recipient", "recipients", n_recipients));
bind_textdomain_codeset("mail", "ASCII");
n_recipients = 1;
// The following outputs "recipient" with the same encoding as the "recipient"
// argument to ngettext (remember, the system is assumed to not support
// conversion from ISO/IEC 8859-1 to ASCII):
printf("%s\n", ngettext("recipient", "recipients", n_recipients));
// On GNU gettext, "1 Empfänger" is output in ISO-8859-1 here (i.e. no conversion is done).
// I think we already agreed on considering this behavior a bug.

This raises a few questions: does the GNU gettext team agree that this can be considered a bug, and if so, will a future gettext release behave differently? Or if it is intentional and not a bug, can you provide justification for the behavior as well as tweaks to the proposed standard wording for gettext requirements and the worked example?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: SIGSTKSZ is now a run-time variable
On 3/9/21 1:34 PM, Eric Blake via austin-group-l at The Open Group wrote:
> On 3/9/21 10:14 AM, shwaresyst wrote:
>>
>> To me that looks like a conformance violation and should be reverted. There
>> is no _SC_SIGSTKSZ defined in <unistd.h> by the standard, to begin with, so
>> that use of sysconf() is a non-portable extension on its own.
>
> Portable apps can't use _SC_SIGSTKSZ, but the standard generally permits
> implementations to define further constants. Then again, re-reading XSH
> 2.2.2:
>
> " Implementations may add symbols to the headers shown in the following
> table, provided the identifiers for those symbols either:
>
> Begin with the corresponding reserved prefixes in the table, or
> ..."
>
> but the table lacks a row for <unistd.h> with _CS_* and _SC_* constants.
> Looks like you found an independent defect.

Not quite, because later it states "The following identifiers are reserved regardless of the inclusion of headers: 1. With the exception of identifiers beginning with the prefix _POSIX_, all identifiers that begin with an <underscore> and either an uppercase letter or another <underscore> are always reserved for any use by the implementation.", so an implementation can blindly add _SC_* constants at will without violating the standard.

Still, I opened:
https://www.austingroupbugs.net/view.php?id=1456
to try and add some clarification.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: SIGSTKSZ is now a run-time variable
On 3/9/21 10:14 AM, shwaresyst wrote:
>
> To me that looks like a conformance violation and should be reverted. There
> is no _SC_SIGSTKSZ defined in <unistd.h> by the standard, to begin with, so
> that use of sysconf() is a non-portable extension on its own.

Portable apps can't use _SC_SIGSTKSZ, but the standard generally permits implementations to define further constants. Then again, re-reading XSH 2.2.2:

" Implementations may add symbols to the headers shown in the following table, provided the identifiers for those symbols either:

Begin with the corresponding reserved prefixes in the table, or
..."

but the table lacks a row for <unistd.h> with _CS_* and _SC_* constants. Looks like you found an independent defect.

>
> I could see the definition of SIGSTKSZ being changed to the static minimum a
> particular processor requires, or is initially allocated as a 'safe' amount,
> rather than static "default size", and moving SIGSTKSZ to . This
> would contrast to MINSIGSTKSZ as the lowest value for a platform for all
> supported processors. Then an application could use sysconf() to query for
> the maximum size the configuration supports if it wants to use more than
> that, as a runtime increasable limit.

As I understand it, the concern in glibc is less about runtime increasability, so much as ABI compatibility with applications compiled against older headers at a time when the kernel had less state information to store during a context switch.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: SIGSTKSZ is now a run-time variable
On 3/9/21 9:26 AM, Andreas Schwab wrote:
> On Mar 09 2021, Eric Blake via Libc-alpha wrote:
>
>> The question becomes whether glibc is in violation of POSIX for having
>> made the change, or whether POSIX needs to be amended to allow SIGSTKSZ
>> to be non-preprocessor-safe and/or non-constant.
>
> POSIX already allows non-preprocessor-safe.

True, but expanding 'SIGSTKSZ' to 'sysconf (_SC_SIGSTKSZ)' does not yield a symbolic constant, as it is not "a compile-time constant expression with an integer type", per definition 3.380.

Looks like this discussion is happening in parallel in:
https://sourceware.org/bugzilla/show_bug.cgi?id=20305

I can open a defect against POSIX if we decide that is needed, but want some consensus first on whether it is glibc's change that went too far, or POSIX's requirements that are too restrictive for what glibc wants to do.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: SIGSTKSZ is now a run-time variable
[adding glibc and Austin group lists]

On 3/6/21 12:50 PM, Bruno Haible wrote:
> Hi,
>
> Carol Bouchard wrote in
> <https://lists.gnu.org/archive/html/bug-m4/2021-03/msg0.html>:
>> A change that was introduced is the
>> #define SIGSTKSZ is no longer a statically defined variable. Its value can
>> only be determined at run time.
>>
>> # define SIGSTKSZ sysconf (_SC_SIGSTKSZ)
>
> This is invalid. POSIX:2018 [1] defines two lists of macros:
>
> 1) "The <signal.h> header shall define the following macros which shall
> expand to integer constant expressions that need not be usable in
> #if preprocessing directives:"
>
> 2) "The <signal.h> header shall also define the following symbolic
> constants:"
>
> SIGSTKSZ is in the second list. This implies that it must expand to a constant
> and that it must be usable in #if preprocessing directives.

The question becomes whether glibc is in violation of POSIX for having made the change, or whether POSIX needs to be amended to allow SIGSTKSZ to be non-preprocessor-safe and/or non-constant.

> Besides being invalid, it is also not needed. The alternate signal stack
> needs to be dimensioned according to the CPU and ABI that is in use. For
> example, SPARC processors tend to use much more stack space than x86 per
> function invocation. Similarly, 64-bit execution on a bi-arch CPU tends to
> use more stack space than 32-bit execution, because return addresses and
> other pointers are 64-bit vs. 32-bit large. But once you have fixed the CPU
> and the ABI, there is no ambiguity any more.
>
>> This affects m4 code since the code assumes a statically defined variable
>> which can be determined at preprocessor time.
>
> POSIX guarantees this assumption.
>
>> Please advise how I can get past this.
>
> Fix your <signal.h>.

https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=6c57d320484988e87e446e2e60ce42816bf51d53 shows where glibc made the change, and I've now seen reports of several projects failing to build when using glibc with this change included.

> Bruno
>
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
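For projects hit by this while the conformance question is open, one way to avoid depending on the answer is to stop using SIGSTKSZ in #if directives or as a static array bound, and to size the alternate signal stack at run time. The following is only a minimal sketch under that assumption; the helper name install_alt_stack is hypothetical and is not taken from m4, gnulib, or glibc.

```c
#define _XOPEN_SOURCE 700   /* sigaltstack() and SIGSTKSZ are XSI */
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from m4 or glibc): allocate and install an
 * alternate signal stack whose size is read at run time.
 * Returns the stack size on success and 0 on failure. */
size_t install_alt_stack(void)
{
    /* Evaluated at run time, so this compiles whether SIGSTKSZ expands to
     * an integer literal or to a call such as sysconf (_SC_SIGSTKSZ). */
    size_t size = (size_t) SIGSTKSZ;
    stack_t ss;

    ss.ss_sp = malloc(size);
    if (ss.ss_sp == NULL)
        return 0;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return 0;
    }
    return size;
}
```

The trade-off is a heap allocation that lives for the rest of the process, in exchange for code that builds against headers with either definition of SIGSTKSZ.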
Re: [1003.1(2016)/Issue7+TC2 0001345]: date(1) default format
On 7/13/20 4:07 AM, Geoff Clare wrote: J William Piggott wrote, on 12 Jul 2020: On Mon, 6 Jul 2020, Geoff Clare wrote: There is no way we are going to change the required d_t_fmt value for the POSIX locale. Why? Because every implementation would have to change, and all applications that rely on the current value would potentially break (depending on which fields they use). We would need a very good reason to make such a change. Has it been discussed with 'we'? Would any of them like to comment on this please? I've been using "we" to refer to the whole Austin Group, i.e. everyone on this mailing list, so yes it has been discussed (in this thread). The three people whose opinions matter are the organisational representatives who would vote on it if it came to that. I'm sure if any of them disagree with what I've said they will comment. To make matters clearer, as one of the organisational reps (Open Group), I'm of the mind that mandating a change to existing practice is undesirable. Standardizing a new format value (especially if there is existing practice to copy from) or adding better documentation to make it clear about the intentional differences between strftime vs. date are both less invasive than a mandatory change to the contents of an existing format value. So I'm concurring with Geoff's handling of the responses so far. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [1003.1(2008)/Issue 7 0000411]: adding atomic FD_CLOEXEC support
On 3/12/20 1:02 PM, shwaresyst wrote: Fyi, the Last updated: date at top wasn't changed. On Thursday, March 12, 2020 Austin Group Bug Tracker wrote: A NOTE has been added to this issue. -- (0004796) eblake (manager) - 2020-03-12 16:35 https://www.austingroupbugs.net/view.php?id=411#c4796 -- minor tweak to the attached files to fix an instance of O_CLOEXEC that should be SOCK_CLOEXEC in relation to accept4(). Thanks. I'll re-upload with that additional date tweak. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines
On 2/4/20 9:29 AM, Eric Blake wrote:
> On 2/4/20 9:16 AM, Robert Elz wrote:
>> I am putting this in a new thread, as it isn't really important, more
>> just amusing, but the solution to this issue, with respect to the "."
>> command, is I think, causing that command to be in violation of the
>> standard (in a completely different way than what the previous discussion
>> is about).
>>
>> The resolution of the issue (as was previously noted) adds the words:
>>
>>     The dot special built-in shall support XBD Section 12.2 (on page 215).
>>
>> Last time I looked, "." was neither a lower case letter, nor a digit,
>> in any character set. Hence, the resolution of this issue has caused
>> a contradiction in the standard - those guidelines are both required,
>> and ignored, all in the same command.
>>
>> We could fix all this by changing the name of the "." command, probably
>> to "source" as that's already supported by some shells, but is this
>> degree of pedantry really important, or do we just live with the standard
>> being inconsistent with itself?
>
> Good catch. However, I don't think we can require the name 'source';
> better would be a fix along the lines of what we do for 'tail', in
> documenting explicit exceptions to the XBD guidelines. Something like:
>
>     The dot special built-in shall support XBD Section 12.2 (on page 215),
>     except that it does not comply with Guideline 1 or 2 due to its name.
>
> or maybe
>
>     The dot special built-in shall support Guidelines 3-14 of XBD Section
>     12.2 (on page 215).

For what it's worth, the standard was already self-contradictory for the [ utility; it is also required to support XBD Section 12.2 with an exception for Guideline 10, but would need a similar exemption for Guidelines 1 and 2. I don't think reopening bug 252 is correct, but a new bug fixing both '.' and '[' would be in order.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines
On 2/4/20 9:16 AM, Robert Elz wrote:
> I am putting this in a new thread, as it isn't really important, more
> just amusing, but the solution to this issue, with respect to the "."
> command, is I think, causing that command to be in violation of the
> standard (in a completely different way than what the previous discussion
> is about).
>
> The resolution of the issue (as was previously noted) adds the words:
>
>     The dot special built-in shall support XBD Section 12.2 (on page 215).
>
> Last time I looked, "." was neither a lower case letter, nor a digit,
> in any character set. Hence, the resolution of this issue has caused
> a contradiction in the standard - those guidelines are both required,
> and ignored, all in the same command.
>
> We could fix all this by changing the name of the "." command, probably
> to "source" as that's already supported by some shells, but is this
> degree of pedantry really important, or do we just live with the standard
> being inconsistent with itself?

Good catch. However, I don't think we can require the name 'source'; better would be a fix along the lines of what we do for 'tail', in documenting explicit exceptions to the XBD guidelines. Something like:

    The dot special built-in shall support XBD Section 12.2 (on page 215), except that it does not comply with Guideline 1 or 2 due to its name.

or maybe

    The dot special built-in shall support Guidelines 3-14 of XBD Section 12.2 (on page 215).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: [1003.1(2008)/Issue 7 0000252]: dot should follow Utility Syntax Guidelines
shell (and dash, bosh, and pdksh - so maybe ksh88) it works just fine. Keeping that complicates all of this - otherwise I would simply implement "if there is exactly one arg it must be intended to be the required file name, whatever it looks like, otherwise..." and things would be a little simpler. kre ps: if it turns out I know someone in the balloting group for final approval of this, I'd suggest to them that they do not approve the new version while it contains the kind of incompatibility and breakage for no particularly good reason that seems to me to exist. But I think you've overlooked the fact that you ARE allowed to have the extension behavior for all except '--' that preserves your goal of "if there is one argument, it is a filename even if it starts with '-'", while still remaining compliant to the standard's "'. -- "$arg"' must treat $arg as a filename even if it starts with '-'". -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: $/ as a textual exit status (Was [1003.1(2016)/Issue7+TC2 0001321]: exit status for false should be 1-125)
On 1/31/20 10:15 AM, Joerg Schilling wrote:
> Robert Elz wrote:
>
>> Date: Fri, 31 Jan 2020 11:43:17 +0100
>> From: Joerg Schilling
>> Message-ID: <5e3404c5.tqmsutrzovb6+pjf%joerg.schill...@fokus.fraunhofer.de>
>>
>> | The real problem I see is that more than 30 years after waitid() has been
>> | introduced to be able to return all 32 bits of the exit() call parameter,
>> | bosh is still the only shell that does no longer live in the 1970s
>> | with respect to exit() code handling.
>>
>> To me, this tells everything. If a function is needed, and a solution
>> is provided, it gets used (might take a little while, but it happens).
>> When a solution is provided to a problem that doesn't really exist, it
>> tends to simply be ignored.
>
> This looks like a misinterpretation.
>
> The main issue seems to be that most kernel implementations implemented
> waitid() in a useless way. This changed approx. 4 years ago, when FreeBSD
> fixed their waitid()...

The Linux kernel still only tracks 8 bits of information, truncating during _exit(). Although I have in the past raised the issue to Linux kernel developers that Linux's waitid() is non-compliant, no one has yet submitted a Linux kernel patch to update the process struct to track 32 bits and fix _exit/waitid to expose them.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
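To see which behavior a given kernel has, a small probe along the following lines may help. It is only a sketch: the helper name exit_value_seen_by_waitid is hypothetical, and the value it returns depends on the system. A kernel that truncates in _exit() reports only the low 8 bits of the exit value in si_status, while one that tracks the full int reports it unchanged.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: fork a child that calls _exit(code) and return the
 * value the parent sees in si_status from waitid().  Returns -1 if fork()
 * or waitid() fails, or if the child did not exit normally. */
int exit_value_seen_by_waitid(int code)
{
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                /* child: terminate immediately */

    if (waitid(P_PID, (id_t) pid, &info, WEXITED) != 0)
        return -1;
    if (info.si_code != CLD_EXITED)
        return -1;
    return info.si_status;          /* full value, or only the low 8 bits */
}
```

Calling it with a value above 255, such as 0x1234, distinguishes the two cases: the result is either 0x1234 or 0x34.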
Re: [1003.1(2013)/Issue7+TC1 0001045]: Issues with "cd -"
On 10/23/19 9:49 AM, Geoff Clare wrote: Hi Konrad. The status changing to APPLIED means that the edits have now been made to the troff source of the standard. Although that doesn't mean they are set-in-stone, we would need a good reason to reopen the bug to change the resolution, and then update the troff to reflect the new resolution. As regards removing the "" case from the example, the parenthetical note after the code explains why that is there. The proposal was not to delete the "" case, but... case $dir in (/*) CDPATH= cd -P "$dir";; ("") CDPATH= cd -P "";; (*) CDPATH= cd -P "./$dir";; esac be shortened to case $dir in (/*|) CDPATH= cd -P "$dir";; to condense two cases into one. Except that it uses the wrong syntax; the correct spelling would be (/*|'') (or using more spacing, '( /* | '' )'). I don't have any qualms with condensing the example from a technical standpoint (if done correctly), but question whether it counts as a mere editorial change worth making this late in the process for this bug. (*) CDPATH= cd -P "./$dir";; esac ? Also, from a usability perspective, I think it would be better if `-' lost its special meaning after `--'. This would make the above code superfluous. Coding that up would be at odds with existing practice, so even if we were to choose that way if designing from scratch, I don't think we can make that change now. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Re: Draft minutes of the 5th August 2019 Teleconference
On 8/7/19 4:43 PM, enh wrote: > What's the plan for the qsort_r interface, given that glibc and BSD have > mutually incompatible ones (which is why I didn't add it to Android)? Per http://austingroupbugs.net/view.php?id=900#c4112, FreeBSD was planning to switch over to the glibc signature, making it easier to standardize things as 'qsort_r' as presented in the bug, rather than as 'posix_qsort_r'. But as there is still a 30-day window for Open Group objections, we may very well receive an objection to the name 'qsort_r' where we would have to go with 'posix_qsort_r'. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: Draft minutes of the 5th August 2019 Teleconference
On 8/6/19 4:48 AM, Geoff Clare wrote: > These are the draft minutes from yesterday's call. Andrew will need > to allocate the Austin-xxx document number and add the file to the > document register after he returns. > > > Minutes of the 5th August 2019 Teleconference Austin-xxx Page 1 of 1 > Submitted by Geoff Clare, The Open Group. 6th August 2019 Followup: > Bug 1220: Add an API to query the name of a locale category of a locale > object OPEN > http://austingroupbugs.net/view.php?id=1220 > > Action: Eric to ask if The Open Group is willing to sponsor this interface. > ... > > Bug 1263: Add ppoll() OPEN > http://austingroupbugs.net/view.php?id=1263 > > Action: Eric to ask if The Open Group is willing to sponsor this interface. Now complete, along with earlier actions to ask about sponsorship of qsort_r in bug 900 and reallocarray in bug 1218. I proposed a 30 day window for any comments or objections, and will follow up in early September (with the assumption that no objections is tacit approval that we proceed with the new interfaces). > Bug 374: malloc(0) and realloc(p,0) must not change errno on success OPEN > http://austingroupbugs.net/view.php?id=374 > > Geoff had noticed an overlap between changes suggested in this open bug > and the changes needed to align with C17. > > We also noted that glibc does not conform to the change we made in > 2008-TC1 to require that errno is set to an implementation-defined > value if realloc(p,0) returns null. This matches the change made in > C17 7.22.3.1 (overview) which says that if a null pointer is returned in > the size 0 case it is "to indicate an error". 
However, 7.22.3.5 (realloc) > still says "If size is zero and memory for the new object is not > allocated, it is implementation-defined whether the old object is > deallocated" and "The realloc function returns a pointer to the new > object [...], or a null pointer if the new object has not been allocated" > which seems to imply a null pointer can be returned in this case without > it being considered an error. > > Action: Eric to ask about this on the glibc mailing list. Also done; Florian Weimer has replied to the bug in note 4510, and in fact,... > > Action: Nick to draft a Clarification Request to WG14. ...says he already raised a similar question to WG14 in May 2018 (although I do not have a URL handy to that thread). In fact, the call to standardize reallocarray() may also want to depend on the outcome here. http://austingroupbugs.net/view.php?id=1218 -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
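Until the C17 wording and the glibc behavior are reconciled, applications can sidestep the question by never passing a size of zero to realloc(). A minimal sketch of that approach follows; the helper name resize_buffer is hypothetical, and treating size zero as "free the buffer" is one possible policy rather than anything the standards require.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *bufp to n bytes without ever calling
 * realloc(p, 0).  Returns 0 on success and -1 on allocation failure, in
 * which case *bufp is left untouched and still owned by the caller. */
int resize_buffer(void **bufp, size_t n)
{
    void *tmp;

    if (n == 0) {
        /* Explicit policy for size zero: release the buffer. */
        free(*bufp);
        *bufp = NULL;
        return 0;
    }
    tmp = realloc(*bufp, n);
    if (tmp == NULL)
        return -1;      /* n != 0 here, so NULL means a real failure */
    *bufp = tmp;
    return 0;
}
```

With this wrapper a null return from realloc() always means failure, so the caller never has to inspect errno to tell the two cases apart.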
Re: x[ as first word in sh
On 7/29/19 1:08 PM, Stephane Chazelas wrote: > That's a follow-up on > https://www.mail-archive.com/bug-bash@gnu.org/msg23451.html > > Is there anything in the POSIX spec that allows: > > x[ foo > > To be interpreted as anything other than invoking the "x[" > command with "foo" as argument? > > I had the vague recollection that there was but I can't find it > now. If there's not, that would be a bug in the spec as several > shells including ksh, bash, zsh, yash treat it as the start of > an array element assignment (like in: > > x[ foo > + 1]=value Ouch. I think you've identified a real problem. In XSH 2.13.3, we explicitly added wording to allow unmatched unquoted '[' in a word to be used in its role similar to 'test' (the difference being whether a later ']' argument is necessary): If the pattern contains an open bracket ( '[' ) that does not introduce a bracket expression as in XBD RE Bracket Expression, ... If the pattern does not match any existing filenames or pathnames, the pattern string shall be left unchanged. So by that argument, if the shell parses 'x[' as a word, then because it does not form a valid glob, it must be used unchanged as the command name. But that explicit wording does not cover whether 'x[' has to be delimited as a word. XSH 2.3 states in rule 7 that an unquoted blank ends the delimiting of any prior word, but the behavior you are showing for shells that parse a[b]= as an array assignment are trying to find the matching ] before delimiting the first word (so the shell extension of array assignment is somehow acting as a quoting context that prevents the whitespace thus parsed from being the unquoted blank that ends the delimiting of the word). 
The shell grammar, at XSH 2.10.1, allows for array assignments in rule 7b:

    If the TOKEN contains an unquoted (as determined while applying rule 4 from Token Recognition) <equals-sign> character that is not part of an embedded parameter expansion, command substitution, or arithmetic expansion construct (as determined while applying rule 5 from Token Recognition): If the TOKEN begins with '=', then rule 1 shall be applied. If all the characters in the TOKEN preceding the first such <equals-sign> form a valid name (see XBD Name), the token ASSIGNMENT_WORD shall be returned. Otherwise, it is unspecified whether rule 1 is applied or ASSIGNMENT_WORD is returned.

with the intent that a[b]= can be an ASSIGNMENT_WORD in shells with array extensions, but can also be WORD for shells that treat it as glob to determine a command name. But without any explicit specification of permitting whitespace in the array arguments, it looks like there is a discrepancy between POSIX requirements and existing shell behavior.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org

signature.asc Description: OpenPGP digital signature
Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching
On 6/21/19 4:00 PM, Stephane Chazelas wrote:
>> The fact that bash 5's
>> behavior breaks as_echo in the presence of certain filenames is
>> definitely a discouraging regression; but I haven't paid enough
>> attention to the details of this thread to know if it was broken only in
>> the initial bash 5 release and since fixed in a followup patch, or if it
>> is still broken with all of Chet's current official patches applied on top.
> [...]
>
> Chet has clarified that it was intentional and to match Geoff's
> interpretation of the standard. Chet has just mentioned he's
> added a new posixglob option (on by default) to the devel branch
> today
> (http://git.savannah.gnu.org/cgit/bash.git/commit/?h=devel&id=48492ffae22d692594757e53fb4580ebb1f506cf)
> which when disabled reverts to the old behaviour.

The sad part will be if the behavior controlled by 'set -o posixglob' or 'shopt -s posixglob' (I haven't yet checked which of the two means Chet added it under) will actually be setting behavior NOT specified by POSIX, depending on how this current thread plays out. And even if he leaves in a knob, I hope that the default for that knob when bash is invoked as /bin/sh is historical behavior (what bash decides to default to in bash mode is a different matter). But as long as it remains on Chet's development branch and not a 5.1 release or an official patch to 5.0, there's still time for Chet to change it...

> To quote two striking examples that have already been given,
> that interpretation of the standard would mean that:
>
> pattern='\.'
> grep $pattern file
>
> Which in all shells is documented to search for lines that
> contain a dot in "file" would now be required to instead search
> for lines that contain at least one character in "file", as \ is
> now a glob quoting operator, and \. happens to match the .
> directory entry (on those systems where . is included in the
> result of readdir() at least and with shells that don't skip .
> and .. in glob expansions).

Where's the glob character that causes $pattern to be subjected to globbing? Had there also been a '*', '[', or '?' in $pattern, I could (sort of) see the logic to the unquoted $pattern being subjected to use as a glob pattern. But when there are no globbing characters at all, why does \. suddenly serve to cause a glob lookup (where \ is then erased by the globbing procedure) and match '.' in the current directory? (And yes, this one is also confusing because of the ongoing work on the other open bug about whether shells should be permitted to always omit '.' from globbing, regardless of whether readdir() omitted it) > > and > > touch %sn > cmd='printf %s\n' > $cmd test > > which in all shells is documented to output test would > now be required to output testn (without newline). > > That's what bash5 now implements. And that is indeed the regression in behavior, which seems to not be the historical practice of any earlier shell. If the standard has to permit bash 5 behavior by leaving it unspecified, we've still rendered a lot of existing scripts broken; better would be if we can agree on standard wording (and Chet updates bash 5 to match) to do what has traditionally been done of NOT globbing a sequence that does not contain '*', '[' or '?'. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching
On 6/21/19 2:47 PM, Stephane Chazelas wrote: > In http://austingroupbugs.net/bug_view_page.php?bug_id=1222 > I asked that POSIX *allows*, not even mandate an interface > supported by one sh implementation and documented as such for > over 25 years (since before the first version of the POSIX.2 > specification) that addresses that: echo -E - "$var" > > That's not such a useless feature. Without it, echo can't be > used to output arbitrary data, which is exactly what that > autoconf as_echo is trying to do. POSIX has already long-documented the fact that echo cannot be used to output arbitrary data, and recommended the use of printf. autoconf's as_echo should be viewed as a thin shim around printf these days, rather than an attempt to portably use echo (the name as_echo hearkens back to the days when echo was a shell builtin everywhere but printf was not, so using echo was preferable to forking when it was possible; but while things have evolved since then, the name stuck). The fact that bash 5's behavior breaks as_echo in the presence of certain filenames is definitely a discouraging regression; but I haven't paid enough attention to the details of this thread to know if it was broken only in the initial bash 5 release and since fixed in a followup patch, or if it is still broken with all of Chet's current official patches applied on top. > > It was rejected (even the "just allowing it") on the ground that > it would break existing scripts (without providing any evidence; > no need to look now, I know it does break some). Part of the reason for that rejection (since I remember being on that call) was that the only example provided of 'echo -E -' not outputting the '-' was for zsh in non-POSIX mode - but zsh is already notoriously and intentionally non-POSIX when not in POSIX mode. 
The assumption made during the teleconference is that zsh in POSIX mode could just as easily comply with what all other shells do in strict compliance mode of outputting a literal '-', if zsh still wants to try for POSIX compliance (and even that fact is less obvious, as we have not had as many comments from zsh developers as we have had from other shells that are at least trying to come to common grounds via POSIX). > > That would break scripts that pass "-" as the *first* argument of > *one* command (echo) and that happen to be interpreted by a shell > that has implemented that allowed, but not required feature. > > My point was so that POSIX warn people against expecting "echo > -" to output "-" as it does not in all shells in practice. Perhaps that point could still be made as a non-normative point in the application usage section of echo, but if the only shell affected by the problem is zsh in non-POSIX mode, it felt like a bit much to be added at the time. > > Instead, now, POSIX want to *mandate* not only allow a feature > that not a single shell has done, that is not needed at all, and > that would potentially break all scripts that pass an unquoted > word expansion containing a backslash in *any* position, in > *any* argument to *any* command. > > Isn't there some level of double standard there? You're reading far too much into the outcome of the current discussion. I'm not yet convinced that POSIX is trying to mandate behavior at odds with existing shell practice, and the various mailing list threads on the topic are far from over. Various proposals may have added words that can be construed in that manner, but that does not mean that POSIX has adopted that proposal, nor that it will do without first addressing the problematic wording. We intentionally did not reach a final resolution on the backslash issue on yesterday's call because of the continued activity on the mailing list. 
And the fact that you have demonstrated several time-bombs where existing shell scripts coupled with historical shell behaviors can result in non-obvious changes in behavior based on the contents of the current working directory make this an interesting problem. But part of the issue is coming up with acceptable wording that either permits existing practice (at the risk of rendering common shell script examples in the wild as tickling unspecified behaviors), or which tightens things to be less unpredictable (even if it renders existing shells as non-compliant). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: Resetting getopt's state
On 12/28/18 4:50 AM, Joerg Schilling wrote: > Simon Ser wrote: > >> Hi, >> >> There's currently no way to reset getopt's internal state. This means >> you can't use getopt for two different argument vectors. Are there >> plans to standardize a way to do so? This topic also recently came up in the qemu mailing list, so it IS something that the standard should consider addressing: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00987.html >> >> The current standard says: >> >>> If the application sets optind to zero before calling getopt(), the >>> behavior is unspecified. >> >> Many libcs allow optind to be set to zero to reset getopt. Some BSDs >> have an additional optreset variable that can be set. Both BSD and glibc have extensions that require tracking additional hidden state, and thus also provide extensions for resetting that additional hidden state (glibc with semantics that depend on reading POSIXLY_CORRECT from the environment, and that differ based on whether '-' or '+' was the first character in optstring; then both glibc and BSD support an optstring of "a::" to allow optional arguments to -a). It's a shame that glibc picked 'optind=0' and BSD picked 'optreset=1' as their two hard-reset mechanisms. The standard explicitly calls out 'optind=0' as having unspecified behavior (which permits the glibc extension); the BSD choice is a bit harder to work with (as 'optreset' is not a name reserved for the POSIX namespace, so it shouldn't be visible to a strictly conforming application). There is also hidden state that MUST be tracked by getopt() in order to properly handle merged short options (that is, if the user passes "-ab" in argv[1], getopt() must leave optind=1 when returning 'a' even though the second call will return 'b' instead of 'a'). That hidden state is what Joerg mentions here: > > This would be a nonstandard method that still does not address the needs for > getopt() as it does not allow to restore the previous state. 
>
> There however is a method in use since 30 years that is useful. It has been
> introduced by AT&T to allow the Bourne Shell to use getopt() for builtin
> commands. This method is based on an additional global integer named "_sp"
> that is used as the index in a multi option string. Restoring the previous
> state is needed to permit to call shell builtins in the getopts(1) parsing
> loop.
>
> The initial value for _sp is 1 and if the value is set to 1, this resets the
> internal state of getopt(). Restoring the previous value allows to restore the
> previous state.

And it also means that even if optind == 2, setting optind = 1 might not fully reset things if _sp is not currently 1.

But note that the hidden state tracked by _sp is implicitly cleared any time getopt() returns -1 (because you are no longer processing later merged options from the same optind value) - that is, on implementations that have _sp, note that _sp is implicitly reset to 1 after getopt() reaches the end of options. Thus, the end effect should be the same whether we expose _sp for application use (preferably with a name that is reserved by the standard rather than risking conflicts with existing names in portable user programs), or whether we document that it IS portable in practice to perform a SOFT reset of getopt() state by running getopt() until it returns -1 prior to assigning optind = 1, while still leaving the door open for hard reset if you used extensions beyond POSIX (leading '-', leading '+', changing POSIXLY_CORRECT in the environment, or use of '::').

Thus, I'm leaning towards writing a defect that does just the latter (documenting optind=1 after getopt() returned -1 as a soft reset, and leaving it as implementation extensions for providing a hard reset), without bothering to expose _sp to applications (although _sp can remain as one of the implementation extensions).

-- Eric Blake, Principal Software Engineer Red Hat, Inc.
+1-919-301-3226 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [RFC/PATCH glibc 0/2] setting working dir in posix_spawn()
On 9/9/18 3:34 PM, Florian Weimer wrote: On 09/08/2018 12:54 AM, Eric Blake wrote: Also, I've realized that we do NOT need posix_spawn_file_actions_addopenat(). The main benefit of openat() is that you can redirect relative file names according to an fd of your choice, without affecting global state. But during posix_spawn(), there are no other threads competing for global state (if you are doing a library implementation where the chdir() is done between fork() and exec()), so: openat(mydir, "file", mode); can be decomposed to: posix_spawn_file_actions_addopen(, 5, ".", O_RDONLY|O_DIRECTORY, 0); posix_spawn_file_actions_addfchdir(, mydir); posix_spawn_file_actions_addopen(, 4, "file", mode, 0); posix_spawn_file_actions_addfchdir(, 5); posix_spawn_file_actions_addclose(, 5); Is it possible to choose an appropriate value for the directory descriptor automatically? Not that I know of. But it's tougher than it looks - my initial thought was "what about a magic negative number" that says to auto-allocate at the next free fd (the way AT_FDCWD is a magic number) - but since the allocation of fds is done at a later point (the posix_spawn() call) than the addition to file_actions (the posix_spawn_file_actions_addopen()), there is no way to predict WHAT that fd will actually resolve to, and thus no way to reuse that fd in posix_spawn_file_actions_addfchdir(), posix_spawn_file_actions_adddup2(), or posix_spawn_file_actions_addclose() as needed. In other words, by the time you're using posix_spawn(), you're already stuck with having to micro-manage your fds - and if you want to avoid closing something important by accident, you practically have to do: scratch_fd = open("/dev/null", O_RDONLY|O_CLOEXEC); posix_spawn_file_actions_addopen(, scratch_fd, ...); posix_spawn(, , ); close(scratch_fd); What about support for AT_EMPTY_PATH, for upgrading an O_PATH descriptor? I think this operation still needs openat. O_PATH and AT_EMPTY_PATH are Linux/glibc extensions not in POSIX. 
So yes, they are worth thinking about in terms of what glibc should provide, but I'm not sure if they are sufficient on their own to require POSIX to worry about posix_spawn_file_actions_addopenat(), but rather might argue that glibc should add posix_spawn_file_actions_addopenat_np(). Or looking at it another way - I'm trying to stick to the initial philosophy documented in the posix_spawn() RATIONALE section (page 1457 in the 2017 edition): The requirements for posix_spawn( ) and posix_spawnp( ) are: • They must be implementable without an MMU or unusual hardware. • They must be compatible with existing POSIX standards. Additional goals are: • They should be efficiently implementable. • They should be able to replace at least 50% of typical executions of fork( ). • A system with posix_spawn( ) and posix_spawnp( ) and without fork( ) should be useful, at least for realtime applications. • A system with fork( ) and the exec family should be able to implement posix_spawn( ) and posix_spawnp( ) as library routines. Adding just posix_spawn_file_actions_addfchdir() is lighter-weight than adding posix_spawn_file_actions_addopenat(), posix_spawn_file_actions_fchdirat(), and others. If you really have to deal with things like O_PATH or AT_EMPTY_PATH, then pre-open the fd in the parent and use posix_spawn_file_actions_adddup2(), rather than making file_actions more complicated. And we're not trying to replace 100% of fork/exec, but merely try to add a common-enough chdir paradigm to make it easier to replace the common 50%. Also, note that http://austingroupbugs.net/view.php?id=411 is also somewhat relevant, which states: At line 46976 [XSH posix_spawn_file_actions_adddup2], add a sentence: If fildes and newfildes are equal, then the action shall ensure that the FD_CLOEXEC flag of fildes is cleared (even though dup2( ) would leave it unchanged). 
After line 46999 [XSH posix_spawn_file_actions_adddup2], add the following: > Although dup2( ) is required to do nothing when fildes and newfildes are equal and fildes is an open descriptor, the use of posix_spawn_file_actions_adddup2( ) is required to clear the FD_CLOEXEC flag of fildes. This is because there is no counterpart of posix_spawn_file_actions_fcntl( ) that could be used for clearing the flag; it would also be possible to achieve this effect by using two calls to posix_spawn_file_actions_adddup2( ) and a temporary fildes value known to not conflict with any other file descriptors, coupled with a posix_spawn_file_actions_close( ) to avoid leaking the temporary, but this approach is complex, and risks EMFILE or ENFILE failure that can be avoided with the in-place removal of FD_CLOEXEC. There is no need for posix_spawn_file_actions_adddup3( ), since it makes no sense to create a file descriptor with FD_CLOEXEC set before spawning the child process, where that file descriptor would im
Re: [RFC/PATCH glibc 0/2] setting working dir in posix_spawn()
[reviving a REALLY old thread] https://sourceware.org/ml/libc-alpha/2010-08/msg00107.html On 08/27/2010 01:35 AM, Jonathan Nieder wrote: (pruned cc's, +cc:libc-alpha) Eric Blake wrote: On 08/26/2010 12:18 AM, Jonathan Nieder wrote: Do you think there would be any interest in a posix_spawn() variant that takes a dir parameter? I am imagining something like this: Of your variants, I would most prefer: int posix_spawn_file_actions_addchdir(posix_spawn_file_actions_t *file_actions, int dirfd); Today, I just submitted http://austingroupbugs.net/view.php?id=1208, then in searching my mail archives, I found this related thread that never had a response at the time, so I'm now offering a reply. Compared to my thoughts 8 years ago, my new writeup proposed int posix_spawn_file_actions_addchdir(posix_spawn_file_actions_t *restrict file_actions, const char *restrict name); int posix_spawn_file_actions_addfchdir(posix_spawn_file_actions_t *file_actions, int dirfd); which is slightly different from your RFC based on my older thoughts. But in re-reading your email, I see that we could indeed get by with JUST the fchdir() signature, since chdir("foo") can generally be decomposed into fchdir(open("foo", O_RDONLY|O_DIRECTORY)). Okay, here's a proof of concept (for the easy case --- a fork()-based implementation for Linux). Patches apply to 8b2b771^. For that matter, it may also be worth adding posix_spawn_file_actions_addopenat, which mirrors the recent addition of openat() semantics. Sounds like a good idea. I did not try it because I did not want to think about whether it would cause the __spawn_action struct to grow (and if so, what ramifications that would have, if any). Also, I've realized that we do NOT need posix_spawn_file_actions_addopenat(). The main benefit of openat() is that you can redirect relative file names according to an fd of your choice, without affecting global state.
But during posix_spawn(), there are no other threads competing for global state (if you are doing a library implementation where the chdir() is done between fork() and exec()), so:

    openat(mydir, "file", mode);

can be decomposed to:

    posix_spawn_file_actions_addopen(, 5, ".", O_RDONLY|O_DIRECTORY, 0);
    posix_spawn_file_actions_addfchdir(, mydir);
    posix_spawn_file_actions_addopen(, 4, "file", mode, 0);
    posix_spawn_file_actions_addfchdir(, 5);
    posix_spawn_file_actions_addclose(, 5);

We don't need to add posix_spawn_file_actions_addFOO for every possible FOO that typically gets called between fork/exec, as long as we can string together enough bare components to get feature parity within the single-threaded context of posix_spawn() for what is otherwise expensive if done in the parent as a wrapper around posix_spawn(), even if it adds more verbosity to the posix_spawn_* calls required to get the same desired effect.

-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Coordination on standardizing gettext() in future POSIX
Hello GNU gettext folks, Jörg Schilling is interested in standardizing gettext() and friends in a future version of POSIX (as a replacement to the hard-to-use catgets() that is currently standardized). See http://austingroupbugs.net/view.php?id=1122 While there are probably things in GNU gettext that won't be standardized (for example, xgettext(1) has some long-only options, but POSIX will only standardize short options), it is worth coordinating the bare minimum set of features that are portable across GNU and other implementations of gettext, as well as any wording changes that need to be added (such as documenting thread-safety, locale interactions, whether bindtextdomain() can only safely be used once prior to creating threads, and so on) in order to actually be included in the standards. Thus, this email is more of an introduction to make sure everyone interested in the project is aware of where to write/review any wording proposals for accomplishing the addition into POSIX. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: D1095R0/N2xxx draft 4: Zero overhead deterministic failure - A unified mechanism for C and C++
On 08/08/2018 07:19 PM, Eric Blake wrote: We've just had a discussion on whether standard-compliant abs() (which is currently undefined on INT_MIN) should be permitted and/or required to have well-defined behavior I failed to provide a summary to my thoughts: I think your paper's example should NOT use abs(), but instead some other function (whether you merely rename your existing example to 'myabs', or pick a different function which DOES have well-defined errno semantics right now), precisely because abs() does NOT currently have well-defined errno semantics and it is controversial on whether such semantics should be given to it. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: D1095R0/N2xxx draft 4: Zero overhead deterministic failure - A unified mechanism for C and C++
On 08/08/2018 05:24 PM, Niall Douglas wrote:

> https://docs.google.com/viewer?a=v=forums=MTEwODAzNzI2MjM1OTc0MjE3MjkBMDIyMjg0NDY2NTc4NzYyMDQzODYBX1RlYjRCNjREQUFKATAuMQFpc29jcHAub3JnAXYy=0
>
> Comments are welcome, particularly on how best to offer POSIX functions in a form both binary compatible with old code, and which calls the _Fails(errno) form in newly compiled code.

An initial comment in regards to the example on page 5:

    int abs(int x)
    {
        if(x == INT_MIN)
        {
            errno = ERANGE;
            return 0;
        }
        return (x < 0) ? -x : x;
    }

We've just had a discussion on whether standard-compliant abs() (which is currently undefined on INT_MIN) should be permitted and/or required to have well-defined behavior (either in the one direction of returning INT_MIN, as that is the fewest assembly instructions on typical hardware, or in the direction of adding errno handling, as you have done here). The verdict is not final (I wish I could point you to mailing list archives, but https://www.opengroup.org/austin/mailarchives/ points to gmane, which is no longer functional, and I don't know of any other web archive of the Austin list). But so far, a rough consensus from the discussion on bug http://austingroupbugs.net/view.php?id=1108 and http://austingroupbugs.net/view.php?id=1197 is that integral functions, like abs(), should NOT signal a range error for performance reasons (or that setting errno to ERANGE should be a feature of floating-point math, not integer math), and that the wording in 1108 will be once again relaxed to leave behavior of abs(INT_MIN) undefined, rather than well-defined (any specific implementation can, of course, define behavior as an extension to POSIX).

-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: sed -e 'a\' -e text
On 08/07/2018 10:20 AM, Shware Systems wrote:

> That is a bug in those shells, conformance wise. No buts. Consideration of quoting happens after line joining, for all forms, as noted in the sections on quoting and tokenization. I don't think that even qualifies as a permitted extension, which is partly why the $'...' form is being added; it has the \n escape.

Huh? The standard is clear that the following three sequences are identical in producing a literal backslash followed by a newline:

    "\\
    "

    '\
    '

    $'\\\n'

and all shells are compliant to that. Backslash-newline line joining does NOT happen inside single quoting, but only inside double-quoting and in unquoted text. Or more precisely, newline joining occurs when backslash is not quoted; when neither single- nor double-quoting is active, only backslash escaping can quote the backslash; when double-quoting is active, backslash is an escape character and behaves as unquoted unless backslash-escaped; but when single-quoting is active backslash is NOT an escape character and thus always behaves as quoted (cannot behave as unquoted, and therefore does not need escaping).

> You need: -e a\bar to disable the join now, so the character before the NL isn't a '\', for a conforming script. Then concatenation as part of quote removal keeps the '\' and NL.

You are correct that such a script results in the same input to sed, but the original example using just -e a\ is identical in behavior. Please quit spreading misinformation.

-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [1003.1(2008)/Issue 7 0000262]: sed with multiple -e options
On 08/07/2018 10:18 AM, Joerg Schilling wrote:

> Stephane Chazelas wrote:
>
>> 2018-08-07 15:46:33 +0100, Stephane Chazelas: [...]
>>> - or a variant thereof that covers historical implementations, that is same as above except that a fragment can't end in a backslash. [...]
>>
>> Correction: can't end in *an unescaped* backslash. sed -e 'w file\\' -e q (write to a file called file\) should still be OK.
>
> OK, but please note that strings that end in an unescaped backslash are unspecified at shell level already, so it is unspecified how the sed command is called (unless you use execl()).

Huh? In shell, '\' with single-quoting is well-specified as a single backslash ('\\' is well-specified as two backslashes). And there is no way to write a double-quoted string that ends in an unescaped backslash, since both "\\" and "\"" are well-defined. (It is possible to write a construct that results in an unterminated string starting with a double quote, such as: eval echo '"' but such constructs are already problematic for not having a terminating ", rather than for ending in an unescaped backslash.)

-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: perror() changes the orientation of stderr to byte-oriented mode if stderr is not oriented yet.
On 06/29/2018 03:45 AM, Geoff Clare wrote: Eric Blake wrote, on 28 Jun 2018: I'm forwarding an email originally sent to the Cygwin list. What do others think? Is there enough grounds in the argument below that the CX-shading in POSIX is too strict compared to existing implementations, and that I ought to open a bug to change the wording on the requirements of perror() vs. stdout orientation? This issue arose in 2005 when C99 TC2 added perror() to the list of byte input/output functions and created a conflict with POSIX. The end result was that C99 TC3 removed it so that POSIX would not need to change. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_322.htm Although not in the C standard, we should also make sure that psignal() and psiginfo() have the same treatment as whatever we decide for perror(), since all three share the wording about "shall not change the orientation of the standard error stream". -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: perror() changes the orientation of stderr to byte-oriented mode if stderr is not oriented yet.
I'm forwarding an email originally sent to the Cygwin list. What do others think? Is there enough grounds in the argument below that the CX-shading in POSIX is too strict compared to existing implementations, and that I ought to open a bug to change the wording on the requirements of perror() vs. stdout orientation? On 06/28/2018 11:28 AM, Craig Howland wrote: On 06/27/2018 08:55 AM, Corinna Vinschen wrote: ... On Jun 27 20:01, Takashi Yano wrote: POSIX states: The perror() function shall not change the orientation of the standard error stream. However, cygwin perror() function changes the orientation of stderr to byte-oriented mode if stderr is not oriented yet. I suggest that POSIX is in error. The POSIX statement about not changing the orientation is an extension to the C standard (CX, to be precise). POSIX is always careful to defer to the C standard, which I think does indirectly specify that perror() is byte-oriented. The C standard actually does not directly talk about the orientation of perror(). However, it directly defines (quoting from the N1570 C11 draft): "The input/output functions are given the following collective terms: — The wide character input functions — those functions described in 7.29 that perform input into wide characters and wide strings: fgetwc, fgetws, getwc, getwchar, fwscanf, wscanf, vfwscanf, and vwscanf. — The wide character output functions — those functions described in 7.29 that perform output from wide characters and wide strings: fputwc, fputws, putwc, putwchar, fwprintf, wprintf, vfwprintf, and vwprintf. — The wide character input/output functions — the union of the ungetwc function, the wide character input functions, and the wide character output functions. — The byte input/output functions — those functions described in this subclause that perform input/output: fgetc, fgets, fprintf, fputc, fputs, fread, fscanf, fwrite, getc, getchar, printf, putc, putchar, puts, scanf, ungetc, vfprintf, vfscanf, vprintf, and vscanf." 
Please note that perror() is not listed. While this could be interpreted to mean it can be both, the proper way for that to have been done would be for it to appear in both lists--which it does not. However, perror() is defined in the same stdio.h subclause (i.e. 7.21) as all of the byte functions, against the wide-character functions in wchar.h (7.29). So even though the C standard is sloppy and does not directly have perror() in the enumerated list, it is included by the general statement about the subclause. However, you could argue that it was purposely left out, which is why they bothered to list the others. Against this are the definition of perror(), itself, and that they really should have listed perror() as an exception if it was so intended, and (as already-mentioned) perror() should be in both lists if it is to be dual-oriented. Here is the argument based on the perror() definition: "void perror(const char *s); ... It writes a sequence of characters to the standard error stream thus: first (if s is not a null pointer and the character pointed to by s is not the null character), the string pointed to by s followed by a colon (:) and a space; then an appropriate error message string followed by a new-line character." Things to note: 1) It is a regular character pointer, not a wide character pointer. Those characters, if supplied, are written. (It says nothing about converting them to wide if need be, it says "the string pointed to by s".) 2) "error message string". It does not say 'or wide-character error message string if needed'. 3) Followed by a "new-line character". It does not say "new-line wide character", which is used throughout the wchar.h section (7.29). So there is definitely a weakness in the C standard, but I think it is clear that perror() is a byte output function. If the user wants to print to a wide-character stream, the only pure way to do it would be to turn strerror() (used by perror()) output into a wide-character string.
POSIX noted this weakness, but fixed it with a bad extension, rather than classifying perror() as byte--which it clearly is. Therefore, the newlib perror() behavior is correct and should not be changed. It definitely is a mess and there really ought to be a perrorw() function. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
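For anyone who wants to see what a given libc actually does, the orientation question can be probed with fwide(), which reports a stream's orientation without changing it when passed a mode of 0. The following is only a sketch: the function name is hypothetical, the result is meaningful only if stderr has not been used before the call, and the answer is expected to differ between implementations, which is the disagreement discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical probe: calls perror() and then reports the orientation of
 * stderr as -1 (byte-oriented), 0 (no orientation yet) or 1 (wide-oriented).
 * fwide(stream, 0) only queries the orientation; it does not set it.
 */
int stderr_orientation_after_perror(const char *prefix)
{
    assert(prefix != NULL);

    errno = ENOENT;          /* any valid error number will do */
    perror(prefix);

    int o = fwide(stderr, 0);
    return (o > 0) - (o < 0);
}
```

Under the POSIX wording quoted above, a previously unoriented stderr would be expected to yield 0; under the byte-function reading argued here it would yield -1.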
Re: can [[:digit:]] match something other than 0123456789?
On 05/18/2018 12:24 PM, Wheeler, David A wrote:

> This conversation seems strange; many locales use digits other than 0-9 to represent numbers. The Eastern Arabic, Perso-Arabic variant, and Urdu variant all have digits, they just aren't 0-9. In Unicode/ISO-646 in particular there are the digits U+0660 through U+0669 and U+06F0 through U+06F9. When I visited Saudi Arabia I saw the Eastern Arabic digits everywhere, not just 0-9. For more: https://en.wikipedia.org/wiki/Eastern_Arabic_numerals Here's an example, U+0662: http://www.fileformat.info/info/unicode/char/0662/index.htm This is a decimal digit with value 2. Java agrees.
>
> It sounds like there are different use cases. Maybe there needs to be a standard way to represent different cases, e.g., "exactly 0-9", "a digit in the current locale", and "a member of Unicode Character Category 'Number, Decimal Digit'". I don't know if there's a need to distinguish the second and third cases. It seems to me that [[:digit:]] should mean the second or third case.

The problem is that the definition of isdigit() means only the first case (exactly the locale-independent 10 digits in the portable file name character set, whether locales are based on ASCII or EBCDIC), and the definition of [[:FOO:]] defers to isFOO() where possible. Yes, it may be nice to have additional classification routines, but as has been pointed out elsewhere in this thread, doing it solely by one character at a time may not be sufficient to capture all Unicode rules compared to what people really want to search for (for example, when searching for a character with an accent, you want to be able to find both the composed character, and the sequence of a plain character plus combining mark character, that both represent the same concept, but an iswFOO() test does not work on the latter example, since it occupies more than one character).

-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: can [[:digit:]] match something other than 0123456789?
On 05/15/2018 03:43 PM, Stephane Chazelas wrote:

> Does that mean that [0-9] is also guaranteed to match on 0123456789 only? And that then [[:digit:]] in regexp/fnmatch is close to useless as it's longer than [0-9]

Yes, I think that's a fair conclusion for the C locale, by virtue of the fact that the standard requires the encoding for 0-9 to be contiguous and in order.

> and is a bit misleading as it suggests it would be affected by localisation (like the other character classes) while it's not.

It's still useful in non-C locales within regexp, since ALL uses of - for ranges within [] have unspecified (or was it implementation-defined) semantics outside of the C locale. Using a named reference guarantees the desired semantics of exactly 10 characters, rather than skirting on the grounds of whether the range operator behaves as desired in all locales rather than just the C locale.

-- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
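The "exactly 10 characters" claim is easy to check for single-byte input. The sketch below counts how many byte values isdigit() accepts and how many one-byte strings the regular expression ^[[:digit:]]$ matches. It assumes the program never calls setlocale(), so it runs in the default "C" locale; the function names are made up for this example, and it says nothing about multibyte locales.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <regex.h>
#include <stddef.h>

/* Count the byte values that isdigit() classifies as digits. */
int count_isdigit_bytes(void)
{
    int n = 0;
    for (int c = 0; c <= UCHAR_MAX; c++)
        if (isdigit(c))
            n++;
    return n;
}

/* Count the one-byte strings matched by the bracket expression [[:digit:]]. */
int count_digit_class_bytes(void)
{
    regex_t re;
    int rc = regcomp(&re, "^[[:digit:]]$", REG_NOSUB);
    assert(rc == 0);
    (void)rc;

    int n = 0;
    /* Start at 1: byte 0 would be the empty string, not a one-byte string. */
    for (int c = 1; c <= UCHAR_MAX; c++) {
        char s[2] = { (char)c, '\0' };
        if (regexec(&re, s, 0, NULL, 0) == 0)
            n++;
    }
    regfree(&re);
    return n;
}
```

In the C locale both counts should come out as 10, one for each of the characters 0 through 9.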
Re: can [[:digit:]] match something other than 0123456789?
On 05/15/2018 12:50 PM, Stephane Chazelas wrote: You're a bit late to the party on this question :) digit Define the characters to be classified as numeric digits. In the POSIX locale, only: 0 1 2 3 4 5 6 7 8 9 Please read http://austingroupbugs.net/view.php?id=1078 where this wording has been tightened to cover ALL locales, not just the POSIX locale, to better match with C requirements on isdigit(). -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: Laundry list
On 04/27/2018 12:10 PM, Martijn Dekker wrote: > > I don't know of any way to accomplish that except by the de-facto > standard mechanism of "#! /usr/bin/env sh". There is a long-time and > highly widespread expectation that this will work. > >> In addition to shell >> scripts, the shebang hack is also commonly used with awk and sed >> scripts (just to name two other POSIX-specified languages). > > IMO, that's another good reason to standardise the hashbang path plus > the location of /usr/bin/env. If we standardize #! and the existence of /usr/bin/env, we should also consider standardizing the BSD invention of 'env -S' that GNU coreutils is now copying, as it serves as a very nice workaround for passing multiple arguments to the real interpreter through the #! line even when the OS passes only a single argument to env (as the #! interpreter). https://www.freebsd.org/cgi/man.cgi?query=env https://lists.gnu.org/archive/html/coreutils/2018-04/msg00011.html -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [1003.1(2008)/Issue 7 0001064]: basename() and dirname(): Specification is not complete enough to allow existing thread-unsafe implementations
On 12/14/2017 09:47 PM, Robert Elz wrote: > One final question about the intent for basename() / dirname() ... > > With the current (Issue7.TC2) wording of how these functions are defined, > it is clear that the sequence > > bn1 = basename(buf1); > bn2 = basename(buf2); > > leaves bn1 undefined (the value may have been modified.) > > The proposed new wording does not allow that any more. Well, it does for at least the corner case of an implementation that returns "." and "/" by modifying a single internal buffer, rather than by pointing to two separate const strings. But yes, having bn1 persist even after the second basename() within the same thread matches existing behavior of all implementations that modified the caller's input. > I understand the > intent is to require that implementations become thread safe, but that > could be achieved using thread local storage for a static buffer to > hold the result - but if the above is required to work (that is, the > results of basename() and dirname() are not permitted to be overwritten > by a subsequent call in the same thread) then that will not work, and the > only implementation technique possible (that I can think of anyway) will > be buffer modification. Thread local storage for a static buffer may not be large enough to support all possible inputs, and the goal was that the function cannot fail. Thus, in-buffer modification is the only viable solution for large inputs. > > That's OK with me (some NetBSD developers are less than thrilled about > all of this...) if that is the intent, I just wanted to make sure that > it was understood what the effects of this change are, compared with > what was there before. 
> > For what it is worth, if I had to guess, I'd say that the likely NetBSD > outcome of all of this is that we deprecate the 2 functions - mark them > as not to be used any more (though they'd still be supported in an > Issue 8 compatible way for compliance) and switch everything in the > NetBSD src tree to use a new interface (perhaps a slightly reworked > version of Ed Schouten's proposal in his note added to this issue in > the middle of last year (2016)). That approach matches what the GNU folks have already done years ago, when gnulib introduced their own base_name() and dir_name() functions with different (but reliable) semantics, eschewing the use of basename() and dirname() in GNU code. (Another aspect of the GNU code is that on DOS-like systems, base_name() handles drive letters, which is something that basename() completely ignores because POSIX does not have the notion of drive letters.) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: Should "exec" run a shell function?
On 07/18/2017 04:48 AM, Geoff Clare wrote: > > On page 2398 line 76737 section 2.14 exec, add to EXAMPLES: > > Execute the implementation's printf utility, ensuring that any > shell built-in version is not executed instead, and using a subshell > so that the shell continues afterwards: > > (exec printf '%g\n' "$float_value") The standard does not require printf %g to work; can we use a better example? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org