Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching
On 23/09/2019 16:39, Austin Group Bug Tracker wrote:
> --
> (0004564) geoffclare (manager) - 2019-09-23 15:39
> http://austingroupbugs.net/view.php?id=1234#c4564
> --
> [...]
> For the shell only, it is unspecified whether or not a <backslash>
> character inside a bracket expression preserves the literal value of
> the following character.

I noticed now that the resolution of bug 1233 was to not change the rules for how <backslash> is treated inside bracket expressions (yet), but this does change it. This may be worth mentioning in bug 1233.

"inside a bracket expression" is probably too narrow, given the exception in 2.13.3:

  bs='\'
  set -- ?[$bs.*

2.13.3 states that despite the * not being part of a bracket expression, it may be treated as an ordinary character:

  If the pattern contains an open bracket ( '[' ) that does not introduce a bracket expression as in XBD RE Bracket Expression, it is unspecified whether other unquoted pattern matching characters within the same slash-delimited component of the pattern retain their special meanings or are treated as ordinary characters. For example, the pattern "a*[/b*" may match all filenames beginning with 'b' in the directory "a*[" or it may match all filenames beginning with 'b' in all directories with names beginning with 'a' and ending with '['.

This is because the shell may have already committed to parsing it as an ordinary character as it was under the impression a bracket expression had started. 2.13.3 should be modified to also state that despite the indirect backslash not being part of a bracket expression, it is also unspecified whether it preserves the literal value of the following character here.

> appropriate.
> to:
> 3. If a specified pattern contains any '*', '?' or '[' characters that
> will be treated as special (see [xref to 2.13.1]), it shall be matched
> against existing filenames and pathnames, as appropriate.

How is this intended to interact with that exception in 2.13.3?
For an unquoted [* word, this may either unconditionally produce "[*", or it may expand to the names of files starting with "[". In shells where it unconditionally produces "[*", does that mean pathname expansion is not performed, as none of the characters are treated as special? Or does "unspecified" mean it may be treated as a pattern matching character when determining whether pathname expansion is going to be performed, but then as a literal character during the actual pathname expansion? This is relevant if the [* is at the end of a larger pattern containing an indirect backslash.

(As an aside, why is this exception limited to patterns used for filename expansion? Existing practice is that it applies to all patterns:

  case [a in
  [*) echo match ;;
  *) echo no match ;;
  esac

This prints "no match" in bosh, dash, ksh, pdksh and posh.)

Finally, just to clarify, with bs='\' and expanding [/$bs.] in a context where pathname expansion could be performed, it is my understanding that this is not a bracket expression, despite the word containing what would be a bracket expression when used in other contexts, therefore this would be required to expand to "[/\.]" regardless of the contents of the file system. Is that understanding correct?

Cheers,
Harald van Dijk
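The questions above are about the shell, but a related probe can be written against the C interface. The following is only a minimal sketch, assuming the POSIX fnmatch() function; the helper names matches() and probe_patterns() are hypothetical and exist only for this illustration. It shows the escaping role of backslash (and how FNM_NOESCAPE turns it off), and prints what the local libc does with a '[' that does not introduce a bracket expression. That last result is deliberately not assumed, and it says nothing about what any particular shell does.

```c
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: 1 if name matches pattern, 0 otherwise
 * (an fnmatch() error is also reported as 0). */
static int matches(const char *pattern, const char *name, int flags)
{
    return fnmatch(pattern, name, flags) == 0;
}

/* Hypothetical helper: print how this libc's fnmatch() treats a few of
 * the patterns discussed in this thread. */
static void probe_patterns(void)
{
    /* Escaping backslash: the pattern \* matches only a literal '*'. */
    printf("\\* against *     : %d\n", matches("\\*", "*", 0));
    printf("\\* against abc   : %d\n", matches("\\*", "abc", 0));

    /* With FNM_NOESCAPE the backslash is an ordinary character. */
    printf("\\* against \\abc (FNM_NOESCAPE): %d\n",
           matches("\\*", "\\abc", FNM_NOESCAPE));

    /* A '[' that does not start a bracket expression: no particular
     * result is assumed here, the value is simply reported. */
    printf("[* against [a    : %d\n", matches("[*", "[a", 0));
}
```

Calling probe_patterns() from a small main() on each platform of interest gives a quick table of behaviours to compare.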
[1003.1(2016)/Issue7+TC2 0001289]: netdb.h - in_port_t and in_addr_t do not appear to be needed
A NOTE has been added to this issue.
==
http://austingroupbugs.net/view.php?id=1289
==
Reported By:       joelsherrill
Assigned To:
==
Project:           1003.1(2016)/Issue7+TC2
Issue ID:          1289
Category:          Base Definitions and Headers
Type:              Clarification Requested
Severity:          Editorial
Priority:          normal
Status:            New
Name:              Joel Sherrill
Organization:      RTEMS.org
User Reference:
Section:           netdb.h
Page Number:       First paragraph
Line Number:       NA - used web
Interp Status:     ---
Final Accepted Text:
==
Date Submitted:    2019-09-27 19:17 UTC
Last Modified:     2019-09-27 19:58 UTC
==
Summary:           netdb.h - in_port_t and in_addr_t do not appear to be needed
==

--
(0004572) shware_systems (reporter) - 2019-09-27 19:58
http://austingroupbugs.net/view.php?id=1289#c4572
--
While not directly referenced, the addrinfo structure and getnameinfo() interface use the incomplete sockaddr type, which in practice will be completed with members of those types for IP4 and IP6 sockets. The current wording is, I strongly suspect, from known implementations all doing a #include of another header that declared these types and completed sockaddr before any of the new declarations and prototypes, to leave open that some implementation may choose not to do it this way. The types are required, however, as sockaddr is required to support IP4, at least.

Issue History
Date Modified     Username        Field          Change
==
2019-09-27 19:17  joelsherrill    New Issue
2019-09-27 19:17  joelsherrill    Name           => Joel Sherrill
2019-09-27 19:17  joelsherrill    Organization   => RTEMS.org
2019-09-27 19:17  joelsherrill    Section        => netdb.h
2019-09-27 19:17  joelsherrill    Page Number    => First paragraph
2019-09-27 19:17  joelsherrill    Line Number    => NA - used web
2019-09-27 19:58  shware_systems  Note Added: 0004572
==
[1003.1(2016)/Issue7+TC2 0001293]: Add methods associated with manipulating pthread affinity on SMP systems
The following issue has been SUBMITTED.
==
http://austingroupbugs.net/view.php?id=1293
==
Reported By:       joelsherrill
Assigned To:
==
Project:           1003.1(2016)/Issue7+TC2
Issue ID:          1293
Category:          Base Definitions and Headers
Type:              Enhancement Request
Severity:          Editorial
Priority:          normal
Status:            New
Name:              Joel Sherrill
Organization:      RTEMS.org
User Reference:
Section:           pthread.h and add sys/cpuset.h
Page Number:       NA - addition request
Line Number:       NA - addition request
Interp Status:     ---
Final Accepted Text:
==
Date Submitted:    2019-09-27 19:53 UTC
Last Modified:     2019-09-27 19:53 UTC
==
Summary:           Add methods associated with manipulating pthread affinity on SMP systems

Description:
At least FreeBSD, Linux, and RTEMS implement a very similar set of methods to support manipulation of pthread affinity on SMP systems. This is a proposal to consider the addition of a set of methods supporting this capability. The end solution should include the capabilities in existing implementations but needs to standardize the method signatures and types.

- sys/cpuset.h defines a data structure which is a bitmap representing the affinity set and operations on that structure. It is similar in concept to sigset_t but, beyond filling and clearing operations, there are methods that are more like boolean operations. Ref for Linux:

- pthread.h includes methods to set the affinity as part of the pthread_attr_t that is set at creation time and dynamically. The dynamic methods on Linux are defined as:

  #include <pthread.h>

  int pthread_setaffinity_np(pthread_t thread, size_t cpusetsize,
                             const cpu_set_t *cpuset);
  int pthread_getaffinity_np(pthread_t thread, size_t cpusetsize,
                             cpu_set_t *cpuset);

Those that set the attribute are defined as:

  #include <pthread.h>

  int pthread_attr_setaffinity_np(pthread_attr_t *attr, size_t cpusetsize,
                                  const cpu_set_t *cpuset);
  int pthread_attr_getaffinity_np(const pthread_attr_t *attr, size_t cpusetsize,
                                  cpu_set_t *cpuset);

Desired Action:
Add support for manipulation of pthread affinity.
==

Issue History
Date Modified     Username      Field          Change
==
2019-09-27 19:53  joelsherrill  New Issue
2019-09-27 19:53  joelsherrill  Name           => Joel Sherrill
2019-09-27 19:53  joelsherrill  Organization   => RTEMS.org
2019-09-27 19:53  joelsherrill  Section        => pthread.h and add sys/cpuset.h
2019-09-27 19:53  joelsherrill  Page Number    => NA - addition request
2019-09-27 19:53  joelsherrill  Line Number    => NA - addition request
==
[1003.1(2016)/Issue7+TC2 0001291]: Add method to obtain pthread attributes
The following issue has been SUBMITTED.
==
http://austingroupbugs.net/view.php?id=1291
==
Reported By:       joelsherrill
Assigned To:
==
Project:           1003.1(2016)/Issue7+TC2
Issue ID:          1291
Category:          Base Definitions and Headers
Type:              Enhancement Request
Severity:          Comment
Priority:          normal
Status:            New
Name:              Joel Sherrill
Organization:      RTEMS.org
User Reference:
Section:           pthread.h
Page Number:       NA - addition request
Line Number:       NA - addition request
Interp Status:     ---
Final Accepted Text:
==
Date Submitted:    2019-09-27 19:36 UTC
Last Modified:     2019-09-27 19:36 UTC
==
Summary:           Add method to obtain pthread attributes

Description:
This is a commonly added pthread capability but there is little agreement on the API name. https://musl.openwall.narkive.com/dD88I7eH/pthread-getattr-np provides this list of API names for this capability in the context of discussing how to get the current stack information:

  glibc:   pthread_getattr_np
  freebsd: pthread_attr_get_np
  netbsd:  pthread_attr_get_np and pthread_getattr_np

RTEMS follows glibc/linux with pthread_getattr_np.

If this capability is provided, then the current stack information can also be obtained.

The naming pattern pthread_[sg]attr_* is used to modify the pthread_attr_t structure used in a pthread_create() call. The Linux name of pthread_attr_get() seems like a choice which is an easy name and wouldn't be confused.

Desired Action:
Add pthread_attr_get() API.
==

Issue History
Date Modified     Username      Field          Change
==
2019-09-27 19:36  joelsherrill  New Issue
2019-09-27 19:36  joelsherrill  Name           => Joel Sherrill
2019-09-27 19:36  joelsherrill  Organization   => RTEMS.org
2019-09-27 19:36  joelsherrill  Section        => pthread.h
2019-09-27 19:36  joelsherrill  Page Number    => NA - addition request
2019-09-27 19:36  joelsherrill  Line Number    => NA - addition request
==
[1003.1(2016)/Issue7+TC2 0001292]: Add pthread_setname, pthread_getname
The following issue has been SUBMITTED.
==
http://austingroupbugs.net/view.php?id=1292
==
Reported By:       joelsherrill
Assigned To:
==
Project:           1003.1(2016)/Issue7+TC2
Issue ID:          1292
Category:          Base Definitions and Headers
Type:              Enhancement Request
Severity:          Comment
Priority:          normal
Status:            New
Name:              Joel Sherrill
Organization:      RTEMS.org
User Reference:
Section:           pthread.h
Page Number:       NA - addition request
Line Number:       NA - addition request
Interp Status:     ---
Final Accepted Text:
==
Date Submitted:    2019-09-27 19:37 UTC
Last Modified:     2019-09-27 19:37 UTC
==
Summary:           Add pthread_setname, pthread_getname

Description:
At least FreeBSD, Linux, RTEMS, QNX, and MKS currently provide pthread_setname_np() and pthread_getname_np() with this signature.

  #include <pthread.h>

  int pthread_setname_np(pthread_t thread, const char *name);
  int pthread_getname_np(pthread_t thread, char *name, size_t len);

NOTE: QNX has int instead of size_t for the len parameter of pthread_getname_np().

AFAIK no implementation adds thread name as a pthread attribute.

Desired Action:
This is a request to have these methods added to the base set of pthread methods.
==

Issue History
Date Modified     Username      Field          Change
==
2019-09-27 19:37  joelsherrill  New Issue
2019-09-27 19:37  joelsherrill  Name           => Joel Sherrill
2019-09-27 19:37  joelsherrill  Organization   => RTEMS.org
2019-09-27 19:37  joelsherrill  Section        => pthread.h
2019-09-27 19:37  joelsherrill  Page Number    => NA - addition request
2019-09-27 19:37  joelsherrill  Line Number    => NA - addition request
==
[1003.1(2016)/Issue7+TC2 0001290]: arpa/inet.h - origin of socklen_t is unclear
The following issue has been SUBMITTED.
==
http://austingroupbugs.net/view.php?id=1290
==
Reported By:       joelsherrill
Assigned To:
==
Project:           1003.1(2016)/Issue7+TC2
Issue ID:          1290
Category:          Base Definitions and Headers
Type:              Clarification Requested
Severity:          Editorial
Priority:          normal
Status:            New
Name:              Joel Sherrill
Organization:      RTEMS.org
User Reference:
Section:           arpa/inet.h
Page Number:       NA - used web
Line Number:       NA - used web
Interp Status:     ---
Final Accepted Text:
==
Date Submitted:    2019-09-27 19:20 UTC
Last Modified:     2019-09-27 19:20 UTC
==
Summary:           arpa/inet.h - origin of socklen_t is unclear

Description:
Ref: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/arpa_inet.h.html

In arpa/inet.h, socklen_t is used as an argument to inet_ntop() but you have to pull a thread to figure out how it is defined based on the single include file in the Synopsis. The thread for .h files is arpa/inet.h -> netinet/in.h -> sys/socket.h.

Desired Action:
Clarify the source of socklen_t.
==

Issue History
Date Modified     Username      Field          Change
==
2019-09-27 19:20  joelsherrill  New Issue
2019-09-27 19:20  joelsherrill  Name           => Joel Sherrill
2019-09-27 19:20  joelsherrill  Organization   => RTEMS.org
2019-09-27 19:20  joelsherrill  Section        => arpa/inet.h
2019-09-27 19:20  joelsherrill  Page Number    => NA - used web
2019-09-27 19:20  joelsherrill  Line Number    => NA - used web
==
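Until the origin of socklen_t is clarified, one defensive option for application code is to include sys/socket.h explicitly rather than rely on the chain of headers described in the report. The sketch below assumes only the POSIX inet_pton() and inet_ntop() functions; round_trip_ipv4() is a hypothetical helper name used for illustration and is not part of any standard.

```c
#include <sys/socket.h>   /* socklen_t and AF_INET, included explicitly */
#include <netinet/in.h>   /* struct in_addr */
#include <arpa/inet.h>    /* inet_pton(), inet_ntop(), INET_ADDRSTRLEN */
#include <stddef.h>

/* Hypothetical helper: parse a dotted-quad string and format it back.
 * Returns dst on success, NULL if the text is not a valid IPv4 address
 * or the buffer is too small. */
static const char *round_trip_ipv4(const char *text, char *dst, socklen_t size)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, text, &addr) != 1)
        return NULL;
    return inet_ntop(AF_INET, &addr, dst, size);
}
```

A caller would typically pass a buffer of INET_ADDRSTRLEN bytes. The explicit include costs nothing where arpa/inet.h already makes the type visible.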
[1003.1(2016)/Issue7+TC2 0001289]: netdb.h - in_port_t and in_addr_t do not appear to be needed
The following issue has been SUBMITTED.
==
http://austingroupbugs.net/view.php?id=1289
==
Reported By:       joelsherrill
Assigned To:
==
Project:           1003.1(2016)/Issue7+TC2
Issue ID:          1289
Category:          Base Definitions and Headers
Type:              Clarification Requested
Severity:          Editorial
Priority:          normal
Status:            New
Name:              Joel Sherrill
Organization:      RTEMS.org
User Reference:
Section:           netdb.h
Page Number:       First paragraph
Line Number:       NA - used web
Interp Status:     ---
Final Accepted Text:
==
Date Submitted:    2019-09-27 19:17 UTC
Last Modified:     2019-09-27 19:17 UTC
==
Summary:           netdb.h - in_port_t and in_addr_t do not appear to be needed

Description:
Ref: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/netdb.h.html

In netdb.h, both in_port_t and in_addr_t are “may define” but do not appear to be needed. In discussing this, our assumption is that when this header was added to the standard, at least one implementation defined or needed these two types. They do not appear to be strictly needed.

Desired Action:
Clarification/update is requested.
==

Issue History
Date Modified     Username      Field          Change
==
2019-09-27 19:17  joelsherrill  New Issue
2019-09-27 19:17  joelsherrill  Name           => Joel Sherrill
2019-09-27 19:17  joelsherrill  Organization   => RTEMS.org
2019-09-27 19:17  joelsherrill  Section        => netdb.h
2019-09-27 19:17  joelsherrill  Page Number    => First paragraph
2019-09-27 19:17  joelsherrill  Line Number    => NA - used web
==
Draft minutes of the 26 September 2019 Teleconference
These are the draft minutes from yesterday's call. Andrew will need to allocate the Austin-xxx document number and add the file to the document register after he returns.

Regards,
Geoff.

---

Minutes of the 26 September 2019 Teleconference    Austin-xxx Page 1 of 1
Submitted by Geoff Clare, The Open Group. 27th September 2019

Attendees:
    Don Cragun, IEEE PASC OR
    Nick Stoughton, USENIX, ISO/IEC JTC 1/SC 22 OR
    Joerg Schilling, FOKUS Fraunhofer
    Geoff Clare, The Open Group
    Eric Blake, Red Hat, Open Group OR
    Mark Ziegast, SHware Systems Dev.

Apologies:
    Andrew Josey, The Open Group

* General news

None

* Outstanding actions

(Please note that this section has been flushed to shorten the minutes - to locate the previous set of outstanding actions, look to the minutes from 13th June 2019 and earlier)

Bug 1254: "asynchronous list" description uses "command" instead of "AND-OR list" OPEN
http://austingroupbugs.net/view.php?id=1254

Action: Joerg to investigate how his shell behaves.

Bug 700 - Nick to raise this issue with the C committee
Bug 713 - Nick to raise with the C committee.
Bug 739 - Nick to raise with the C committee.
Bug 1216 - Eric to ask if The Open Group is willing to sponsor this interface, referencing bug note 4478.

* Current Business

Bug 1190: backslash has two special meanings in the shell and only loses one of them in bracket expressions
Accepted as Marked
http://austingroupbugs.net/view.php?id=1190

(This bug was resolved in the 23rd September teleconference, but was omitted from the previous minutes.)

This item is tagged for TC3-2008.

Interpretation response

The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-
None.
Notes to the Editor (not part of this interpretation):
---
On page 184 line 6087 section 9.3.5 RE Bracket Expression, change:

  The special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.

to:

  When the bracket expression appears within a BRE, the special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within the bracket expression.

  When the bracket expression appears within an ERE, the special characters '.', '(', '*', '+', '?', '{', '|', '$', '[', and '\\' (<period>, <left-parenthesis>, <asterisk>, <plus-sign>, <question-mark>, <left-brace>, <vertical-line>, <dollar-sign>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within the bracket expression; <circumflex> ('^') shall lose its special meaning as an anchor.

  When the bracket expression appears within a shell pattern (see [xref to XCU 2.13]), the special characters '?', '*', and '[' (<question-mark>, <asterisk>, and <left-square-bracket>, respectively) shall lose their special meaning within the bracket expression; whether or not <backslash> ('\\') loses its special meaning as a pattern matching character is described in [xref to XCU 2.13.1], but in contexts where a shell-quoting <backslash> can be used it shall retain its special meaning (see [xref to XCU 2.2]). For example:

    $ ls
    ! $ - \ a b c
    $ echo [a\-c]
    - a c
    $ echo [\!a]
    ! a
    $ echo ["!\$a-c"]
    ! $ - a c
    $ echo [!"\$a-c"]
    ! \ b
    $ echo [!\]\\]
    ! $ - a b c

Bug 1234: in most shells, backslash doesn't have two meaning wrt pattern matching
Accepted as Marked
http://austingroupbugs.net/view.php?id=1234

This item is tagged for TC3-2008.

Interpretation response

1. The standard clearly states in XCU 2.13.1 that backslash has an escaping role in shell patterns that is distinct from its role as a quoting character, and conforming implementations must conform to this.

2. The standard states in XCU 2.13.3 that patterns in pathname expansion are matched against existing files regardless of the pattern contents, and conforming implementations must conform to this.
However, concerns have been raised about this which are being referred to the sponsor. Rationale: - 1. Although existing practice in some shells is not to treat backslash as special in situations where shell quoting does not affect the pattern (such as in word expansions when a pattern used in pathname expansion is "indirect", i.e. not present in the original word but resulting from an earlier expansion), relaxing the standard to allow this behavior would be undesirable, as it would mean that the only way to match a literal '?', '*' or '[' would be to put them in a bracket expression, unlike all other contexts where these characters are special and they can be escaped with backslash. Application writers should be able to use an unquoted unescaped backslash that is not inside a bracket expression in a pattern and have it interpreted the same way across the shell (in
Re: More issues with pattern matching
"Schwarz, Konrad" wrote:

> > -Original Message-
> > From: Robert Elz
>
> > So, is [[:"alpha":]] required to be treated the same as [[:alpha:]] , not
> > allowed to be treated the same,
> > explicitly unspecified, or simply never considered (previously) ?
>
> An argument for requiring [[:"alpha":]] to be the same as [[:alpha:]] is that
> it would allow character-class names
> with white space, e.g., "title case".

There is no need to do this since my implementations for [[:alpha:]] first check for the presence of ":]" and then use the text between [[: and :]] as character class name.

I expect other implementations to do the same.

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
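The scan described above can be sketched in a few lines of C. This is not taken from any real implementation; class_name_between() is a hypothetical name, and the sketch only assumes the description given: look for ":]" first, then treat everything between "[:" and ":]" as the class name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper. If p points at "[:" and a closing ":]" follows,
 * copy the text between them into out (NUL-terminated) and return its
 * length. Returns -1 if p does not start with "[:", if no ":]" is
 * found, or if the name does not fit in out.
 */
static long class_name_between(const char *p, char *out, size_t outsize)
{
    const char *end;
    size_t len;

    if (strncmp(p, "[:", 2) != 0)
        return -1;
    end = strstr(p + 2, ":]");      /* first check for the closing ":]" */
    if (end == NULL)
        return -1;
    len = (size_t)(end - (p + 2));  /* text between "[:" and ":]" */
    if (len + 1 > outsize)
        return -1;
    memcpy(out, p + 2, len);
    out[len] = '\0';
    return (long)len;
}
```

A purely textual scan like this accepts a name containing white space, such as "title case", without any quoting. It also returns the quote characters themselves when given [:"alpha":], so whether that spelling names the alpha class depends on how quoting was handled before this step.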
Re: More issues with pattern matching
Robert Elz wrote, on 27 Sep 2019:
>
> | In the case of [x[:bogus:]], the use of both colons clearly indicates
> | the intention to use the new character-class feature. If the name
> | between the colons is not a valid class name, that is likely due to
> | an error on the user or application writer's part when typing the name.
>
> I had been waiting for that argument, it is the only one that is
> half way rational, and supports that position. But half way is as
> far as it gets.
>
> POSIX allows locales to define new char classes, it says so, XBD 7.3,
> page 141, lines 4218-4226.
>
> Since a locale is allowed to define a new char class name, the shell
> (or regcomp() for the RE case) cannot know whether the user here:
>
> | For example, if a user types:
> |
> | grep '[[:alhpa:]]' file
>
> made a typo for alpha (the standard posix defined char class), or really
> intended alhpa a locale specific char class in some locale which is not
> the current one.
>
> Making this some kind of error, in either REs, or shell patterns (whatever
> the effect of that is) makes it impossible for users to ever safely, and
> simply, use the locale specific locale name.
>
> They cannot even test which locale is in use as (aside from it being impossible
> to be sure which locales have added this new char class to their definitions)
> there's no guarantee that even if we know that LC_CTYPE=EN_dislexic
> contains the alhpa character class, in some implementations, there is no
> sane way to know whether the current implementation does.
>
> That is, unless you're requiring that before a locale specific char class
> can be used, the user (on the command line) or script, is required to
> query the locale and test whether the char class is defined there or not.
>
> Requiring that would be absurd.

You might consider it absurd, but it is what the standard requires applications and users to do in order to avoid "undefined results" (as per XBD 9.1 under "invalid").
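For a C program, the query in question can be quite short. The following is a minimal sketch, assuming the standard wctype() function from wctype.h, which returns (wctype_t)0 for a name that is not a valid class in the current LC_CTYPE locale; class_is_defined() is a hypothetical helper name.

```c
#include <wctype.h>

/* Hypothetical helper: 1 if name is a valid character class in the
 * current LC_CTYPE locale, 0 otherwise. */
static int class_is_defined(const char *name)
{
    return wctype(name) != (wctype_t)0;
}
```

A program that wants the answer for the user's locale rather than the default "C" locale would call setlocale(LC_CTYPE, "") before making the check.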
The standard even acknowledges that applications need to be able to do that, in the APPLICATION USAGE for the locale utility:

  Implementations are not required to write out the actual values for keywords in the categories LC_CTYPE and LC_COLLATE ; however, they must write out the categories (allowing an application to determine, for example, which character classes are available).

In a C program, finding out if a character class name is valid for the current locale is simply a matter of calling wctype(name) and checking whether it returns (wctype_t)0. So applications using fnmatch(), glob() or regcomp() can do that before using a name that isn't one of the mandated ones.

> | > So, is [[:"alpha":]] required to be treated the same as [[:alpha:]] ,
> | > not allowed to be treated the same, explicitly unspecified, or simply
> | > never considered (previously) ?
> |
> | I believe the intention is that it be treated the same as [[:alpha:]].
>
> Good, that is what I would have hoped. Now maybe we should add something
> to make that explicit.

Yes, I think an addition is warranted. Maybe we should add a new paragraph to 2.13 (before the 2.13.1 heading) along the lines of:

  In the shell, any quoting characters (see [xref to 2.2]) that are present in a word to be used as a pattern, and are treated as special, shall participate in pattern matching only through their effects on other characters; they shall not themselves be treated as pattern characters. For example:

    ls -ld \\*

  lists files with names that begin with a single <backslash>,

    ls -ld "?"*

  lists files with names that begin with a <question-mark>,

    ls -ld [[:'alpha':]]*

  lists files with names that begin with an alphabetic character in the current locale, and

    ls -ld [[':alpha:']]*

  lists files with names that begin with a character from the set { '[', ':', 'a', 'l', 'p', 'h' } followed by a ']'.

>
> | The word "may" has a strict usage. See XBD 1.5 - it "Describes a
> | feature or behavior that is optional for an implementation that
> | conforms to POSIX.1-2017."
> |
> | However, there have been cases in the past where incorrect uses of "may"
> | have been found and changed to "can".
> |
> | In any case, the "shall" in XCU 2.13.1 overrides it.
>
> Only for shell patterns, we still need to decide whether it was the
> defined "may" or an erroneous use which should be replaced by "can"
> for regular expressions. Given the shell imperative, and the desire
> to make bracket expressions in sh patterns and REs as equivalent as
> possible, I suspect the latter.

The rationale in XRAT says the opposite (A.9.1):

  The ISO POSIX-2:1993 standard required bracket expressions like "[^[:lower:]]" to match multi-character collating elements such as "ij". However, this requirement led to behavior that many users did not expect and that could not feasibly be mimicked in user code, and it was rarely if ever