[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
The following issue has a resolution that has been APPLIED.
==
https://austingroupbugs.net/view.php?id=249
==
Reported By: dwheeler
Assigned To: ajosey
==
Project: 1003.1(2008)/Issue 7
Issue ID: 249
Category: Shell and Utilities
Type: Enhancement Request
Severity: Objection
Priority: normal
Status: Applied
Name: David A. Wheeler
Organization:
User Reference:
Section: 2.2 Quoting
Page Number: 2298-2299
Line Number: 72348-72401
Interp Status: ---
Final Accepted Text: https://austingroupbugs.net/view.php?id=249#c6006
Resolution: Accepted As Marked
Fixed in Version:
==
Date Submitted: 2010-04-30 21:42 UTC
Last Modified: 2022-11-08 14:32 UTC
==
Summary: Add standard support for $'...' in shell
==
Relationships ID Summary
--
parent of 0001413 incorrect description of how a hexadeci...
related to 322 Defect in XCU File Format Notation
related to 985 quote removal missing from case stateme...
==
Issue History
Date Modified / Username / Field / Change
==
2010-04-30 21:42 dwheeler New Issue
2010-04-30 21:42 dwheeler Status New => Under Review
2010-04-30 21:42 dwheeler Assigned To => ajosey
2010-04-30 21:42 dwheeler Name => David A. Wheeler
2010-04-30 21:42 dwheeler Section => 2.2 Quoting
2010-04-30 21:42 dwheeler Page Number => 2298-2299
2010-04-30 21:42 dwheeler Line Number => 72348-72401
2010-09-16 16:17 nick Note Added: 548
2010-09-18 18:12 Don Cragun Relationship added related to 322
2010-10-01 12:48 geoffclare Note Added: 560
2010-10-06 01:26 nick Tag Attached: c99
2010-10-08 17:28 mirabilos Note Added: 565
2010-10-25 06:17 Don Cragun Note Added: 590
2010-10-25 14:51 Don Cragun Note Edited: 590
2010-10-25 15:55 Don Cragun Note Edited: 590
2010-10-26 06:44 Don Cragun Note Edited: 590
2010-10-26 20:39 Don Cragun Note Edited: 590
2010-10-26 20:40 Don Cragun Note Edited: 590
2010-10-26 20:40 Don Cragun Note Edited: 590
2010-10-26 20:45 Don Cragun Note Edited: 590
2010-10-26 21:04 Don Cragun Note Edited: 590
2010-10-27 03:29 Don Cragun Note Edited: 590
2010-11-04 16:07 nick Note Added: 599
2010-11-05 02:34 Don Cragun Note Edited: 590
2010-11-05 03:00 Don Cragun Note Added: 601
2010-11-05 03:04 Don Cragun Note Edited: 601
2010-11-05 14:52 nick Note Added: 609
2010-11-05 14:54 nick File Added: n1534.htm
2010-11-05 14:56 nick File Added: n1534_original.htm
2010-11-11 16:40 Don Cragun Note Edited: 590
2010-11-11 16:42 Don Cragun Note Edited: 590
2010-11-11 16:44 Don Cragun Interp Status => ---
2010-11-11 16:44 Don Cragun Final Accepted Text => See https://austingroupbugs.net/view.php?id=249#c590
2010-11-11 16:44 Don Cragun Status Under Review => Resolved
2010-11-11 16:44 Don Cragun Resolution Open => Accepted As Marked
2010-11-11 16:44 Don Cragun Tag Attached: issue8
2010-12-09 16:12 Don Cragun Note Edited: 590
2015-07-31 15:59 stephane Issue Monitored:
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
The following issue has been RESOLVED.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Resolved
Final Accepted Text: https://austingroupbugs.net/view.php?id=249#c6006
Resolution: Accepted As Marked
Last Modified: 2022-10-20 15:14 UTC
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-20 15:12 UTC
==
--
(0006006) geoffclare (manager) - 2022-10-20 15:12
https://austingroupbugs.net/view.php?id=249#c6006
--
These are the agreed changes from https://posix.rhansen.org/p/bug249 (omitting \u and \U). Page and line numbers are for the 2013 edition (C138.pdf).

At page 2319 line 73573 (XCU section 2.1, Shell Introduction, item 4) change:

    The shell performs various expansions (separately) on different parts of each command, resulting in a list of pathnames and fields to be treated as a command and arguments; see [xref to 2.6].

to:

    For each word within a command, the shell processes backslash escape sequences inside dollar-single-quotes (see [xref to 2.2.4]) and then performs various word expansions (see [xref to 2.6]). In the case of a simple command, the results usually include a list of pathnames and fields to be treated as a command name and arguments; see [xref to 2.9].

At page 2320 line 73594 (XCU section 2.2, Quoting) change:

    The various quoting mechanisms are the escape character, single-quotes, and double-quotes.

to:

    The various quoting mechanisms are the escape character, single-quotes, double-quotes, and dollar-single-quotes.

At page 2320 lines 73609-73611 (XCU 2.2.3, Double-Quotes), change:

    $ The <dollar-sign> shall retain its special meaning introducing parameter expansion (see Section 2.6.2), a form of command substitution (see Section 2.6.3), and arithmetic expansion (see Section 2.6.4).

to:

    $ The <dollar-sign> shall retain its special meaning introducing parameter expansion (see Section 2.6.2), a form of command substitution (see Section 2.6.3), and arithmetic expansion (see Section 2.6.4), but shall not retain its special meaning introducing the dollar-single-quotes form of quoting (see [xref to 2.2.4]).

At page 2321 lines 73626-73627 (XCU 2.2.3, Double-Quotes), change:

    A single-quoted or double-quoted string that begins, but does not end, within the "`...`" sequence

to:

    A quoted (single-quoted, double-quoted, or dollar-single-quoted) string that begins, but does not end, within the "`...`" sequence

After page 2321 line 73635 (end of XCU section 2.2), insert a new subsection:

    2.2.4 Dollar-Single-Quotes

    A sequence of characters starting with a <dollar-sign> immediately followed by a single-quote ($') shall preserve the literal value of all characters up to an unescaped terminating single-quote ('), with the exception of certain backslash escape sequences, as follows:

    \" yields a <quotation-mark> (double-quote) character, but note that <quotation-mark> can be included unescaped.
    \' yields an <apostrophe> (single-quote) character.
    \\ yields a <backslash> character.
    \a yields an <alert> character.
    \b yields a <backspace> character.
    \e yields an <ESC> character.
    \f yields a <form-feed> character.
    \n yields a <newline> character.
    \r yields a <carriage-return> character.
    \t yields a <tab> character.
    \v yields a <vertical-tab> character.
    \cX yields the control character listed in the Value column of [xref to XCU Table 4.21] in the Operands section of the stty utility when X is one of the characters listed in the ^c column of the same table, except that \c\\ yields the <FS> control character since the <backslash> character must be escaped.
    \xXX yields the byte whose value is the
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-20 10:08 UTC
==
--
(0006004) geoffclare (manager) - 2022-10-20 10:08
https://austingroupbugs.net/view.php?id=249#c6004
--
> However, there is nothing in the text being added to XRAT about why \<newline>
> is unspecified, and there probably should be, for future generations.

I found an old email from Jilles Tjoelker which says:

    I have found three different behaviours:
    1. change backslash-newline to a newline, in ksh93, mksh and zsh;
    2. leave backslash-newline unchanged, in bash (both non-POSIX and POSIX mode);
    3. delete backslash and newline, in FreeBSD sh.

I have added a suggestion based on this to the bug249 etherpad page in blue.
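The three behaviours that Jilles Tjoelker lists for backslash-newline inside $'...' can be modelled in a few lines. This is only an illustrative sketch: the function name `backslash_newline` and the mode labels are invented here, and the model deliberately ignores the case where the backslash is itself escaped by a preceding backslash.

```python
def backslash_newline(body: str, mode: str) -> str:
    """Model how a shell might treat backslash-newline in the body of $'...'.

    `mode` is a hypothetical label for one of the three observed behaviours.
    Assumption: every backslash-newline pair in `body` is unescaped.
    """
    pair = "\\\n"  # one backslash immediately followed by a newline
    if mode == "newline":    # ksh93, mksh, zsh: the pair becomes a newline
        return body.replace(pair, "\n")
    if mode == "unchanged":  # bash: the pair is left as it is
        return body
    if mode == "deleted":    # FreeBSD sh: both characters are removed
        return body.replace(pair, "")
    raise ValueError(f"unknown mode: {mode}")


sample = "ab\\\ncd"
for mode in ("newline", "unchanged", "deleted"):
    print(mode, repr(backslash_newline(sample, mode)))
```

Because the same input yields three different strings, a portable script cannot rely on any of them, which is the situation the note wants the rationale to record.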
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-20 09:03 UTC
==
--
(0006003) geoffclare (manager) - 2022-10-20 09:03
https://austingroupbugs.net/view.php?id=249#c6003
--
> Also, assuming that \u and \U are part of the C standard (I actually have
> no idea on that assumption) then explaining why they're not being included
> here would also help.

I have added a suggestion for this to the bug249 etherpad page in blue.
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-20 08:49 UTC
==
--
(0006002) geoffclare (manager) - 2022-10-20 08:49
https://austingroupbugs.net/view.php?id=249#c6002
--
> Another of the possible fixes is to make it unspecified what happens when
> there are 3 octal digits, and the first is not 0 1 2 or 3.

Since the description is silent about what happens if the value is too large to be represented in a byte, the behaviour is implicitly unspecified. The same is already true for the printf utility, which says:

    "\ddd", where ddd is a one, two, or three-digit octal number, shall be written as a byte with the numeric value specified by the octal number

and (for %b):

    "\0ddd", where ddd is a zero, one, two, or three-digit octal number that shall be converted to a byte with the numeric value specified by the octal number
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Robert Elz wrote, on 19 Oct 2022:
>
>   | I can't see anything "a few lines earlier" that implies quotation-mark
>   | needs to be escaped. Please give the exact wording change you would
>   | like to see.
>
> I think Steffen is referring to:
>
>     \" yields a <quotation-mark> (double-quote) character.
>
> the first bullet point in the (new) section 2.2.4, and that all he
> means to change would be to add to that sentence something like:
>
>     , but note that the double-quote character is not required to be
>     escaped to be included
>
> (just before the '.' that ends the existing sentence).

Thanks, I see Steffen's point now. I have added:

    , but note that <quotation-mark> can be included unescaped

to the bug249 etherpad page in blue.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Geoff Clare wrote:
 |Steffen Nurpmeso wrote, on 18 Oct 2022:
 |> Austin Group Bug Tracker wrote in
 |>  <2969d655ede7498ce22799a53d077...@austingroupbugs.net>:
 |>  ...
 |>|https://austingroupbugs.net/view.php?id=249
 |>  ...
 |>| https://austingroupbugs.net/view.php?id=249#c5995
 |>  ...
 |>|If a \e or \cX escape sequence specifies a character that does not \
 |>|have an
 |>|encoding in the locale in effect when these backslash escape sequences \
 |>|are
 |>
 |> \e only yields escape U+1B?
 |> Since "this standard requires support for all of the control
 |> characters except NULL (matching what is done in the stty
 |> utility)" \e is always supported. It is in (US-)ASCII and thus
 |> ISO-8859-1 and thus in the lower 256 codepoints of Unicode.
 |> (It is also in that EBCDIC thing.)
 |
 |"This standard requires support for all of the control characters except
 |NULL" just means that the shell is required to recognise $'\c[' as
 |specifying <ESC>, it doesn't mean that <ESC> has to have an encoding
 |in all locales. See XBD 6.2:
 |
 |The POSIX locale [...]. Other locales shall contain the characters
 |in Table 6-1 (on page 105) and may contain any or all of the
 |control characters identified in Table 6-2 (on page 110)
 |
 |<ESC> is in Table 6-2.

You are right, i see, U+001B is not in the portable character set, only an optional part of character sets. This is so far off daily life that i would never have thought of it on my own. ISO 6429, ECMA-48, and even ECMA-35 from December 1971 include it. I downloaded a version, it is typewriter written. Thank you.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
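To make the $'\c[' case discussed above concrete: in an ASCII-based encoding, the control character for \cX can be computed by flipping bit 0x40 of X, which maps [ to ESC (0x1B). The sketch below is an assumption-laden illustration, not the standard's definition (the standard refers to the stty table rather than to arithmetic), and the function name `control_char` is hypothetical. It also says nothing about whether the current locale's character set actually encodes the result, which is the point Geoff Clare makes.

```python
def control_char(x: str) -> int:
    """Return the byte value for \\cX, assuming an ASCII-based encoding.

    Accepts letters (either case), [ \\ ] ^ _ and ?. NUL (\\c@) is
    rejected because support for it is not required.
    """
    if len(x) != 1:
        raise ValueError("expected exactly one character")
    upper = x.upper()
    # 'A'..'_' covers A-Z and [ \ ] ^ _ ; '?' maps to DEL (0x7F).
    if not ("A" <= upper <= "_" or upper == "?"):
        raise ValueError(f"no control character for {x!r}")
    return ord(upper) ^ 0x40


print(hex(control_char("[")))  # ESC
print(hex(control_char("?")))  # DEL
```

Note that in shell source the backslash case has to be written as \c\\; the function above takes the already-resolved character X.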
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-19 21:29 UTC
==
--
(0006001) steffen (reporter) - 2022-10-19 21:29
https://austingroupbugs.net/view.php?id=249#c6001
--
re 5997:

Also, assuming that \u and \U are part of the C standard (I actually have no idea on that assumption) then explaining why they're not being included here would also help.

Supporting them is easy (i share implementation with \X and \x, just the length differs, and the aftermath). You get UTF-32 code points and convert them to UTF-8. In a UTF-8 locale you are done, if you have iconv(3) you pass it through and are done on success, otherwise i do

    i = snprintf(stackbuf, sizeof stackbuf, "\\%c%0*X",
          (no > 0xFFFFu ? 'U' : 'u'), (int)(no > 0xFFFFu ? 8 : 4), (u32)no);

This is really easy, no? The problem with ISO, as i recall, was the mysterious codepoint holes that i did not understand (rendering some valid ranges undefined, what for?). The other problem is grapheme sequences that span multiple UTF-32 codepoints. This matters only in the iconv(3) case anyway (and last i looked no iconv(3) cared for them, so it is entirely hypothetical).

That is: if adjacent \U escapes exist (in the single quote), and an iconv(3) fallback is to be driven, pass them all as a continuous string. I do not do that in my mailer. But i should, as there _are_ graphemes which span multiple codepoints.

I repeat that adding an additional \$VARiable sequence would turn $'' into a quoting mechanism that includes the capabilities of all other mechanisms. It would allow quoting entire sentences etc. even with embedded expansions.
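The conversion strategy described in this note (encode the code point in the locale's character set, and fall back to a textual \u or \U form when that fails) might be sketched as follows. This is a rough model under stated assumptions: Python's codec machinery stands in for iconv(3), the function name `expand_unicode_escape` and its `charset` parameter are hypothetical, and the 4-digit versus 8-digit fallback mirrors the snprintf call quoted in the note.

```python
def expand_unicode_escape(code_point: int, charset: str = "utf-8") -> bytes:
    """Encode a code point for the given charset, or fall back to an escape.

    Assumption: `charset` names a Python codec that plays the role of the
    locale's character set; a real shell would use the locale and iconv(3).
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point exceeds 0x10FFFF")
    try:
        return chr(code_point).encode(charset)
    except UnicodeEncodeError:
        # Not representable: emit \uXXXX or \UXXXXXXXX, as in the note.
        if code_point > 0xFFFF:
            return ("\\U%08X" % code_point).encode("ascii")
        return ("\\u%04X" % code_point).encode("ascii")


print(expand_unicode_escape(0x20AC))           # UTF-8 bytes of the euro sign
print(expand_unicode_escape(0x20AC, "ascii"))  # textual fallback
```

The sketch converts one code point at a time, so it does not address the note's concern about passing adjacent escapes to the converter as one continuous string.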
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-19 20:53 UTC
==
--
(0006000) steffen (reporter) - 2022-10-19 20:53
https://austingroupbugs.net/view.php?id=249#c6000
--
My MUA says

    ? : $'\567'
    s-nail: \0 argument exceeds byte: $'\567': \567
    ? : $'\U11'
    s-nail: \U argument exceeds 0x10FFFF: $'\U11': \U11

Because these are almost always written as constants, as opposed to variable expansions, it seems to me that reporting an error is better than anything else. bash is known to parse all sorts of integers beyond any measure, wrapping around; i do not think it is desirable to standardize that, since it requires writing your own ASCII-to-integer conversion instead of using standardized functions (busybox ash uses those, for example (in general, maybe not for \567)).
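The error-reporting approach argued for in this note can be sketched as a strict parser that refuses an octal escape whose value does not fit in a byte, rather than wrapping it. The function name `strict_octal_byte` is hypothetical and the error text is only modelled loosely on the MUA output quoted above.

```python
def strict_octal_byte(digits: str) -> int:
    """Parse the ddd of \\ddd, raising instead of wrapping on overflow."""
    if not 1 <= len(digits) <= 3 or any(d not in "01234567" for d in digits):
        raise ValueError("expected one to three octal digits")
    value = int(digits, 8)  # standard conversion, no hand-written parsing
    if value > 0xFF:
        raise ValueError(f"\\{digits} argument exceeds byte")
    return value


print(strict_octal_byte("101"))
try:
    strict_octal_byte("567")
except ValueError as err:
    print("error:", err)
```

Using the language's own integer conversion and then checking the range is the design the note prefers over parsing arbitrarily long digit strings with wrap-around.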
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Robert Elz wrote in <28905.1666177...@jacaranda.noi.kre.to>:
 |Date:Wed, 19 Oct 2022 08:26:46 +0100
 |From:"Geoff Clare via austin-group-l at The Open Group" \
 |
 |Message-ID:
 |
 || I can't see anything "a few lines earlier" that implies quotation-mark
 || needs to be escaped. Please give the exact wording change you would
 || like to see.
 |
 |I think Steffen is referring to:
 |
 | \" yields a <quotation-mark> (double-quote) character.
 |
 |the first bullet point in the (new) section 2.2.4, and that all he
 |means to change would be to add to that sentence something like:
 |
 |, but note that the double-quote character is not required to be
 |escaped to be included
 |
 |(just before the '.' that ends the existing sentence).

Yes, thank you. I find it remarkable in the cryptic shell expansion context. Quotation-mark does not bite if it is not escaped.

--steffen
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-19 14:04 UTC
==
--
(0005999) kre (reporter) - 2022-10-19 14:04
https://austingroupbugs.net/view.php?id=249#c5999
--
Re https://austingroupbugs.net/view.php?id=249#c5998

busybox is one I don't have, so cannot test (along with ksh88, though it probably doesn't have $' at all).

Another of the possible fixes is to make it unspecified what happens when there are 3 octal digits, and the first is not 0 1 2 or 3.
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2022-10-19 12:22 UTC
==
--
(0005998) hvd (reporter) - 2022-10-19 12:22
https://austingroupbugs.net/view.php?id=249#c5998
--
> There are multiple ways this could be fixed, but as far as I can tell, all
> shells which I can test, which also implement $' (as other than a $
> preceding a single quoted string) all conform to:
>
>     \ddd yields the byte whose value is the least significant 8
>     bits of the octal value ddd (one to three octal digits).

busybox ash does not behave this way. I do not know the logic behind what it is doing, but I am seeing $'\567' result in '.', not 'w'.
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue. == https://austingroupbugs.net/view.php?id=249 ==
-- (0005997) kre (reporter) - 2022-10-19 11:50 https://austingroupbugs.net/view.php?id=249#c5997 --

In https://austingroupbugs.net/view.php?id=249#c5995 in the proposed new section 2.2.4, the penultimate bullet point is:

    \ddd yields the byte whose value is the octal value ddd (one to three octal digits).

Which suggests to me that \567 is required to yield the byte whose octal value is 0567, which is something of a challenge.

There are multiple ways this could be fixed, but as far as I can tell, all shells which I can test, which also implement $' (as other than a $ preceding a single quoted string) all conform to:

    \ddd yields the byte whose value is the least significant 8 bits of the octal value ddd (one to three octal digits).

I understand the need for the final bullet point (\<newline> gives unspecified results) -- I see all 3 reasonable interpretations for that in at least one shell which supports $': treating it as a line join, simply pretending the backslash is not there (\<newline> is <newline>), or treating the sequence literally (\<newline> is \<newline>). [Aside: only the first makes any sense to me. The ability to insert a line wrap in the middle of the string is useful, and while it can be achieved by ending the $' string, adding a \<newline> and then starting a new $' string, that is cumbersome. On the other hand, the 2nd interpretation (simply drop the \) can be had by simply omitting the \ in the input string, and the last by properly escaping the \ (resulting in \\<newline>), both of which are easy. But never mind.]

However, there is nothing in the text being added to XRAT about why \<newline> is unspecified, and there probably should be, for future generations.

Also, assuming that \u and \U are part of the C standard (I actually have no idea on that assumption), then explaining why they're not being included here would also help.
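The "least significant 8 bits" reading suggested above can be checked with plain shell arithmetic. The sketch below is only an illustration: the helper names octal_escape_value and octal_escape_byte are made up for this example and appear in no proposal text. It assumes a POSIX sh whose arithmetic expansion treats a leading 0 as an octal constant, and it deliberately avoids $'...' itself so that it also runs in shells that lack it.

```shell
# Hypothetical helpers: emulate the "low 8 bits" reading of \ddd.
# $1 is one to three octal digits, for example 567.
octal_escape_value() {
    # A leading 0 makes the constant octal in POSIX arithmetic expansion.
    echo "$(( 0$1 & 255 ))"
}

octal_escape_byte() {
    # Re-encode the masked value as a three-digit octal escape for printf.
    printf "\\$(printf '%03o' "$(octal_escape_value "$1")")"
}

octal_escape_value 567   # 0567 is 375, and 375 & 255 is 119
octal_escape_byte 567    # byte 119 is 'w' in ASCII
echo
```

Under this reading \567 and \167 name the same byte, which matches the 'w' result reported for the shells that follow it.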
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Date: Wed, 19 Oct 2022 08:26:46 +0100
From: "Geoff Clare via austin-group-l at The Open Group"
Message-ID:

| I can't see anything "a few lines earlier" that implies quotation-mark
| needs to be escaped. Please give the exact wording change you would
| like to see.

I think Steffen is referring to:

    \" yields a <quotation-mark> (double-quote) character.

the first bullet point in the (new) section 2.2.4, and that all he means to change would be to add to that sentence something like:

    , but note that the double-quote character is not required to be escaped to be included

(just before the '.' that ends the existing sentence).

kre
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Steffen Nurpmeso wrote, on 18 Oct 2022:
>
> Austin Group Bug Tracker wrote in
> <2969d655ede7498ce22799a53d077...@austingroupbugs.net>:
> ...
> |https://austingroupbugs.net/view.php?id=249
> ...
> | https://austingroupbugs.net/view.php?id=249#c5995
> ...
> |If a \e or \cX escape sequence specifies a character that does not have an
> |encoding in the locale in effect when these backslash escape sequences are
>
> \e only yields escape U+1B?
> Since "this standard requires support for all of the control
> characters except NULL (matching what is done in the stty
> utility)" \e is always supported. It is in (US-)ASCII and thus
> ISO-8859-1 and thus in the lower 256 codepoints of Unicode.
> (It is also in that EBCDIC thing.)

"This standard requires support for all of the control characters except NULL" just means that the shell is required to recognise $'\c[' as specifying <ESC>, it doesn't mean that <ESC> has to have an encoding in all locales. See XBD 6.2:

    The POSIX locale [...]. Other locales shall contain the characters in Table 6-1 (on page 105) and may contain any or all of the control characters identified in Table 6-2 (on page 110)

<ESC> is in Table 6-2.

> And likewise in "the unsupported character might be replaced with
> multiple characters, shell-special or regular (e.g. if <ESC> is
> not supported $'\e' may be replaced by "???", "XXX" or "")"
> \e seems a particularly bad example thus.
>
> (Also quotation-mark does not _need_ to be escaped, it can. It
> might be worthwhile to point this out? A few lines earlier.)

I can't see anything "a few lines earlier" that implies quotation-mark needs to be escaped. Please give the exact wording change you would like to see.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Austin Group Bug Tracker wrote in
 <2969d655ede7498ce22799a53d077...@austingroupbugs.net>:
 ...
 |https://austingroupbugs.net/view.php?id=249
 ...
 | https://austingroupbugs.net/view.php?id=249#c5995
 ...
 |If a \e or \cX escape sequence specifies a character that does not have an
 |encoding in the locale in effect when these backslash escape sequences are

\e only yields escape U+1B? Since "this standard requires support for all of the control characters except NULL (matching what is done in the stty utility)" \e is always supported. It is in (US-)ASCII and thus ISO-8859-1 and thus in the lower 256 codepoints of Unicode. (It is also in that EBCDIC thing.)

"This standard makes the results implementation-defined if \e or \cX specifies a character that is not present in the current locale" cannot be true for \e then, either.

And likewise in "the unsupported character might be replaced with multiple characters, shell-special or regular (e.g. if <ESC> is not supported $'\e' may be replaced by "???", "XXX" or "")" \e seems a particularly bad example thus.

(Also quotation-mark does not _need_ to be escaped, it can. It might be worthwhile to point this out? A few lines earlier.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue. == https://austingroupbugs.net/view.php?id=249 ==
-- (0005995) geoffclare (manager) - 2022-10-18 10:42 https://austingroupbugs.net/view.php?id=249#c5995 --

These are the agreed changes from https://posix.rhansen.org/p/bug249 (omitting \u and \U). Page and line numbers are for the 2013 edition (C138.pdf).

At page 2319 line 73573 (XCU section 2.1, Shell Introduction, item 4) change:

    The shell performs various expansions (separately) on different parts of each command, resulting in a list of pathnames and fields to be treated as a command and arguments; see [xref to 2.6].

to:

    For each word within a command, the shell processes backslash escape sequences inside dollar-single-quotes (see [xref to 2.2.4]) and then performs various word expansions (see [xref to 2.6]). In the case of a simple command, the results usually include a list of pathnames and fields to be treated as a command name and arguments; see [xref to 2.9].

At page 2320 line 73594 (XCU section 2.2, Quoting) change:

    The various quoting mechanisms are the escape character, single-quotes, and double-quotes.

to:

    The various quoting mechanisms are the escape character, single-quotes, double-quotes, and dollar-single-quotes.

At page 2320 lines 73609-73611 (XCU 2.2.3, Double-Quotes), change:

    $ The <dollar-sign> shall retain its special meaning introducing parameter expansion (see Section 2.6.2), a form of command substitution (see Section 2.6.3), and arithmetic expansion (see Section 2.6.4).

to:

    $ The <dollar-sign> shall retain its special meaning introducing parameter expansion (see Section 2.6.2), a form of command substitution (see Section 2.6.3), and arithmetic expansion (see Section 2.6.4), but shall not retain its special meaning introducing the dollar-single-quotes form of quoting (see [xref to 2.2.4]).

At page 2321 lines 73626-73627 (XCU 2.2.3, Double-Quotes), change:

    A single-quoted or double-quoted string that begins, but does not end, within the "`...`" sequence

to:

    A quoted (single-quoted, double-quoted, or dollar-single-quoted) string that begins, but does not end, within the "`...`" sequence

After page 2321 line 73635 (end of XCU section 2.2), insert a new subsection:

2.2.4 Dollar-Single-Quotes

A sequence of characters starting with a <dollar-sign> immediately followed by a single-quote ($') shall preserve the literal value of all characters up to an unescaped terminating single-quote ('), with the exception of certain backslash escape sequences, as follows:

    \" yields a <quotation-mark> (double-quote) character.
    \' yields an <apostrophe> (single-quote) character.
    \\ yields a <backslash> character.
    \a yields an <alert> character.
    \b yields a <backspace> character.
    \e yields an <ESC> character.
    \f yields a <form-feed> character.
    \n yields a <newline> character.
    \r yields a <carriage-return> character.
    \t yields a <tab> character.
    \v yields a <vertical-tab> character.
    \cX yields the control character listed in the Value column of [xref to XCU Table 4.21] in the Operands section of the stty utility when X is one of the characters listed in the ^c column of the same table, except that \c\\ yields the <FS> control character since the <backslash> character must be escaped.
    \xXX yields the byte whose value is the hexadecimal value XX (one or more hex
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue. == https://www.austingroupbugs.net/view.php?id=249 ==
-- (0005751) shware_systems (reporter) - 2022-03-14 21:22 https://www.austingroupbugs.net/view.php?id=249#c5751 --

Re: 5746

According to the Etherpad bug249 page, currently line 316, escaped sequences are converted after alias substitution completes, if needed (in case $' strings are part of an alias body), and before tilde or other expansions of XCU 2.6, with removal of the "$'" and trailing single-quote during the Quote Removal phase. At that point the converted text should be treated the same as a single-quoted string that didn't need escape sequences would.
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue. == https://www.austingroupbugs.net/view.php?id=249 ==
-- (0005747) calestyo (reporter) - 2022-03-14 01:22 https://www.austingroupbugs.net/view.php?id=249#c5747 --

Also, if \e \cX \u \U were standardised as being dependent on some locale (and not just *always* producing UTF-8, which I'm not sure would be a good idea), then the following should be highlighted somewhere for educational purposes.

E.g. bash does right now:

    $ LC_ALL=en_US.UTF-8
    $ foo() { echo $'\u2208' ; }
    $ foo
    ∈
    $ LC_ALL=C
    $ echo $'\u2208'
    \u2208
    $ foo
    ∈
    $

However, I'd bet that many people would expect the 2nd invocation of foo to also result in \u2208, or whatever representation U+2208 has in the then current locale.
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue. == https://www.austingroupbugs.net/view.php?id=249 ==
-- (0005746) calestyo (reporter) - 2022-03-14 00:52 https://www.austingroupbugs.net/view.php?id=249#c5746 --

Another possible issue with the most recent(?) proposed text in https://www.austingroupbugs.net/view.php?id=249#c2809 ...

I'm not an expert, but my understanding was that the quotings are not "resolved" during token recognition ("During token recognition no substitutions shall be actually performed...") but in Quote Removal, right?

Previously that said:

    "The quote characters (<backslash>, single-quote, and double-quote) that were present in the original word shall be removed unless they have themselves been quoted."

which was enough, because the desired literal string was already the result when any quotes were removed, but now with $'...', this is no longer the case.

I somehow miss the step which says that any escape sequences (like \t, \xXX, etc.) have to be replaced depending on their respective definition.
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue. == https://austingroupbugs.net/view.php?id=249 ==
-- (0005275) geoffclare (manager) - 2021-03-16 09:40 https://austingroupbugs.net/view.php?id=249#c5275 --

In the HTML version, the equivalent of XCU Table 4.21 is "Table: Circumflex Control Characters in stty" on the stty page.
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue. == https://www.austingroupbugs.net/view.php?id=249 ==
-- (0005273) mirabilos (reporter) - 2021-03-15 20:38 https://www.austingroupbugs.net/view.php?id=249#c5273 --

I must say \c\\ irritates me. Otherwise (I cannot figure out how to find “XCU Table 4.21” in the HTML version) it looks like mksh is already compliant (plus supporting \E as alias for \e for GNU bash compatibility).

mksh currently defines \cX for any X (I hope this matches the mysterious table I cannot find) as:

    The sequence “\c%”, where ‘%’ is any octet, translates to Ctrl-%, that is, “\c?” becomes DEL, everything else is bitwise ANDed with 0x9F.

With this, I also get FS from \c< or \c| if I must, but… meh. I also read \cX as eating X no matter what it was, at $'…' parse time.

So, what am I supposed to do, change mksh now? So after \c we first have another level of character interpretation and the \c acts only then?
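The mksh rule quoted above (“\c?” becomes DEL, everything else is bitwise ANDed with 0x9F) can be tried out with shell arithmetic. What follows is only a sketch of that described rule: it is not mksh's code and not the table-driven definition in the proposed standard text. The function name ctrl_value is invented for this example, and single-byte ASCII input is assumed.

```shell
# Hypothetical helper: numeric value of \cX under the rule described above.
ctrl_value() {
    # printf '%d' "'X" yields the numeric code of the character X.
    code=$(printf '%d' "'$1")
    if [ "$code" -eq 63 ]; then      # 63 is '?', which maps to DEL
        echo 127
    else
        echo "$(( code & 0x9F ))"
    fi
}

ctrl_value '['    # 27, ESC
ctrl_value '\'    # 28, FS
ctrl_value '<'    # 28, the FS-from-\c< case mentioned above
ctrl_value '?'    # 127, DEL
```

Because the mask clears bits 5 and 6, '<', '\' and '|' all collapse to the same value, which is why the note observes that FS is reachable through several spellings.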
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
The following issue has been set PARENT OF issue 0001413. == https://austingroupbugs.net/view.php?id=249 == Last Modified: 2021-02-05 16:42 UTC
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Hello Robert.

Robert Elz wrote in <12854.1612654...@jinx.noi.kre.to>:
|Date: Sat, 06 Feb 2021 21:55:19 +0100
|From: Steffen Nurpmeso
|Message-ID: <20210206205519.43rln%stef...@sdaoden.eu>
|
|| Fiddling with bytes is something completely different.
|
|But how is the shell supposed to know?
|
|Consider
|    U1=$'\u021c'
|    U2=$'\u0a47'

The shell has to convert \u. It checks.

|    X1=$'\310\234'
|    X2=$'\340\251\207'
|
|Then
|    S1="${U1}${U2}"
|    S2="${X1}${X2}"

The shell just takes and concatenates bytes.

|Or worse, given a script j2 containing just:
|    printf '%s%s' "$1" "$2"
|
|    S3="$( j2 "$U1" "$U2" )"
|    S4="$( j2 "$X1" "$X2" )"
|
|Then given any (valid, existing, but otherwise unconstrained) unicode
|code points for U1 and U2 (with the sole exception of \u000A simply because
|of the way command substitutions eat newlines), and the corresponding
|encodings not using \u in X1 and X2, which of those lines (the assignments
|to U1, U2, X1, X2, S1, S2, S3, or S4) should the shell ever generate any
|kind of error?

But no, that was not what i said. You have to convert the \u when you parse it, and can apply the Unicode rules as you go, having the target character set in mind. If the target is "Unicode like POSIX does it" aka UTF-8, then you can perform full UTF-32 to UTF-8 validity checking. [I pasted utf32_to_utf8 and vice versa, but then removed it again.] Otherwise you can only test the overall codepoint (less than or equal to 0x10FFFF). Well, that is at least how i do it. What i do not mean is that you retest whether the resulting UTF-8 sequence is valid, but offering the possibility to the user would also be nice, for example, to validate user input after it has been cleaned from several constructs. [It is unfortunately a slow operation.]

|||the string, it has no idea how the script will interpret it, nothing
|||requires that a $'\u' value ever be used as "characters" (though
|||that would be a common use).
||
|| I disagree. Invalid \u \U should either remain unconverted or
|| result in the Unicode replacement character (U+FFFD) to be used
|| instead
|
|That's not disagreeing, or not with what I meant. I have no problem
|with generating an error (better than silently making a replacement
|char I think) for invalid \u conversions (\uDEAF for example).

Note that, for Unicode, it is _the_ replacement character (�).

|What I was referring to is the opinion, sometimes stated, that certain
|combinations of unicode characters are invalid (as a unicode character
|sequence). That is, above, with carefully selected U1 and U2, some
|people would say that S1 can be invalid. That I do not think is reasonable
|to expect of the shell - it is up to the application to get those right.

Well, having composition and decomposition aka normalization is far far away, that much is plain. Attack vector over attack vector, sequences that become invalid or join or _do not_ join if such things happen. Better to use perl(1) for text processing, it has tremendously powerful Unicode processing capabilities.

|But of course, a single \u code point should be converted properly.
|I thought I said that last time.

Fine.

|| as the starting point of conversion .. to UTF-8 or locale
|| via iconv(3)
|
|Ignoring the bit about converting to other replacement chars, here,
|since I'm concerned with valid codepoints only, I don't think the
|shell should be converting this kind of thing via iconv() ... utilities
|might (including built-ins in sh, like echo or printf) but not the
|shell itself. In the above (assuming I did the conversions correctly)
|it should always be the case that $U1 = $X1 and $U2 = $X2, regardless

But if you look around and try $'' sequences in bash for example you will find that \u sequences just will not do what you want here. \u is a Unicode codepoint, and so something purely textual to the core.

Well i think this all roots in informatics coming from the wrong direction, may "speech synthesis" have been an early point of interest or not. The interface was not even truly US-ASCII at first, on TUHS there was just recently a thread on that. 6 BIT ALL UPPERCASE, packed sequences with multiple "characters" per storage unit, and all that. This never had anything to do with human communication aka linguistic communication, it was instead about communicating human desire to computer language. What became Unicode changed that. Yes, we now have emojis or however these are spelled, and cute robotic eyes show us heart. So the situation has changed somewhat.

|of any locale settings. If I cannot assume that when writing a script
|then I have no idea how I would ever do anything with non-ascii chars
|reliably.
|
|| But in my opinion \u \U should not be mutilated but allow the full
|| range of Unicode aka ISO 10646,
|
|I agree, "invalid" should only include those code points designated
|that way, not those just not assigned
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
On 06/02/2021 23:38, Robert Elz via austin-group-l at The Open Group wrote:
> Date: Sat, 06 Feb 2021 21:55:19 +0100
> From: Steffen Nurpmeso
> Message-ID: <20210206205519.43rln%stef...@sdaoden.eu>
>
> | Fiddling with bytes is something completely different.
>
> But how is the shell supposed to know?
>
> Consider
>     U1=$'\u021c'
>     U2=$'\u0a47'
>     X1=$'\310\234'
>     X2=$'\340\251\207'
> [...]
>
> Ignoring the bit about converting to other replacement chars, here,
> since I'm concerned with valid codepoints only, I don't think the
> shell should be converting this kind of thing via iconv() ... utilities
> might (including built-ins in sh, like echo or printf) but not the
> shell itself. In the above (assuming I did the conversions correctly)
> it should always be the case that $U1 = $X1 and $U2 = $X2, regardless
> of any locale settings. If I cannot assume that when writing a script
> then I have no idea how I would ever do anything with non-ascii chars
> reliably.

bash, ksh and zsh, all of which support $'\u', do convert the Unicode code point to the current locale, and I support this and implemented the same in my shell. For \u sequences that ask for a Unicode code point that is not representable in the current locale, the \u sequence is left unconverted (bash, ksh, my shell) or causes the shell to report an error (zsh).

This is useful for scripts that aim to work in a limited selection of locales and know that certain characters are valid in all the supported locales, but are not encoded the same way in all of them. If they want to print a Euro symbol, for instance, they can write

    echo $'\u20AC'

and be assured it works everywhere the Euro symbol is supported. If they instead write

    echo '€'

where the script is saved as UTF-8, the script will needlessly break when it is run in an ISO-8859-15 environment.

Cheers,
Harald van Dijk
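The "convert to the locale, otherwise leave the escape unconverted" policy described above can be approximated outside the shell. The following is only a sketch under stated assumptions: it uses iconv(1) as a stand-in for whatever conversion a shell does internally, the function name euro_in is made up for illustration, and the charset names are the ones glibc and libiconv commonly accept.

```shell
# euro_in CHARSET (hypothetical helper): print the Euro sign, U+20AC,
# encoded in CHARSET.  If CHARSET cannot represent it, print the literal
# text \u20AC instead, mimicking the "leave it unconverted" behaviour.
euro_in() {
    # '\342\202\254' is the UTF-8 encoding of U+20AC, written in octal.
    if out=$(printf '\342\202\254' | iconv -f UTF-8 -t "$1" 2>/dev/null); then
        printf '%s' "$out"
    else
        printf '%s' '\u20AC'
    fi
}

euro_in UTF-8       | od -An -tx1   # bytes e2 82 ac
euro_in ISO-8859-15 | od -An -tx1   # byte a4
euro_in ISO-8859-1; echo            # \u20AC (Latin-1 has no Euro sign)
```

The same script text yields different bytes per target charset, which is the portability argument made above; a shell following the zsh behaviour would report an error in the third case instead.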
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Date: Sat, 06 Feb 2021 21:55:19 +0100
From: Steffen Nurpmeso
Message-ID: <20210206205519.43rln%stef...@sdaoden.eu>

| Fiddling with bytes is something completely different.

But how is the shell supposed to know?

Consider

    U1=$'\u021c'
    U2=$'\u0a47'
    X1=$'\310\234'
    X2=$'\340\251\207'

Then

    S1="${U1}${U2}"
    S2="${X1}${X2}"

Or worse, given a script j2 containing just:

    printf '%s%s' "$1" "$2"

    S3="$( j2 "$U1" "$U2" )"
    S4="$( j2 "$X1" "$X2" )"

Then given any (valid, existing, but otherwise unconstrained) unicode code points for U1 and U2 (with the sole exception of \u000A simply because of the way command substitutions eat newlines), and the corresponding encodings not using \u in X1 and X2, for which of those lines (the assignments to U1, U2, X1, X2, S1, S2, S3, or S4) should the shell ever generate any kind of error?

| |the string, it has no idea how the script will interpret it, nothing
| |requires that a $'\u' value ever be used as "characters" (though
| |that would be a common use).
|
| I disagree. Invalid \u \U should either remain unconverted or
| result in the Unicode replacement character (U+FFFD) to be used
| instead

That's not disagreeing, or not with what I meant. I have no problem with generating an error (better than silently making a replacement char I think) for invalid \u conversions (\uDEAF for example).

What I was referring to is the opinion, sometimes stated, that certain combinations of unicode characters are invalid (as a unicode character sequence). That is, above, with carefully selected U1 and U2, some people would say that S1 can be invalid. That I do not think is reasonable to expect of the shell - it is up to the application to get those right.

But of course, a single \u code point should be converted properly. I thought I said that last time.

| as the starting point of conversion .. to UTF-8 or locale
| via iconv(3)

Ignoring the bit about converting to other replacement chars, here, since I'm concerned with valid codepoints only, I don't think the shell should be converting this kind of thing via iconv() ... utilities might (including built-ins in sh, like echo or printf) but not the shell itself. In the above (assuming I did the conversions correctly) it should always be the case that $U1 = $X1 and $U2 = $X2, regardless of any locale settings. If I cannot assume that when writing a script then I have no idea how I would ever do anything with non-ascii chars reliably.

| But in my opinion \u \U should not be mutilated but allow the full
| range of Unicode aka ISO 10646,

I agree, "invalid" should only include those code points designated that way, not those just not assigned yet, otherwise every time a new code point is allocated we'd all need to go update our shells.

kre
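The octal values in the X1 and X2 assignments above can be checked by hand. A minimal sketch in POSIX sh, assuming nothing beyond shell arithmetic and printf; the helper name utf8_octal is made up for illustration and does no validity checking at all (surrogates and out-of-range values are encoded blindly):

```shell
# utf8_octal CODEPOINT (hypothetical helper): print the UTF-8 encoding of
# a code point as space-separated octal byte values, the form used in the
# $'\310\234' style escapes.  CODEPOINT may be decimal or 0x-prefixed hex.
utf8_octal() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ]; then            # 1 byte:  0xxxxxxx
        printf '%03o\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then         # 2 bytes: 110xxxxx 10xxxxxx
        printf '%03o %03o\n' \
            $(( 192 | (cp >> 6) )) $(( 128 | (cp & 63) ))
    elif [ "$cp" -lt 65536 ]; then        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%03o %03o %03o\n' \
            $(( 224 | (cp >> 12) )) $(( 128 | ((cp >> 6) & 63) )) \
            $(( 128 | (cp & 63) ))
    else                                  # 4 bytes: 11110xxx then 3 x 10xxxxxx
        printf '%03o %03o %03o %03o\n' \
            $(( 240 | (cp >> 18) )) $(( 128 | ((cp >> 12) & 63) )) \
            $(( 128 | ((cp >> 6) & 63) )) $(( 128 | (cp & 63) ))
    fi
}

utf8_octal 0x021c   # 310 234
utf8_octal 0x0a47   # 340 251 207
```

Under the "always UTF-8" reading, this is why $U1 = $X1 and $U2 = $X2 regardless of locale; under the "convert to the locale" reading the equality holds only in UTF-8 locales.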
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Robert Elz wrote in <15313.1612563...@jinx.noi.kre.to>:
|Date: Fri, 05 Feb 2021 21:54:52 +0100
|From: Steffen Nurpmeso
|Message-ID: <20210205205452.7tbl2%stef...@sdaoden.eu>
|
|| Well .. if i recall correctly quoting inside of ${xYz} has been
|| clarified not too long ago
|
|Not the way that you seem to think.

For sure. I really had to look and read it in context.

|||And last (for now anyway), after "set -- A B C" what's the effect of
|||$'pfx\${@}sfx' ?
||
|| This is interesting. I would say it is identical to ${*} here.
|
|In that case $'' could not be the only quoting mechanism that users use.

Yes. It can be nailed down to that.

|| My MUA just turns it into UTF-8 (via a utf32_to_utf8 function that
|| uses the Unicode replacement character for erroneous codepoints)
|
|The generation of the UTF-8 is not the issue, and the (relatively few)
|values that are reserved can be handled.

And i will not go nail down in return.

|| You have to be careful a bit with Unicode. There are guarantees
|| that must be fulfilled, see for example [1]. Since the shell is
|| producing UTF-8 it should ensure that no invalid UTF-8 sequences
|| are exposed to consumers.
|
|Of course.
|
|But: users are permitted to write $'\xfc\x13' and similar, and no-one
|suggests that the shell should validate such sequences for valid UTF-8
|encoding, and nor would anyone (I hope) claim the shell should object
|to $'\u0207\xfc\x13' just because it happens to have a \u in it.
|This is all just bits until it gets used somehow, at which point if
|it is invalid, then so be it.

In a standards context i disagree. Pacta sunt servanda. This stands for "Treu und Glauben" ("Good faith") which is §242 of the BGB (Bürgerliches Gesetzbuch aka Civil Code of Germany).

Fiddling with bytes is something completely different. If you want to create that in the shell you can use \x or \OCTAL or what, but if you go \u or \U then a valid Unicode codepoint (or whatever mutilated range ISO standardizes for \u \U escape sequences) should be expected that successfully passes a conforming UTF-32-to-X conversion. That is my opinion.

|| When a process interprets a code unit sequence which purports to
|| be in a Unicode character encoding form, it shall treat
|| ill-formed code unit sequences as an error condition and shall
|| not interpret such sequences as characters.
|
|That has to be a requirement on the application, not upon the programming
|language implementation (the shell here) - when the shell is converting

Yes, you can always use \x or \OCTAL to break constraints if you want to. This is the flexibility of the programming language POSIX shell, that is very much text-bound, however. But you could create explicit binary strings with $'' and \x / \OCTAL as well as \uU, which is much better than what we have, where often the bytes as such are embedded in strings. Or uuencoded, or base64 encoded, in order to be decoded once needed.

|the string, it has no idea how the script will interpret it, nothing
|requires that a $'\u' value ever be used as "characters" (though
|that would be a common use).

I disagree. Invalid \u \U should either remain unconverted or result in the Unicode replacement character (U+FFFD) to be used instead as the starting point of conversion .. to UTF-8 or locale via iconv(3) (then likely resulting in other replacement character(s) for non-buggy implementations).

But in my opinion \u \U should not be mutilated but allow the full range of Unicode aka ISO 10646, if i recall correctly (i have not reread the thread nor re-looked at ISO) artificial restrictions were imposed on the range of allowed characters by ISO.

A nice weekend i wish.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
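The "invalid \u \U becomes U+FFFD" option argued for in the message above comes down to a range check on the code point before any encoding happens. A rough sketch in POSIX sh, assuming "invalid" means only surrogates (U+D800 to U+DFFF) and values above U+10FFFF; the function name is invented here, and this is not taken from any shell or from the MUA mentioned in the thread:

```shell
# scalar_or_replacement CODEPOINT (hypothetical helper): print CODEPOINT
# as upper-case hex if it is a Unicode scalar value, otherwise print FFFD,
# the replacement character.  Unassigned code points are accepted as-is.
scalar_or_replacement() {
    cp=$(( $1 ))
    if [ "$cp" -lt 0 ] || [ "$cp" -gt "$(( 0x10FFFF ))" ] ||
       { [ "$cp" -ge "$(( 0xD800 ))" ] && [ "$cp" -le "$(( 0xDFFF ))" ]; }
    then
        cp=$(( 0xFFFD ))
    fi
    printf '%04X\n' "$cp"
}

scalar_or_replacement 0x021c     # 021C
scalar_or_replacement 0xDEAF     # FFFD (a surrogate, as in \uDEAF)
scalar_or_replacement 0x110000   # FFFD (beyond the Unicode range)
```

Reporting an error or leaving the escape unconverted, the other options raised in the thread, would replace only the body of the `then` branch.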
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Date: Fri, 05 Feb 2021 21:54:52 +0100
From: Steffen Nurpmeso
Message-ID: <20210205205452.7tbl2%stef...@sdaoden.eu>

| Well .. if i recall correctly quoting inside of ${xYz} has been
| clarified not too long ago

Not the way that you seem to think.

| |And last (for now anyway), after "set -- A B C" what's the effect of
| |$'pfx\${@}sfx' ?
|
| This is interesting. I would say it is identical to ${*} here.

In that case $'' could not be the only quoting mechanism that users use.

| My MUA just turns it into UTF-8 (via a utf32_to_utf8 function that
| uses the Unicode replacement character for erroneous codepoints)

The generation of the UTF-8 is not the issue, and the (relatively few) values that are reserved can be handled.

| You have to be careful a bit with Unicode. There are guarantees
| that must be fulfilled, see for example [1]. Since the shell is
| producing UTF-8 it should ensure that no invalid UTF-8 sequences
| are exposed to consumers.

Of course.

But: users are permitted to write $'\xfc\x13' and similar, and no-one suggests that the shell should validate such sequences for valid UTF-8 encoding, and nor would anyone (I hope) claim the shell should object to $'\u0207\xfc\x13' just because it happens to have a \u in it. This is all just bits until it gets used somehow, at which point if it is invalid, then so be it.

| When a process interprets a code unit sequence which purports to
| be in a Unicode character encoding form, it shall treat
| ill-formed code unit sequences as an error condition and shall
| not interpret such sequences as characters.

That has to be a requirement on the application, not upon the programming language implementation (the shell here) - when the shell is converting the string, it has no idea how the script will interpret it, nothing requires that a $'\u' value ever be used as "characters" (though that would be a common use).

kre
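If validation is the application's job, as argued above, a script can do it at the point of use rather than relying on the shell. A small sketch, assuming an iconv(1) that rejects ill-formed input when converting UTF-8 to UTF-8 (glibc and GNU libiconv behave this way; other implementations may differ). The function name is_utf8 is illustrative only, and the octal escapes stand for the bytes that $'\xfc\x13' would produce:

```shell
# is_utf8 (hypothetical helper): succeed if standard input is well-formed
# UTF-8, as judged by a round trip through iconv.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# '\310\234' is the UTF-8 encoding of U+021C.
printf '\310\234' | is_utf8 && echo 'well-formed'

# '\374\023' are the raw bytes 0xFC 0x13; 0xFC never occurs in UTF-8.
printf '\374\023' | is_utf8 || echo 'not UTF-8'
```

The shell stores and passes along both strings unchanged; only the consumer that treats them as characters needs to care.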
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Hello Robert.

Robert Elz wrote in <14199.1612519...@jinx.noi.kre.to>:
|Date: Thu, 04 Feb 2021 21:59:52 +0100
|From: Steffen Nurpmeso
|Message-ID: <20210204205952.fw6wv%stef...@sdaoden.eu>
|
|| Ok, of course, but let me disagree with the latter. Bizarre rules
|| and Bourne/Korn shell etc ... just look at ${aXb} and quoting
|| rules within.
|
|Two things .. first, I agree, the quoting rules that exist now are
|bizarre, and weird, and just a royal pain to deal with (both for
|users and implementors) - which is one reason I'm loath to add yet
|another difference.
|
|And second, I meant bizarre in a different way, it was probably
|the wrong word (there are reasons, many of them, why I write code, and
|not novels, nor, or at least very rarely, even academic papers),
|what I meant was that inside the shell, we have to deal with single
|quoted strings (which are very easy, as they're very simple, and which
|includes both ' and \ quoting), and double quoted strings, which are
|messy and cause problems, but which we have generally managed to conquer.
|Adding a third, somewhat in between form, where most of the text is
|literal, but where $ expansions (but I am assuming not ` expansions)
|happen, when doing so adds no new functionality, just perhaps a slightly
|simpler syntax for the user, just seems like the wrong thing to do.

If, and only with this if, it would become standardized it could replace the other quoting mechanisms, not in the shell, but from the user point of view. The good thing about $'' is that nothing happens, just like in a single-quoted string, unless you see a reverse solidus. No fancy rules unless you get triggered to do so.

And i have not implemented it yet, but i already document \`{} as a future extension that will allow command evaluation, then. Note this is Plan9 rc syntax (`{command}), which should detect nesting easier, just like $() does. I do not expect that to be implemented by a POSIX shell. It is a MUA in the end :) That one documents

    '\$NAME'  Non-standard extension: expand the given variable name,
              as above. Brace enclosing the name is supported.

|That, and while you can do whatever you like in your MUA, we have to
|deal with the rest of sh syntax ... eg: what happens to a ' that occurs
|inside a \$ expansion in your scheme (that is, as part of its text, not its
|result)? Does that terminate the $' string, and perhaps lead to an
|invalid $ expansion, or do things nest? Does that include inside ${var:=foo}
|(etc) type expansions where currently (if inside quotes) quoting in the foo
|word doesn't work (except some \ quoting) - if so, then we have a whole new
|expansion syntax to deal with, and if not, then what do we make of a ' that
|occurs there? Or what of a \' there? Do $' expressions nest?

Well .. if i recall correctly quoting inside of ${xYz} has been clarified not too long ago -- i would expect the entire $'' context to be yielded and resumed once the ${xYz} construct has been handled. I *think* that is what has to happen with them inside of "", so it should be just the same. Except that it was triggered by \$.. not by $.. as it would in double-quoted strings. I think that would be the most natural take.

|First in the simple cases, like
|    $'whatever \$( cmd $'arg' ) and more'
|where I assume that answer would be yes, and similarly in
|    $'xxx \${var%$'\n'} yyy'
|but also as a simple insertion
|    $'abc \$'\t' def'
|where doing so makes no sense at all, and so the answer is probably
|"not allowed", but that is then the one $ "expansion" which isn't
|allowed inside $' strings, which is yet another special case.
|
|Also, if a command substitution were embedded using \$( ) inside a $'
|string, what conversions (if any) are performed upon the stdout of the
|command before being embedded in the string, are \ escapes there expected
|to work? (Same question for a variable expansion).
|
|Similarly, what does $'\${var-"two words"}' generate, and
|$'\${var-\"two words\"}' (assuming var is unset naturally). Or using '
|instead of " in both of those?

All that, to me, yield $'', resume once construct has been handled.

|And last (for now anyway), after "set -- A B C" what's the effect of
|$'pfx\${@}sfx' ?

This is interesting. I would say it is identical to ${*} here.

|At least once we either drop \u, or properly define how it is supposed
|to work (if anyone actually has an idea what that is), $' is entirely the
|same as ' once the internal expansions are done (as part of lexical analysis)
|so is trivial to add, makes it easier to encode some strings (just easier,
|nothing that cannot already be done) and is trivial to implement. Adding
|\$ to that would (I think, I haven't tried to actually do it) complicate
|everything. Of course, since $' is properly specified, and unknown \
|escapes produce implementation defined (or unspecified)
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2021-02-05 16:42 UTC
==
(0005229) calestyo (reporter) - 2021-02-05 16:42
https://austingroupbugs.net/view.php?id=249#c5229
--
Maybe it would be really better to leave out \u and \U for the time being?

Forcing users with non-UTF-8 encoding to convert UTF-8 output again with some external tool is not much better than forcing them now to produce such characters, or even basic ones like \t, in the first place.

Maybe it's too simple-minded, but can't one just leave \u and \U to be Unicode code points, with it being up to the shell to convert them to the current encoding, and with unspecified results if no mapping is possible?
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2021-02-05 16:08 UTC
==
(0005228) dwheeler (reporter) - 2021-02-05 16:08
https://austingroupbugs.net/view.php?id=249#c5228
--
On further reflection: I changed my mind. Robert Elz's proposal is simple and clear.

So: If we add \u and \U within $'...' to POSIX, just have it *always* generate the equivalent UTF-8 (without trying to figure out if it's "legal"). Then the user *always* knows what it generates. If they want something other than UTF-8, they can use $'...' to generate the UTF-8, and then use some other tool like iconv to convert it (e.g., when printing).
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
==
https://austingroupbugs.net/view.php?id=249
==
Status: Under Review
Final Accepted Text: Previously accepted text is in https://austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2021-02-05 16:02 UTC
==
(0005227) dwheeler (reporter) - 2021-02-05 16:02
https://austingroupbugs.net/view.php?id=249#c5227
--
Robert Elz said:
> At least once we either drop \u, or properly define how it is supposed
> to work (if anyone actually has an idea what that is), $' is entirely the
> same as ' once the internal expansions are done (as part of lexical analysis)
> so is trivial to add, makes it easier to encode some strings (just easier,
> nothing that cannot already be done) and is trivial to implement.

Agreed. The goal is to make it *easy* to do basic things, like handle newline and tab. It's not hard to implement, and it's already implemented by many shells.

> ps: unrelated to \$ in $' but while I am here, since I mentioned it above,
> in the NetBSD sh, \u (which accepts any number of hex digits up to 4, or
> up to 8 for \U, not just exactly 4 or 8, but that's a frill) the interpretation
> is that the UTF-8 encoding of the code point specified is embedded in the
> string. No more, no less. In particular it is *not* the shell's job to
> validate the UTF sequences so that they make sense, or can rationally be
> interpreted as anything at all (that's on the application). They're just
> bit patterns. Similarly, since the author of the script cannot be assumed
> to know what locale the user running it will have set, converting the \u
> sequence to some other locale (while it is still being processed inside the
> shell) cannot be correct either.
...
> Since there doesn't seem to be a lot of
> reason any more for anyone not to use non UTF-8 encodings, that would really
> mean doing a whole lot of nothing most of the time (hopefully, always).

I would be very *happy* with the interpretation that \u and \U simply generate the UTF-8 encoding, at least for the C locale, POSIX locale, and any locale ending in ".UTF-8". That is easy to implement, and I agree that trying to validate all sequences is absurd; just output it, thank you.

We could *allow* shells to convert to the local locale *or* generate UTF-8 for \u and \U.

If we allow \u, I suggest ONLY standardizing \u with 4 digits and \U with 8 digits, and allowing but *NOT* requiring support for byte 0 in strings. Allowing fewer than 4 digits makes it easy to make a mistake by having a hex character "after" a short sequence. But I could certainly live with it either way.
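The "hex character after a short sequence" pitfall can be shown with a toy scanner. This is only a sketch of the "up to 4 digits" rule as described for \u in the quoted text, not code from any shell, and greedy_hex is a made-up name:

```shell
# greedy_hex STRING (hypothetical helper): print the longest leading run
# of hex digits in STRING, at most 4 long, i.e. what a "\u accepts up to
# 4 digits" parser would consume after seeing \u.
greedy_hex() {
    rest=$1
    digits=
    while [ "${#digits}" -lt 4 ]; do
        c=${rest%"${rest#?}"}          # first character of $rest, if any
        case $c in
            [0-9A-Fa-f]) digits=$digits$c; rest=${rest#?} ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$digits"
}

# Intended: U+0041 ("A") followed by the text "Bad".
greedy_hex '41Bad'     # 41Ba  (read as U+41BA followed by "d")
greedy_hex '0041Bad'   # 0041  (the fixed 4-digit form is unambiguous)
```

With exactly 4 digits required, the first spelling would be rejected or left unconverted rather than silently producing a different character.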
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Date: Thu, 04 Feb 2021 21:59:52 +0100
From: Steffen Nurpmeso
Message-ID: <20210204205952.fw6wv%stef...@sdaoden.eu>

| Ok, of course, but let me disagree with the latter. Bizarre rules
| and Bourne/Korn shell etc ... just look at ${aXb} and quoting
| rules within.

Two things .. first, I agree, the quoting rules that exist now are bizarre, and weird, and just a royal pain to deal with (both for users and implementors) - which is one reason I'm loath to add yet another difference.

And second, I meant bizarre in a different way, it was probably the wrong word (there are reasons, many of them, why I write code, and not novels, nor, or at least very rarely, even academic papers), what I meant was that inside the shell, we have to deal with single quoted strings (which are very easy, as they're very simple, and which includes both ' and \ quoting), and double quoted strings, which are messy and cause problems, but which we have generally managed to conquer. Adding a third, somewhat in between form, where most of the text is literal, but where $ expansions (but I am assuming not ` expansions) happen, when doing so adds no new functionality, just perhaps a slightly simpler syntax for the user, just seems like the wrong thing to do.

That, and while you can do whatever you like in your MUA, we have to deal with the rest of sh syntax ... eg: what happens to a ' that occurs inside a \$ expansion in your scheme (that is, as part of its text, not its result)? Does that terminate the $' string, and perhaps lead to an invalid $ expansion, or do things nest? Does that include inside ${var:=foo} (etc) type expansions where currently (if inside quotes) quoting in the foo word doesn't work (except some \ quoting) - if so, then we have a whole new expansion syntax to deal with, and if not, then what do we make of a ' that occurs there? Or what of a \' there?

Do $' expressions nest? First in the simple cases, like

    $'whatever \$( cmd $'arg' ) and more'

where I assume that answer would be yes, and similarly in

    $'xxx \${var%$'\n'} yyy'

but also as a simple insertion

    $'abc \$'\t' def'

where doing so makes no sense at all, and so the answer is probably "not allowed", but that is then the one $ "expansion" which isn't allowed inside $' strings, which is yet another special case.

Also, if a command substitution were embedded using \$( ) inside a $' string, what conversions (if any) are performed upon the stdout of the command before being embedded in the string, are \ escapes there expected to work? (Same question for a variable expansion).

Similarly, what does $'\${var-"two words"}' generate, and $'\${var-\"two words\"}' (assuming var is unset naturally). Or using ' instead of " in both of those?

And last (for now anyway), after "set -- A B C" what's the effect of $'pfx\${@}sfx' ?

At least once we either drop \u, or properly define how it is supposed to work (if anyone actually has an idea what that is), $' is entirely the same as ' once the internal expansions are done (as part of lexical analysis) so is trivial to add, makes it easier to encode some strings (just easier, nothing that cannot already be done) and is trivial to implement. Adding \$ to that would (I think, I haven't tried to actually do it) complicate everything.

Of course, since $' is properly specified, and unknown \ escapes produce implementation defined (or unspecified) results, there's nothing to stop shells from adding \$ if they like (it would probably help them if there was a fully specified spec of how it is intended to work, including all the corner cases) and if it becomes popular, perhaps it could appear in some later standard. I just don't see that happening myself.

kre

ps: unrelated to \$ in $' but while I am here, since I mentioned it above, in the NetBSD sh, \u (which accepts any number of hex digits up to 4, or up to 8 for \U, not just exactly 4 or 8, but that's a frill) the interpretation is that the UTF-8 encoding of the code point specified is embedded in the string. No more, no less. In particular it is *not* the shell's job to validate the UTF sequences so that they make sense, or can rationally be interpreted as anything at all (that's on the application). They're just bit patterns. Similarly, since the author of the script cannot be assumed to know what locale the user running it will have set, converting the \u sequence to some other locale (while it is still being processed inside the shell) cannot be correct either.

If I ever work out (beyond message encoding, and perhaps some pattern matching expressions) what the shell is supposed to be doing with locales (which as best I can tell, is really not a lot) and I implement that, it would be by encoding everything internally as UTF-8 sequences (not wchar_t), and then converting to locale specified encodings as strings are output (or from them for input). Since there doesn't seem to be a lot of
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
https://www.austingroupbugs.net/view.php?id=249
Status: Under Review
Final Accepted Text: Previously accepted text is in https://www.austingroupbugs.net/view.php?id=249#c590.
Last Modified: 2021-02-04 21:38 UTC

(0005226) dwheeler (reporter) - 2021-02-04 21:38
https://www.austingroupbugs.net/view.php?id=249#c5226

I agree with @stephane. I'd rather have half a loaf than no loaf, and adding this without \u or \U still makes it easier to insert tab, newline, and so on into strings. I'll note that my original proposal from 2010 didn't include proposing \u or \U.
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Hello Robert.

Robert Elz wrote in
 <9394.1612390...@jinx.noi.kre.to>:
 |Date:Wed, 03 Feb 2021 18:35:01 +0100
 |From:Steffen Nurpmeso
 |Message-ID: <20210203173501.srcqv%stef...@sdaoden.eu>
 |
 || What else. Having \$ would be nice,
 |
 |It doesn't exist in shells, so it cannot be in the standard.
 |
 || that i do not understand reluctance of you all.
 |
 |For me, it is unnecessary (sure, it might make user input fractionally
 |cleaner, but adds nothing that cannot already be done) - and it turns
 |$'' from being essentially a single quoted string (once the escapes are
 |processed, which is entirely a parse time activity) into a bizarre form
 |of double quoted string, which needs expansion at execution time. That
 |complicates the implementation, and for the minor benefit it offers,
 |it just isn't worth it.

Ok, of course, but let me disagree with the latter. Bizarre rules and Bourne/Korn shell etc ... just look at ${aXb} and quoting rules within. (What i mean is: this does not come naturally, at all.) And in general, how long would it take to re-understand the tests you have committed for NetBSD shell a few years ago, without any comments describing what they do -- what a mess, at least to my brain! Temporarily suspend and expand until the \$ aka \${} construct is fully expanded, then continue "single quote reading with \XY expansion", that sounds easy.

But hey, i do not want to block progress. There are minorities who may not know about the number 0, but still coding standards put them under the digit system umbrella. If my words are blocking issue 249, then i take them back, and in the end making \U and \u compatible to some ISO standard is state of the art.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
    Date:        Wed, 03 Feb 2021 18:35:01 +0100
    From:        Steffen Nurpmeso
    Message-ID:  <20210203173501.srcqv%stef...@sdaoden.eu>

  | What else. Having \$ would be nice,

It doesn't exist in shells, so it cannot be in the standard.

  | that i do not understand reluctance of you all.

For me, it is unnecessary (sure, it might make user input fractionally cleaner, but adds nothing that cannot already be done) - and it turns $'' from being essentially a single quoted string (once the escapes are processed, which is entirely a parse time activity) into a bizarre form of double quoted string, which needs expansion at execution time. That complicates the implementation, and for the minor benefit it offers, it just isn't worth it.

kre
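The remark that $'...' "adds nothing that cannot already be done" can be illustrated with a small sketch. It is not taken from the thread: it assumes a plain POSIX sh with printf, and the variable names tab, nl and msg are invented for the example. It builds a string containing a tab and a newline without $'...', and shows in a comment how a shell that supports $'...' might spell the same string.

```shell
# Hypothetical names (tab, nl, msg); not taken from the thread.
# A tab obtained through command substitution.
tab=$(printf '\t')
# Command substitution strips trailing newlines, so append a guard
# character and remove it again afterwards.
nl=$(printf '\nx')
nl=${nl%x}
msg="col1${tab}col2${nl}row2"
# In a shell that supports $'...', the same string could be written as:
#   msg=$'col1\tcol2\nrow2'
printf '%s\n' "$msg"
```

The guard character is the awkward part that $'...' would make unnecessary; the resulting string is the same either way, which is the sense in which the feature is a convenience rather than new capability.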
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
Robert Elz wrote in
 <11054.1612115...@jinx.noi.kre.to>:
 |Date:Sun, 31 Jan 2021 06:48:25 +
 |From:"Austin Group Bug Tracker via austin-group-l at The \
 |Open Group"
 |Message-ID: <79086278e43eeebd97f64b7f45613...@www.austingroupbugs.net>
 |
 || A NOTE has been added to this issue.
 |
 |This comment isn't worthy of a note, but
 |
 || As most of the remaining issues are with $'\u' and $'\U', I
 || would suggest that it be dropped for issue8 for now.
 |
 |what is the "it" you're suggesting dropping (or deferring)?
 |
 |The whole of $'...' or just the (two) \u escapes inside $''
 |I'd like to see $'' included, but if the only way to do that is to
 |omit the \u (both) escape sequences, I could live with that, particularly
 |as exactly how the shell should use unicode chars is still very much
 |uncertain (eg: if I want to write a case statement that would match
 |various currency symbols, just how do I encode that? Does it depend
 |upon the user's current locale, if so, how do I write a portable
 |script (do I need to iconv constant strings?), and if not, how is the
 |user's input supposed to match, particularly if they're not using
 |a UTF-8 locale.
 |
 |There's lots more work needed (initially by implementers, not here)

Letting aside the \u stuff which currently goes via iconv(3) (and thus likely causes replacement to occur in case the locale character set cannot handle), not without reiterating that the real future proof approach would be to require iconv(3) to handle Unicode grapheme boundaries, and that in turn meaning that multiple \u must be interpreted in sequence because Unicode is not about single codepoints but at least potentially graphemes aka real characters which are formed of multiple adjacent individual codepoints. I am not standing in your way, it is only about commenting that it is worthwhile noting that quoted ranges should extend to the maximum length possible in order to allow all languages of the world to benefit from internationalization efforts (sic).

What else. Having \$ would be nice, i have it for the little MUA i maintain. If you just look at this simple shell snippet, and i could have quoted other things, though admittedly

  chown '"${user}"':'"${group}"' '"${user}"' || exit 6
  echo 0 > '"${user}"'/"'"${datfile}"'"
  chmod 0600 '"${user}"'/"'"${datfile}"'"

could be quoted as unities, hmm. Anyhow with $'' in its best epiphany, so to say, there would be a single flow of progression, and so much nicer to the human eye

  -c $'
  ...
  chown \${user}:\${group} \${user} || exit 6
  echo 0 > \${user}/"\${datfile}"
  chmod 0600 \${user}/"\${datfile}"
  '

that i do not understand reluctance of you all.

Ciao,

--steffen
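The close-quote, expand, reopen-quote pattern used in the first snippet of the message above can be shown in isolation. This is only a sketch under assumptions: the values alice and staff and the names script and result are invented for illustration, and printf stands in for chown so that nothing on the system is modified.

```shell
# Hypothetical values; the thread does not say what user and group hold.
user=alice
group=staff
# Single-quoted script text, with each variable spliced in by closing the
# single quote, expanding inside double quotes, and reopening the quote.
script='printf "%s:%s\n" '"${user}"' '"${group}"
# Run the assembled text in a child shell, as a -c argument.
result=$(sh -c "$script")
printf '%s\n' "$result"
```

Values spliced in this way are re-parsed as shell code by the child shell, so the pattern is only safe for trusted values. Passing them as positional parameters, for example sh -c 'printf "%s:%s\n" "$1" "$2"' sh "$user" "$group", avoids that re-parsing.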
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
2021-02-01 00:59:35 +0700, Robert Elz via austin-group-l at The Open Group:
[...]
>   | As most of the remaining issues are with $'\u' and $'\U', I
>   | would suggest that it be dropped for issue8 for now.
>
> what is the "it" you're suggesting dropping (or deferring)?
>
> The whole of $'...' or just the (two) \u escapes inside $''
[...]

Sorry, I meant the latter. I've now edited my note to clarify.

--
Stephane
Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
    Date:        Sun, 31 Jan 2021 06:48:25 +
    From:        "Austin Group Bug Tracker via austin-group-l at The Open Group"
    Message-ID:  <79086278e43eeebd97f64b7f45613...@www.austingroupbugs.net>

  | A NOTE has been added to this issue.

This comment isn't worthy of a note, but

  | As most of the remaining issues are with $'\u' and $'\U', I
  | would suggest that it be dropped for issue8 for now.

what is the "it" you're suggesting dropping (or deferring)?

The whole of $'...' or just the (two) \u escapes inside $''
I'd like to see $'' included, but if the only way to do that is to omit the \u (both) escape sequences, I could live with that, particularly as exactly how the shell should use unicode chars is still very much uncertain (eg: if I want to write a case statement that would match various currency symbols, just how do I encode that? Does it depend upon the user's current locale, if so, how do I write a portable script (do I need to iconv constant strings?), and if not, how is the user's input supposed to match, particularly if they're not using a UTF-8 locale.

There's lots more work needed (initially by implementers, not here) in this area.

kre
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
https://www.austingroupbugs.net/view.php?id=249
Status: Under Review
Last Modified: 2021-01-31 17:08 UTC

(0005223) calestyo (reporter) - 2021-01-31 17:08
https://www.austingroupbugs.net/view.php?id=249#c5223

It should perhaps not be forgotten, that, if a standard fails to standardise things which are apparently needed,... implementers will simply go their own way (like they did not only in this case, but also other examples)... which are likely incompatible and/or non-portable... which is in turn obviously what standards

It's quite clear that a standard cannot and should not jump on every hype train but this ticket is open for 10 years now, and there is still no easy/proper way to specify, e.g. unicode characters or even simply things like .

So hopefully this can be addressed then at least in issue9. :-)
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
https://www.austingroupbugs.net/view.php?id=249
Status: Under Review
Last Modified: 2021-01-31 06:48 UTC

(0005222) stephane (reporter) - 2021-01-31 06:48
https://www.austingroupbugs.net/view.php?id=249#c5222

As most of the remaining issues are with $'\u' and $'\U', I would suggest that it be dropped for issue8 for now.

That could be revisited in issue9 where we should also consider at the same time:

- adding those in printf format and %b ($'\u' which comes from zsh in 2003, was actually inspired from the same added to GNU printf in 2000. See https://www.zsh.org/mla/workers/2003/msg00223.html) and echo.
- specifying a C.UTF-8 locale
- more generally specify how and when utilities decode their argument from byte to characters (and the handling of when that fails).
[1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell
A NOTE has been added to this issue.
https://www.austingroupbugs.net/view.php?id=249
Status: Under Review
Last Modified: 2021-01-31 05:38 UTC

(0005221) calestyo (reporter) - 2021-01-31 05:38
https://www.austingroupbugs.net/view.php?id=249#c5221

Is this still under consideration? Would probably be quite useful to have a portable way for such escape sequences.