Re: quote removal issues within character class
Thanks for the explanation. I didn't know any of these
Re: quote removal issues within character class
On 11/8/19 4:50 PM, Oğuz wrote:

> v=foo
> echo ${v#[[:"lower":]]}
>
> should print oo, but it prints foo instead. This is reproducible on bash >4.4
>
> Plus
>
> case foo in (*[![:"lower":]]*) echo bar; esac
>
> prints bar, while

The idea is that at this point in command processing, quote removal hasn't
been performed. According to the abstract model the shell uses for word
expansions, that means the double quotes are still present in the word,
and `"lower"' is not the same as `lower'.

There was a recent extensive discussion of this and other points on the
posix mailing list, and, as kre said, the committee has decided to make
this a special case. I changed this about a month ago, and the change is
in the devel branch.

> case foo in (*[![":lower":]]*) echo bar; esac
>
> doesn't print anything. And this is only reproducible on bash >5.0

This is an invalid character class, since a class has to begin with the
two-character sequence `[:'. The intervening double quote causes that test
to fail (this case is not so special, it seems). The first `]' then
terminates the bracket expression, so the string has to contain at least
a `]' to have a possibility of matching.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
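A minimal sketch of the portable spelling, assuming a POSIX-style shell. The helper name strip_one_lower is hypothetical and not from the thread. Only the unquoted class name is executed, since that form is a valid character class regardless of version; the quoted form from the report is left as a comment because, per the discussion above, its result depends on which bash is running.

```shell
# Hypothetical helper: strip one leading lowercase letter from $1.
strip_one_lower() {
    # Unquoted class name, so [:lower:] is an ordinary character class.
    printf '%s\n' "${1#[[:lower:]]}"
}

strip_one_lower foo    # prints "oo"

# The form from the report quotes the class name:
#   v=foo; echo ${v#[[:"lower":]]}
# According to this thread it prints "foo" on the affected releases and
# is being changed in the devel branch, so no output is asserted here.
```

Until the quoted form behaves the same everywhere, leaving the class name unquoted avoids the question entirely.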
Re: quote removal issues within character class
FWIW, Andreas's description really was sufficient...
Re: quote removal issues within character class
    Date:        Sat, 09 Nov 2019 06:46:05 -0800
    From:        L A Walsh
    Message-ID:  <5dc6d12d.6040...@tlinx.org>

  | Is this really what the standard says,

Yes, I used cut (and then some line length/wrapping reformatting)

  | because '\\' is not a character, but 2 characters.

In sh, yes, in C, no. So it all depends upon context. The text comes
from the XBD (basic definitions) so isn't either sh or C specific.
But since that is followed by (and you quoted)

        (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
        respectively)

I don't think there is any doubt what it means.

  | They could use "\\" but if a backslash is between single
  | quotes, it loses its special meaning.

In sh yes, but this text is from the section on Regular Expressions,
in the basic definitions (XBD), not the (many) sections about the shell
in XCU (Commands and Utilities).

  | The only way to get a backslash

get in what sense? In sh '\' is just fine, as is \\ if what you
want is a literal backslash.

  | when using single quotes that I've found is to end the single-quote
  | then use the backslash.

If you want to \ escape a character when you're in single quotes, then yes
(in sh) that is correct. But the only time that makes sense is if the
character that is to follow the \ is a ' (which cannot appear in a single
quoted string), or if you want to elide a newline \<newline> (where
the <newline> means a literal newline character).

  | So if you wanted to insert single quotes in
  | a string that is single-quoted, you would have to do this:
  |
  | 'this is a single-quote(SQ) quoted string using a SQ ('\'') within
  | the single quote.'

That is the usual way, yes.

  | Alternatively using double-quotes or another quoting
  | mechanism might be preferable.

Yes, there are several other ways to do it, but the \' form is certainly
the most common. When $'...' quoting becomes more popular (after it
actually makes it into the standard) then that form allows $' ...\' ... '
($' is "C" quoting, it works just like single quotes, except that C style
backslash escape sequences, plus a few new ones, are expanded).

  | I find it odd that the standard would try to use a backslash within
  | a SQ'd string as a literalizer.

It all relates to the context, and the expectations of the reader.
Since the parenthesised note that follows is explicit about what it means,
I don't think it really matters. In the section about the shell, \
tends to be written exactly like that (no quoting, just the character)
when it is needed (it might occasionally be written as <backslash>).

kre
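A short sketch of the sh quoting rules discussed above, assuming a POSIX-style shell. The variable names sq, bs1 and bs2 are illustrative only. The $'...' form is left out because, as noted, it was not yet in the standard at the time.

```shell
# A single quote cannot appear inside '...', so close the quoted string,
# add a backslash-escaped quote (\'), and reopen the quoted string.
sq='a SQ ('\'') within single quotes'
printf '%s\n' "$sq"          # prints: a SQ (') within single quotes

# Inside single quotes a backslash is an ordinary character ...
bs1='\'
# ... and unquoted, \\ also yields exactly one literal backslash.
bs2=\\
[ "$bs1" = "$bs2" ] && echo same    # prints: same
```

Both spellings of the backslash produce the same one-character string, which is why the '\\' in the standard's text only makes sense when read as C notation rather than sh notation.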
Re: quote removal issues within character class
On Nov 09 2019, L A Walsh wrote:

> On 2019/11/09 04:49, Robert Elz wrote:
>> There's also
>>
>>    The special characters '.', '*', '[', and '\\'
>>    (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>>    respectively) shall lose their special meaning within a bracket
>>    expression.
>>
>
> Is this really what the standard says, because '\\' is not a character, but
> 2 characters. They could use "\\" but if a backslash is between single
> quotes, it loses its special meaning.

This is C notation, not shell notation.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Re: quote removal issues within character class
You've already answered it, thank you. I didn't know that [:, [., [= were
special *sequences*, I guess I overlooked that part. Thanks again for
taking time to explain it in detail, I'm grateful.

On Saturday, 9 November 2019, Robert Elz wrote:

>     Date:        Sat, 9 Nov 2019 07:35:16 +0300
>     From:        Oğuz
>     Message-ID:  <cah7i3lr68civxlr9_hoogqa7vd-zyvz+fck-0k3uqptnsir...@mail.gmail.com>
>
>   | > is correct, as "foo" does not contain a ']' which would be required
>   | > to match there (quoting the ':' means there is no character class,
>   | > hence we have instead (the negation of) a char class containing '[' ':'
>   | > 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
>   | > followed by ']' and anything). foo does not match. f]oo would.
>   | >
>   |
>   | where exactly is this documented in the standard?
>
> I'm not sure which part exactly you're looking for, but char sets in sh
> are specified to be the same as in REs, except that ! replaces ^ as the
> negation character (that's in XCU 2.13.1). Char sets (bracket expressions)
> in RE's are documented in XBD 9.3.5 wherein it states
>
>         A bracket expression is either a matching list expression or a
>         non-matching list expression. It consists of one or more
>         expressions: ordinary characters, collating elements, collating
>         symbols, equivalence classes, character classes, or range
>         expressions. The <right-square-bracket> (']') shall lose its
>         special meaning and represent itself in a bracket expression if
>         it occurs first in the list (after an initial <circumflex> ('^'),
>         if any).
>
>         Otherwise, it shall terminate the bracket expression,
>
> That is, a ']' that occurs anywhere else terminates the bracket expression
> except:
>
>         unless it appears in a collating symbol (such as "[.].]")
>
> (not relevant in the given example)
>
>         or is the ending <right-square-bracket> for a collating symbol,
>         equivalence class, or character class.
>
> So the ']' that immediately follows the second ':' would not terminate the
> bracket expression if it is the ending ']' for a character class
> (collating symbols and equiv classes not being relevant to the example).
> Of course, that can only happen if there is a character class to end.
>
> There's also
>
>         The special characters '.', '*', '[', and '\\'
>         (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>         respectively) shall lose their special meaning within a bracket
>         expression.
>
> whereupon if the [": sequence does not start a char class, the '[' there
> is simply a literal char inside the bracket expression.
>
> Similarly if the bracket expression ends at the first ']' (the one
> immediately after the second ':') the following ']' is simply a literal
> character, as ']' chars are special only when following a '['.
>
> So, all that's left to determine is whether the [": sequence can be
> considered as beginning a char class.
>
> In a RE it certainly cannot - quote chars (' and ") are not special in
> REs at all, and [": is no different syntactically than [x: which no-one
> would treat as being the introduction to a char class.
>
> This is also, I believe (Chet can confirm, or refute, if he desires) where
> bash gets the interpretation that "lower" (including the quotes) is the
> name of the char class in [:"lower":] except that it cannot be, as char
> class names cannot contain quote characters (which should lead to the
> whole sub-expression not being treated as a char class at all, instead
> bash treats it, I think, as if it were an unknown but valid class name).
>
> But when it comes from sh, quote chars are "different" and instead of
> just being characters, they instead affect the interpretation of the
> characters that are quoted. See XCU 2.2:
>
>         Quoting is used to remove the special meaning of certain characters
>         or words to the shell.
>
>         Quoting can be used to preserve the literal meaning of the special
>         characters in the next paragraph [...]
>
>         and the following may need to be quoted under certain
>         circumstances. That is, these characters may be special depending
>         on conditions described elsewhere in this volume of POSIX.1-2017:
>
>                 *   ?   [   #   ~   =   %
>
> to which more chars have been added (as I recall) recently by some
> Austin Group correction (which I think includes ! : - and ]), that is
> to make it clear, that in sh
>
>         [a'-'z]
>
> is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
> of quoting is used to remove the specialness of the '-' is irrelevant).
> and that "[a-z]" isn't a bracket expression at all (neither of which
> is true in an RE - though the role of \ in RE's is being altered slightly
> so if it had been [a\-z] in a RE things are less clear.)
>
> The effect of this is that in sh, in an expression like
>
>         [![":lower":]]
>
> the first ':' is not "special" and hence
Re: quote removal issues within character class
On 2019/11/09 04:49, Robert Elz wrote:

> There's also
>
>    The special characters '.', '*', '[', and '\\'
>    (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
>    respectively) shall lose their special meaning within a bracket
>    expression.
>

Is this really what the standard says, because '\\' is not a character, but
2 characters. They could use "\\" but if a backslash is between single
quotes, it loses its special meaning.

The only way to get a backslash when using single quotes that I've found is
to end the single-quote then use the backslash. So if you wanted to insert
single quotes in a string that is single-quoted, you would have to do this:

'this is a single-quote(SQ) quoted string using a SQ ('\'') within
the single quote.'

Alternatively using double-quotes or another quoting mechanism might be
preferable. I find it odd that the standard would try to use a backslash
within a SQ'd string as a literalizer.
Re: quote removal issues within character class
    Date:        Sat, 9 Nov 2019 07:35:16 +0300
    From:        Oğuz
    Message-ID:

  | > is correct, as "foo" does not contain a ']' which would be required
  | > to match there (quoting the ':' means there is no character class,
  | > hence we have instead (the negation of) a char class containing '[' ':'
  | > 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
  | > followed by ']' and anything). foo does not match. f]oo would.
  | >
  |
  | where exactly is this documented in the standard?

I'm not sure which part exactly you're looking for, but char sets in sh
are specified to be the same as in REs, except that ! replaces ^ as the
negation character (that's in XCU 2.13.1). Char sets (bracket expressions)
in RE's are documented in XBD 9.3.5 wherein it states

        A bracket expression is either a matching list expression or a
        non-matching list expression. It consists of one or more
        expressions: ordinary characters, collating elements, collating
        symbols, equivalence classes, character classes, or range
        expressions. The <right-square-bracket> (']') shall lose its
        special meaning and represent itself in a bracket expression if
        it occurs first in the list (after an initial <circumflex> ('^'),
        if any).

        Otherwise, it shall terminate the bracket expression,

That is, a ']' that occurs anywhere else terminates the bracket expression
except:

        unless it appears in a collating symbol (such as "[.].]")

(not relevant in the given example)

        or is the ending <right-square-bracket> for a collating symbol,
        equivalence class, or character class.

So the ']' that immediately follows the second ':' would not terminate the
bracket expression if it is the ending ']' for a character class
(collating symbols and equiv classes not being relevant to the example).
Of course, that can only happen if there is a character class to end.

There's also

        The special characters '.', '*', '[', and '\\'
        (<period>, <asterisk>, <left-square-bracket>, and <backslash>,
        respectively) shall lose their special meaning within a bracket
        expression.

whereupon if the [": sequence does not start a char class, the '[' there
is simply a literal char inside the bracket expression.

Similarly if the bracket expression ends at the first ']' (the one
immediately after the second ':') the following ']' is simply a literal
character, as ']' chars are special only when following a '['.

So, all that's left to determine is whether the [": sequence can be
considered as beginning a char class.

In a RE it certainly cannot - quote chars (' and ") are not special in
REs at all, and [": is no different syntactically than [x: which no-one
would treat as being the introduction to a char class.

This is also, I believe (Chet can confirm, or refute, if he desires) where
bash gets the interpretation that "lower" (including the quotes) is the
name of the char class in [:"lower":] except that it cannot be, as char
class names cannot contain quote characters (which should lead to the
whole sub-expression not being treated as a char class at all, instead
bash treats it, I think, as if it were an unknown but valid class name).

But when it comes from sh, quote chars are "different" and instead of
just being characters, they instead affect the interpretation of the
characters that are quoted. See XCU 2.2:

        Quoting is used to remove the special meaning of certain characters
        or words to the shell.

        Quoting can be used to preserve the literal meaning of the special
        characters in the next paragraph [...]

        and the following may need to be quoted under certain
        circumstances. That is, these characters may be special depending
        on conditions described elsewhere in this volume of POSIX.1-2017:

                *   ?   [   #   ~   =   %

to which more chars have been added (as I recall) recently by some
Austin Group correction (which I think includes ! : - and ]), that is
to make it clear, that in sh

        [a'-'z]

is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
of quoting is used to remove the specialness of the '-' is irrelevant).
and that "[a-z]" isn't a bracket expression at all (neither of which
is true in an RE - though the role of \ in RE's is being altered slightly
so if it had been [a\-z] in a RE things are less clear.)

The effect of this is that in sh, in an expression like

        [![":lower":]]

the first ':' is not "special" and hence cannot form part of the magic
opening '[:' sequence for a character class. Hence this expression
contains no character class, and consequently the ':]' chars are simply
a ':' in the bracket expression, and then the terminating ']' - which
leaves the second ']' being just a literal character.

While here (these following parts are not relevant to your question I
believe) when used in sh

        [[:"lower":]]

should be treated just the same as

        [[:lower:]]

for the same reason that

        ["abc"]
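The [a'-'z] versus "[a-z]" distinction described in the message above can be sketched as follows, assuming a POSIX-style shell. The three helper names are hypothetical; each one simply wraps a case statement so the pattern is visible in one place.

```shell
# Hypothetical helpers; each returns 0 when $1 matches its pattern.

in_a_dash_z() {            # [a'-'z]: the quoted '-' is literal, so the
    case $1 in             # set holds exactly three characters: a - z
        [a'-'z]) return 0 ;;
    esac
    return 1
}

in_a_to_z() {              # [a-z]: unquoted '-', an ordinary range
    case $1 in
        [a-z]) return 0 ;;
    esac
    return 1
}

is_literal_brackets() {    # "[a-z]": fully quoted, so not a bracket
    case $1 in             # expression at all, only a 5-character string
        "[a-z]") return 0 ;;
    esac
    return 1
}

in_a_dash_z -  && echo "'-' matches [a'-'z]"
in_a_dash_z b  || echo "'b' does not match [a'-'z]"
in_a_to_z b    && echo "'b' matches [a-z]"
is_literal_brackets '[a-z]' && echo 'only the literal string matches "[a-z]"'
```

Which quoting mechanism removes the special meaning of the '-' should not matter; single quotes are used here only because they appear in the message.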
Re: quote removal issues within character class
> is correct, as "foo" does not contain a ']' which would be required
> to match there (quoting the ':' means there is no character class,
> hence we have instead (the negation of) a char class containing '[' ':'
> 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
> followed by ']' and anything). foo does not match. f]oo would.
>

where exactly is this documented in the standard?
Re: quote removal issues within character class
    Date:        Sat, 9 Nov 2019 00:50:52 +0300
    From:        Oğuz
    Message-ID:

These two

  | v=foo
  | echo ${v#[[:"lower":]]}

  | case foo in (*[![:"lower":]]*) echo bar; esac

are because bash believes that the character class name must not be
quoted (which is likely to be clarified to be incorrect in the next
revision of posix).

This one

  | case foo in (*[![":lower":]]*) echo bar; esac

is correct, as "foo" does not contain a ']' which would be required
to match there (quoting the ':' means there is no character class,
hence we have instead (the negation of) a char class containing '[' ':'
'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
followed by ']' and anything). foo does not match. f]oo would.

kre
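To see the "needs a ']' to match" reading without depending on how a given shell treats the quoted ':', here is a sketch that uses an unquoted stand-in, [x:, which cannot open a character class either. This assumes a POSIX-style shell; the helper name and the choice of 'x' are illustrative and not from the thread.

```shell
# Hypothetical helper. Since "[x:" does not start a character class, the
# bracket expression is the negated set { [ x : l o w e r } and it ends at
# the first ']'. The second ']' is a literal character that must appear
# in the string right after a character outside that set.
has_nonmember_then_bracket() {
    case $1 in
        *[![x:lower:]]*) return 0 ;;
    esac
    return 1
}

has_nonmember_then_bracket 'foo'  || echo 'foo: no match'   # no ] in the string
has_nonmember_then_bracket 'f]oo' && echo 'f]oo: match'     # f is outside the set, then ]
```

A string such as 'o]' still fails, because the only character in front of the ']' is a member of the negated set.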
quote removal issues within character class
v=foo
echo ${v#[[:"lower":]]}

should print oo, but it prints foo instead. This is reproducible on bash >4.4

Plus

case foo in (*[![:"lower":]]*) echo bar; esac

prints bar, while

case foo in (*[![":lower":]]*) echo bar; esac

doesn't print anything. And this is only reproducible on bash >5.0