Re: Should shell quoting within glob bracket patterns be effective?
Geoff Clare wrote, on 16 Apr 2018:
>
> Robert Elz wrote, on 13 Apr 2018:
> >
> >     Date: Fri, 13 Apr 2018 15:07:07 +0100
> >     From: Geoff Clare
> >
> >   | For those the only difference from REs is the '^' -> '!' one,
> >
> > Not for fnmatch() which can have \ to escape characters (anywhere
> > according to its description, which would include in bracket expressions,
> > as that is not excluded).
>
> Clearly the statement in XBD 9.3.5:
>
>     The special characters '.', '*', '[', and '\\' (<period>,
>     <asterisk>, <left-square-bracket>, and <backslash>, respectively)
>     shall lose their special meaning within a bracket expression.
>
> is intended to apply to backslashes in fnmatch(), just as it does to
> the special meaning of backslash stated in XCU 2.13.1 (which also
> doesn't mention an exception for bracket expressions).
>
> The whole point of adding fnmatch() to the standard was to provide
> a function which implements XCU 2.13, so any interpretation of the
> standard which has backslash being treated differently in fnmatch()
> (without FNM_NOESCAPE) than in XCU 2.13 cannot be correct.

I tested some implementations of fnmatch() using the program below.

Solaris and HP-UX do not treat backslash as special in bracket
expressions.

MacOS and Linux (glibc) DO treat backslash as special in bracket
expressions. However, in both cases this behaviour is inconsistent
with the behaviour of find -name on the same system, and so should
be considered to be a bug in fnmatch() for those implementations.

Conclusion: the new description of backslash handling for fnmatch()
in the resolution of bug 985 is correct and should remain as it is.

    #include <fnmatch.h>
    #include <stdio.h>

    int main(void)
    {
        int ret;

        /* fnmatch() returns 0 on a match, FNM_NOMATCH otherwise. */

        /* The C string "[a\\-c]" is the pattern [a\-c]. */
        ret = fnmatch("[a\\-c]", "b", 0);
        printf("[a\\-c], b, 0: return %d\n", ret);

        ret = fnmatch("[a\\-c]", "-", 0);
        printf("[a\\-c], -, 0: return %d\n", ret);

        return 0;
    }

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Geoff Clare wrote:

> Clearly the statement in XBD 9.3.5:
>
>     The special characters '.', '*', '[', and '\\' (<period>,
>     <asterisk>, <left-square-bracket>, and <backslash>, respectively)
>     shall lose their special meaning within a bracket expression.
>
> is intended to apply to backslashes in fnmatch(), just as it does to
> the special meaning of backslash stated in XCU 2.13.1 (which also
> doesn't mention an exception for bracket expressions).

It seems that everybody agrees that [a-z] should behave differently
from ["a-z"]. How this is implemented in the shell is not mentioned in
POSIX. It seems, however, that people tend to use a prepended '\\' to
mark quoted characters in shell-internal strings.

> The whole point of adding fnmatch() to the standard was to provide
> a function which implements XCU 2.13, so any interpretation of the
> standard which has backslash being treated differently in fnmatch()
> (without FNM_NOESCAPE) than in XCU 2.13 cannot be correct.

If the intention was not to add a new interface but to add an interface
that could be used to give the same results as seen in the shell, then
I would expect fnmatch() to honor backslashes in [..] constructs as long
as FNM_NOESCAPE is not in effect.

> While quoting it here, I just noticed that this statement also has
> another issue when being read in the context of XCU 2.13.1: it should
> refer to '?' losing its special meaning instead of '.'. I'll update
> my proposed change in bug 1190 to address that.

It may be that the original intention was not to force people to
implement shell-internal quoting by using prepended '\\' characters in
the strings that are used internally after tokenization in the parser.
If a different mechanism is used, it would need a different
implementation in fnmatch() as well. I am not aware of a shell
implementation that today uses a different method, so implementing
backslash-based quoting in fnmatch() seems to be the obvious method to
recreate the behavior of the shell.
Jörg

--
EMail: jo...@schily.net (home)    Jörg Schilling D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 13 Apr 2018:
>
>     Date: Fri, 13 Apr 2018 15:07:07 +0100
>     From: Geoff Clare
>
>   | For those the only difference from REs is the '^' -> '!' one,
>
> Not for fnmatch() which can have \ to escape characters (anywhere
> according to its description, which would include in bracket expressions,
> as that is not excluded).

Clearly the statement in XBD 9.3.5:

    The special characters '.', '*', '[', and '\\' (<period>,
    <asterisk>, <left-square-bracket>, and <backslash>, respectively)
    shall lose their special meaning within a bracket expression.

is intended to apply to backslashes in fnmatch(), just as it does to
the special meaning of backslash stated in XCU 2.13.1 (which also
doesn't mention an exception for bracket expressions).

The whole point of adding fnmatch() to the standard was to provide
a function which implements XCU 2.13, so any interpretation of the
standard which has backslash being treated differently in fnmatch()
(without FNM_NOESCAPE) than in XCU 2.13 cannot be correct.

While quoting it here, I just noticed that this statement also has
another issue when being read in the context of XCU 2.13.1: it should
refer to '?' losing its special meaning instead of '.'. I'll update
my proposed change in bug 1190 to address that.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
    Date: Fri, 13 Apr 2018 15:07:07 +0100
    From: Geoff Clare
    Message-ID: <20180413140707.GB19570@lt2.masqnet>

  | Bracket expressions are not only used in REs and the shell. There are
  | also fnmatch(), glob(), find and pax to consider, where shell quoting
  | does not apply.

They are used by glob (in the generic sense) and by REs (differently).
All the other examples you cite are glob patterns, and all refer to the
sh implementation. Sure, the quoting needs to be made clear, but none of
this needs to in any way impact upon REs or bracket expressions in REs.

  | For those the only difference from REs is the '^' -> '!' one,

Not for fnmatch(), which can have \ to escape characters (anywhere
according to its description, which would include in bracket expressions,
as that is not excluded). The others just refer to XCU 2.13 and don't say
what they expect in this regard, from what I can tell.

What's more, I'm not sure what they should say. I've never wanted to use
quoting in a bracket expression, as I know how to use them without that,
and just always do it that way.

  | It is true of glob patterns as used by fnmatch(), glob(), find and pax.

It is certainly not true of fnmatch() unless the FNM_NOESCAPE flag is
set, though, and for the others, as above, at least according to what
the standard says, I just don't know.

True, only sh uses quotation marks as quoting methods, but that can be
handled separately (indeed it must be, however things are combined
together).

kre
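[Editorial illustration, not part of the original message.] The message above mentions writing bracket expressions so that no quoting is needed. A minimal sketch of that approach in a shell `case` pattern follows. The helper names `in_set` and `not_in_range` are hypothetical and exist only for this example. It relies on two points made in the thread: a '-' placed last in the bracket expression is an ordinary member, and the shell spells negation with '!' where an RE uses '^'.

```shell
# Hypothetical helpers, named only for this illustration.

# in_set WORD: does WORD match [ac-]?  The '-' is last, so it is literal.
in_set() { case $1 in [ac-]) echo match ;; *) echo nomatch ;; esac; }

# not_in_range WORD: does WORD match [!a-c], the shell form of the RE [^a-c]?
not_in_range() { case $1 in [!a-c]) echo match ;; *) echo nomatch ;; esac; }

in_set -          # match: the trailing '-' is an ordinary member
in_set b          # nomatch: no range is formed, so 'b' is not a member
not_in_range x    # match
not_in_range b    # nomatch
```

Nothing in this sketch depends on how a shell treats quoting inside brackets, which is the point of the approach.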
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 13 Apr 2018:
>
>     Date: Fri, 13 Apr 2018 12:04:51 +0100
>     From: Geoff Clare
>
>   | In the case of <backslash>, this does not make clear that it is only
>   | referring to the RE-and-shell-pattern-matching special meaning of
>   | <backslash> and does not affect its shell-quoting special meaning.
>
> This gets kind of messy, because XBD 9 is all about regular
> expressions, and the shell has none of those.
>
> I believe that the right solution is just to remove the reference to
> XBD 9.3.5 from XCU 2.13 and instead define how character classes work
> for the shell. Do that and we can get all of the quoting rules correct -
> and it just costs an extra page or so (most of the text can start out
> by a cut and paste.)

Bracket expressions are not only used in REs and the shell. There are
also fnmatch(), glob(), find and pax to consider, where shell quoting
does not apply. For those the only difference from REs is the '^' -> '!'
one, which is why it makes sense to refer to 9.3.5 with a statement
about that difference.

My proposed update to bug 985 (in note 3948) I think deals with the
addition of shell quoting considerations in a reasonably readable
manner without needing to duplicate 9.3.5.

> I know it is irritating to duplicate text, and if they were truly the
> same, I would not advocate it, but glob patterns and RE patterns are
> just different - only the char classes look kind of similar (and even
> there we need to do the '^' -> '!' substitution) but aren't really.
> In an RE class the only way to get a literal '-' is to make it first
> (after ^ if it is there) or last. That's not true of glob patterns, ...

It is true of glob patterns as used by fnmatch(), glob(), find and pax.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 13 Apr 2018:
>
> I think we have had enough of this topic, so I will not continue it
> after this message, but...
>
>   | I maintain that the requirements of 2.2.3 are indeed universal.
>
> If that's true, then surely those words must be read in conjunction with
> what the initial paragraph of 2.2 says ...
>
>     Quoting is used to remove the special meaning of certain characters
>     or words to the shell.
>     Quoting can be used to preserve the literal meaning of the special
>     characters in the next paragraph [continues about reserved words etc.]
>
> The "special characters in the next paragraph" are ...
>
>     | & ; < > ( ) $ ` \ " '
>
> and sometimes, where "depending on conditions described elsewhere"
>
>     * ? [ # ~ = %
>
> Note that '-' is not in the list anywhere. If we read that literally,
> it is saying that quoting is not intended to remove any special meaning
> of characters other than the ones listed, which includes '-', which I
> would submit means that if you want to have quotes remove the special
> meaning of '-' in char classes in glob expressions, it needs to be
> explicitly stated.

Thank you for spotting this. It looks to be an editorial oversight
in 2.2. (I think the purpose of that introductory text is to warn
shell script writers about which characters they need to think about
quoting if they want them to be treated literally.)

The lack of '-' in 2.2 doesn't change the requirements of 2.2.3, since
2.2.3 says "all characters", not "the characters listed in 2.2".

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
    Date: Fri, 13 Apr 2018 11:51:12 +0100
    From: Geoff Clare
    Message-ID: <20180413105112.GA16858@lt2.masqnet>

I think we have had enough of this topic, so I will not continue it
after this message, but...

  | I maintain that the requirements of 2.2.3 are indeed universal.

If that's true, then surely those words must be read in conjunction with
what the initial paragraph of 2.2 says ...

    Quoting is used to remove the special meaning of certain characters
    or words to the shell.
    Quoting can be used to preserve the literal meaning of the special
    characters in the next paragraph [continues about reserved words etc.]

The "special characters in the next paragraph" are ...

    | & ; < > ( ) $ ` \ " '

and sometimes, where "depending on conditions described elsewhere"

    * ? [ # ~ = %

Note that '-' is not in the list anywhere. If we read that literally,
it is saying that quoting is not intended to remove any special meaning
of characters other than the ones listed, which includes '-', which I
would submit means that if you want to have quotes remove the special
meaning of '-' in char classes in glob expressions, it needs to be
explicitly stated.

kre
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote:

> I know it is irritating to duplicate text, and if they were truly the
> same, I would not advocate it, but glob patterns and RE patterns are
> just different - only the char classes look kind of similar (and even
> there we need to do the '^' -> '!' substitution) but aren't really.
> In an RE class the only way to get a literal '-' is to make it first
> (after ^ if it is there) or last. That's not true of glob patterns,
> perhaps just by accident of that implementation in the Bourne sh (I do
> not recall quoting being possible to enter a literal '-' in 6th edition
> sh glob patterns but my memory might be lacking) - perhaps deliberate,
> I have no idea.

The 6th edition glob command did not implement escaping via '\\' at all.

Jörg

--
EMail: jo...@schily.net (home)    Jörg Schilling D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
Re: Should shell quoting within glob bracket patterns be effective?
    Date: Fri, 13 Apr 2018 12:04:51 +0100
    From: Geoff Clare
    Message-ID: <20180413110451.GA17286@lt2.masqnet>

  | In the case of <backslash>, this does not make clear that it is only
  | referring to the RE-and-shell-pattern-matching special meaning of
  | <backslash> and does not affect its shell-quoting special meaning.

This gets kind of messy, because XBD 9 is all about regular
expressions, and the shell has none of those.

I believe that the right solution is just to remove the reference to
XBD 9.3.5 from XCU 2.13 and instead define how character classes work
for the shell. Do that and we can get all of the quoting rules correct -
and it just costs an extra page or so (most of the text can start out
by a cut and paste.)

I know it is irritating to duplicate text, and if they were truly the
same, I would not advocate it, but glob patterns and RE patterns are
just different - only the char classes look kind of similar (and even
there we need to do the '^' -> '!' substitution) but aren't really.
In an RE class the only way to get a literal '-' is to make it first
(after ^ if it is there) or last. That's not true of glob patterns,
perhaps just by accident of that implementation in the Bourne sh (I do
not recall quoting being possible to enter a literal '-' in 6th edition
sh glob patterns but my memory might be lacking) - perhaps deliberate,
I have no idea.

It is possible that the i18n parts of the char class spec could be
moved out of XBD 9.3.5 and into a section of their own (somewhere) and
then referred to by 9.3.5 and XCU 2.13, but that would be a fairly big
change (I mean the internal [= and [: type stuff - I think that's all
the same in glob and RE char classes, most probably as it is all
recently added - comparatively recently anyway.)

kre
Re: Should shell quoting within glob bracket patterns be effective?
I wrote, on 13 Apr 2018:
>
> Note that whilst extra text is not *needed* regarding quoting inside
> bracket expressions, I would have no objection to some sort of
> explanatory note being added to lessen the chances that readers fail
> to realise that the quoting rules still apply inside bracket
> expressions.

I have spotted one problematic piece of text where such a note would
be beneficial. It's in XBD 9.3.5 item 1:

    The special characters '.', '*', '[', and '\\' (<period>,
    <asterisk>, <left-square-bracket>, and <backslash>, respectively)
    shall lose their special meaning within a bracket expression.

In the case of <backslash>, this does not make clear that it is only
referring to the RE-and-shell-pattern-matching special meaning of
<backslash> and does not affect its shell-quoting special meaning.

I will file a separate bug report for this.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 13 Apr 2018:
>
> That is, your comment that the text in 2.2.3 which says "shall preserve
> the literal value..." is not universal throughout the spec as you implied.

I maintain that the requirements of 2.2.3 are indeed universal.

> If it doesn't always apply, then we need extra text to say in each case
> where it matters, whether it applies or not

I disagree. If a general rule doesn't always apply then extra text is
only needed in each case where it does not apply. That text already
exists in 2.2.3 (where the general rule is "all enclosed characters are
literal" and the exceptions to that are explicitly stated).

Note that whilst extra text is not *needed* regarding quoting inside
bracket expressions, I would have no objection to some sort of
explanatory note being added to lessen the chances that readers fail
to realise that the quoting rules still apply inside bracket
expressions.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
    Date: Fri, 13 Apr 2018 09:32:40 +0100
    From: Geoff Clare
    Message-ID: <20180413083240.GA14937@lt2.masqnet>

  | Neither of your examples is valid because the standard already
  | explicitly describes the behaviour in those cases.

Sorry, but those sections have nothing whatever to do with the point
I was making. That is, that to process "$(...)" you have to both take
the '(' literally, not as an operator, and also treat it as a syntax
character (part of the $( combination).

That is, your comment that the text in 2.2.3 which says "shall preserve
the literal value..." is not universal throughout the spec as you implied.

If it doesn't always apply, then we need extra text to say in each case
where it matters, whether it applies or not - the $( (etc) cases are
handled (I am not suggesting anything is wrong with them) but the
["$var"] case is not - it just needs to be made explicit what is to
happen when the text for that is rewritten.

There are really not all that many places where quoting actually makes
a difference, and I think most of those are already handled (there are
words like "except when quoted" or "an unquoted ..."). It just happens
that quoting and patterns is not really specified - and particularly
character classes - again I suspect because of the reference to
XBD 9.3.5 in which there is no quoting at all, and hence obviously no
need to say what happens.

kre

ps: and we really do need to add some text to say what "matched" means
in the context of the parameter expansion substring operators.
Re: Should shell quoting within glob bracket patterns be effective?
"Schwarz, Konrad" wrote: > As the Bourne Shell source code posted earlier showed, that implementation > did not clearly separate the phases: a character with its high-bit set was > quoted for all further purposes. This worked with 7-bit ASCII. As we have an 8-bit clean Bourne Shell since 1986 (SysVr3), this was replaced by a '\\' prefix char. The code is now more complex, but the behavior is basically the same. Yes, a character is passed through the shell with the quoting intact until it calls "trim()" to remove this quoting. Where this happens influences the behavior. People who implement shells usually control this by checking whether the shell behaves as expected. Whether the POSIX standard always mentions "quote removal" at the right location was not yet verified as this would need a shell that was implemented only from reading the POSIX standard. The reason for modifications in the POSIX standard with respect to the shell is to correct the current wording to follow the expectations of the users and the behavior of the reference shell. Unfortunately, both POSIX and the reference shell have bugs. This is why we need to carefuly disuss issues with the shell. For our discussion, ksh93 seems to misbehave with respect to quoting, while ksh88 seems to missbehave with respect to honoring quoting in the pattern matcher. I believe we all agree that [a-c] and ["a-c"] should behave different. Since the only difference in ["a-c"] is quoting, is is obvious hat the shell needs to honor quoting inside character classes as well in order to give a different behavior for [a-c] and ["a-c"]. The way the quoting and the pattern matcher has to be implemented depends on the expectations of the users. My impression is that we agree on how both patterns should behave, so we just need to find a wording that matches the expectations. 
Jörg

--
EMail: jo...@schily.net (home)    Jörg Schilling D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
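[Editorial illustration, not part of the original message.] The expectation stated above, that [a-c] and ["a-c"] behave differently, can be checked with a small sketch like the one below. The function names `unquoted_range` and `quoted_range` are hypothetical. The results in the comments assume a shell that honors quoting inside bracket expressions, which is the behaviour the thread says users expect. The message above reports that ksh93 and ksh88 deviate, so other results are possible there.

```shell
# Hypothetical helpers, named only for this illustration.

# [a-c]: the unquoted '-' forms a range covering a, b and c.
unquoted_range() { case $1 in [a-c]) echo match ;; *) echo nomatch ;; esac; }

# ["a-c"]: the quoted '-' is expected to be literal, giving the set a, '-', c.
quoted_range() { case $1 in ["a-c"]) echo match ;; *) echo nomatch ;; esac; }

unquoted_range b    # match
unquoted_range -    # nomatch
quoted_range b      # nomatch, if quoting is honored in the bracket expression
quoted_range -      # match, if quoting is honored in the bracket expression
```

The two functions differ only in the quotes, so any difference in output comes from quoting alone.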
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote:

>     Date: Thu, 12 Apr 2018 12:10:04 -0700
>     From: Don Cragun
>     Message-ID:
>
>   | The fact that the $ is special is what is the key.
>
> The problem is in the interpretation of just what "treated literally"
> means.
>
> If it just means that "the character is itself and is not transformed
> into something else" that's fine, the special $ inside the "" (which is
> not treated literally, so it remains the introducer of various
> expansions) can then look at the following character, see it is a '{'
> (untransformed) and then go on and implement parameter expansion as you
> describe -- and the various other characters that have meaning in that
> (such as ':' '-' '+' '?' '%' '#' '=') which (being treated literally
> all represent themselves) can be part of the syntax.

$(cmd) vs "$(cmd)" is not parsed differently but just results in
different treatment of the results.

Strings enclosed inside '"' are not passed as quoted but treated
differently during parameter expansion when the parser in that unit
sees the '"'.

Jörg

--
EMail: jo...@schily.net (home)    Jörg Schilling D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 13 Apr 2018:
>
> I have also just realised that a better example than ${ in "" would
> have been $( inside "".
>
> There, because of the double quotes, the '(' is treated literally, as
> a '(' character, and not as the '(' operator. But still when the
> command substitution (inside the "") is performed, the '(' is available
> to be part of the syntax, and is no longer treated literally at all.
>
> Put that reasoning into the argument in the previous message, instead
> of the ${ version and I think it becomes clearer how the current text
> allows the '-' inside "a-c" to be treated literally, as the string is
> parsed (not that that one would be treated differently anyway, any more
> than the '{' would be in the ${ form) but still have its special
> meaning in character ranges, just as the '(' (or '{') retains its
> special meaning in the expansions.

Neither of your examples is valid because the standard already
explicitly describes the behaviour in those cases. See 2.2.3
Double-Quotes in the part about <dollar-sign>:

    The input characters within the quoted string that are also
    enclosed between "$(" and the matching ')' shall not be affected
    by the double-quotes, ...

    Within the string of characters from an enclosed "${" to the
    matching '}', an even number of unescaped double-quotes or
    single-quotes, if any, shall occur. A preceding <backslash>
    character shall be used to escape a literal '{' or '}'.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
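[Editorial illustration, not part of the original message.] The two rules quoted above can be seen in a short sketch. The variable names and the sample value `file.tar.gz` are hypothetical, chosen only for this example. Inside double quotes the `$` stays special, so the `{`, `%` and `}` after it are expansion syntax and the `(` after `$` starts a command substitution. Inside single quotes nothing is special.

```shell
# Hypothetical sample value, used only for this illustration.
xxx='file.tar.gz'

quoted="${xxx%.gz}"            # double quotes: '{', '%', '}' are still syntax
unquoted=${xxx%.gz}            # same expansion without the quotes
literal='${xxx%.gz}'           # single quotes: every character is literal
cmd="$(printf '%s' "$xxx")"    # '(' after '$' still starts a substitution

printf '%s\n' "$quoted" "$unquoted" "$literal" "$cmd"
# prints, one per line: file.tar  file.tar  ${xxx%.gz}  file.tar.gz
```

The inner double quotes in the `cmd` line are not closed by the outer ones, which matches the quoted rule that text between "$(" and the matching ')' is not affected by the enclosing double-quotes.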
RE: Should shell quoting within glob bracket patterns be effective?
> -Original Message- > From: Robert Elz [mailto:k...@munnari.oz.au] > Put that reasoning into the argument in the previous message, instead > of the ${ version and I think it becomes clearer how the current text > allows the '-' inside "a-c" to be treated literally, as the string is > parsed (not that that one would be treated differently anyway, any more than > the '{' would be in the ${ form) but still have its special meaning in > character ranges, just as the '(' (or '{' retains its special meaning in the > expansions. This is probably Captain Obvious speaking, but in the C standard, I find the concept of -- sequentially applied -- Translation Phases (i.e., Trigraph elimination, backslash newline elimination, preprocessing tokenization, preprocessing, ...) quite illuminating. As the Bourne Shell source code posted earlier showed, that implementation did not clearly separate the phases: a character with its high-bit set was quoted for all further purposes. Perhaps something similar to translation phases could help here. Konrad
Re: Should shell quoting within glob bracket patterns be effective?
I have also just realised that a better example than ${ in "" would
have been $( inside "".

There, because of the double quotes, the '(' is treated literally, as
a '(' character, and not as the '(' operator. But still when the
command substitution (inside the "") is performed, the '(' is available
to be part of the syntax, and is no longer treated literally at all.

Put that reasoning into the argument in the previous message, instead
of the ${ version and I think it becomes clearer how the current text
allows the '-' inside "a-c" to be treated literally, as the string is
parsed (not that that one would be treated differently anyway, any more
than the '{' would be in the ${ form) but still have its special
meaning in character ranges, just as the '(' (or '{') retains its
special meaning in the expansions.

kre
Re: Should shell quoting within glob bracket patterns be effective?
    Date: Thu, 12 Apr 2018 12:10:04 -0700
    From: Don Cragun
    Message-ID:

  | The fact that the $ is special is what is the key.

The problem is in the interpretation of just what "treated literally"
means.

If it just means that "the character is itself and is not transformed
into something else" that's fine, the special $ inside the "" (which is
not treated literally, so it remains the introducer of various
expansions) can then look at the following character, see it is a '{'
(untransformed) and then go on and implement parameter expansion as you
describe -- and the various other characters that have meaning in that
(such as ':' '-' '+' '?' '%' '#' '=') which (being treated literally
all represent themselves) can be part of the syntax.

On the other hand, if "treated literally" means "is itself and can have
no meaning or other interpretation other than being the character
itself" then that '{' must just be a '{' that is part of the string,
and not be co-opted into being part of the sh syntax for parameter
expansions.

I am assuming here that the first interpretation is the desirable one.

Given that, then the "treated literally" '-' in the double quoted (or
for that matter single quoted) string inside a character class, is just
a '-' character, but that can still (as the '{' was in the parameter
expansion) be used as a syntax character in the class - indicating the
range, whether it was double quoted or not.

On the other hand, if you want the '-' to always represent the
character '-' itself, and not be part of the range expression, and you
want to produce that result just from the words "treated literally",
you have to define "treated literally" in the second form, and in that
case, the '{' in the "${..." form cannot be the '{' that is part of the
parameter expansion.

It cannot be both ways.
But let me be clear about something - my point here is not to argue for
changes to all of the shells to meet some bizarre interpretation of the
specification, it is that the text in the standard needs to be
improved, and be explicit about things like this. That it is possible
for me to make an even semi-plausible argument in this way means the
text does not state the intentions nearly clearly enough, and needs to
be made much more precise.

There is a temptation for those who know what it should mean to read
the text, and see that it can mean what it should mean (and perhaps not
even notice that things could be interpreted differently) and then be
happy that all is OK. But when read by someone who has no idea what the
desired outcome is, and has only the words to rely upon, we must be
sure that there is no room at all for misinterpretation.

This is why standards are generally very dry, hard to read, and boring
documents - they need to be precise about every little detail (and yes,
saying something is unspecified or undefined is precise, provided it is
clear what the "something" is).

When 985 is revisited, and the wording for how pattern matching is done
gets revised, just specify all of this precisely - it does not need to
be in the form of some algorithm to achieve the desired result
(although that is one method) but it must make it absolutely clear what
the desired result is, for every possible input.

kre
Re: Should shell quoting within glob bracket patterns be effective?
> On Apr 12, 2018, at 8:07 AM, Robert Elz wrote: > >Date:Thu, 12 Apr 2018 14:25:32 +0100 >From:Geoff Clare >Message-ID: <20180412132532.GA9483@lt2.masqnet> > > | It treats them as literal characters, just as 2.2.3 says. > > I thought that might have been the response, in that case in > > "${xxx}" > > The '{' has to be treated as a literal character, as inside double > quotes, and not being one of the magic few, that's what the text > you quoted says, and apparently, everywhere else in the shell > is supposed to follow that same interpretation. > > That is, the '{' above cannot be treated the same as the one in > > ${xxx} > > (unquoted) where it is a part of the syntax of the variable expansion, > because then it would not be being treated literally. > > Which way do you want it? > > kre The fact that the $ is special is what is the key. Since $ is special and parameter expansion and command substitution are performed inside double-quotes, sections 2.6.2 and 2.6.3 come into play... and that is where {, #, ##, %, %%, and } in parameter expansions may become special and where ( and ) may become special in command substitutions, respectively. Cheers, Don
Re: Should shell quoting within glob bracket patterns be effective?
    Date: Thu, 12 Apr 2018 14:25:32 +0100
    From: Geoff Clare
    Message-ID: <20180412132532.GA9483@lt2.masqnet>

  | It treats them as literal characters, just as 2.2.3 says.

I thought that might have been the response, in that case in

    "${xxx}"

The '{' has to be treated as a literal character, as inside double
quotes, and not being one of the magic few, that's what the text
you quoted says, and apparently, everywhere else in the shell
is supposed to follow that same interpretation.

That is, the '{' above cannot be treated the same as the one in

    ${xxx}

(unquoted) where it is a part of the syntax of the variable expansion,
because then it would not be being treated literally.

Which way do you want it?

kre
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 12 Apr 2018:
>
>     Date: Thu, 12 Apr 2018 13:25:01 +0100
>     From: Geoff Clare
>
>   | Yes there is. I quoted it earlier in this thread.
>
> I know that, but that's useless for this purpose. We know the quoted
> (part of) the string is treated literally, and handed to the pattern
> matching code, exactly as is, with no conversions performed upon it.
>
> But now what is the pattern matching code supposed to do with that
> string?

It treats them as literal characters, just as 2.2.3 says.

> Where does it say (aside from the 985 resolution) that those characters
> mean anything different in a pattern than they would in a pattern given
> anywhere else?

The statement "shall preserve the literal value of all characters" in
2.2.3 is sufficient.

> Eg: if I do
>
>     find . -name '[a-z]*' -print
>
> are you suggesting that because that '[' '-' and '*' are quoted they
> are not to be given their normal pattern (class, range and "all")
> meanings?

The shell passes the literal characters [a-z]* to find. What find does
with those is specified in the description of find.

> Obviously not.
>
> What about:
>
>     find . -name '"[a-z]*"' -print

The shell passes the literal characters "[a-z]*" to find. What find
does with those is specified in the description of find.

> So in
>
>     ls "[a-z]*"
>
> given that quote removal is not performed before the filename
> expansion, exactly what text in the standard says the quotes should be
> treated differently in this than in the 2nd find example above?

The shell passes the literal characters [a-z]* to ls. What ls does
with those is specified in the description of ls.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
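[Editorial illustration, not part of the original message.] The replies above turn on what the shell hands to the utility after quote removal. A minimal sketch, using a hypothetical `show_args` helper as a stand-in for find or ls, makes the three cases visible. Every pattern here is quoted, so no pathname expansion takes place and the output does not depend on the files in the current directory.

```shell
# Hypothetical stand-in for a utility: prints each argument it receives
# between angle brackets, one per line.
show_args() { printf '<%s>\n' "$@"; }

show_args '[a-z]*'      # prints <[a-z]*>    as in the first find example
show_args '"[a-z]*"'    # prints <"[a-z]*">  the inner double quotes survive
show_args "[a-z]*"      # prints <[a-z]*>    as in the ls example
```

What the receiving utility then does with `[a-z]*` or `"[a-z]*"` is a separate question, which is the distinction the replies draw.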
Re: Should shell quoting within glob bracket patterns be effective?
Date: Thu, 12 Apr 2018 13:25:01 +0100
From: Geoff Clare
Message-ID: <20180412122501.GA8783@lt2.masqnet>

  | Yes there is. I quoted it earlier in this thread.

I know that, but that's useless for this purpose. We know the quoted (part of) the string is treated literally, and handed to the pattern matching code, exactly as is, with no conversions performed upon it.

But now what is the pattern matching code supposed to do with that string? Where does it say (aside from the 985 resolution) that those characters mean anything different in a pattern than they would in a pattern given anywhere else?

Eg: if I do

	find . -name '[a-z]*' -print

are you suggesting that because that '[' '-' and '*' are quoted they are not to be given their normal pattern (class, range and "all") meanings? Obviously not.

What about:

	find . -name '"[a-z]*"' -print

There the quotes get handed to find as part of the arg, but quotes mean nothing to pattern matching normally, so this one should look for file names that begin and end with double quote characters, and have a lower-case alpha as the first character after the leading ". Right?

So in

	ls "[a-z]*"

given that quote removal is not performed before the filename expansion, exactly what text in the standard says the quotes should be treated differently in this than in the 2nd find example above?

Note in all of this I am not questioning what should be done, or what shells actually do, but rather how I work out from the text in the standard what should be done.

kre
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 12 Apr 2018: > > Date:Thu, 12 Apr 2018 11:39:11 +0100 > From:Geoff Clare > > | Huh? The '-' is quoted by the double quotes and should therefore be > | treated literally. > > The problem is that there is nothing in either TC2 or TC2 + 985-fix that > says that should happen. Yes there is. I quoted it earlier in this thread. 2.2.3 Double-Quotes Enclosing characters in double-quotes ("") shall preserve the literal value of all characters within the double-quotes, with the exception of the characters backquote, , and -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Date: Thu, 12 Apr 2018 11:39:11 +0100
From: Geoff Clare
Message-ID: <20180412103911.GA6656@lt2.masqnet>

  | Huh? The '-' is quoted by the double quotes and should therefore be
  | treated literally.

The problem is that there is nothing in either TC2 or TC2 + 985-fix that says that should happen. And without that "should" is really just wishing (based upon what shells actually do, or most of them).

The issue is how to specify it so that everything works correctly, for all the cases of sh pattern matching, and for the other users of fnmatch()

Ideally:

	find dir -name 'pattern' -print

should list the same filenames (in a different order/format) as

	ls dir/pattern

lists, for all possible patterns (temporarily ignoring leading dot issues, if there are any), and

	ls dir | while read f
	do
		case "$f" in
		(pattern) printf '%s\n' "$f";;
		esac;
	done

should (again, ignoring '.' issues for now) print the same list.

kre
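The consistency goal described above can be checked mechanically for simple patterns. The following is only a sketch: `match_find`, `match_glob` and `match_case` are hypothetical helper names, file names are assumed to contain no whitespace or newlines, and leading-dot issues are ignored, as in the message.

```shell
# Sketch only: three hypothetical helpers that each print, sorted, the
# names in directory $1 that match pattern $2 via one mechanism.
LC_ALL=C; export LC_ALL    # keep ranges such as [a-c] and sort order predictable

match_find() {
    # -prune keeps find from descending; "./" is stripped for comparison.
    ( cd "$1" && find . ! -name . -prune -name "$2" -print | sed 's|^\./||' | sort )
}

match_glob() {
    # Unquoted $2 is deliberately left open to pathname expansion; an
    # unmatched pattern stays literal and is filtered out by the -e test.
    ( cd "$1" && for f in $2; do
          if [ -e "$f" ]; then printf '%s\n' "$f"; fi
      done | sort )
}

match_case() {
    # Unquoted $2 in a case pattern is treated as a pattern.
    ls "$1" | while read -r f; do
        case $f in
            ($2) printf '%s\n' "$f" ;;
        esac
    done | sort
}
```

With files a1, b2 and d3 in a scratch directory, each helper called with the pattern '[a-c]*' lists a1 and b2. The interesting cases in this thread are patterns containing shell quoting, which cannot be passed through a variable like this and need to appear literally in the script source.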
Re: Should shell quoting within glob bracket patterns be effective?
Date:Thu, 12 Apr 2018 12:10:20 +0200 From:Joerg Schilling Message-ID: <5acf308c.yoyva4vzwwu8t7jp%joerg.schill...@fokus.fraunhofer.de> Jörg: | Since '' and "" quoting in the shell is highly complex and no longer present at | the time the shell pattern matching is called, That's not correct (well, "highly complex" is reasonable) at least according to the standard (rather than how things might be implemented in any particular implementation) In filenames, the order is tilde expansion, (field splitting is irrelevant for present purposes), parameter expansion (and its companions) filename expansion, and finally, quote removal.See "The order of word expansions" and what follows in XCU 2.6. If it were not that way, then ls "*"* would not find files starting with a literal asterisk, but just all files. In case patterns the old (current) standard does no quote removal on the pattern at all - 985 tries to fix that but doesn't get it right. In parameter expansions, the % and # (and %% and ## of course) operators also happen before quote removal, so the pattern matching they do also still has the quote characters. Of course, for these ones the standard says nothing at all about what "matched by pattern" means and just assumes "you know it is a glob style match" and what that means (and we all do it by comparing results from other shells and hoping we haven't missed any weird cases...) Are there any other uses of patterns in (standard) sh? I can't remember any right now/ | it makes no sense to add '' and "" to fnmatch(). That might be true, but assuming that we want fnmatch() to produce the same results as sh does (given the correct flags to indicate what kind of match it should perform) we would need to be very specific about exactly how to translate a quoted shell string into a fnmatch pattern. | To understand quoting, let me explain how the Bourne Shell does it: Once again, this is (kind of) interesting but 100% irrelevant. 
What matters is what the standard says must be done, not how some implementation chooses to implement that. One thing the standard does not say that should be done is to convert one form of quoting into another form (ever, except for the 985 bug resolution I think.) Of course, provided the results are correct, it is fine to do that within an implementation (ash based shells do quoting a totally different way, but also not the posix "leave the quotes in the word") but it is unacceptable to assume that all other implementations must, or should, act that way - or even that their implementors would ever consider doing it that way. As long as posix says to leave the quotes in the word until quote removal, and as long as quote removal happens after pattern matching (or filename expansion for that case) the specification of the pattern matching algorithm must handle ' and " chars in the pattern. And if the pattern matching algorithm is just to be "call fnmatch() with the flags..." (etc) then fnmatch needs to handle them as well. Alternatively, the algorithm could be "convert quoted strings in the pattern as ..[to be completed].. and then call fnmatch() using the modified pattern, then fnmatch does not need to handle quotes. Which is better largely depends upon just how flexible we want the fnmatch() function to be - that is, must all callers deal with quoting (if their context allows that) somehow, before calling it ? What the standard specifies should however match what the implementations actually do (or at least most of them.) kre ps: it was interesting to see that the (ancient algol68 style) code fragment you sent in the earlier message did not handle a ']' as the first char of a class correctly (meaning a ']' in the class instead of being the ending delimiter). I don't remember ever encountering that issue back when I used that shell - of course wanting ']' in c char class is not common, so it is perhaps not too surprising. 
And wrt that message - for present purposes, it would be better to run tests using case pattern matching rather than filename expansion. For filename expansion it is quite clear that quote removal happens after the pattern matching, so the shell is free to interpret the quote chars. For case patterns it is not so clear what should be done.
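The point that quoting is still visible when pattern matching happens can be illustrated with two small cases taken from the message: the "*"* filename expansion and the % operator. A minimal sketch, with hypothetical helper names:

```shell
# list_star_prefixed DIR: hypothetical helper that expands "*"* inside DIR.
# The quoted * must match a literal asterisk, the unquoted * anything.
list_star_prefixed() {
    ( cd "$1" && printf '%s\n' "*"* )
}

# strip_star WORD: hypothetical helper that removes one trailing literal
# asterisk, using a quoted * in the pattern of the % operator.
strip_star() {
    printf '%s\n' "${1%"*"}"
}
```

In a directory holding two files named `*star` and `plain`, `list_star_prefixed` prints only `*star`, and `strip_star 'log*'` prints `log`. If the quotes had been removed before matching, the first would list every file.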
Re: Should shell quoting within glob bracket patterns be effective?
Geoff Clare wrote:

> > > ["a\-c"] the backslash is not special and should be treated literally
> >
> > This string is converted into [\a\\-\c] by the shell macro expansion code.
> >
> > With the shell gmatch() code, this results in a match for 'a' and '\\' ..
> > 'c'.
>
> Huh? The '-' is quoted by the double quotes and should therefore be
> treated literally. It should match only 'a', backslash, '-' and 'c'
> (and that's what I observe in bash, although ksh88 for some reason only
> matches 'a', backslash and 'c' which looks like a bug).

You are right, Sorry, I missed one backslash before the '-'. The resulting pattern is:

	[\a\\\-\c]

Here is an updated script "tsh":

	if [ "$BASH_VERSION" != "" ]; then
		echo() { command echo -e "$@"; }
	fi
	chk() { echo [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]; }
	mkdir td && cd td || exit
	printf '%s\n' '---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]'
	echo ": \c"; chk
	:> a; echo "a: \c"; chk; rm a
	:> b; echo "b: \c"; chk; rm b
	:> ./-; echo "-: \c"; chk; rm ./-
	:> c; echo "c: \c"; chk; rm c
	:> _; echo "_: \c"; chk; rm _
	:> \\; echo "\\: \c"; chk; rm \\
	:> d; echo "d: \c"; chk; rm d
	rm -f *
	cd ..
	rmdir td

and with:

	for i in sh ksh ksh93 bosh bash mksh dash; do echo; echo $i:; $i ./tsh; done

you get this result:

sh:
--->  [a-c]  ["a-c"]  ["a\-c"]  [\a\-\c]  [a\-c]
:     [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
a:    a      a        a         a         a
b:    b      [a-c]    [a\-c]    [a-c]     [a-c]
-:    [a-c]  -        -         -         -
c:    c      c        c         c         c
_:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
\:    [a-c]  [a-c]    \         [a-c]     [a-c]
d:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]

ksh:
--->  [a-c]  ["a-c"]  ["a\-c"]  [\a\-\c]  [a\-c]
:     [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
a:    a      a        a         a         a
b:    b      [a-c]    [a\-c]    [a-c]     b
-:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
c:    c      c        c         c         c
_:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
\:    [a-c]  \        \         \         \
d:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]

ksh93:
--->  [a-c]  ["a-c"]  ["a\-c"]  [\a\-\c]  [a\-c]
:     [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
a:    a      a        a         a         a
b:    b      b        b         [a-c]     [a-c]
-:    [a-c]  [a-c]    [a\-c]    -         -
c:    c      c        c         c         c
_:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
\:    [a-c]  [a-c]    \         [a-c]     [a-c]
d:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]

bosh:
--->  [a-c]  ["a-c"]  ["a\-c"]  [\a\-\c]  [a\-c]
:     [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
a:    a      a        a         a         a
b:    b      [a-c]    [a\-c]    [a-c]     [a-c]
-:    [a-c]  -        -         -         -
c:    c      c        c         c         c
_:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
\:    [a-c]  [a-c]    \         [a-c]     [a-c]
d:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]

bash:
--->  [a-c]  ["a-c"]  ["a\-c"]  [\a\-\c]  [a\-c]
:     [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
a:    a      a        a         a         a
b:    b      [a-c]    [a\-c]    [a-c]     [a-c]
-:    [a-c]  -        -         -         -
c:    c      c        c         c         c
_:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
\:    [a-c]  [a-c]    \         [a-c]     [a-c]
d:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]

mksh:
--->  [a-c]  ["a-c"]  ["a\-c"]  [\a\-\c]  [a\-c]
:     [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
a:    a      a        a         a         a
b:    b      [a-c]    [a\-c]    [a-c]     [a-c]
-:    [a-c]  -        -         -         -
c:    c      c        c         c         c
_:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
\:    [a-c]  [a-c]    \         [a-c]     [a-c]
d:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]

dash:
--->  [a-c]  ["a-c"]  ["a\-c"]  [\a\-\c]  [a\-c]
:     [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]
a:    a      a        a         a         a
b:    b      [a-c]    b         [a-c]     [a-c]
-:    [a-c]  -        [a\-c]    -         -
c:    c      c        c         c         c
_:    [a-c]  [a-c]    _         [a-c]     [a-c]
\:    [a-c]  [a-c]    \         [a-c]     [a-c]
d:    [a-c]  [a-c]    [a\-c]    [a-c]     [a-c]

So there is a new bug in "dash", as dash matches '_' for your example and '_' is inside the range '\\' .. 'c'.
Jörg -- EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'
Re: Should shell quoting within glob bracket patterns be effective?
Joerg Schilling wrote, on 12 Apr 2018: > > Geoff Clare wrote: > > > > Maybe, I should again mention history: > > > > > > - fmnatch() has been introduced with issue 4 (1995). It does not > > > seem to be related to a historic UNIX. Since the oldest known > > > implementation is from IBM, fnmatch() may have been introduced > > > by AIX. > > > > It was first standardised in POSIX.2-1992 and was invented by the developers > > of that standard. > > So fnmatch() could be seen as an artificial invention and there is no need to > have fnmatch() to behave the same as the shell. It performs filename/pathname pattern matching as done by find and pax. It is only the same as the shell when there is no shell quoting involved. > > You are conflating two different type of backslash escape. > > > > The shell should honour backslash when used as shell quoting, regardless > > of whether it is inside a bracket expression, but should not treat a > > backslash in a bracket expression *that is part of the pattern* (i.e. not > > shell quoting) as special. > > > > For example: > > > > [\"] the backslash quotes the " > > This string is converted to [\"] by the parser and there is no PS2 prompt. > > > > ["a\-c"] the backslash is not special and should be treated literally > > This string is converted into [\a\\-\c] by the shell macro expansion code. > > With the shell gmatch() code, this results in a match for 'a' and '\\' .. 'c'. Huh? The '-' is quoted by the double quotes and should therefore be treated literally. It should match only 'a', backslash, '-' and 'c' (and that's what I observe in bash, although ksh88 for some reason only matches 'a', backslash and 'c' which looks like a bug). -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Geoff Clare wrote:

> > Maybe, I should again mention history:
> >
> > - fnmatch() has been introduced with issue 4 (1995). It does not
> > seem to be related to a historic UNIX. Since the oldest known
> > implementation is from IBM, fnmatch() may have been introduced
> > by AIX.
>
> It was first standardised in POSIX.2-1992 and was invented by the developers
> of that standard.

So fnmatch() could be seen as an artificial invention and there is no need to have fnmatch() behave the same as the shell. It would however be nice to be able to switch it into that mode (see my FNM_CLASSESC proposal).

> You are conflating two different type of backslash escape.
>
> The shell should honour backslash when used as shell quoting, regardless
> of whether it is inside a bracket expression, but should not treat a
> backslash in a bracket expression *that is part of the pattern* (i.e. not
> shell quoting) as special.
>
> For example:
>
> [\"] the backslash quotes the "

This string is converted to [\"] by the parser and there is no PS2 prompt.

> ["a\-c"] the backslash is not special and should be treated literally

This string is converted into [\a\\-\c] by the shell macro expansion code.

With the shell gmatch() code, this results in a match for 'a' and '\\' .. 'c'.

So I guess that you misinterpret my text and the results from my test script. Let me add an updated version that includes a test for "-":

	mkdir td && cd td || exit
	:> a
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm a
	:> b
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm b
	:> ./-
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm ./-
	:> c
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm c
	:> _
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm _
	:> \\
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm \\
	:> d
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm d
	rm -f *
	cd ..
	rmdir td

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote:

> And yes, in particular if a [a\-c] means a class with the three chars 'a' '-'
> and 'c' in it in sh it should mean that in fnmatch() as well, or if that
> pattern means a class with 8 chars (0x5c .. 0x63) with 'a' there 2 ways,
> in fnmatch() then it should mean that in sh as well.

My tests verify that all modern shells including ksh93 match three chars for [a\-c]

> On the other hand, having sh allow '' and "" quoting in addition to \ quoting
> while not supporting that in fnmatch() is possible using a technique like
> that in what was intended to be the 985 resolution - just provided that it
> handles all of the cases correctly.

Since '' and "" quoting in the shell is highly complex and no longer present at the time the shell pattern matching is called, it makes no sense to add '' and "" to fnmatch().

To understand quoting, let me explain how the Bourne Shell does it:

1) The parser keeps \a and converts 'a' into \a

2) The parser retains " in strings

3) The interpreter calls the macro expansion code and this code replaces the extended strings inside "" by quote chars (e.g. "abc" into \a\b\c).

4) The file name globbing is done for command arguments and gmatch() is called for "case" statements, using the current state of the string that results in:

	\aq\o\oq\a\b\c for 'a'q'oo'q"abc"

If you would like fnmatch() to match the behavior of the shell related to character classes, this could be done using a new flag FNM_CLASSESC.

Jörg

--
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
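The conversion described in steps 1) to 4) can be imitated in a few lines of portable shell. This is only a sketch of the idea, not the Bourne Shell's code: `to_backslash_form` is a hypothetical helper, it handles plain '...' and "..." segments only, and it ignores expansions and backslashes inside the input.

```shell
# to_backslash_form TEXT: hypothetical helper. Rewrites shell-quoted
# source text so that every character that was inside '' or "" is
# prefixed with a backslash and the quote characters themselves vanish.
# Limitation (assumption of this sketch): no $-expansions and no
# backslashes are interpreted in TEXT.
to_backslash_form() {
    s=$1 out= mode=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        ch=${s%"$rest"}        # the first character itself
        s=$rest
        case $mode$ch in
            "''" | '""') mode= ;;           # closing quote of the open kind
            "'" | '"')   mode=$ch ;;        # opening quote while unquoted
            ?)           out=$out$ch ;;     # unquoted ordinary character
            *)           out=$out\\$ch ;;   # quoted character: add a backslash
        esac
    done
    printf '%s\n' "$out"
}

to_backslash_form "'a'q'oo'q\"abc\""
```

For the example word from step 4) this prints `\aq\o\oq\a\b\c`, the same string the message gives.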
Re: Should shell quoting within glob bracket patterns be effective?
Joerg Schilling wrote, on 12 Apr 2018: > > It seems that we need to define how quoting works in a real shell > implementation. > > If we require the strings to be in the form \c\h\a\r in case of a quoted > string > at that specific part of the shell, we may explain how quoting works for > "case" > statements. > > Maybe, I should again mention history: > > - fmnatch() has been introduced with issue 4 (1995). It does not > seem to be related to a historic UNIX. Since the oldest known > implementation is from IBM, fnmatch() may have been introduced > by AIX. It was first standardised in POSIX.2-1992 and was invented by the developers of that standard. [...] > Now let us check the behavior of various shells with the following script: > [...] > As we can see: > > - The Bourne Shell interprets backshlash escapes > inside character classes. > > - All other (relevant) shells behave identically > except ksh88 and ksh93 > > - ksh88 does not honor backslashes inside a > character class. Since ksh93 changes this back to the > original Bourne Shell behavior, I would call it a bug. > > - ksh93 interprets ["a-c"] different from all > other shells, but again interprets backshlash escapes > inside character classes. > > I remember that I received a report from someone (maybe > Martijn Dekker or Thorsten Glaser) that ksh93 has problems > with " inside some expressions. > > I would call the single deviation seen in ksh93 a bug. > The reason for this other behavior does not seem to be related > to pattern matching but to the way quote removal has been > implemented. > > Conclusion: > > Since the behavior of fnmatch() is currently not able to match the behavior > of the shell matcher, I propose to add a new flag for fnmatch() to switch > it into the shell mode that honors backslashes inside character sets. You are conflating two different type of backslash escape. 
The shell should honour backslash when used as shell quoting, regardless of whether it is inside a bracket expression, but should not treat a backslash in a bracket expression *that is part of the pattern* (i.e. not shell quoting) as special. For example: [\"] the backslash quotes the " ["a\-c"] the backslash is not special and should be treated literally If you want fnmatch() to be able to work like the shell you would need the new flag to turn on all shell quoting (i.e. backslash, double-quotes and single-quotes), not just backslash. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
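The two kinds of backslash can be probed per shell with a small case-based test. This is a sketch under stated assumptions: `matches` is a hypothetical helper, and the pattern is passed as shell source text and re-parsed with eval so that the quoting is seen exactly as written. Results for '-', backslash and 'b' differ between shells, as the tables elsewhere in this thread show, so no particular output is claimed for those.

```shell
# matches PATTERN_SOURCE CHAR: hypothetical probe. PATTERN_SOURCE is shell
# source text for a case pattern, quoting included; eval re-parses it.
# Only pass trusted pattern text, since eval executes it.
matches() {
    probe_ch=$2
    eval "case \$probe_ch in ($1) echo yes ;; (*) echo no ;; esac"
}

matches '[\"]' '"'        # backslash used as shell quoting of the "
matches '["a\-c"]' a      # 'a' is in the class under every reading
matches '["a\-c"]' -      # yes if the quoted - is taken literally
matches '["a\-c"]' '\'    # yes if the backslash is a literal class member
matches '["a\-c"]' b      # no if the quoted - is taken literally
```

Running the same file under several shells, as the test scripts in this thread do, shows where implementations disagree.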
Re: Should shell quoting within glob bracket patterns be effective?
Date:Thu, 12 Apr 2018 09:24:51 +0100 From:Geoff Clare Message-ID: <20180412082451.GA3949@lt2.masqnet> | Bug 985 moves the detail from the current 2.13 into the fnmatch() | description and makes 2.13 refer to fnmatch(). Oh - I did not read all of it all that carefully - just the actual descriptions of how it was to work. I see no problem with that approach though -- that was what I intended to say before I read the (old, TC2) fnmatch() page. Both the shell, and the function, should act the same - if they don't the function isn't nearly as useful. Given that having the description in one, and referring to it from the other seems appropriate, and I don't see that it matters much which way it is done. And yes, in particular if a [a\-c] means a class with the three chars 'a' '-' and 'c' in it in sh it should mean that in fnmatch() as well, or if that pattern means a class with 8 chars (0x5c .. 0x63) with 'a' there 2 ways, in fnmatch() then it should mean that in sh as well. On the other hand, having sh allow '' and "" quoting in addition to \ quoting while not supporting that in fnmatch() is possible using a technique like that in what was intended to be the 985 resolution - just provided that it handles all of the cases correctly. kre
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote: > Date:Wed, 11 Apr 2018 15:14:00 +0100 > From:Geoff Clare > Message-ID: <20180411141400.GA32463@lt2.masqnet> > > | I also have a feeling we will have to abandon the neat idea of defining > | shell pattern matching in terms of fnmatch(). > > Yes, but for a slightly different reason - fnmatch() doesn't describe how > the matching works, it just refers to XCU 2.13 for that info. What it > describes is a function that applications can call that does the same > kind of matching as the shell does. ... > Delete "quote removal" and in the description of how matching works, the > quoting characters can be made to mean what they should mean for > patterns - nothing needs to be "removed" here, as the pattern is just used > for matching, the only result is matched or not matched. Quoting just > affects > the interpretation of the quoted characters, and otherwise matches nothing. It seems that we need to define how quoting works in a real shell implementation. If we require the strings to be in the form \c\h\a\r in case of a quoted string at that specific part of the shell, we may explain how quoting works for "case" statements. Maybe, I should again mention history: - fmnatch() has been introduced with issue 4 (1995). It does not seem to be related to a historic UNIX. Since the oldest known implementation is from IBM, fnmatch() may have been introduced by AIX. - The historic Bourne Shell used it's own implementation in expand.c: case '[': {BOOL ok; INT lc; ok=0; lc=07; WHILE c = *p++ DO IF c==']' THENreturn(ok?gmatch(s,p):0); ELIF c==MINUS THENIF lc<=scc ANDF scc<=(*p++) THEN ok++ FI ELSEIF scc==(lc=(c&STRIP)) THEN ok++ FI FI OD return(0); } and this shows very obviously that [a\-c] is subject to quoting as otherwise the code needs to read: ELIF (c&STRIP)==MINUS since the 1977 Bourne Shell did pass "[a\334c]" to the matching function in expand.c if the command line was [a\-c]. 
- A classical AT&T based UNIX in the late 1980s did have a library "libgen" with a function gmatch() inside that behaves like the code above, but by understanding "[a\-c]" instead of "[a\334c]".

Now let us check the behavior of various shells with the following script:

	mkdir td && cd td || exit
	:> a
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm a
	:> b
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm b
	:> _
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm _
	:> \\
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm \\
	:> d
	echo [a-c] ["a-c"] [\a\-\c] [a\-c]
	rm d
	rm -f *
	cd ..
	rmdir td

---

This results in the following (one output line per test file, in the order a, b, _, \, d):

Bourne Shell:
a      a      a      a
b      [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]

ksh88:
a      a      a      a
b      [a-c]  [a-c]  b
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  \      \      \
[a-c]  [a-c]  [a-c]  [a-c]

ksh93:
a      a      a      a
b      b      [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]

bosh:
a      a      a      a
b      [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]

bash:
a      a      a      a
b      [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]

mksh:
a      a      a      a
b      [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]

dash:
a      a      a      a
b      [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]
[a-c]  [a-c]  [a-c]  [a-c]

As we can see:

- The Bourne Shell interprets backslash escapes inside character classes.

- All other (relevant) shells behave identically except ksh88 and ksh93

- ksh88 does not honor backslashes inside a character class. Since ksh93 changes this back to the original Bourne Shell behavior, I would call it a bug.

- ksh93 interprets ["a-c"] differently from all other shells, but again interprets backslash escapes inside character classes.

  I remember that I received a report from someone (maybe Martijn Dekker or Thorsten Glaser) that ksh93 has problems with " inside some expressions.

  I would call the single deviation seen in ksh93 a bug.
The reason for this other behavior does not seem to be related to pattern matching but to the way quote removal has been implemented. Conclusion: Since the behavior of fnmatch() is currently not able to match the behavior of the shell matcher, I propose to add a new flag for fnmatch() to switch it into the shell mode that honors backslashes inside character sets. Jörg -- EMail:jo...@schily.
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 12 Apr 2018: > > Date:Wed, 11 Apr 2018 15:14:00 +0100 > From:Geoff Clare > > | I also have a feeling we will have to abandon the neat idea of defining > | shell pattern matching in terms of fnmatch(). > > Yes, but for a slightly different reason - fnmatch() doesn't describe how > the matching works, it just refers to XCU 2.13 for that info. Bug 985 moves the detail from the current 2.13 into the fnmatch() description and makes 2.13 refer to fnmatch(). -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Date: Wed, 11 Apr 2018 15:14:00 +0100
From: Geoff Clare
Message-ID: <20180411141400.GA32463@lt2.masqnet>

  | I also have a feeling we will have to abandon the neat idea of defining
  | shell pattern matching in terms of fnmatch().

Yes, but for a slightly different reason - fnmatch() doesn't describe how the matching works, it just refers to XCU 2.13 for that info. What it describes is a function that applications can call that does the same kind of matching as the shell does.

Describing how matching works in terms of fnmatch() is just a convoluted path to get back to "it works like XCU 2.13 says", except that if that is in 2.13 itself, we have infinite recursion. All that is really gained is the ability to use the fnmatch() flags as a shorthand for their meanings, and that just isn't worth it.

I think the real problem however is the reliance on XBD 9.3.5 - delete that (the reference, not the section) and describe glob character classes.

Delete "quote removal" and in the description of how matching works, the quoting characters can be made to mean what they should mean for patterns - nothing needs to be "removed" here, as the pattern is just used for matching, the only result is matched or not matched. Quoting just affects the interpretation of the quoted characters, and otherwise matches nothing.

kre
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 11 Apr 2018: > > Date:Wed, 11 Apr 2018 12:18:27 +0100 > From:Geoff Clare > > | No, that text is very careful to say "*was* quoted", not "is quoted", > | for precisely this reason. To conform to this requirement, the shell > | has to remember which characters were quoted when it removes the quotes. > | How this is done is a matter for the implementor. > > Yes, I saw that, but was quoted when? > > Eg: > > var='!a' > eval 'case b in *["$var"]*) echo match;;esac' > > There in the case statement, everything "was" quoted once. > > So that means we now are required to convert the case pattern to > > \*\[\!\a\]\* > > does it? Good point. [...] > Beyond that, to get back to the example in the original message, once we > get past this "was quoted" stuff, we still need to deal with the later words > in the same sentence: > > and is not in a bracket expression is prefixed by a backslash > > That is, in the (approximately) original example > > case b in ["$var"]) ... > > the "was quoted" is irrelevant, either way, as this is in a bracket > expression, > and so the \ is not added, and we end up with > > case b in [!a]) ... > not > case b in [\!\a] > > and even if we somehow interpret XBD 9.3.5 as allowing the latter to mean > a literal ! and a literal a are in the class (which is beyond stretching the > language, it is downright breaking it) it does not matter, as that is not > what we get, we get the former. Ouch. Looks like we do need to revisit bug 985. I also have a feeling we will have to abandon the neat idea of defining shell pattern matching in terms of fnmatch(). I can't see any way to modify that new 2.13 text so that it describes the correct behaviour of quoted '!', '-', "[.", etc. in a bracket expression. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Date: Wed, 11 Apr 2018 14:14:18 +0200
From: Joerg Schilling
Message-ID: <5acdfc1a.st0sitei1fsdsgeb%joerg.schill...@fokus.fraunhofer.de>

  | Then we should change the wording.

I agree.

  | The characters '.', '*' and '[' really lose their special meaning inside a
  | character class.

Yes.

  | The '\\' on the other side always allowed to escape the meaning of '-' and the
  | meaning of any other char, see the original code fragment from 1977:

In sh glob expressions, yes, but not in classes in RE's. One of the issues is that the standard is trying too hard for consistency, and so rather than re-specify char classes for glob, it simply defers to char classes in REs, and because of that, gets all of this wrong.

  | All modern implementations I am aware of do something similar with explicit '\\'
  | chars in the string.

Yes, I know - the question isn't what implementations do, or even should do, but what the standard says they should do. And how that is incorrect.

  | So the reason for the deviating behavior of ksh93 may be that it tries to
  | follow 9.3.5 that does not seem to be aligned with the Bourne Shell and ksh88.

That very well may be.

kre
Re: Should shell quoting within glob bracket patterns be effective?
Date:Wed, 11 Apr 2018 12:18:27 +0100 From:Geoff Clare Message-ID: <2018041827.GA29286@lt2.masqnet> | No, that text is very careful to say "*was* quoted", not "is quoted", | for precisely this reason. To conform to this requirement, the shell | has to remember which characters were quoted when it removes the quotes. | How this is done is a matter for the implementor. Yes, I saw that, but was quoted when? Eg: var='!a' eval 'case b in *["$var"]*) echo match;;esac' There in the case statement, everything "was" quoted once. So that means we now are required to convert the case pattern to \*\[\!\a\]\* does it? It really is not a good idea to try and craft minimal words that seem to achieve the desired result - "was quoted" is just too vague. Once again, after quote removal, nothing is quoted. When the code looked at the pattern, nothing was quoted. Or it all was quoted. Until you specify just what the "was" refers to. There is nothing in the text that actually requires the implementation to do what you suggest, because there's nothing to tell it how far back in time that "was quoted" really means. It might seem obvious to you, but obvious to you isn't the right solution. Beyond that, to get back to the example in the original message, once we get past this "was quoted" stuff, we still need to deal with the later words in the same sentence: and is not in a bracket expression is prefixed by a backslash That is, in the (approximately) original example case b in ["$var"]) ... the "was quoted" is irrelevant, either way, as this is in a bracket expression, and so the \ is not added, and we end up with case b in [!a]) ... not case b in [\!\a] and even if we somehow interpret XBD 9.3.5 as allowing the latter to mean a literal ! and a literal a are in the class (which is beyond stretching the language, it is downright breaking it) it does not matter, as that is not what we get, we get the former. kre
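The behaviour at stake in the `["$var"]` example can be observed directly. A minimal sketch with hypothetical function names; the outcome noted below is what the thread treats as the expected behaviour and what bash shows, while other shells may differ, which is part of what is being discussed.

```shell
var='!a'

# class_quoted / class_unquoted are hypothetical names for this sketch.
# They differ only in whether $var is double-quoted inside the brackets.
class_quoted() {
    case $1 in
        (["$var"]) echo match ;;
        (*)        echo "no match" ;;
    esac
}

class_unquoted() {
    case $1 in
        ([$var]) echo match ;;
        (*)      echo "no match" ;;
    esac
}

class_quoted b      # quoted: ! and a are meant as literal class members
class_quoted '!'
class_unquoted b    # unquoted: ! negates, so the class is "anything but a"
class_unquoted a
```

When the quoted `!` is treated as literal, `class_quoted` matches only `!` and `a`; the unquoted form matches any single character other than `a`. Text that adds no backslash inside a bracket expression would describe the unquoted behaviour for both.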
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote:

> Date: Wed, 11 Apr 2018 11:28:38 +0200
> From: Joerg Schilling
> Message-ID: <5acdd546.fkln7tigk21a+de6%joerg.schill...@fokus.fraunhofer.de>
>
> | The problem is that the term "quote removal" is not related to a real verified
> | shell implementation but rather explained by means of abstract wording that
> | tries to avoid being too close to a real algorithm.
>
> Yes, I know that - and that's fine, provided what is specified actually works
> (so if someone were to implement it exactly as described, everything would
> work.) I am certainly not expecting that real, useful, implementations would
> be done that way.

Sometimes, I believe that it would help to understand the text if there was
a real world example algorithm, e.g. what I mentioned on how Bourne Shell and
ksh do it.

> | In special: your example ["] does not work as your text might mean.
> |
> |     echo ["]
>
> No, I know that - in my first message on this topic (I think) I made that
> clear - [ and ] are just word characters to the lexer, and mean nothing
> at all (no different than a (except they're not alpha) or _ or %. Quotes
> need to be paired (must have a beginning and end), a better example
> at that level would be
>
>     echo [""]
>
> which should (in some obscure theory) match a file named " if
> one exists, and in that case print just "\n (double quote followed by
> newline)

Well, since the argument is first passed through the macro expansion that
removes "" and prefixes all internal characters (currently none) with a \,
this never did match a file named ".

> | > case "$x" in '*') echo found an asterisk;; esac
> | > case "$x" in \*) echo found an asterisk;; esac
> |
> | Both commands are 100% equivalent:
>
> They are as currently implemented, yes, but not as specified, either before
> or after 985.

Then we should change the wording.

> The history lesson of how the Bourne shell worked, and has been changed
> over time, is interesting to read, but in no way really relevant to anything.
>
> What we need to specify is how shells (in general) actually work - what the
> users can rely upon safely using in their scripts, and what they cannot.
>
> | and since the '-' is quoted, this does not match, as the pattern is equivalent
> | to: [a\-c] that just lists the three characters 'a', '-' and 'c'.
>
> Except that in char classes as defined in XBD 9.3.5 (which XCU 2.13 defers
> to, except for the change of ^ into ! for sh globs) does not treat \ as any
> kind of quoting character:
>
>     The special characters '.', '*', '[', and '\\' (<period>, <asterisk>,
>     <left-square-bracket>, and <backslash>, respectively) shall lose
>     their special meaning within a bracket expression.

The characters '.', '*' and '[' really lose their special meaning inside a
character class. The '\\', on the other hand, has always been allowed to
escape the meaning of '-' and the meaning of any other char; see the original
code fragment from 1977:

    SWITCH c = *p++ IN
    case '[':
        {BOOL ok; INT lc;
        ok=0; lc=07;
        WHILE c = *p++
        DO  IF c==']'
            THEN    return(ok?gmatch(s,p):0);
            ELIF c==MINUS
            THEN    IF lc<=scc ANDF scc<=(*p++) THEN ok++ FI
            ELSE    IF scc==(lc=(c&STRIP)) THEN ok++ FI
            FI
        OD
        return(0);
        }

Check: "ELIF c==MINUS" here, as the parser in the original Bourne Shell
converted the string \- into "'-' + 0200", which does not match c==MINUS.

All modern implementations I am aware of do something similar with explicit
'\\' chars in the string.

So the reason for the deviating behavior of ksh93 may be that it tries to
follow 9.3.5, which does not seem to be aligned with the Bourne Shell and
ksh88.

Jörg

-- 
EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
Re: Should shell quoting within glob bracket patterns be effective?
Date: Wed, 11 Apr 2018 11:28:38 +0200
From: Joerg Schilling
Message-ID: <5acdd546.fkln7tigk21a+de6%joerg.schill...@fokus.fraunhofer.de>

| The problem is that the term "quote removal" is not related to a real verified
| shell implementation but rather explained by means of abstract wording that
| tries to avoid being too close to a real algorithm.

Yes, I know that - and that's fine, provided what is specified actually works
(so if someone were to implement it exactly as described, everything would
work.) I am certainly not expecting that real, useful, implementations would
be done that way.

| In special: your example ["] does not work as your text might mean.
|
|     echo ["]

No, I know that - in my first message on this topic (I think) I made that
clear - [ and ] are just word characters to the lexer, and mean nothing
at all (no different than a (except they're not alpha) or _ or %. Quotes
need to be paired (must have a beginning and end), a better example
at that level would be

    echo [""]

which should (in some obscure theory) match a file named " if one exists,
and in that case print just "\n (double quote followed by newline) or match
nothing, and then print [""]\n (the arg unchanged, followed by newline) if
no file called " exists. (substitute "printf '%s\n'" for "echo" if you
prefer, just to avoid any "echo should do..." discussions.)

Note: I don't believe any shell actually implements things that way, and I
don't think it would be useful to make them - quoting at sh script level is
more useful than character class purity - it just needs to be specified
properly, and currently, we do not have that.

| > case "$x" in '*') echo found an asterisk;; esac
| > case "$x" in \*) echo found an asterisk;; esac
|
| Both commands are 100% equivalent:

They are as currently implemented, yes, but not as specified, either before
or after 985.

The history lesson of how the Bourne shell worked, and has been changed
over time, is interesting to read, but in no way really relevant to anything.

What we need to specify is how shells (in general) actually work - what the
users can rely upon safely using in their scripts, and what they cannot.

| and since the '-' is quoted, this does not match, as the pattern is equivalent
| to: [a\-c] that just lists the three characters 'a', '-' and 'c'.

Except that in char classes as defined in XBD 9.3.5 (which XCU 2.13 defers
to, except for the change of ^ into ! for sh globs) does not treat \ as any
kind of quoting character:

    The special characters '.', '*', '[', and '\\' (<period>, <asterisk>,
    <left-square-bracket>, and <backslash>, respectively) shall lose
    their special meaning within a bracket expression.

Nothing in XCU 2.13 contradicts that or says it does not apply.

Hence, according to the standard, that class [a\-c] should match any one
of the characters

    \ ] ^ _ ` a b c

(that is, an a or anything between \ and c which in ascii anyway, is that
set of chars - 'a' matches both literally, and as a character in the range,
but that is OK, just as [ba-c] is OK).

Again, I know that's not how shells work, which is why it is under
discussion here, the text needs to be fixed to specify what the shells
actually do - properly.

kre
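The range kre lists for the literal reading of [a\-c] can be checked by enumerating the ASCII codes from backslash (92) to 'c' (99). The following is only an illustrative sketch: the helper name range_chars is made up for this note, and it assumes an ASCII-compatible locale and a POSIX printf.

```shell
# range_chars FIRST LAST: print the characters whose decimal codes lie in
# FIRST..LAST, separated by spaces. Hypothetical helper, ASCII assumed.
range_chars() {
    code=$1
    out=
    while [ "$code" -le "$2" ]; do
        # The inner printf yields the 3-digit octal code; the outer printf
        # interprets \ooo in its format string as that character.
        ch=$(printf "\\$(printf '%03o' "$code")")
        out="$out${out:+ }$ch"
        code=$((code + 1))
    done
    printf '%s\n' "$out"
}

range_chars 92 99
```

This only illustrates the arithmetic of the range; it says nothing about which interpretation a given shell implements.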
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 11 Apr 2018:
>
> Incidentally, I know that this part of the 985 new text ...
>
>     the first argument (pattern) is the same as patt, except each character
>     that was quoted in patt and is not in a bracket expression is prefixed
>     by a backslash
>
> is intended to handle this problem, except it cannot - once we have done quote
> removal, what "was quoted" is lost, either we have the quotes, and know what is
> quoted, or we don't, and don't.

No, that text is very careful to say "*was* quoted", not "is quoted",
for precisely this reason. To conform to this requirement, the shell
has to remember which characters were quoted when it removes the quotes.
How this is done is a matter for the implementor.

-- 
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: Should shell quoting within glob bracket patterns be effective?
Geoff Clare wrote:

> Here's a much simpler demonstration of the same "quoting within
> brackets" issue:
>
> $ ls
> b
>
> ksh93:
> $ echo ["a-c"]
> b
>
> ksh88 and bash:
> $ echo ["a-c"]
> [a-c]
>
> As Joerg pointed out, the intention would have been for POSIX to
> specify the ksh88 behaviour, so this should be considered to be bug
> in ksh93.

Thank you for this nice example, as it helps to verify a behavior that I
believed was impossible to verify.

Jörg
Re: Should shell quoting within glob bracket patterns be effective?
Geoff Clare wrote:

> Robert Elz wrote, on 11 Apr 2018:
> >
> > Lower down, it says ...
> >
> >     In order from the beginning to the end of the case statement, each pattern
> >     that labels a compound-list shall be subjected to tilde expansion, parameter
> >     expansion, command substitution, and arithmetic expansion, and the result
> >         [note: no quote removal]
> >     of these expansions shall be compared against the expansion of word,
>
> The missing quote removal here is a known defect in the standard.
> See http://austingroupbugs.net/view.php?id=985
>
> > Not doing quote removal on patterns is correct.
>
> No it isn't. As bug 985 notes:
>
> $ case 'foo bar' in "foo bar") echo "quotes removed";; esac
> quotes removed

In the Bourne Shell, this matches the C-string "foo bar" against the pattern
\f\o\o\ \ \b\a\r since the "case pattern" is subject to macro expansion, which
expands '"' quoted strings to strings with quoted characters.

Jörg
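The macro-expansion step described here (a double-quoted string becoming a string in which every character carries a backslash, such as "abc" turning into \a\b\c as described elsewhere in this thread) can be modelled in a few lines of shell. This is a rough sketch of the idea only, not code from any real shell; the function name backslash_each is invented for illustration.

```shell
# backslash_each STRING: print STRING with a backslash in front of every
# character, modelling how a quoted word could be handed to the matcher.
# Hypothetical helper; real shells do this internally, not in script.
backslash_each() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        tail=${rest#?}                  # everything after the first character
        out="$out\\${rest%"$tail"}"     # append backslash + first character
        rest=$tail
    done
    printf '%s\n' "$out"
}

backslash_each 'abc'
backslash_each 'a-c'
```

With such a representation the matcher can tell a quoted '-' (written \-) from an unquoted range operator without needing to see the original quote characters.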
Re: Should shell quoting within glob bracket patterns be effective?
Date: Wed, 11 Apr 2018 10:00:27 +0100
From: Geoff Clare
Message-ID: <20180411090027.GA18582@lt2.masqnet>

| There is nothing to suggest that this does not apply to the characters
| which, when unquoted, have a special meaning within bracket expressions
| ('!', '-', "[.", etc.)

In file name patterns that might be correct, (because file name expansion
happens before quote removal) but if bug 985 is correct, then in case
patterns, the quoting would already be removed before the pattern was
examined, so given

    var=!a
    case b in ["$var"]) whatever;; esac

we expand (etc) the word first (nothing to do there) then each pattern
(there is just one) in turn, first parameter expansion (etc), producing

    ["!a"]

then quote removal

    [!a]

and then we match ('b' is not 'a'). That the quotes used to be there is
now no longer apparent.

I suspect that the text in 985 needs to be revised to allow for this, or
there is no question but that the ksh93 interpretation is correct, and
every other shell is wrong.

In general, quoting in patterns has only ever been possible using \ and in
character classes, no quoting at all ([\]] is traditionally a class
containing a backslash, followed by a literal ']' not a class containing
a ']'. Since order in a class is irrelevant, ordering of the elements has
been used to allow any character to appear in the class) without needing
a quoting mechanism.

Shells have largely not been that strict, largely because (at least for
the older shells, I don't know how more modern ones do it) the posix
requirement that the quotes in quoted words be left intact in the result
from the lexer has largely been ignored, and quoting has been indicated in
other ways, which make it easier, and faster, to tell exactly what is quoted
and what is not every time later the shell needs to know (the lexer does the
scanning once, and after that nothing ever needs to count beginning and
ending quote chars, etc).

A side effect of that is that (with quote removal not being done - and this
is why I assume the standard did not originally specify it for case patterns)
everything just works the way it is expected (a quoted a and an unquoted a
still match, but a quoted ! is not the "not in class" character, only an
unquoted ! can be that).

I suspect ksh93 has "fixed" all of this, and implements more what the
standard actually says.

We need to be much more precise about matching, and everything related to
it than we currently are, and 985 doesn't help, it makes things worse (though
I fully understand, and agree with, the motivation for that defect report.)

Incidentally, I know that this part of the 985 new text ...

    the first argument (pattern) is the same as patt, except each character
    that was quoted in patt and is not in a bracket expression is prefixed
    by a backslash

is intended to handle this problem, except it cannot - once we have done
quote removal, what "was quoted" is lost, either we have the quotes, and
know what is quoted, or we don't, and don't.

The only way to fix this is to remove quote removal from case patterns, and
instead specify more precisely how a (possibly quoted) string is turned into
a fnmatch pattern.

kre
Re: Should shell quoting within glob bracket patterns be effective?
Martijn Dekker wrote:

> On 10-04-18 at 15:59, Joerg Schilling wrote:
> > Whom do you call "current ksh93 lead developers"?
>
> As far as I can tell from what's going on at the github repo, Siteshwar
> Vashisht and Kurtis Rader currently appear to be in charge of its
> development.

I still hope that David will soon be the "leader" again. He understands
the internals of ksh and he is one of the guys at AT&T that made important
decisions on many interfaces.

BTW: I started to become the Bourne Shell maintainer in 2006, but it took
7 years for me to become able to make enhancements that need an in-depth
understanding of the data flow in the shell, even though I have maintained
my own other shell since 1984.

Do not expect newcomers to make the right decisions now, and be careful
about the changes they introduce. They did e.g. remove code just because
they don't understand it...

Jörg
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote:

> Date: Tue, 10 Apr 2018 13:41:25 +0100
> From: Martijn Dekker
> Message-ID:
>
> | Does POSIX specify anything, either way, regarding the effect of shell
> | quoting within glob bracket patterns?
>
> I would say it is unclear - in general, quoting inside [] does not work
> (XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
> in the latter, quote characters are just characters; ["] is a char class
> containing just a double quote character).

The problem is that the term "quote removal" is not related to a real verified
shell implementation but rather explained by means of abstract wording that
tries to avoid being too close to a real algorithm.

In special: your example ["] does not work as your text might mean.

    echo ["]

results in a secondary prompt with all roughly POSIX-like shells I am aware
of, including the historic Bourne Shell.

...

> That said, in practice, shells implement, and people expect, that "" and ''
> quoting works in case patterns, at least in expressions like
>
>     case "$x" in '*') echo found an asterisk;; esac
>
> even though this seems to be against the literal interpretation of 2.13.1
> which would require
>
>     case "$x" in \*) echo found an asterisk;; esac

Both commands are 100% equivalent:

The historical Bourne Shell did convert 'a' and \a into an 'a' with the top
bit set in the parser and kept '"'s in the argument strings.

In the late 1980's, Bourne Shell and ksh88 were modified to convert 'a' and
\a into \a, and a string like 'abc' into \a\b\c, in the parser and to keep
'"'s in the argument strings.

During macro expansion, the historic Bourne Shell did convert "abc" strings
into the string abc with the top bit set on all characters, and modern Bourne
Shells and ksh88 started to convert "abc" during macro expansion into \a\b\c,
so this prevents glob expansion for the related characters.

The code fragment:

    var='a-c'
    case b in ["$var"]) ...

is thus equivalent to:

    case b in [\a\-\c]) ...

and since the '-' is quoted, this does not match, as the pattern is equivalent
to: [a\-c] that just lists the three characters 'a', '-' and 'c'.

Jörg
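The claim that [a\-c] lists exactly 'a', '-' and 'c' can be probed directly in whatever shell is at hand. The sketch below assumes a shell with the ksh88/bash-style behaviour discussed in this thread; the function name in_class is made up for the illustration, and other shells (ksh93 in particular) may answer differently.

```shell
# in_class STRING: report whether STRING matches the pattern [a\-c].
# Hypothetical helper for experimenting; results depend on the shell.
in_class() {
    case $1 in
        [a\-c]) echo match ;;
        *)      echo "no match" ;;
    esac
}

for s in a - c b; do
    printf '%s: %s\n' "$s" "$(in_class "$s")"
done
```

In bash, for instance, I would expect 'a', '-' and 'c' to report a match and 'b' to report no match, which is the behaviour described above.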
Re: Should shell quoting within glob bracket patterns be effective?
Martijn Dekker wrote:

> On 10-04-18 at 21:06, Robert Elz wrote:
> > Date: Tue, 10 Apr 2018 13:41:25 +0100
> > From: Martijn Dekker
> > Message-ID:
> >
> > | Does POSIX specify anything, either way, regarding the effect of shell
> > | quoting within glob bracket patterns?
> >
> > I would say it is unclear - in general, quoting inside [] does not work
> > (XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
> > in the latter, quote characters are just characters; ["] is a char class
> > containing just a double quote character).
>
> However:
>
> $ ksh93 -c 'case \" in ["a-z"]) echo match;; *) echo no match;; esac'
> no match

See my mail from just 5 minutes ago: the '"' is handled by the parser already
and thus ["] will cause a secondary prompt.

Jörg
Re: Should shell quoting within glob bracket patterns be effective?
I wrote:
>
> And I believe the standard does clearly require the ksh88/bash
> behaviour because of this statement in 2.2.3 Double-Quotes:
>
>     Enclosing characters in double-quotes ("") shall preserve the
>     literal value of all characters within the double-quotes, with the
>     exception of the characters backquote, <dollar-sign>, and <backslash>
>
> There is nothing to suggest that this does not apply to the characters
> which, when unquoted, have a special meaning within bracket expressions
> ('!', '-', "[.", etc.)

Furthermore, there is clear evidence from 2.13.1 that double quotes
do affect special characters within bracket expressions:

    A bracket expression starting with an unquoted <circumflex>
    character produces unspecified results.

Note the use of "unquoted".

-- 
Geoff Clare
Re: Should shell quoting within glob bracket patterns be effective?
Martijn Dekker wrote, on 10 Apr 2018:
>
> Re: https://github.com/att/ast/issues/71
>
> Consider this test script:
>
> (set -o posix) 2>/dev/null && set -o posix
> emulate sh 2>/dev/null # for zsh
> for var in 'a-c' '!a'; do
>     case b in
>     ( ["$var"] )    echo 'quirk' ;;
>     ( [$var] )      echo 'no quirk' ;;
>     esac
> done

Here's a much simpler demonstration of the same "quoting within
brackets" issue:

    $ ls
    b

ksh93:

    $ echo ["a-c"]
    b

ksh88 and bash:

    $ echo ["a-c"]
    [a-c]

As Joerg pointed out, the intention would have been for POSIX to
specify the ksh88 behaviour, so this should be considered to be a bug
in ksh93.

And I believe the standard does clearly require the ksh88/bash
behaviour because of this statement in 2.2.3 Double-Quotes:

    Enclosing characters in double-quotes ("") shall preserve the
    literal value of all characters within the double-quotes, with the
    exception of the characters backquote, <dollar-sign>, and <backslash>

There is nothing to suggest that this does not apply to the characters
which, when unquoted, have a special meaning within bracket expressions
('!', '-', "[.", etc.)

-- 
Geoff Clare
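For anyone wanting to repeat the demonstration above without touching their working directory, it can be wrapped in a scratch directory. This is a minimal sketch; the function name demo_glob is hypothetical, and it assumes mktemp -d is available (it is common but not required by POSIX).

```shell
# demo_glob: create an empty scratch directory containing one file named
# "b", then expand the pattern ["a-c"] there. Hypothetical helper; the
# output depends on the shell running it, as described in the thread.
demo_glob() {
    dir=$(mktemp -d) || return 1
    ( cd "$dir" && : > b && echo ["a-c"] )
    rm -rf "$dir"
}

demo_glob
```

A shell with the ksh88/bash behaviour should print the pattern unchanged, since no file matches, while a shell behaving like ksh93 would print b.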
Re: Should shell quoting within glob bracket patterns be effective?
Robert Elz wrote, on 11 Apr 2018:
>
> Lower down, it says ...
>
>     In order from the beginning to the end of the case statement, each pattern
>     that labels a compound-list shall be subjected to tilde expansion, parameter
>     expansion, command substitution, and arithmetic expansion, and the result
>         [note: no quote removal]
>     of these expansions shall be compared against the expansion of word,

The missing quote removal here is a known defect in the standard.
See http://austingroupbugs.net/view.php?id=985

> Not doing quote removal on patterns is correct.

No it isn't. As bug 985 notes:

    $ case 'foo bar' in "foo bar") echo "quotes removed";; esac
    quotes removed

If quote removal were not performed on the patterns, this would not
match. You would see:

    $ case '"foo bar"' in "foo bar") echo "quotes not removed";; esac
    quotes not removed

instead.

-- 
Geoff Clare
Re: Should shell quoting within glob bracket patterns be effective?
On 10-04-18 at 22:50, Jilles Tjoelker wrote:
> I prefer "no quirk" twice as output but it is indeed not fully
> specified.

I agree with your preference. Ignoring shell quoting in glob bracket
patterns means removing a useful feature: the ability to pass an
arbitrary string of characters in a parameter, one of which is to be
matched. OTOH, honouring shell quoting in glob bracket patterns does not
remove any functionality, as you can simply not quote the expansion
(which is safe, as it is not subject to split or glob in that context).

So if this is indeed not specified, I think the standard ought to be
amended to specify the current majority behaviour (everything but ksh93).

- Martijn
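The "useful feature" mentioned here can be sketched as a small helper that tests whether a string contains any character from a set held in a parameter. This assumes the majority behaviour (quoting inside brackets makes '!' and '-' literal); the name contains_any is invented for the example, and an empty set or a set containing ']' is deliberately not handled.

```shell
# contains_any STRING CHARS: succeed if STRING contains at least one of the
# characters in CHARS. Relies on quoted "$2" being taken literally inside
# the bracket expression, so '!' and '-' have no special meaning.
# Hypothetical helper; CHARS must be non-empty and must not contain ']'.
contains_any() {
    case $1 in
        *["$2"]*) return 0 ;;
        *)        return 1 ;;
    esac
}

if contains_any 'wow!' '!a'; then echo "found one of: !a"; fi
if ! contains_any 'b' '!a'; then echo "b has none of: !a"; fi
```

On a shell that ignores the quoting, the second call would instead treat [!a] as "anything but a" and report that 'b' matches.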
Re: Should shell quoting within glob bracket patterns be effective?
On Tue, Apr 10, 2018 at 01:41:25PM +0100, Martijn Dekker wrote:
> Re: https://github.com/att/ast/issues/71

> Consider this test script:

> (set -o posix) 2>/dev/null && set -o posix
> emulate sh 2>/dev/null # for zsh
> for var in 'a-c' '!a'; do
>     case b in
>     ( ["$var"] )    echo 'quirk' ;;
>     ( [$var] )      echo 'no quirk' ;;
>     esac
> done

> Most shells output 'no quirk' for both values of 'var', but AT&T ksh93
> outputs 'quirk' for both, as does zsh 5.2 and earlier (zsh-as-sh changed
> to match the majority in 5.3). Now one of the current ksh93 lead
> developers says this does not look like a bug.

> Does POSIX specify anything, either way, regarding the effect of shell
> quoting within glob bracket patterns? I can't find any relevant text
> under "2.13 Pattern Matching Notation" or anything it references, so
> clarification would be appreciated.

The first paragraph of 2.13.1 Patterns Matching a Single Character
contains some confusing or contradictory text about backslashes; this
text was amended for http://austingroupbugs.net/view.php?id=806 but was
confusing or contradictory even before that change. The change was made
for fnmatch() and perhaps the part about backslashes in the first
paragraph was actually meant to be handled in the last paragraph in the
part that explicitly says it is only about contexts such as fnmatch()
where shell quote removal is not performed.

The rest of 2.13.1 discusses "quoting" of characters in various
locations. I think it is reasonable to assume that shell quoting is
meant. Only the effect of quoting '!', '-' and ']' in a bracket
expression is not specified (but the effect of quoting '^' is: it makes
the '^' a literal part of the set).

I prefer "no quirk" twice as output but it is indeed not fully
specified.

-- 
Jilles Tjoelker
Re: Should shell quoting within glob bracket patterns be effective?
On 10-04-18 at 15:59, Joerg Schilling wrote:
> Whom do you call "current ksh93 lead developers"?

As far as I can tell from what's going on at the github repo, Siteshwar
Vashisht and Kurtis Rader currently appear to be in charge of its
development.

- M.
Re: Should shell quoting within glob bracket patterns be effective?
On 10-04-18 at 21:52, Robert Elz wrote:
> No, it doesn't. Read that again, with the emphasis I am adding ...
>
> | http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
> | | The conditional construct case shall execute the /compound-list/
> | | corresponding to the first one of several /patterns/ (see Pattern
> | | Matching Notation) that is matched by the string resulting from the
> | | tilde expansion, parameter expansion, command substitution,
> | | arithmetic expansion, and quote removal **of the given word**.
>
> That part is talking about the "case $x in" $x is the "given word",
> that is certainly subject to quote removal.

Quite right, I stand corrected.

Thanks,

- M.
Re: Should shell quoting within glob bracket patterns be effective?
Date: Tue, 10 Apr 2018 21:28:01 +0100
From: Martijn Dekker
Message-ID: <6e79f3b1-732e-a7d4-1d07-a04d7a9cf...@inlv.org>

| But it is. POSIX explicitly specifies quote removal for 'case' patterns:

No, it doesn't. Read that again, with the emphasis I am adding ...

| http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
| | The conditional construct case shall execute the /compound-list/
| | corresponding to the first one of several /patterns/ (see Pattern
| | Matching Notation) that is matched by the string resulting from the
| | tilde expansion, parameter expansion, command substitution,
| | arithmetic expansion, and quote removal **of the given word**.

That part is talking about the "case $x in" $x is the "given word",
that is certainly subject to quote removal.

Lower down, it says ...

    In order from the beginning to the end of the case statement, each pattern
    that labels a compound-list shall be subjected to tilde expansion, parameter
    expansion, command substitution, and arithmetic expansion, and the result
        [note: no quote removal]
    of these expansions shall be compared against the expansion of word,
        [from the section you quoted]
    according to the rules described in Section 2.13

If quote removal were done on patterns, then to match a literal asterisk
we would need something like

    case "$x" in \\*) ...

as the quote removal would leave \* which would then be a quoted asterisk.
Similarly, '*' would be interpreted as just * (the quotes being removed)
and so "match anything" which is also not what anyone does, or wants.

Not doing quote removal on patterns is correct.

| I hope you won't change it to ksh93's counterintuitive behaviour. Your
| current behaviour is certainly consistent with POSIX (as well as every
| other current shell except ksh93).

I have no current plan to change that, this is an area where I believe the
standard needs some work first.

After that, if what the standard says is different from what we implement,
and is also reasonable (and unlikely to break too much) then I might make
changes.

kre
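What shells do in practice with the two spellings of a literal asterisk discussed in this thread is easy to check. The sketch below uses two made-up helper names, is_star_quoted and is_star_escaped, and only reports behaviour of the shell running it; it does not settle what the standard's text requires.

```shell
# Each helper reports "yes" if its argument is exactly one asterisk.
# Hypothetical names; the two patterns are the '*' and \* forms from above.
is_star_quoted() {
    case $1 in
        '*') echo yes ;;
        *)   echo no ;;
    esac
}

is_star_escaped() {
    case $1 in
        \*) echo yes ;;
        *)  echo no ;;
    esac
}

for s in '*' abc "'x'"; do
    printf '%s: quoted=%s escaped=%s\n' "$s" \
        "$(is_star_quoted "$s")" "$(is_star_escaped "$s")"
done
```

In the shells I am aware of, both helpers say yes only for the single asterisk, so neither pattern behaves as "match anything" nor as "a string wrapped in single quotes".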
Re: Should shell quoting within glob bracket patterns be effective?
On 4/10/18 4:28 PM, Martijn Dekker wrote:

>> [this includes case patterns as quote removal is not performed on them]
>
> But it is. POSIX explicitly specifies quote removal for 'case' patterns:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
>
> | The conditional construct case shall execute the /compound-list/
> | corresponding to the first one of several /patterns/ (see Pattern
> | Matching Notation) that is matched by the string resulting from the
> | tilde expansion, parameter expansion, command substitution,
> | arithmetic expansion, and quote removal of the given word.

That text is describing the `word', not the patterns kre is talking about.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: Should shell quoting within glob bracket patterns be effective?
On 10-04-18 at 21:06, Robert Elz wrote:
> Date: Tue, 10 Apr 2018 13:41:25 +0100
> From: Martijn Dekker
> Message-ID:
>
> | Does POSIX specify anything, either way, regarding the effect of shell
> | quoting within glob bracket patterns?
>
> I would say it is unclear - in general, quoting inside [] does not work
> (XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
> in the latter, quote characters are just characters; ["] is a char class
> containing just a double quote character).

However:

    $ ksh93 -c 'case \" in ["a-z"]) echo match;; *) echo no match;; esac'
    no match

The quotes are not considered part of the bracket expression, but
removed by the shell, even on ksh93.

> Also, 2.13.1 does say:
>
>     When pattern matching is used where shell quote removal is not
>     performed [...]
>
> [this includes case patterns as quote removal is not performed on them]

But it is. POSIX explicitly specifies quote removal for 'case' patterns:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05

| The conditional construct case shall execute the /compound-list/
| corresponding to the first one of several /patterns/ (see Pattern
| Matching Notation) that is matched by the string resulting from the
| tilde expansion, parameter expansion, command substitution,
| arithmetic expansion, and quote removal of the given word.

> The NetBSD sh however produces "no quirk" for both -

I hope you won't change it to ksh93's counterintuitive behaviour. Your
current behaviour is certainly consistent with POSIX (as well as every
other current shell except ksh93).

- M.
Re: Should shell quoting within glob bracket patterns be effective?
Date: Tue, 10 Apr 2018 13:41:25 +0100
From: Martijn Dekker
Message-ID:

| Does POSIX specify anything, either way, regarding the effect of shell
| quoting within glob bracket patterns?

I would say it is unclear - in general, quoting inside [] does not work
(XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
in the latter, quote characters are just characters; ["] is a char class
containing just a double quote character).

Also, 2.13.1 does say:

    When pattern matching is used where shell quote removal is not
    performed [...]
        [this includes case patterns as quote removal is not performed on them]
    special characters can be escaped to remove their special meaning by
    preceding them with a <backslash> character. [...]

and that is the only quoting method provided.

That would suggest that ["$var"] is first parameter expanded ($var becomes
!a in one of the cases) resulting in ["!a"] which is a character class that
matches a double quote, an exclamation mark, or an 'a' (including a
character twice is harmless - though the terminating " is needed here
(somewhere) so the lexer can recognise the pattern word properly).

That said, in practice, shells implement, and people expect, that "" and ''
quoting works in case patterns, at least in expressions like

    case "$x" in '*') echo found an asterisk;; esac

even though this seems to be against the literal interpretation of 2.13.1
which would require

    case "$x" in \*) echo found an asterisk;; esac

to achieve this effect - with the earlier one matching a string that starts
and ends with single quote chars, and has anything between them.

Regardless of the POSIX wording, I think this part is set in stone (that
both of the above match a literal asterisk) and should be clarified.

The effect of quotes inside [] though is much less clear.

Joerg: I suspect that the original Bourne sh behaviour is probably just an
artifact of the (crude) way that quoting was parsed in the lexer, which is
in no way posix (nor useful, nor implemented any more). That would have
changed the '!' and '-' into things that were not those characters,
unconditionally - hence they don't perform as they would if they appeared
unquoted. That is, I do not believe that provides any useful help.

My interpretation from the standard of the correct expected result is
"quirk" for a-c (as ["a-c"] is a class containing a double quote, and all
chars from a to c (which includes b)) but "no quirk" for !a as 'b' is none
of a double quote, an exclamation point, nor an 'a'.

The NetBSD sh however produces "no quirk" for both - again partly because
of the quirky way that it implements quoting in the lexer (different than
the original Bourne sh, but still not the same as POSIX expects.)

kre
Re: Should shell quoting within glob bracket patterns be effective?
Martijn Dekker wrote:

> Re: https://github.com/att/ast/issues/71
>
> Consider this test script:
>
> (set -o posix) 2>/dev/null && set -o posix
> emulate sh 2>/dev/null  # for zsh
> for var in 'a-c' '!a'; do
>     case b in
>     ( ["$var"] )    echo 'quirk' ;;
>     ( [$var] )      echo 'no quirk' ;;
>     esac
> done
>
> Most shells output 'no quirk' for both values of 'var', but AT&T ksh93
> outputs 'quirk' for both, as does zsh 5.2 and earlier (zsh-as-sh changed
> to match the majority in 5.3). Now one of the current ksh93 lead
> developers says this does not look like a bug.

Whom do you call "current ksh93 lead developers"?

> Does POSIX specify anything, either way, regarding the effect of shell
> quoting within glob bracket patterns? I can't find any relevant text
> under "2.13 Pattern Matching Notation" or anything it references, so
> clarification would be appreciated.

Given that ksh88 and the original Bourne Shell both return 'no quirk' for
both values, this is a strong hint that ksh93 is wrong.

Given that "bosh" returned "quirk" for the first one before I fixed a bug in
the gmatch() implementation, it is highly probable that ksh93 has a bug in
its pattern matcher.

Jörg

--
EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'
Should shell quoting within glob bracket patterns be effective?
Re: https://github.com/att/ast/issues/71

Consider this test script:

(set -o posix) 2>/dev/null && set -o posix
emulate sh 2>/dev/null  # for zsh
for var in 'a-c' '!a'; do
    case b in
    ( ["$var"] )    echo 'quirk' ;;
    ( [$var] )      echo 'no quirk' ;;
    esac
done

Most shells output 'no quirk' for both values of 'var', but AT&T ksh93
outputs 'quirk' for both, as does zsh 5.2 and earlier (zsh-as-sh changed
to match the majority in 5.3). Now one of the current ksh93 lead
developers says this does not look like a bug.

Does POSIX specify anything, either way, regarding the effect of shell
quoting within glob bracket patterns? I can't find any relevant text
under "2.13 Pattern Matching Notation" or anything it references, so
clarification would be appreciated.

Thanks,

- Martijn
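
The test script only shows whether the quoted bracket pattern matches 'b'.
A slightly more detailed probe can be sketched as follows. The function
name probe_bracket_quoting and its output format are made up for
illustration and are not part of the original script. For each value it
also tries a character that can match only if the quoted '-' or '!' is
taken literally: '-' itself for a-c, and 'a' for !a (which a negated class
would reject).

```shell
# Hypothetical probe (not from the original script).
probe_bracket_quoting() {
    # Pairs of arguments: the bracket content, then a character that
    # matches only if the quoted special character is taken literally.
    set -- 'a-c' '-' '!a' 'a'
    while [ $# -ge 2 ]; do
        var=$1 literal_only=$2
        shift 2
        # Does the quoted pattern still act as a range or a negation?
        case b in
        ( ["$var"] )    special=yes ;;
        ( * )           special=no ;;
        esac
        # Does the quoted pattern treat '-' or '!' as an ordinary member?
        case $literal_only in
        ( ["$var"] )    literal=yes ;;
        ( * )           literal=no ;;
        esac
        printf '%s: matches b: %s, matches %s: %s\n' \
            "$var" "$special" "$literal_only" "$literal"
    done
}

probe_bracket_quoting
```

The output depends on the shell, which is the point of the question. A
shell that prints 'no quirk' in the script above will report "matches b:
no" here, and presumably "yes" for the second check if it treats the
quoted character as an ordinary member of the class.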