Re: Backslashes in unquoted parameter expansions
On 26-03-18 at 17:38, Harald van Dijk wrote:
> And not by dash 0.5.4. Like I wrote, dash 0.5.5 had some bugs that were
> fixed in 0.5.6, which mostly restored the behaviour to match <0.5.5.

Ah, sorry. dash 0.5.4 and earlier don't compile on my system, so they
are not included in my conveniently accessible arsenal of test shells.

> As for my patches, that was by accident and doesn't work reliably. When
> the shell sees no metacharacters, pathname expansion is bypassed, and
> backslash isn't considered a metacharacter. Which got me to my original
> example of /de\v: there are no metacharacters in there, so the shell
> doesn't look to see if it matches anything. Which seems highly
> desirable: the shell shouldn't need to hit the file system for words not
> containing metacharacters. The only way then to get consistent behaviour
> is if the backslash is taken as quoted, so I'm not tempted to argue for
> the behaviour you're hoping for, sorry. :)

But 'case' never hits the file system. There may be a compelling reason
to differ from bash (and ignore the apparent POSIX requirement) when it
comes to pathname expansion, but I don't see one for 'case'.

Plus, expansions within 'case' are already treated differently: no field
splitting or generating of fields, no pathname expansion. And the pattern
matching behaviour is already different as well. So if we're going to
ignore what POSIX appears to require anyway, maybe this behaviour does
not really need to be consistent between 'case' and pathname expansion.

You initially asked for "scenarios where it's important to treat an
expanded backslash as unquoted". So I gave you a use case involving a
shell function that does pattern matching with 'case', which needs this
functionality to match arbitrary strings without an expensive
workaround. I guess you don't think that use case is compelling enough?

> And remember, personal playground, lots of disclaimers about bugs.

Yes, I'm aware. As I indicated earlier, I'm now using it as my default
/usr/local/bin/dash to help you find those bugs.

> I suspected the intent for dash was to treat it as quoted, but I was
> hoping for verification.

Calling Herbert Xu, come in please...

- M.

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
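The point that expansions are already treated differently inside 'case' can be checked directly. The following is only a minimal sketch, assuming a POSIX sh with the default IFS; the variable names are made up for illustration. The same unquoted expansion stays one pattern in 'case' but is split into fields as an ordinary command word.

```shell
# Illustrative sketch: the same unquoted expansion in two contexts.
pat='a b'

# In 'case', the expansion is not field-split: it is one pattern.
case 'a b' in
$pat) case_result='match' ;;
*)    case_result='no match' ;;
esac

# As an ordinary command word, the expansion is split on the space.
set -- $pat
fields=$#

printf 'case: %s\n' "$case_result"
printf 'ordinary command word: %s fields\n' "$fields"
```

Note that 'set --' overwrites the positional parameters, so this is best run in a scratch shell.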
Re: Backslashes in unquoted parameter expansions
On 26/03/2018 15:34, Martijn Dekker wrote:
> On 26-03-18 at 14:12, Harald van Dijk wrote:
>> On 26/03/2018 13:57, Martijn Dekker wrote:
>>> I don't see any inconsistency. Expansions are consistently treated
>>> differently within 'case' than outside it. Among other things,
>>> expansions within 'case' are *not* subject to pathname expansion; it's
>>> string pattern matching using glob patterns, which is something
>>> completely different.
>>
>> It's not something completely different. Pathname expansion is defined
>> in terms of pattern matching (the pattern matching used in e.g. case
>> statements), plus a specific set of differences. See 2.6.6 Pathname
>> Expansion:
>>
>>> After field splitting, if set -f is not in effect, each field in the
>>> resulting command line shall be expanded using the algorithm described
>>> in Pattern Matching Notation, qualified by the rules in Patterns Used
>>> for Filename Expansion.
>>
>> That specific set of differences, 2.13.3 Patterns Used for Filename
>> Expansion, doesn't include different treatment of backslashes.
>
> I see your point now. You're absolutely right.
>
> Hmmm...
>
> If we backslash-escape a glob character, '?':
>
> $ touch '_foo?bar_'
> $ testshells -c 'p='\''*o\?b*'\''; printf %s $p'
>
> The backslash is correctly honoured by:
>   bash 2.05b through git:      _foo?bar_
>   dash 0.5.5.1:                _foo?bar_
>   dash-hvdijk:                 _foo?bar_
>   zsh as sh:                   _foo?bar_
>
> The backslash is *not* honoured by:
>   dash 0.5.6 through 0.5.9.1:  *o\?b*
>   ksh93:                       *o\?b*
>   mksh/lksh:                   *o\?b*
>   yash -o posix:               *o\?b*

And not by dash 0.5.4. Like I wrote, dash 0.5.5 had some bugs that were
fixed in 0.5.6, which mostly restored the behaviour to match <0.5.5.

> And if we backslash-escape a non-glob character, 'b':
>
> $ touch '_foo?bar_'
> $ testshells -c 'p='\''*o?\b*'\''; printf %s $p'
>
> The backslash is correctly honoured by:
>   bash 2.05b through git:      _foo?bar_
>   dash 0.5.5.1:                _foo?bar_
>   dash-hvdijk:                 _foo?bar_
>
> The backslash is *not* honoured by:
>   dash 0.5.6 through 0.5.9.1:  *o\?b*
>   ksh93:                       *o\?b*
>   mksh/lksh:                   *o\?b*
>   yash -o posix:               *o\?b*
>   zsh as sh:                   *o\?b*

Also not by dash 0.5.4.

> Funny how these results are different from the results I get when doing
> the same test with 'case' pattern matching. As you point out, they are
> supposed to be subject to the same rules with some modifications *not*
> including backslash parsing. So the results should at least be
> identical for each shell. So yes, dash is inconsistent.
>
> But given what POSIX says, I think dash should probably go back to
> honouring the backslash for pathname expansion as it did in 0.5.5.1 and
> does in your fork. Maybe you should argue the case with the Austin
> Group. It would be nice to get clarification on the issue.

I don't think 0.5.5 should be taken as the reference point for historic
dash behaviour when older versions disagree with it as much as the newer
ones.

As for my patches, that was by accident and doesn't work reliably. When
the shell sees no metacharacters, pathname expansion is bypassed, and
backslash isn't considered a metacharacter. Which got me to my original
example of /de\v: there are no metacharacters in there, so the shell
doesn't look to see if it matches anything. Which seems highly
desirable: the shell shouldn't need to hit the file system for words not
containing metacharacters. The only way then to get consistent behaviour
is if the backslash is taken as quoted, so I'm not tempted to argue for
the behaviour you're hoping for, sorry. :)

And remember, personal playground, lots of disclaimers about bugs. Don't
consider it a fork, I don't treat it as a separate project, it's just
that dash is a nice program for me to play around with.

When I noticed the difference, that's what prompted me to ask the
question in the first place. I suspected the intent for dash was to
treat it as quoted, but I was hoping for verification.

Cheers,
Harald van Dijk
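The shortcut described in this message (skip pathname expansion when a word has no metacharacters, with backslash not counting as one) can be pictured in shell terms. This is only a sketch of the idea, not dash's actual code; has_glob_meta is a hypothetical helper name.

```shell
# Sketch of the shortcut, not dash's implementation: only words that
# contain *, ? or [ are worth checking against the file system.
# A backslash on its own does not count as a metacharacter here.
has_glob_meta() {
	case $1 in
	*[*?[]*) return 0 ;;
	*)       return 1 ;;
	esac
}

for word in '/de\v' '*o\?b*' 'plain'; do
	if has_glob_meta "$word"; then
		printf '%s: would attempt pathname expansion\n' "$word"
	else
		printf '%s: left as-is, no file system access\n' "$word"
	fi
done
```

Under this model /de\v never reaches the matcher at all, which is why the backslash in it cannot be honoured consistently unless it is treated as quoted.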
Re: Backslashes in unquoted parameter expansions
On 26-03-18 at 14:12, Harald van Dijk wrote:
> On 26/03/2018 13:57, Martijn Dekker wrote:
>> I don't see any inconsistency. Expansions are consistently treated
>> differently within 'case' than outside it. Among other things,
>> expansions within 'case' are *not* subject to pathname expansion; it's
>> string pattern matching using glob patterns, which is something
>> completely different.
>
> It's not something completely different. Pathname expansion is defined
> in terms of pattern matching (the pattern matching used in e.g. case
> statements), plus a specific set of differences. See 2.6.6 Pathname
> Expansion:
>
>> After field splitting, if set -f is not in effect, each field in the
>> resulting command line shall be expanded using the algorithm described
>> in Pattern Matching Notation, qualified by the rules in Patterns Used
>> for Filename Expansion.
>
> That specific set of differences, 2.13.3 Patterns Used for Filename
> Expansion, doesn't include different treatment of backslashes.

I see your point now. You're absolutely right.

Hmmm...

If we backslash-escape a glob character, '?':

$ touch '_foo?bar_'
$ testshells -c 'p='\''*o\?b*'\''; printf %s $p'

The backslash is correctly honoured by:
  bash 2.05b through git:      _foo?bar_
  dash 0.5.5.1:                _foo?bar_
  dash-hvdijk:                 _foo?bar_
  zsh as sh:                   _foo?bar_

The backslash is *not* honoured by:
  dash 0.5.6 through 0.5.9.1:  *o\?b*
  ksh93:                       *o\?b*
  mksh/lksh:                   *o\?b*
  yash -o posix:               *o\?b*

And if we backslash-escape a non-glob character, 'b':

$ touch '_foo?bar_'
$ testshells -c 'p='\''*o?\b*'\''; printf %s $p'

The backslash is correctly honoured by:
  bash 2.05b through git:      _foo?bar_
  dash 0.5.5.1:                _foo?bar_
  dash-hvdijk:                 _foo?bar_

The backslash is *not* honoured by:
  dash 0.5.6 through 0.5.9.1:  *o\?b*
  ksh93:                       *o\?b*
  mksh/lksh:                   *o\?b*
  yash -o posix:               *o\?b*
  zsh as sh:                   *o\?b*

Funny how these results are different from the results I get when doing
the same test with 'case' pattern matching. As you point out, they are
supposed to be subject to the same rules with some modifications *not*
including backslash parsing. So the results should at least be identical
for each shell. So yes, dash is inconsistent.

But given what POSIX says, I think dash should probably go back to
honouring the backslash for pathname expansion as it did in 0.5.5.1 and
does in your fork. Maybe you should argue the case with the Austin
Group. It would be nice to get clarification on the issue.

- M.
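For readers without the testshells helper used in this message, a rough stand-in might look like the sketch below. try_shells is a hypothetical function, the list of shell names is an assumption about what is installed, and mktemp -d is assumed to be available (it is not specified by POSIX). It does not handle variants that need extra options, such as "yash -o posix" or zsh invoked as sh, and the output depends on each shell.

```shell
# Hypothetical stand-in for 'testshells': run one script with every
# listed shell that is installed and label each line of output.
try_shells() {
	script=$1
	shift
	for sh in "$@"; do
		# Skip shells that are not installed on this system.
		command -v "$sh" >/dev/null 2>&1 || continue
		printf '%s: ' "$sh"
		"$sh" -c "$script"
		printf '\n'
	done
}

# Work in a scratch directory so the glob can only see the test file.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch '_foo?bar_'

try_shells 'p='\''*o\?b*'\''; printf %s $p' sh bash dash ksh mksh

cd / && rm -rf "$dir"
```

A line showing _foo?bar_ means that shell honoured the backslash during pathname expansion; a line showing the pattern itself means it did not.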
Re: Backslashes in unquoted parameter expansions
On 26/03/2018 13:57, Martijn Dekker wrote:
> On 26-03-18 at 12:30, Harald van Dijk wrote:
>> With the snipping it's not clear that I was specifically confused by
>> the inconsistency.
>>
>> I had included another example:
>>
>>   pat="/de\v"
>>   printf "%s\n" $pat
>>
>> I can understand treating backslash as quoted, or treating it as
>> unquoted, but not quoted-unless-in-a-case-statement. What justifies
>> this one exception?
>
> I don't see any inconsistency. Expansions are consistently treated
> differently within 'case' than outside it. Among other things,
> expansions within 'case' are *not* subject to pathname expansion; it's
> string pattern matching using glob patterns, which is something
> completely different.

It's not something completely different. Pathname expansion is defined
in terms of pattern matching (the pattern matching used in e.g. case
statements), plus a specific set of differences. See 2.6.6 Pathname
Expansion:

> After field splitting, if set -f is not in effect, each field in the
> resulting command line shall be expanded using the algorithm described
> in Pattern Matching Notation, qualified by the rules in Patterns Used
> for Filename Expansion.

That specific set of differences, 2.13.3 Patterns Used for Filename
Expansion, doesn't include different treatment of backslashes.

Cheers,
Harald van Dijk
Re: Backslashes in unquoted parameter expansions
On 26-03-18 at 12:30, Harald van Dijk wrote:
> With the snipping it's not clear that I was specifically confused by the
> inconsistency.
>
> I had included another example:
>
>   pat="/de\v"
>   printf "%s\n" $pat
>
> I can understand treating backslash as quoted, or treating it as
> unquoted, but not quoted-unless-in-a-case-statement. What justifies this
> one exception?

I don't see any inconsistency. Expansions are consistently treated
differently within 'case' than outside it. Among other things,
expansions within 'case' are *not* subject to pathname expansion; it's
string pattern matching using glob patterns, which is something
completely different.

- M.
Re: Backslashes in unquoted parameter expansions
On 26/03/2018 11:34, Martijn Dekker wrote:
> On 25-03-18 at 22:56, Harald van Dijk wrote:
>>   case /dev in $pat) echo why ;; esac
>>
>> Now, bash and dash say that the pattern does match -- they take the
>> backslash as unquoted, allowing it to escape the v. Most other shells
>> (bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash
>> as quoted.
>>
>> This doesn't make sense to me, and doesn't match historic practice:
> [...]

With the snipping it's not clear that I was specifically confused by the
inconsistency.

I had included another example:

  pat="/de\v"
  printf "%s\n" $pat

I can understand treating backslash as quoted, or treating it as
unquoted, but not quoted-unless-in-a-case-statement. What justifies this
one exception?

Cheers,
Harald van Dijk
Re: Backslashes in unquoted parameter expansions
On 25-03-18 at 22:56, Harald van Dijk wrote:
>   case /dev in $pat) echo why ;; esac
>
> Now, bash and dash say that the pattern does match -- they take the
> backslash as unquoted, allowing it to escape the v. Most other shells
> (bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
> quoted.
>
> This doesn't make sense to me, and doesn't match historic practice:
[...]

POSIX says:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
| In order from the beginning to the end of the case statement, each
| pattern that labels a compound-list shall be subjected to tilde
| expansion, parameter expansion, command substitution, and arithmetic
| expansion, and the result of these expansions shall be compared
| against the expansion of word, according to the rules described in
| Pattern Matching Notation (which also describes the effect of quoting
| parts of the pattern).

The way I read this, this clearly says that quoting in a pattern
(particularly backslash quoting, which is the only kind specified in
"Pattern Matching Notation") still needs to have the usual effect even
if the pattern results from one or more expansions. But I understand
there are differences of opinion about this.

It's certainly true that few shells actually act this way, but dash is
one that does, as is Busybox ash -- and so is bash (for the most part;
see further on).

I think *not* acting this way is illogical. Why should 'case' parse glob
characters resulting from expansions, but not the backslashes that could
quote those glob characters? I can see no reason for that.

Note that quoting the expansion, as in

  case /dev in "$pat") echo why ;; esac

does what you would expect: the pattern resulting from the expansion is
fully quoted. So with dash and bash you can easily and cleanly have it
either way, unlike with other shells.

(Note that yash, ksh93 and zsh-as-sh act half-baked: backslashes in
patterns resulting from expansions are accepted to quote glob characters
and backslashes themselves, but not any other character. AFAICT, that
behaviour doesn't conform to POSIX no matter which way you slice it.)

[...]
> or are there scenarios where it's important to treat an expanded
> backslash as unquoted?

Consider this function from modernish (simplified version):

  match() {
          case $1 in
          ( $2 )  ;;
          ( * )   return 1 ;;
          esac
  }

This allows doing:

  if match STRING GLOBPATTERN; then

on every POSIX shell. Very convenient. Easier than 'case', especially if
you want to combine it like: command1 && match foo bar && command3, etc.
And the syntax is not an eyesore, finger-twister and spacing pitfall,
unlike that of '[['.

But consider this:

  match 'a\bcd' 'a\?c*'

The '?' is escaped so shouldn't match. This correctly returns a negative
on dash, bash, ksh93, and zsh. It returns a false positive on yash and
mksh. (I haven't tested other shells like FreeBSD sh lately.) This means
on those shells you can't use a backslash to escape a glob character in
a pattern passed as a parameter.

And how about this:

  match 'a\bcd' 'a\\bcd'

Same pattern as above. This correctly returns a positive on dash, bash,
ksh93, and zsh-as-sh; a false negative on the rest.

However, this:

  match '? *xy' '??\*\x\y'

correctly returns a positive only on bash and dash. That's because ksh93
and zsh-as-sh, for patterns resulting from expansions, only parse
backslash quoting for glob characters and the backslash itself, but not
for other characters.

On bash, there is a bug that breaks backslash quoting on match() if the
pattern contains a ^A (\001). So bash can't robustly use the simple
match() for arbitrary patterns. This is *mostly* fixed in the
development version; the fix is good enough for the simple match() to
work.

Bottom line is, dash and Busybox ash (but not FreeBSD sh), as well as
the upcoming release version of bash, are currently the only shells that
can reliably use the plain, simple and fast match() above for arbitrary
patterns.

When running on other shells (as determined by an init-time feature test
using a simple match()), modernish match() checks the pattern for
backslashes, and if it finds any, quotes the pattern except for glob
characters and backslashes, so it can safely be 'eval'-ed as a literal
pattern. This workaround is effective, but was a bitch to get right and
is not exactly a performance winner.

So yeah, I'd like to keep dash the way it is, please :)

- Martijn
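The simplified match() from this message can be tried out as follows. This is a minimal sketch: the first two calls use backslash-free patterns, which should behave the same on any POSIX shell, while the last call only reports what the running shell does with an expanded backslash and makes no claim about the result.

```shell
# match STRING GLOBPATTERN: succeed if STRING matches GLOBPATTERN.
# $2 is deliberately left unquoted so that it is used as a pattern.
match() {
	case $1 in
	( $2 )	;;
	( * )	return 1 ;;
	esac
}

# Backslash-free patterns: portable behaviour.
if match 'hello.txt' '*.txt'; then
	echo 'hello.txt matches *.txt'
fi
match 'hello.txt' '*.log' || echo 'hello.txt does not match *.log'

# Shell-dependent: does a backslash from an expansion quote the
# character after it?
if match 'a\bcd' 'a\\bcd'; then
	echo 'this shell parses backslash quoting in the expanded pattern'
else
	echo 'this shell takes the expanded backslashes literally'
fi
```

Because match() returns an exit status, it chains naturally with && and ||, which is the convenience argued for above.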
Backslashes in unquoted parameter expansions
Hi,

Consider

  pat="/de\v"
  printf "%s\n" $pat

All shells appear to be in agreement that the backslash is taken
literally here. It's treated as quoted, even though $pat is unquoted.

Then,

  case /dev in $pat) echo why ;; esac

Now, bash and dash say that the pattern does match -- they take the
backslash as unquoted, allowing it to escape the v. Most other shells
(bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
quoted.

This doesn't make sense to me, and doesn't match historic practice: dash
before v0.5.5 took this backslash as quoted too. v0.5.5 then had some
other related bugs that were fixed in v0.5.6, but it looks like an
accident that this was not restored to the v0.5.4 behaviour.

This comes from expand.c's memtodest:

  if ((quotes & QUOTES_ESC) &&
      ((syntax[c] == CCTL) ||
       (((quotes & EXP_FULL) || syntax != BASESYNTAX) &&
        syntax[c] == CBACK)))
          USTPUTC(CTLESC, q);

This only escapes backslashes in field expansion context and in quoted
string context. Should this simply be

  if ((quotes & QUOTES_ESC) &&
      ((syntax[c] == CCTL) || (syntax[c] == CBACK)))
          USTPUTC(CTLESC, q);

or are there scenarios where it's important to treat an expanded
backslash as unquoted?

Cheers,
Harald van Dijk
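The 'case' half of the question can be probed with a small script. This is a minimal sketch: the variable names are illustrative, and the unquoted result is whatever the running shell decides, so no particular outcome is assumed. pat is assigned with single quotes here, which yields the same value as the double-quoted form in the message.

```shell
# Sketch: report how the running shell treats a backslash that comes
# from an unquoted expansion used as a 'case' pattern.
pat='/de\v'

# Unquoted: the shell decides whether the backslash escapes the 'v'.
case /dev in
$pat) unquoted_result='matches (backslash acted as an escape)' ;;
*)    unquoted_result='no match (backslash taken literally)' ;;
esac

# Quoted: the whole expansion is literal, so '/de\v' cannot equal '/dev'.
case /dev in
"$pat") quoted_result='matches' ;;
*)      quoted_result='no match' ;;
esac

printf 'unquoted $pat: %s\n' "$unquoted_result"
printf 'quoted "$pat": %s\n' "$quoted_result"
```

Only the unquoted branch is expected to differ between shells; the quoted branch compares the literal string.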