Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 08:49, Herbert Xu wrote:
> On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote:
>> On 01/03/2018 00:04, Harald van Dijk wrote:
>>>   $ bash -c 'x=yz; echo "${x#'"'y'"'}"'
>>>   z
>>>   $ dash -c 'x=yz; echo "${x#'"'y'"'}"'
>>>   yz
>>>
>>> (That is, they are executing x=yz; echo "${x#'y'}".)
>>>
>>> POSIX says that in "${var#pattern}" (and the same for ##, % and
>>> %%), the pattern is considered unquoted regardless of the outer
>>> quotation marks. Because of that, the single quote characters
>>> should not be taken literally, but should be taken as quoting the
>>> y. ksh, posh and zsh agree with bash.
>>
>> Unfortunately, this causes another problem with all of the
>> backslash approaches so far:
>>
>>   x=''; printf "%s\n" "${x#''}"
>>
>> This should print a blank line. (bash, ksh, posh and zsh agree.)
>> Here, dash's parser stores '$\$\', where $ is a control character.
>> preglob would need to turn this into . The problem is again that
>> preglob cannot increase the string length.
>>
>> Perhaps the parser needs to store this as '$\$\$\$\', $ being
>> either CTLESC or that new CTLBACK? Either way, it requires some
>> more invasive changes.
>
> These are different issues. dash's parser currently does not
> understand nested quoting in patterns at all. That is, if your
> parameter expansion are within double quotes, then dash at the
> parser level will consider the pattern to be double-quoted. Thus
> any nested single-quotes will be literals instead of actual quotes.

That's the same thing though. The problem with the backslashes is
also that dash sees them as double-quoted when they should be seen as
unquoted, and the approach taken in commit
7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c that lasts to this day was
specifically to *not* fix this in the parser, but to simply have the
parser record enough information so that quote status can be
determined and patched up during expansion. It's just that in the
case of single quotes, expansion was never modified to recognise
them.

Thinking some more, I don't think the parser actually records enough
information to let that work.

> If we fix this in the parser then everything should just work.

Right, that's the approach FreeBSD sh has taken that I referred to in
my message from Feb 18, that I'd personally prefer as well. It
basically involves reverting
7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax to
BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a
variable expansion starts, and finding a sensible way to change the
syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD
sh, an explicit stack of syntaxes is created for this, but that might
be avoidable: with slight modifications to what gets stored in the
byte after CTLVAR/CTLARI, it might be possible to go back through the
parser output to determine the syntax to revert to. I'll see if I can
get that working.

Cheers,
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
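A minimal sketch for rerunning the ${x#'y'} test quoted in the message above in whatever shell executes it. Only the x=yz test comes from the thread; the names unquoted, quoted and probe_quoted_pattern are made up for illustration. The unquoted form should give z everywhere, while the double-quoted form is the one the thread reports as differing (z in bash, ksh, posh and zsh, yz in dash at the time).

```sh
# Probe how the running shell treats single quotes in a "#" pattern.
x=yz

# Unquoted expansion: the quotes protect the y, so y is removed as a prefix.
unquoted=${x#'y'}

# Same expansion inside double quotes: the result depends on the shell.
quoted="${x#'y'}"

# Hypothetical helper: report which behaviour the running shell shows.
probe_quoted_pattern() {
    if [ "$quoted" = z ]; then
        echo "quotes in the pattern act as quoting"
    else
        echo "quotes in the pattern are taken literally"
    fi
}

probe_quoted_pattern
```

Running the same file under several shells, for example with `bash file.sh` and `dash file.sh`, should show which of the two readings each one implements.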
Re: dash bug: double-quoted "\" breaks glob protection for next char
On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote:
>
> >If we fix this in the parser then everything should just work.
>
> Right, that's the approach FreeBSD sh has taken that I referred to in my
> message from Feb 18, that I'd personally prefer as well. It basically
> involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax
> to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a
> variable expansion starts, and finding a sensible way to change the syntax
> back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an
> explicit stack of syntaxes is created for this, but that might be avoidable:
> with slight modifications to what gets stored in the byte after
> CTLVAR/CTLARI, it might be possible to go back through the parser output to
> determine the syntax to revert to. I'll see if I can get that working.

Yes but that's overkill just to fix single quote within patterns.
We already support nested double-quotes in patterns correctly. As
single quotes cannot nest, it should be an easy fix.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 16:28, Herbert Xu wrote:
> On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote:
>>> If we fix this in the parser then everything should just work.
>>
>> Right, that's the approach FreeBSD sh has taken that I referred to
>> in my message from Feb 18, that I'd personally prefer as well. It
>> basically involves reverting
>> 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax to
>> BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a
>> variable expansion starts, and finding a sensible way to change
>> the syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In
>> FreeBSD sh, an explicit stack of syntaxes is created for this, but
>> that might be avoidable: with slight modifications to what gets
>> stored in the byte after CTLVAR/CTLARI, it might be possible to go
>> back through the parser output to determine the syntax to revert
>> to. I'll see if I can get that working.
>
> Yes but that's overkill just to fix single quote within patterns.
> We already support nested double-quotes in patterns correctly. As
> single quotes cannot nest, it should be an easy fix.

Single quotes indeed cannot nest, but you do need to reliably
determine when a single quote is a special character, which gets very
tricky very quickly with nested substitutions.

In "${x+''}", the ' is the literal ' character. In "${x#''}", the '
is a quote character. This part is easy, this part is just a matter
of setting another variable when the parse of the substitution
starts.

But in "${x+${y-}''}", the ' is the literal ' character. In
"${x#${y-}''}", the ' is a quote character. This part is hard. If the
above is done simply using another local variable, then the parse of
the nested ${y-} would clobber that variable.

Cheers,
Harald van Dijk
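The four cases in the message above can be tried side by side. This is only a sketch: the message uses '' in every case, whereas the "#" lines below use 'y' with x=yz (an assumption made here) so that the two readings produce visibly different results. The variable names and show_contexts are likewise hypothetical.

```sh
x=yz
unset y

# "+" context inside double quotes: the single quotes are literal characters,
# with or without a nested ${y-} in front of them.
plus_plain="${x+''}"
plus_nested="${x+${y-}''}"

# "#" context inside double quotes: POSIX treats the pattern as unquoted, so
# 'y' should quote the y. A shell that takes the quotes literally finds no
# matching prefix and leaves the value of x unchanged.
hash_plain="${x#'y'}"
hash_nested="${x#${y-}'y'}"

# Hypothetical helper: print all four results for comparison across shells.
show_contexts() {
    printf '%s\n' "plus_plain=$plus_plain" "plus_nested=$plus_nested" \
                  "hash_plain=$hash_plain" "hash_nested=$hash_nested"
}

show_contexts
```

The two plus_ results should be two literal single quote characters in any shell. The two hash_ results are the ones under discussion, and a shell that gets the plain case right but the nested case wrong would illustrate the clobbered-variable problem described above.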
Re: dash bug: double-quoted "\" breaks glob protection for next char
On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote:
>
> But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}",
> the ' is a quote character. This part is hard. If the above is done simply
> using another local variable, then the parse of the nested ${y-} would
> clobber that variable.

I don't see why that's hard. You just need to remember whether
you're in a pattern context (i.e., after a %/%% or #/##). If you
are, then you need to go back to basesyntax instead of dqsyntax
until the next right brace.

Cheers,
--
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: dash bug: double-quoted "\" breaks glob protection for next char
On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk wrote:
> Currently:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> <>
>
> This is what I expect, and also what bash, ksh and posh do.
>
> With your patch:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>

I was looking into this specific example and I believe it is a _bash_
bug.

The [a\]] is misinterpreted by it (and probably by many people). The
gist is: \] is not a valid escape for ] in set glob expression. Glob
sets have no escaping at all, ] can be in a set if it is the first
char: []abc], dash can be in a set if it is first or last: [abc-],
[ and \ need no protections at all: [a[b\c] is a valid set of 5
chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]". Since that
does not match "a", the result of ${foo#[a\]]}> should be "a".
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 18:00, Denys Vlasenko wrote:
> On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk wrote:
>> Currently:
>>
>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>> <>
>>
>> This is what I expect, and also what bash, ksh and posh do.
>>
>> With your patch:
>>
>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>
>
> I was looking into this specific example and I believe it is a
> _bash_ bug.
>
> The [a\]] is misinterpreted by it (and probably by many people).
> The gist is: \] is not a valid escape for ] in set glob expression.
> Glob sets have no escaping at all, ] can be in a set if it is the
> first char: []abc], dash can be in a set if it is first or last:
> [abc-], [ and \ need no protections at all: [a[b\c] is a valid set
> of 5 chars.
>
> Therefore, "[a\]]" glob pattern means "a or \, then ]". Since that
> does not match "a", the result of ${foo#[a\]]}> should be "a".

Are you sure about this? "Patterns Matching a Single Character"'s
first paragraph contains "A <backslash> character shall escape the
following character. The escaping <backslash> shall be discarded."
The shell does this first. It removes the backslash (remembering that
the following character is escaped) before it starts interpreting the
result as a bracket expression, and so never gets to the point where
the \ should be taken literally.

  case \] in [\]]) echo matched ;; esac

prints "matched" in all shells I can check.

Cheers,
Harald van Dijk
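The case test above extends naturally to the [a\]] pattern that started this subthread. A small sketch, with function names invented here: if the backslash escapes the ], the bracket expression is the set {a, ]} and matches exactly one character. Under the other reading ("a or \, then ]") the two-character string a] would match instead.

```sh
# Hypothetical helper: does $1 match the literal pattern [a\]] ?
match_a_or_rbracket() {
    case $1 in
        [a\]]) echo matched ;;
        *)     echo "not matched" ;;
    esac
}

# Hypothetical helper: a ] placed first in the set needs no escaping at all.
match_rbracket_first() {
    case $1 in
        []abc]) echo matched ;;
        *)      echo "not matched" ;;
    esac
}

match_a_or_rbracket a       # single character a
match_a_or_rbracket ']'     # single character ]
match_a_or_rbracket 'a]'    # two characters: the "a or \, then ]" reading
match_rbracket_first ']'
```

These patterns appear unquoted in a case statement, so they say nothing about the double-quoted "${foo#[a\]]}" form itself, only about how the shell reads a backslash inside a bracket expression.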
Re: dash bug: double-quoted "\" breaks glob protection for next char
On 02/03/2018 17:33, Herbert Xu wrote:
> On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote:
>> But in "${x+${y-}''}", the ' is the literal ' character. In
>> "${x#${y-}''}", the ' is a quote character. This part is hard. If
>> the above is done simply using another local variable, then the
>> parse of the nested ${y-} would clobber that variable.
>
> I don't see why that's hard. You just need to remember whether
> you're in a pattern context (i.e., after a %/%% or #/##). If you
> are, then you need to go back to basesyntax instead of dqsyntax
> until the next right brace.

Let me slightly modify my example so that the effect becomes
different:

  "${x#"${y-*}"''}"

When the parser has processed

  "${x#"${y-

would the state be "in a pattern context", or "not in a pattern
context"?

If the state is "in a pattern context", how do you prevent the next *
from being taken as unquoted? It should be taken as quoted.

If it stores "not in a pattern context", and the parser is processing
the last character in

  "${x#"${y-*}

how can it reset the state to "in a pattern context"? Where could
this state be stored? With arbitrarily deep nesting of variable
expansions, I do not see how you can avoid requiring arbitrarily
large state, which gets difficult with dash's non-recursive parser.

Mind you, I do hope I'm missing something obvious here!

Cheers,
Harald van Dijk
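The modified example can be made observable by giving x a value that starts with a literal star. This is a sketch under assumptions: the value '*abc' and the names result and describe_result are made up here, and y is left unset so that ${y-*} yields the star. If the star is treated as quoted and the '' as an empty quoted string, the pattern is a literal * and the result is abc. If the star is treated as an unquoted wildcard, or the quotes are taken literally, the value should come back unchanged.

```sh
x='*abc'
unset y

# The * comes from "${y-*}", which sits inside its own double quotes within
# the pattern, so per the message it should be taken as quoted (literal).
result="${x#"${y-*}"''}"

# Hypothetical helper: classify what the running shell did.
describe_result() {
    case $result in
        abc)    echo "star quoted, quotes act as quoting" ;;
        '*abc') echo "pattern did not remove the literal star" ;;
        *)      echo "unexpected: $result" ;;
    esac
}

describe_result
```

The second outcome does not say which of the two things went wrong; telling them apart would need a further test that varies only one of them.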