Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 08:49, Herbert Xu wrote:
> On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote:
> > On 01/03/2018 00:04, Harald van Dijk wrote:
> > > $ bash -c 'x=yz; echo "${x#'"'y'"'}"'
> > > z
> > >
> > > $ dash -c 'x=yz; echo "${x#'"'y'"'}"'
> > > yz
> > >
> > > (That is, they are executing x=yz; echo "${x#'y'}".)
> > >
> > > POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the
> > > pattern is considered unquoted regardless of the outer quotation marks.
> > > Because of that, the single quote characters should not be taken
> > > literally, but should be taken as quoting the y. ksh, posh and zsh agree
> > > with bash.
> >
> > Unfortunately, this causes another problem with all of the backslash
> > approaches so far:
> >
> >    x=''; printf "%s\n" "${x#''}"
> >
> > This should print a blank line. (bash, ksh, posh and zsh agree.)
> >
> > Here, dash's parser stores '$\$\', where $ is a control character. preglob
> > would need to turn this into . The problem is again that preglob
> > cannot increase the string length. Perhaps the parser needs to store this as
> > '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it
> > requires some more invasive changes.
>
> These are different issues.  dash's parser currently does not
> understand nested quoting in patterns at all.  That is, if your
> parameter expansion is within double quotes, then dash at the
> parser level will consider the pattern to be double-quoted.  Thus
> any nested single-quotes will be literals instead of actual quotes.

That's the same thing though. The problem with the backslashes is also
that dash sees them as double-quoted when they should be seen as
unquoted, and the approach taken in commit
7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c that lasts to this day was
specifically to *not* fix this in the parser, but to simply have the
parser record enough information so that quote status can be determined
and patched up during expansion. It's just that in the case of single
quotes, expansion was never modified to recognise them. Thinking some
more, I don't think the parser actually records enough information to
let that work.

> If we fix this in the parser then everything should just work.


Right, that's the approach FreeBSD sh has taken that I referred to in my 
message from Feb 18, that I'd personally prefer as well. It basically 
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting 
syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse 
of a variable expansion starts, and finding a sensible way to change the 
syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD 
sh, an explicit stack of syntaxes is created for this, but that might be 
avoidable: with slight modifications to what gets stored in the byte 
after CTLVAR/CTLARI, it might be possible to go back through the parser 
output to determine the syntax to revert to. I'll see if I can get that 
working.



Cheers,

--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Herbert Xu
On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote:
>
> >If we fix this in the parser then everything should just work.
> 
> Right, that's the approach FreeBSD sh has taken that I referred to in my
> message from Feb 18, that I'd personally prefer as well. It basically
> involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax
> to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a
> variable expansion starts, and finding a sensible way to change the syntax
> back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an
> explicit stack of syntaxes is created for this, but that might be avoidable:
> with slight modifications to what gets stored in the byte after
> CTLVAR/CTLARI, it might be possible to go back through the parser output to
> determine the syntax to revert to. I'll see if I can get that working.

Yes but that's overkill just to fix single quote within patterns.
We already support nested double-quotes in patterns correctly.  As
single quotes cannot nest, it should be an easy fix.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 16:28, Herbert Xu wrote:
> On Fri, Mar 02, 2018 at 11:58:41AM +0100, Harald van Dijk wrote:
> > > If we fix this in the parser then everything should just work.
> >
> > Right, that's the approach FreeBSD sh has taken that I referred to in my
> > message from Feb 18, that I'd personally prefer as well. It basically
> > involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting syntax
> > to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse of a
> > variable expansion starts, and finding a sensible way to change the syntax
> > back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD sh, an
> > explicit stack of syntaxes is created for this, but that might be avoidable:
> > with slight modifications to what gets stored in the byte after
> > CTLVAR/CTLARI, it might be possible to go back through the parser output to
> > determine the syntax to revert to. I'll see if I can get that working.
>
> Yes but that's overkill just to fix single quote within patterns.
> We already support nested double-quotes in patterns correctly.  As
> single quotes cannot nest, it should be an easy fix.


Single quotes indeed cannot nest, but you do need to reliably determine 
when a single quote is a special character, which gets very tricky very 
quickly with nested substitutions.


In "${x+''}", the ' is the literal ' character. In "${x#''}", the ' is a
quote character. This part is easy: it is just a matter of setting
another variable when the parse of the substitution starts.
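
For reference, the easy cases can be checked with a small POSIX sh sketch. This is only an illustration and not part of any patch; the variable names plus, bare and inner are made up for the example. The value of inner is printed but deliberately not assumed, since it is the case on which shells currently differ.

```sh
x=yz

# Word operand of + inside double quotes: the ' characters are literal.
plus=$(printf '%s\n' "${x+''}")

# Pattern operand with no outer double quotes: the ' characters quote the y.
bare=$(printf '%s\n' ${x#'y'})

# Pattern operand inside double quotes: the case under discussion.
# The result depends on the shell, so it is only reported here.
inner=$(printf '%s\n' "${x#'y'}")

printf 'plus=%s bare=%s inner=%s\n' "$plus" "$bare" "$inner"
```

A shell that treats the pattern as unquoted regardless of the outer quotes would report inner=z; one that takes the quotes literally would report inner=yz.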


But in "${x+${y-}''}", the ' is the literal ' character. In 
"${x#${y-}''}", the ' is a quote character. This part is hard. If the 
above is done simply using another local variable, then the parse of the 
nested ${y-} would clobber that variable.


Cheers,
Harald van Dijk


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Herbert Xu
On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote:
>
> But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}",
> the ' is a quote character. This part is hard. If the above is done simply
> using another local variable, then the parse of the nested ${y-} would
> clobber that variable.

I don't see why that's hard.  You just need to remember whether
you're in a pattern context (i.e., after a %/%% or #/##).  If you
are, then you need to go back to basesyntax instead of dqsyntax
until the next right brace.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Denys Vlasenko
On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk wrote:
> Currently:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> <>
>
> This is what I expect, and also what bash, ksh and posh do.
>
> With your patch:
>
> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> <a>


I was looking into this specific example and I believe it is a _bash_ bug.

The [a\]] is misinterpreted by it (and probably by many people).
The gist is: \] is not a valid escape for ] in a glob set expression.
Glob sets have no escaping at all. ] can be in a set
if it is the first char: []abc],
a hyphen (-) can be in a set if it is first or last: [abc-],
[ and \ need no protection at all: [a[b\c] is a valid set of 5 chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]".
Since that does not match "a", the result of ${foo#[a\]]}> should be "a".


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 18:00, Denys Vlasenko wrote:
> On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk wrote:
> > Currently:
> >
> > $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> > <>
> >
> > This is what I expect, and also what bash, ksh and posh do.
> >
> > With your patch:
> >
> > $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
> > <a>
>
> I was looking into this specific example and I believe it is a _bash_ bug.
>
> The [a\]] is misinterpreted by it (and probably by many people).
> The gist is: \] is not a valid escape for ] in a glob set expression.
> Glob sets have no escaping at all. ] can be in a set
> if it is the first char: []abc],
> a hyphen (-) can be in a set if it is first or last: [abc-],
> [ and \ need no protection at all: [a[b\c] is a valid set of 5 chars.
>
> Therefore, "[a\]]" glob pattern means "a or \, then ]".
> Since that does not match "a", the result of ${foo#[a\]]}> should be "a".


Are you sure about this? "Patterns Matching a Single Character"'s first
paragraph contains "A <backslash> character shall escape the following
character. The escaping <backslash> shall be discarded." The shell does
this first. It removes the backslash (remembering that the following
character is escaped) before it starts interpreting the result as a
bracket expression, and so never gets to the point where the \ should be
taken literally.


  case \] in [\]]) echo matched ;; esac

prints "matched" in all shells I can check.
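
As a way to tell the two readings apart, the following POSIX sh sketch also tries the two-character subject \]. It is an illustration only; the variable names one and two are made up here. If \ were a literal set member followed by a literal ], the second case would match. If the backslash is removed first, it would not.

```sh
# Subject "]" against a bracket expression whose only member is an escaped "]".
case ']' in
    [\]]) one=matched ;;
    *)    one=unmatched ;;
esac

# Two-character subject "\]". Only the "literal backslash in the set,
# then a literal ]" reading would let this match.
case '\]' in
    [\]]) two=matched ;;
    *)    two=unmatched ;;
esac

echo "one=$one two=$two"
```

Under the escape-first reading described above, the expectation is one=matched two=unmatched.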

Cheers,
Harald van Dijk


Re: dash bug: double-quoted "\" breaks glob protection for next char

2018-03-02 Thread Harald van Dijk

On 02/03/2018 17:33, Herbert Xu wrote:
> On Fri, Mar 02, 2018 at 05:05:46PM +0100, Harald van Dijk wrote:
> > But in "${x+${y-}''}", the ' is the literal ' character. In "${x#${y-}''}",
> > the ' is a quote character. This part is hard. If the above is done simply
> > using another local variable, then the parse of the nested ${y-} would
> > clobber that variable.
>
> I don't see why that's hard.  You just need to remember whether
> you're in a pattern context (i.e., after a %/%% or #/##).  If you
> are, then you need to go back to basesyntax instead of dqsyntax
> until the next right brace.


Let me slightly modify my example so that the effect becomes different:

  "${x#"${y-*}"''}"

When the parser has processed

  "${x#"${y-

would the state be "in a pattern context", or "not in a pattern context"?

If the state is "in a pattern context", how do you prevent the next * 
from being taken as unquoted? It should be taken as quoted.


If it stores "not in a pattern context", and the parser is processing 
the last character in


  "${x#"${y-*}

how can it reset the state to "in a pattern context"? Where could this 
state be stored?


With arbitrarily deep nesting of variable expansions, I do not see how 
you can avoid requiring arbitrarily large state, which gets difficult 
with dash's non-recursive parser. Mind you, I do hope I'm missing 
something obvious here!
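
To make the state question concrete, here is a toy model in POSIX sh of a scanner that keeps one stack entry per open quote or expansion, in the spirit of the explicit stack of syntaxes mentioned for FreeBSD sh. It is a sketch under loud assumptions and is not taken from dash or FreeBSD sh: the function name classify and the entry letters Q, P, W and w are invented here, parameter names are letters only, and backslashes, the colon forms, command substitution and arithmetic are not handled at all.

```sh
# classify STRING: print one line for each ' or * in STRING, saying how a
# scanner with an explicit context stack would treat it.
# Stack entries (hypothetical, one character each, top is the last one):
#   Q  inside a double-quoted region          -> double-quote syntax
#   P  pattern operand of # or %              -> base (unquoted) syntax
#   W  word operand of - + = ?, unquoted      -> base syntax
#   w  word operand of - + = ?, inside quotes -> double-quote syntax
classify() {
    s=$1
    stack=
    while [ -n "$s" ]; do
        rest=${s#?}
        c=${s%"$rest"}                 # first character of s
        s=$rest
        top=${stack#"${stack%?}"}      # last character of stack, or empty
        case $top in
            Q|w) syn=dq ;;
            *)   syn=base ;;
        esac
        case $c in
            '"')
                if [ "$top" = Q ]; then stack=${stack%?}; else stack=${stack}Q; fi
                ;;
            '$')
                case $s in
                    '{'*)
                        s=${s#?}
                        # Skip the (letters-only) parameter name.
                        while :; do
                            case $s in
                                [A-Za-z_]*) s=${s#?} ;;
                                *) break ;;
                            esac
                        done
                        case $s in
                            '#'*|'%'*) stack=${stack}P; s=${s#?} ;;
                            '-'*|'+'*|'='*|'?'*)
                                if [ "$syn" = dq ]; then stack=${stack}w; else stack=${stack}W; fi
                                s=${s#?}
                                ;;
                            '}'*) s=${s#?} ;;   # plain ${name}: nothing to push
                        esac
                        ;;
                esac
                ;;
            '}')
                case $top in P|W|w) stack=${stack%?} ;; esac
                ;;
            "'")
                if [ "$syn" = base ]; then
                    echo "' quote"
                    s=${s#*"'"}        # skip past the closing quote
                else
                    echo "' literal"
                fi
                ;;
            '*')
                if [ "$syn" = base ]; then echo "* unquoted"; else echo "* quoted"; fi
                ;;
        esac
    done
}

# The modified example from above: "${x#"${y-*}"''}"
classify '"${x#"${y-*}"'"''"'}"'
# prints:
#   * quoted
#   ' quote
```

In this model the stack grows by one entry per nesting level, which is the "arbitrarily large state" in question. Whether the same information can be recovered from the parser output instead of a separate stack is left open here.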


Cheers,
Harald van Dijk