Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Martijn Dekker
On 26-03-18 at 17:38, Harald van Dijk wrote:
> And not by dash 0.5.4. Like I wrote, dash 0.5.5 had some bugs that were
> fixed in 0.5.6, which mostly restored the behaviour to match <0.5.5.

Ah, sorry. dash 0.5.4 and earlier don't compile on my system, so they
are not included in my conveniently accessible arsenal of test shells.

> As for my patches, that was by accident and doesn't work reliably. When
> the shell sees no metacharacters, pathname expansion is bypassed, and
> backslash isn't considered a metacharacter. Which got me to my original
> example of /de\v: there are no metacharacters in there, so the shell
> doesn't look to see if it matches anything. Which seems highly
> desirable: the shell shouldn't need to hit the file system for words not
> containing metacharacters. The only way then to get consistent behaviour
> is if the backslash is taken as quoted, so I'm not tempted to argue for
> the behaviour you're hoping for, sorry. :)

But 'case' never hits the file system. There may be a compelling reason
to differ from bash (and ignore the apparent POSIX requirement) when it
comes to pathname expansion, but I don't see one for 'case'.

Plus, expansions within 'case' are already treated differently: no field
splitting or generating of fields, no pathname expansion. And the
pattern matching behaviour is already different as well.
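
The first of those points can be illustrated with something every POSIX
shell should agree on. A minimal sketch (the variable names and sample
values are made up for the example): an expansion inside 'case' is neither
split into fields nor replaced by file names.

```shell
# Illustrative values only; nothing here is taken from dash itself.
word='a   b'      # contains a run of spaces
star='*'          # a lone glob character

# No field splitting: $word stays one string with its spaces intact.
case $word in
    ('a   b') split_result='word kept whole' ;;
    (*)       split_result='word was altered' ;;
esac

# No pathname expansion: $star is compared as a string and is not
# replaced by the names of files in the current directory.
case $star in
    ('*') glob_result='star stayed literal' ;;
    (*)   glob_result='star was expanded' ;;
esac

echo "$split_result; $glob_result"
```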

So if we're going to ignore what POSIX appears to require anyway, maybe
this behaviour does not really need to be consistent between 'case' and
pathname expansion.

You initially asked for "scenarios where it's important to treat an
expanded backslash as unquoted". So I gave you a use case involving a
shell function that does pattern matching with 'case', which needs this
functionality to match arbitrary strings without an expensive
workaround. I guess you don't think that use case is compelling enough?

> And remember, personal playground, lots of disclaimers about bugs.

Yes, I'm aware. As I indicated earlier, I'm now using it as my default
/usr/local/bin/dash to help you find those bugs.

> I suspected the intent for dash was to treat it as quoted, but I was
> hoping for verification.

Calling Herbert Xu, come in please...

- M.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Harald van Dijk

On 26/03/2018 15:34, Martijn Dekker wrote:
> On 26-03-18 at 14:12, Harald van Dijk wrote:
>> On 26/03/2018 13:57, Martijn Dekker wrote:
>>> I don't see any inconsistency. Expansions are consistently treated
>>> differently within 'case' than outside it. Among other things,
>>> expansions within 'case' are *not* subject to pathname expansion; it's
>>> string pattern matching using glob patterns, which is something
>>> completely different.
>>
>> It's not something completely different. Pathname expansion is defined
>> in terms of pattern matching (the pattern matching used in e.g. case
>> statements), plus a specific set of differences. See 2.6.6 Pathname
>> Expansion:
>>
>>> After field splitting, if set -f is not in effect, each field in the
>>> resulting command line shall be expanded using the algorithm described
>>> in Pattern Matching Notation, qualified by the rules in Patterns Used
>>> for Filename Expansion.
>>
>> That specific set of differences, 2.13.3 Patterns Used for Filename
>> Expansion, doesn't include different treatment of backslashes.
>
> I see your point now. You're absolutely right.
>
> Hmmm...
>
> If we backslash-escape a glob character, '?':
>
> $ touch '_foo?bar_'
> $ testshells -c 'p='\''*o\?b*'\''; printf %s $p'
> The backslash is correctly honoured by:
> bash 2.05b through git: _foo?bar_
> dash 0.5.5.1: _foo?bar_
> dash-hvdijk: _foo?bar_
> zsh as sh: _foo?bar_
> The backslash is *not* honoured by:
> dash 0.5.6 through 0.5.9.1: *o\?b*
> ksh93: *o\?b*
> mksh/lksh: *o\?b*
> yash -o posix: *o\?b*


And not by dash 0.5.4. Like I wrote, dash 0.5.5 had some bugs that were 
fixed in 0.5.6, which mostly restored the behaviour to match <0.5.5.



> And if we backslash-escape a non-glob character, 'b':
>
> $ touch '_foo?bar_'
> $ testshells -c 'p='\''*o?\b*'\''; printf %s $p'
> The backslash is correctly honoured by:
> bash 2.05b through git: _foo?bar_
> dash 0.5.5.1: _foo?bar_
> dash-hvdijk: _foo?bar_
> The backslash is *not* honoured by:
> dash 0.5.6 through 0.5.9.1: *o?\b*
> ksh93: *o?\b*
> mksh/lksh: *o?\b*
> yash -o posix: *o?\b*
> zsh as sh: *o?\b*


Also not by dash 0.5.4.


> Funny how these results are different from the results I get when doing
> the same test with 'case' pattern matching. As you point out, they are
> supposed to be subject to the same rules with some modifications *not*
> including backslash parsing. So the results should at least be identical
> for each shell.
>
> So yes, dash is inconsistent. But given what POSIX says, I think dash
> should probably go back to honouring the backslash for pathname
> expansion as it did in 0.5.5.1 and does in your fork.
>
> Maybe you should argue the case with the Austin Group. It would be nice
> to get clarification on the issue.


I don't think 0.5.5 should be taken as the reference point for historic 
dash behaviour when older versions disagree with it as much as the newer 
ones.


As for my patches, that was by accident and doesn't work reliably. When 
the shell sees no metacharacters, pathname expansion is bypassed, and 
backslash isn't considered a metacharacter. Which got me to my original 
example of /de\v: there are no metacharacters in there, so the shell 
doesn't look to see if it matches anything. Which seems highly 
desirable: the shell shouldn't need to hit the file system for words not 
containing metacharacters. The only way then to get consistent behaviour 
is if the backslash is taken as quoted, so I'm not tempted to argue for 
the behaviour you're hoping for, sorry. :)


And remember, personal playground, lots of disclaimers about bugs. Don't 
consider it a fork, I don't treat it as a separate project, it's just 
that dash is a nice program for me to play around with. When I noticed 
the difference, that's what prompted me to ask the question in the first 
place. I suspected the intent for dash was to treat it as quoted, but I 
was hoping for verification.


Cheers,
Harald van Dijk


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Martijn Dekker
On 26-03-18 at 14:12, Harald van Dijk wrote:
> On 26/03/2018 13:57, Martijn Dekker wrote:
>> I don't see any inconsistency. Expansions are consistently treated
>> differently within 'case' than outside it. Among other things,
>> expansions within 'case' are *not* subject to pathname expansion; it's
>> string pattern matching using glob patterns, which is something
>> completely different.
> 
> It's not something completely different. Pathname expansion is defined
> in terms of pattern matching (the pattern matching used in e.g. case
> statements), plus a specific set of differences. See 2.6.6 Pathname
> Expansion:
> 
>> After field splitting, if set -f is not in effect, each field in the
>> resulting command line shall be expanded using the algorithm described
>> in Pattern Matching Notation, qualified by the rules in Patterns Used
>> for Filename Expansion.
> 
> That specific set of differences, 2.13.3 Patterns Used for Filename
> Expansion, doesn't include different treatment of backslashes.

I see your point now. You're absolutely right.

Hmmm...

If we backslash-escape a glob character, '?':

$ touch '_foo?bar_'
$ testshells -c 'p='\''*o\?b*'\''; printf %s $p'
The backslash is correctly honoured by:
bash 2.05b through git: _foo?bar_
dash 0.5.5.1: _foo?bar_
dash-hvdijk: _foo?bar_
zsh as sh: _foo?bar_
The backslash is *not* honoured by:
dash 0.5.6 through 0.5.9.1: *o\?b*
ksh93: *o\?b*
mksh/lksh: *o\?b*
yash -o posix: *o\?b*

And if we backslash-escape a non-glob character, 'b':

$ touch '_foo?bar_'
$ testshells -c 'p='\''*o?\b*'\''; printf %s $p'
The backslash is correctly honoured by:
bash 2.05b through git: _foo?bar_
dash 0.5.5.1: _foo?bar_
dash-hvdijk: _foo?bar_
The backslash is *not* honoured by:
dash 0.5.6 through 0.5.9.1: *o?\b*
ksh93: *o?\b*
mksh/lksh: *o?\b*
yash -o posix: *o?\b*
zsh as sh: *o?\b*

Funny how these results are different from the results I get when doing
the same test with 'case' pattern matching. As you point out, they are
supposed to be subject to the same rules with some modifications *not*
including backslash parsing. So the results should at least be identical
for each shell.
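
For anyone wanting to check a single shell for this discrepancy, the same
pattern can be fed through both mechanisms. A rough sketch, assuming
'mktemp -d' is available (it is not required by POSIX); compare_pattern and
its cp_* variables are hypothetical names, not part of testshells:

```shell
# compare_pattern FILENAME PATTERN
# Creates FILENAME in a scratch directory, then reports whether PATTERN
# (held in a variable and expanded unquoted) matches it via pathname
# expansion and via 'case'.
compare_pattern() {
    cp_name=$1 cp_pat=$2
    cp_dir=$(mktemp -d) || return 1
    (
        cd "$cp_dir" || exit 1
        : > "$cp_name" || exit 1
        # Pathname expansion: unquoted expansion in a word list.
        for cp_word in $cp_pat; do
            if [ "$cp_word" = "$cp_name" ]; then
                echo "glob: match"
            else
                echo "glob: no match"
            fi
        done
        # Pattern matching: the same unquoted expansion as a 'case' pattern.
        case $cp_name in
            ($cp_pat) echo "case: match" ;;
            (*)       echo "case: no match" ;;
        esac
    )
    cp_status=$?
    rm -rf "$cp_dir"
    return "$cp_status"
}

# Without a backslash both lines should agree on any shell.
compare_pattern '_foo?bar_' '*o?b*'
# With a backslash the two lines may differ, depending on the shell.
compare_pattern '_foo?bar_' '*o\?b*'
```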

So yes, dash is inconsistent. But given what POSIX says, I think dash
should probably go back to honouring the backslash for pathname
expansion as it did in 0.5.5.1 and does in your fork.

Maybe you should argue the case with the Austin Group. It would be nice
to get clarification on the issue.

- M.


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Harald van Dijk

On 26/03/2018 13:57, Martijn Dekker wrote:
> On 26-03-18 at 12:30, Harald van Dijk wrote:
>> With the snipping it's not clear that I was specifically confused by the
>> inconsistency.
>>
>> I had included another example:
>>
>>   pat="/de\v"
>>   printf "%s\n" $pat
>>
>> I can understand treating backslash as quoted, or treating it as
>> unquoted, but not quoted-unless-in-a-case-statement. What justifies this
>> one exception?
>
> I don't see any inconsistency. Expansions are consistently treated
> differently within 'case' than outside it. Among other things,
> expansions within 'case' are *not* subject to pathname expansion; it's
> string pattern matching using glob patterns, which is something
> completely different.

It's not something completely different. Pathname expansion is defined
in terms of pattern matching (the pattern matching used in e.g. case
statements), plus a specific set of differences. See 2.6.6 Pathname
Expansion:

> After field splitting, if set -f is not in effect, each field in the
> resulting command line shall be expanded using the algorithm described
> in Pattern Matching Notation, qualified by the rules in Patterns Used
> for Filename Expansion.

That specific set of differences, 2.13.3 Patterns Used for Filename
Expansion, doesn't include different treatment of backslashes.


Cheers,
Harald van Dijk


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Martijn Dekker
On 26-03-18 at 12:30, Harald van Dijk wrote:
> With the snipping it's not clear that I was specifically confused by the
> inconsistency.
> 
> I had included another example:
> 
>   pat="/de\v"
>   printf "%s\n" $pat
> 
> I can understand treating backslash as quoted, or treating it as
> unquoted, but not quoted-unless-in-a-case-statement. What justifies this
> one exception?

I don't see any inconsistency. Expansions are consistently treated
differently within 'case' than outside it. Among other things,
expansions within 'case' are *not* subject to pathname expansion; it's
string pattern matching using glob patterns, which is something
completely different.

- M.


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Harald van Dijk

On 26/03/2018 11:34, Martijn Dekker wrote:
> On 25-03-18 at 22:56, Harald van Dijk wrote:
>>   case /dev in $pat) echo why ;; esac
>>
>> Now, bash and dash say that the pattern does match -- they take the
>> backslash as unquoted, allowing it to escape the v. Most other shells
>> (bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
>> quoted.
>>
>> This doesn't make sense to me, and doesn't match historic practice:
> [...]


With the snipping it's not clear that I was specifically confused by the 
inconsistency.


I had included another example:

  pat="/de\v"
  printf "%s\n" $pat

I can understand treating backslash as quoted, or treating it as 
unquoted, but not quoted-unless-in-a-case-statement. What justifies this 
one exception?


Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Backslashes in unquoted parameter expansions

2018-03-26 Thread Martijn Dekker
On 25-03-18 at 22:56, Harald van Dijk wrote:
>   case /dev in $pat) echo why ;; esac
> 
> Now, bash and dash say that the pattern does match -- they take the
> backslash as unquoted, allowing it to escape the v. Most other shells
> (bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
> quoted.
> 
> This doesn't make sense to me, and doesn't match historic practice:
[...]

POSIX says:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
| In order from the beginning to the end of the case statement, each
| pattern that labels a compound-list shall be subjected to tilde
| expansion, parameter expansion, command substitution, and arithmetic
| expansion, and the result of these expansions shall be compared
| against the expansion of word, according to the rules described in
| Pattern Matching Notation (which also describes the effect of quoting
| parts of the pattern).

The way I read this, this clearly says that quoting in a pattern
(particularly backslash quoting, which is the only kind specified in
"Pattern Matching Notation") still needs to have the usual effect even
if the pattern results from one or more expansions. But I understand
there are differences of opinion about this. 

It's certainly true that few shells actually act this way, but dash is
one that does, as is Busybox ash -- and so is bash (for the most part;
see further on).

I think *not* acting this way is illogical. Why should 'case' parse glob
characters resulting from expansions, but not the backslashes that could
quote those glob characters? I can see no reason for that.

Note that quoting the expansion, as in
  case /dev in "$pat") echo why ;; esac
does what you would expect: the pattern resulting from the expansion is
fully quoted. So with dash and bash you can easily and cleanly have it
either way, unlike with other shells.
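
A small sketch of the quoted versus unquoted distinction. It deliberately
uses '*' rather than a backslash, so the outcome should not depend on the
very behaviour under discussion; the pattern value is made up for the
example:

```shell
pat='/de*'    # illustrative pattern, not the one from the original report

# Unquoted: the '*' that results from the expansion acts as a glob character.
case /dev in
    ($pat) unquoted='match' ;;
    (*)    unquoted='no match' ;;
esac

# Quoted: the whole expansion is literal, so /dev no longer matches...
case /dev in
    ("$pat") quoted_dev='match' ;;
    (*)      quoted_dev='no match' ;;
esac

# ...and only the exact string '/de*' does.
case '/de*' in
    ("$pat") quoted_literal='match' ;;
    (*)      quoted_literal='no match' ;;
esac

echo "unquoted: $unquoted"
echo "quoted, against /dev: $quoted_dev"
echo "quoted, against /de*: $quoted_literal"
```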

(Note that yash, ksh93 and zsh-as-sh act half-baked: backslashes in
patterns resulting from expansions are accepted to quote glob characters
and backslashes themselves, but not any other character. AFAICT, that
behaviour doesn't conform to POSIX no matter which way you slice it.)

[...]
> or are there scenarios where it's important to treat an expanded
> backslash as unquoted?

Consider this function from modernish (simplified version):

  match() {
    case $1 in
    ( $2 ) ;;
    ( * ) return 1 ;;
    esac
  }

This allows doing:

if match STRING GLOBPATTERN; then

on every POSIX shell. Very convenient. Easier than 'case', especially if
you want to combine it like: command1 && match foo bar && command3, etc.
And the syntax is not an eyesore, finger-twister and spacing pitfall,
unlike that of '[['.
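
As a usage sketch (is_c_file and the file names are hypothetical, picked
only for illustration), a backslash-free pattern like the one below should
behave the same on every POSIX shell:

```shell
# The simplified match() from above.
match() {
    case $1 in
    ( $2 ) ;;
    ( * ) return 1 ;;
    esac
}

# Hypothetical helper built on match(): succeeds if the name in $1 ends
# in .c or .h.
is_c_file() {
    match "$1" '*.[ch]'
}

for f in expand.c expand.h README; do
    if is_c_file "$f"; then
        echo "$f: C file"
    else
        echo "$f: other"
    fi
done
```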

But consider this:

match 'a\bcd' 'a\?c*'

The '?' is escaped so shouldn't match. This correctly returns a negative
on dash, bash, ksh93, and zsh. It returns a false positive on yash and
mksh. (I haven't tested other shells like FreeBSD sh lately.) This means
on those shells you can't use a backslash to escape a glob character in
a pattern passed as a parameter.

And how about this:

match 'a\bcd' 'a\\bcd'

Same string as above, but the pattern now escapes the backslash itself.
This correctly returns a positive on dash, bash, ksh93, and zsh-as-sh; a
false negative on the rest.

However, this:

match '? *xy' '??\*\x\y'

only correctly returns a positive on bash and dash. That's because ksh93
and zsh-as-sh, for patterns resulting from expansions, only parse
backslash quoting for glob characters and the backslash itself, but not
for other characters.

On bash, there is a bug that breaks backslash quoting on match() if the
pattern contains a ^A (\001). So bash can't robustly use the simple
match() for arbitrary patterns. This is *mostly* fixed in the
development version; the fix is good enough for the simple match() to work.

Bottom line is, dash and Busybox ash (but not FreeBSD sh), as well as
the upcoming release version of bash, are currently the only shells that
can reliably use the plain, simple and fast match() above for arbitrary
patterns.

When running on other shells (as determined by an init-time feature test
using a simple match()), modernish match() detects one or more
backslashes in the pattern, and if it finds any, quotes the pattern
except for glob characters and backslashes, so it can safely be
'eval'-ed as a literal pattern. This workaround is effective, but was a
bitch to get right and is not exactly a performance winner.
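
An init-time feature test along those lines might look roughly like this.
It is a sketch built from the three examples above, not modernish's actual
code, and the variable name simple_match_ok is made up:

```shell
# The simplified match() from above.
match() {
    case $1 in
    ( $2 ) ;;
    ( * ) return 1 ;;
    esac
}

# Does this shell honour backslash quoting in a pattern that results from
# an expansion? All three checks must come out as described above.
if match 'a\bcd' 'a\\bcd' &&
   match '? *xy' '??\*\x\y' &&
   ! match 'a\bcd' 'a\?c*'
then
    simple_match_ok=yes
else
    simple_match_ok=no
fi

echo "simple match() reliable on this shell: $simple_match_ok"
```

A shell for which this yields "no" would then need the slower
quote-and-eval variant.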

So yeah, I'd like to keep dash the way it is, please :)

- Martijn


Backslashes in unquoted parameter expansions

2018-03-25 Thread Harald van Dijk

Hi,

Consider

  pat="/de\v"
  printf "%s\n" $pat

All shells appear to be in agreement that the backslash is taken 
literally here. It's treated as quoted, even though $pat is unquoted.


Then,

  case /dev in $pat) echo why ;; esac

Now, bash and dash say that the pattern does match -- they take the 
backslash as unquoted, allowing it to escape the v. Most other shells 
(bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as 
quoted.


This doesn't make sense to me, and doesn't match historic practice: dash 
before v0.5.5 took this backslash as quoted too. v0.5.5 then had some 
other related bugs that were fixed in v0.5.6, but it looks like an 
accident that this was not restored to the v0.5.4 behaviour.


This comes from expand.c's memtodest:

  if ((quotes & QUOTES_ESC) &&
      ((syntax[c] == CCTL) ||
       (((quotes & EXP_FULL) || syntax != BASESYNTAX) &&
        syntax[c] == CBACK)))
    USTPUTC(CTLESC, q);

This only escapes backslashes in field expansion context and in quoted 
string context. Should this simply be


  if ((quotes & QUOTES_ESC) &&
      ((syntax[c] == CCTL) || (syntax[c] == CBACK)))
    USTPUTC(CTLESC, q);

or are there scenarios where it's important to treat an expanded 
backslash as unquoted?


Cheers,
Harald van Dijk