Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-24 Thread Geoff Clare
Harald van Dijk  wrote, on 23 Oct 2019:
>
> On 22/10/2019 09:47, Geoff Clare wrote:
> >Good catch.  Since there is no reason for a user or application to
> >escape a slash with a backslash, I see no reason why this shouldn't be
> >made unspecified.

> I wanted to agree with this, especially since I felt it could be unspecified
> in the first place, but I found that not only is there a real reason for a
> backslash before a slash, it actually happens in practice...

During discussion in today's teleconference, it was also pointed out
that a reason applications might want to escape a slash with a backslash
is if they are being lazy and just escape every character to ensure
they are all treated literally.  Currently the standard requires this
to work.
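For illustration, here is a minimal sketch of the "lazy" approach described above. It assumes a POSIX shell with sed, and the function name escape_every_char is hypothetical. Every character gets a preceding backslash, slashes included, so the resulting pattern contains exactly the \/ sequences under discussion.

```sh
# Hypothetical helper: backslash-escape every character of $1 so that,
# used as a pattern, it can only match itself.
escape_every_char() {
  # In the sed replacement, \\ is a literal backslash and & is the
  # character matched by '.'.
  printf '%s\n' "$1" | sed 's/./\\&/g'
}

# Usage: prints \/\d\e\v\/ (the slashes are escaped as well).
escape_every_char '/dev/'
```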

So the consensus was to leave the bug 1234 resolution as-is. Shells
which don't meet the current requirement should fix their behaviour.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-24 Thread Robert Elz
Date:Thu, 24 Oct 2019 09:25:52 +0100
From:Harald van Dijk 
Message-ID:  <9c38ef54-adf6-9af5-0f98-1f3105526...@gigawatt.nl>

  | That is what almost all shells do, but not what POSIX specifies.

Not explicitly perhaps, but it is almost the only way to achieve the
effects that are required.   This is one of the perils of attempting
to be as general as possible in what is specified, rather than simply
saying exactly what should be done.

  | Any form of quoting stops tilde expansion before a quoted / is even 
  | seen.

By definition, yes, unless the / is all that is quoted, as in the
cases I indicated.

  | Making that work does not require preceding / with CTLESC: it 
  | would be embedded in CTLQUOTEMARK characters anyway, and CTLQUOTEMARK is 
  | enough to stop tilde expansion.

Yes, it would be, if it appeared.   It might when ~ expansion happens
in the ~'/' case, but not in the ~\/ case.   In any case this is all
just boring implementation detail, not really appropriate for this list,
we can discuss it more off list if you really want.
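The user-visible difference between the spellings mentioned here can be checked directly. The sketch below assumes a POSIX shell. Under XCU 2.6.1, the tilde-prefix runs up to the first unquoted slash, and if any character in it is quoted, no tilde expansion is performed. HOME is set to a made-up value inside a subshell so that the output is predictable.

```sh
# Compare ~/ with ~\/ and ~'/' (HOME is a made-up value for illustration).
show_tilde_forms() (
  HOME=/home/example
  # ~/   : the unquoted / ends the tilde-prefix "~", which expands to $HOME.
  # ~\/  : there is no unquoted /, so the prefix is the whole word, and
  #        because it contains a quoted character it is left alone.
  # ~'/' : the same, with single quotes instead of a backslash.
  printf '%s\n' ~/ ~\/ ~'/'
)

show_tilde_forms
```

A conforming shell should print /home/example/ followed by ~/ twice.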

kre




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-24 Thread Harald van Dijk

On 24/10/2019 08:16, Robert Elz wrote:

> Date:Thu, 24 Oct 2019 06:46:52 +0100
> From:Harald van Dijk 
> Message-ID:  <5d23eeba-1ac5-6574-d348-2a8b43f97...@gigawatt.nl>
>
>   | This is currently well-defined. We are talking about changing the
>   | standard to make it unspecified.
>
> I am not sure about the first sentence ... my understanding, and the
> way I have always seen it done, is that when doing filename expansion,
> the first step is to separate the pathname at / characters (and quoting
> is irrelevant, quoted or unquoted, any / divides the path into components).


That is what almost all shells do, but not what POSIX specifies. That 
difference is how this subthread started. It is apparently also not what 
glibc's glob() does.



[...]
> If for some reason I wanted to use the libc glob rather than
> the built in version (I assume to save some space, as with dynamic
> linking, all of libc is there anyway, including its glob, so if the
> internal code can be eliminated the shell will be fractionally smaller,
> subtract the globbing code, but add the code to first set up the glob
> args, and then deal with the results properly), anyway, if I were to
> do that, I'd do it properly, and do the / separation first, then unpack
> each component string, which would simply drop a terminating CTLESC,
> and then put the results back together with /'s between the components.
>
> That avoids the unspecified terminating \ in a component, or whether
> a \/ sequence is meant to mean something different (like a file name
> component containing a '/' or something else similar, and impossible).


That is a valid implementation approach, but the fact that there are 
valid implementation approaches that would avoid the problem with making 
\/ unspecified does not help deal with current applications that are 
already using \/.


If it were already unspecified, I would agree that this is a bug in dash 
and should be fixed in dash. But if it is not already unspecified, it is 
perfectly valid for dash to do what it currently does, there is nothing 
less proper in its implementation than there would be in yours.



> ps:  (boring implementation detail follows)
>
>   | However, special characters inside a single-/double-quoted string are
>   | preceded by CTLESC as well. Slash counts as a special character
>
> The latter is solely to deal with ~/ vs ~\/ (or ~'/' etc) - and the
> CTLESC before it only happens in general (that is, in other contexts)
> as preventing it is more expensive than allowing it to happen, as stray
> useless CTLESC marks aren't supposed to affect anything - if they are
> allowed to change or affect the results in any way at all (when they are
> not needed to actually quote/escape something) then something is broken.
> And that would simply be a bug.


Any form of quoting stops tilde expansion before a quoted / is even 
seen. Making that work does not require preceding / with CTLESC: it 
would be embedded in CTLQUOTEMARK characters anyway, and CTLQUOTEMARK is 
enough to stop tilde expansion.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-24 Thread Robert Elz
Date:Thu, 24 Oct 2019 06:46:52 +0100
From:Harald van Dijk 
Message-ID:  <5d23eeba-1ac5-6574-d348-2a8b43f97...@gigawatt.nl>

  | This is currently well-defined. We are talking about changing the 
  | standard to make it unspecified.

I am not sure about the first sentence ... my understanding, and the
way I have always seen it done, is that when doing filename expansion,
the first step is to separate the pathname at / characters (and quoting
is irrelevant, quoted or unquoted, any / divides the path into components).
(There is some special casing for the leading '/' causing the search to
start from "/" rather than ".").   After that each component now as a
separate string is treated as a unit, and either considered as a directory
(or the final segment when it is last) if it contains no magic chars, or
as a pattern if there are, in which case a set of matching names are located.

Whether the separation into components is done one component at a time,
with the work for that component done immediately after, or all at once
at the start, is irrelevant (doesn't affect the results).
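The splitting step described above can be sketched as follows, assuming a POSIX shell. The name split_components is hypothetical. The sketch splits on every / regardless of quoting, which shows how a pattern such as a\/[b] leaves a first component ending in a backslash that escapes nothing. A leading / produces an empty first component, which is where a real implementation would start the search from "/".

```sh
# Hypothetical helper: print each slash-delimited component of $1 on its
# own line. Quoting is deliberately ignored: every '/' splits.
split_components() {
  saved_ifs=$IFS
  IFS=/
  set -f          # no pathname expansion while splitting
  set -- $1       # field splitting on '/' only
  set +f          # note: re-enables globbing unconditionally
  IFS=$saved_ifs
  for comp in "$@"; do
    printf '%s\n' "$comp"
  done
}

# Usage: prints "a\" and "[b]" on separate lines.
split_components 'a\/[b]'
```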

Given that, what we have the way you have described it is a component
("dev\") that ends with a \ which escapes nothing, which as I recall
it anyway, is an unspecified case.   When, in ash-derived shells, it
is a CTLESC which escapes nothing, it is simply ignored (or is when
the bugs in the original code that occasionally could have it escaping
the terminating \0 and then wandering off into lala land are fixed).
But that is all just internal implementation noise, and should be
invisible externally.

If for some reason I wanted to use the libc glob rather than
the built in version (I assume to save some space, as with dynamic
linking, all of libc is there anyway, including its glob, so if the
internal code can be eliminated the shell will be fractionally smaller,
subtract the globbing code, but add the code to first set up the glob
args, and then deal with the results properly), anyway, if I were to
do that, I'd do it properly, and do the / separation first, then unpack
each component string, which would simply drop a terminating CTLESC,
and then put the results back together with /'s between the components.

That avoids the unspecified terminating \ in a component, or whether
a \/ sequence is meant to mean something different (like a file name
component containing a '/' or something else similar, and impossible).
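The approach in the previous paragraph can be modelled in shell. This is only a model: an ordinary backslash stands in for the shell's internal CTLESC marker, the function names are hypothetical, and no particular shell's code is being reproduced. Each component is split off, a trailing backslash that escapes nothing is dropped, and the components are rejoined with /.

```sh
# Model of "split at /, unpack each component, rejoin" before calling
# glob(3). A plain backslash stands in for the internal CTLESC marker.

# Hypothetical helper: print $1 minus its final backslash if that
# backslash escapes nothing (an odd number of trailing backslashes).
strip_dangling() {
  rest=$1 count=0
  while :; do
    shorter=${rest%\\}
    [ "$shorter" = "$rest" ] && break
    rest=$shorter count=$((count + 1))
  done
  if [ $((count % 2)) -eq 1 ]; then
    trimmed=${1%\\}
    printf '%s' "$trimmed"
  else
    printf '%s' "$1"
  fi
}

# Hypothetical helper: split $1 at every '/', clean each component, rejoin.
# Field splitting drops a trailing empty field, so a trailing '/' is not
# preserved, and 'set +f' re-enables globbing unconditionally.
normalize_pattern() {
  saved_ifs=$IFS
  IFS=/
  set -f
  set -- $1
  set +f
  IFS=$saved_ifs
  result= sep=
  for comp in "$@"; do
    result=$result$sep$(strip_dangling "$comp")
    sep=/
  done
  printf '%s\n' "$result"
}

normalize_pattern 'a\/[b]'    # prints a/[b]
normalize_pattern '[a]\/b'    # prints [a]/b
```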

kre

ps:  (boring implementation detail follows)

  | However, special characters inside a single-/double-quoted string are
  | preceded by CTLESC as well. Slash counts as a special character

The latter is solely to deal with ~/ vs ~\/ (or ~'/' etc) - and the
CTLESC before it only happens in general (that is, in other contexts)
as preventing it is more expensive than allowing it to happen, as stray
useless CTLESC marks aren't supposed to affect anything - if they are
allowed to change or affect the results in any way at all (when they are
not needed to actually quote/escape something) then something is broken.
And that would simply be a bug.




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-23 Thread Harald van Dijk

On 24/10/2019 00:42, Robert Elz wrote:

> Date:Wed, 23 Oct 2019 20:58:08 +0100
> From:Harald van Dijk 
> Message-ID:  
>
>   | That is, if a user runs
>   |   echo "/dev/"*
>   | dash will call glob() with a pattern string of '\/dev\/*'.
>
> Bizarre.   Why?


Some shell internals that you are already mostly familiar with: in its 
internal string representations, it has two escape mechanisms, CTLESC 
and CTLQUOTEMARK. Generally, CTLQUOTEMARK is used for single- and 
double-quoted strings, and CTLESC is used for backslash-escaping. 
However, special characters inside a single-/double-quoted string are 
preceded by CTLESC as well. Slash counts as a special character.


When generating the pattern for glob(), CTLQUOTEMARK is ignored, and any 
character preceded by CTLESC gets a backslash in the pattern. This is 
fine, since backslash-escaping characters that do not need it is 
perfectly valid and is specified to have no effect.


If I write
  echo \/\d\e\v\/*
instead, the glob() pattern changes to exactly that.


> However, that is no reason to change anything (proposed or otherwise).
> That dash can be configured to generate unspecified code is no-one's
> business but theirs,


This is currently well-defined. We are talking about changing the 
standard to make it unspecified.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-23 Thread Robert Elz
Date:Wed, 23 Oct 2019 20:58:08 +0100
From:Harald van Dijk 
Message-ID:  

  | That is, if a user runs
  |   echo "/dev/"*
  | dash will call glob() with a pattern string of '\/dev\/*'.

Bizarre.   Why?

However, that is no reason to change anything (proposed or otherwise).
That dash can be configured to generate unspecified code is no-one's
business but theirs, and as long as glibc (or whatever other libc
they happen to use) doesn't change the way they treat \ before /
(and it being made unspecified in POSIX doesn't provide any reason
to do that, if anything, it justifies their choice .. which is probably
the right one anyway) then dash will keep on working fine, even if it
does continue to use unspecified code (or dash can fix its code, and
not escape the / there, just as they apparently don't escape the d e or v
and there will no longer be an issue).
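Here is a sketch of the alternative suggested in the last parenthesis. It assumes the glob() pattern is assembled as text, and it backslashes only characters that can matter in a pattern (including inside a bracket expression), never /. The character set and the name escape_pattern_chars are assumptions made for illustration, not dash's actual code.

```sh
# Hypothetical helper: escape characters that can be significant in a
# pattern ( \ * ? [ ] and, inside brackets, ! ^ - ), but never '/'.
escape_pattern_chars() {
  printf '%s\n' "$1" | sed 's/[][*?!^\\-]/\\&/g'
}

escape_pattern_chars '/dev/'    # prints /dev/ unchanged
escape_pattern_chars 'a*[b'     # prints a\*\[b
```

Escaping a few characters that turn out not to need it is harmless, since a backslash before an ordinary character has no effect; the only change from dash's current output is that a quoted / is passed through as-is.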

kre



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-23 Thread Harald van Dijk

On 22/10/2019 09:47, Geoff Clare wrote:

> Good catch.  Since there is no reason for a user or application to
> escape a slash with a backslash, I see no reason why this shouldn't be
> made unspecified.
>
> I suggest adding the following to the bug 1234 resolution:
>
> On page 2383 line 76261 section 2.13.3, append to item 1:
>
>  If a <slash> character is found following an unescaped <backslash>
>  character, the behavior is unspecified.
>
> (this wording style matches the left-square-bracket case in the middle
> of the paragraph).


I wanted to agree with this, especially since I felt it could be 
unspecified in the first place, but I found that not only is there a 
real reason for a backslash before a slash, it actually happens in 
practice...


dash, when compiled to use libc's glob() for pathname expansion, will 
escape slashes in the pattern it sends to glob() if they were quoted in 
the shell word. That is, if a user runs


  echo "/dev/"*

dash will call glob() with a pattern string of '\/dev\/*'. I can see 
other applications doing the same in similar cases. Making indirect 
backslashes before a slash unspecified in the shell itself is probably 
fine, but making backslashes before a slash unspecified in glob() is 
probably a step too far.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-22 Thread Geoff Clare
Harald van Dijk  wrote, on 19 Oct 2019:
>
> On 23/09/2019 16:39, Austin Group Bug Tracker wrote:
> >--
> >  (0004564) geoffclare (manager) - 2019-09-23 15:39
> >  http://austingroupbugs.net/view.php?id=1234#c4564
> >--
> >Interpretation response
> >
> >[...]
> >If a pattern ends with an unescaped <backslash>, the behavior is
> >unspecified.
> 
> Another problem here, one that already exists in the current wording:
> 
> For patterns used for filename expansion, in current shells that treat
> unescaped backslash as an escape character, behaviour is inconsistent when
> unescaped backslash appears before a forward slash.
> 
> Shells appear to typically implement 2.13.3 by effectively splitting strings
> on forward slashes, then interpreting each component as a pattern if needed.
> This results in some components ending with an unescaped backslash.
> 
[...]
> 
> However, 2.13.3 does not specify that strings are split on forward slashes,
> so by the description in the standard, these patterns do not end in an
> unescaped backslash. As such, I believe the current and proposed wording
> requires treating the backslash as escaping the slash, which is not supposed
> to have any effect. I believe the required behaviour is that indirect [a]\/b
> and a\/[b] both find a file named 'b' in a directory named 'a'.
> 
> Is my understanding correct? If so, should these patterns become unspecified
> as well, to allow current and older shell behaviour?

Good catch.  Since there is no reason for a user or application to
escape a slash with a backslash, I see no reason why this shouldn't be
made unspecified.

I suggest adding the following to the bug 1234 resolution:

On page 2383 line 76261 section 2.13.3, append to item 1:

If a <slash> character is found following an unescaped <backslash>
character, the behavior is unspecified.

(this wording style matches the left-square-bracket case in the middle
of the paragraph).

-- 
Geoff Clare 



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-10-19 Thread Harald van Dijk

On 23/09/2019 16:39, Austin Group Bug Tracker wrote:

> --
>   (0004564) geoffclare (manager) - 2019-09-23 15:39
>   http://austingroupbugs.net/view.php?id=1234#c4564
> --
> Interpretation response
>
> [...]
> If a pattern ends with an unescaped <backslash>, the behavior is
> unspecified.


Another problem here, one that already exists in the current wording:

For patterns used for filename expansion, in current shells that treat 
unescaped backslash as an escape character, behaviour is inconsistent 
when unescaped backslash appears before a forward slash.


Shells appear to typically implement 2.13.3 by effectively splitting 
strings on forward slashes, then interpreting each component as a 
pattern if needed. This results in some components ending with an 
unescaped backslash.


In bash 5, an indirect [a]\/b will find a file named 'b' in a directory 
named 'a\', but an indirect a\/[b] will find a file named 'b' in a 
directory named 'a'. An indirect a\/b (which would not trigger pathname 
expansion under the new wording), meanwhile, found a/b in bash 5 up to 
patchlevel 2, then changed to a\/b as of patchlevel 3.


In bash 4, an indirect [a]\/b will not find any files, but an indirect 
a\/[b] will find a file named 'b' in a directory named 'a'.


In nbsh, an indirect [a]\/b will not find any files, but an indirect 
a\/[b] will find a file named 'b' in a directory named 'a\'. (In nbsh, 
whether backslash acts as an escape character is currently determined 
per pathname component, as mentioned earlier.)
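The cases above can be reproduced with a short script. The sketch assumes mktemp -d is available. It creates a/b and a\/b (a file b in a directory named a\) in a scratch directory, then expands each pattern indirectly, with the backslash supplied by $bs. The output is expected to differ between shells, so none is shown here.

```sh
# Reproduction sketch for the indirect [a]\/b, a\/[b] and a\/b cases.
# Assumes mktemp -d is available; runs in a subshell so cd stays local.
try_patterns() (
  tmp=$(mktemp -d) || exit 1
  cd "$tmp" || exit 1
  mkdir a 'a\'          # directories "a" and "a\"
  : > a/b               # file b in directory a
  : > 'a\/b'            # file b in directory a\
  bs='\'
  for pat in "[a]$bs/b" "a$bs/[b]" "a$bs/b"; do
    set -- $pat         # unquoted: the backslash comes from an expansion
    printf '%s -> %s\n' "$pat" "$*"
  done
  cd / && rm -rf "$tmp"
)

try_patterns
```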


However, 2.13.3 does not specify that strings are split on forward 
slashes, so by the description in the standard, these patterns do not 
end in an unescaped backslash. As such, I believe the current and 
proposed wording requires treating the backslash as escaping the slash, 
which is not supposed to have any effect. I believe the required 
behaviour is that indirect [a]\/b and a\/[b] both find a file named 'b' 
in a directory named 'a'.


Is my understanding correct? If so, should these patterns become 
unspecified as well, to allow current and older shell behaviour?


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-30 Thread Harald van Dijk

On 30/09/2019 12:00, Geoff Clare wrote:

> Harald van Dijk  wrote, on 30 Sep 2019:
>
> > (As an aside, why is this exception limited to patterns used for filename
> > expansion? Existing practice is that it applies to all patterns:
> >
> >    case [a in [*) echo match ;; *) echo no match ;; esac
> >
> > This prints "no match" in bosh, dash, ksh, pdksh and posh.)
>
> Because it's not the exception you thought it was?  If the one stated
> in 2.13.3 applied here, then it would apply to *[ as well as [*.
>
> > It does apply to *[ as well as [* in most of those shells.
>
> That's surprising.
>
> >    case a[ in *[) echo match ;; *) echo no match ;; esac
> >
> > This still prints "no match" in bosh, ksh, pdksh and posh, so the
> > exception in 2.13.3 should still apply here if it is intended to match
> > existing shell behaviour.
>
> I think this should be raised as a separate bug. Since you are the
> person who noticed it, do you want to be the one to submit the bug?


Okay, I will do so.
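The two case tests quoted above can be run together in any POSIX-style shell. According to this thread, bosh, ksh, pdksh and posh print "no match" for both, while other shells may print "match" for one or both.

```sh
# The two case tests from this exchange, run back to back.
check_bracket_cases() {
  case [a in [*) echo 'match' ;; *) echo 'no match' ;; esac
  case a[ in *[) echo 'match' ;; *) echo 'no match' ;; esac
}

check_bracket_cases
```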

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-30 Thread Geoff Clare
Harald van Dijk  wrote, on 30 Sep 2019:
> >>
> >>>If the pattern contains an open bracket ( '[' ) that does not introduce a 
> >>>bracket expression as in XBD RE Bracket Expression, it is unspecified 
> >>>whether other unquoted pattern matching characters within the same 
> >>>slash-delimited component of the pattern retain their special meanings or 
> >>>are treated as ordinary characters. For example, the pattern "a*[/b*" may 
> >>>match all filenames beginning with 'b' in the directory "a*[" or it may 
> >>>match all filenames beginning with 'b' in all directories with names 
> >>>beginning with 'a' and ending with '['.
> >>This is because the shell may have already committed to parsing it as an
> >>ordinary character as it was under the impression a bracket expression had
> >>started.
> >
> >If that was the reason, wouldn't it only apply to characters after the '['?
> >As per the example given - a*[/b* - it also applies to characters before it.
> 
> Huh, I missed that. You're right.
> 
> That raises another question: is it unspecified whether each individual
> pattern character is treated literally, or all at once? That is, it is clear
> that *[*/x* is permitted to list the file names starting with "x" in
> directory "*[*", and it is permitted to list the file names starting with
> "x" in directories with names containing "[", but is it also permitted to
> list the file names starting with "x" in directories with names ending in
> "[*"? All three behaviours can be found in current shells.

The unspecified behaviour applies to each character individually. In order
to require that they are all treated the same within a component, the
standard would have to say so explicitly.
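To see which reading a given shell takes for *[*/x*, a reproduction sketch can create one directory per interpretation, each containing a file x1. It assumes mktemp -d is available. If every character is literal, only "*[*" matches. Reading the names as "ends in [*" adds "c[*", and reading them as "contains [" adds "a[b" as well. The output varies by shell, so none is shown.

```sh
# Reproduction sketch for *[*/x*; assumes mktemp -d is available.
# Only "*[*" fits the all-literal reading; "c[*" is added by the
# "name ends in [*" reading, and "a[b" by the "name contains [" reading.
probe_unmatched_bracket() (
  tmp=$(mktemp -d) || exit 1
  cd "$tmp" || exit 1
  for dir in '*[*' 'a[b' 'c[*'; do
    mkdir "$dir" && : > "$dir/x1"
  done
  set -- *[*/x*
  printf '%s\n' "$*"
  cd / && rm -rf "$tmp"
)

probe_unmatched_bracket
```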

> >>(As an aside, why is this exception limited to patterns used for filename
> >>expansion? Existing practice is that it applies to all patterns:
> >>
> >>   case [a in [*) echo match ;; *) echo no match ;; esac
> >>
> >>This prints "no match" in bosh, dash, ksh, pdksh and posh.)
> >
> >Because it's not the exception you thought it was?  If the one stated
> >in 2.13.3 applied here, then it would apply to *[ as well as [*.
> 
> It does apply to *[ as well as [* in most of those shells.

That's surprising.

>   case a[ in *[) echo match ;; *) echo no match ;; esac
> 
> This still prints "no match" in bosh, ksh, pdksh and posh, so the
> exception in 2.13.3 should still apply here if it is intended to match
> existing shell behaviour.

I think this should be raised as a separate bug. Since you are the
person who noticed it, do you want to be the one to submit the bug?

-- 
Geoff Clare 



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-30 Thread Harald van Dijk

On 30/09/2019 10:51, Geoff Clare wrote:

> Harald van Dijk  wrote, on 28 Sep 2019:
>
> > On 23/09/2019 16:39, Austin Group Bug Tracker wrote:
> > > --
> > >   (0004564) geoffclare (manager) - 2019-09-23 15:39
> > >   http://austingroupbugs.net/view.php?id=1234#c4564
> > > --
> > > [...]
> > > For the shell only, it is unspecified whether or not a <backslash>
> > > character inside a bracket expression preserves the literal value of the
> > > following character.


> > I noticed now that the resolution of bug 1233 was to not change the rules
> > for how <backslash> is treated inside bracket expressions (yet), but this
> > does change it. This may be worth mentioning in bug 1233.
> >
> > "inside a bracket expression" is probably too narrow, given the exception in
> > 2.13.3:
> >
> >    bs='\'
> >    set -- ?[$bs.*
> >
> > 2.13.3 states that despite the * not being part of a bracket expression, it
> > may be treated as an ordinary character:
> >
> > > If the pattern contains an open bracket ( '[' ) that does not introduce a
> > > bracket expression as in XBD RE Bracket Expression, it is unspecified
> > > whether other unquoted pattern matching characters within the same
> > > slash-delimited component of the pattern retain their special meanings or
> > > are treated as ordinary characters. For example, the pattern "a*[/b*" may
> > > match all filenames beginning with 'b' in the directory "a*[" or it may
> > > match all filenames beginning with 'b' in all directories with names
> > > beginning with 'a' and ending with '['.
> >
> > This is because the shell may have already committed to parsing it as an
> > ordinary character as it was under the impression a bracket expression had
> > started.


> If that was the reason, wouldn't it only apply to characters after the '['?
> As per the example given - a*[/b* - it also applies to characters before it.


Huh, I missed that. You're right.

That raises another question: is it unspecified whether each individual 
pattern character is treated literally, or all at once? That is, it is 
clear that *[*/x* is permitted to list the file names starting with "x" 
in directory "*[*", and it is permitted to list the file names starting 
with "x" in directories with names containing "[", but is it also 
permitted to list the file names starting with "x" in directories with 
names ending in "[*"? All three behaviours can be found in current shells.


(A perverse implementation could also list the file names starting with 
"x" in directories with names starting with "*[", but I cannot think of 
any legitimate reason why a shell would do that and I do not know any 
shell that does it.)



> > 2.13.3 should be modified to also state that despite the indirect backslash
> > not being part of a bracket expression, it is also unspecified whether it
> > preserves the literal value of the following character here.
>
> I was reading "unquoted pattern matching characters" as including backslash,
> but it would make sense to spell it out by changing that bit to:
>
>  ... whether other unquoted '*', '?', '[' or <backslash> characters
>  within the same ...


That looks good to me.

I did not think of backslash as a pattern matching character. The term 
does not have a definition, as far as I know, and backslash is no longer 
one of the characters that triggers pattern matching under the proposed 
bug resolution, but I can see how it could be read differently.



> > > appropriate.
> > > to:
> > > 3. If a specified pattern contains
> > > any '*', '?' or '[' characters that will be treated as special (see [xref
> > > to 2.13.1]), it shall be matched against existing filenames and pathnames,
> > > as appropriate.
> >
> > How is this intended to interact with that exception in 2.13.3? For an
> > unquoted [* word, this may either unconditionally produce "[*", or it may
> > expand to the names of files starting with "[". In shells where it
> > unconditionally produces "[*", does that mean pathname expansion is not
> > performed, as none of the characters are treated as special? Or does
> > "unspecified" mean it may be treated as a pattern matching character when
> > determining whether pathname expansion is going to be performed, but then as
> > a literal character during the actual pathname expansion? This is relevant
> > if the [* is at the end of a larger pattern containing an indirect
> > backslash.
>
> I would consider that to be a quality-of-implementation issue. A good
> quality implementation would ensure consistency between the precheck
> code to decide whether to match against pathnames and the actual pattern
> matching code.  A poor quality one might not.


Okay.

The reason for asking was that depending on how certain unspecified 
aspects of pattern matching are handled, fully parsing bracket 
expressions, including character classes, equivalence classes and 
collating symbols, in the precheck code is unnecessary; it is possible to 
limit the check to pathname components containing an unquoted ?, an 
unquoted *, or an unquoted [ that is later followed by an unquoted ] 
(with special exceptions for [] and [!]). Being able to leave the rest 
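The cheap precheck described in this message can be sketched per pathname component. needs_glob is a hypothetical helper. It assumes quoting has already been resolved, whereas a real shell must track which characters were quoted. The helper looks for a ? or *, or for a [ that is later followed by a ], treating a ] immediately after [ or [! as literal.

```sh
# Hypothetical helper: return 0 if component $1 may need pathname
# expansion. Assumes quoting was already resolved (all chars unquoted).
needs_glob() {
  comp=$1
  case $comp in
    *[?*]*) return 0 ;;          # a ? or *
  esac
  rest=$comp
  while :; do
    case $rest in
      *\[*) rest=${rest#*\[} ;;  # text after the next [
      *) return 1 ;;             # no [ left: nothing special
    esac
    after=${rest#!}              # [!...] : skip the negation
    after=${after#]}             # [] and [!] : a leading ] is literal
    case $after in
      *\]*) return 0 ;;          # a later ] can close the bracket
    esac
  done
}

needs_glob '[b]' && echo 'may need pathname expansion'
needs_glob '[!]' || echo 'treated as literal text'
```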

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-30 Thread Geoff Clare
Harald van Dijk  wrote, on 28 Sep 2019:
>
> On 23/09/2019 16:39, Austin Group Bug Tracker wrote:
> >--
> >  (0004564) geoffclare (manager) - 2019-09-23 15:39
> >  http://austingroupbugs.net/view.php?id=1234#c4564
> >--
> >[...]
> >For the shell only, it is unspecified whether or not a <backslash>
> >character inside a bracket expression preserves the literal value of the
> >following character.
> 
> I noticed now that the resolution of bug 1233 was to not change the rules
> for how <backslash> is treated inside bracket expressions (yet), but this
> does change it. This may be worth mentioning in bug 1233.
> 
> "inside a bracket expression" is probably too narrow, given the exception in
> 2.13.3:
> 
>   bs='\'
>   set -- ?[$bs.*
> 
> 2.13.3 states that despite the * not being part of a bracket expression, it
> may be treated as an ordinary character:
> 
> >If the pattern contains an open bracket ( '[' ) that does not introduce a 
> >bracket expression as in XBD RE Bracket Expression, it is unspecified 
> >whether other unquoted pattern matching characters within the same 
> >slash-delimited component of the pattern retain their special meanings or 
> >are treated as ordinary characters. For example, the pattern "a*[/b*" may 
> >match all filenames beginning with 'b' in the directory "a*[" or it may 
> >match all filenames beginning with 'b' in all directories with names 
> >beginning with 'a' and ending with '['.
> This is because the shell may have already committed to parsing it as an
> ordinary character as it was under the impression a bracket expression had
> started.

If that was the reason, wouldn't it only apply to characters after the '['?
As per the example given - a*[/b* - it also applies to characters before it.

> 2.13.3 should be modified to also state that despite the indirect backslash
> not being part of a bracket expression, it is also unspecified whether it
> preserves the literal value of the following character here.

I was reading "unquoted pattern matching characters" as including backslash,
but it would make sense to spell it out by changing that bit to:

... whether other unquoted '*', '?', '[' or <backslash> characters
within the same ...

> >appropriate.
> >to:
> >3. If a specified pattern contains
> >any '*', '?' or '[' characters that will be treated as special (see [xref
> >to 2.13.1]), it shall be matched against existing filenames and pathnames,
> >as appropriate.
> 
> How is this intended to interact with that exception in 2.13.3? For an
> unquoted [* word, this may either unconditionally produce "[*", or it may
> expand to the names of files starting with "[". In shells where it
> unconditionally produces "[*", does that mean pathname expansion is not
> performed, as none of the characters are treated as special? Or does
> "unspecified" mean it may be treated as a pattern matching character when
> determining whether pathname expansion is going to be performed, but then as
> a literal character during the actual pathname expansion? This is relevant
> if the [* is at the end of a larger pattern containing an indirect
> backslash.

I would consider that to be a quality-of-implementation issue. A good
quality implementation would ensure consistency between the precheck
code to decide whether to match against pathnames and the actual pattern
matching code.  A poor quality one might not.

> (As an aside, why is this exception limited to patterns used for filename
> expansion? Existing practice is that it applies to all patterns:
> 
>   case [a in [*) echo match ;; *) echo no match ;; esac
> 
> This prints "no match" in bosh, dash, ksh, pdksh and posh.)

Because it's not the exception you thought it was?  If the one stated
in 2.13.3 applied here, then it would apply to *[ as well as [*.

> Finally, just to clarify, with bs='\' and expanding [/$bs.] in a context
> where pathname expansion could be performed, it is my understanding that
> this is not a bracket expression, despite the word containing what would be
> a bracket expression when used in other contexts, therefore this would be
> required to expand to "[/\.]" regardless of the contents of the file system.
> Is that understanding correct?

Yes, as per 2.13.3 item 1, the '[' is treated as an ordinary character,
so it doesn't meet the "that will be treated as special" condition in
the bug 1234 resolution.

-- 
Geoff Clare 



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-27 Thread Harald van Dijk

On 23/09/2019 16:39, Austin Group Bug Tracker wrote:

> --
>   (0004564) geoffclare (manager) - 2019-09-23 15:39
>   http://austingroupbugs.net/view.php?id=1234#c4564
> --
> [...]
> For the shell only, it is unspecified whether or not a <backslash>
> character inside a bracket expression preserves the literal value of the
> following character.


I noticed now that the resolution of bug 1233 was to not change the 
rules for how <backslash> is treated inside bracket expressions (yet), 
but this does change it. This may be worth mentioning in bug 1233.


"inside a bracket expression" is probably too narrow, given the 
exception in 2.13.3:


  bs='\'
  set -- ?[$bs.*

2.13.3 states that despite the * not being part of a bracket expression, 
it may be treated as an ordinary character:



> If the pattern contains an open bracket ( '[' ) that does not introduce a
> bracket expression as in XBD RE Bracket Expression, it is unspecified
> whether other unquoted pattern matching characters within the same
> slash-delimited component of the pattern retain their special meanings or
> are treated as ordinary characters. For example, the pattern "a*[/b*" may
> match all filenames beginning with 'b' in the directory "a*[" or it may
> match all filenames beginning with 'b' in all directories with names
> beginning with 'a' and ending with '['.

This is because the shell may have already committed to parsing it as an 
ordinary character as it was under the impression a bracket expression 
had started.


2.13.3 should be modified to also state that despite the indirect 
backslash not being part of a bracket expression, it is also unspecified 
whether it preserves the literal value of the following character here.



appropriate.

to:

3. If a specified pattern contains
any '*', '?' or '[' characters that will be treated as special (see [xref
to 2.13.1]), it shall be matched against existing filenames and pathnames,
as appropriate.


How is this intended to interact with that exception in 2.13.3? For an 
unquoted [* word, this may either unconditionally produce "[*", or it 
may expand to the names of files starting with "[". In shells where it 
unconditionally produces "[*", does that mean pathname expansion is not 
performed, as none of the characters are treated as special? Or does 
"unspecified" mean it may be treated as a pattern matching character 
when determining whether pathname expansion is going to be performed, 
but then as a literal character during the actual pathname expansion? 
This is relevant if the [* is at the end of a larger pattern containing 
an indirect backslash.


(As an aside, why is this exception limited to patterns used for 
filename expansion? Existing practice is that it applies to all patterns:


  case [a in [*) echo match ;; *) echo no match ;; esac

This prints "no match" in bosh, dash, ksh, pdksh and posh.)
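A minimal sketch, assuming a POSIX sh: since it is unspecified whether a '*' following an unquoted '[' that does not start a bracket expression stays special, a script that wants "a literal '[' followed by anything" can sidestep the question by quoting the '['. The quoted form below should give "match" in every conforming shell; the unquoted form above is deliberately not repeated because its result varies.

```sh
# Quote the '[' so it is an ordinary character; the unquoted '*' after it
# remains a pattern-matching character in every POSIX shell.
case '[a' in
    '['*) result=match ;;
    *)    result='no match' ;;
esac
printf '%s\n' "$result"     # prints match
```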

Finally, just to clarify, with bs='\' and expanding [/$bs.] in a context 
where pathname expansion could be performed, it is my understanding that 
this is not a bracket expression, despite the word containing what would 
be a bracket expression when used in other contexts, therefore this 
would be required to expand to "[/\.]" regardless of the contents of the 
file system. Is that understanding correct?


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-26 Thread Harald van Dijk

On 26/09/2019 14:27, Joerg Schilling wrote:

Harald van Dijk  wrote:


Not the same way, but it could still be trivially fixed: instead of

as_echo='printf %s\n'

configure scripts could do

as_echo() { printf '%s\n' "$@"; }
as_echo=as_echo


Well, the problem with using printf instead of echo is that not all printf
implementations handle 'foo\0bar' correctly.


All of this is inside a conditional, that is the reason why a helper 
variable is used in the first place. as_echo='printf %s\n' is only 
executed if the configure script detects that on the current system, 
printf is good enough (it checks another printf bug though, not \0 
handling) and no better alternative (some shells' "print" builtin 
command) is available. The replacement using an as_echo function would 
be done in the exact same conditions.



Jörg




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-26 Thread Joerg Schilling
Harald van Dijk  wrote:

> Not the same way, but it could still be trivially fixed: instead of
>
>as_echo='printf %s\n'
>
> configure scripts could do
>
>as_echo() { printf '%s\n' "$@"; }
>as_echo=as_echo

Well, the problem with using printf instead of echo is that not all printf 
implementations handle 'foo\0bar' correctly.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-25 Thread Harald van Dijk

On 25/09/2019 15:49, Geoff Clare wrote:

There are differences due to the version number change and there are
differences due to the build configuration being different.  I only
mentioned the build configuration in order to preempt a response
claiming that the differences between bash 3 and bash 4 were not
sufficient to justify treating them as different shells.


I still do not understand this, but seeing how little relevance this has 
to the discussion, I am okay with dropping this.



I see it as a separate decision whether to do matching against pathnames
or not.  If matching is done, the treatment of backslash is then the same
as in glob(), find, etc.  If matching is not done, the result is the
same as if matching had been done and no matching pathnames were found.


I guess that makes sense. I was still thinking from the perspective 
that POSIX currently describes: in theory, pathname expansion is always 
performed (aside perhaps from fully quoted words), but shells optimise 
it away in practice.



Personally I would prefer the backslash-is-always-special option, but
breaking autoconf when a %sn file exists was enough for me to accept
the bash2/3/4 behaviour as a compromise.


Earlier you wrote "the likelihood of this causing problems is extremely
small". This applies here as well. How likely is it for a '%sn' file to
exist? Other than as a deliberate attempt to cause the configure script to
fail, that is, in which case it is doing exactly what the user wanted.


For '%sn' perhaps not very likely, but the fact that this case came
to light in a widely-used open source application means that other
similar cases are likely to exist in other open source applications
and in closed source applications, users' private scripts, etc.


Agreed that similar cases are likely to exist (both with backslashes and 
with other special characters), but are there any cases that we can 
expect to cause problems for users that do not specifically create files 
to break scripts? I suspect the answer to that is no.



Those scripts can be fixed simply by adding quoting.  The autoconf
problem with bash5 can't be fixed that way.


Not the same way, but it could still be trivially fixed: instead of

  as_echo='printf %s\n'

configure scripts could do

  as_echo() { printf '%s\n' "$@"; }
  as_echo=as_echo

Incidentally, this problematic use of $as_echo had already been dropped 
in autoconf more than five years ago, replaced by a direct printf '%s\n' 
without any helper variable; it's just that there has not been a new 
release of autoconf since then, so bash and other software never picked 
up that update. On the autoconf front, there is nothing that needs to be 
done to ensure compatibility with the bash 5 behaviour even in the 
presence of a %sn file aside from getting out a new release. This does 
not really help us today though.
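A minimal sketch of the function-based replacement quoted above, run in a scratch directory that deliberately contains a file named '%sn' (the trigger discussed in this thread). Because the format string is quoted inside the function body, no pathname expansion can apply to it, so the output does not depend on what the directory contains. The use of mktemp -d is an assumption: it is widely available but not specified by POSIX.

```sh
# Hypothetical scratch directory containing the problematic '%sn' file.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
: > '%sn'

# Function form: the format operand is quoted, so it is never globbed.
as_echo() { printf '%s\n' "$@"; }
as_echo=as_echo

# $as_echo expands to the plain word "as_echo", which calls the function.
out=$($as_echo hello)
printf '%s\n' "$out"        # prints hello

cd / && rm -rf "$tmpdir"
```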



All the problems of all approaches are corner cases that are unlikely to
cause real problems in practice.


And yet, as Stephane reports, there have been several bug reports
against bash5 because of the new behaviour.


Those bug reports are about the unfortunate interaction between this 
treatment of backslashes and non-standard options, not about problems 
with scripts written for POSIX sh or invoking bash in POSIX mode, from 
what I have seen. Yes, I can agree that it is a problem for bash that


  var='printf %s\n hello'
  $var

errors out when the failglob option is enabled, or drops the newline 
when the nullglob option is enabled. I can think of some possible ways 
to handle that, but unless POSIX adds these options, determining how 
they should interact with indirect backslashes should probably not be 
done on this list.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-25 Thread Geoff Clare
Harald van Dijk  wrote, on 25 Sep 2019:
>
> On 25/09/2019 10:22, Geoff Clare wrote:
> >Harald van Dijk  wrote, on 24 Sep 2019:
> >>
> Regardless, a single shell is not enough to say "most shells", not even if
> it is multiple versions of that single shell.
> >>>
> >>>I consider bash 4 on Linux and bash 3 on macOS to be different shells.
> >>>(Their build configuration is different.)
> >>
> >>I do not understand this logic. The build configuration does not differ in
> >>any way that is relevant to pathname expansion. Surely the NetBSD shell is
> >>not counted separately for each port listed on
> >>, so why is bash different?
> >
> >Their behaviour is sufficiently different (in areas other than pathname
> >expansion) to consider them to be different shells.  The same is true
> >for ksh88 and ksh93.
> 
> So it is just that bash 3 and bash 4 are significantly different and both
> versions are still used on current versions of operating systems, it is not
> about build configuration?

There are differences due to the version number change and there are
differences due to the build configuration being different.  I only
mentioned the build configuration in order to preempt a response
claiming that the differences between bash 3 and bash 4 were not
sufficient to justify treating them as different shells.

> >Okay, I see your point now.  When putting part of a pathname in a
> >variable you have to know how it is going to be used in order to know
> >how backslash will be handled.  But this is just one aspect of a wider
> >problem - e.g. you have to know if the variable will be quoted or not
> >when used, which applies to the backslash-is-always-special behaviour
> >as well.
> 
> The shell script author does not necessarily have full control over this,
> though. In $dir/$file, how $dir is treated depends on whether $file contains
> metacharacters, and vice versa. Quoted vs. unquoted is something the shell
> script author does have full control over, and it is easy to check in
> typical scripts that all uses of $dir are quoted, or that all uses of $dir
> are unquoted.

Okay, I guess this counts as an entry in the "cons" columns for the
bash2/3/4 behaviour then.  I'm sure Stephane and others will argue
that it is outweighed by the "cons" for the bash5 behaviour, and I'm
inclined to agree.

> >In any case I see this as a very minor issue.  Putting a whole pattern
> >in a variable is a rare thing to do.  Putting part in a variable and
> >part direct is even more rare.  Coupled with the fact that using
> >backslash in patterns (that you want to be expanded) is also rare, the
> >likelihood of this causing problems is extremely small.
> 
> Putting a pattern in a variable is not that rare. The rest probably is, but
> see below.
> 
> >I wrote the above before I had fully thought it through, and having slept
> >on it my preference is now much stronger, and I certainly would object to
> >specifying the NetBSD sh behaviour.  The reason is because treating
> >backslash differently in different components in indirect shell patterns
> >is inconsistent with direct shell patterns, glob(), find -path, and the
> >pax pattern operand, none of which vary their treatment of backslash
> >across different components of a pattern that contains slashes.
> 
> Likewise, none of them vary their treatment of backslash according to
> whether (other) metacharacters are present. If a file named 'x' exists,
> find . -name '\x' will find it, despite '\x' not containing any
> metacharacters. The proposed resolution already treats backslashes
> differently to how they are treated in glob(), find, etc.

I see it as a separate decision whether to do matching against pathnames
or not.  If matching is done, the treatment of backslash is then the same
as in glob(), find, etc.  If matching is not done, the result is the
same as if matching had been done and no matching pathnames were found.

> >Personally I would prefer the backslash-is-always-special option, but
> >breaking autoconf when a %sn file exists was enough for me to accept
> >the bash2/3/4 behaviour as a compromise.
> 
> Earlier you wrote "the likelihood of this causing problems is extremely
> small". This applies here as well. How likely is it for a '%sn' file to
> exist? Other than as a deliberate attempt to cause the configure script to
> fail, that is, in which case it is doing exactly what the user wanted.

For '%sn' perhaps not very likely, but the fact that this case came
to light in a widely-used open source application means that other
similar cases are likely to exist in other open source applications
and in closed source applications, users' private scripts, etc.

> If you do think that is a problem, it is already a problem regardless of how
> backslash is handled in existing scripts, which pass URLs with query strings
> unquoted to curl or wget. That is, if a script contains
> 
>   curl https://some.site/path?name=value
> 
> you can break that 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-25 Thread Stephane Chazelas
2019-09-25 10:22:07 +0100, Geoff Clare:
[...]
> I wrote the above before I had fully thought it through, and having slept
> on it my preference is now much stronger, and I certainly would object to
> specifying the NetBSD sh behaviour.  The reason is because treating
> backslash differently in different components in indirect shell patterns
> is inconsistent with direct shell patterns, glob(), find -path, and the
> pax pattern operand, none of which vary their treatment of backslash
> across different components of a pattern that contains slashes.

For the record, I agree the NetBSD 8.1 sh behaviour is
undesirable (I believe I made that case, and expanded on it, earlier)

[...]
> Personally I would prefer the backslash-is-always-special option, but
> breaking autoconf when a %sn file exists was enough for me to accept
> the bash2/3/4 behaviour as a compromise.
[...]

Note that the new bash5 behaviour has already been the subject
of several bug reports on the bash mailing list, not so much
about cases where a %sn file exists (those are dormant issues
that are hard to detect), but because it becomes much more
visible when the nullglob or failglob options are enabled.

As in:

$ NL='\n' bash5 -O failglob -O xpg_echo -c 'echo $NL'
bash5: no match: \n
$ touch n
$ NL='\n' bash5 -O failglob -O xpg_echo -c 'echo $NL'
n

(and yes, one should use printf '%s\n' "$NL", not echo $NL, but
unfortunately not many people are aware that echo mustn't be
used or that parameter expansions must always be quoted in list
contexts; even the bash documentation and the POSIX standard text
make those mistakes).
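For comparison with the failglob transcript above, a minimal sketch of the quoted printf form recommended here (mktemp -d assumed available). With the expansion quoted, the value '\n' is neither split nor globbed, and the %s conversion writes it without interpreting the backslash, so the result is the same whether or not a file named 'n' exists.

```sh
NL='\n'
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
: > n   # the file that changed the unquoted echo result above

# Quoted expansion plus a %s format: the two characters '\' and 'n' are
# written verbatim, followed by the newline from the format string.
out=$(printf '%s\n' "$NL")
printf '%s\n' "$out"        # prints \n

cd / && rm -rf "$tmpdir"
```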

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-25 Thread Harald van Dijk

On 25/09/2019 10:22, Geoff Clare wrote:

Harald van Dijk  wrote, on 24 Sep 2019:



Regardless, a single shell is not enough to say "most shells", not even if
it is multiple versions of that single shell.


I consider bash 4 on Linux and bash 3 on macOS to be different shells.
(Their build configuration is different.)


I do not understand this logic. The build configuration does not differ in
any way that is relevant to pathname expansion. Surely the NetBSD shell is
not counted separately for each port listed on
, so why is bash different?


Their behaviour is sufficiently different (in areas other than pathname
expansion) to consider them to be different shells.  The same is true
for ksh88 and ksh93.


So it is just that bash 3 and bash 4 are significantly different and 
both versions are still used on current versions of operating systems, 
it is not about build configuration?



Okay, I see your point now.  When putting part of a pathname in a
variable you have to know how it is going to be used in order to know
how backslash will be handled.  But this is just one aspect of a wider
problem - e.g. you have to know if the variable will be quoted or not
when used, which applies to the backslash-is-always-special behaviour
as well.


The shell script author does not necessarily have full control over 
this, though. In $dir/$file, how $dir is treated depends on whether 
$file contains metacharacters, and vice versa. Quoted vs. unquoted is 
something the shell script author does have full control over, and it is 
easy to check in typical scripts that all uses of $dir are quoted, or 
that all uses of $dir are unquoted.



In any case I see this as a very minor issue.  Putting a whole pattern
in a variable is a rare thing to do.  Putting part in a variable and
part direct is even more rare.  Coupled with the fact that using
backslash in patterns (that you want to be expanded) is also rare, the
likelihood of this causing problems is extremely small.


Putting a pattern in a variable is not that rare. The rest probably is, 
but see below.



I wrote the above before I had fully thought it through, and having slept
on it my preference is now much stronger, and I certainly would object to
specifying the NetBSD sh behaviour.  The reason is because treating
backslash differently in different components in indirect shell patterns
is inconsistent with direct shell patterns, glob(), find -path, and the
pax pattern operand, none of which vary their treatment of backslash
across different components of a pattern that contains slashes.


Likewise, none of them vary their treatment of backslash according to 
whether (other) metacharacters are present. If a file named 'x' exists,
find . -name '\x' will find it, despite '\x' not containing any 
metacharacters. The proposed resolution already treats backslashes 
differently to how they are treated in glob(), find, etc.
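A quick way to check the find behaviour described above, sketched in a scratch directory (mktemp -d assumed available): the operand '\x' contains no '*', '?' or '[', yet find still applies pattern matching to it, and the backslash simply quotes the 'x'.

```sh
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
: > x

# The single quotes pass '\x' to find with the backslash intact;
# find's pattern matching then treats the backslash as quoting 'x'.
found=$(find . -name '\x')
printf '%s\n' "$found"      # prints ./x

cd / && rm -rf "$tmpdir"
```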



Personally I would prefer the backslash-is-always-special option, but
breaking autoconf when a %sn file exists was enough for me to accept
the bash2/3/4 behaviour as a compromise.


Earlier you wrote "the likelihood of this causing problems is extremely 
small". This applies here as well. How likely is it for a '%sn' file to 
exist? Other than as a deliberate attempt to cause the configure script 
to fail, that is, in which case it is doing exactly what the user wanted.


If you do think that is a problem, it is already a problem regardless of 
how backslash is handled in existing scripts, which pass URLs with query 
strings unquoted to curl or wget. That is, if a script contains


  curl https://some.site/path?name=value

you can break that script by creating a 'https:' directory, a 
'some.site' directory in that, and a 'pathXname=value' file in that. 
This is not hypothetical; I have seen multiple scripts that did this. I 
have seen that they did this because I was experimenting with bash's 
failglob option, which of course reported it as not matching anything.


We are not changing the shell semantics to say that pathname expansion 
is no longer performed on words that look like URLs, we just accept that 
this is technically a bug in those scripts, but that it is a bug that is 
so unlikely to cause real problems that for practical purposes we can 
ignore it.
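A sketch of the URL scenario above, using set -- instead of curl so nothing touches the network (the directory layout is the one described; mktemp -d is assumed available). Once the 'https:/some.site/pathXname=value' file exists, the unquoted word contains a special '?' with a matching pathname, so it is replaced by that pathname; the quoted word is passed through unchanged.

```sh
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
mkdir -p 'https:/some.site'
: > 'https:/some.site/pathXname=value'

set -- https://some.site/path?name=value    # unquoted: subject to globbing
unquoted=$1
set -- "https://some.site/path?name=value"  # quoted: never globbed
quoted=$1

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

cd / && rm -rf "$tmpdir"
```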


All the problems of all approaches are corner cases that are unlikely to 
cause real problems in practice.



I agree there's a problem.  The proposed wording implies that the indirect
backslash escapes the shell-quoting backslash.

Here's a suggestion for how to fix that in the 1st bullet in 2.13.1:

 A <backslash> character that is not inside a bracket expression
 shall preserve the literal value of the following character, unless
 the following character is in a part of the pattern where shell
 quoting can be used and is a shell quoting character, in which case
 the behavior is unspecified.

It says the behaviour is unspecified because it seems to cause 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-25 Thread Geoff Clare
Harald van Dijk  wrote, on 24 Sep 2019:
>
> >>Regardless, a single shell is not enough to say "most shells", not even if
> >>it is multiple versions of that single shell.
> >
> >I consider bash 4 on Linux and bash 3 on macOS to be different shells.
> >(Their build configuration is different.)
> 
> I do not understand this logic. The build configuration does not differ in
> any way that is relevant to pathname expansion. Surely the NetBSD shell is
> not counted separately for each port listed on
> , so why is bash different?

Their behaviour is sufficiently different (in areas other than pathname
expansion) to consider them to be different shells.  The same is true
for ksh88 and ksh93.

> >>>I think the way the bug 1234 resolution specifies it (as per bash2/3/4) is
> >>>more straightforward and easier for users to understand. It's a simple binary
> >>>choice: either matching against existing pathnames is performed or it isn't.
> >>>If it is performed, all special pattern-matching characters, including
> >>>backslash, have their special meaning in all components of the pattern.
> >>
> >>You get situations where $dir/file1 and $dir/file2 name two files, but
> >>$dir/file[12] cannot be used to match them both in a single word, though.
> >
> >Can you be more specific?  Perhaps I'm missing something obvious, but
> >I can't think of a case that *cannot* be matched somehow.  E.g. to match
> >a backslash in a filename you can use [\\].
> 
> If dir='\x' and files 'x/file1', 'x/file2', '\x/file1', and '\x/file2' all
> exist, then under the proposed wording, $dir/file1 and $dir/file2 name the
> latter two files, but when you try to combine them in $dir/file[12], the
> meaning changes so that it names the former two. Yes, the value stored in
> the dir variable can be modified to avoid this inconsistency, but that does
> not change that there is an inconsistency in the shell's pathname expansion.
> 
> If indirect \ is always treated as literal, all would match the latter two
> files.
> 
> If indirect \ is always treated as a metacharacter, all would match the
> former two files.
> 
> If indirect \ is determined per pathname component, all would match the
> latter two files.

Okay, I see your point now.  When putting part of a pathname in a
variable you have to know how it is going to be used in order to know
how backslash will be handled.  But this is just one aspect of a wider
problem - e.g. you have to know if the variable will be quoted or not
when used, which applies to the backslash-is-always-special behaviour
as well.

In any case I see this as a very minor issue.  Putting a whole pattern
in a variable is a rare thing to do.  Putting part in a variable and
part direct is even more rare.  Coupled with the fact that using
backslash in patterns (that you want to be expanded) is also rare, the
likelihood of this causing problems is extremely small.

> >>That is a problem that NetBSD sh does not have, and one that the
> >>alternatives of always treating indirect \ as literal or never doing so also
> >>do not have.
> >
> >The three choices that were considered were always treating indirect \
> >as literal, never treating it as literal, or the middle option that has
> >been in use for many years in bash2/3/4.  Given the problems with the
> >first two choices that were discussed at great length on the mailing
> >list, the middle option was felt to be the one that has the best chance
> >of achieving consensus.
> 
> For completeness, problems with the middle option were also discussed at
> great length on the mailing list.
> 
> >I must have overlooked the fact that NetBSD sh behaves slightly differently
> >in among the deluge of emails. (I note that Stephane says he did mention it).

On reflection, I think that the reason I haven't been considering the
NetBSD sh behaviour as an option is because kre said at one point in the
discussion that he would change it after the NetBSD 9 release.

Presumably he is waiting to see how bug 1234 is resolved before deciding
how to change it.

> >I have a slight preference for the bash2/3/4 behaviour for the reasons I
> >stated above, but would not object if others would prefer it.

I wrote the above before I had fully thought it through, and having slept
on it my preference is now much stronger, and I certainly would object to
specifying the NetBSD sh behaviour.  The reason is because treating
backslash differently in different components in indirect shell patterns
is inconsistent with direct shell patterns, glob(), find -path, and the
pax pattern operand, none of which vary their treatment of backslash
across different components of a pattern that contains slashes.

> You probably know my position, which is that it is too late for POSIX to
> change the requirements after shells had started implementing the previously
> required behaviour that under the new wording would no longer be permitted,
> and that aside from that it complicates the shell 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Harald van Dijk

On 24/09/2019 15:32, Geoff Clare wrote:

Harald van Dijk  wrote, on 24 Sep 2019:

There is a reason I wrote "in its current version". I do not think it is
reasonable to describe bash 4 behaviour that had already been changed when
this was being discussed as "existing practice".


Bash 3 on macOS counts as a "current version" shell since it comes
preinstalled as /bin/sh.


On a system that conforms to an old version of POSIX, but okay, it is 
still the current version of the OS, that is fair enough.



Regardless, a single shell is not enough to say "most shells", not even if
it is multiple versions of that single shell.


I consider bash 4 on Linux and bash 3 on macOS to be different shells.
(Their build configuration is different.)


I do not understand this logic. The build configuration does not differ 
in any way that is relevant to pathname expansion. Surely the NetBSD 
shell is not counted separately for each port listed on 
, so why is bash different?



I think the way the bug 1234 resolution specifies it (as per bash2/3/4) is
more straightforward and easier for users to understand. It's a simple binary
choice: either matching against existing pathnames is performed or it isn't.
If it is performed, all special pattern-matching characters, including
backslash, have their special meaning in all components of the pattern.


You get situations where $dir/file1 and $dir/file2 name two files, but
$dir/file[12] cannot be used to match them both in a single word, though.


Can you be more specific?  Perhaps I'm missing something obvious, but
I can't think of a case that *cannot* be matched somehow.  E.g. to match
a backslash in a filename you can use [\\].


If dir='\x' and files 'x/file1', 'x/file2', '\x/file1', and '\x/file2' 
all exist, then under the proposed wording, $dir/file1 and $dir/file2 
name the latter two files, but when you try to combine them in 
$dir/file[12], the meaning changes so that it names the former two. Yes, 
the value stored in the dir variable can be modified to avoid this 
inconsistency, but that does not change that there is an inconsistency 
in the shell's pathname expansion.


If indirect \ is always treated as literal, all would match the latter 
two files.


If indirect \ is always treated as a metacharacter, all would match the 
former two files.


If indirect \ is determined per pathname component, all would match the 
latter two files.
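The scenario above can be set up with a short script, sketched here under the assumption of a scratch directory from mktemp -d. Only the results fixed by the standard are stated: "$dir"/file1 is quoted and so never globbed, and the direct pattern [\\]x/file[12] (Geoff's suggestion for matching a literal backslash) selects the two files under '\x'. The unquoted $dir/file[12] result is printed but deliberately not stated, since it is exactly what differs between the options under discussion.

```sh
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
mkdir x '\x'
: > x/file1; : > x/file2; : > '\x/file1'; : > '\x/file2'

dir='\x'

# Quoted expansion: no pathname expansion, names the file under '\x'.
set -- "$dir"/file1
quoted=$1

# Direct pattern: the shell-quoted backslash inside [...] is a literal
# backslash, so this matches '\x/file1' and '\x/file2'.
set -- [\\]x/file[12]
direct="$*"

# Indirect backslash combined with other pattern characters: the result
# depends on which treatment of indirect backslash the shell implements.
set -- $dir/file[12]
indirect="$*"

printf 'quoted:   %s\ndirect:   %s\nindirect: %s\n' \
    "$quoted" "$direct" "$indirect"

cd / && rm -rf "$tmpdir"
```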



That is a problem that NetBSD sh does not have, and one that the
alternatives of always treating indirect \ as literal or never doing so also
do not have.


The three choices that were considered were always treating indirect \
as literal, never treating it as literal, or the middle option that has
been in use for many years in bash2/3/4.  Given the problems with the
first two choices that were discussed at great length on the mailing
list, the middle option was felt to be the one that has the best chance
of achieving consensus.


For completeness, problems with the middle option were also discussed at 
great length on the mailing list.



I must have overlooked the fact that NetBSD sh behaves slightly differently
in among the deluge of emails. (I note that Stephane says he did mention it).
I have a slight preference for the bash2/3/4 behaviour for the reasons I
stated above, but would not object if others would prefer it.


You probably know my position, which is that it is too late for POSIX to 
change the requirements after shells had started implementing the 
previously required behaviour that under the new wording would no longer 
be permitted, and that aside from that it complicates the shell language 
for very little benefit. I had already implemented never treating 
indirect \ as literal and would have no issue implementing always 
treating it as literal, but if it needs to be done sometimes under 
hard-to-describe conditions, I will wait and see what other shells do 
before changing mine. I take no position at this time on whether the 
bash 4 vs. NetBSD sh behaviour is preferable, I see issues in both.


There is a corner case though to handle in the wording with any indirect 
backslash that is not taken literally. (There are probably other corner 
cases as well.) Assume for the purposes of the next paragraphs that the 
examples are in contexts where pathname expansion is not bypassed, so 
part of a larger pattern containing metacharacters if needed.


Given a='*' b='\', it is not fully clear to me whether $b$a and $b* are 
supposed to expand the same way by the proposed wording. This is not 
covered by "If a pattern ends with an unescaped <backslash>, the 
behavior is unspecified." since the backslash does not end the pattern. 
I think these are supposed to expand the same way, as the backslash 
itself appears in a context where "a shell-quoting <backslash> cannot be 
used to preserve the literal value of a character", regardless of 
whether the asterisk does.


If that is correct, then $b\* becomes problematic, as 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Stephane Chazelas
2019-09-24 15:53:07 +0100, Geoff Clare:
[...]
> to:
> 
> 3. If a specified pattern contains any '*', '?' or '[' characters
>that will be treated as special (see [xref to 2.13.1]), it shall
>be matched against existing filenames and pathnames, as
>appropriate. Each component that contains any such characters
>shall require read permission in the directory containing that
>component. Each component that contains a <backslash> that will
>be treated as special may require read permission in the
>directory containing that component.  Any component, except the 
>last, that does not contain any '*', '?', or '[' characters that
>will be treated as special shall require search permission.
[...]

Looks good. Note also the GLOB_MARK flag of glob() as discussed
in a separate thread (and the "markdirs" option of some shells)
that introduces a difference.

Without GLOB_MARK, you don't need search permission to x in x/*,
but with GLOB_MARK, you *may* need it if readdir() doesn't
return type information and you have to resort to lstat(), or
if the "Each pathname that is a directory that matches pattern
shall have a <slash> appended" also includes symlinks to
directories (which I believe it does) in which case
implementations do need to call stat() at least on the entries
that readdir() reports as being symlinks.

The search permission would be needed to be able to mark the
entries of type directory, it's not clear what should happen if
the type of the file can't be determined (if the entries should
be left unmarked or not included for instance). The reference to
stat() in the glob() specification suggests the errors should be
reported with GLOB_ERR in any case.

(in any case, to address that, we don't need to touch 2.13, it's
better addressed in the glob() spec).

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Geoff Clare
Stephane Chazelas  wrote, on 24 Sep 2019:
>
> In:
> 
> */x, the shell only needs search access to all the directories
> in the current directory (it will typically attempt an
> lstat(dir/x) for each of them). (And you need search and read
> access to the current directory.)
> 
> In x/*, you need search access to the current directory but no
> read access. The shell is not looking for a file matching "x" in
> all the entries in the current directory, it's directly opening
> x (and then looping through all its entries to find files
> matching "*" (for which it needs read but not search
> permission). (note that it's different when the nocaseglob
> non-standard option of some shells is used).
> 
> There's no difference for a litteral */'*' or '*'/* (or */\* and
> \*/*), but now there's the question of:
> 
> */$var and $var/* where var='\*'. Several of those shells that
> implement a \ globbing operator treat it like the other glob
> operators in that regard and trigger a "look for files matching
> the pattern in the content of the directory" even if the path
> component that contains an unquoted \ doesn't contain unquoted
> unescaped ?*[, so in effect the */$var and $var/* expansion is
> done the same way as if $var contained [*].

The current wording in 2.13.3 is rather vague here.  It says

Each component that contains a pattern character shall require
read permission in the directory containing that component.

It's not clear whether backslash counts as a "pattern character", and
even whether quoted *, ? and [..] still count as pattern characters
(which obviously isn't intended).

I'd suggest incorporating a change to that text in with the current
2.13.3 change, i.e.:

On page 2384 line 76271 section 2.13.3, change:

3. Specified patterns shall be matched against existing filenames
   and pathnames, as appropriate. Each component that contains a
   pattern character shall require read permission in the
   directory containing that component. Any component, except the
   last, that does not contain a pattern character shall require
   search permission.

to:

3. If a specified pattern contains any '*', '?' or '[' characters
   that will be treated as special (see [xref to 2.13.1]), it shall
   be matched against existing filenames and pathnames, as
   appropriate. Each component that contains any such characters
   shall require read permission in the directory containing that
   component. Each component that contains a <backslash> that will
   be treated as special may require read permission in the
   directory containing that component.  Any component, except the 
   last, that does not contain any '*', '?', or '[' characters that
   will be treated as special shall require search permission.
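Under the proposed rule 3, the first question for an indirect pattern (the result of an unquoted expansion) is whether it contains a '*', '?' or '[' that will be treated as special. A rough POSIX sh sketch of that test, assuming that a backslash escapes the character after it; has_glob_special is a hypothetical name, and as a simplification an unmatched '[' is counted as special even though 2.13.1 treats it as an ordinary character.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the string $1, taken as an indirect
# pattern, contains a '*', '?' or '[' that is not escaped by a backslash.
has_glob_special() {
  rest=$1
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}     # first character of what is left
    rest=${rest#?}
    case $c in
      \\) rest=${rest#?} ;;   # skip the escaped character
      \*|\?|\[) return 0 ;;   # unescaped pattern character
    esac
  done
  return 1
}

for p in '\*' 'de\v' '*/x' '\\*' 'nul[l]'; do
  if has_glob_special "$p"; then
    printf '%s: matched against existing pathnames\n' "$p"
  else
    printf '%s: left unchanged (rule 4)\n' "$p"
  fi
done
```

With this test, '\*' and 'de\v' fall under the proposed rule 4, while '\\*' (an escaped backslash followed by an unescaped '*') does trigger matching.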

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Geoff Clare
Harald van Dijk  wrote, on 24 Sep 2019:
>
> On 24/09/2019 12:24, Geoff Clare wrote:
> >Harald van Dijk  wrote, on 24 Sep 2019:
> >>
> >>>2. Existing practice in most shells that do treat backslash as special in
> >>>"indirect" patterns in pathname expansions is only to match patterns
> >>>against existing pathnames if the pattern includes a '*', '?' or '[' that
> >>>is treated as special.  This prevents accidental removal of backslash
> >>
> >>I cannot find any shell that behaves this way in its current version. Can
> >>you share some details about what shells you tested?
> >
> >Bash 4 on Linux and bash 3 on macOS (I recall someone saying that bash 2
> >also behaved the same way).
> 
> There is a reason I wrote "in its current version". I do not think it is
> reasonable to describe bash 4 behaviour that had already been changed when
> this was being discussed as "existing practice".

Bash 3 on macOS counts as a "current version" shell since it comes
preinstalled as /bin/sh.

> Regardless, a single shell is not enough to say "most shells", not even if
> it is multiple versions of that single shell.

I consider bash 4 on Linux and bash 3 on macOS to be different shells.
(Their build configuration is different.)

> >I think the way the bug 1234 resolution specifies it (as per bash2/3/4) is
> >more straightforward and easier for users to understand. It's a simple binary
> >choice: either matching against existing pathnames is performed or it isn't.
> >If it is performed, all special pattern-matching characters, including
> >backslash, have their special meaning in all components of the pattern.
> 
> You get situations where $dir/file1 and $dir/file2 name two files, but
> $dir/file[12] cannot be used to match them both in a single word, though.

Can you be more specific?  Perhaps I'm missing something obvious, but
I can't think of a case that *cannot* be matched somehow.  E.g. to match
a backslash in a filename you can use [\\].
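Along the lines of the [\\] suggestion, a script that needs an arbitrary string such as $dir inside a larger pattern like $dir/file[12] could bracket every character that might otherwise be special. A minimal POSIX sh sketch, assuming the bug 1234 semantics (backslash escapes throughout a word that is matched against pathnames); glob_escape is a hypothetical name. One side effect: the bracket expressions make the word a pattern, so if nothing matches, it is the escaped string, not the original, that is left in place.

```sh
#!/bin/sh
# Hypothetical helper: rewrite $1 so that, used unquoted inside a larger
# pattern, each of its characters matches only itself.  A backslash
# becomes [\\], which matches one backslash whether or not the
# implementation treats a backslash inside brackets as an escape.
glob_escape() {
  rest=$1 out=
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}       # first character of what is left
    rest=${rest#?}
    case $c in
      \\) out=$out'[\\]' ;;
      \*|\?|\[) out="$out[$c]" ;;   # e.g. * becomes [*]
      *) out=$out$c ;;
    esac
  done
  printf '%s\n' "$out"
}

glob_escape 'a\b'       # a[\\]b
glob_escape 'x*y?[z'    # x[*]y[?][[]z
```

With IFS set to the empty string (so that field splitting leaves the result alone), something like set -- $(glob_escape "$dir")/file[12] should then match both files even when $dir contains a backslash, though this has not been checked against every shell discussed here.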

> That is a problem that NetBSD sh does not have, and one that the
> alternatives of always treating indirect \ as literal or never doing so also
> do not have.

The three choices that were considered were always treating indirect \
as literal, never treating it as literal, or the middle option that has
been in use for many years in bash2/3/4.  Given the problems with the
first two choices that were discussed at great length on the mailing
list, the middle option was felt to be the one that has the best chance
of achieving consensus.

I must have overlooked the fact that NetBSD sh behaves slightly differently
in among the deluge of emails. (I note that Stephane says he did mention it).
I have a slight preference for the bash2/3/4 behaviour for the reasons I
stated above, but would not object if others would prefer it.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Stephane Chazelas
2019-09-24 12:24:43 +0100, Geoff Clare:
[...]
> > In NetBSD sh, backslash is given this special treatment only if the current
> > pathname component of the pattern includes a metacharacter. That is, an
> > indirect /de\v/nul[l] does not find /dev/null, but an indirect /d[e]\v/null
> > does.
> 
> That's a subtlety that I don't think had been raised before.
[...]

I did raise it a few times

That was point [10] at
https://www.mail-archive.com/austin-group-l@opengroup.org/msg04136.html
for instance.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Stephane Chazelas
2019-09-24 09:46:27 +0100, Geoff Clare:
[...]
> > Regardless, the above question applies in
> >
[...]
> var='\*'
[...]
> > printf '%s\n' */$var
> >
> > Or
> >
> > printf '%s\n' $var/*
>
> Those both have a * that will be treated as special, so matching
> against existing files is performed.  The permission requirements in
> these two cases are the same as if you had used var=foo.
>
> > (where there are variations among the shells that implement that
> > second meaning of backslash).
>
> Do you mean you see variation in those last two cases, or only in
> the first case, where bash 4 and earlier do what's specified in the
> bug 1234 resolution but in bash 5, printf "%s\n" $var outputs *
> when a * file exists?
[...]

Sorry for not being clear here.

In:

*/x, the shell only needs search access to all the directories
in the current directory (it will typically attempt an
lstat(dir/x) for each of them). (And you need search and read
access to the current directory.)

In x/*, you need search access to the current directory but no
read access. The shell is not looking for a file matching "x"
among all the entries in the current directory; it's directly
opening x and then looping through all its entries to find files
matching "*" (for which it needs read but not search
permission). (Note that it's different when the nocaseglob
non-standard option of some shells is used.)

There's no difference for a literal */'*' or '*'/* (or */\* and
\*/*), but now there's the question of:

*/$var and $var/* where var='\*'. Several of those shells that
implement a \ globbing operator treat it like the other glob
operators in that regard and trigger a "look for files matching
the pattern in the content of the directory" even if the path
component that contains an unquoted \ doesn't contain unquoted
unescaped ?*[, so in effect the */$var and $var/* expansion is
done the same way as if $var contained [*].
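To make the access pattern above concrete, here is a rough POSIX sh emulation of how */x can be expanded: one directory read of "." and then a single direct lookup of dir/x per subdirectory, so each subdirectory needs search but not read permission. expand_star_slash is a hypothetical name, and this does not reproduce any particular shell's code.

```sh
#!/bin/sh
# Hypothetical emulation of expanding */NAME in the current directory.
expand_star_slash() {
  name=$1 found=
  for d in *; do                  # reads "." (needs read permission on it)
    [ -d "$d" ] || continue       # only directories can contain NAME
    # Direct lookup of d/NAME, comparable to lstat(): this needs search
    # permission on d, but d itself is never read.
    if [ -e "$d/$name" ] || [ -h "$d/$name" ]; then
      printf '%s\n' "$d/$name"
      found=1
    fi
  done
  # A pattern that matches nothing is left as is.
  [ -n "$found" ] || printf '%s\n' "*/$name"
}

demo=${TMPDIR:-/tmp}/starslash.$$
mkdir "$demo" "$demo/a" "$demo/b" && : > "$demo/a/x"
(cd "$demo" && expand_star_slash x)   # prints a/x
rm -rf "$demo"
```

The x/* case is the mirror image: x is opened and read directly, and no lookup of the names found inside it is needed.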

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Harald van Dijk

On 24/09/2019 12:24, Geoff Clare wrote:
>Harald van Dijk  wrote, on 24 Sep 2019:
>>
>>>2. Existing practice in most shells that do treat backslash as special in
>>>"indirect" patterns in pathname expansions is only to match patterns
>>>against existing pathnames if the pattern includes a '*', '?' or '[' that
>>>is treated as special.  This prevents accidental removal of backslash
>>
>>I cannot find any shell that behaves this way in its current version. Can
>>you share some details about what shells you tested?
>
>Bash 4 on Linux and bash 3 on macOS (I recall someone saying that bash 2
>also behaved the same way).

There is a reason I wrote "in its current version". I do not think it is
reasonable to describe bash 4 behaviour that had already been changed
when this was being discussed as "existing practice".

Regardless, a single shell is not enough to say "most shells", not even
if it is multiple versions of that single shell.

>I think the way the bug 1234 resolution specifies it (as per bash2/3/4) is
>more straightforward and easier for users to understand. It's a simple binary
>choice: either matching against existing pathnames is performed or it isn't.
>If it is performed, all special pattern-matching characters, including
>backslash, have their special meaning in all components of the pattern.

You get situations where $dir/file1 and $dir/file2 name two files, but
$dir/file[12] cannot be used to match them both in a single word,
though. That is a problem that NetBSD sh does not have, and one that the
alternatives of always treating indirect \ as literal or never doing so
also do not have.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Geoff Clare
Harald van Dijk  wrote, on 24 Sep 2019:
>
> >2. Existing practice in most shells that do treat backslash as special in
> >"indirect" patterns in pathname expansions is only to match patterns
> >against existing pathnames if the pattern includes a '*', '?' or '[' that
> >is treated as special.  This prevents accidental removal of backslash
> 
> I cannot find any shell that behaves this way in its current version. Can
> you share some details about what shells you tested?

Bash 4 on Linux and bash 3 on macOS (I recall someone saying that bash 2
also behaved the same way).

> In bash and in my shell, backslash is always given this treatment regardless
> of whether any other special characters are present.
> 
> In NetBSD sh, backslash is given this special treatment only if the current
> pathname component of the pattern includes a metacharacter. That is, an
> indirect /de\v/nul[l] does not find /dev/null, but an indirect /d[e]\v/null
> does.

That's a subtlety that I don't think had been raised before.

I think it still qualifies as a shell which only matches patterns against
existing pathnames if the pattern includes a '*', '?' or '[' that is
treated as special.  The subtlety is then in how backslash is interpreted
in the various components after the decision to perform matching has been
made.  (I.e. in the /de\v/nul[l] case the matching operation was performed,
but nothing was found because it was looking for a directory called de\v
instead of one called dev.)

I think the way the bug 1234 resolution specifies it (as per bash2/3/4) is
more straightforward and easier for users to understand. It's a simple binary
choice: either matching against existing pathnames is performed or it isn't.
If it is performed, all special pattern-matching characters, including
backslash, have their special meaning in all components of the pattern.

If kre reads this perhaps he could comment on whether he plans to change
NetBSD sh to behave as per the bug 1234 resolution (if/when it becomes an
approved interpretation).

> In zsh, backslash is given this special treatment only if the next character
> is special, so an indirect [a]\? matches a?, but an indirect [a]\b does not
> match ab.

That seems highly inconsistent with the treatment of backslash in all the
other utilities and functions that do pattern matching.
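The two models can be contrasted at the string level. In the rough POSIX sh sketch below, literal_lookups (a hypothetical name) prints, for each component of a pattern that contains no unescaped '*', '?' or '[', the name that would be looked up directly: under the model the bug 1234 resolution describes, the backslashes in that component are escapes once the word as a whole is matched, while under the NetBSD sh model they stay literal because the component itself has no pattern character. It splits on every '/' and treats an unmatched '[' as special, which are simplifications; it is not how either shell is implemented.

```sh
#!/bin/sh
# Succeed if $1 has a '*', '?' or '[' not escaped by a backslash
# (simplification: an unmatched '[' counts too).
has_special() {
  s=$1
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}; s=${s#?}
    case $c in
      \\) s=${s#?} ;;
      \*|\?|\[) return 0 ;;
    esac
  done
  return 1
}

# Drop escaping backslashes: de\v -> dev, a\\b -> a\b.
unescape() {
  s=$1 u=
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}; s=${s#?}
    if [ "$c" = '\' ] && [ -n "$s" ]; then
      c=${s%"${s#?}"}; s=${s#?}
    fi
    u=$u$c
  done
  printf '%s\n' "$u"
}

# Hypothetical helper.  $1 is "word" (one decision for the whole word, as
# in the bug 1234 resolution) or "component" (NetBSD sh style); $2 is the
# indirect pattern.  Prints the name looked up directly for each component
# without an unescaped '*', '?' or '['.
literal_lookups() {
  mode=$1 rest=$2
  if [ "$mode" = word ] && ! has_special "$rest"; then
    return 0                      # rule 4: no matching at all
  fi
  while :; do
    comp=${rest%%/*}              # next pathname component
    if [ -n "$comp" ] && ! has_special "$comp"; then
      if [ "$mode" = word ]; then
        unescape "$comp"          # backslash escapes in every component
      else
        printf '%s\n' "$comp"     # backslash stays literal here
      fi
    fi
    case $rest in
      */*) rest=${rest#*/} ;;
      *) break ;;
    esac
  done
}

literal_lookups word '/de\v/nul[l]'        # dev
literal_lookups component '/de\v/nul[l]'   # de\v
```

This reproduces the /de\v/nul[l] example: the word model looks for a directory named dev, the per-component model for one named de\v.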

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Harald van Dijk

On 23/09/2019 16:39, Austin Group Bug Tracker wrote:
>[...]
>--
>  (0004564) geoffclare (manager) - 2019-09-23 15:39
>  http://austingroupbugs.net/view.php?id=1234#c4564
>--
>[...]
>Rationale:
>-
>1. Although existing practice in some shells is not to treat backslash as

This should say "most shells" or even "almost all shells", not just
"some shells".

>[...]
>2. Existing practice in most shells that do treat backslash as special in
>"indirect" patterns in pathname expansions is only to match patterns
>against existing pathnames if the pattern includes a '*', '?' or '[' that
>is treated as special.  This prevents accidental removal of backslash

I cannot find any shell that behaves this way in its current version.
Can you share some details about what shells you tested?

In bash and in my shell, backslash is always given this treatment
regardless of whether any other special characters are present.

In NetBSD sh, backslash is given this special treatment only if the
current pathname component of the pattern includes a metacharacter. That
is, an indirect /de\v/nul[l] does not find /dev/null, but an indirect
/d[e]\v/null does.

In zsh, backslash is given this special treatment only if the next
character is special, so an indirect [a]\? matches a?, but an indirect
[a]\b does not match ab.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Geoff Clare
Stephane Chazelas  wrote, on 24 Sep 2019:
>
> 2019-09-23 15:39:49 +, Austin Group Bug Tracker:
> [...]
> > On page 2384 line 76271 section 2.13.3, change:
> >
> > 3. Specified patterns shall be matched against existing filenames
> >    and pathnames, as appropriate.
> >
> > to:
> >
> > 3. If a specified pattern contains any '*', '?' or '[' characters
> >    that will be treated as special (see [xref to 2.13.1]), it shall
> >    be matched against existing filenames and pathnames, as
> >    appropriate.
> >
> > On page 2384 line 76295 section 2.13.3, add:
> >
> > 4. If a specified pattern does not contain any '*', '?' or '['
> >    characters that will be treated as special, the pattern string
> >    shall be left unchanged.
> [...]
> 
> Thanks for looking into that.
> 
> Does the above mean that
> 
> var='\*' IFS=
> printf "%s\n" $var
> 
> is required to output \* regardless of whether a * or \foo file
> exists in the current directory or not (or whether
> nullglob is enabled after bugid:247 resolution)?
> 
> (and we'd need var='[*]' for a glob that matches a * file).

Yes.
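A small POSIX sh probe shows which behaviour a given shell has. It creates a file named * and one named \foo in a scratch directory (the names and location are arbitrary) and reports what the unquoted $var expands to. Per the answer above, the bug 1234 resolution requires \*; the thread reports * for bash 5, and files starting with \ (here \foo) for dash, the ksh family, bosh and yash. The probe only reports; it does not decide which behaviour is right.

```sh
#!/bin/sh
# Probe: what does this shell do with an unquoted $var holding '\*'?
probe=${TMPDIR:-/tmp}/rule4probe.$$
mkdir "$probe" || exit 1
: > "$probe/*"            # a file literally named *
: > "$probe"/'\foo'       # a file whose name starts with a backslash
result=$(
  cd "$probe" || exit 1
  var='\*' IFS=
  printf '%s\n' $var      # unquoted: subject to pathname expansion
)
rm -rf "$probe"
case $result in
  '\*')   printf '%s\n' 'left unchanged, as the bug 1234 resolution requires' ;;
  '*')    printf '%s\n' 'backslash taken as an escape: matched the file named *' ;;
  '\foo') printf '%s\n' 'backslash taken literally: matched names starting with \' ;;
  *)      printf 'something else: %s\n' "$result" ;;
esac
```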

> If not, and if that \* is meant to match on a * file (like for
> find -name '\*'), would that be based on whether the "*" file
> can be accessed (search permissions to the current directory
> needed), or found in the contents of current directory (read
> permission needed)?
> 
> Regardless, the above question applies in
> 
> printf '%s\n' */$var
> 
> Or 
> 
> printf '%s\n' $var/*

Those both have a * that will be treated as special, so matching
against existing files is performed.  The permission requirements in
these two cases are the same as if you had used var=foo.

> (where there are variations among the shells that implement that
> second meaning of backslash).

Do you mean you see variation in those last two cases, or only in
the first case, where bash 4 and earlier do what's specified in the
bug 1234 resolution but in bash 5, printf "%s\n" $var outputs *
when a * file exists?

> Also applies to glob().

Looks like we should modify the description of GLOB_NOCHECK.  It refers
to rule 3 in XCU 2.13.3 but then goes on to describe the behaviour in
a way that doesn't (fully) match that rule any more.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-09-24 Thread Stephane Chazelas
2019-09-23 15:39:49 +, Austin Group Bug Tracker:
[...]
> On page 2384 line 76271 section 2.13.3, change:
>
> 3. Specified patterns shall be matched against existing filenames
>    and pathnames, as appropriate.
>
> to:
>
> 3. If a specified pattern contains any '*', '?' or '[' characters
>    that will be treated as special (see [xref to 2.13.1]), it shall
>    be matched against existing filenames and pathnames, as
>    appropriate.
>
> On page 2384 line 76295 section 2.13.3, add:
>
> 4. If a specified pattern does not contain any '*', '?' or '['
>    characters that will be treated as special, the pattern string
>    shall be left unchanged.
[...]

Thanks for looking into that.

Does the above mean that

var='\*' IFS=
printf "%s\n" $var

is required to output \* regardless of whether a * or \foo file
exists in the current directory or not (or whether
nullglob is enabled after bugid:247 resolution)?

(and we'd need var='[*]' for a glob that matches a * file).

If not, and if that \* is meant to match on a * file (like for
find -name '\*'), would that be based on whether the "*" file
can be accessed (search permissions to the current directory
needed), or found in the contents of current directory (read
permission needed)?

Regardless, the above question applies in

printf '%s\n' */$var

Or 

printf '%s\n' $var/*

(where there are variations among the shells that implement that
second meaning of backslash). Also applies to glob().

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-04 Thread Harald van Dijk

On 04/07/2019 09:09, Geoff Clare wrote:
>Harald van Dijk  wrote, on 03 Jul 2019:
>>
>>>No, it's a context where shell-quoting backslash *doesn't* work. Therefore
>>>the backslash should act as an escape character just like in find, pax,
>>>fnmatch() and glob().
>>
>>It's not about shell quoting the backslash, it's about whether shell quoting
>>can be used on the asterisk. The quoted sentence says that if shell quoting
>>cannot be used on the asterisk, then it can be escaped with a backslash
>>instead. But shell quoting can be used on the asterisk: just try
>>
>>   a='*'
>>   ls -ld "$a"
>>
>>This does work for listing only a file literally named '*'.
>
>That's because of the double quotes around $a.  Pathname expansion
>is not done inside double quotes, so there is no pattern here, just
>a string that contains a '*'.

Is that relevant? It still shows that shell quoting can force the '*' to 
be treated literally, does it not?


In this case, whether pathname expansion is performed does not affect 
the result. In the general case, words may be partly quoted, and the 
quoted parts of such words are definitely supposed to be used in 
pathname expansion, so let's go with a different example:


  a='*'
  ls -ld $a*   #1
  ls -ld "$a"* #2

Here, it is clear that pathname expansion is performed. #1 lists all 
non-hidden files, #2 lists all files starting with '*'. Since shell 
quoting works, under my understanding of your proposed wording, that 
means backslash cannot be used to escape that '*', and


  a='\*'
  ls -ld $a*

must list all files starting with '\' since backslash does not function 
as an escape character here.


It would be bizarre if the handling of '\' in $a is different from the 
handling of '\' in $a*.
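Harald's partially quoted example can be reproduced in a scratch directory. In the sketch below (count_matches and the file names are hypothetical), printf replaces ls -ld so that only the expansions are shown; no backslash is involved, so this part should behave the same in all POSIX shells.

```sh
#!/bin/sh
# Hypothetical helper: in directory $1, count what $a* and "$a"* expand to.
count_matches() {
  (
    cd "$1" || exit 1
    a='*'
    set -- $a*            # #1: the * coming from $a is special
    printf '#1 matched %s\n' "$#"
    set -- "$a"*          # #2: the quoted * is literal
    printf '#2 matched %s (%s)\n' "$#" "$1"
  )
}

demo=${TMPDIR:-/tmp}/partquote.$$
mkdir "$demo" && : > "$demo/*star" && : > "$demo/plain"
count_matches "$demo"     # #1 matched 2, then #2 matched 1 (*star)
rm -rf "$demo"
```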


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-04 Thread Geoff Clare
Geoff Clare  wrote, on 04 Jul 2019:
>
> Harald van Dijk  wrote, on 03 Jul 2019:
> >
> > >No, it's a context where shell-quoting backslash *doesn't* work. Therefore
> > >the backslash should act as an escape character just like in find, pax,
> > >fnmatch() and glob().
> > 
> > It's not about shell quoting the backslash, it's about whether shell quoting
> > can be used on the asterisk. The quoted sentence says that if shell quoting
> > cannot be used on the asterisk, then it can be escaped with a backslash
> > instead. But shell quoting can be used on the asterisk: just try
> > 
> >   a='*'
> >   ls -ld "$a"
> > 
> > This does work for listing only a file literally named '*'.
> 
> That's because of the double quotes around $a.  Pathname expansion
> is not done inside double quotes, so there is no pattern here, just
> a string that contains a '*'.

Given that the side-discussion that arose from this may well lead to
rewording that does mean double quotes "affect" indirect patterns, I
think it would be prudent to reword the new condition in 2.13.1.

I'm thinking make it specific to backslash:

In a pattern, or part of one, where a shell-quoting <backslash>
can be used ...

In a pattern, or part of one, where a shell-quoting <backslash>
cannot be used (such as ...

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-04 Thread Geoff Clare
Harald van Dijk  wrote, on 03 Jul 2019:
>
> >No, it's a context where shell-quoting backslash *doesn't* work. Therefore
> >the backslash should act as an escape character just like in find, pax,
> >fnmatch() and glob().
> 
> It's not about shell quoting the backslash, it's about whether shell quoting
> can be used on the asterisk. The quoted sentence says that if shell quoting
> cannot be used on the asterisk, then it can be escaped with a backslash
> instead. But shell quoting can be used on the asterisk: just try
> 
>   a='*'
>   ls -ld "$a"
> 
> This does work for listing only a file literally named '*'.

That's because of the double quotes around $a.  Pathname expansion
is not done inside double quotes, so there is no pattern here, just
a string that contains a '*'.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Harald van Dijk

On 03/07/2019 15:27, Geoff Clare wrote:
>Harald van Dijk  wrote, on 03 Jul 2019:
>>On 03/07/2019 11:08, Geoff Clare wrote:
>>>Stephane Chazelas  wrote, on 03 Jul 2019:
>>>>2019-07-03 09:24:10 +0100, Geoff Clare:
>>>>[...]
>>>>>   [...] If any character (ordinary, shell
>>>>>special, or pattern special) is quoted or (where shell quoting is not
>>>>>in effect) escaped with a <backslash>, that pattern shall match the
>>>>>character itself. [...]
>>>>[...]
>>>>
>>>>And again, that's an incompatible change for dash, ksh88, ksh93,
>>>>pdksh, mksh, bosh, yash where:
>>>>
>>>>a='\*'
>>>>ls -ld $a
>>>>
>>>>lists the files that start with \
>>>
>>>Which is inconsistent with find, pax, fnmatch() and glob().
>>
>>I thought the new wording you proposed would require this to list the files
>>that start with \, as this is a context where shell quoting is in effect,
>
>No, it's a context where shell-quoting backslash *doesn't* work. Therefore
>the backslash should act as an escape character just like in find, pax,
>fnmatch() and glob().


It's not about shell quoting the backslash, it's about whether shell 
quoting can be used on the asterisk. The quoted sentence says that if 
shell quoting cannot be used on the asterisk, then it can be escaped 
with a backslash instead. But shell quoting can be used on the asterisk: 
just try


  a='*'
  ls -ld "$a"

This does work for listing only a file literally named '*'.

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Stephane Chazelas  wrote, on 03 Jul 2019:
>
> Before (Bourne/ksh88...) it was:
> 
> *, ? and [...] are wildcard operators and quoting can be used to
> remove their special meaning.
> 
> Which applies to both shell and fnmatch() (where quoting is done
> with \).
> 
> With your proposed change, the sh documentation has to be
> changed to:
> 
> *, ? and [...] are wildcard operators and quoting can be used to
> remove their special meaning, but an unquoted backslash (as can
> be produced by leaving a word expansion unquoted) can also be
> used to escape the following character if any (unspecified if
> there's no following character), but bearing in mind that in the
> pathname expansion case, it only happens for words that contain
> an unquoted and unescaped (with that unquoted \ character)
> wildcard operator (unspecified if [ is not matched by an
> unquoted unescaped ] in the same path component)...

I think you have deliberately come up with as bad a modification
as you could.  Obviously it could be made much clearer and easier
to read.

A lot of the added complexity applies equally to direct patterns,
and is already needed.  If that stuff is added (or already exists
in some form) somewhere that applies to both direct and indirect
patterns, then all that would be needed to this part is:

*, ? and [...] are wildcard operators and quoting can be used to
remove their special meaning.  In contexts where the pattern is not
affected by shell quoting, only a preceding backslash can be used.

although it might be helpful to the reader to include the "(such as ...)"
by way of illustration.
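The first sentence of that wording can be checked with case, where the pattern appears directly in the word and every form of shell quoting is available. A minimal POSIX sh sketch using only direct patterns, whose treatment is not in dispute here; classify is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: direct patterns, with quoting removing the special
# meaning of '*', '?' and '['.
classify() {
  case $1 in
    "*")   printf '%s\n' 'a literal *' ;;     # double quotes
    \?)    printf '%s\n' 'a literal ?' ;;     # shell-quoting backslash
    '['*)  printf '%s\n' 'starts with [' ;;   # single quotes, then a wildcard
    *)     printf '%s\n' 'something else' ;;  # unquoted: matches anything
  esac
}

classify '*'    # a literal *
classify 'x'    # something else ("*" does not match x)
classify '[a'   # starts with [
```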

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Shware Systems

Just being picky, re:
"Arguments to find, pax, fnmatch() and glob() are others."

at the bottom, which to me should be:
"Arguments to exec('find',...), exec('pax',...), fnmatch() and glob() are 
others."

as parameters of find and pax in scripts are shell words covered by the 
statement preceding that. It is after parameter expansions and quote removal 
that the resulting token value may be an argument string to be treated as a 
pattern for those utilities.

On Wednesday, July 3, 2019 Geoff Clare  wrote:

Stephane Chazelas  wrote, on 03 Jul 2019:
>
> 2019-07-03 11:08:57 +0100, Geoff Clare:
> [...]
> > > And again, that's an incompatible change for dash, ksh88, ksh93,
> > > pdksh, mksh, bosh, yash where:
> > >
> > > a='\*'
> > > ls -ld $a
> > >
> > > lists the files that start with \
> >
> > Which is inconsistent with find, pax, fnmatch() and glob().
>
> And again, that argument doesn't hold.
>
> There's no find implementation that I know where
>
> find . -name '"*"*'
> find . -name '"$var"*'
>
> works the same as
>
> printf '%s\n' "*"*
> printf '%s\n' "$var"*

The goal is consistency of backslash handling.  There was never any
intention in 1992 to require find, pax, fnmatch() and glob() to mimic
shell single quotes or double quotes, nor is there now, nor does there
need to be.

> fnmatch() didn't add \ support for consistency with the shell;
> it did add the *, ? and [ glob operators of the shell and the \
> quoting operator of the shell. \ is not a glob operator there
> but a quoting operator in its very limited syntax (compared to
> that of the shell which has other forms of quoting, and many
> forms of expansions).
>
> Adding it back to the shell *as an extra layer* hardly helps
> with consistency and adds confusion.

The proposed resolution makes clear that it is not an extra layer, it's
an alternative for situations where the shell quoting backslash is not
available.  Thus providing consistency and reducing confusion.

> [...]
> > > a='\d*'
> > > ls -ld $a
> > >
> > > lists the filenames that start with \d
> >
> > Which is inconsistent with find, pax, fnmatch() and glob().
>
> Irrelevant, pax, fnmatch() and glob() don't do variable
> expansion. find -name '$a' is unspecified but in all
> implementations, that returns the files called $a literally.

The goal is consistency of how backslash behaves in patterns.
A direct pattern in a shell word and an indirect pattern in a shell
variable (that is then used unquoted) are two places a pattern can
occur.  Arguments to find, pax, fnmatch() and glob() are others.


--
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England





Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Stephane Chazelas
2019-07-03 15:19:18 +0100, Geoff Clare:
[...]
> > Irrelevant, pax, fnmatch() and glob() don't do variable
> > expansion. find -name '$a' is unspecified but in all
> > implementations, that returns the files called $a literally.
> 
> The goal is consistency of how backslash behaves in patterns.
> A direct pattern in a shell word and an indirect pattern in a shell
> variable (that is then used unquoted) are two places a pattern can
> occur.  Arguments to find, pax, fnmatch() and glob() are others.
[...]

And now, you have to balance it against breaking backward
compatibility and making the syntax of the shell more
complicated, for no benefit other than that perceived
consistency.

Before (Bourne/ksh88...) it was:

*, ? and [...] are wildcard operators and quoting can be used to
remove their special meaning.

Which applies to both shell and fnmatch() (where quoting is done
with \).

With your proposed change, the sh documentation has to be
changed to:

*, ? and [...] are wildcard operators and quoting can be used to
remove their special meaning, but an unquoted backslash (as can
be produced by leaving a word expansion unquoted) can also be
used to escape the following character if any (unspecified if
there's no following character), but bearing in mind that in the
pathname expansion case, it only happens for words that contain
an unquoted and unescaped (with that unquoted \ character)
wildcard operator (unspecified if [ is not matched by an
unquoted unescaped ] in the same path component)...

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Harald van Dijk  wrote, on 03 Jul 2019:
>
> On 03/07/2019 11:08, Geoff Clare wrote:
> >Stephane Chazelas  wrote, on 03 Jul 2019:
> >>
> >>2019-07-03 09:24:10 +0100, Geoff Clare:
> >>[...]
> >>>   [...] If any character (ordinary, shell
> >>>special, or pattern special) is quoted or (where shell quoting is not
> >>>in effect) escaped with a , that pattern shall match the
> >>>character itself. [...]
> >>[...]
> >>
> >>And again, that's an incompatible change for dash, ksh88, ksh93,
> >>pdksh, mksh, bosh, yash where:
> >>
> >>a='\*'
> >>ls -ld $a
> >>
> >>lists the files that start with \
> >
> >Which is inconsistent with find, pax, fnmatch() and glob().
> 
> I thought the new wording you proposed would require this to list the files
> that start with \, as this is a context where shell quoting is in effect,

No, it's a context where shell-quoting backslash *doesn't* work. Therefore
the backslash should act as an escape character just like in find, pax,
fnmatch() and glob().

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Joerg Schilling  wrote, on 03 Jul 2019:
>
> Geoff Clare  wrote:
> 
> > Joerg Schilling  wrote, on 03 Jul 2019:
> > > pax is not a shell and ls does not include own pattern matching.
> > > 
> > > You thus cannot compare the behavior of these programs with each other or
> > > with a shell.
> >
> > Huh?
> >
> > find, pax, fnmatch() and glob() all do pattern matching as described in
> > XCU 2.13.  The shell is supposed to as well.
> >
> > I already said it has nothing to do with ls.
> 
> If you agree that this has nothing to do with ls, then you should also agree 
> that pax and find (which, unlike ls, include pattern matching) still 
> cannot be discussed together with shell behavior since pax/find do not have a
> shell interpreter for strings inside their code.

Nonsense.  They all do pattern matching.

The whole point of requiring backslash escaping in patterns in find,
pax, fnmatch() and glob() is to provide consistency with how backslash
works in the shell (when the pattern is directly in a word) *because*
as you put it, they do not have a shell interpreter for strings inside
their code.

We have now discovered that indirect patterns in a shell variable are
another place that shell-quoting backslash doesn't work, and so for
consistency with all other pattern matching contexts, the shell needs
to do backslash escaping in those indirect patterns.
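A minimal POSIX sh sketch of the direct/indirect distinction discussed here. The helper names `direct_star`, `quoted_indirect` and `unquoted_indirect` are hypothetical. The first two helpers show uncontroversial behaviour. The third uses an unquoted expansion as the pattern, which is exactly the case under dispute in bug 1234, so no result is claimed for it: per this thread, bash treats the backslash as an escape, while dash, ksh88, ksh93, pdksh, mksh, bosh and yash treated it literally at the time.

```shell
#!/bin/sh
# Hypothetical helper: is $1 matched by the *direct* pattern \** ?
# The backslash here is shell quoting, so the first * is literal.
direct_star() {
  case $1 in
    \**) echo yes ;;
    *)   echo no ;;
  esac
}

# Hypothetical helper: is $1 matched by "$2", a *quoted* expansion?
# Every character of $2 is quoted, so it matches only itself.
quoted_indirect() {
  case $1 in
    "$2") echo yes ;;
    *)    echo no ;;
  esac
}

# Hypothetical helper: is $1 matched by $2 used *unquoted* (an indirect
# pattern)? For a pattern containing a backslash, the result differs
# between shells; this is the behaviour bug 1234 is about.
unquoted_indirect() {
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

direct_star '*x'            # prints: yes
direct_star 'ax'            # prints: no
quoted_indirect '\*' '\*'   # prints: yes
quoted_indirect 'ax' '\*'   # prints: no
unquoted_indirect '*' '\*'  # shell-dependent
```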

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Stephane Chazelas  wrote, on 03 Jul 2019:
>
> 2019-07-03 11:08:57 +0100, Geoff Clare:
> [...]
> > > And again, that's an incompatible change for dash, ksh88, ksh93,
> > > pdksh, mksh, bosh, yash where:
> > > 
> > > a='\*'
> > > ls -ld $a
> > > 
> > > lists the files that start with \ 
> > 
> > Which is inconsistent with find, pax, fnmatch() and glob().
> 
> And again, that argument doesn't hold.
> 
> There's no find implementation that I know where
> 
> find . -name '"*"*'
> find . -name '"$var"*'
> 
> works the same as
> 
> printf '%s\n' "*"*
> printf '%s\n' "$var"*

The goal is consistency of backslash handling.  There was never any
intention in 1992 to require find, pax, fnmatch() and glob() to mimic
shell single quotes or double quotes, nor is there now, nor does there
need to be.

> fnmatch() didn't add \ support for consistency with the shell.
> It did add the *, ? and [ glob operators of the shell and the \
> quoting operator of the shell. \ is not a glob operator there
> but a quoting operator in its very limited syntax (compared to
> that of the shell which has other forms of quoting, and many
> forms of expansions).
> 
> Adding it back to the shell *as an extra layer* hardly helps
> with consistency and adds confusion.

The proposed resolution makes clear that it is not an extra layer, it's
an alternative for situations where the shell quoting backslash is not
available.  Thus providing consistency and reducing confusion.

> [...]
> > > a='\d*'
> > > ls -ld $a
> > > 
> > > lists the filenames that start with \d
> > 
> > Which is inconsistent with find, pax, fnmatch() and glob().
> 
> Irrelevant: pax, fnmatch() and glob() don't do variable
> expansion. find -name '$a' is unspecified but in all
> implementations, that returns the files called $a literally.

The goal is consistency of how backslash behaves in patterns.
A direct pattern in a shell word and an indirect pattern in a shell
variable (that is then used unquoted) are two places a pattern can
occur.  Arguments to find, pax, fnmatch() and glob() are others.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Joerg Schilling
Geoff Clare  wrote:

> Joerg Schilling  wrote, on 03 Jul 2019:
> > pax is not a shell and ls does not include its own pattern matching.
> > 
> > You thus cannot compare the behavior of these programs with each other
> > or with a shell.
>
> Huh?
>
> find, pax, fnmatch() and glob() all do pattern matching as described in
> XCU 2.13.  The shell is supposed to as well.
>
> I already said it has nothing to do with ls.

If you agree that this has nothing to do with ls, then you should also agree 
that pax and find (which, unlike ls, include pattern matching) still 
cannot be discussed together with shell behavior since pax/find do not have a
shell interpreter for strings inside their code.

As mentioned by Stephane, there is no support for find -name '"*".c' and the 
content of the -name argument in find is typically controlled by the string 
processing of the shell or by a prepared string from a program that calls 
exec().

There is backslash processing by fnmatch() (in former times glob()) and there is
backslash processing in the shell string parsing and processing.

The backslash processing in the string processing of the shell has been
unchanged for 42 years, and if that were a problem, it would have been
discussed in the 1980s already. If we want to keep compatibility, we cannot
change the requirement for something as important as that shell feature. If
you want special "escaping" for some characters to treat them literally, you
can use [c] for such a "c"; this is a method that was already discussed in
the 1980s in the related Usenet groups.
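The [c] workaround Jörg mentions can be sketched in POSIX sh as below. The helper `matches` is hypothetical; it uses its second argument unquoted as a case pattern, i.e. as an indirect pattern. Because `[*]` and `[?]` are bracket expressions with a single member, they match a literal `*` or `?` without depending on how the shell treats a backslash inside an expanded pattern. This sketch only covers `*` and `?`.

```shell
#!/bin/sh
# Hypothetical helper: report whether string $1 matches pattern $2.
# $2 is deliberately left unquoted in the case pattern so that its
# contents are interpreted as a pattern (an "indirect" pattern).
matches() {
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

# [*] is a bracket expression whose only member is *, so it matches a
# literal asterisk without relying on backslash handling.
pattern='[*]*'              # "begins with a literal *"
matches '*foo' "$pattern"   # prints: yes
matches 'foo' "$pattern"    # prints: no

pattern='what[?]'           # "exactly the string what?"
matches 'what?' "$pattern"  # prints: yes
matches 'whats' "$pattern"  # prints: no
```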

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Harald van Dijk

On 03/07/2019 11:08, Geoff Clare wrote:
>Stephane Chazelas  wrote, on 03 Jul 2019:
>>
>>2019-07-03 09:24:10 +0100, Geoff Clare:
>>[...]
>>>   [...] If any character (ordinary, shell
>>>special, or pattern special) is quoted or (where shell quoting is not
>>>in effect) escaped with a <backslash>, that pattern shall match the
>>>character itself. [...]
>>[...]
>>
>>And again, that's an incompatible change for dash, ksh88, ksh93,
>>pdksh, mksh, bosh, yash where:
>>
>>a='\*'
>>ls -ld $a
>>
>>lists the files that start with \
>
>Which is inconsistent with find, pax, fnmatch() and glob().

I thought the new wording you proposed would require this to list the
files that start with \, as this is a context where shell quoting is in
effect, so the use of backslash as an escape character is not supported
and it just matches itself. If that is not what you were trying to say
with the new wording, it is not clear to me what the intent is.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Stephane Chazelas
2019-07-03 11:08:57 +0100, Geoff Clare:
[...]
> > And again, that's an incompatible change for dash, ksh88, ksh93,
> > pdksh, mksh, bosh, yash where:
> > 
> > a='\*'
> > ls -ld $a
> > 
> > lists the files that start with \ 
> 
> Which is inconsistent with find, pax, fnmatch() and glob().

And again, that argument doesn't hold.

There's no find implementation that I know where

find . -name '"*"*'
find . -name '"$var"*'

works the same as

printf '%s\n' "*"*
printf '%s\n' "$var"*

even though that's currently allowed by the spec. And I don't
expect we want to go there. And if we go there, I don't expect we
would want to also add back those to the shell as a second level
of evaluation (like in pattern='"$var"*'; ls -- $pattern).
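A small sh sketch of the find behaviour referred to above. It assumes a find whose -name follows the XCU 2.13 rules (as GNU and BSD find do) and a `mktemp -d` that creates a scratch directory; the file names are made up for illustration. find applies its own pattern rules to the -name operand: a backslash escapes the next character, but double quotes are ordinary characters.

```shell
#!/bin/sh
# Work in a scratch directory and remove it when the script exits.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Three files: one starting with a literal *, one plain, and one whose
# name contains two double-quote characters.
touch '*star' 'plain' '"q"x'

# Backslash escapes the first *, so this lists only ./*star
find . -name '\**' -print

# The double quotes are ordinary characters to find: this lists only
# names that start with " and contain a second ", i.e. ./"q"x
find . -name '"*"*' -print
```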

fnmatch() didn't add \ support for consistency with the shell.
It did add the *, ? and [ glob operators of the shell and the \
quoting operator of the shell. \ is not a glob operator there
but a quoting operator in its very limited syntax (compared to
that of the shell which has other forms of quoting, and many
forms of expansions).

Adding it back to the shell *as an extra layer* hardly helps
with consistency and adds confusion. But, more importantly, it would
break backward compatibility in most shells.

[...]
> > a='\d*'
> > ls -ld $a
> > 
> > lists the filenames that start with \d
> 
> Which is inconsistent with find, pax, fnmatch() and glob().

Irrelevant: pax, fnmatch() and glob() don't do variable
expansion. find -name '$a' is unspecified but in all
implementations, that returns the files called $a literally.

> > And in ksh93
> > 
> > a='\d*'
> > case string in $a)
> > 
> > matches on strings that start with digits.
> 
> Which has never been allowed by any possible interpretation of the standard.

But would be once bug 1234 is fixed properly by allowing both
the most common behaviour and that of bash.


> > I still don't understand why you want to specify a behaviour
> > that is not present in any shell (but one), that would break
> > backward compatibility, that is not needed, that would make the
> > syntax of the shell and the text of the spec more confusing.
> 
> We know you don't get it; you've told us several times already.
> Repeating it more times is not going to achieve anything and will just
> waste everybody's time.
[...]

We have diverging views. Bug 1234 is about saying that your view
is wrong. If you don't agree, you can close that bug as "won't
fix" and be done with it.

But AFAICT, Eric's and Joerg's views at least are currently
closer to mine than to yours.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Harald van Dijk

On 03/07/2019 09:24, Geoff Clare wrote:
>Harald van Dijk  wrote, on 02 Jul 2019:
>>
>>>>That's not because the word "unquoted" is used, which only applies to shell
>>>>quoting, that's because 2.13.1 contains "All of the requirements and effects
>>>>of quoting on ordinary, shell special, and special pattern characters shall
>>>>apply to escaping in this context", which specifies that quoting and other
>>>>escaping have the same effect.
>>>
>>>So what's the problem?  The text that you complained about in my
>>>proposed resolution is in 2.13.1, so is covered by this.
>>
>>Let's get back to the wording you proposed.
>>
>>>   [...] If any character (ordinary, shell
>>>special, or pattern special) is quoted, using either shell quoting
>>>or (where shell quoting is not in effect) a <backslash> escape, that
>>>pattern shall match the character itself. [...]
>>
>>This wording wrongly claims that where shell quoting is not in effect, a
>><backslash> escape quotes the next character. It does not just say that
>>character is treated as if it were quoted like the current text says, it
>>says it *is* quoted.
>
>That's extremely picky, but I take your point. Here's an alternative that
>I think fixes the problem:
>
>   [...] If any character (ordinary, shell
>special, or pattern special) is quoted or (where shell quoting is not
>in effect) escaped with a <backslash>, that pattern shall match the
>character itself. [...]

Thanks, that looks like a proper fix for that issue.

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Joerg Schilling  wrote, on 03 Jul 2019:
>
> Geoff Clare  wrote:
> 
> > Joerg Schilling  wrote, on 03 Jul 2019:
> 
> > > Are you saying that pax behaves inconsistently with ls?
> >
> > The inconsistency has nothing to do with ls.  It's with how those
> > shells interpret the (indirect) pattern \* compared to how find, pax,
> > fnmatch() and glob() (and the shell itself when it's a direct pattern)
> > interpret it.
> 
> It seems that you misinterpret that effect.
> 
> pax is not a shell and ls does not include its own pattern matching.
> 
> You thus cannot compare the behavior of these programs with each other
> or with a shell.

Huh?

find, pax, fnmatch() and glob() all do pattern matching as described in
XCU 2.13.  The shell is supposed to as well.

I already said it has nothing to do with ls.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Joerg Schilling
Geoff Clare  wrote:

> Joerg Schilling  wrote, on 03 Jul 2019:

> > Are you saying that pax behaves inconsistently with ls?
>
> The inconsistency has nothing to do with ls.  It's with how those
> shells interpret the (indirect) pattern \* compared to how find, pax,
> fnmatch() and glob() (and the shell itself when it's a direct pattern)
> interpret it.

It seems that you misinterpret that effect.

pax is not a shell and ls does not include its own pattern matching.

You thus cannot compare the behavior of these programs with each other or with 
a shell.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Joerg Schilling  wrote, on 03 Jul 2019:
>
> Geoff Clare  wrote:
> 
> > Stephane Chazelas  wrote, on 03 Jul 2019:
> > > And again, that's an incompatible change for dash, ksh88, ksh93,
> > > pdksh, mksh, bosh, yash where:
> > > 
> > > a='\*'
> > > ls -ld $a
> > > 
> > > lists the files that start with \ 
> >
> > Which is inconsistent with find, pax, fnmatch() and glob().
> 
> Are you saying that pax behaves inconsistently with ls?

The inconsistency has nothing to do with ls.  It's with how those
shells interpret the (indirect) pattern \* compared to how find, pax,
fnmatch() and glob() (and the shell itself when it's a direct pattern)
interpret it.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Joerg Schilling
Geoff Clare  wrote:

> Stephane Chazelas  wrote, on 03 Jul 2019:
> > And again, that's an incompatible change for dash, ksh88, ksh93,
> > pdksh, mksh, bosh, yash where:
> > 
> > a='\*'
> > ls -ld $a
> > 
> > lists the files that start with \ 
>
> Which is inconsistent with find, pax, fnmatch() and glob().

Are you saying that pax behaves inconsistently with ls?

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Stephane Chazelas  wrote, on 03 Jul 2019:
>
> 2019-07-03 09:24:10 +0100, Geoff Clare:
> [...]
> >   [...] If any character (ordinary, shell
> >special, or pattern special) is quoted or (where shell quoting is not
> >in effect) escaped with a <backslash>, that pattern shall match the
> >character itself. [...]
> [...]
> 
> And again, that's an incompatible change for dash, ksh88, ksh93,
> pdksh, mksh, bosh, yash where:
> 
> a='\*'
> ls -ld $a
> 
> lists the files that start with \ 

Which is inconsistent with find, pax, fnmatch() and glob().

> Or zsh where
> 
> a='\d*'
> ls -ld $a
> 
> lists the filenames that start with \d

Which is inconsistent with find, pax, fnmatch() and glob().

> And in ksh93
> 
> a='\d*'
> case string in $a)
> 
> matches on strings that start with digits.

Which has never been allowed by any possible interpretation of the standard.

> I still don't understand why you want to specify a behaviour
> that is not present in any shell (but one), that would break
> backward compatibility, that is not needed, that would make the
> syntax of the shell and the text of the spec more confusing.

We know you don't get it; you've told us several times already.
Repeating it more times is not going to achieve anything and will just
waste everybody's time.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-03 Thread Geoff Clare
Harald van Dijk  wrote, on 02 Jul 2019:
>
> >>That's not because the word "unquoted" is used, which only applies to shell
> >>quoting, that's because 2.13.1 contains "All of the requirements and effects
> >>of quoting on ordinary, shell special, and special pattern characters shall
> >>apply to escaping in this context", which specifies that quoting and other
> >>escaping have the same effect.
> >
> >So what's the problem?  The text that you complained about in my
> >proposed resolution is in 2.13.1, so is covered by this.
> 
> Let's get back to the wording you proposed.
> 
> >   [...] If any character (ordinary, shell
> >special, or pattern special) is quoted, using either shell quoting
> >or (where shell quoting is not in effect) a <backslash> escape, that
> >pattern shall match the character itself. [...]
> 
> This wording wrongly claims that where shell quoting is not in effect, a
> <backslash> escape quotes the next character. It does not just say that
> character is treated as if it were quoted like the current text says, it
> says it *is* quoted.

That's extremely picky, but I take your point. Here's an alternative that
I think fixes the problem:

  [...] If any character (ordinary, shell
   special, or pattern special) is quoted or (where shell quoting is not
   in effect) escaped with a <backslash>, that pattern shall match the
   character itself. [...]

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-02 Thread Harald van Dijk

On 02/07/2019 10:20, Geoff Clare wrote:
>Harald van Dijk  wrote, on 01 Jul 2019:
>>
>>On 01/07/2019 10:36, Geoff Clare wrote:
>>>Harald van Dijk  wrote, on 30 Jun 2019:
>>>>
>>>>POSIX does not even limit the concept of "syntax errors" to errors in the
>>>>syntax, see e.g. the "shift" command:
>>>>
>>>>>If the n operand is invalid or is greater than "$#", this may be considered a
>>>>>syntax error and a non-interactive shell may exit; [...]
>>>>
>>>
>>>The point of this text is to allow something that would not normally
>>>be considered to be a syntax error to be handled as if it was one.
>>>Although it words it by saying "may be considered" rather than "may be
>>>treated as if it was", this is clearly an explicit exception to the
>>>usual rule of what constitutes a syntax error.
>>
>>It is utterly pointless. The only difference in handling of syntax errors,
>>per 2.8.1 Consequences of Shell Errors, is that the handling of "Shell
>>language syntax error" requires a shell diagnostic message, but the handling
>>of "Special built-in utility error", whether utility syntax error or
>>otherwise, does not. Clearly, this does not constitute a shell language
>>syntax error and no shell diagnostic message is required (although a utility
>>diagnostic message should be printed), and other than that, it is irrelevant
>>whether the error is considered a syntax error. Given that it is then called
>>a syntax error anyway, the only conclusion I can draw from that is that the
>>people responsible for this wording do not care to limit syntax errors to
>>errors in the syntax.
>
>This use of "syntax error" was written when 2.8.1 was quite different.
>If you look at SUSv3 you'll see the table had rows for "Shell language
>syntax error" and "Utility syntax error (option or operand error)".
>So it was referring to that second row, not to shell syntax errors.

Good. This confirms what I had been saying, that "syntax error" was not
limited to errors in the syntax.

>>Less important, under the current wording, backslash escapes the next
>>character, it does not quote it. The requirements of quoting and escaping
>>are the same, so perhaps it is okay to change the terminology.
>>>>>
>>>>>Escaping is a form of quoting.  There are numerous places where the
>>>>>standard uses "unquoted" to mean that a character is neither quoted
>>>>>with single- or double-quotes nor escaped with a backslash.
>>>>
>>>>Escaping can be a form of quoting, sure.  2.2.1 Escape Character (Backslash)
>>>>is part of  2.2 Quoting, after all. Not all escaping is quoting though. I
>>>>went over all uses of the word "unquoted" in Shell Command Language. Every
>>>>single one refers to shell quoting, and in the few cases where other levels
>>>>of backslash removal also apply, the standard does not refer to that as
>>>>quoting.
>>>
>>>The current text in 2.13.1 and .3 uses "unquoted" in both senses:
>>>
>>>L76212: "A <backslash> character shall escape the following character."
>>>
>>>L76222: "When unquoted and outside a bracket expression, the following
>>>three characters shall have special meaning ..."
>>>
>>>L76235: "special characters can be escaped to remove their special
>>>meaning by preceding them with a <backslash> character."
>>>
>>>L76288: "it is unspecified whether other unquoted pattern matching
>>>characters within the same slash-delimited component of the pattern
>>>retain their special meanings..."
>>>
>>>Obviously those two uses of "unquoted" are intended to include the
>>><backslash> escaping described on L76212 and L76235, not just shell
>>>quoting.
>>
>>That's not because the word "unquoted" is used, which only applies to shell
>>quoting, that's because 2.13.1 contains "All of the requirements and effects
>>of quoting on ordinary, shell special, and special pattern characters shall
>>apply to escaping in this context", which specifies that quoting and other
>>escaping have the same effect.
>
>So what's the problem?  The text that you complained about in my
>proposed resolution is in 2.13.1, so is covered by this.

Let's get back to the wording you proposed.

>   [...] If any character (ordinary, shell
>special, or pattern special) is quoted, using either shell quoting
>or (where shell quoting is not in effect) a <backslash> escape, that
>pattern shall match the character itself. [...]

This wording wrongly claims that where shell quoting is not in effect, a
<backslash> escape quotes the next character. It does not just say that
character is treated as if it were quoted like the current text says, it
says it *is* quoted.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-02 Thread Geoff Clare
Harald van Dijk  wrote, on 01 Jul 2019:
>
> On 01/07/2019 10:36, Geoff Clare wrote:
> >Harald van Dijk  wrote, on 30 Jun 2019:
> >>
> >>POSIX does not even limit the concept of "syntax errors" to errors in the
> >>syntax, see e.g. the "shift" command:
> >>
> >>>If the n operand is invalid or is greater than "$#", this may be 
> >>>considered a syntax error and a non-interactive shell may exit; [...]
> >>
> >
> >The point of this text is to allow something that would not normally
> >be considered to be a syntax error to be handled as if it was one.
> >Although it words it by saying "may be considered" rather than "may be
> >treated as if it was", this is clearly an explicit exception to the
> >usual rule of what constitutes a syntax error.
> 
> It is utterly pointless. The only difference in handling of syntax errors,
> per 2.8.1 Consequences of Shell Errors, is that the handling of "Shell
> language syntax error" requires a shell diagnostic message, but the handling
> of "Special built-in utility error", whether utility syntax error or
> otherwise, does not. Clearly, this does not constitute a shell language
> syntax error and no shell diagnostic message is required (although a utility
> diagnostic message should be printed), and other than that, it is irrelevant
> whether the error is considered a syntax error. Given that it is then called
> a syntax error anyway, the only conclusion I can draw from that is that the
> people responsible for this wording do not care to limit syntax errors to
> errors in the syntax.

This use of "syntax error" was written when 2.8.1 was quite different.
If you look at SUSv3 you'll see the table had rows for "Shell language
syntax error" and "Utility syntax error (option or operand error)".
So it was referring to that second row, not to shell syntax errors.
We should change the text now that 2.8.1 no longer has an entry for
utility syntax errors.  (I will submit a new Mantis bug.)

> 
> There's another example in the trap command description:
> 
> >If the trap name [XSI] [Option Start] or number [Option End] is invalid, 
> > a non-zero exit status shall be returned; otherwise, zero shall be 
> > returned. For both interactive and non-interactive shells, invalid signal 
> > names [XSI] [Option Start]  or numbers [Option End] shall not be considered 
> > a syntax error and do not cause the shell to abort.
> 

Same problem - it's a reference to the old version of the table in 2.8.1.

> Less important, under the current wording, backslash escapes the next
> character, it does not quote it. The requirements of quoting and escaping
> are the same, so perhaps it is okay to change the terminology.
> >>>
> >>>Escaping is a form of quoting.  There are numerous places where the
> >>>standard uses "unquoted" to mean that a character is neither quoted
> >>>with single- or double-quotes nor escaped with a backslash.
> >>
> >>Escaping can be a form of quoting, sure.  2.2.1 Escape Character (Backslash)
> >>is part of  2.2 Quoting, after all. Not all escaping is quoting though. I
> >>went over all uses of the word "unquoted" in Shell Command Language. Every
> >>single one refers to shell quoting, and in the few cases where other levels
> >>of backslash removal also apply, the standard does not refer to that as
> >>quoting.
> >
> >The current text in 2.13.1 and .3 uses "unquoted" in both senses:
> >
> >L76212: "A <backslash> character shall escape the following character."
> >
> >L76222: "When unquoted and outside a bracket expression, the following
> >three characters shall have special meaning ..."
> >
> >L76235: "special characters can be escaped to remove their special
> >meaning by preceding them with a <backslash> character."
> >
> >L76288: "it is unspecified whether other unquoted pattern matching
> >characters within the same slash-delimited component of the pattern
> >retain their special meanings..."
> >
> >Obviously those two uses of "unquoted" are intended to include the
> ><backslash> escaping described on L76212 and L76235, not just shell
> >quoting.
> 
> That's not because the word "unquoted" is used, which only applies to shell
> quoting, that's because 2.13.1 contains "All of the requirements and effects
> of quoting on ordinary, shell special, and special pattern characters shall
> apply to escaping in this context", which specifies that quoting and other
> escaping have the same effect.

So what's the problem?  The text that you complained about in my
proposed resolution is in 2.13.1, so is covered by this.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-01 Thread Harald van Dijk

On 01/07/2019 10:36, Geoff Clare wrote:
>Harald van Dijk  wrote, on 30 Jun 2019:
>>
>>On 28/06/2019 09:38, Geoff Clare wrote:
>>>Harald van Dijk  wrote, on 27 Jun 2019:
>>>>
>>>>On 27/06/2019 10:04, Geoff Clare wrote:
>>>
>>>In particular, XRAT's explanation of it is "Conforming applications
>>>are required to quote or escape the shell special characters
>>>(sometimes called metacharacters). If used without this protection,
>>>syntax errors can result or implementation extensions can be triggered."
>>>The fact that this mentions syntax errors implies that the statement
>>>in 2.13.1 was intended only to apply to patterns that are used directly
>>>in shell commands.
>>
>>Syntax errors are not limited to shell syntax errors.
>>
>>I would think this means
>>
>>  find . -name '('
>>
>>is allowed to immediately exit with
>>
>>  find: error: invalid pattern
>
>If you interpret the standard as allowing find to treat that as an invalid
>pattern, the error is allowed behaviour courtesy of XCU 1.4 CONSEQUENCES
>OF ERRORS, but it's not a syntax error.  So the fact that XRAT specifically
>talks about syntax errors is an indication that there was no intention to
>allow find to treat '(' as an invalid pattern.

Syntax is not just shell syntax. In this hypothetical find
implementation, it can be an error in the syntax of the command, or an
error in the syntax of patterns, depending on whether find first
determines that -name needs a pattern, commits to parsing a pattern, and
reports the failure, or whether it first determines that '(' cannot be a
pattern, then determines that -name needs a pattern, and reports that.

>>POSIX does not even limit the concept of "syntax errors" to errors in the
>>syntax, see e.g. the "shift" command:
>>
>>>If the n operand is invalid or is greater than "$#", this may be considered a
>>>syntax error and a non-interactive shell may exit; [...]
>>
>
>The point of this text is to allow something that would not normally
>be considered to be a syntax error to be handled as if it was one.
>Although it words it by saying "may be considered" rather than "may be
>treated as if it was", this is clearly an explicit exception to the
>usual rule of what constitutes a syntax error.

It is utterly pointless. The only difference in handling of syntax
errors, per 2.8.1 Consequences of Shell Errors, is that the handling of
"Shell language syntax error" requires a shell diagnostic message, but
the handling of "Special built-in utility error", whether utility syntax
error or otherwise, does not. Clearly, this does not constitute a shell
language syntax error and no shell diagnostic message is required
(although a utility diagnostic message should be printed), and other
than that, it is irrelevant whether the error is considered a syntax
error. Given that it is then called a syntax error anyway, the only
conclusion I can draw from that is that the people responsible for this
wording do not care to limit syntax errors to errors in the syntax.

There's another example in the trap command description:

>If the trap name [XSI] [Option Start] or number [Option End] is invalid, a
>non-zero exit status shall be returned; otherwise, zero shall be returned. For
>both interactive and non-interactive shells, invalid signal names [XSI] [Option
>Start]  or numbers [Option End] shall not be considered a syntax error and do
>not cause the shell to abort.

If these would already not be considered syntax errors by the mere fact
that they are not errors in the syntax, there would be no need for
explicit language that they must not be considered syntax errors.

>>>>Less important, under the current wording, backslash escapes the next
>>>>character, it does not quote it. The requirements of quoting and escaping
>>>>are the same, so perhaps it is okay to change the terminology.
>>>
>>>Escaping is a form of quoting.  There are numerous places where the
>>>standard uses "unquoted" to mean that a character is neither quoted
>>>with single- or double-quotes nor escaped with a backslash.
>>
>>Escaping can be a form of quoting, sure.  2.2.1 Escape Character (Backslash)
>>is part of  2.2 Quoting, after all. Not all escaping is quoting though. I
>>went over all uses of the word "unquoted" in Shell Command Language. Every
>>single one refers to shell quoting, and in the few cases where other levels
>>of backslash removal also apply, the standard does not refer to that as
>>quoting.
>
>The current text in 2.13.1 and .3 uses "unquoted" in both senses:
>
>L76212: "A <backslash> character shall escape the following character."
>
>L76222: "When unquoted and outside a bracket expression, the following
>three characters shall have special meaning ..."
>
>L76235: "special characters can be escaped to remove their special
>meaning by preceding them with a <backslash> character."
>
>L76288: "it is unspecified whether other unquoted pattern matching
>characters within the same slash-delimited component of the pattern
>retain their special meanings..."
>
>Obviously those two uses of "unquoted" are intended to include the
><backslash> escaping described on L76212 and L76235, not 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-07-01 Thread Geoff Clare
Harald van Dijk  wrote, on 30 Jun 2019:
>
> On 28/06/2019 09:38, Geoff Clare wrote:
> >Harald van Dijk  wrote, on 27 Jun 2019:
> >>
> >>On 27/06/2019 10:04, Geoff Clare wrote:
> >
> >In particular, XRAT's explanation of it is "Conforming applications
> >are required to quote or escape the shell special characters
> >(sometimes called metacharacters). If used without this protection,
> >syntax errors can result or implementation extensions can be triggered."
> >The fact that this mentions syntax errors implies that the statement
> >in 2.13.1 was intended only to apply to patterns that are used directly
> >in shell commands.
> 
> Syntax errors are not limited to shell syntax errors.
> 
> I would think this means
> 
>   find . -name '('
> 
> is allowed to immediately exit with
> 
>   find: error: invalid pattern

If you interpret the standard as allowing find to treat that as an invalid
pattern, the error is allowed behaviour courtesy of XCU 1.4 CONSEQUENCES
OF ERRORS, but it's not a syntax error.  So the fact that XRAT specifically
talks about syntax errors is an indication that there was no intention to
allow find to treat '(' as an invalid pattern.

> POSIX does not even limit the concept of "syntax errors" to errors in the
> syntax, see e.g. the "shift" command:
> 
> >If the n operand is invalid or is greater than "$#", this may be considered 
> >a syntax error and a non-interactive shell may exit; [...]
> 

The point of this text is to allow something that would not normally
be considered to be a syntax error to be handled as if it was one.
Although it words it by saying "may be considered" rather than "may be
treated as if it was", this is clearly an explicit exception to the
usual rule of what constitutes a syntax error.

> >>Less important, under the current wording, backslash escapes the next
> >>character, it does not quote it. The requirements of quoting and escaping
> >>are the same, so perhaps it is okay to change the terminology.
> >
> >Escaping is a form of quoting.  There are numerous places where the
> >standard uses "unquoted" to mean that a character is neither quoted
> >with single- or double-quotes nor escaped with a backslash.
> 
> Escaping can be a form of quoting, sure.  2.2.1 Escape Character (Backslash)
> is part of  2.2 Quoting, after all. Not all escaping is quoting though. I
> went over all uses of the word "unquoted" in Shell Command Language. Every
> single one refers to shell quoting, and in the few cases where other levels
> of backslash removal also apply, the standard does not refer to that as
> quoting.

The current text in 2.13.1 and .3 uses "unquoted" in both senses:

L76212: "A <backslash> character shall escape the following character."

L76222: "When unquoted and outside a bracket expression, the following
three characters shall have special meaning ..."

L76235: "special characters can be escaped to remove their special
meaning by preceding them with a <backslash> character."

L76288: "it is unspecified whether other unquoted pattern matching
characters within the same slash-delimited component of the pattern
retain their special meanings..."

Obviously those two uses of "unquoted" are intended to include the
<backslash> escaping described on L76212 and L76235, not just shell
quoting.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-30 Thread Robert Elz
Date:Thu, 27 Jun 2019 12:27:55 +0200
From:Joerg Schilling 
Message-ID:  <5d149a2b.tueush4pd3wqoutl%joerg.schill...@fokus.fraunhofer.de>

  | Note that POSIX is a portable source standard and other shells that may
  | behave like bash5 currently only compile and work on a single platform.

I haven't been paying much attention here recently (other stuff to do)
and have a lot of unread mail on this list.

But I wanted to address this point in particular when I saw it (it also
came up in some other message I saw in passing) - and apologies one and all
if others have also said what I am about to say, and I just haven't read
your messages yet.

I shall eventually (if needed) return to substantial points of the actual
issues being discussed in this thread sometime later.

First, you're absolutely right that the NetBSD sh isn't (currently)
portable to other systems - though the issue is mostly its build environment,
rather than its code (but yes,  is a nuisance).   Fixing
that is somewhere on my list of things to do one day, but it is not
nearly as high a priority as making the shell work correctly (for my,
and the NetBSD developers' and users' definition of correctly) and
then more efficiently.   [Aside: people I know have managed to build
it on other systems, it is not an impossible task - though it is certainly
not trivial either.]

But all that is 100% irrelevant to anything here - POSIX is so that
applications can be portable, not necessarily in order to make the
systems that implement it portable, or even available at all.   In fact,
when POSIX (and/or the SUS before it) were initially written, the shell
which was mostly used as the basis for the XCU section, was not really
available at all - it was all proprietary sources.

Any of today's POSIX conforming systems can (could) be the same.

There is no requirement, anywhere, that any particular piece of any
of those systems be portable to any other system, or be available for
you to test in any way at all.

To be certified, as I understand it, the whole system needs to be tested
and verified correct (plus all the documentation, blah blah ...) but I
don't believe that you are any part of that process, nor that any of the
sources for the certified system ever need be made available to anyone.

None of that makes such a system less relevant for determining what the
actual expected operations and expectations of the shell in the
wild actually are - that is, what the standard should say.

Further, shells that are actively being distributed and used with systems
available now (particularly those that are installed as /bin/sh on the
various different systems) are much more relevant to use in deciding what
is (and should be) the standard than other random code that is used almost
nowhere - whatever its ancient lineage.   And whether you can get at them
to test their operations is 100% irrelevant.

Lastly, if you really want to test the NetBSD shell, that is easy to do - all
you need to do is install NetBSD somewhere - which is not a difficult process,
as while it doesn't always handle all the newest hardware all that well, it
is highly portable, and runs on just about every emulation environment you
can imagine (XEN, Virtualbox, VMware, Qemu, ...) as well as on a large
variety of real (bare metal) hardware of many different architectures.

kre




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-30 Thread Harald van Dijk

On 28/06/2019 09:38, Geoff Clare wrote:

Harald van Dijk  wrote, on 27 Jun 2019:


On 27/06/2019 10:04, Geoff Clare wrote:

Stephane Chazelas  wrote, on 26 Jun 2019:


Or again, forget all about it and treat the ksh93 behaviour as
non-compliant as is already the case.


I'm starting to think that this is what we should do, given the number
of oddities you have identified and the potential to break existing
applications that use parentheses in find -name, fnmatch(), etc.

The primary aim (of those of us discussing the issue in teleconferences)
in resolving bug 1234 is consistency.  I was hoping that we could bring
some consistency between contexts where *(...) etc. are syntax errors in
POSIX and those where they aren't by limiting which cases can be
considered special.  But that doesn't look workable now.

So here's a new proposal which just clarifies that *(...) etc. can
only be special when they would otherwise be a syntax error.


I'm not objecting, but even if you limit it to this, it's still a change,
not a clarification, no? It came as a surprise to some people, but I do not
see anything ambiguous in the current standard.


It's unclear what, precisely, "The shell special characters always
require quoting" is intended to mean.

In particular, XRAT's explanation of it is "Conforming applications
are required to quote or escape the shell special characters
(sometimes called metacharacters). If used without this protection,
syntax errors can result or implementation extensions can be triggered."
The fact that this mentions syntax errors implies that the statement
in 2.13.1 was intended only to apply to patterns that are used directly
in shell commands.


Syntax errors are not limited to shell syntax errors.

I would think this means

  find . -name '('

is allowed to immediately exit with

  find: error: invalid pattern

POSIX does not even limit the concept of "syntax errors" to errors in 
the syntax, see e.g. the "shift" command:



If the n operand is invalid or is greater than "$#", this may be considered a 
syntax error and a non-interactive shell may exit; [...]


[...]

I think I see a small wording issue:


   [...] If any character (ordinary, shell
special, or pattern special) is quoted, using either shell quoting
or (where shell quoting is not in effect) a <backslash> escape, that
pattern shall match the character itself. [...]


You excluded the bits in this proposal that would change the handling of
backslash,


The email you replied to is not the complete proposed resolution of
bug 1234; it is just the parts relating to ksh extended glob patterns.


In that case it is definitely a change, not a clarification.


Less important, under the current wording, backslash escapes the next
character, it does not quote it. The requirements of quoting and escaping
are the same, so perhaps it is okay to change the terminology.


Escaping is a form of quoting.  There are numerous places where the
standard uses "unquoted" to mean that a character is neither quoted
with single- or double-quotes nor escaped with a backslash.


Escaping can be a form of quoting, sure.  2.2.1 Escape Character 
(Backslash) is part of  2.2 Quoting, after all. Not all escaping is 
quoting though. I went over all uses of the word "unquoted" in Shell 
Command Language. Every single one refers to shell quoting, and in the 
few cases where other levels of backslash removal also apply, the 
standard does not refer to that as quoting. See 2.6.3, for old-style 
command substitutions, which have an escape mechanism independent of 
shell quoting:



The search for the matching backquote shall be satisfied by the first unquoted 
non-escaped backquote; [...]


This is written this way to say that

  echo `echo \`echo hello\``
       1      2           34

backticks 1 and 4 match, despite backticks 2 and 3 not being quoted.
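A minimal sketch of the same nesting, assuming a POSIX sh: the inner backquotes are escaped with backslashes, an escape that is processed when the outer backquoted text is scanned and is separate from ordinary shell quoting, while the $(...) form nests without it. The variable names old and new are only illustrative.

```shell
#!/bin/sh
# Old-style command substitution: the inner backquotes are escaped with a
# backslash. That escape is handled while scanning for the closing outer
# backquote (XCU 2.6.3), independently of shell quoting.
old=`echo \`echo hello\``

# The same nesting with $(...) needs no extra escaping.
new=$(echo $(echo hello))

printf '%s\n' "$old" "$new"
```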

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-30 Thread Harald van Dijk

On 28/06/2019 16:05, Joerg Schilling wrote:

Harald van Dijk  wrote:


That aside, I asked you last time you made this claim about POSIX to
back it up. There is no requirement for standard utilities to be
implemented portably. You responded then:


POSIX intends to create portability at source code level.

Code that is not portable does not follow the POSIX way.


That's not a requirement for POSIX implementations, so it's not relevant.


Well, I like to be able to test various shells on the same platform.

This is close to impossible if I need to install a specific OS for every shell.


Agreed that portability is a nice feature to have. It has a cost, and it 
is up to the maintainers to determine whether the feature is worth the 
cost, and if it is, whether it is worth the cost right now. If they 
choose not to focus efforts on portability right now, it is 
understandable that you do not personally test that shell. It's just 
that the conclusion from that should not be "this shell should not be 
considered", it should be "for this shell to be considered, someone else 
will have to provide the details".


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-28 Thread Chet Ramey
On 6/24/19 12:31 PM, Stephane Chazelas wrote:
> 2019-06-24 11:52:56 -0400, Chet Ramey:
> [...]
>>> Before going in the details of the language, can we at least
>>> agree on what the "intention" should be?
>>
>> Your intention is obvious. It's in the part I quoted.
>>
>> Pathname expansion is performed on words that contain an unquoted
>> `*', `?', or valid unquoted bracket expression.
> [...]
> 
> Yes, though there's the question of:
> 
> echo [qwe/qwe]
> 
> Which doesn't constitute a "valid unquoted bracket expression"
> when used for globbing.

> 
> Yet:
> 
> $ bash -O nullglob -c 'echo [qwe/qwe]'
> 
> $ yash -o nullglob -c 'echo [qwe/qwe]'
> [qwe/qwe]
> $ mkdir -p '[qwe/qwe]'
> $ bash -O nullglob -c 'echo [qwe/qwe]'
> [qwe/qwe]

Thanks, this is a bug when the pattern is used for pathname expansion.
I'll fix it.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-28 Thread Chet Ramey
On 6/24/19 11:49 AM, Stephane Chazelas wrote:

> Just tried with the current head of the devel branch from today
> (5.0.7(5)-maint).
> 
> In an empty dir:
> 
> $ mkdir -m a=r readable
> $ mkdir -m a=x searchable
> 
> $ bash5 -c 'printf "%s\n" */.'
> searchable/.
> $ bash5 -c 'printf "%s\n" */\.'
> readable/.
> 
> $ bash5 -c 'printf "%s\n" */\./.'
> */./.
> $ a='*/\./.' bash5 -c 'printf "%s\n" $a'
> */\./.
> 
> Those last two are different from the one I was trying before
> (5.0.7(4) IIRC) which were (correctly) returning searchable/./.

Thanks for the report. This was an easy fix.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-28 Thread Joerg Schilling
Harald van Dijk  wrote:

> That aside, I asked you last time you made this claim about POSIX to 
> back it up. There is no requirement for standard utilities to be 
> implemented portably. You responded then:
>
> > POSIX intends to create portability at source code level.
> > 
> > Code that is not portable does not follow the POSIX way.
>
> That's not a requirement for POSIX implementations, so it's not relevant.

Well, I like to be able to test various shells on the same platform.

This is close to impossible if I need to install a specific OS for every shell.



Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-28 Thread Geoff Clare
Harald van Dijk  wrote, on 27 Jun 2019:
>
> On 27/06/2019 10:04, Geoff Clare wrote:
> >Stephane Chazelas  wrote, on 26 Jun 2019:
> >>
> >>Or again, forget all about it and treat the ksh93 behaviour as
> >>non-compliant as is already the case.
> >
> >I'm starting to think that this is what we should do, given the number
> >of oddities you have identified and the potential to break existing
> >applications that use parentheses in find -name, fnmatch(), etc.
> >
> >The primary aim (of those of us discussing the issue in teleconferences)
> >in resolving bug 1234 is consistency.  I was hoping that we could bring
> >some consistency between contexts where *(...) etc. are syntax errors in
> >POSIX and those where they aren't by limiting which cases can be
> >considered special.  But that doesn't look workable now.
> >
> >So here's a new proposal which just clarifies that *(...) etc. can
> >only be special when they would otherwise be a syntax error.
> 
> I'm not objecting, but even if you limit it to this, it's still a change,
> not a clarification, no? It came as a surprise to some people, but I do not
> see anything ambiguous in the current standard.

It's unclear what, precisely, "The shell special characters always
require quoting" is intended to mean.

In particular, XRAT's explanation of it is "Conforming applications
are required to quote or escape the shell special characters
(sometimes called metacharacters). If used without this protection,
syntax errors can result or implementation extensions can be triggered."
The fact that this mentions syntax errors implies that the statement
in 2.13.1 was intended only to apply to patterns that are used directly
in shell commands.

> This would disallow the ksh extensions (other than where they would be a
> syntax error) everywhere, including fnmatch() and utilities doing pattern
> matching, if I am reading it correctly. If so, the pax example in the
> rationale I referenced, the one that shows or at least suggests that ( needs
> to be escaped, could use updating too:
> 
> >pax -r ... "*a\(\?"
> >
> >to extract a filename ending with "a(?".
> 
> could be changed to
> 
> >pax -r ... "*a\?"
> >
> >to extract a filename ending with "a?".
> 
> or even
> 
> >pax -r ... "*a(\?"
> >
> >to extract a filename ending with "a(?".
> 
> to be explicit about the new requirement.

I agree, this example should change.

> I think I see a small wording issue:
> 
> >   [...] If any character (ordinary, shell
> >special, or pattern special) is quoted, using either shell quoting
> >or (where shell quoting is not in effect) a <backslash> escape, that
> >pattern shall match the character itself. [...]
> 
> You excluded the bits in this proposal that would change the handling of
> backslash,

The email you replied to is not the complete proposed resolution of
bug 1234; it is just the parts relating to ksh extended glob patterns.

> Less important, under the current wording, backslash escapes the next
> character, it does not quote it. The requirements of quoting and escaping
> are the same, so perhaps it is okay to change the terminology.

Escaping is a form of quoting.  There are numerous places where the
standard uses "unquoted" to mean that a character is neither quoted
with single- or double-quotes nor escaped with a backslash.

So I don't see a problem with using "unquoted", with the clarification
in parentheses above to remove any doubt about whether it is intended
to include non-shell-quoting backslash escaping, in that context.
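A minimal sketch of that terminology, assuming a POSIX sh: in a case pattern written directly in a script, a backslash escape and single or double quotes all make "*" an ordinary character, so the three hypothetical functions below behave identically.

```shell
#!/bin/sh
# Each hypothetical function succeeds only for the literal string "*";
# they differ only in how the "*" is protected from pattern matching.
star_escaped() { case $1 in \*)  return 0 ;; esac; return 1; }
star_single()  { case $1 in '*') return 0 ;; esac; return 1; }
star_double()  { case $1 in "*") return 0 ;; esac; return 1; }

for f in star_escaped star_single star_double; do
  if "$f" '*' && ! "$f" 'abc'; then
    printf '%s: matches only a literal *\n' "$f"
  fi
done
```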

> Worth mentioning is that this change, and the recommendation to
> implementations to not implement extensions to pattern matching other than
> under non-standard options, contradicts the last comment on
> :
> 
> > During May 27 2010 conf call, general consensus is that ksh93 filename
> >generation appears to have many useful extensions, and we should move in
> >that direction. See
> >http://www2.research.att.com/sw/download/man/man1/ksh.html [^] for man
> >page details. New wording invited.

Well spotted.  We should update that bugnote in the light of the
issues that have come to light in this discussion.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-28 Thread Stephane Chazelas
2019-06-27 10:48:20 -0400, Chet Ramey:
> On 6/27/19 2:15 AM, Stephane Chazelas wrote:
> 
> > I could be convinced that it makes sense for the ksh93 X(...)
> > operators to be allowed if there was one non-anecdotal
> > implementation of fnmatch() that implemented it, but I don't
> > think there is one.
> 
> All glibc versions going back a number of years implement ksh and bash
> extended matching patterns with FNM_EXTMATCH.
[...]

Thanks. I wasn't aware of that.

It makes sense that it's with an optional flag so the decision
whether to use it rests with the tool using fnmatch() like for
the "extglob" option in bash or "kshglob" in zsh (the default
being shglob in sh mode).

I've not managed to find any software using that flag though.

I've tried building dash with --enable-fnmatch and that
FNM_EXTMATCH flag enabled.

Interesting to this discussion:

$ a='q)we' b='@(q\)we)' build-tmp/src/dash -c 'case $a in $b) echo yes; esac'
$ a='q)we' b='q\)we' build-tmp/src/dash -c 'case $a in $b) echo yes; esac'
yes
$ a='q)we' b='@(q[)]we)' build-tmp/src/dash -c 'case $a in $b) echo yes; esac'
yes
$ a='qwe)' b='@(q\)we)' build-tmp/src/dash -c 'case $a in $b) echo yes; esac'
yes

$ a='q)we' b='@(q\)we)' ksh93 -c 'case $a in $b) echo yes; esac'
yes
$ a='q)we' b='@(q[)]we)' ksh93 -c 'case $a in $b) echo yes; esac'
yes

$ a='q)we' b='@(q\)we)' bash -O extglob -c 'case $a in $b) echo yes; esac'
yes
$ a='q)we' b='@(q[)]we)' bash -O extglob -c 'case $a in $b) echo yes; esac'
yes

$ a='q)we' b='@(q\)we)' zsh -o globsubst -o kshglob -c 'case $a in $b) echo yes; esac'
yes
$ a='q)we' b='@(q[)]we)' zsh -o globsubst -o kshglob -c 'case $a in $b) echo yes; esac'
yes

[...] again is the portable (more reliable) way to escape a
character.

(same for @(a\|b) vs @(a[|]b))
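A minimal sketch of that advice using only POSIX patterns (no extglob), assuming a POSIX sh: when the pattern comes from a variable, a one-character bracket expression such as [)], [|] or [*] matches that character literally. In plain POSIX patterns ")" and "|" are ordinary anyway; per the results above, the same bracket forms also behave the same inside @(...) in ksh93, bash -O extglob and zsh -o kshglob. The helper name matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if string $1 matches the pattern held in $2.
# $2 is deliberately left unquoted so its pattern characters stay active.
matches() {
  case $1 in
    $2) return 0 ;;
  esac
  return 1
}

# One-character bracket expressions make the enclosed character literal.
matches 'q)we' 'q[)]we' && echo 'q[)]we matches q)we'
matches 'a|b'  'a[|]b'  && echo 'a[|]b matches a|b'
matches '*x'   '[*]x'   && echo '[*]x matches *x'
matches 'ax'   '[*]x'   || echo '[*]x does not match ax'
```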

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Harald van Dijk

On 27/06/2019 10:04, Geoff Clare wrote:

Stephane Chazelas  wrote, on 26 Jun 2019:


Or again, forget all about it and treat the ksh93 behaviour as
non-compliant as is already the case.


I'm starting to think that this is what we should do, given the number
of oddities you have identified and the potential to break existing
applications that use parentheses in find -name, fnmatch(), etc.

The primary aim (of those of us discussing the issue in teleconferences)
in resolving bug 1234 is consistency.  I was hoping that we could bring
some consistency between contexts where *(...) etc. are syntax errors in
POSIX and those where they aren't by limiting which cases can be
considered special.  But that doesn't look workable now.

So here's a new proposal which just clarifies that *(...) etc. can
only be special when they would otherwise be a syntax error.


I'm not objecting, but even if you limit it to this, it's still a 
change, not a clarification, no? It came as a surprise to some people, 
but I do not see anything ambiguous in the current standard.


This would disallow the ksh extensions (other than where they would be a 
syntax error) everywhere, including fnmatch() and utilities doing 
pattern matching, if I am reading it correctly. If so, the pax example 
in the rationale I referenced, the one that shows or at least suggests 
that ( needs to be escaped, could use updating too:



pax -r ... "*a\(\?"

to extract a filename ending with "a(?".


could be changed to


pax -r ... "*a\?"

to extract a filename ending with "a?".


or even


pax -r ... "*a(\?"

to extract a filename ending with "a(?".


to be explicit about the new requirement.
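For contrast, a minimal sketch of the shell-level analogue, assuming a POSIX sh. In the pax command the double quotes keep the backslashes, so pax itself sees *a\(\? and applies its own backslash escaping. In a case pattern written directly in a script, the backslashes are shell quoting instead: \( is needed because an unquoted "(" there is an operator token and would be a shell syntax error, and \? makes "?" literal. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds for names ending in "a(?".
# Here \( and \? are shell quoting, removed before matching.
ends_in_a_paren_q() {
  case $1 in
    *a\(\?) return 0 ;;
  esac
  return 1
}

for name in 'data(?' 'data(x' 'a(?'; do
  if ends_in_a_paren_q "$name"; then
    printf '%s: match\n' "$name"
  else
    printf '%s: no match\n' "$name"
  fi
done
```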

I think I see a small wording issue:


   [...] If any character (ordinary, shell
special, or pattern special) is quoted, using either shell quoting
or (where shell quoting is not in effect) a <backslash> escape, that
pattern shall match the character itself. [...]


You excluded the bits in this proposal that would change the handling of 
backslash, so the "(where shell quoting is not in effect)" doesn't look 
right. It also seems more important to include "using either shell 
quoting (where shell quoting is in effect) or [...]" to prevent someone 
from interpreting this as applying to


  find . -name '*.c'

Less important, under the current wording, backslash escapes the next 
character, it does not quote it. The requirements of quoting and 
escaping are the same, so perhaps it is okay to change the terminology.


Worth mentioning is that this change, and the recommendation to 
implementations to not implement extensions to pattern matching other 
than under non-standard options, contradicts the last comment on 
:


 During May 27 2010 conf call, general consensus is that ksh93 filename generation appears to have many useful extensions, and we should move in that direction. See http://www2.research.att.com/sw/download/man/man1/ksh.html [^] for man page details. New wording invited. 


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Harald van Dijk

On 27/06/2019 11:27, Joerg Schilling wrote:

Stephane Chazelas  wrote:

Hi,

thank you for starting a new discussion that is based on analysing the overall
results of the "proposed new behavior".


Today, by your reading of the spec and I agree it can be seen as
a valid reading, the spec is telling me that:

1.

a='\.'
printf '%s\n' $a

is a portable script that is meant to output "."


I know just one single shell that outputs "." with this code.

This is bash5. Note that POSIX is a portable source standard and other shells
that may behave like bash5 currently only compile and work on a single platform.


I had already informed you before this of two platforms my shell gets 
testing on.


That aside, I asked you last time you made this claim about POSIX to 
back it up. There is no requirement for standard utilities to be 
implemented portably. You responded then:



POSIX intends to create portability at source code level.

Code that is not portable does not follow the POSIX way.


That's not a requirement for POSIX implementations, so it's not relevant.

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane Chazelas
2019-06-21 18:48:16 +, Austin Group Bug Tracker:
[...]
> There's another aspect which I haven't mentioned yet (I'll develop more on
> that later) where the bash5 behaviour is making things worse when character
> sets like BIG5, GB18030 that have characters that contain the encoding of
> backslash are involved. 
[...]

Sorry, I realise I forgot to follow-up on that.

My thinking was that the ASCII encoding of \ (0x5C), unlike those of
the other glob operators, appears in many other characters in those
BIG5, BIG5HKSCS, GB18030, GBK charsets, but that's not actually
true as the encodings of [ and ] (0x5B and 0x5D) appear just as
often.

$ LC_ALL=zh_HK.big5hkscs luit
$ locale charmap
BIG5-HKSCS
$ touch η
$ a='αb' bash4 -c 'echo $a'
αb
$ a='αb' LC_ALL=C bash4 -c 'echo $a'
αb

$ a='αb' bash5 -c 'echo $a'
αb
$ a='αb' LC_ALL=C bash5 -c 'echo $a'
η

(where α is 0xa3 \ and η is 0xa3 b)

So the outputting of the content of a variable becomes dependent
on the locale. But anyway, it's already even worse with [ ] and
there's not much we can do about it except making sure no locale
with those charsets is available on our systems:

$ locale charmap
BIG5-HKSCS
$ a='Ωbβ' bash -c 'echo $a'
Ωbβ
$ a='Ωbβ' LC_ALL=C bash -c 'echo $a'
η
$ a='Ωbβ' dash -c 'echo $a'
η  (dash is not multi-byte aware)
$ zsh -c 'echo Ω'
zsh:1: no matches found: Ω (BUG)

(Ω is 0xa3 [ and β 0xa3 ])

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Chet Ramey
On 6/27/19 6:51 AM, Geoff Clare wrote:

>>> a='\**'
>>> printf '%s\n' $a
>>>
>>> is a portable script that is meant to list the filenames that
>>> start with "*" in the current directory
>>
>> See 1), there is just one shell that behaves this way.
> 
> And that shell is "bash" (not just "bash5").  All versions I tried do
> it (including bash3 on macOS).

This behavior has been in the bash pattern matcher since the pre-1.0
releases. The oldest version I have built is bash-2.05b, but the code
is there in previous versions.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Chet Ramey
On 6/27/19 2:15 AM, Stephane Chazelas wrote:

> I could be convinced that it makes sense for the ksh93 X(...)
> operators to be allowed if there was one non-anecdotal
> implementation of fnmatch() that implemented it, but I don't
> think there is one.

All glibc versions going back a number of years implement ksh and bash
extended matching patterns with FNM_EXTMATCH.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Geoff Clare
Joerg Schilling  wrote, on 27 Jun 2019:
>
> Geoff Clare  wrote:
> 
> > > > 2.
> > > >
> > > > a='\**'
> > > > printf '%s\n' $a
> > > >
> > > > is a portable script that is meant to list the filenames that
> > > > start with "*" in the current directory
> > > 
> > > See 1), there is just one shell that behaves this way.
> >
> > And that shell is "bash" (not just "bash5").  All versions I tried do
> > it (including bash3 on macOS).
> 
> OK, maybe you have something different in mind. Do you talk about this:
> 
> If there are the files "*abc.c" and "\abc.c" and you run the above command,
> then bash3 prints "*abc.c" while the Bourne Shell, ksh88 and ksh93 print "\abc.c".

Yes.

> This seems to be a result of the fact that the macro expansion doubles the 
> backslash before it is used for globbing and where quote removal is applied 
> after globbing.

Irrelevant internal detail. All that matters is that the result is
what POSIX requires.
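A minimal sketch of a variable-held pattern that lists only the names starting with "*" in every POSIX shell, unlike a='\**', whose result differs between shells as described above. It assumes mktemp(1) is available (common, but not required by POSIX.1-2017); the directory and variable names are illustrative.

```shell
#!/bin/sh
# Assumes mktemp -d is available to create a scratch directory.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch '*abc.c' '\abc.c'   # the two files from the example above

a='[*]*'        # the bracket expression [*] matches a literal "*"
set -- $a       # unquoted expansion: field splitting + pathname expansion
count=$# found=$1
printf '%s\n' "$@"

cd / && rm -rf "$dir"
```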

> > This is simply not true in the case of POSIX.2-1992, and I have
> > corrected you on that before.  POSIX.2-1992 deliberately made a number
> > of requirements that forced implementations to change, including some
> > that were inventions (an obvious one being pax).
> 
> But pax is rarely used, unlike tar, and cannot be called a success
> story.

I don't believe it's true that pax is rarely used, but in any case that's
not relevant to the point I was making, which is that you were wrong to
imply that POSIX.2-1992 did not invent anything.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane CHAZELAS
2019-06-27 14:04:18 +0200, Joerg Schilling:
[...]
> > And kresh (netbsd 8.1) and zsh in sh mode. In zshsh, that's
> 
> I cannot check "kresh" as it does not compile on UNIX.

Note that you can install NetBSD in a VM in a few minutes. I
just did that a few days ago to test that shell's behaviour.
You'd need to do something similar if you wanted to test Solaris
/usr/xpg4/bin/sh whose source code is not even available.

Whether it compiles on UNIX (whatever UNIX means) is irrelevant:
the POSIX utilities don't have to be compiled, let alone be
written in C, let alone written in C with source that uses the POSIX
API.

> > because \ is before a glob operator. And for all 3, there is
> > also another unquoted and unescaped * operator. Where zshsh
> > differs from the other 2 would be in:
> 
> With zsh, I get
> 
> \** 
> 
> for a directory that includes the files "\*abc.c" and "\abc.c".
> This does not seem to be correct.
> 
> If you talk about:
> 
>   ZSH_EMULATION=sh /usr/bin/zsh
> 
> when writing "zshsh", then this indeed prints *abc.c
[...]

Yes, I'm talking of zsh in sh emulation; I believe I made that
clear in the email you're replying to.

When not in sh emulation, zsh doesn't do globbing or word
splitting upon parameter expansion like most newer non-POSIX
shells (like rc, es, fish) as that's arguably a much better
design.

So even

var='*'
echo $var

would output * like in rc/es/fish. And of course:

var=(*)
echo $var

would list all the files in the current directory like
rc/es/fish ("set var *" in fish, and with variation in behaviour
between all when the glob doesn't match any file)

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Joerg Schilling
Stephane Chazelas  wrote:

> 2019-06-27 11:51:11 +0100, Geoff Clare:
> > Joerg Schilling  wrote, on 27 Jun 2019:
> [...]
> > > > 2.
> > > >
> > > > a='\**'
> > > > printf '%s\n' $a
> > > >
> > > > is a portable script that is meant to list the filenames that
> > > > start with "*" in the current directory
> > > 
> > > See 1), there is just one shell that behaves this way.
> > 
> > And that shell is "bash" (not just "bash5").  All versions I tried do
> > it (including bash3 on macOS).
>
> And kresh (netbsd 8.1) and zsh in sh mode. In zshsh, that's

I cannot check "kresh" as it does not compile on UNIX.

> because \ is before a glob operator. And for all 3, there is
> also another unquoted and unescaped * operator. Where zshsh
> differs from the other 2 would be in:

With zsh, I get

\** 

for a directory that includes the files "\*abc.c" and "\abc.c".
This does not seem to be correct.

If you talk about:

ZSH_EMULATION=sh /usr/bin/zsh

when writing "zshsh", then this indeed prints *abc.c

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Joerg Schilling
Geoff Clare  wrote:

> > > 2.
> > >
> > > a='\**'
> > > printf '%s\n' $a
> > >
> > > is a portable script that is meant to list the filenames that
> > > start with "*" in the current directory
> > 
> > See 1), there is just one shell that behaves this way.
>
> And that shell is "bash" (not just "bash5").  All versions I tried do
> it (including bash3 on macOS).

OK, maybe you have something different in mind. Do you talk about this:

If there are the files "*abc.c" and "\abc.c" and you run the above command,
then bash3 prints "*abc.c" while Bourne Shell ksh88 and ksh93 print "\abc.c".

This seems to be a result of the fact that the macro expansion doubles the 
backslash before it is used for globbing and where quote removal is applied 
after globbing.
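A minimal sketch, for comparison: the two names in this example ("*abc.c" and "\abc.c") can be told apart portably without depending on backslash processing in expansions, either by writing the pattern directly in the script, where shell quoting applies, or, when the pattern has to come from a variable, by using a bracket expression instead of a backslash. It uses `case` rather than pathname expansion so that no files need to exist; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Pattern written in the script: shell quoting is in effect,
# so \* is a literal '*' and the second * is a wildcard.
starts_with_star_literal() {
  case $1 in
    \**) echo yes ;;
    *) echo no ;;
  esac
}

# Pattern taken from a variable: shell quoting does not apply to the
# result of the expansion, so a bracket expression '[*]' is used instead
# of relying on how a given shell treats a backslash from an expansion.
starts_with_star_expanded() {
  pattern='[*]*'
  case $1 in
    $pattern) echo yes ;;
    *) echo no ;;
  esac
}

starts_with_star_literal '*abc.c'    # yes
starts_with_star_literal '\abc.c'    # no
starts_with_star_expanded '*abc.c'   # yes
starts_with_star_expanded '\abc.c'   # no
```

The same unquoted `[*]*` could be given to `printf '%s\n' $a` to list such files via pathname expansion.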

The question here is whether POSIX should make a complex exception just in 
order to cause a specific result.

> This is simply not true in the case of POSIX.2-1992, and I have
> corrected you on that before.  POSIX.2-1992 deliberately made a number of
> requirements that forced implementations to change, including some
> that were invention (an obvious one being pax).

But pax is rarely used, in contrast to tar, and cannot be called a success story.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane CHAZELAS
2019-06-27 12:27:55 +0200, Joerg Schilling:
[...]
> > 4 is portable in practice. 5 as well but only because of the
> > buggy fallback string comparison in ksh93.
> 
> So you wrote this because the shell that makes @ special also
> has the fallback?
[...]


Well, it may be tempting to suspect that ksh93 does the fallback
there for backward compatibility

So that 

a='@(foo)'; case $a in $a) echo yes; esac

outputs yes like it did in the Bourne shell or ksh88 which
didn't have or didn't enable that extended operator in that
case, but we know that fallback behaviour comes from Bourne
shell originally and predates ksh88.

same problem with

a='[a]'
case $a in $a) echo yes; esac

outputting yes in those cases.

But yes, I would say it's noteworthy to point out that it's that
ksh fallback behaviour that has the side effect of making that
code more portable, if only so people don't get the wrong
impression that ksh93 disables that @(...) processing in that
case.
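As a side note, a script that wants a plain string comparison in `case` does not need to rely on that fallback at all: quoting the expansion used as the pattern makes every character in it match itself. A small sketch; the function name `same_string` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: string equality with case.
# "$2" is quoted, so @(...), [...] or * in it are never pattern operators,
# and the result does not depend on any string-comparison fallback.
same_string() {
  case $1 in
    "$2") echo yes ;;
    *) echo no ;;
  esac
}

same_string '@(foo)' '@(foo)'   # yes
same_string '[a]' '[a]'         # yes
same_string 'a' '[a]'           # no: '[a]' is compared literally here
```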

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane Chazelas
2019-06-27 11:51:11 +0100, Geoff Clare:
> Joerg Schilling  wrote, on 27 Jun 2019:
[...]
> > > 2.
> > >
> > > a='\**'
> > > printf '%s\n' $a
> > >
> > > is a portable script that is meant to list the filenames that
> > > start with "*" in the current directory
> > 
> > See 1), there is just one shell that behaves this way.
> 
> And that shell is "bash" (not just "bash5").  All versions I tried do
> it (including bash3 on macOS).

And kresh (netbsd 8.1) and zsh in sh mode. In zshsh, that's
because \ is before a glob operator. And for all 3, there is
also another unquoted and unescaped * operator. Where zshsh
differs from the other 2 would be in:

a='\d*'
printf '%s\n' $a

Which in zshsh lists the filenames that start with \d and in
bash/kresh the filenames that start with d.

And again, where kresh differs from bash/zshsh would be in

a='\*/*'
printf '%s\n' $a

None of the 3 does globbing in:

a='\*'
printf '%s\n' $a

which is different from all other shells

While only those 3 (and Harald's shell, but I don't know that
Harald's shell is shipped with any system yet) do that second
level of backslash processing upon globbing, there are more
which do it for other cases of pattern matching (ksh93, dash,
busybox sh).

And only in bash5 is \ enough to trigger globbing (the "1" case)
(which at the moment can be seen as a bug (regression) as it's
not documented and so can easily be reverted).

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Geoff Clare
Joerg Schilling  wrote, on 27 Jun 2019:
>
> Stephane Chazelas  wrote:
> 
> > Today, by your reading of the spec and I agree it can be seen as
> > a valid reading, the spec is telling me that:
> >
> > 1.
> >
> > a='\.'
> > printf '%s\n' $a
> >
> > is a portable script that is meant to output "."
> 
> I know just one single shell that outputs "." with this code.
> 
> This is bash5. Note that POSIX is a portable source standard and other shells
> that may behave like bash5 currently only compile and work on a single 
> platform.
> 
> My impression is that this is mainly supported by Geoff

That was my initial position, but we have moved on since then.  I am
willing to accept the compromise currently being discussed whereby
pathname expansion only happens when there is an unquoted '*', '?'
or '[' in the pattern, in which case the above would be required
to output '\.'  I updated the proposal in the etherpad accordingly.

> > 2.
> >
> > a='\**'
> > printf '%s\n' $a
> >
> > is a portable script that is meant to list the filenames that
> > start with "*" in the current directory
> 
> See 1), there is just one shell that behaves this way.

And that shell is "bash" (not just "bash5").  All versions I tried do
it (including bash3 on macOS).

> > 1 and 2 are the reason I raised bug 1234. 1 couldn't be further
> > away from the truth. Only bash5 exhibits that behaviour and it's
> > evident it's a bad idea. It's evident that it was not the
> > intention of the spec as no shell at the time it was written did
> 
> This is very important, as POSIX does not claim to do own invention.

This is simply not true in the case of POSIX.2-1992, and I have
corrected you on that before.  POSIX.2-1992 deliberately made a number of
requirements that forced implementations to change, including some
that were invention (an obvious one being pax).

> > 2 is slightly more portable, but even in those shells where it
> > does that, that's not because they implement \ processing the
> > way POSIX seems to specify it, and all do it a different way.
> > I'm not opposing POSIX *allows* a \ in an unquoted word
> > expansion to have a special meaning when it's preceding *, ? and
> > [ as that's what several implementations do and it's not
> > breaking that many common shell usages.
> 
> I see no real difference to 1). The only portable shell that behaves this way 
> is bash5.

No, all versions of bash back to at least 3.2 behave that way.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Joerg Schilling
Stephane Chazelas  wrote:

Hi,

thank you for starting a new discussion that is based on analysing the overall 
results of the "proposed new behavior".

> Today, by your reading of the spec and I agree it can be seen as
> a valid reading, the spec is telling me that:
>
> 1.
>
> a='\.'
> printf '%s\n' $a
>
> is a portable script that is meant to output "."

I know just one single shell that outputs "." with this code.

This is bash5. Note that POSIX is a portable source standard and other shells
that may behave like bash5 currently only compile and work on a single platform.

My impression is that this is mainly supported by Geoff but does not have a 
wider group of supporters. I would currently say that the related wording 
slipped into the POSIX standard and could be seen as a bug.

> 2.
>
> a='\**'
> printf '%s\n' $a
>
> is a portable script that is meant to list the filenames that
> start with "*" in the current directory

See 1), there is just one shell that behaves this way.


> 1 and 2 are the reason I raised bug 1234. 1 couldn't be further
> away from the truth. Only bash5 exhibits that behaviour and it's
> evident it's a bad idea. It's evident that it was not the
> intention of the spec as no shell at the time it was written did

This is very important, as POSIX does not claim to do own invention.

> it. Even if POSIX made it very explicit that 1 is required to
> behave as described above, I could probably not call it a
> portable script in a million years, as I'd expect shell
> implementations would rather keep their backward
> compatibility than implement that unreasonable requirement
> (which IMO doesn't help at all with consistency). So the spec is
> wrong and needs to be fixed.

I support that.

> 2 is slightly more portable, but even in those shells where it
> does that, that's not because they implement \ processing the
> way POSIX seems to specify it, and all do it a different way.
> I'm not opposing POSIX *allows* a \ in an unquoted word
> expansion to have a special meaning when it's preceding *, ? and
> [ as that's what several implementations do and it's not
> breaking that many common shell usages.

I see no real difference to 1). The only portable shell that behaves this way 
is bash5.

Do you see a major difference because in 2) the backslash is before a glob 
character while it is before an ordinary character in 1)?

> 4 is portable in practice. 5 as well but only because of the
> buggy fallback string comparison in ksh93.

So you wrote this because the shell that makes @ special also has the fallback?

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Geoff Clare
Stephane Chazelas  wrote, on 26 Jun 2019:
>
> Or again, forget all about it and treat the ksh93 behaviour as
> non-compliant as is already the case.

I'm starting to think that this is what we should do, given the number
of oddities you have identified and the potential to break existing
applications that use parentheses in find -name, fnmatch(), etc.

The primary aim (of those of us discussing the issue in teleconferences)
in resolving bug 1234 is consistency.  I was hoping that we could bring
some consistency between contexts where *(...) etc. are syntax errors in
POSIX and those where they aren't by limiting which cases can be
considered special.  But that doesn't look workable now.

So here's a new proposal which just clarifies that *(...) etc. can
only be special when they would otherwise be a syntax error.

On page 2382 line 76216 section 2.13.1 change:

An ordinary character is a pattern that shall match itself. It
can be any character in the supported character set except for
NUL, those special shell characters in [xref to 2.2] that require
quoting, and the following three special pattern characters.
Matching shall be based on the bit pattern used for encoding the
character, not on the graphic representation of the character. If
any character (ordinary, shell special, or pattern special) is
quoted, that pattern shall match the character itself. The shell
special characters always require quoting.

to:

An ordinary character is a pattern that shall match itself. Where
characters within the pattern are affected by shell quoting, an
ordinary character can be any character in the supported character
set except for NUL, those special shell characters in [xref to 2.2]
that require quoting, and the three special pattern characters
described below. Where characters within the pattern are not
affected by shell quoting, an ordinary character can be any character
in the supported character set except for NUL and the three special
pattern characters described below. Matching shall be based on the
bit pattern used for encoding the character, not on the graphic
representation of the character. If any character (ordinary, shell
special, or pattern special) is quoted, using either shell quoting
or (where shell quoting is not in effect) a <backslash> escape, that
pattern shall match the character itself. The application shall
ensure that it quotes any character that would otherwise be treated
as special, in order for it to be matched as an ordinary character.

On page 3748 line 128698 section C.2.13.1 change:

Conforming applications are required to quote or escape the shell
special characters (sometimes called metacharacters). If used
without this protection, syntax errors can result or implementation
extensions can be triggered. For example, the KornShell supports a
series of extensions based on parentheses in patterns.

to:

Where characters within a pattern are affected by shell quoting,
conforming applications are required to quote the shell special
characters (sometimes called metacharacters). If used without this
protection, syntax errors can result or implementation extensions
can be triggered.  Some shells support a series of extensions based
on parentheses in patterns that are valid extensions in this
context because they would otherwise cause syntax errors.  However,
this means that they are not allowed by this standard to be
recognized in contexts where those syntax errors would not occur
anyway, such as in:

pattern='a*(b)'; ls -- $pattern

which this standard requires to list files with names beginning
'a' and ending "(b)".  It is recommended that implementations do
not extend pattern matching in the shell in ways that are only
valid extensions because they would otherwise be syntax errors, in
order to avoid inconsistency between different pattern matching
contexts.  One way to provide an extension that is consistent
between different pattern matching contexts in the shell (although
still not consistent with find -name, fnmatch(), etc.) is to enable
the extension only when a non-standard shell option is set, or
when the shell is executed using a command name other than sh.
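A sketch of one way an application can get the result described above from shells that do and do not recognize *(...), assuming such shells only treat "*(" as an operator when the "(" immediately follows the "*" outside a bracket expression: put the parentheses in bracket expressions. It uses `case` so it runs without any files; the same unquoted pattern would be used in `ls -- $pattern`. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: names that begin with 'a' and end with "(b)".
# The parentheses are inside bracket expressions, so "*(" never appears
# and '*' remains an ordinary wildcard in every shell.
ends_with_paren_b() {
  pattern='a*[(]b[)]'
  case $1 in
    $pattern) echo yes ;;
    *) echo no ;;
  esac
}

ends_with_paren_b 'ax(b)'   # yes
ends_with_paren_b 'a(b)'    # yes: '*' also matches the empty string
ends_with_paren_b 'axb'     # no
```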

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane Chazelas
2019-06-27 08:59:29 +0100, Harald van Dijk:
[...]
> > 2 is slightly more portable, but even in those shells where it
> > does that, that's not because they implement \ processing the
> > way POSIX seems to specify it, and all do it a different way.
> > I'm not opposing POSIX *allows* a \ in an unquoted word
> > expansion to have a special meaning when it's preceding *, ? and
> > [ as that's what several implementations do and it's not
> > breaking that many common shell usages.
> 
> It should not be limited to when it's preceding any specific character,
> though. That is something no shell has done. Shells currently vary in
> whether backslash can function as an escape character during pattern
> matching, but when it can, it does not depend on which character follows it.

That's what zsh does (and did before POSIX). That's the
intention, but as mentioned earlier it's quite buggy
(https://www.zsh.org/mla/workers/2019/msg00465.html). And in
ksh93, again \d is not an escaped d but matches a digit. But I
agree we need to allow \ to be treated specially when in front
of a non-wildcard as that's what several implementations do.

But only when pattern matching is involved. That includes
pathname expansion, but pathname expansion should only be
performed when a word contains unquoted ?, [ or * (not "(" as
even ksh93 doesn't do it).

Also note that in netbsd8.1 sh, as already pointed out:

In:

var1='\foo/bar*'
ls -d -- $var1
var2='\foo-bar*'
ls -d -- $var2

\ is only considered an escape operator in the var2 case as the
var1 case splits the word on / and the first part doesn't
contain an unquoted [, ?, *.

That would still be allowed if we made it unspecified what \x
does when a word contains an unquoted */?/[.

That also applies to:

var='\*'
ls -d -- $var

[...]
> If there is no fnmatch() implementation that behaves that way, then agreed
> that it makes sense to just specify that. That pax example in the rationale
> should then also be changed to not escape any parenthesis.
> 
> What did this pax example come from, though? Was that based on a real pax
> implementation that did have special treatment of parentheses, not just an
> invention?
[...]

That's something I also wondered.

I do have a vague recollection that some early glob()
implementations were actually calling "sh" to expand globs (so
for instance glob("*`reboot`*") would reboot, which is currently
allowed by the spec as ` is a shell special character). Could it
be linked to that? I wouldn't expect it to apply to fnmatch()
though.

Or maybe I'm confusing with perl globs that used to call csh.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Harald van Dijk

On 27/06/2019 07:15, Stephane Chazelas wrote:

2019-06-26 23:56:06 +0100, Harald van Dijk:
[...]

You are proposing a fundamental change to the design of pattern matching,
not a clarification as you previously called it, and you are now discussing
how to allow the behaviour of one specific shell that does not behave the
way you like, but not the other shells that also do not behave the way you
like, when those other shells were not only changed intentionally to get
more consistent behaviour, at least in my case as the result of a user
request, but also because that more consistent behaviour is required by the
current version of POSIX, solely because of theoretical problems with file
names specifically crafted to break scripts, file names that are not
actually used in the wild.

[...]

I'm not a shell implementer. I'm on the side of the application
writer, I want to be able to write portable shell scripts, and
POSIX (*Portable* Operating System *Interface*) is meant to work
for me. It's meant to tell me what I can and cannot write in my
script and the behaviour to expect. It's meant to help you the
implementer write your shell so that it can interpret my
portable script the way it's meant to.


Oh, I agree that there is a bug. Given that most shells do not behave 
the way POSIX specifies, POSIX should not be requiring that behaviour. 
However, if you wait until after some shells have already implemented 
what is specified, it's too late then to just change the rules to forbid 
it. Your logic works both ways: now those shells have to be taken into 
account. It is not reasonable for POSIX to say that uses are portable 
that in fact are not, or no longer are.


But in fact although the wording you talked about so far did not include 
it, you did raise that point already in your 26/06/2019 14:39 +01:00 
message:



So the only characters that need quoting (or put inside [...]
when the pattern is in the result of some word expansion --
remember that you need to move the backslash processing out of
the shell pattern matching as it's a fnmatch()/glob() thing only)
are ?, [, * and also \ to accommodate shells that have implemented
some form or another of special processing of \ independently of
quoting and (, and ) to accommodate ksh93 (in pattern matching
only, those are not a problem in pathname expansion).


Sorry for missing it the first time.


Today, by your reading of the spec and I agree it can be seen as
a valid reading, the spec is telling me that:

1.

a='\.'
printf '%s\n' $a

is a portable script that is meant to output "."

2.

a='\**'
printf '%s\n' $a

is a portable script that is meant to list the filenames that
start with "*" in the current directory

3.

pattern='*;*'
case $var in ($pattern) echo yes; esac

is a non-standard, non-portable script with unspecified
behaviour because shell implementations are free to use that ";"
as an extended glob operator.

4.

string='@(foo)'
echo $string

is a non-standard, non-portable script which is not guaranteed
to output @(foo).

5.

string='@(foo)'
case $string in $string) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".

6.

pattern='@(*)'
case "@(foo)" in $pattern) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".


Agreed with all of these that that is what I believe POSIX currently 
specifies.



1 and 2 are the reason I raised bug 1234. 1 couldn't be further
away from the truth. Only bash5 exhibits that behaviour and it's
evident it's a bad idea.


If you accept that unquoted backslash behaves that way in some shells, 
even before bash 5, then changing the shell to always treat unquoted 
backslash the same way makes the shell behaviour easier to understand. I 
consider it an improvement over backslash's meaning changing in ways 
that were hard to predict.



It's evident that it was not the
intention of the spec as no shell at the time it was written did
it. Even if POSIX made it very explicit that 1 is required to
behave as described above, I could probably not call it a
portable script in a million years, as I'd expect shell
implementations would rather keep their backward
compatibility than implement that unreasonable requirement
(which IMO doesn't help at all with consistency). So the spec is
wrong and needs to be fixed.


Yes, to document current practice, the spec should effectively say in 
some way that whether and if so to what extent backslash can act as an 
escape character (in addition to a quote character) in shells is 
unspecified.



2 is slightly more portable, but even in those shells where it
does that, that's not because they implement \ processing the
way POSIX seems to specify it, and all do it a different way.
I'm not opposing POSIX *allows* a \ in an unquoted word
expansion to have a special meaning when it's preceding *, ? and
[ as that's what several implementations do and it's not
breaking that many common shell 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane Chazelas
2019-06-26 23:56:06 +0100, Harald van Dijk:
[...]
> You are proposing a fundamental change to the design of pattern matching,
> not a clarification as you previously called it, and you are now discussing
> how to allow the behaviour of one specific shell that does not behave the
> way you like, but not the other shells that also do not behave the way you
> like, when those other shells were not only changed intentionally to get
> more consistent behaviour, at least in my case as the result of a user
> request, but also because that more consistent behaviour is required by the
> current version of POSIX, solely because of theoretical problems with file
> names specifically crafted to break scripts, file names that are not
> actually used in the wild.
[...]

I'm not a shell implementer. I'm on the side of the application
writer, I want to be able to write portable shell scripts, and
POSIX (*Portable* Operating System *Interface*) is meant to work
for me. It's meant to tell me what I can and cannot write in my
script and the behaviour to expect. It's meant to help you the
implementer write your shell so that it can interpret my
portable script the way it's meant to.

Today, by your reading of the spec and I agree it can be seen as
a valid reading, the spec is telling me that:

1.

a='\.'
printf '%s\n' $a

is a portable script that is meant to output "."

2.

a='\**'
printf '%s\n' $a

is a portable script that is meant to list the filenames that
start with "*" in the current directory

3.

pattern='*;*'
case $var in ($pattern) echo yes; esac

is a non-standard, non-portable script with unspecified
behaviour because shell implementations are free to use that ";"
as an extended glob operator.

4.

string='@(foo)'
echo $string

is a non-standard, non-portable script which is not guaranteed
to output @(foo).

5.

string='@(foo)'
case $string in $string) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".

6.

pattern='@(*)'
case "@(foo)" in $pattern) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".


1 and 2 are the reason I raised bug 1234. 1 couldn't be further
away from the truth. Only bash5 exhibits that behaviour and it's
evident it's a bad idea. It's evident that it was not the
intention of the spec as no shell at the time it was written did
it. Even if POSIX made it very explicit that 1 is required to
behave as described above, I could probably not call it a
portable script in a million years, as I'd expect shell
implementations would rather keep their backward
compatibility than implement that unreasonable requirement
(which IMO doesn't help at all with consistency). So the spec is
wrong and needs to be fixed.

2 is slightly more portable, but even in those shells where it
does that, that's not because they implement \ processing the
way POSIX seems to specify it, and all do it a different way.
I'm not opposing POSIX *allows* a \ in an unquoted word
expansion to have a special meaning when it's preceding *, ? and
[ as that's what several implementations do and it's not
breaking that many common shell usages.

3 is portable in practice. And I should be able to rely on it.
I'd rather POSIX doesn't open the door for a shell (or
fnmatch()...) to choose ; to be a new glob operator, I would
rather the sh glob operators stay ?, [] and * (and \ now added
because of those shells that treat it specially), so I know
which to escape (with quoting (or \ in fnmatch()) or [...] when
in word expansions) or to look out for. Several shells have some
of those operators but they are not enabled in posix/sh mode so
they interpret sh scripts like sh is meant to.
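A sketch of the bracket-expression approach for patterns built from arbitrary strings: a hypothetical helper that wraps each *, ? and [ (and, to cover ksh93's extended operators, ( and )) in a bracket expression, so the result matches only the original string when expanded unquoted. It deliberately does not handle backslash, since how a backslash produced by an expansion is treated is exactly what differs between shells in this discussion; other characters, such as ";", are left as ordinary characters.

```shell
#!/bin/sh
# Hypothetical helper: turn a literal string into a pattern that matches
# only that string when used unquoted (case, pathname expansion).
# Assumption: the input contains no backslash.
literal_to_pattern() {
  rest=$1
  out=
  while [ -n "$rest" ]; do
    first=${rest%"${rest#?}"}   # first character of $rest
    rest=${rest#?}              # drop that character
    case $first in
      '*' | '?' | '[' | '(' | ')') out="${out}[${first}]" ;;
      *) out="${out}${first}" ;;
    esac
  done
  printf '%s\n' "$out"
}

pattern=$(literal_to_pattern '*;x?')
printf '%s\n' "$pattern"   # [*];x[?]
case '*;x?' in
  $pattern) echo match ;;
  *) echo 'no match' ;;
esac
```

Bracket expressions were chosen over backslashes because they are one of the few escaping mechanisms that behaves the same in the result of an expansion across the shells discussed here.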

4 is portable in practice. 5 as well but only because of the
buggy fallback string comparison in ksh93.

6 is the only one that is true. Yes, there is *one* shell (a
shell generally considered "experimental" and not in wide use)
where that won't work as expected (won't output yes) as that's
one case where ksh93's extended glob operator is conflicting
with sh compatibility. It's not consistent with 4 there. Geoff's
proposing to fix that inconsistency to allow that operator to be
used for pathname expansion, but I believe it would be more
reasonable to fix it by not allowing it for "case" (make 6 a
portable script again) to make the standard consistent and
clear. Then ksh93 could enable those extended operators wherever
it likes when called as ksh, but not when called as sh (at least
not in the result of word expansions; basically reverting to
ksh88 behaviour).

I could be convinced that it makes sense for the ksh93 X(...)
operators to be allowed if there was one non-anecdotal
implementation of fnmatch() that implemented it, but I don't
think there is one. find implementations usually have a -regex
predicate to do things that basic globs can't do instead.

I also like the idea of opening up a way for shell wildcards to
be extended in the future, but it's a dangerous business. Today
in 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-26 Thread Harald van Dijk

On 26/06/2019 20:42, Stephane Chazelas wrote:

2019-06-26 17:24:49 +0100, Stephane Chazelas:

2019-06-26 15:32:44 +0100, Geoff Clare:

[...]

That could be interpreted as implying that a sequence that
includes a ( followed by two unquoted ) is required *not* to be
treated specially.

Yet, @(@(x)) is still special in ksh93, and the extended
operator spans up to the second unquoted ), @(@(x) alone is not
valid.

It would probably be better to simplify it to:

  immediately followed by an optional unquoted '-', then an
  unquoted '(', then *zero* or more characters then an
  unquoted ')'.  The special meaning of any such sequences
  shall be implementation-defined.

[...]

Anyway,

a='@(*' ksh -c 'case "@(foo" in $a) echo yes; esac'

doesn't return "yes", so, no need to look for the closing ).

We'd need something like

The behaviour is unspecified if the pattern contains an unquoted
( that is not inside a bracket expression and is preceded by
those @/{n}/+/*... and optional unquoted -


I honestly do not understand why this is being considered.

You are proposing a fundamental change to the design of pattern 
matching, not a clarification as you previously called it, and you are 
now discussing how to allow the behaviour of one specific shell that 
does not behave the way you like, but not the other shells that also do 
not behave the way you like, when those other shells were not only 
changed intentionally to get more consistent behaviour, at least in my 
case as the result of a user request, but also because that more 
consistent behaviour is required by the current version of POSIX, solely 
because of theoretical problems with file names specifically crafted to 
break scripts, file names that are not actually used in the wild.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-26 Thread Stephane Chazelas
2019-06-26 15:32:44 +0100, Geoff Clare:
[...]
> * the unquoted sequence "{n}" where n consists of one or more digits
> 
> * the unquoted sequence "{m,n}" where m and n consist of zero or more
>   digits
[...]

More like:

 * the sequence "{n}" where n consists of one or more digits and
   neither { nor } is quoted.

$ a='(x)' ksh93 -c 'case x in {"1"}$a) echo yes; esac'
yes

Do we really want to accommodate ksh93 here though? It's the only
shell that causes problems. ksh88, bash, zsh, pdksh, mksh also
support some of those extended glob operators, but each in its
own way without impacting sh compatibility. And for ksh93, that's
only for a corner case of pattern matching on word expansions,
not even for pathname expansion, so it seems overkill to put
pathname expansion, glob(), fnmatch(), find/pax in the same
basket when those don't have the problem in practice.

ksh93 has more extensions that break backward compatibility
like the already mentioned:

$ p='\d' ksh93 -c 'case 1 in $p) echo yes; esac'
yes

But when it comes to backslash it's not the only shell that
broke sh compatibility (as covered at length in this discussion)
and we'll have to leave a good deal of behaviour unspecified
around that anyway, so that one could be accommodated more
easily I would think.

We've already decided to consider its "fallback" behaviour
non-conforming. It seems more reasonable to do the same for
that corner case of ksh93's behaviour (which most probably no sh
script has ever relied on) here.

[...]
> > {n,m} is also special in ksh93 (and pdksh and derivatives) even
> > if n and m are not numbers
> 
> Brace expansion is a separate issue, which was raised a while ago.

Yes, though it explains why {} have to be unquoted. Did we not
rule the ksh93/pdksh behaviour non-conforming then? Note that
mksh doesn't do it when the "posix" option is enabled.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-26 Thread Geoff Clare
Stephane Chazelas  wrote, on 26 Jun 2019:
>
> 2019-06-26 12:24:21 +0100, Geoff Clare:
> [...]
> > On page 2383 line 76232 section 2.13.1 insert a new paragraph:
> > 
> > Implementations may also treat as a special pattern a sequence of
> > characters consisting of one of the following, unquoted and not
> > inside a bracket expression:
> > 
> > * a '?', '*', '+', '@', '!', '%' or '~' character
> > 
> > * the sequence "{n}" where n consists of one or more digits
> > 
> > * the sequence "{m,n}" where m and n consist of zero or more digits
> > 
> > immediately followed by an optional unquoted '-', then an unquoted
> > '(', then one or more characters that do not include an unquoted ')',
> > then an unquoted ')'.  The special meaning of any such sequences
> > shall be implementation-defined.
> [...]
> 
> In ksh93, it doesn't matter whether the +, @, !, %, ~ are quoted
> or not as long as "(" and ")" are unquoted (that's different for
> ?(...) and *(...)).

I suppose that sort of makes sense, given that +, @, !, %, ~ would
not be treated specially if they weren't followed by (...).  It makes
it more awkward to describe in standardese, though!

Second attempt ...

Implementations may also treat as a special pattern a sequence of
characters consisting of one of the following, not inside a bracket
expression:

* an unquoted '?' or '*' character

* a (quoted or unquoted) '+', '@', '!', '%' or '~' character

* the unquoted sequence "{n}" where n consists of one or more digits

* the unquoted sequence "{m,n}" where m and n consist of zero or more
  digits

immediately followed by an optional unquoted '-', then an unquoted
'(', then one or more characters that do not include an unquoted ')',
then an unquoted ')'.  The special meaning of any such sequences
shall be implementation-defined.
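To make the shape of this wording concrete, here is a rough sketch. may_be_extended is a hypothetical helper, not part of any shell or of the standard. It reports whether a pattern string contains one of the listed single-character prefixes immediately followed by an optional '-' and a parenthesised group of at least one character. It ignores quoting, nesting and the "{n}"/"{m,n}" forms, so it only approximates the text above.

```sh
# may_be_extended PATTERN (hypothetical helper)
# Succeeds if PATTERN contains '?', '*', '+', '@', '!', '%' or '~'
# immediately followed by an optional '-', then '(', at least one
# character, and ')'.  Quoting, nesting and the "{n}"/"{m,n}" forms
# from the proposed wording are not handled.
may_be_extended() {
  case $1 in
    *[?*+@!%~]'('?*')'* | *[?*+@!%~]-'('?*')'*) return 0 ;;
  esac
  return 1
}

may_be_extended '@(x)'     && echo '@(x): may be special'
may_be_extended 'a*(b)'    && echo 'a*(b): may be special'
may_be_extended 'foo(bar)' || echo 'foo(bar): ordinary'
```

Inside the bracket expression '?', '*' and a non-leading '!' are ordinary, which is why one bracket expression can list all seven prefix characters.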

> {n,m} is also special in ksh93 (and pdksh and derivatives) even
> if n and m are not numbers

Brace expansion is a separate issue, which was raised a while ago.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-26 Thread Stephane Chazelas
2019-06-26 12:24:21 +0100, Geoff Clare:
[...]
> On page 2383 line 76232 section 2.13.1 insert a new paragraph:
> 
> Implementations may also treat as a special pattern a sequence of
> characters consisting of one of the following, unquoted and not
> inside a bracket expression:
> 
> * a '?', '*', '+', '@', '!', '%' or '~' character
> 
> * the sequence "{n}" where n consists of one or more digits
> 
> * the sequence "{m,n}" where m and n consist of zero or more digits
> 
> immediately followed by an optional unquoted '-', then an unquoted
> '(', then one or more characters that do not include an unquoted ')',
> then an unquoted ')'.  The special meaning of any such sequences
> shall be implementation-defined.
[...]

In ksh93, it doesn't matter whether the +, @, !, %, ~ are quoted
or not as long as "(" and ")" are unquoted (that's different for
?(...) and *(...)).

$ a='(x)' ksh93 -c 'case x in ( "@"$a ) echo yes; esac'
yes

So the only characters that need quoted (or put inside [...]
when the pattern is in the result of some word expansion --
remember that you need to move the backslash processing out of
the shell pattern matching as it's a fnmatch()/glob() thing only)
are ?, [, * and also \ to accommodate shells that have implemented
some form or another of special processing of \ independently of
quoting and (, and ) to accommodate ksh93 (in pattern matching
only, those are not a problem in pathname expansion).
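A minimal sketch of the bracket-expression approach described above. Each character that must be matched literally is wrapped in its own bracket expression instead of being backslash-escaped, which sidesteps the disputed backslash handling in expansions and should also keep '(' and ')' ordinary in ksh93. literal_match is a hypothetical helper name.

```sh
# literal_match SUBJECT PATTERN (hypothetical helper): succeeds when
# SUBJECT matches PATTERN, with PATTERN coming from an unquoted expansion.
literal_match() {
  case $1 in
    $2) return 0 ;;
  esac
  return 1
}

# Each character that must be literal ('*', '?', '(' and ')') is
# placed in its own bracket expression instead of being backslash-escaped.
p='a[*][(]x[)][?]'   # intended to match only the string "a*(x)?"

for s in 'a*(x)?' 'ab(x)?' 'a*(x)?y'; do
  if literal_match "$s" "$p"; then
    printf '%s: match\n' "$s"
  else
    printf '%s: no match\n' "$s"
  fi
done
```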

{n,m} is also special in ksh93 (and pdksh and derivatives) even
if n and m are not numbers like in:

$ a='{n,m}' ksh93 -c 'echo $a'
n m

But those are only fixable with quoting:

$ a='{n,m}' ksh93 -c 'echo "$a"'
{n,m}

Not with a (non-quoting) backslash (ksh93 does do some backslash
interpretation in wildcards, but not those used for pathname
expansion).

$ a='\{n,m}' ksh93 -c 'echo $a'
\n \m

Not with [...] either:
$ a='[{]n,m}' ksh93  -c 'echo $a'
[]n [m

You can disable those with noglob even though they're not really
globbing operators (they don't cause the shell to read the
directory to find matches).

$ a='{n,m}' ksh93 -o noglob -c 'echo $a'
{n,m}

AFAICT, you can't store a pattern in a variable to match
filenames that start with {,}.

None of those will work:

touch '{,}foo' '{,}bar'
pattern='{,}*'
pattern='\{,}*'
pattern='[{],}*'
ls -ld -- $pattern

You'd need eval:

pattern="'{,}'*"
eval "ls -ld -- $pattern"
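If the goal is only to select such files, one workaround that avoids both the variable and eval is to keep the pattern literal in a case statement and quote the braces there. This is a sketch. list_brace_comma is a hypothetical name, and it only looks at non-hidden entries of the current directory.

```sh
# list_brace_comma (hypothetical helper): print the names in the current
# directory that start with the literal characters "{,}".  The pattern is
# written literally in the case statement, with the braces quoted, so no
# brace or pattern processing applies to them.
list_brace_comma() {
  for f in *; do
    case $f in
      '{,}'*) printf '%s\n' "$f" ;;
    esac
  done
}
```

Run it from the directory in which '{,}foo' and '{,}bar' were created. If there are no files at all, the unmatched '*' is left as-is and simply does not match the case pattern.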

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-26 Thread Geoff Clare
Stephane Chazelas  wrote, on 25 Jun 2019:

> That text was probably only to accommodate ast-open/ksh, and it
> again overlooked the case of globs in word expansions. We
> should be able to limit the damage to ( and \ when not inside
> bracket expressions

Here's an attempt at fixing this issue along the lines you suggest,
limiting the damage as much as possible by specifying the patterns
involving '(' that can be treated as special.

On page 2382 line 76216 section 2.13.1 change:

An ordinary character is a pattern that shall match itself. It
can be any character in the supported character set except for
NUL, those special shell characters in [xref to 2.2] that require
quoting, and the following three special pattern characters.
Matching shall be based on the bit pattern used for encoding the
character, not on the graphic representation of the character. If
any character (ordinary, shell special, or pattern special) is
quoted, that pattern shall match the character itself. The shell
special characters always require quoting.

to:

An ordinary character is a pattern that shall match itself. Where
characters within the pattern are affected by shell quoting, an
ordinary character can be any character in the supported character
set except for NUL, those special shell characters in [xref to 2.2]
that require quoting, and characters used in the special patterns
described below. Where characters within the pattern are not
affected by shell quoting, an ordinary character can be any character
in the supported character set except for NUL and characters used in
the special patterns described below. Matching shall be based on
the bit pattern used for encoding the character, not on the graphic
representation of the character. If any character (ordinary, shell
special, or pattern special) is quoted, using either shell quoting
or (where shell quoting is not in effect) a <backslash> escape, that
pattern shall match the character itself. The application shall
ensure that it quotes any character that would otherwise be treated
as special, in order for it to be matched as an ordinary character.

On page 2383 line 76232 section 2.13.1 insert a new paragraph:

Implementations may also treat as a special pattern a sequence of
characters consisting of one of the following, unquoted and not
inside a bracket expression:

* a '?', '*', '+', '@', '!', '%' or '~' character

* the sequence "{n}" where n consists of one or more digits

* the sequence "{m,n}" where m and n consist of zero or more digits

immediately followed by an optional unquoted '-', then an unquoted
'(', then one or more characters that do not include an unquoted ')',
then an unquoted ')'.  The special meaning of any such sequences
shall be implementation-defined.

On page 3748 line 128698 section C.2.13.1 change:

Conforming applications are required to quote or escape the shell
special characters (sometimes called metacharacters). If used
without this protection, syntax errors can result or implementation
extensions can be triggered. For example, the KornShell supports a
series of extensions based on parentheses in patterns.

to:

Where characters within a pattern are affected by shell quoting,
conforming applications are required to quote the shell special
characters (sometimes called metacharacters). If used without this
protection, syntax errors can result or implementation extensions
can be triggered.  Historically the Korn Shell supported a series
of extensions based on parentheses in patterns that were valid
extensions because they would otherwise cause syntax errors.
However, this meant that they could not be used in contexts where
those syntax errors would not occur anyway, such as in:

pattern='a*(b)'; ls $pattern

which earlier versions of this standard required to list files
with names beginning 'a' and ending "(b)".  This standard now
allows, but does not require, these historical Korn Shell extended
patterns to be recognised in all pattern matching contexts so that
implementations can provide consistency.  It is recommended that
implementations do not extend pattern matching in the shell in
ways that are only valid extensions because they would otherwise
be syntax errors, in order to avoid inconsistency between
different pattern matching contexts.
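The backslash-escape case in the proposed wording (where shell quoting is not in effect) covers patterns handed to utilities such as find. A small sketch follows. find_literal_star is a hypothetical helper. The single quotes pass a\* to find unchanged, and find's own pattern matching then treats the backslash as escaping the '*'.

```sh
# find_literal_star DIR (hypothetical helper): print pathnames under DIR
# whose last component is exactly "a*".  The single quotes only protect
# the pattern from the shell; find then sees a\* and treats the backslash
# as escaping the '*', so a file named "ab" is not matched.
find_literal_star() {
  find "$1" -name 'a\*' -print
}
```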

> (and in the case of pathname generation,
> clarify that the presence of  \ and ( alone is not enough to
> trigger pathname expansion).

I think it can be worded in a way that makes that obvious without
actually mentioning it explicitly.  E.g.:

On page 2384 line 76271 section 2.13.3, change:

3. Specified patterns shall be matched against existing filenames
and pathnames, as appropriate.

to:

3. If a specified pattern contains any '*', '?', '[' or '(' characters
that 

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-25 Thread Stephane Chazelas
2019-06-25 08:51:38 +0100, Harald van Dijk:
[...]
> > IFS='|'
> > text='foo|bar'
> > for i in $text; do...
> 
> All of these are fine, since it is about shell special characters during
> pattern matching. All the special characters in your examples are removed
> before pattern matching starts.

Well, then make it:

text='foo|bar'
echo $text

or:

IFS=
text='foo bar'
echo $text

or

text='Blah blah; blah blah.'
echo $text

or

echo $BASH_VERSION

(BASH_VERSION containing something like 4.4.19(1)-release)

A position that states that leaving parameter expansions
unquoted if they contain "shell special characters" (unless found
in $IFS) leads to unspecified results is not tenable (and
probably contradicts other parts of the standard and examples in
the spec).
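The point can be checked directly: characters such as ';', '|' or '(' that come out of an unquoted parameter expansion are neither operators nor pattern characters. The following is a small sketch that assumes the default IFS. count_fields is a hypothetical helper.

```sh
# count_fields TEXT (hypothetical helper): print how many fields an
# unquoted expansion of TEXT produces.  None of the sample strings below
# contain '*', '?' or '[', so only field splitting applies.
count_fields() {
  text=$1
  set -- $text    # deliberately unquoted
  echo "$#"
}

count_fields 'Blah blah; blah blah.'   # 4: ';' does not end a command
count_fields 'foo|bar'                 # 1: '|' is not a pipe here
count_fields '4.4.19(1)-release'       # 1: '(' and ')' are ordinary
```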

> > "special character" is also not defined (or in the "Definitions"
> > chapter refers to something different). * and ? are also
> > referred to as special characters later on in the spec.
> 
> True, it should be defined, but the rationale makes it clear that
> parentheses are considered to be shell special characters in the context of
> pattern matching. See xrat/V4_xcu_chap02.html:
> 
> > To find a filename that contained shell special characters or pattern 
> > characters, both quoting and escaping are required, such as:
> > 
> > pax -r ... "*a\(\?"
> > 
> > to extract a filename ending with "a(?".
> 
> There is nothing that makes an exception for unquoted characters coming from
> expansions, and there should not be.
[...]

As the examples above show, there clearly should.

So now POSIX needs to clearly specify which are those "special
characters" that need quoted and escaped and in which
circumstance.

That text was probably only to accommodate ast-open/ksh, and it
again overlooked the case of globs in word expansions. We
should be able to limit the damage to ( and \ when not inside
bracket expressions (and in the case of pathname generation,
clarify that the presence of  \ and ( alone is not enough to
trigger pathname expansion).

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-25 Thread Harald van Dijk

On 25/06/2019 08:30, Stephane Chazelas wrote:

> 2019-06-24 21:56:48 +0100, Harald van Dijk:
> > On 24/06/2019 21:15, Stephane Chazelas wrote:
> > > But that means that those ksh extended glob operators are not
> > > enabled in:
> > > 
> > > pattern='@(x)'; cmd $pattern
> > > or
> > > case string in $pattern) ...
> > > 
> > > (for the latter, that changed in ksh93 which makes it
> > > non-compliant; ksh88, pdksh, mksh are still OK).
> > 
> > I do not see how it makes ksh93 non-compliant. Any use of this violates
> > 2.13.1's "The shell special characters always require quoting.", which is a
> > requirement on applications. As such, shells are free to interpret it in
> > whatever way they wish, and consideration should be given to this extension
> > when coming up with new wording for POSIX.
> [...]
> 
> That's not what it means. If it did, that would mean things
> like:
> 
> text='foo bar'
> echo $text
> 
> would be unspecified because of that "unquoted special
> character" (space) after expansion (and unquoted $ before
> expansion). Same for:
> 
> IFS='|'
> text='foo|bar'
> for i in $text; do...


All of these are fine, since it is about shell special characters during 
pattern matching. All the special characters in your examples are 
removed before pattern matching starts.



> "special character" is also not defined (or in the "Definitions"
> chapter refers to something different). * and ? are also
> referred to as special characters later on in the spec.


True, it should be defined, but the rationale makes it clear that 
parentheses are considered to be shell special characters in the context 
of pattern matching. See xrat/V4_xcu_chap02.html:



> To find a filename that contained shell special characters or pattern 
> characters, both quoting and escaping are required, such as:
> 
> pax -r ... "*a\(\?"
> 
> to extract a filename ending with "a(?".


There is nothing that makes an exception for unquoted characters coming 
from expansions, and there should not be.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-25 Thread Stephane Chazelas
2019-06-24 21:56:48 +0100, Harald van Dijk:
> On 24/06/2019 21:15, Stephane Chazelas wrote:
> > But that means that those ksh extended glob operators are not
> > enabled in:
> > 
> > pattern='@(x)'; cmd $pattern
> > or
> > case string in $pattern) ...
> > 
> > (for the latter, that changed in ksh93 which makes it
> > non-compliant; ksh88, pdksh, mksh are still OK).
> 
> I do not see how it makes ksh93 non-compliant. Any use of this violates
> 2.13.1's "The shell special characters always require quoting.", which is a
> requirement on applications. As such, shells are free to interpret it in
> whatever way they wish, and consideration should be given to this extension
> when coming up with new wording for POSIX.
[...]

That's not what it means. If it did, that would mean things
like:

text='foo bar'
echo $text

would be unspecified because of that "unquoted special
character" (space) after expansion (and unquoted $ before
expansion). Same for:

IFS='|'
text='foo|bar'
for i in $text; do...

"special character" is also not defined (or in the "Definitions"
chapter refers to something different). * and ? are also
referred to as special characters later on in the spec. That
sentence makes little sense as written.

SUSv2
(https://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html#tag_001_013_001)
again, was slightly clearer there by being more verbose which
allowed the intention of the spec to be inferred.

It had this paragraph:

} Conforming applications are required to quote or escape the
} shell special characters (sometimes called metacharacters). If
} used without this protection, syntax errors can result or
} implementation extensions can be triggered. For example, the
} KornShell supports a series of extensions based on parentheses
} in patterns. 

Which explicitly refers to the @(...) and co. ksh operators.

They meant that, for instance,

echo @(x)
echo ${foo#@(x)}

are unspecified, but

text='@(x)'
echo $text

is specified, and

text='@(*)'
echo $text

is specified (to expand to the filenames that start with "@("
and end in ")").

Again, in ksh88 (and pdksh and derivatives), those extended
operators are only recognised when literal and unquoted, not when
they're in the result of a word expansion.
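Here is a sketch of the "@(*)" case. It assumes a shell that does not apply ksh-style extended patterns to the results of expansions, for example dash, or bash with extglob off. ksh93, or bash with extglob on, would behave differently. list_at_paren is a hypothetical helper, and the scratch directory name is arbitrary.

```sh
# list_at_paren (hypothetical helper): in a scratch directory holding
# "@(x)" and "x", expand text='@(*)' unquoted.  '@', '(' and ')' are
# ordinary here, so only names starting with "@(" and ending in ")" match.
list_at_paren() {
  dir=${TMPDIR:-/tmp}/at_paren.$$
  mkdir "$dir" || return 1
  (
    cd "$dir" || exit 1
    touch '@(x)' 'x'
    text='@(*)'
    printf '%s\n' $text    # deliberately unquoted
  )
  status=$?
  rm -rf -- "$dir"
  return "$status"
}

list_at_paren    # expected to print only: @(x)
```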

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Harald van Dijk

On 24/06/2019 21:15, Stephane Chazelas wrote:

> But that means that those ksh extended glob operators are not
> enabled in:
> 
> pattern='@(x)'; cmd $pattern
> or
> case string in $pattern) ...
> 
> (for the latter, that changed in ksh93 which makes it
> non-compliant; ksh88, pdksh, mksh are still OK).


I do not see how it makes ksh93 non-compliant. Any use of this violates 
2.13.1's "The shell special characters always require quoting.", which 
is a requirement on applications. As such, shells are free to interpret 
it in whatever way they wish, and consideration should be given to this 
extension when coming up with new wording for POSIX.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Stephane Chazelas
2019-06-24 18:45:55 +0100, Harald van Dijk:
[...]
> This particular example is already not required to behave in any particular
> way for other reasons, but I do not know whether changes to this example
> might produce something where an overly strict requirement in POSIX would
> prohibit ksh's behaviour.
[...]

If we wanted to allow the ksh93 behaviour, where as said earlier

p='+(\d)\1'
case 22 in $p) echo x; esac

would output x but

case $p in $p) echo x; esac

wouldn't, we'd need to add ( to the list of characters like \
that must be matched with [(].

(case '+(d)1' in $p) echo x; esac
would output x because of the fall-back equality comparison
(here after removing the \ !) which we've decided to explicitly
forbid)

I suppose ksh93 decided to break backward compatibility there
(as other shells did by starting to treat \ specially) because
it's otherwise very rare to use expansions in case patterns.

They could have allowed extended operators in globs and "case"
without having to break backward compatibility with

case $string in @($p)...

or

cmd @($p)

but it doesn't look like they did (unless I'm missing something
as I have a vague recollection that it was how you could use
extended operators in that case there).

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Stephane Chazelas
2019-06-24 18:45:55 +0100, Harald van Dijk:
[...]
> FWIW, that is not what ksh implements and it might be an unreasonable
> requirement on ksh. From its manpage:
> 
> > Following  splitting, each field is scanned for the characters ∗, ?, (,
> > and [ unless the -f option has been set. [...]
> 
> Some of ksh's special globbing features do not feature the regular
> metacharacters:
> 
>   rm -f a b
>   touch a
>   echo ~(N)a ~(N)b
> 
> This is supposed to produce 'a'.
> 
> This particular example is already not required to behave in any particular
> way for other reasons, but I do not know whether changes to this example
> might produce something where an overly strict requirement in POSIX would
> prohibit ksh's behaviour.
[...]

Yes, that is not relevant because the behaviour for things like

echo @(a)

is already unspecified. That's why ksh chose that x(...) syntax
for those operators, to keep backward compatibility with the
Bourne shell.

ksh's *(x) is more awkward to type than zsh's x#, but zsh's # is
only enabled with an "extendedglob" option, while ksh's one is
always enabled as *(x) would be a syntax error in the Bourne
shell.

But that means that those ksh extended glob operators are not
enabled in:

pattern='@(x)'; cmd $pattern
or
case string in $pattern) ...

(for the latter, that changed in ksh93 which makes it
non-compliant; ksh88, pdksh, mksh are still OK).

zsh's (x|y) or *(qualifiers) are enabled without extendedglob,
but not when the shglob option is enabled (like in sh emulation).

bash also implements the ksh88 extended glob operators with the
extglob option, which are then enabled regardless of whether the
glob is literal or comes from a word expansion.

In any case, except for ksh93 which is already non-compliant, I
don't think any of those would cause problem for my proposed
"intention".

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Chet Ramey
On 6/24/19 1:53 PM, Harald van Dijk wrote:
> On 24/06/2019 15:16, Chet Ramey wrote:
>> On 6/22/19 8:57 AM, Harald van Dijk wrote:
>>
>>>> But in bash5's
>>>>
>>>> files='/a/\b/??/x/*'
>>>> ls -d $files
>>>>
>>>> That \ becomes a globbing operator, so we get the same list of
>>>> files as in a literal /a/[b]/??/x/*, not a literal /a/\b/??/x/*
>>>
>>> That doesn't sound right. The backslash is removed per 2.13.1, and then the
>>> path component is just "b". This does not contain a "pattern character", so
>>> should not require search permission. I expect this to match the same thing
>>> as /a/b/??/x/*, and both in my shell and in bash that is what I see. Has
>>> this changed in one of the post-5.0 bash patches?
>>
>> Bash-5.0 patch 3 made some changes here; what version are you using?
> 
> I was checking bash 5.0 without patches, but I see this same behaviour in
> bash 4.4 patch 23, 5.0 without patches, and 5.0 patch 7.

I think what you're seeing is correct.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Harald van Dijk

On 22/06/2019 06:33, Stephane Chazelas wrote:

> 2019-06-21 11:15:51 +0100, Geoff Clare:
> [...]
> > > if test -n "$BASH_VERSION"; then
> > >    eval 'as_f_echo() { printf "%s\n" "$@"; }'
> > >    as_echo=as_f_echo
> > > fi
> > 
> > Probably simpler just to put "set -f" at the top of the configure
> > script.  (And if globbing is needed at any point, turn it back on
> > temporarily.)
> 
> set -f is for something else in zsh unless in sh emulation and
> you can see that script aims to also support zsh when not in sh
> emulation.


One of the first things configure scripts do is check whether 'emulate 
sh' is a supported command and execute it if so. That is enough to get 
zsh to enter POSIX mode and accept 'set -f' with the standard meaning.
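Roughly, the idiom looks like the following. The zsh test is modelled on what autoconf-generated configure scripts do, but treat it as a sketch rather than the exact autoconf code. noglob_fields is a hypothetical helper showing that, with set -f in effect, an unquoted '*' from an expansion stays literal.

```sh
# In zsh, switch to sh emulation when it is available so that "set -f"
# has its POSIX meaning; other shells skip this block.
if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
  emulate sh
fi

# noglob_fields WORD (hypothetical helper): split WORD with pathname
# expansion disabled, print "<count>:<first field>", then re-enable it.
# Note that this unconditionally runs "set +f" on the way out.
noglob_fields() {
  set -f
  set -- $1     # deliberately unquoted; no globbing while -f is set
  printf '%s\n' "$#:$1"
  set +f
}

noglob_fields '*'    # prints 1:*
```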


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Harald van Dijk

On 24/06/2019 15:16, Chet Ramey wrote:

> On 6/22/19 8:57 AM, Harald van Dijk wrote:
> > > But in bash5's
> > > 
> > > files='/a/\b/??/x/*'
> > > ls -d $files
> > > 
> > > That \ becomes a globbing operator, so we get the same list of
> > > files as in a literal /a/[b]/??/x/*, not a literal /a/\b/??/x/*
> > 
> > That doesn't sound right. The backslash is removed per 2.13.1, and then the
> > path component is just "b". This does not contain a "pattern character", so
> > should not require search permission. I expect this to match the same thing
> > as /a/b/??/x/*, and both in my shell and in bash that is what I see. Has
> > this changed in one of the post-5.0 bash patches?
> 
> Bash-5.0 patch 3 made some changes here; what version are you using?


I was checking bash 5.0 without patches, but I see this same behaviour 
in bash 4.4 patch 23, 5.0 without patches, and 5.0 patch 7.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Harald van Dijk

On 24/06/2019 11:04, Joerg Schilling wrote:

> Austin Group Bug Tracker  wrote:
> 
> > Where you refer to "*every* shell (except bash5 ...)", that's inaccurate
> > because:
> > 
> > 1. Robert Elz and Harald van Dijk have shells that behave like bash5.
> 
> I would be happy to check these shells, but unfortunately they do not compile
> on a certified POSIX platform.
> 
> -   The first shell only seems to compile on NetBSD as it requires a
> file  that is not present on UNIX systems.
> 
> -   The second shell is based on "configure" but does not include the
> configure script and the delivered scripts fail to create a "configure"
> script.


A current version of GNU autoconf & automake is required. If they are 
installed and working, the included autogen.sh will generate a configure 
script. This works on at least typical GNU/Linux systems and Mac OS X.


If you are having problems, please mail me (off list) with details.

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Harald van Dijk

On 24/06/2019 16:52, Chet Ramey wrote:

> On 6/24/19 11:51 AM, Stephane Chazelas wrote:
> > 2019-06-24 09:48:21 -0400, Chet Ramey:
> > > On 6/22/19 2:51 AM, Stephane Chazelas wrote:
> > > > For them, and me, and it seems Eric as well, globbing is an
> > > > operator that is invoked whenever a word contains an unquoted
> > > > wildcard character (in "list" contexts).
> > > 
> > > If you want the standard to say that, then propose language to make the
> > > standard say it. It doesn't say that now, and that's the root of this
> > > entire discussion. At this point, we're just talking in circles.
> > [...]
> > 
> > Before going in the details of the language, can we at least
> > agree on what the "intention" should be?
> 
> Your intention is obvious. It's in the part I quoted.
> 
> Pathname expansion is performed on words that contain an unquoted
> `*', `?', or valid unquoted bracket expression.


FWIW, that is not what ksh implements and it might be an unreasonable 
requirement on ksh. From its manpage:



> Following  splitting, each field is scanned for the characters ∗, ?, (,
> and [ unless the -f option has been set. [...]


Some of ksh's special globbing features do not feature the regular 
metacharacters:


  rm -f a b
  touch a
  echo ~(N)a ~(N)b

This is supposed to produce 'a'.

This particular example is already not required to behave in any 
particular way for other reasons, but I do not know whether changes to 
this example might produce something where an overly strict requirement 
in POSIX would prohibit ksh's behaviour.


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Stephane Chazelas
2019-06-24 11:52:56 -0400, Chet Ramey:
[...]
> > Before going in the details of the language, can we at least
> > agree on what the "intention" should be?
> 
> Your intention is obvious. It's in the part I quoted.
> 
> Pathname expansion is performed on words that contain an unquoted
> `*', `?', or valid unquoted bracket expression.
[...]

Yes, though there's the question of:

echo [qwe/qwe]

Which doesn't constitute a "valid unquoted bracket expression"
when used for globbing.

Yet:

$ bash -O nullglob -c 'echo [qwe/qwe]'

$ yash -o nullglob -c 'echo [qwe/qwe]'
[qwe/qwe]
$ mkdir -p '[qwe/qwe]'
$ bash -O nullglob -c 'echo [qwe/qwe]'
[qwe/qwe]


(for zsh, that's like [qwe] as / doesn't separate it into two
parts, which is not POSIX compliant (though maybe less
surprising)).

(those "nullglob" being only used here to see when the shell
does globbing).

The yash behaviour makes a lot more sense to me.

Now, are you OK with that "intention"? It seems to me that any
shell that implements a nullglob/failglob/nomatch/cshnullglob
type of option would agree with that "intention". And we would
be paving the way for those types of options to be added to the
standard in the future (and I don't think anyone would disagree
that it is useful to be able to tell when a glob matches or
not; there was a discussion about that lately).
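Without a nullglob-style option, the usual portable way to tell whether a glob matched is to check whether its first result exists. This is a sketch, and glob_matches is a hypothetical helper. It shares the weakness exposed by the mkdir example above: if a file literally named like the pattern exists, it reports a match.

```sh
# glob_matches PATTERN (hypothetical helper): succeed if the unquoted
# expansion of PATTERN yields at least one existing pathname.  A pattern
# that matches nothing is left as-is and normally does not exist.  Only
# the first field is checked; -L also accepts a dangling symlink.
glob_matches() {
  set -- $1       # deliberately unquoted: field splitting + globbing
  [ -e "$1" ] || [ -L "$1" ]
}
```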

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Geoff Clare
Chet Ramey  wrote, on 24 Jun 2019:
>
> On 6/24/19 11:51 AM, Stephane Chazelas wrote:
> > 2019-06-24 09:48:21 -0400, Chet Ramey:
> >> On 6/22/19 2:51 AM, Stephane Chazelas wrote:
> >>
> >>> For them, and me, and it seems Eric as well, globbing is an
> >>> operator that is invoked whenever a word contains an unquoted
> >>> wildcard character (in "list" contexts).
> >>
> >> If you want the standard to say that, then propose language to make the
> >> standard say it. It doesn't say that now, and that's the root of this
> >> entire discussion. At this point, we're just talking in circles.
> > [...]
> > 
> > Before going in the details of the language, can we at least
> > agree on what the "intention" should be?
> 
> Your intention is obvious. It's in the part I quoted.
> 
> Pathname expansion is performed on words that contain an unquoted
> `*', `?', or valid unquoted bracket expression.

There is a complication involving invalid bracket expressions.  The
standard currently says:

If the pattern contains an open bracket ('[') that does not
introduce a bracket expression as in XBD Section 9.3.5, it is
unspecified whether other unquoted pattern matching characters
within the same slash-delimited component of the pattern retain
their special meanings or are treated as ordinary characters. For
example, the pattern "a*[/b*" may match all filenames beginning
with 'b' in the directory "a*[" or it may match all filenames
beginning with 'b' in all directories with names beginning with
'a' and ending with '['.

So if the intention is that pathname expansion is only performed for
patterns that contain some characters that will be treated as special,
then on implementations which take the "treated as ordinary characters"
option, any special characters that are treated as ordinary because of
this should not trigger pathname expansion.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-24 Thread Chet Ramey
On 6/24/19 11:51 AM, Stephane Chazelas wrote:
> 2019-06-24 09:48:21 -0400, Chet Ramey:
>> On 6/22/19 2:51 AM, Stephane Chazelas wrote:
>>
>>> For them, and me, and it seems Eric as well, globbing is an
>>> operator that is invoked whenever a word contains an unquoted
>>> wildcard character (in "list" contexts).
>>
>> If you want the standard to say that, then propose language to make the
>> standard say it. It doesn't say that now, and that's the root of this
>> entire discussion. At this point, we're just talking in circles.
> [...]
> 
> Before going in the details of the language, can we at least
> agree on what the "intention" should be?

Your intention is obvious. It's in the part I quoted.

Pathname expansion is performed on words that contain an unquoted
`*', `?', or valid unquoted bracket expression.
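This summary can be approximated in a few lines of sh. has_glob_chars is a hypothetical helper that looks at a word as it stands after expansion and quote removal. It does not check that a '['...']' pair forms a valid bracket expression, so it would report the [qwe/qwe] case discussed in this thread even though that is not one.

```sh
# has_glob_chars WORD (hypothetical helper): succeed if WORD contains
# '*' or '?', or a '[' followed somewhere later by ']'.  Whether the
# bracket expression is actually valid is not checked.
has_glob_chars() {
  case $1 in
    *[*?]*)    return 0 ;;
    *'['*']'*) return 0 ;;
  esac
  return 1
}
```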

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/


