Re: Should shell quoting within glob bracket patterns be effective?

2018-04-17 Thread Geoff Clare
Geoff Clare  wrote, on 16 Apr 2018:
>
> Robert Elz  wrote, on 13 Apr 2018:
> >
> > Date:Fri, 13 Apr 2018 15:07:07 +0100
> > From:Geoff Clare 
> > 
> >   | For those the only difference from REs is the '^' -> '!'  one,
> > 
> > Not for fnmatch() which can have \ to escape characters (anywhere
> > according to its description, which would include in bracket expressions,
> > as that is not excluded).
> 
> Clearly the statement in XBD 9.3.5:
> 
> The special characters '.', '*', '[', and '\\' (<period>,
> <asterisk>, <left-square-bracket>, and <backslash>, respectively)
> shall lose their special meaning within a bracket expression.
> 
> is intended to apply to backslashes in fnmatch(), just as it does to
> the special meaning of backslash stated in XCU 2.13.1 (which also
> doesn't mention an exception for bracket expressions).
> 
> The whole point of adding fnmatch() to the standard was to provide
> a function which implements XCU 2.13, so any interpretation of the
> standard which has backslash being treated differently in fnmatch()
> (without FNM_NOESCAPE) than in XCU 2.13 cannot be correct.

I tested some implementations of fnmatch() using the program below.

Solaris and HP-UX do not treat backslash as special in bracket
expressions.

MacOS and Linux (glibc) DO treat backslash as special in bracket
expressions.  However, in both cases this behaviour is inconsistent
with the behaviour of find -name on the same system, and so should be
considered to be a bug in fnmatch() for those implementations.

Conclusion: the new description of backslash handling for fnmatch()
in the resolution of bug 985 is correct and should remain as it is.

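#include <fnmatch.h>
#include <stdio.h>

int main(void)
{
    int ret;

    /* Pattern is [a\-c]: does "b" match (backslash literal, range \-c)? */
    ret = fnmatch("[a\\-c]", "b", 0);
    printf("[a\\-c], b, 0: return %d\n", ret);

    /* Same pattern: does "-" match (backslash escapes the hyphen)? */
    ret = fnmatch("[a\\-c]", "-", 0);
    printf("[a\\-c], -, 0: return %d\n", ret);

    return 0;
}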
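
One way to read the two return values is sketched below. It assumes
the C locale, so that the range from '\' to 'c' contains 'b'. The
names bs_mode, classify() and probe_fnmatch() are invented for this
illustration and do not come from any implementation.

```c
#include <fnmatch.h>

/* Hypothetical labels for the two behaviours reported in this thread. */
enum bs_mode { BS_LITERAL, BS_ESCAPES, BS_UNKNOWN };

/*
 * Interpret the return values of fnmatch("[a\\-c]", "b", 0) and
 * fnmatch("[a\\-c]", "-", 0), assuming the C locale so that the
 * range from '\' to 'c' contains 'b'.
 */
enum bs_mode classify(int ret_b, int ret_dash)
{
    if (ret_b == 0 && ret_dash == FNM_NOMATCH)
        return BS_LITERAL;   /* set is 'a' plus the range '\'..'c' */
    if (ret_b == FNM_NOMATCH && ret_dash == 0)
        return BS_ESCAPES;   /* set is 'a', '-', 'c' */
    return BS_UNKNOWN;       /* any other combination, e.g. an error */
}

/* Probe the fnmatch() of the system this is compiled on. */
enum bs_mode probe_fnmatch(void)
{
    return classify(fnmatch("[a\\-c]", "b", 0),
                    fnmatch("[a\\-c]", "-", 0));
}
```

Going by the results reported above, Solaris and HP-UX would be
expected to give BS_LITERAL, and MacOS and glibc BS_ESCAPES.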

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-16 Thread Joerg Schilling
Geoff Clare  wrote:

> Clearly the statement in XBD 9.3.5:
>
> The special characters '.', '*', '[', and '\\' (<period>,
> <asterisk>, <left-square-bracket>, and <backslash>, respectively)
> shall lose their special meaning within a bracket expression.
>
> is intended to apply to backslashes in fnmatch(), just as it does to
> the special meaning of backslash stated in XCU 2.13.1 (which also
> doesn't mention an exception for bracket expressions).

It seems that everybody agrees that [a-z] should behave differently from ["a-z"].
How this is implemented in the shell is not mentioned in POSIX. It seems 
however that people tend to use a prepended '\\' in strings to mark quoted 
characters in shell internal strings.

> The whole point of adding fnmatch() to the standard was to provide
> a function which implements XCU 2.13, so any interpretation of the
> standard which has backslash being treated differently in fnmatch()
> (without FNM_NOESCAPE) than in XCU 2.13 cannot be correct.

If the intention was not to add a new interface but to add an interface that 
could be used to give the same results as seen in the shell, then I would 
expect fnmatch() to honor backslashes in [..] constructs as long as 
FNM_NOESCAPE is not in effect.

> While quoting it here, I just noticed that this statement also has
> another issue when being read in the context of XCU 2.13.1: it should
> refer to '?' losing its special meaning instead of '.'.  I'll update
> my proposed change in bug 1190 to address that.

It may be that the original intention was not to force people to implement
shell internal quoting by using prepended '\\' characters in the strings that 
are used internally after tokenization in the parser. In case that a different 
mechanism is used, it would need a different implementation in fnmatch() as 
well.

I am not aware of a shell implementation that today uses a different method, so 
implementing backslash based quoting in fnmatch() seems to be the obvious 
method to recreate the behavior of the shell.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-16 Thread Geoff Clare
Robert Elz  wrote, on 13 Apr 2018:
>
> Date:Fri, 13 Apr 2018 15:07:07 +0100
> From:Geoff Clare 
> 
>   | For those the only difference from REs is the '^' -> '!'  one,
> 
> Not for fnmatch() which can have \ to escape characters (anywhere
> according to its description, which would include in bracket expressions,
> as that is not excluded).

Clearly the statement in XBD 9.3.5:

The special characters '.', '*', '[', and '\\' (<period>,
<asterisk>, <left-square-bracket>, and <backslash>, respectively)
shall lose their special meaning within a bracket expression.

is intended to apply to backslashes in fnmatch(), just as it does to
the special meaning of backslash stated in XCU 2.13.1 (which also
doesn't mention an exception for bracket expressions).

The whole point of adding fnmatch() to the standard was to provide
a function which implements XCU 2.13, so any interpretation of the
standard which has backslash being treated differently in fnmatch()
(without FNM_NOESCAPE) than in XCU 2.13 cannot be correct.

While quoting it here, I just noticed that this statement also has
another issue when being read in the context of XCU 2.13.1: it should
refer to '?' losing its special meaning instead of '.'.  I'll update
my proposed change in bug 1190 to address that.
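
For reference, the backslash handling that is not in dispute (outside
bracket expressions, with and without FNM_NOESCAPE) can be sketched as
below. The helper name pattern_matches() is made up for this
illustration; only fnmatch(), FNM_NOESCAPE and FNM_NOMATCH come from
<fnmatch.h>.

```c
#include <fnmatch.h>

/* Hypothetical helper: 1 if string matches pattern, 0 otherwise. */
int pattern_matches(const char *pattern, const char *string, int flags)
{
    return fnmatch(pattern, string, flags) == 0;
}
```

With the two-character pattern \* (written "\\*" in C), the string "*"
matches and "abc" does not when flags is 0, because the backslash
removes the special meaning of the asterisk. With FNM_NOESCAPE the
backslash is an ordinary character, so the string \abc matches and the
string "*" does not.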

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Robert Elz
Date:Fri, 13 Apr 2018 15:07:07 +0100
From:Geoff Clare 
Message-ID:  <20180413140707.GB19570@lt2.masqnet>

  | Bracket expressions are not only used in REs and the shell.  There are
  | also fnmatch(), glob(), find and pax to consider, where shell quoting
  | does not apply. 

They are used by glob (in the generic sense) and by REs (differently).
All the other examples you cite are glob patterns, and all refer to the
sh implementation.

Sure the quoting needs to be made clear, but none of this needs to
in any way impact upon REs or bracket expressions in REs.

  | For those the only difference from REs is the '^' -> '!'  one,

Not for fnmatch() which can have \ to escape characters (anywhere
according to its description, which would include in bracket expressions,
as that is not excluded).  The others just refer to XCU 2.13 and don't
say what they expect in this regard from what I can tell.

What's more, I'm not sure what they should say, I've never wanted to
use quoting in a bracket expression, as I know how to use them without
that, and just always do it that way.

  | It is true of glob patterns as used by fnmatch(), glob(), find and pax.

It is certainly not true of fnmatch() unless the FNM_NOESCAPE
flag is set - though, and for the others, as above, at least according to
what the standard says, I just don't know.   True only sh uses quotation
marks as quoting methods, but that can be handled separately (indeed
it must be, however things are combined together.)

kre




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Geoff Clare
Robert Elz  wrote, on 13 Apr 2018:
>
> Date:Fri, 13 Apr 2018 12:04:51 +0100
> From:Geoff Clare 
> 
> >   | In the case of <backslash>, this does not make clear that it is only
> >   | referring to the RE-and-shell-pattern-matching special meaning of
> >   | <backslash> and does not affect its shell-quoting special meaning.
> 
> This gets kind of messy, because XBD 9 is all about regular
> expressions, and the shell has none of those.
> 
> I believe that the right solution is just to remove the reference to XBD 9.3.5
> from XCU 2.13 and instead define how character classes work for the
> > shell.  Do that and we can get all of the quoting rules correct - and it
> just costs an extra page or so (most of the text can start out by a cut
> and paste.)

Bracket expressions are not only used in REs and the shell.  There are
also fnmatch(), glob(), find and pax to consider, where shell quoting
does not apply.  For those the only difference from REs is the '^' -> '!'
one, which is why it makes sense to refer to 9.3.5 with a statement about
that difference.

My proposed update to bug 985 (in note 3948) I think deals with the
addition of shell quoting considerations in a reasonably readable
manner without needing to duplicate 9.3.5.

> I know it is irritating to duplicate text, and if they were truly the same,
> I would not advocate it, but glob patterns and RE patterns are just
> different - only the char classes look kind of similar (and even there we
> need to do the '^' -> '!' substitution) but aren't really.   In an RE class
> > the only way to get a literal '-' is to make it first (after ^ if it is
> > there) or last.   That's not true of glob patterns, ...

It is true of glob patterns as used by fnmatch(), glob(), find and pax.
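
A minimal sketch of the first-or-last rule as seen through fnmatch(),
with '!' as the non-matching-list character. The helper name
bracket_has() is hypothetical; the patterns avoid backslashes, so the
disputed behaviour does not come into play.

```c
#include <fnmatch.h>

/* Hypothetical helper: 1 if the string s matches the bracket
 * expression given as pattern, 0 otherwise. */
int bracket_has(const char *pattern, const char *s)
{
    return fnmatch(pattern, s, 0) == 0;
}
```

Here bracket_has("[-ac]", "-") and bracket_has("[ac-]", "-") are both
1, since the '-' is first or last and so stands for itself, while
bracket_has("[a-c]", "-") is 0 because the '-' forms a range. After
the '!', a leading '-' is still literal: bracket_has("[!-ac]", "-") is
0 and bracket_has("[!-ac]", "b") is 1.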

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Geoff Clare
Robert Elz  wrote, on 13 Apr 2018:
>
> I think we have had enough of this topic, so I will not continue it
> after this message, but...
> 
>   | I maintain that the requirements of 2.2.3 are indeed universal.
> 
> If that's true, then surely those words must be read in conjunction with
> what the initial paragraph of 2.2 says ...
> 
>   Quoting is used to remove the special meaning of certain characters
>   or words to the shell.
>   Quoting can be used to preserve the literal meaning of the special
>   characters in the next paragraph   [continues about reserved words etc.]
> 
> The "special characters in the next paragraph" are ...
> 
>   | & ; < > ( ) $ ` \ " '   
> 
> and sometimes, where "depending on conditions described elsewhere"
> 
>   *  ? [ # ~  =%
> 
> Note that '-' is not in the list anywhere.   If we read that literally, it is 
> saying that quoting is not intended to remove any special meaning of
> characters other than the ones listed, which includes '-', which I would
> submit means that if you want to have quotes remove the special meaning
> of '-' in char classes in glob expressions, it needs to be explicitly stated.

Thank you for spotting this.  It looks to be an editorial oversight in 2.2.
(I think the purpose of that introductory text is to warn shell script
writers about which characters they need to think about quoting if they
want them to be treated literally.)

The lack of '-' in 2.2 doesn't change the requirements of 2.2.3, since
2.2.3 says "all characters", not "the characters listed in 2.2".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Robert Elz
Date:Fri, 13 Apr 2018 11:51:12 +0100
From:Geoff Clare 
Message-ID:  <20180413105112.GA16858@lt2.masqnet>

I think we have had enough of this topic, so I will not continue it
after this message, but...

  | I maintain that the requirements of 2.2.3 are indeed universal.

If that's true, then surely those words must be read in conjunction with
what the initial paragraph of 2.2 says ...

Quoting is used to remove the special meaning of certain characters
or words to the shell.
Quoting can be used to preserve the literal meaning of the special
characters in the next paragraph   [continues about reserved words etc.]

The "special characters in the next paragraph" are ...

| & ; < > ( ) $ ` \ " ' <space> <tab> <newline>

and sometimes, where "depending on conditions described elsewhere"

* ? [ # ~ = %

Note that '-' is not in the list anywhere.   If we read that literally, it is 
saying that quoting is not intended to remove any special meaning of
characters other than the ones listed, which includes '-', which I would
submit means that if you want to have quotes remove the special meaning
of '-' in char classes in glob expressions, it needs to be explicitly stated.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Joerg Schilling
Robert Elz  wrote:

> I know it is irritating to duplicate text, and if they were truly the same,
> I would not advocate it, but glob patterns and RE patterns are just
> different - only the char classes look kind of similar (and even there we
> need to do the '^' -> '!' substitution) but aren't really.   In an RE class
> the only way to get a literal '-' is to make it first (after ^ if it is
> there) or last.   That's not true of glob patterns, perhaps just by
> accident of that implementation in the Bourne sh (I do not recall quoting
> being possible to enter a literal '-' in 6th edition sh glob patterns but
> my memory might be lacking) - perhaps deliberate, I have no idea.

The 6th edition glob command did not implement escaping via '\\' at all.

Jörg




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Robert Elz
Date:Fri, 13 Apr 2018 12:04:51 +0100
From:Geoff Clare 
Message-ID:  <20180413110451.GA17286@lt2.masqnet>

  | In the case of <backslash>, this does not make clear that it is only
  | referring to the RE-and-shell-pattern-matching special meaning of
  | <backslash> and does not affect its shell-quoting special meaning.

This gets kind of messy, because XBD 9 is all about regular
expressions, and the shell has none of those.

I believe that the right solution is just to remove the reference to XBD 9.3.5
from XCU 2.13 and instead define how character classes work for the
shell.  Do that and we can get all of the quoting rules correct - and it
just costs an extra page or so (most of the text can start out by a cut
and paste.)

I know it is irritating to duplicate text, and if they were truly the same,
I would not advocate it, but glob patterns and RE patterns are just
different - only the char classes look kind of similar (and even there we
need to do the '^' -> '!' substitution) but aren't really.   In an RE class
the only way to get a literal '-' is to make it first (after ^ if it is there)
or last.   That's not true of glob patterns, perhaps just by accident of
that implementation in the Bourne sh (I do not recall quoting being
possible to enter a literal '-' in 6th edition sh glob patterns but my
memory might be lacking) - perhaps deliberate, I have no idea.

It is possible that the i18n parts of the char class spec could be
moved out of XBD 9.3.5 and into a section of their own (somewhere)
and then referred to by 9.3.5 and XCU 2.13, but that would be a fairly
big change (I mean the internal [= and [: type stuff - I think that's all
the same in glob and RE char classes, most probably as it is all recently
added - comparatively recently anyway.)

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Geoff Clare
I wrote, on 13 Apr 2018:
>
> Note that whilst extra text is not *needed* regarding quoting inside
> bracket expressions, I would have no objection to some sort of explanatory
> note being added to lessen the chances that readers fail to realise that
> the quoting rules still apply inside bracket expressions.

I have spotted one problematic piece of text where such a note would be
beneficial.  It's in XBD 9.3.5 item 1:

The special characters '.', '*', '[', and '\\' (<period>,
<asterisk>, <left-square-bracket>, and <backslash>, respectively)
shall lose their special meaning within a bracket expression.

In the case of <backslash>, this does not make clear that it is only
referring to the RE-and-shell-pattern-matching special meaning of
<backslash> and does not affect its shell-quoting special meaning.
I will file a separate bug report for this.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Geoff Clare
Robert Elz  wrote, on 13 Apr 2018:
>
> That is, your comment that the text in 2.2.3 which says "shall preserve
> the literal value..." is not universal throughout the spec as you implied.

I maintain that the requirements of 2.2.3 are indeed universal.

> If it doesn't always apply, then we need extra text to say in each case
> where it matters, whether it applies or not

I disagree.  If a general rule doesn't always apply then extra text is
only needed in each case where it does not apply.  That text already
exists in 2.2.3 (where the general rule is "all enclosed characters
are literal" and the exceptions to that are explicitly stated).

Note that whilst extra text is not *needed* regarding quoting inside
bracket expressions, I would have no objection to some sort of explanatory
note being added to lessen the chances that readers fail to realise that
the quoting rules still apply inside bracket expressions.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Robert Elz
Date:Fri, 13 Apr 2018 09:32:40 +0100
From:Geoff Clare 
Message-ID:  <20180413083240.GA14937@lt2.masqnet>

  | Neither of your examples is valid because the standard already explicitly
  | describes the behaviour in those cases.

Sorry, but those sections have nothing whatever to do with the point
I was making.

That is, that to process "$(...)" you have to both take the '(' literally, not
as an operator, and also treat it as a syntax character (part of the $( 
combination).

That is, your comment that the text in 2.2.3 which says "shall preserve
the literal value..." is not universal throughout the spec as you implied.

If it doesn't always apply, then we need extra text to say in each case
where it matters, whether it applies or not - the $( (etc) cases are
handled (I am not suggesting anything is wrong with them) but the
["$var"] case is not - it just needs to be made explicit what is to
happen when the text for that is rewritten.

There are really not all that many places where quoting actually
makes a difference, and I think most of those are already handled
(there are words like "except when quoted" or "an unquoted ...") it
just happens that quoting and patterns is not really specified - and
particularly character classes - again I suspect because of the reference
to XBD 9.3.5 in which there is no quoting at all, and hence obviously
no need to say what happens.

kre

ps: and we really do need to add some text to say what "matched"
means in the context of the parameter expansion substring operators.



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Joerg Schilling
"Schwarz, Konrad"  wrote:

> As the Bourne Shell source code posted earlier showed, that implementation 
> did not clearly separate the phases: a character with its high-bit set was 
> quoted for all further purposes.

This worked with 7-bit ASCII.

As we have had an 8-bit clean Bourne Shell since 1986 (SysVr3), this was replaced
by a '\\' prefix char. The code is now more complex, but the behavior is 
basically the same.

Yes, a character is passed through the shell with the quoting intact until it
calls "trim()" to remove this quoting. Where this happens influences the 
behavior.
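
The marking scheme can be illustrated roughly as follows. This is only
a sketch under simplifying assumptions, not the Bourne Shell code: the
names mark_quoted() and unmark() are invented here, only '...' and
"..." are recognised, and backslash escapes and expansions in the
input word are ignored.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy word to out, dropping the quote characters and putting a '\\'
 * marker in front of every character that was inside '...' or "...".
 * Returns 0 on success, -1 if out is too small or a quote is left open.
 */
int mark_quoted(const char *word, char *out, size_t outsz)
{
    char quote = '\0';          /* currently open quote character, if any */
    size_t n = 0;

    for (; *word != '\0'; word++) {
        if (quote == '\0' && strchr("\"'", *word) != NULL) {
            quote = *word;      /* opening quote: not copied */
        } else if (quote != '\0' && *word == quote) {
            quote = '\0';       /* closing quote: not copied */
        } else {
            if (quote != '\0') {
                if (n + 1 >= outsz)
                    return -1;
                out[n++] = '\\';    /* mark the next character as quoted */
            }
            if (n + 1 >= outsz)
                return -1;
            out[n++] = *word;
        }
    }
    if (quote != '\0' || outsz == 0)
        return -1;
    out[n] = '\0';
    return 0;
}

/* Remove the markers again, in place (the job the thread assigns to trim()). */
void unmark(char *s)
{
    char *dst = s;

    while (*s != '\0') {
        if (*s == '\\' && s[1] != '\0')
            s++;                /* skip the marker, keep the next char */
        *dst++ = *s++;
    }
    *dst = '\0';
}
```

With this sketch the word ["a-c"] becomes the internal string [\a\-\c],
while [a-c] is left unchanged, so a pattern matcher that honors the
markers can tell the two apart. Calling unmark() on either result
gives [a-c].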

People who implement shells usually control this by checking whether the shell 
behaves as expected. Whether the POSIX standard always mentions "quote removal"
at the right location was not yet verified as this would need a shell that was 
implemented only from reading the POSIX standard.

The reason for modifications in the POSIX standard with respect to the shell is 
to correct the current wording to follow the expectations of the users and the 
behavior of the reference shell.

Unfortunately, both POSIX and the reference shell have bugs. This is why we
need to carefully discuss issues with the shell.

For our discussion, ksh93 seems to misbehave with respect to quoting, while 
ksh88 seems to misbehave with respect to honoring quoting in the pattern
matcher.

I believe we all agree that [a-c] and ["a-c"] should behave differently.

Since the only difference in ["a-c"] is quoting, it is obvious that the shell
needs to honor quoting inside character classes as well in order to give a
different behavior for [a-c] and ["a-c"].

The way the quoting and the pattern matcher have to be implemented depends on
the expectations of the users. My impression is that we agree on how both
patterns should behave, so we just need to find a wording that matches the 
expectations.


Jörg




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Joerg Schilling
Robert Elz  wrote:

> Date:Thu, 12 Apr 2018 12:10:04 -0700
> From:Don Cragun 
> Message-ID:  
>
>   | The fact that the $ is special is what is the key.
>
> The problem is in the interpretation of just what "treated literally" means.
>
> If it just means that "the character is itself and is not transformed into
> something else" that's fine, the special $ inside the "" (which is not
> treated literally, so it remains the introducer of various expansions)
> can then look at the following character, see it is a '{' (untransformed)
> and then go on and implement parameter expansion as you describe
> -- and the various other characters that have meaning in that,
> such as ':' '-' '+' '?' '%' '#' '=') which (being treated literally all
> represent themselves) can be part of the syntax.

$(cmd) vs "$(cmd)" is not parsed differently but just results in different 
treatment of the results.

Strings enclosed inside '"' are not passed as quoted but treated differently 
during parameter expansion when the parser in that unit sees the '"'.

Jörg




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Geoff Clare
Robert Elz  wrote, on 13 Apr 2018:
>
> I have also just realised that a better example than ${ in "" would have been
> $( inside "".
> 
> There, because of the double quotes, the '(' is treated literally, as a '('
> character, and not as the '(' operator.   But still when the command
> substitution (inside the "") is performed, the '(' is available to be part
> of the syntax, and is no longer treated literally at all.
> 
> Put that reasoning into the argument in the previous message, instead
> of the ${ version and I think it becomes clearer how the current text
> allows the '-' inside "a-c" to be treated literally, as the string is parsed
> (not that that one would be treated differently anyway, any more than
> the '{' would be in the ${ form) but still have its special meaning in
> character ranges, just as the '(' (or '{') retains its special meaning in the
> expansions.

Neither of your examples is valid because the standard already explicitly
describes the behaviour in those cases.  See 2.2.3 Double-Quotes in the
part about <dollar-sign>:

The input characters within the quoted string that are also
enclosed between "$(" and the matching ')' shall not be affected
by the double-quotes, ...

Within the string of characters from an enclosed "${" to the
matching '}', an even number of unescaped double-quotes or
single-quotes, if any, shall occur. A preceding <backslash>
character shall be used to escape a literal '{' or '}'.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



RE: Should shell quoting within glob bracket patterns be effective?

2018-04-13 Thread Schwarz, Konrad
> -Original Message-
> From: Robert Elz [mailto:k...@munnari.oz.au]
> Put that reasoning into the argument in the previous message, instead 
> of the ${ version and I think it becomes clearer how the current text 
> allows the '-' inside "a-c" to be treated literally, as the string is 
> parsed (not that that one would be treated differently anyway, any more than 
> the '{' would be in the ${ form) but still have its special meaning in 
> character ranges, just as the '(' (or '{') retains its special meaning in the
> expansions.

This is probably Captain Obvious speaking, but in the C standard, I find the 
concept of -- sequentially applied -- Translation Phases (i.e., Trigraph 
elimination, backslash newline elimination, preprocessing tokenization, 
preprocessing, ...) quite illuminating.

As the Bourne Shell source code posted earlier showed, that implementation did 
not clearly separate the phases: a character with its high-bit set was quoted 
for all further purposes.

Perhaps something similar to translation phases could help here.

Konrad



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Robert Elz
I have also just realised that a better example than ${ in "" would have been
$( inside "".

There, because of the double quotes, the '(' is treated literally, as a '('
character, and not as the '(' operator.   But still when the command
substitution (inside the "") is performed, the '(' is available to be part
of the syntax, and is no longer treated literally at all.

Put that reasoning into the argument in the previous message, instead
of the ${ version and I think it becomes clearer how the current text
allows the '-' inside "a-c" to be treated literally, as the string is parsed
(not that that one would be treated differently anyway, any more than
the '{' would be in the ${ form) but still have its special meaning in
character ranges, just as the '(' (or '{') retains its special meaning in the
expansions.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Robert Elz
Date:Thu, 12 Apr 2018 12:10:04 -0700
From:Don Cragun 
Message-ID:  

  | The fact that the $ is special is what is the key.

The problem is in the interpretation of just what "treated literally" means.

If it just means that "the character is itself and is not transformed into
something else" that's fine, the special $ inside the "" (which is not
treated literally, so it remains the introducer of various expansions)
can then look at the following character, see it is a '{' (untransformed)
and then go on and implement parameter expansion as you describe
-- and the various other characters that have meaning in that,
such as ':' '-' '+' '?' '%' '#' '=') which (being treated literally all
represent themselves) can be part of the syntax.

On the other hand, if "treated literally" means "is itself and can have
no meaning or other interpretation other than being the character
itself" then that '{' must just be a '{' that is part of the string, and not
be co-opted into being part of the sh syntax for parameter expansions.

I am assuming here that the first interpretation is the desirable one.

Given that, then the "treated literally" '-' in the double quoted (or for
that matter single quoted) string inside a character class, is just a
'-' character, but that can still (as the '{' was in the parameter expansion)
be used as a syntax character in the class - indicating the range,
whether it was double quoted or not.

On the other hand, if you want the '-' to always represent the character
'-' itself, and not be part of the range expression, and you want to produce
that result just from the words "treated literally", you have to define
"treated literally" in the second form, and in that case, the '{' in the
"${..." form cannot be the '{' that is part of the parameter expansion.

It cannot be both ways.


But let me be clear about something - my point here is not to argue
for changes to all of the shells to meet some bizarre interpretation of
the specification, it is that the text in the standard needs to be improved,
and be explicit about things like this.

That it is possible for me to make an even semi-plausible argument in
this way means the text does not state the intentions nearly clearly
enough, and needs to be made much more precise.

There is a temptation for those who know what it should mean to read
the text, and see that it can mean what it should mean (and perhaps
not even notice that things could be interpreted differently) and then
be happy that all is OK.  But when read by someone who has no idea
what the desired outcome is, and has only the words to rely upon,
we must be sure that there is no room at all for misinterpretation.

This is why standards are generally very dry, hard to read, and boring
documents - they need to be precise about every little detail (and yes,
saying something is unspecified or undefined is precise, provided it
is clear what the "something" is).

When 985 is revisited, and the wording for how pattern matching is done
gets revised, just specify all of this precisely - it does not need to be in the
form of some algorithm to achieve the desired result (although that is one
method) but it must make it absolutely clear what the desired result is,
for every possible input.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Don Cragun


> On Apr 12, 2018, at 8:07 AM, Robert Elz  wrote:
> 
>Date:Thu, 12 Apr 2018 14:25:32 +0100
>From:Geoff Clare 
>Message-ID:  <20180412132532.GA9483@lt2.masqnet>
> 
>  | It treats them as literal characters, just as 2.2.3 says.
> 
> I thought that might have been the response, in that case in
> 
>   "${xxx}"
> 
> The '{' has to be treated as a literal character, as inside double
> quotes, and not being one of the magic few, that's what the text
> you quoted says, and apparently, everywhere else in the shell
> is supposed to follow that same interpretation.
> 
> That is, the '{' above cannot be treated the same as the one in
> 
>   ${xxx}
> 
> (unquoted) where it is a part of the syntax of the variable expansion,
> because then it would not be being treated literally.
> 
> Which way do you want it?
> 
> kre

The fact that the $ is special is what is the key.  Since $ is
special and parameter expansion and command substitution are
performed inside double-quotes, sections 2.6.2 and 2.6.3 come
into play... and that is where {, #, ##, %, %%, and } in
parameter expansions may become special and where ( and ) may
become special in command substitutions, respectively.

Cheers,
Don



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Robert Elz
Date:Thu, 12 Apr 2018 14:25:32 +0100
From:Geoff Clare 
Message-ID:  <20180412132532.GA9483@lt2.masqnet>

  | It treats them as literal characters, just as 2.2.3 says.

I thought that might have been the response, in that case in

"${xxx}"

The '{' has to be treated as a literal character, as inside double
quotes, and not being one of the magic few, that's what the text
you quoted says, and apparently, everywhere else in the shell
is supposed to follow that same interpretation.

That is, the '{' above cannot be treated the same as the one in

${xxx}

(unquoted) where it is a part of the syntax of the variable expansion,
because then it would not be being treated literally.

Which way do you want it?

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Geoff Clare
Robert Elz  wrote, on 12 Apr 2018:
>
> Date:Thu, 12 Apr 2018 13:25:01 +0100
> From:Geoff Clare 
> 
>   | Yes there is.  I quoted it earlier in this thread.
> 
> I know that, but that's useless for this purpose.   We know the quoted
> (part of) the string is treated literally, and handed to the pattern
> matching code, exactly as is, with no conversions performed upon it.
> 
> But now what is the pattern matching code supposed to do with that
> string?

It treats them as literal characters, just as 2.2.3 says.

> Where does it say (aside from the 985 resolution) that those characters
> mean anything different in a pattern than they would in a pattern given
> anywhere else?

The statement "shall preserve the literal value of all characters" in
2.2.3 is sufficient.

> Eg: if I do
> 
>   find . -name '[a-z]*' -print
> 
> are you suggesting that because the '[' '-' and '*' are quoted they are not
> to be given their normal pattern (class, range and "all") meanings?

The shell passes the literal characters [a-z]* to find.  What find
does with those is specified in the description of find.

> Obviously not.
> 
> What about
> 
>   find . -name '"[a-z]*"' -print?

The shell passes the literal characters "[a-z]*" to find.  What find
does with those is specified in the description of find.

> So in
> 
>   ls "[a-z]*"
> 
> given that quote removal is not performed before the filename expansion,
> exactly what text in the standard says the quotes should be treated
> differently in this than in the 2nd find example above?

The shell passes the literal characters [a-z]* to ls.  What ls does
with those is specified in the description of ls.
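
A minimal sketch of the point that the shell removes its own quoting
before the utility sees the argument. show_args is a hypothetical
stand-in for find or ls (it is not part of any standard utility) and
simply prints each argument it was handed:

```sh
# show_args: hypothetical stand-in for find/ls; prints each argument
# it receives, one per line, wrapped in <>.
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

show_args '[a-z]*'      # single quotes are shell quoting
show_args '"[a-z]*"'    # the inner double quotes are ordinary characters
show_args "[a-z]*"      # double quotes are shell quoting; no expansion happens
```

The first and third calls hand over the same seven characters; the
second hands over those characters with the double quotes still in
place, which is the case where the utility's own pattern rules decide
what the quotes mean.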

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Robert Elz
Date:Thu, 12 Apr 2018 13:25:01 +0100
From:Geoff Clare 
Message-ID:  <20180412122501.GA8783@lt2.masqnet>

  | Yes there is.  I quoted it earlier in this thread.

I know that, but that's useless for this purpose.   We know the quoted
(part of) the string is treated literally, and handed to the pattern matching
code, exactly as is, with no conversions performed upon it.

But now what is the pattern matching code supposed to do with that
string?

Where does it say (aside from the 985 resolution) that those characters
mean anything different in a pattern than they would in a pattern given
anywhere else?

Eg: if I do

find . -name '[a-z]*' -print

are you suggesting that because the '[' '-' and '*' are quoted they are not to
be given their normal pattern (class, range and "all") meanings?

Obviously not.

What about

find . -name '"[a-z]*"' -print?

There the quotes get handed to find as part of the arg, but quotes mean nothing
to pattern matching normally, so this one should look for file names that begin
and end with double quote characters, and have a lower-case alpha as the
first character after the leading ".   Right?

So in

ls "[a-z]*"

given that quote removal is not performed before the filename expansion,
exactly what text in the standard says the quotes should be treated differently
in this than in the 2nd find example above?

Note in all of this I am not questioning what should be done, or what shells
actually do, but rather how I work out from the text in the standard what
should be done.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Geoff Clare
Robert Elz  wrote, on 12 Apr 2018:
>
> Date:Thu, 12 Apr 2018 11:39:11 +0100
> From:Geoff Clare 
> 
>   | Huh?  The '-' is quoted by the double quotes and should therefore be
>   | treated literally. 
> 
> The problem is that there is nothing in either TC2 or TC2 + 985-fix that
> says that should happen.

Yes there is.  I quoted it earlier in this thread.

2.2.3 Double-Quotes

Enclosing characters in double-quotes ("") shall preserve the literal
value of all characters within the double-quotes, with the exception
of the characters backquote, <dollar-sign>, and <backslash>

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Robert Elz
Date:Thu, 12 Apr 2018 11:39:11 +0100
From:Geoff Clare 
Message-ID:  <20180412103911.GA6656@lt2.masqnet>

  | Huh?  The '-' is quoted by the double quotes and should therefore be
  | treated literally. 

The problem is that there is nothing in either TC2 or TC2 + 985-fix that
says that should happen.   And without that "should" is really just
wishing (based upon what shells actually do, or most of them).

The issue is how to specify it so that everything works correctly, for
all the cases of sh pattern matching, and for the other users of fnmatch()

Ideally:
find dir -name 'pattern' -print
should list the same filenames (in a different order/format) as
ls dir/pattern
lists, for all possible patterns (temporarily ignoring leading
dot issues, if there are any), and
ls dir | while read f
  do case "$f" in (pattern) printf '%s\n' "$f";;
  esac; done
should (again, ignoring '.' issues for now) print the same list.
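
The three forms above can be put side by side in a small script. This
is only a sketch under stated assumptions: same_matches is a
hypothetical helper, the directory is assumed to be flat and to hold
plain files with simple names (no newlines or leading dots), and the
script only reports whether the three lists agree, not which of them
is correct.

```sh
# same_matches PATTERN DIR (hypothetical helper): select names in DIR
# with find -name, with filename expansion, and with a case statement,
# then report whether the three selections are identical.
same_matches() {
    pat=$1
    dir=$2
    # 1. find: the pattern reaches find unchanged.
    by_find=$(find "$dir" -type f -name "$pat" | sed 's|.*/||' | sort)
    # 2. filename expansion: $pat is deliberately left unquoted.
    by_glob=$(cd "$dir" && for f in $pat; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort)
    # 3. case: the unquoted expansion is used as the pattern.
    by_case=$(ls "$dir" | while read -r f; do
        case "$f" in ($pat) printf '%s\n' "$f";; esac
    done | sort)
    if [ "$by_find" = "$by_glob" ] && [ "$by_glob" = "$by_case" ]; then
        echo same
    else
        echo different
    fi
}
```

Called as, say, same_matches '[a-c]*' somedir, it prints "same" or
"different". Patterns that contain quoting or backslashes inside
brackets are where the answer is expected to vary between
implementations.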


kre




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Robert Elz
Date:Thu, 12 Apr 2018 12:10:20 +0200
From:Joerg Schilling 
Message-ID:  <5acf308c.yoyva4vzwwu8t7jp%joerg.schill...@fokus.fraunhofer.de>

Jörg:

  | Since '' and "" quoting in the shell is highly complex and no longer
  | present at the time the shell pattern matching is called,

That's not correct (well, "highly complex" is reasonable) at least
according to the standard (rather than how things might be
implemented in any particular implementation)

In filenames, the order is tilde expansion, (field splitting is irrelevant
for present purposes), parameter expansion (and its companions),
filename expansion, and finally, quote removal.  See "The order of
word expansions" and what follows in XCU 2.6.

If it were not that way, then ls "*"* would not find files starting with a
literal asterisk, but just all files.
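
A small sketch of that case, for anyone who wants to try it. star_demo
is a hypothetical helper, and mktemp is assumed to be available on the
system. It creates the files '*x' and 'yx' in a scratch directory and
expands "*"* there:

```sh
# star_demo (hypothetical helper): expand a quoted star followed by an
# unquoted star in a directory that holds the files '*x' and 'yx'.
star_demo() {
    dir=$(mktemp -d) || return
    (
        cd "$dir" || exit
        : > './*x'
        : > yx
        # The quoted * must match a literal asterisk; the unquoted *
        # matches any string.
        printf '%s\n' "*"*
    )
    rm -rf "$dir"
}

star_demo
```

Only *x is printed. If the quotes had already been removed before
filename expansion, the word would be ** and both files would be
listed.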

In case patterns the old (current) standard does no quote removal on
the pattern at all - 985 tries to fix that but doesn't get it right.

In parameter expansions, the % and # (and %% and ## of course)
operators also happen before quote removal, so the pattern matching
they do also still has the quote characters.   Of course, for these ones
the standard says nothing at all about what "matched by pattern" means
and just assumes "you know it is a glob style match" and what that
means (and we all do it by comparing results from other shells and
hoping we haven't missed any weird cases...)
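
The effect of quoting on the # operator can be seen with a literal
asterisk in the value. A minimal sketch (the variable names are only
for illustration):

```sh
var='*abc'

lit=${var#"*"}     # quoted star: removes one literal '*' prefix
short=${var#*}     # unquoted star, shortest match: removes nothing
long=${var##*}     # unquoted star, longest match: removes everything

printf '<%s> <%s> <%s>\n' "$lit" "$short" "$long"
```

This prints <abc> <*abc> <>, so the quote characters are clearly still
doing something at the point where the pattern is matched.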

Are there any other uses of patterns in (standard) sh?  I can't remember
any right now.

  |  it makes no sense to add '' and "" to fnmatch().

That might be true, but assuming that we want fnmatch() to produce
the same results as sh does (given the correct flags to indicate what
kind of match it should perform) we would need to be very specific
about exactly how to translate a quoted shell string into a fnmatch
pattern.

  | To understand quoting, let me explain how the Bourne Shell does it:

Once again, this is (kind of) interesting but 100% irrelevant.

What matters is what the standard says must be done, not how some
implementation chooses to implement that.   One thing the standard
does not say that should be done is to convert one form of quoting
into another form (ever, except for the 985 bug resolution I think.)

Of course, provided the results are correct, it is fine to do that
within an implementation (ash based shells do quoting a totally
different way, but also not the posix "leave the quotes in the word")
but it is unacceptable to assume that all other implementations
must, or should, act that way - or even that their implementors
would ever consider doing it that way.

As long as posix says to leave the quotes in the word until
quote removal, and as long as quote removal happens after
pattern matching (or filename expansion for that case) the
specification of the pattern matching algorithm must handle '
and " chars in the pattern.

And if the pattern matching algorithm is just to be "call
fnmatch() with the flags..." (etc) then fnmatch needs to
handle them as well.   Alternatively, the algorithm could
be "convert quoted strings in the pattern as ..[to be
completed].. and then call fnmatch() using the modified
pattern"; then fnmatch() does not need to handle quotes.

Which is better largely depends upon just how flexible we
want the fnmatch() function to be - that is, must all callers
deal with quoting (if their context allows that) somehow,
before calling it ?

What the standard specifies should however match what the
implementations actually do (or at least most of them.)

kre

ps: it was interesting to see that the (ancient algol68 style) code
fragment you sent in the earlier message did not handle a ']'
as the first char of a class correctly (meaning a ']' in the class
instead of being the ending delimiter).   I don't remember ever
encountering that issue back when I used that shell - of course
wanting ']' in a char class is not common, so it is perhaps not
too surprising.   And wrt that message - for present purposes,
it would be better to run tests using case pattern matching rather
than filename expansion - for filename expansion it is quite clear
that quote removal happens after the pattern matching, so the
shell is free to interpret the quote chars.  For case patterns it
is not so clear what should be done.
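
A case-pattern version of such a test might look like the sketch
below. probe is a hypothetical helper; the five patterns are the ones
used in the filename expansion tests in this thread, numbered 1 to 5
in that order: [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]. No expected
results are given for the interesting characters, since those are
exactly what differs between shells.

```sh
# probe CHAR (hypothetical helper): print CHAR followed by the numbers
# of the case patterns that matched it.
probe() {
    out=
    case $1 in [a-c])    out="$out 1" ;; esac
    case $1 in ["a-c"])  out="$out 2" ;; esac
    case $1 in ["a\-c"]) out="$out 3" ;; esac
    case $1 in [\a\-\c]) out="$out 4" ;; esac
    case $1 in [a\-c])   out="$out 5" ;; esac
    printf '%s:%s\n' "$1" "$out"
}

for ch in a b - c _ '\' d; do
    probe "$ch"
done
```

Running the same file under each shell of interest gives one line per
test character, which can then be compared directly between shells.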




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Joerg Schilling
Geoff Clare  wrote:

> > 
> > > ["a\-c"] the backslash is not special and should be treated literally
> > 
> > This string is converted into [\a\\-\c] by the shell macro expansion code.
> > 
> > With the shell gmatch() code, this results in a match for 'a' and '\\' .. 'c'.
>
> Huh?  The '-' is quoted by the double quotes and should therefore be
> treated literally.  It should match only 'a', backslash, '-' and 'c'
> (and that's what I observe in bash, although ksh88 for some reason only
> matches 'a', backslash and 'c' which looks like a bug).

You are right, sorry, I missed one backslash before the '-'. The resulting
pattern is: [\a\\\-\c]

Here is an updated script "tsh":

-
if [ "$BASH_VERSION" != "" ]; then
echo() { command echo -e "$@"; }
fi

chk() { echo [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]; }

mkdir td && cd td || exit

printf '%s\n' '---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]'

echo ":  \c"; chk

:> a; echo "a: \c"; chk; rm a

:> b; echo "b: \c"; chk; rm b

:> ./-; echo "-: \c"; chk; rm ./-

:> c; echo "c: \c"; chk; rm c

:> _; echo "_: \c"; chk; rm _

:> \\; echo "\\: \c"; chk; rm \\

:> d; echo "d: \c"; chk; rm d

rm -f *
cd ..
rmdir td
-

and with:

for i in sh ksh ksh93 bosh bash mksh dash; do echo; echo $i:; $i ./tsh; done

you get this result:

sh:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
:  [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] [a\-c] [a-c] [a-c]
-: [a-c] - - - -
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c]

ksh:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
:  [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] [a\-c] [a-c] b
-: [a-c] [a-c] [a\-c] [a-c] [a-c]
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] \ \ \ \
d: [a-c] [a-c] [a\-c] [a-c] [a-c]

ksh93:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
:  [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b b b [a-c] [a-c]
-: [a-c] [a-c] [a\-c] - -
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c]

bosh:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
:  [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] [a\-c] [a-c] [a-c]
-: [a-c] - - - -
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c]

bash:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
:  [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] [a\-c] [a-c] [a-c]
-: [a-c] - - - -
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c]

mksh:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
:  [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] [a\-c] [a-c] [a-c]
-: [a-c] - - - -
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c]

dash:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
:  [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] b [a-c] [a-c]
-: [a-c] - [a\-c] - -
c: c c c c c
_: [a-c] [a-c] _ [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c]

So there is a new bug in "dash", as dash matches '_' for your example
and '_' is inside the range '\\' .. 'c'.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Geoff Clare
Joerg Schilling  wrote, on 12 Apr 2018:
>
> Geoff Clare  wrote:
> 
> > > Maybe, I should again mention history:
> > > 
> > >   -   fnmatch() has been introduced with issue 4 (1995). It does not
> > >   seem to be related to a historic UNIX. Since the oldest known
> > >   implementation is from IBM, fnmatch() may have been introduced
> > >   by AIX.
> >
> > It was first standardised in POSIX.2-1992 and was invented by the developers
> > of that standard.
> 
> So fnmatch() could be seen as an artificial invention and there is no need to 
> have fnmatch() to behave the same as the shell. 

It performs filename/pathname pattern matching as done by find and pax.
It is only the same as the shell when there is no shell quoting involved.

> > You are conflating two different types of backslash escape.
> >
> > The shell should honour backslash when used as shell quoting, regardless
> > of whether it is inside a bracket expression, but should not treat a
> > backslash in a bracket expression *that is part of the pattern* (i.e. not
> > shell quoting) as special.
> >
> > For example:
> >
> > [\"] the backslash quotes the "
> 
> This string is converted to [\"] by the parser and there is no PS2 prompt.
> 
> 
> > ["a\-c"] the backslash is not special and should be treated literally
> 
> This string is converted into [\a\\-\c] by the shell macro expansion code.
> 
> With the shell gmatch() code, this results in a match for 'a' and '\\' .. 'c'.

Huh?  The '-' is quoted by the double quotes and should therefore be
treated literally.  It should match only 'a', backslash, '-' and 'c'
(and that's what I observe in bash, although ksh88 for some reason only
matches 'a', backslash and 'c' which looks like a bug).

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Joerg Schilling
Geoff Clare  wrote:

> > Maybe, I should again mention history:
> > 
> > -   fnmatch() has been introduced with issue 4 (1995). It does not
> > seem to be related to a historic UNIX. Since the oldest known
> > implementation is from IBM, fnmatch() may have been introduced
> > by AIX.
>
> It was first standardised in POSIX.2-1992 and was invented by the developers
> of that standard.

So fnmatch() could be seen as an artificial invention and there is no need to 
have fnmatch() to behave the same as the shell. 

It would however be nice to be able to switch it into that mode (see my
FNM_CLASSESC proposal).

> You are conflating two different types of backslash escape.
>
> The shell should honour backslash when used as shell quoting, regardless
> of whether it is inside a bracket expression, but should not treat a
> backslash in a bracket expression *that is part of the pattern* (i.e. not
> shell quoting) as special.
>
> For example:
>
> [\"] the backslash quotes the "

This string is converted to [\"] by the parser and there is no PS2 prompt.


> ["a\-c"] the backslash is not special and should be treated literally

This string is converted into [\a\\-\c] by the shell macro expansion code.

With the shell gmatch() code, this results in a match for 'a' and '\\' .. 'c'.

So I guess that you misinterpret my text and the results from my test script.

Let me add an updated version that includes a test for "-":

mkdir td && cd td || exit

:> a
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm a

:> b
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm b

:> ./-
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm ./-

:> c
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm c

:> _
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm _

:> \\
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm \\

:> d
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm d

rm -f *
cd ..
rmdir td

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Joerg Schilling
Robert Elz  wrote:

> And yes, in particular if a [a\-c] means a class with the three chars 'a' '-'
> and 'c' in it in sh it should mean that in fnmatch() as well, or if that 
> pattern means a class with 8 chars (0x5c .. 0x63) with 'a' there 2 ways,
> in fnmatch() then it should mean that in sh as well. 

My tests verify that all modern shells including ksh93 match three chars for
[a\-c]

> On the other hand, having sh allow '' and "" quoting in addition to \ quoting
> while not supporting that in fnmatch() is possible using a technique like
> that in what was intended to be the 985 resolution - just provided that it
> handles all of the cases correctly.

Since '' and "" quoting in the shell is highly complex and no longer present at 
the time the shell pattern matching is called, it makes no sense to add '' and 
"" to fnmatch().

To understand quoting, let me explain how the Bourne Shell does it:

1)  the parser keeps \a and converts 'a' into \a

2)  The parser retains " in strings

3)  The interpreter calls the macro expansion code and this code
replaces the extended strings inside "" by quote chars (e.g. "abc"
into \a\b\c).

4)  The file name globbing is done for command arguments and gmatch()
is called for "case" statements, using the current state of the string
that results in:

\aq\o\oq\a\b\c

for

'a'q'oo'q"abc"
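
The conversion described in the steps above can be imitated with a
one-line filter. This is only an illustration of the resulting string
form, not the Bourne Shell's code: bsquote is a hypothetical helper
and sed is used purely for convenience.

```sh
# bsquote STRING (hypothetical helper): prefix every character of
# STRING with a backslash, as the steps above describe for quoted text.
bsquote() {
    printf '%s\n' "$1" | sed 's/./\\&/g'
}

# Rebuild the example: quoted pieces are converted, the bare q is not.
printf '%s%s%s%s%s\n' "$(bsquote a)" q "$(bsquote oo)" q "$(bsquote abc)"
```

The printf line prints \aq\o\oq\a\b\c, the form that the matching code
is said to receive for 'a'q'oo'q"abc".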

If you would like fnmatch() to match the behavior of the shell related to
character classes, this could be done using a new flag FNM_CLASSESC.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Geoff Clare
Joerg Schilling  wrote, on 12 Apr 2018:
>
> It seems that we need to define how quoting works in a real shell 
> implementation. 
> 
> If we require the strings to be in the form \c\h\a\r in case of a quoted
> string at that specific part of the shell, we may explain how quoting works
> for "case" statements.
> 
> Maybe, I should again mention history:
> 
>   -   fnmatch() has been introduced with issue 4 (1995). It does not
>   seem to be related to a historic UNIX. Since the oldest known
>   implementation is from IBM, fnmatch() may have been introduced
>   by AIX.

It was first standardised in POSIX.2-1992 and was invented by the developers
of that standard.

[...] 
> Now let us check the behavior of various shells with the following script:
> 

[...]
> As we can see:
> 
> - The Bourne Shell interprets backslash escapes
>   inside character classes.
> 
> - All other (relevant) shells behave identically
>   except ksh88 and ksh93
> 
> - ksh88 does not honor backslashes inside a
>   character class. Since ksh93 changes this back to the
>   original Bourne Shell behavior, I would call it a bug.
> 
> - ksh93 interprets ["a-c"] different from all
>   other shells, but again interprets backslash escapes
>   inside character classes.
> 
>   I remember that I received a report from someone (maybe
>   Martijn Dekker or Thorsten Glaser) that ksh93 has problems
>   with " inside some expressions.
> 
>   I would call the single deviation seen in ksh93 a bug.
>   The reason for this other behavior does not seem to be related
>   to pattern matching but to the way quote removal has been
>   implemented.
> 
> Conclusion:
> 
> Since the behavior of fnmatch() is currently not able to match the behavior
> of the shell matcher, I propose to add a new flag for fnmatch() to switch
> it into the shell mode that honors backslashes inside character sets.

You are conflating two different types of backslash escape.

The shell should honour backslash when used as shell quoting, regardless
of whether it is inside a bracket expression, but should not treat a
backslash in a bracket expression *that is part of the pattern* (i.e. not
shell quoting) as special.

For example:

[\"] the backslash quotes the "

["a\-c"] the backslash is not special and should be treated literally

If you want fnmatch() to be able to work like the shell you would need
the new flag to turn on all shell quoting (i.e. backslash, double-quotes
and single-quotes), not just backslash.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Robert Elz
Date:Thu, 12 Apr 2018 09:24:51 +0100
From:Geoff Clare 
Message-ID:  <20180412082451.GA3949@lt2.masqnet>

  | Bug 985 moves the detail from the current 2.13 into the fnmatch()
  | description and makes 2.13 refer to fnmatch().

Oh - I did not read all of it all that carefully - just the actual descriptions
of how it was to work.

I see no problem with that approach though -- that was what I intended
to say before I read the (old, TC2) fnmatch() page.

Both the shell, and the function, should act the same - if they don't the
function isn't nearly as useful.   Given that having the description in one,
and referring to it from the other seems appropriate, and I don't see that
it matters much which way it is done.

And yes, in particular if a [a\-c] means a class with the three chars 'a' '-'
and 'c' in it in sh it should mean that in fnmatch() as well, or if that 
pattern means a class with 8 chars (0x5c .. 0x63) with 'a' there 2 ways,
in fnmatch() then it should mean that in sh as well. 

On the other hand, having sh allow '' and "" quoting in addition to \ quoting
while not supporting that in fnmatch() is possible using a technique like
that in what was intended to be the 985 resolution - just provided that it
handles all of the cases correctly.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Joerg Schilling
Robert Elz  wrote:

> Date:Wed, 11 Apr 2018 15:14:00 +0100
> From:Geoff Clare 
> Message-ID:  <20180411141400.GA32463@lt2.masqnet>
>
>   | I also have a feeling we will have to abandon the neat idea of defining
>   | shell pattern matching in terms of fnmatch(). 
>
> Yes, but for a slightly different reason - fnmatch() doesn't describe how
> the matching works, it just refers to XCU 2.13 for that info.   What it
> describes is a function that applications can call that does the same
> kind of matching as the shell does.
...

> Delete "quote removal" and in the description of how matching works, the
> quoting characters can be made to mean what they should mean for
> patterns - nothing needs to be "removed" here, as the pattern is just used
> for matching, the only result is matched or not matched.   Quoting just
> affects the interpretation of the quoted characters, and otherwise matches
> nothing.

It seems that we need to define how quoting works in a real shell 
implementation. 

If we require the strings to be in the form \c\h\a\r in case of a quoted string 
at that specific part of the shell, we may explain how quoting works for "case"
statements.

Maybe, I should again mention history:

-   fnmatch() has been introduced with issue 4 (1995). It does not
seem to be related to a historic UNIX. Since the oldest known
implementation is from IBM, fnmatch() may have been introduced
by AIX.

-   The historic Bourne Shell used its own implementation in expand.c:

case '[':
    {BOOL ok; INT lc;
    ok=0; lc=07;
    WHILE c = *p++
    DO  IF c==']'
        THEN    return(ok?gmatch(s,p):0);
        ELIF c==MINUS
        THEN    IF lc<=scc ANDF scc<=(*p++) THEN ok++ FI
        ELSE    IF scc==(lc=(c&STRIP)) THEN ok++ FI
        FI
    OD
    return(0);
    }

and this shows very obviously that [a\-c] is subject to quoting
as otherwise the code needs to read: ELIF (c&STRIP)==MINUS
since the 1977 Bourne Shell did pass "[a\334c]" to the matching
function in expand.c if the command line was [a\-c].

-   A classical AT&T based UNIX in the late-1980s did have a 
library "libgen" with a function gmatch() inside that behaves 
like the code above, but by understanding "[a\-c]" instead of
"[a\334c]".

Now let us check the behavior of various shells with the following script:


mkdir td && cd td || exit

:> a
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm a

:> b
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm b

:> _
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm _

:> \\ 
echo [a-c] ["a-c"] [\a\-\c] [a\-c] 
rm \\ 

:> d
echo [a-c] ["a-c"] [\a\-\c] [a\-c]
rm d

rm -f *
cd ..
rmdir td
---

This results in the following:

Bourne Shell:
a a a a
b [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]

ksh88:
a a a a
b [a-c] [a-c] b
[a-c] [a-c] [a-c] [a-c]
[a-c] \ \ \
[a-c] [a-c] [a-c] [a-c]

ksh93:
a a a a
b b [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]

bosh:
a a a a
b [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]

bash:
a a a a
b [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]

mksh:
a a a a
b [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]

dash:
a a a a
b [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]
[a-c] [a-c] [a-c] [a-c]

As we can see:

-   The Bourne Shell interprets backslash escapes
inside character classes.

-   All other (relevant) shells behave identically
except ksh88 and ksh93

-   ksh88 does not honor backslashes inside a
character class. Since ksh93 changes this back to the
original Bourne Shell behavior, I would call it a bug.

-   ksh93 interprets ["a-c"] differently from all
other shells, but again interprets backslash escapes
inside character classes.

I remember that I received a report from someone (maybe
Martijn Dekker or Thorsten Glaser) that ksh93 has problems
with " inside some expressions.

I would call the single deviation seen in ksh93 a bug.
The reason for this other behavior does not seem to be related
to pattern matching but to the way quote removal has been
implemented.

Conclusion:

Since the behavior of fnmatch() is currently not able to match the behavior
of the shell matcher, I propose to add a new flag for fnmatch() to switch
it into the shell mode that honors backslashes inside character sets.

Jörg

-- 
 EMail:jo...@schily.

Re: Should shell quoting within glob bracket patterns be effective?

2018-04-12 Thread Geoff Clare
Robert Elz  wrote, on 12 Apr 2018:
>
> Date:Wed, 11 Apr 2018 15:14:00 +0100
> From:Geoff Clare 
> 
>   | I also have a feeling we will have to abandon the neat idea of defining
>   | shell pattern matching in terms of fnmatch(). 
> 
> Yes, but for a slightly different reason - fnmatch() doesn't describe how
> the matching works, it just refers to XCU 2.13 for that info.

Bug 985 moves the detail from the current 2.13 into the fnmatch()
description and makes 2.13 refer to fnmatch().

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Robert Elz
Date:Wed, 11 Apr 2018 15:14:00 +0100
From:Geoff Clare 
Message-ID:  <20180411141400.GA32463@lt2.masqnet>

  | I also have a feeling we will have to abandon the neat idea of defining
  | shell pattern matching in terms of fnmatch(). 

Yes, but for a slightly different reason - fnmatch() doesn't describe how
the matching works, it just refers to XCU 2.13 for that info.   What it
describes is a function that applications can call that does the same
kind of matching as the shell does.

Describing how matching works in terms of fnmatch() is just a convoluted
path to get back to "it works like XCU 2.13 says", except that if that is
in 2.13 itself, we have infinite recursion.   All that is really gained is the
ability to use the fnmatch() flags as a shorthand for their meanings, and
that just isn't worth it.

I think the real problem however is the reliance on XBD 9.3.5 - delete that,
(the reference, not the section) and  describe glob character classes.

Delete "quote removal" and in the description of how matching works, the
quoting characters can be made to mean what they should mean for
patterns - nothing needs to be "removed" here, as the pattern is just used
for matching, the only result is matched or not matched.   Quoting just affects
the interpretation of the quoted characters, and otherwise matches nothing.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Geoff Clare
Robert Elz  wrote, on 11 Apr 2018:
>
> Date:Wed, 11 Apr 2018 12:18:27 +0100
> From:Geoff Clare 
> 
>   | No, that text is very careful to say "*was* quoted", not "is quoted",
>   | for precisely this reason.  To conform to this requirement, the shell
>   | has to remember which characters were quoted when it removes the quotes.
>   | How this is done is a matter for the implementor.
> 
> Yes, I saw that, but was quoted when?
> 
> Eg:
> 
>   var='!a'
>   eval 'case b in *["$var"]*) echo match;;esac'
> 
> There in the case statement, everything "was" quoted once.
> 
> So that means we now are required to convert the case pattern to
> 
>   \*\[\!\a\]\*
> 
> does it?

Good point.

[...]
> Beyond that, to get back to the example in the original message, once we
> get past this "was quoted" stuff, we still need to deal with the later words
> in the same sentence:
> 
>   and is not in a bracket expression is prefixed by a backslash
> 
> That is, in the (approximately) original example
> 
>   case b in ["$var"]) ...
> 
> the "was quoted" is irrelevant, either way, as this is in a bracket
> expression, and so the \ is not added, and we end up with
> 
>   case b in [!a]) ...
> not
>   case b in [\!\a]
> 
> and even if we somehow interpret XBD 9.3.5 as allowing the latter to mean
> a literal ! and a literal a are in the class (which is beyond stretching the
> language, it is downright breaking it) it does not matter, as that is not
> what we get, we get the former.

Ouch.  Looks like we do need to revisit bug 985.  I also have a feeling
we will have to abandon the neat idea of defining shell pattern matching
in terms of fnmatch().  I can't see any way to modify that new 2.13 text
so that it describes the correct behaviour of quoted '!', '-', "[.", etc.
in a bracket expression.
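
For reference, the case in question can be probed directly. This is a
sketch only: bang_probe is a hypothetical helper, and no particular
answer is asserted here, since what the text should require is the
open question.

```sh
var='!a'

# bang_probe (hypothetical helper): report how this shell treats a
# quoted '!' at the start of a bracket expression in a case pattern.
bang_probe() {
    case b in
        ["$var"]) echo negation ;;  # 'b' matched: '!' acted as "not"
        *)        echo literal ;;   # no match: the set is '!' and 'a'
    esac
}

bang_probe
```

A shell that honours the quoting prints "literal"; one that ends up
matching against [!a] prints "negation".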

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Robert Elz
Date:Wed, 11 Apr 2018 14:14:18 +0200
From:Joerg Schilling 
Message-ID:  <5acdfc1a.st0sitei1fsdsgeb%joerg.schill...@fokus.fraunhofer.de>

  | Then we should change the wording.

I agree.

  | The characters '.', '*' and '[' really lose their special meaning inside a 
  | character class.

Yes.

  | The '\\' on the other side always allowed to escape the meaning of '-' and
  | the meaning of any other char, see the original code fragment from 1977:

In sh glob expressions, yes, but not in classes in RE's.   One of the issues is
that the standard is trying too hard for consistency, and so rather than
re-specify char classes for glob, it simply defers to char classes in REs,
and because of that, gets all of this wrong.

  | All modern implementations I am aware of do something similar with explicit
  | '\\' chars in the string.

Yes, I know - the question isn't what implementations do, or even should do,
but what the standard says they should do.  And how that is incorrect.

  | So the reason for the deviating behavior of ksh93 may be that it tries to
  | follow 9.3.5 that does not seem to be aligned with the Bourne Shell and
  | ksh88.

That very well may be.

kre




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Robert Elz
Date:Wed, 11 Apr 2018 12:18:27 +0100
From:Geoff Clare 
Message-ID:  <2018041827.GA29286@lt2.masqnet>

  | No, that text is very careful to say "*was* quoted", not "is quoted",
  | for precisely this reason.  To conform to this requirement, the shell
  | has to remember which characters were quoted when it removes the quotes.
  | How this is done is a matter for the implementor.

Yes, I saw that, but was quoted when?

Eg:

var='!a'
eval 'case b in *["$var"]*) echo match;;esac'

There in the case statement, everything "was" quoted once.

So that means we now are required to convert the case pattern to

\*\[\!\a\]\*

does it?

It really is not a good idea to try and craft minimal words that seem to
achieve the desired result - "was quoted" is just too vague.   Once again,
after quote removal, nothing is quoted.   When the code looked at the pattern,
nothing was quoted. Or it all was quoted.  Until you specify just what the
"was" refers to.

There is nothing in the text that actually requires the implementation to do
what you suggest, because there's nothing to tell it how far back in time
that "was quoted" really means.   It might seem obvious to you, but
obvious to you isn't the right solution.

Beyond that, to get back to the example in the original message, once we
get past this "was quoted" stuff, we still need to deal with the later words
in the same sentence:

and is not in a bracket expression is prefixed by a backslash

That is, in the (approximately) original example

case b in ["$var"]) ...

the "was quoted" is irrelevant, either way, as this is in a bracket expression,
and so the \ is not added, and we end up with

case b in [!a]) ...
not
case b in [\!\a]

and even if we somehow interpret XBD 9.3.5 as allowing the latter to mean
a literal ! and a literal a are in the class (which is beyond stretching the
language, it is downright breaking it) it does not matter, as that is not
what we get, we get the former.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Joerg Schilling
Robert Elz  wrote:

> Date:Wed, 11 Apr 2018 11:28:38 +0200
> From:Joerg Schilling 
> Message-ID:  
> <5acdd546.fkln7tigk21a+de6%joerg.schill...@fokus.fraunhofer.de>
>
>   | The problem is that the term "quote removal" is not related to a real
>   | verified shell implementation but rather explained by means of abstract
>   | wording that tries to avoid being too close to a real algorithm.
>
> Yes, I know that - and that's fine, provided what is specified actually works
> (so if someone were to implement it exactly as described, everything would
> work.)   I am certainly not expecting that real, useful, implementations would
> be done that way.

Sometimes, I believe that it would help to understand the text if there was a 
real world example algorithm, e.g. what I mentioned on how Bourne Shell and ksh 
do it.

>   | In special: your example ["] does not work as your text might mean.
>   |
>   |   echo ["]
>
> No, I know that - in my first message on this topic (I think) I made that
> clear - [ and ] are just word characters to the lexer, and mean nothing
> at all (no different than a (except they're not alpha) or _ or %).   Quotes
> need to be paired (must have a beginning and end), a better example
> at that level would be
>
>   echo [""]
>
> which should (in some obscure theory) match a file named " if
> one exists, and in that case print just "\n (double quote followed by
> newline)

Well, since the argument is first passed through the macro expansion that
removes "" and prefixes all internal characters (currently none) with a \, this
never matched a file named ".

>   | > case "$x" in '*') echo found an asterisk;; esac
>   | > case "$x" in \*) echo found an asterisk;; esac
>   |
>   | Both commands are 100% equivalent:
>
> They are as currently implemented, yes, but not as specified, either before
> or after 985.

Then we should change the wording.

> The history lesson of how the Bourne shell worked, and has been changed
> over time, is interesting to read, but in no way really relevant to anything.
>
> What we need to specify is how shells (in general) actually work - what the
> users can rely upon safely using in their scripts, and what they cannot.
>
>   | and since the '-' is quoted, this does not match, as the pattern is
>   | equivalent to: [a\-c] that just lists the three characters 'a', '-' and 'c'.
>
> Except that in char classes as defined in XBD 9.3.5 (which XCU 2.13 defers
> to, except for the change of ^ into ! for sh globs) does not treat \ as any 
> kind of quoting character:
>
>   The special characters '.', '*', '[', and '\\' (<period>, <asterisk>,
>   <left-square-bracket>, and <backslash>, respectively) shall lose
>   their special meaning within a bracket expression.

The characters '.', '*' and '[' really lose their special meaning inside a 
character class.

The '\\' on the other side always allowed to escape the meaning of '-' and the 
meaning of any other char, see the original code fragment from 1977:

SWITCH c = *p++ IN

case '[':
    {BOOL ok; INT lc;
    ok=0; lc=07;
    WHILE c = *p++
    DO  IF c==']'
        THEN  return(ok?gmatch(s,p):0);
        ELIF c==MINUS
        THEN  IF lc<=scc ANDF scc<=(*p++) THEN ok++ FI
        ELSE  IF scc==(lc=(c&STRIP)) THEN ok++ FI
        FI
    OD
    return(0);
    }

Check: "ELIF c==MINUS" here as the parser in original Bourne Shell converted
the string \- into "'-' + 0200" that does not match c==MINUS.

All modern implementations I am aware of do something similar with explicit '\\'
chars in the string.

So the reason for the deviating behavior of ksh93 may be that it tries to
follow 9.3.5 that does not seem to be aligned with the Bourne Shell and ksh88.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Robert Elz
Date:Wed, 11 Apr 2018 11:28:38 +0200
From:Joerg Schilling 
Message-ID:  <5acdd546.fkln7tigk21a+de6%joerg.schill...@fokus.fraunhofer.de>

  | The problem is that the term "quote removal" is not related to a real
  | verified shell implementation but rather explained by means of abstract
  | wording that tries to avoid being too close to a real algorithm.

Yes, I know that - and that's fine, provided what is specified actually works
(so if someone were to implement it exactly as described, everything would
work.)   I am certainly not expecting that real, useful, implementations would
be done that way.

  | In special: your example ["] does not work as your text might mean.
  |
  | echo ["]

No, I know that - in my first message on this topic (I think) I made that
clear - [ and ] are just word characters to the lexer, and mean nothing
at all (no different than a (except they're not alpha) or _ or %).   Quotes
need to be paired (must have a beginning and end), a better example
at that level would be

echo [""]

which should (in some obscure theory) match a file named " if
one exists, and in that case print just "\n (double quote followed by newline)
or match nothing, and then print [""]\n (the arg unchanged, followed by
newline) if no file called " exists.   (substitute "printf '%s\n'" for "echo" if
you prefer, just to avoid any "echo should do..." discussions.)

Note: I don't believe any shell actually implements things that way,
and I don't think it would be useful to make them - quoting at sh
script level is more useful than character class purity - it just
needs to be specified properly, and currently, we do not have that.

  | >   case "$x" in '*') echo found an asterisk;; esac
  | >   case "$x" in \*) echo found an asterisk;; esac
  |
  | Both commands are 100% equivalent:

They are as currently implemented, yes, but not as specified, either before
or after 985.

The history lesson of how the Bourne shell worked, and has been changed
over time, is interesting to read, but in no way really relevant to anything.

What we need to specify is how shells (in general) actually work - what the
users can rely upon safely using in their scripts, and what they cannot.

  | and since the '-' is quoted, this does not match, as the pattern is
  | equivalent to: [a\-c] that just lists the three characters 'a', '-' and 'c'.

Except that char classes as defined in XBD 9.3.5 (which XCU 2.13 defers
to, except for the change of ^ into ! for sh globs) do not treat \ as any
kind of quoting character:

The special characters '.', '*', '[', and '\\' (<period>, <asterisk>,
<left-square-bracket>, and <backslash>, respectively) shall lose
their special meaning within a bracket expression.

Nothing in XCU 2.13 contradicts that or says it does not apply.   Hence,
according to the standard, that class

[a\-c]

should match any one of the characters \ ] ^ _ ` a b c
(that is, an a or anything between \ and c, which, in ASCII
anyway, is that set of chars - 'a' matches both literally and
as a character in the range, but that is OK, just as [ba-c] is OK).
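
For illustration only (this is not part of the original message): the range arithmetic can be checked with a small sh sketch. It assumes an ASCII-compatible character set and a printf that understands %o and octal escapes; the function name range_chars is made up for this example.

```shell
# Hypothetical helper: print every character from '\' (decimal 92)
# through 'c' (decimal 99), assuming ASCII code points.
range_chars() {
    i=92
    while [ "$i" -le 99 ]; do
        # Convert the code to octal, then let printf emit that byte.
        printf "\\$(printf '%o' "$i")"
        i=$((i + 1))
    done
    printf '\n'
}

range_chars
```

Under ASCII this prints the eight characters listed above on one line.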

Again, I know that's not how shells work, which is why it is under
discussion here, the text needs to be fixed to specify what the shells
actually do - properly.

kre




Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Geoff Clare
Robert Elz  wrote, on 11 Apr 2018:
>
> Incidentally, I know that this part of the 985 new text ...
> 
>   the first argument (pattern) is the same as patt, except each character
>   that was quoted in patt and is not in a bracket expression is prefixed
>   by a backslash
> 
> is intended to handle this problem, except it cannot - once we have done quote
> removal, what "was quoted" is lost, either we have the quotes, and know what
> is quoted, or we don't, and don't.

No, that text is very careful to say "*was* quoted", not "is quoted",
for precisely this reason.  To conform to this requirement, the shell
has to remember which characters were quoted when it removes the quotes.
How this is done is a matter for the implementor.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Joerg Schilling
Geoff Clare  wrote:

> Here's a much simpler demonstration of the same "quoting within
> brackets" issue:
>
> $ ls
> b
>
> ksh93:
> $ echo ["a-c"]
> b
>
> ksh88 and bash:
> $ echo ["a-c"]
> [a-c]
>
> As Joerg pointed out, the intention would have been for POSIX to
> specify the ksh88 behaviour, so this should be considered to be bug
> in ksh93.

Thank you for this nice example, as it helps to verify a behavior that I
believed was impossible to verify.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Joerg Schilling
Geoff Clare  wrote:

> Robert Elz  wrote, on 11 Apr 2018:
> >
> > Lower down, it says ...
> > 
> > In order from the beginning to the end of the case statement, each
> > pattern that labels a compound-list shall be subjected to tilde expansion,
> > parameter expansion, command substitution, and arithmetic expansion, and
> > the result
> > [note: no quote removal]
> > of these expansions shall be compared against the expansion of word,
>
> The missing quote removal here is a known defect in the standard.
> See http://austingroupbugs.net/view.php?id=985
>
> > Not doing quote removal on patterns is correct.
>
> No it isn't.  As bug 985 notes:
>
> $ case 'foo  bar' in "foo  bar") echo "quotes removed";; esac
> quotes removed

In the Bourne Shell, this matches the C-string

"foo  bar"

against the pattern

\f\o\o\ \ \b\a\r

since the "case pattern" is subject to macro expansion that expands '"' quoted
strings to strings with quoted characters.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Robert Elz
Date:Wed, 11 Apr 2018 10:00:27 +0100
From:Geoff Clare 
Message-ID:  <20180411090027.GA18582@lt2.masqnet>

  | There is nothing to suggest that this does not apply to the characters
  | which, when unquoted, have a special meaning within bracket expressions
  | ('!', '-', "[.", etc.)

In file name patterns that might be correct (because file name expansion
happens before quote removal), but if bug 985 is correct, then in case
patterns, the quoting would already be removed before the pattern
was examined, so given

var=!a
case b in
["$var"]) whatever;;
esac

we expand (etc) the word first (nothing to do there) then each pattern (there 
is just one) in turn, first parameter expansion (etc), producing

["!a"]

then quote removal

[!a]

and then we match ('b' is not 'a').   That the quotes used to be there is now no
longer apparent.

I suspect that the text in 985 needs to be revised to allow for this, or there
is no question but that the ksh93 interpretation is correct, and every other
shell is wrong.

In general, quoting in patterns has only ever been possible using \ and in
character classes, no quoting at all ([\]] is traditionally a class containing a
backslash, followed by a literal ']', not a class containing a ']').

Since order in a class is irrelevant, ordering of the elements has been
used to allow any character to appear in the class without needing a
quoting mechanism.
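
As an illustration of the ordering trick (added here as a sketch, not taken from the original message): ']' is a literal when it comes first in the class, and '-' is a literal when it comes last, so neither needs quoting. The function name in_class is hypothetical.

```shell
# Hypothetical helper: succeed if $1 is one of ']', 'a' or '-'.
# ']' is placed first and '-' last, so neither needs quoting.
in_class() {
    case $1 in
    []a-]) return 0 ;;
    *)     return 1 ;;
    esac
}

in_class ']' && echo "']' is in the class"
in_class '-' && echo "'-' is in the class"
in_class 'b' || echo "'b' is not in the class"
```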

Shells have largely not been that strict, largely because (at least for the
older shells, I don't know how more modern ones do it) the posix requirement
that the quotes in quoted words be left intact in the result from the lexer
has largely been ignored, and quoting has been indicated in other ways,
which make it easier, and faster, to tell exactly what is quoted and what is
not every time later the shell needs to know (the lexer does the scanning
once, and after that nothing ever needs to count beginning and ending
quote chars, etc).   A side effect of that is that (with quote removal not
being done - and this is why I assume the standard did not originally
specify it for case patterns) everything just works the way it is expected
(a quoted a and an unquoted a still match, but a quoted ! is not the
"not in class" character, only an unquoted ! can be that).

I suspect ksh93 has "fixed" all of this, and implements more what the
standard actually says.

We need to be much more precise about matching, and everything related
to it than we currently are, and 985 doesn't help, it makes things worse
(though I fully understand, and agree with, the motivation for that defect 
report.)

Incidentally, I know that this part of the 985 new text ...

the first argument (pattern) is the same as patt, except each character
that was quoted in patt and is not in a bracket expression is prefixed 
by a backslash

is intended to handle this problem, except it cannot - once we have done quote
removal, what "was quoted" is lost, either we have the quotes, and know what is
quoted, or we don't, and don't.   The only way to fix this is to remove quote
removal from case patterns, and instead specify more precisely how a
(possibly quoted) string is turned into a fnmatch pattern.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Joerg Schilling
Martijn Dekker  wrote:

> Op 10-04-18 om 15:59 schreef Joerg Schilling:
> > Whom do you call "current ksh93 lead developers"?
>
> As far as I can tell from what's going on at the github repo, Siteshwar 
> Vashisht and Kurtis Rader currently appear to be in charge of its 
> development.

I am still hoping that David will soon be the "leader" again. He
understands the internals of ksh and he is one of the guys at AT&T that made 
important decisions on many interfaces.

BTW: I started to become the Bourne Shell maintainer in 2006, but it took 7 
years for me to become able to make enhancements that need an in-depth 
understanding of the data flow in the shell. ... even though I maintain my own 
other shell since 1984.

Do not expect newcomers to be the right decision now and be careful about the 
changes they introduce. They did e.g. remove code just because they don't 
understand it...

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Joerg Schilling
Robert Elz  wrote:

> Date:Tue, 10 Apr 2018 13:41:25 +0100
> From:Martijn Dekker 
> Message-ID:  
>
>   | Does POSIX specify anything, either way, regarding the effect of shell 
>   | quoting within glob bracket patterns?
>
> I would say it is unclear - in general, quoting inside [] does not work
> (XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
> in the latter, quote characters are just characters ["] is a char class 
> containing just a double quote character.

The problem is that the term "quote removal" is not related to a real verified 
shell implementation but rather explained by means of abstract wording that 
tries to avoid being too close to a real algorithm.

In special: your example ["] does not work as your text might mean.

echo ["]

results in a secondary prompt with all roughly POSIX-like shells I am aware of, 
including the historic Bourne Shell.

...

> That said, in practice, shells implement, and people expect, that "" and ''
> quoting works in case patterns, at least in expressions like
>
>   case "$x" in '*') echo found an asterisk;; esac
>
> even though this seems to be against the literal interpretation of 2.13.1
> which would require
>
>   case "$x" in \*) echo found an asterisk;; esac

Both commands are 100% equivalent:

The historical Bourne Shell did convert 'a' and \a into a 'a' with the top bit 
set in the parser and kept '"'s in the argument strings.


In the late 1980's Bourne Shell and ksh88 were modified to convert 'a' and
\a into a \a and a string like 'abc' into \a\b\c in the parser and keep '"'s in
the argument strings.

During macro expansion, the historic Bourne Shell did convert "abc" strings into
the string abc with the top bit set on all characters and modern Bourne Shells 
and ksh88 started to convert "abc" during macro expansion into \a\b\c, so this 
prevents glob expansion for the related characters.

The code fragment:

var='a-c'

case b in ["$var"]) ...

is thus equivalent to:

case b in [\a\-\c]) ...

and since the '-' is quoted, this does not match, as the pattern is equivalent
to: [a\-c] that just lists the three characters 'a', '-' and 'c'.
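
A rough sketch of the conversion described above, added for illustration only: it is not the actual Bourne Shell or ksh88 code, it only shows the textual effect of prefixing every character with a backslash. The helper name escape_all is hypothetical, and a POSIX sed is assumed.

```shell
# Hypothetical helper: prefix every character of $1 with a backslash,
# mimicking the internal form described above (not actual shell source).
escape_all() {
    printf '%s\n' "$1" | sed 's/./\\&/g'
}

escape_all 'a-c'        # \a\-\c
escape_all 'foo  bar'   # \f\o\o\ \ \b\a\r
```

Since every character, including '-', ends up escaped, nothing inside the brackets keeps a special meaning.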



Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Joerg Schilling
Martijn Dekker  wrote:

> Op 10-04-18 om 21:06 schreef Robert Elz:
> >  Date:Tue, 10 Apr 2018 13:41:25 +0100
> >  From:Martijn Dekker 
> >  Message-ID:  
> > 
> >| Does POSIX specify anything, either way, regarding the effect of shell
> >| quoting within glob bracket patterns?
> > 
> > I would say it is unclear - in general, quoting inside [] does not work
> > (XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
> > in the latter, quote characters are just characters ["] is a char class
> > containing just a double quote character.
>
> However:
>
> $ ksh93 -c 'case \" in ["a-z"]) echo match;; *) echo no match;; esac'
> no match

See my mail from just 5 minutes ago: the '"' is handled by the parser already 
and thus ["] will cause a secondary prompt.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Geoff Clare
I wrote:
>
> And I believe the standard does clearly require the ksh88/bash
> behaviour because of this statement in 2.2.3 Double-Quotes:
> 
> Enclosing characters in double-quotes ("") shall preserve the
> literal value of all characters within the double-quotes, with the
> exception of the characters backquote, <dollar-sign>, and
> <backslash>
> 
> There is nothing to suggest that this does not apply to the characters
> which, when unquoted, have a special meaning within bracket expressions
> ('!', '-', "[.", etc.)

Furthermore, there is clear evidence from 2.13.1 that double quotes do
affect special characters within bracket expressions:

A bracket expression starting with an unquoted <circumflex> character
produces unspecified results.

Note the use of "unquoted".

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Geoff Clare
Martijn Dekker  wrote, on 10 Apr 2018:
>
> Re: https://github.com/att/ast/issues/71
> 
> Consider this test script:
> 
> (set -o posix) 2>/dev/null && set -o posix
> emulate sh 2>/dev/null  # for zsh
> for var in 'a-c' '!a'; do
>   case b in
>   ( ["$var"] )  echo 'quirk' ;;
>   ( [$var] )    echo 'no quirk' ;;
>   esac
> done

Here's a much simpler demonstration of the same "quoting within
brackets" issue:

$ ls
b

ksh93:
$ echo ["a-c"]
b

ksh88 and bash:
$ echo ["a-c"]
[a-c]

As Joerg pointed out, the intention would have been for POSIX to
specify the ksh88 behaviour, so this should be considered to be a bug
in ksh93.
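
For anyone wanting to repeat this on their own shell, a self-contained sketch follows (added for illustration). It assumes mktemp -d is available, which POSIX does not guarantee, and the function name glob_demo is hypothetical. The function body runs in a subshell so the cd does not affect the caller.

```shell
# Hypothetical helper: run both globs in a scratch directory that
# contains a single file named "b".
glob_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    : > b
    echo ["a-c"]   # quoted '-': "[a-c]" per ksh88/bash, "b" per ksh93
    echo [a-c]     # unquoted range: "b" in either case
    cd / && rm -rf "$dir"
)

glob_demo
```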

And I believe the standard does clearly require the ksh88/bash
behaviour because of this statement in 2.2.3 Double-Quotes:

Enclosing characters in double-quotes ("") shall preserve the
literal value of all characters within the double-quotes, with the
exception of the characters backquote, <dollar-sign>, and
<backslash>

There is nothing to suggest that this does not apply to the characters
which, when unquoted, have a special meaning within bracket expressions
('!', '-', "[.", etc.)

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-11 Thread Geoff Clare
Robert Elz  wrote, on 11 Apr 2018:
>
> Lower down, it says ...
> 
>   In order from the beginning to the end of the case statement, each
>   pattern that labels a compound-list shall be subjected to tilde expansion,
>   parameter expansion, command substitution, and arithmetic expansion, and
>   the result
> [note: no quote removal]
>   of these expansions shall be compared against the expansion of word,

The missing quote removal here is a known defect in the standard.
See http://austingroupbugs.net/view.php?id=985

> Not doing quote removal on patterns is correct.

No it isn't.  As bug 985 notes:

$ case 'foo  bar' in "foo  bar") echo "quotes removed";; esac
quotes removed

If quote removal were not performed on the patterns, this would not match.
You would see:

$ case '"foo  bar"' in "foo  bar") echo "quotes not removed";; esac
quotes not removed

instead.
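
The two cases can be put side by side in a short sketch (added for illustration; the function name matches_foo_bar is hypothetical):

```shell
# Hypothetical helper: report whether the quoted pattern "foo  bar"
# matches the string passed as $1.
matches_foo_bar() {
    case $1 in
    "foo  bar") echo "match" ;;
    *)          echo "no match" ;;
    esac
}

matches_foo_bar 'foo  bar'     # string without quote characters
matches_foo_bar '"foo  bar"'   # string containing literal double quotes
```

A shell that removes the quotes from the pattern reports "match" for the first call and "no match" for the second.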

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Martijn Dekker

Op 10-04-18 om 22:50 schreef Jilles Tjoelker:

> I prefer "no quirk" twice as output but it is indeed not fully
> specified.


I agree with your preference.

Ignoring shell quoting in glob bracket patterns means removing a useful 
feature: the ability to pass an arbitrary string of characters in a 
parameter, one of which is to be matched.


OTOH, honouring shell quoting in glob bracket patterns does not remove 
any functionality, as you can simply not quote the expansion (which is 
safe, as it is not subject to split or glob in that context).


So if this is indeed not specified, I think the standard ought to be 
amended to specify the current majority behaviour (everything but ksh93).
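
To make the use case concrete, here is a small sketch (added for illustration) that relies on the majority behaviour, i.e. on quoting being honoured inside the brackets. The function name contains_any is hypothetical, and sets that are empty or contain ']' are not handled.

```shell
# Hypothetical helper: succeed if string $1 contains at least one of
# the characters listed in $2, relying on quoting being honoured.
contains_any() {
    case $1 in
    *["$2"]*) return 0 ;;
    *)        return 1 ;;
    esac
}

contains_any 'x-y'  'a-c' && echo "'x-y' contains one of: a - c"
contains_any 'b'    'a-c' || echo "'b' does not (no range was formed)"
contains_any 'wow!' '!a'  && echo "'wow!' contains one of: ! a"
```

On ksh93, as described in this thread, the quoted '-' and '!' would keep their special meaning and the results would differ.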


- Martijn



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Jilles Tjoelker
On Tue, Apr 10, 2018 at 01:41:25PM +0100, Martijn Dekker wrote:
> Re: https://github.com/att/ast/issues/71

> Consider this test script:

> (set -o posix) 2>/dev/null && set -o posix
> emulate sh 2>/dev/null  # for zsh
> for var in 'a-c' '!a'; do
>   case b in
>   ( ["$var"] )  echo 'quirk' ;;
>   ( [$var] )    echo 'no quirk' ;;
>   esac
> done

> Most shells output 'no quirk' for both values of 'var', but AT&T ksh93 
> outputs 'quirk' for both, as does zsh 5.2 and earlier (zsh-as-sh changed 
> to match the majority in 5.3). Now one of the current ksh93 lead 
> developers says this does not look like a bug.

> Does POSIX specify anything, either way, regarding the effect of shell 
> quoting within glob bracket patterns? I can't find any relevant text 
> under "2.13 Pattern Matching Notation" or anything it references, so 
> clarification would be appreciated.

The first paragraph of 2.13.1 Patterns Matching a Single Character
contains some confusing or contradictory text about backslashes; this
text was amended for http://austingroupbugs.net/view.php?id=806 but was
confusing or contradictory even before that change. The change was made
for fnmatch() and perhaps the part about backslashes in the first
paragraph was actually meant to be handled in the last paragraph in the
part that explicitly says it is only about contexts such as fnmatch()
where shell quote removal is not performed.

The rest of 2.13.1 discusses "quoting" of characters in various
locations. I think it is reasonable to assume that shell quoting is
meant. Only the effect of quoting '!', '-' and ']' in a bracket
expression is not specified (but the effect of quoting '^' is: it makes
the '^' a literal part of the set).

I prefer "no quirk" twice as output but it is indeed not fully
specified.

-- 
Jilles Tjoelker



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Martijn Dekker

Op 10-04-18 om 15:59 schreef Joerg Schilling:

> Whom do you call "current ksh93 lead developers"?


As far as I can tell from what's going on at the github repo, Siteshwar 
Vashisht and Kurtis Rader currently appear to be in charge of its 
development.


- M.



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Martijn Dekker

Op 10-04-18 om 21:52 schreef Robert Elz:

> No, it doesn't.
>
> Read that again, with the emphasis I am adding ...
>
>   | http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
>   | | The conditional construct case shall execute the /compound-list/
>   | | corresponding to the first one of several /patterns/ (see Pattern
>   | | Matching Notation) that is matched by the string resulting from the
>   | | tilde expansion, parameter expansion, command substitution,
>   | | arithmetic expansion, and quote removal **of the given word**.
>
> That part is talking about the "case $x in" $x is the "given word",
> that is certainly subject to quote removal.

Quite right, I stand corrected.

Thanks,

- M.



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Robert Elz
Date:Tue, 10 Apr 2018 21:28:01 +0100
From:Martijn Dekker 
Message-ID:  <6e79f3b1-732e-a7d4-1d07-a04d7a9cf...@inlv.org>

  | But it is. POSIX explicitly specifies quote removal for 'case' patterns:

No, it doesn't.

Read that again, with the emphasis I am adding ...

  | http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
  | | The conditional construct case shall execute the /compound-list/
  | | corresponding to the first one of several /patterns/ (see Pattern
  | | Matching Notation) that is matched by the string resulting from the
  | | tilde expansion, parameter expansion, command substitution,
  | | arithmetic expansion, and quote removal **of the given word**.

That part is talking about the "case $x in": $x is the "given word", which is
certainly subject to quote removal.

Lower down, it says ...

In order from the beginning to the end of the case statement, each
pattern that labels a compound-list shall be subjected to tilde expansion,
parameter expansion, command substitution, and arithmetic expansion, and
the result
[note: no quote removal]
of these expansions shall be compared against the expansion of word,
[from the section you quoted]
according to the rules described in Section 2.13

If quote removal were done on patterns, then to match a literal asterisk
we would need something like

case "$x" in \\*) ...

as the quote removal would leave \* which would then be a quoted asterisk.

Similarly, '*' would be interpreted as just * (the quotes being removed) and
so "match anything" which is  also not what anyone does, or wants.

Not doing quote removal on patterns is correct.

  | I hope you won't change it to ksh93's counterintuitive behaviour. Your 
  | current behaviour is certainly consistent with POSIX (as well as every 
  | other current shell except ksh93).

I have no current plan to change that, this is an area where I believe the
standard needs some work first.   After that, if what the standard says is
different from what we implement, and is also reasonable (and unlikely to
break too much) then I might make changes.

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Chet Ramey
On 4/10/18 4:28 PM, Martijn Dekker wrote:

>> [this includes case patterns as quote removal is not performed on them]
> 
> But it is. POSIX explicitly specifies quote removal for 'case' patterns:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
> 
> | The conditional construct case shall execute the /compound-list/
> | corresponding to the first one of several /patterns/ (see Pattern
> | Matching Notation) that is matched by the string resulting from the
> | tilde expansion, parameter expansion, command substitution,
> | arithmetic expansion, and quote removal of the given word.

That text is describing the `word', not the patterns kre is talking about.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Martijn Dekker

Op 10-04-18 om 21:06 schreef Robert Elz:

>  Date:Tue, 10 Apr 2018 13:41:25 +0100
>  From:Martijn Dekker 
>  Message-ID:  
>
>    | Does POSIX specify anything, either way, regarding the effect of shell
>    | quoting within glob bracket patterns?
>
> I would say it is unclear - in general, quoting inside [] does not work
> (XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
> in the latter, quote characters are just characters ["] is a char class
> containing just a double quote character.


However:

$ ksh93 -c 'case \" in ["a-z"]) echo match;; *) echo no match;; esac'
no match

The quotes are not considered part of the bracket expression, but 
removed by the shell, even on ksh93.



> Also, 2.13.1 does say:
>
> When pattern matching is used where shell quote removal is not performed
> [...]
>
> [this includes case patterns as quote removal is not performed on them]


But it is. POSIX explicitly specifies quote removal for 'case' patterns:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
| The conditional construct case shall execute the /compound-list/
| corresponding to the first one of several /patterns/ (see Pattern
| Matching Notation) that is matched by the string resulting from the
| tilde expansion, parameter expansion, command substitution,
| arithmetic expansion, and quote removal of the given word.




> The NetBSD sh however produces "no quirk" for both -


I hope you won't change it to ksh93's counterintuitive behaviour. Your 
current behaviour is certainly consistent with POSIX (as well as every 
other current shell except ksh93).


- M.



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Robert Elz
    Date:        Tue, 10 Apr 2018 13:41:25 +0100
    From:        Martijn Dekker
    Message-ID:

  | Does POSIX specify anything, either way, regarding the effect of shell 
  | quoting within glob bracket patterns?

I would say it is unclear - in general, quoting inside [] does not work
(XCU 2.13 char classes are derived from XBD 9.3.5 char classes, and
in the latter, quote characters are just characters: ["] is a char class
containing just a double quote character).

Also, 2.13.1 does say:

When pattern matching is used where shell quote removal is not performed
[...]

[this includes case patterns as quote removal is not performed on them]

special characters can be escaped to remove their special meaning by
preceding them with a <backslash> character.
[...]

and that is the only quoting method provided.   That would suggest that

["$var"]

is first parameter expanded ($var becomes !a in one of the cases) resulting in

["!a"]

which is a character class that matches a double quote, an exclamation mark,
or an 'a' (including a character twice is harmless, though the terminating "
is needed here (somewhere) so the lexer can recognise the pattern word
properly).
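
A minimal sketch of that literal reading, for illustration only. It assumes
a shell in which quote characters produced by an unquoted parameter expansion
are not subject to quote removal, so they reach the pattern matcher as
ordinary characters. The names 'pat' and 'literal_class' are hypothetical.

```shell
# Sketch only: 'pat' and 'literal_class' are hypothetical names.
# The double quotes are part of the variable's value, so the matcher
# sees the bracket expression ["!a"] with the quotes still in it.
pat='["!a"]'

literal_class() {
    # $pat is deliberately left unquoted so that it is used as a pattern.
    case $1 in
        $pat) echo "match" ;;
        *)    echo "no match" ;;
    esac
}

literal_class '"'   # a double quote is a member of the class
literal_class '!'   # '!' is not the first character, so it is an ordinary member
literal_class a     # 'a' is a member of the class
literal_class b     # 'b' is not in the class
```

This shows only what the matcher does once the quote characters are treated
as data; it says nothing about what a shell should do with quotes written
directly in the pattern word.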

That said, in practice, shells implement, and people expect, that "" and ''
quoting works in case patterns, at least in expressions like

case "$x" in '*') echo found an asterisk;; esac

even though this seems to be against the literal interpretation of 2.13.1 which
would require

case "$x" in \*) echo found an asterisk;; esac

to achieve this effect - with the earlier one matching a string that starts and
ends with single quote chars, and has anything between them.

Regardless of the POSIX wording, I think this part is set in stone (that both of
the above match a literal asterisk) and should be clarified.
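
As a quick way to check the two spellings side by side, here is a small
sketch. The function names 'quoted_star' and 'escaped_star' are hypothetical,
and the third test string is there only to probe the literal reading of
2.13.1 (a string that starts and ends with single quotes).

```shell
# Sketch only: hypothetical helper names, one per pattern spelling.
quoted_star() {
    case "$1" in
        '*') echo "match" ;;
        *)   echo "no match" ;;
    esac
}

escaped_star() {
    case "$1" in
        \*) echo "match" ;;
        *)  echo "no match" ;;
    esac
}

# Try a literal asterisk, an ordinary word, and a string wrapped in
# single quotes, and report what each spelling of the pattern does.
for s in '*' abc "'x'"; do
    printf '%s: quoted=%s escaped=%s\n' \
        "$s" "$(quoted_star "$s")" "$(escaped_star "$s")"
done
```

In a shell that follows established practice, both functions should agree on
every input, with only the literal asterisk reported as a match.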

The effect of quotes inside [] though is much less clear.

Joerg: I suspect that the original Bourne sh behaviour is probably just an
artifact of the (crude) way that quoting was parsed in the lexer, which is
in no way POSIX (nor useful, nor implemented any more).   That would have
changed the '!' and '-' into things that were not those characters, 
unconditionally - hence they don't perform as they would if they appeared
unquoted.   That is, I do not believe that provides any useful help.

My interpretation from the standard of the correct expected result is
"quirk" for a-c (as ["a-c"] is a class containing a double quote, and all
chars from a to c, which includes b) but "no quirk" for !a, as 'b' is none
of a double quote, an exclamation point, or an 'a'.

The NetBSD sh however produces "no quirk" for both - again partly because
of the quirky way that it implements quoting in the lexer (different than the
original Bourne sh, but still not the same as POSIX expects).
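
For comparing shells, the test from the original report can be wrapped in a
function. This is only a sketch: 'quirk_test' is a hypothetical name, the
"neither" arm is an addition for completeness, and the answer printed
depends on which shell runs it.

```shell
# Sketch only: 'quirk_test' is a hypothetical wrapper around the test
# script from the original report. It reports which arm matches 'b'.
quirk_test() {
    var=$1
    case b in
        ( ["$var"] ) echo "quirk" ;;      # quotes had no effect inside []
        ( [$var] )   echo "no quirk" ;;   # quotes made '-' and '!' literal
        ( * )        echo "neither" ;;
    esac
}

# The two values used in the report: a range and a negation.
for v in 'a-c' '!a'; do
    printf '%s: %s\n' "$v" "$(quirk_test "$v")"
done
```

Running the same file under each shell of interest gives a direct comparison
of how quoted '-' and '!' are treated inside a bracket expression.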

kre



Re: Should shell quoting within glob bracket patterns be effective?

2018-04-10 Thread Joerg Schilling
Martijn Dekker  wrote:

> Re: https://github.com/att/ast/issues/71
>
> Consider this test script:
>
> (set -o posix) 2>/dev/null && set -o posix
> emulate sh 2>/dev/null  # for zsh
> for var in 'a-c' '!a'; do
>   case b in
>   ( ["$var"] )  echo 'quirk' ;;
>   ( [$var] )    echo 'no quirk' ;;
>   esac
> done
>
> Most shells output 'no quirk' for both values of 'var', but AT&T ksh93 
> outputs 'quirk' for both, as does zsh 5.2 and earlier (zsh-as-sh changed 
> to match the majority in 5.3). Now one of the current ksh93 lead 
> developers says this does not look like a bug.

Whom do you call "current ksh93 lead developers"?

> Does POSIX specify anything, either way, regarding the effect of shell 
> quoting within glob bracket patterns? I can't find any relevant text 
> under "2.13 Pattern Matching Notation" or anything it references, so 
> clarification would be appreciated.

Given that ksh88 and the original Bourne Shell both return 'no quirk' for both 
values, this is a strong hint that ksh93 is wrong.


Given that "bosh" returned "quirk" for the first one before I fixed a bug in 
the gmatch() implementation, it is highly probable that ksh93 has a bug in its 
pattern matcher.

Jörg

-- 
 EMail: jo...@schily.net (home) Jörg Schilling D-13353 Berlin
        joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:   http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/