Re: [1003.1(2016)/Issue7+TC2 0001295]: Left brackets in shell patterns may cause other pattern matching characters to be taken literally in all contexts

2019-12-14 Thread Harald van Dijk

On 14/12/2019 11:29, Robert Elz wrote:

 Date: Fri, 13 Dec 2019 08:57:28 +
 From: Harald van Dijk
 Message-ID:  <9c1e4a98-e557-e5d1-27bc-05e7bbfe5...@gigawatt.nl>

   | Pretending such scripts are portable does a disservice to users.

Since the shells you said are "different" are pdksh, posh, bosh, and
old versions of dash, I'm not sure that is a huge problem.   [...]


This was not an exhaustive list, nor was it intended to be.


[...]
   | If you object to that decision, the
   | right course of action is not to prevent fixing the specification to
   | match that, it is to get that decision reversed and get the exception in
   | 2.13.3 removed. This inconsistency in the standard is not useful for any
   | shell or user.

For users, no, for shells, it can make life easier in the globbing case
(depending how the directory splitting gets done) - but I certainly agree
that it should be made consistent, and that in all cases [* should match
any string (filename) starting with a '['.   That's the way bracket expressions
are specified: they begin with '[' and end with ']' - absent either of those
no bracket expression exists, and in that case, the '[' is just a char,
as is a bare ']'.  I don't think anyone expects that "*]" (quotes just for
the e-mail) should ever match anything other than a string ending in ']'.


Then can you please open a bug asking for exactly that change to be made 
to the standard?


If it is accepted, at least there will be consistency. If it is 
rejected, if it is not considered reasonable to expect the shells that 
would require changes for this to implement those changes, then the bug 
I had opened should be reconsidered.



   | This is an inappropriate insult. Please take it back.

It wasn't intended as an insult, and wasn't aimed at anyone in particular.


You wrote "too lazy to do things correctly", which is not a criticism of 
shells, but of the motivations of their authors. Before accusing those 
authors, perhaps you could first ask for their motivations. I will 
describe my motivation below.



It was more an objective to keep in mind when deciding what the standard
should say, and (whoever says it) "we don't do that, and implementing it
would be hard" should never be a reason to allow whatever it is as acceptable
behaviour if otherwise it would not be.


"Implementing it would be hard" is not a good reason to keep behaviour 
unspecified, or to not change shell behaviour to comply with POSIX.


"Implementing it would be unreasonable", however, can be a good reason 
not to change shell behaviour to comply with POSIX depending on why the 
author considers it unreasonable, and regardless of whether those 
authors are correct, if it turns out a significant number of shells 
refuse to implement what POSIX specifies, that by itself should also be 
a reason for POSIX to change its specification to allow what the shells 
actually do.


My concern is that parsing shell patterns should have linear complexity. 
This special handling of '[', where it can be either a literal character 
or a metacharacter depending on whether a matching ']' appears later in 
the pattern, turns the complexity quadratic in all shells I have 
checked, including yours and mine, that implement that requirement, 
while it can trivially be linear in the shells that do not implement 
that. That is a very high cost for a feature that in practice portable 
shell scripts already cannot rely on. I am attempting to figure out a 
way to reduce the complexity while continuing to treat metacharacters 
after an unmatched '[' as metacharacters, but I am not sure yet whether 
it is possible at all, let alone how to implement it.
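
To make the complexity concern concrete, here is a minimal Python sketch of 
the naive approach: at every '[', scan forward for a closing ']' and fall 
back to treating the '[' as a literal if none is found. The function name 
count_scan_steps is hypothetical, and the bracket rules are deliberately 
simplified (no character classes, collating symbols, or quoting), so this is 
an illustration of the cost pattern only, not the code of any actual shell.

```python
def count_scan_steps(pattern):
    """Count characters examined while searching for closing brackets.

    Simplified model: at each '[', scan forward for ']'. If none exists,
    the '[' is an ordinary character and parsing resumes one position on.
    """
    steps = 0
    i = 0
    n = len(pattern)
    while i < n:
        if pattern[i] == "[":
            j = i + 1
            # A '!' negation marker and a ']' directly after the opening
            # bracket cannot close the expression, so skip over them.
            if j < n and pattern[j] == "!":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                steps += 1
                j += 1
            if j < n:
                i = j + 1  # bracket expression found; continue after it
                continue
            # No closing ']': fall through and treat '[' as a literal.
        i += 1
    return steps


for size in (100, 200, 400):
    print(size, count_scan_steps("[" * size), count_scan_steps("[a]" * size))
```

In this model a pattern made only of unmatched '[' characters is rescanned 
to the end from every position, so doubling its length roughly quadruples 
the work, while a pattern of well-formed bracket expressions is scanned 
once.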


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001295]: Left brackets in shell patterns may cause other pattern matching characters to be taken literally in all contexts

2019-12-14 Thread Robert Elz
Date: Fri, 13 Dec 2019 08:57:28 +
From: Harald van Dijk
Message-ID:  <9c1e4a98-e557-e5d1-27bc-05e7bbfe5...@gigawatt.nl>

  | Pretending such scripts are portable does a disservice to users.

Since the shells you said are "different" are pdksh, posh, bosh, and
old versions of dash, I'm not sure that is a huge problem.   We don't
change the standard to unspecified because of a bug, or difference, in
one or two shells (if we did, "cd -L" would vanish (or become unspecified),
as the NetBSD shell neither does, nor ever will, support that).

There are times when we simply have to say that some shells are not
conformant - non-conformant shells aren't expected to necessarily be
conformant in all respects (obviously) and scripts running using them
need to be aware of their differences.   Attempting to make everything
conforming by dumbing down the standard to useless is not a productive
endeavour.

  | The decision to allow both behaviours was taken long ago and I 
  | see the fact that it was only specified in one specific context of 
  | pattern matching as an oversight.

I would agree with that, it should be consistent (though the place
where it applies most is in globbing, because of stuff like
    xyx[abc/def]pqr
so it is understandable that that case received more attention).
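
As a rough illustration of why the globbing case stands out, the Python
sketch below assumes a shell that splits the pattern on '/' before looking
at brackets (one possible way of doing the directory splitting, not
necessarily how any given shell works). The helper names
split_glob_components and has_bracket_expression are hypothetical, and the
bracket check is simplified: it ignores '!' negation, character classes and
quoting.

```python
def split_glob_components(pattern):
    """Split a pathname pattern into per-directory components."""
    return pattern.split("/")


def has_bracket_expression(component):
    """Report whether some '[' in the component has a later closing ']'.

    A ']' immediately after '[' is taken as a literal member, so the
    search for the closing bracket starts two positions after the '['.
    """
    start = component.find("[")
    while start != -1:
        if component.find("]", start + 2) != -1:
            return True
        start = component.find("[", start + 1)
    return False


for part in split_glob_components("xyx[abc/def]pqr"):
    print(repr(part), has_bracket_expression(part))
```

After the split, neither component contains both a '[' and a following ']',
so under this model no bracket expression survives in either one.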

  | If you object to that decision, the 
  | right course of action is not to prevent fixing the specification to 
  | match that, it is to get that decision reversed and get the exception in 
  | 2.13.3 removed. This inconsistency in the standard is not useful for any 
  | shell or user.

For users, no, for shells, it can make life easier in the globbing case
(depending how the directory splitting gets done) - but I certainly agree
that it should be made consistent, and that in all cases [* should match
any string (filename) starting with a '['.   That's the way bracket expressions
are specified: they begin with '[' and end with ']' - absent either of those
no bracket expression exists, and in that case, the '[' is just a char,
as is a bare ']'.  I don't think anyone expects that "*]" (quotes just for
the e-mail) should ever match anything other than a string ending in ']'.
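
For a quick way to see that rule in action outside any shell, Python's
standard fnmatch module happens to treat a '[' with no closing ']' as an
ordinary character. It is not a POSIX shell and says nothing about what any
particular shell does; it is used here only as a readily available matcher
that follows the behaviour described above.

```python
from fnmatch import fnmatchcase

# (string, pattern) pairs; fnmatchcase compares without case folding.
cases = [
    ("[abc", "[*"),   # string starts with '[', pattern has an unmatched '['
    ("abc", "[*"),    # string does not start with '['
    ("abc]", "*]"),   # string ends with ']', pattern has a bare ']'
    ("abc", "*]"),    # string does not end with ']'
]

for text, pattern in cases:
    print(repr(text), repr(pattern), fnmatchcase(text, pattern))
```

With the unmatched bracket taken literally, only the first and third pairs
should be reported as matches.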

  | This is an inappropriate insult. Please take it back.

It wasn't intended as an insult, and wasn't aimed at anyone in particular.
It was more an objective to keep in mind when deciding what the standard
should say, and (whoever says it) "we don't do that, and implementing it
would be hard" should never be a reason to allow whatever it is as acceptable
behaviour if otherwise it would not be.

kre