[1003.1(2016)/Issue7+TC2 0001264]: "default locale" inadequately specified in newlocale()

2019-06-27 Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1264 
== 
Reported By: shware_systems
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1264
Category:   System Interfaces
Type:   Clarification Requested
Severity:   Objection
Priority:   normal
Status: New
Name:   Mark Ziegast 
Organization:   SHware Systems Dev. 
User Reference:  
Section: newlocale(), others
Page Number: 1392
Line Number: 46280-2
Interp Status:  --- 
Final Accepted Text: 
== 
Date Submitted: 2019-06-27 19:01 UTC
Last Modified:  2019-06-28 03:36 UTC
== 
Summary: "default locale" inadequately specified in newlocale()
== 

-- 
 (0004456) Don Cragun (manager) - 2019-06-28 03:36
 http://austingroupbugs.net/view.php?id=1264#c4456 
-- 
Do current implementations of setlocale() and newlocale() behave as
specified in the Desired Action?

What makes you think that an implementation-defined locale is in any way
required to provide definitions for all locale categories that are
extensions to the POSIX-required locale categories?  What in the current
standard requires this to happen? 

Issue History 
Date Modified    Username       Field                    Change
== 
2019-06-27 19:01 shware_systems New Issue
2019-06-27 19:01 shware_systems Name  => Mark Ziegast
2019-06-27 19:01 shware_systems Organization  => SHware Systems Dev.
2019-06-27 19:01 shware_systems Section   => newlocale(), others
2019-06-27 19:01 shware_systems Page Number   => 1392
2019-06-27 19:01 shware_systems Line Number   => 46280-2 
2019-06-28 03:36 Don Cragun Note Added: 0004456  
==




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Harald van Dijk

On 27/06/2019 10:04, Geoff Clare wrote:

> Stephane Chazelas wrote, on 26 Jun 2019:
>
>> Or again, forget all about it and treat the ksh93 behaviour as
>> non-compliant as is already the case.
>
> I'm starting to think that this is what we should do, given the number
> of oddities you have identified and the potential to break existing
> applications that use parentheses in find -name, fnmatch(), etc.
>
> The primary aim (of those of us discussing the issue in teleconferences)
> in resolving bug 1234 is consistency.  I was hoping that we could bring
> some consistency between contexts where *(...) etc. are syntax errors in
> POSIX and those where they aren't by limiting which cases can be
> considered special.  But that doesn't look workable now.
>
> So here's a new proposal which just clarifies that *(...) etc. can
> only be special when they would otherwise be a syntax error.


I'm not objecting, but even if you limit it to this, it's still a 
change, not a clarification, no? It came as a surprise to some people, 
but I do not see anything ambiguous in the current standard.


This would disallow the ksh extensions (other than where they would be a 
syntax error) everywhere, including fnmatch() and utilities doing 
pattern matching, if I am reading it correctly. If so, the pax example 
in the rationale I referenced, the one that shows or at least suggests 
that ( needs to be escaped, could use updating too:



pax -r ... "*a\(\?"

to extract a filename ending with "a(?".


could be changed to


pax -r ... "*a\?"

to extract a filename ending with "a?".


or even


pax -r ... "*a(\?"

to extract a filename ending with "a(?".


to be explicit about the new requirement.
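For illustration, the three pax operands above can be tried against fnmatch(), which implements the same pattern matching notation (pax itself may use its own matcher). This is a minimal sketch that assumes fnmatch() is called without any extended-pattern flag such as glibc's FNM_EXTMATCH, so that "(" is an ordinary character; the helper name `matches` is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: true if name matches pattern under the default
   fnmatch() rules, where a backslash quotes the next character and no
   extended-pattern flag (such as glibc's FNM_EXTMATCH) is passed, so
   "(" is an ordinary character. */
bool matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, 0) == 0;
}
```

Under those assumptions, *a\(\? and *a(\? both match a name such as xa(?, while *a\? matches xa? but not xab; escaping "(" only makes a difference where "(" could otherwise be special.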

I think I see a small wording issue:


   [...] If any character (ordinary, shell
special, or pattern special) is quoted, using either shell quoting
or (where shell quoting is not in effect) a <backslash> escape, that
pattern shall match the character itself. [...]


You excluded the bits in this proposal that would change the handling of 
backslash, so the "(where shell quoting is not in effect)" doesn't look 
right. It also seems more important to include "using either shell 
quoting (where shell quoting is in effect) or [...]" to prevent someone 
from interpreting this as applying to


  find . -name '*.c'

Less important, under the current wording, backslash escapes the next 
character, it does not quote it. The requirements of quoting and 
escaping are the same, so perhaps it is okay to change the terminology.


Worth mentioning is that this change, and the recommendation to 
implementations to not implement extensions to pattern matching other 
than under non-standard options, contradicts the last comment on 
:


 During May 27 2010 conf call, general consensus is that ksh93 filename generation appears to have many useful extensions, and we should move in that direction. See http://www2.research.att.com/sw/download/man/man1/ksh.html for man page details. New wording invited. 


Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Harald van Dijk

On 27/06/2019 11:27, Joerg Schilling wrote:

> Stephane Chazelas wrote:
>
> Hi,
>
> thank you for starting a new discussion that is based on analysing the overall
> results of the "proposed new behavior".
>
>> Today, by your reading of the spec and I agree it can be seen as
>> a valid reading, the spec is telling me that:
>>
>> 1.
>>
>> a='\.'
>> printf '%s\n' $a
>>
>> is a portable script that is meant to output "."
>
> I know just one single shell that outputs "." with this code.
>
> This is bash5. Note that POSIX is a portable source standard and other shells
> that may behave like bash5 currently only compile and work on a single platform.


I had already informed you before this of two platforms my shell gets 
testing on.


That aside, I asked you last time you made this claim about POSIX to 
back it up. There is no requirement for standard utilities to be 
implemented portably. You responded then:



> POSIX intends to create portability at source code level.
>
> Code that is not portable does not follow the POSIX way.


That's not a requirement for POSIX implementations, so it's not relevant.

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Stephane Chazelas
2019-06-21 18:48:16 +, Austin Group Bug Tracker:
[...]
> There's another aspect which I haven't mentioned yet (I'll develop more on
> that later) where the bash5 behaviour is making things worse when character
> sets like BIG5, GB18030 that have characters that contain the encoding of
> backslash are involved. 
[...]

Sorry, I realise I forgot to follow-up on that.

My thinking was that, unlike the other glob operators, the ASCII
encoding of \ (0x5C) appears in many other characters in the
BIG5, BIG5-HKSCS, GB18030 and GBK charsets, but that's not
actually true, as the encodings of [ and ] (0x5B and 0x5D) appear
just as often.

$ LC_ALL=zh_HK.big5hkscs luit
$ locale charmap
BIG5-HKSCS
$ touch η
$ a='αb' bash4 -c 'echo $a'
αb
$ a='αb' LC_ALL=C bash4 -c 'echo $a'
αb

$ a='αb' bash5 -c 'echo $a'
αb
$ a='αb' LC_ALL=C bash5 -c 'echo $a'
η

(where α is 0xa3 \ and η is 0xa3 b)

So the output of a variable's content becomes dependent on the
locale. But anyway, it's already even worse with [ and ], and
there's not much we can do about it except make sure no locale
with those charsets is available on our systems:

$ locale charmap
BIG5-HKSCS
$ a='Ωbβ' bash -c 'echo $a'
Ωbβ
$ a='Ωbβ' LC_ALL=C bash -c 'echo $a'
η
$ a='Ωbβ' dash -c 'echo $a'
η  (dash is not multi-byte aware)
$ zsh -c 'echo Ω'
zsh:1: no matches found: Ω (BUG)

(Ω is 0xa3 [ and β 0xa3 ])
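To make the byte-level problem concrete, here is a sketch that counts a given ASCII byte ('\\', '[' or ']') in two ways: naively, as a matcher working in the C locale effectively does, and skipping the trail byte of each two-byte Big5 character. The Big5 model is a simplifying assumption (any byte 0x81 to 0xFE starts a two-byte character); real code would use mbrtowc() in the proper locale. The function names are hypothetical.

```c
#include <stddef.h>

/* Count occurrences of the ASCII byte c with no knowledge of the
   encoding, which is in effect what a matcher in the C locale does. */
size_t count_byte_naive(const unsigned char *s, size_t len, unsigned char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            n++;
    return n;
}

/* Count occurrences of c that are characters in their own right, under
   a simplified Big5 model: a byte in 0x81..0xFE starts a two-byte
   character, and the byte after it is skipped even if it equals c.
   Assumes c is ASCII (below 0x80). */
size_t count_byte_big5(const unsigned char *s, size_t len, unsigned char c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0x81 && s[i] <= 0xFE && i + 1 < len)
            i++;                      /* skip the trail byte */
        else if (s[i] == c)
            n++;
    }
    return n;
}
```

For "αb" (0xa3 0x5c 'b') the naive count finds one backslash and the Big5-aware count finds none, mirroring the difference between the LC_ALL=C and BIG5-HKSCS runs above; "Ωbβ" behaves the same way for [ and ].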

-- 
Stephane



[1003.1(2016)/Issue7+TC2 0001249]: fsetpos() "state indicator" text does not match C99

2019-06-27 Austin Group Bug Tracker


The following issue has been RESOLVED. 
== 
http://austingroupbugs.net/view.php?id=1249 
== 
Reported By: geoffclare
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1249
Category:   System Interfaces
Type:   Error
Severity:   Objection
Priority:   normal
Status: Resolved
Name:   Geoff Clare 
Organization:   The Open Group 
User Reference:  
Section: fsetpos()
Page Number: 960
Line Number: 32633
Interp Status:  --- 
Final Accepted Text: 
Resolution: Accepted
Fixed in Version:   
== 
Date Submitted: 2019-05-10 11:21 UTC
Last Modified:  2019-06-27 16:13 UTC
== 
Summary: fsetpos() "state indicator" text does not match C99
== 

Issue History 
Date Modified    Username       Field                    Change
== 
2019-05-10 11:21 geoffclare New Issue
2019-05-10 11:21 geoffclare Name  => Geoff Clare 
2019-05-10 11:21 geoffclare Organization  => The Open Group  
2019-05-10 11:21 geoffclare Section   => fsetpos()   
2019-05-10 11:21 geoffclare Page Number   => 960 
2019-05-10 11:21 geoffclare Line Number   => 32633   
2019-05-10 11:21 geoffclare Interp Status => --- 
2019-06-27 16:13 nick   Status   New => Resolved 
2019-06-27 16:13 nick   Resolution   Open => Accepted
==




[1003.1(2016)/Issue7+TC2 0001247]: subshell execution environment and traps

2019-06-27 Austin Group Bug Tracker


The following issue has been RESOLVED. 
== 
http://austingroupbugs.net/view.php?id=1247 
== 
Reported By: kre
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1247
Category:   Shell and Utilities
Type:   Error
Severity:   Editorial
Priority:   normal
Status: Resolved
Name:   Robert Elz 
Organization:
User Reference:  
Section: 2.12
Page Number: 2382
Line Number: 76195-6, 76201-2
Interp Status:  --- 
Final Accepted Text: 
Resolution: Accepted
Fixed in Version:   
== 
Date Submitted: 2019-04-19 02:08 UTC
Last Modified:  2019-06-27 16:10 UTC
== 
Summary: subshell execution environment and traps
== 

Issue History 
Date Modified    Username       Field                    Change
== 
2019-04-19 02:08 kre        New Issue
2019-04-19 02:08 kre        Name  => Robert Elz
2019-04-19 02:08 kre        Section   => 2.12
2019-04-19 02:08 kre        Page Number   => 2382
2019-04-19 02:08 kre        Line Number   => 76195-6, 76201-2
2019-06-27 16:10 Don Cragun Interp Status => --- 
2019-06-27 16:10 Don Cragun Status   New => Resolved 
2019-06-27 16:10 Don Cragun Resolution   Open => Accepted
==




[1003.1(2016)/Issue7+TC2 0001243]: newlocale(3) wording unintentionally permits ignoring the "base" argument

2019-06-27 Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1243 
== 
Reported By: schwarze
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1243
Category:   System Interfaces
Type:   Clarification Requested
Severity:   Comment
Priority:   normal
Status: Resolved
Name:   Ingo Schwarze 
Organization:   OpenBSD 
User Reference:  
Section: newlocale
Page Number: 1392
Line Number: 46271-46272, 46280-46282
Interp Status:  ---
Final Accepted Text: See http://austingroupbugs.net/view.php?id=1243#c4347.
Resolution: Accepted As Marked
Fixed in Version:   
== 
Date Submitted: 2019-03-29 13:00 UTC
Last Modified:  2019-06-27 15:59 UTC
== 
Summary: newlocale(3) wording unintentionally permits ignoring the "base" argument
== 

-- 
 (0004455) eblake (manager) - 2019-06-27 15:59
 http://austingroupbugs.net/view.php?id=1243#c4455 
-- 
Change page 1392, lines 46270-2 from:

    The newlocale( ) function shall create a new locale object or
    modify an existing one. If the base argument is (locale_t)0,
    a new locale object shall be created. It is unspecified whether
    the locale object pointed to by base shall be modified, or freed
    and a new locale object created.

to:

    The newlocale( ) function shall create a new locale object or
    modify an existing one. If the base argument is (locale_t)0,
    a new locale object shall be created, otherwise the locale
    specified by base shall be modified. In the latter case it is
    unspecified whether the resulting locale object shall be that
    pointed to by base modified in place, or whether that object shall be
    freed after a new locale object is first created using some values from
    it.

I doubt any other changes are needed. 
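Whichever of the two permitted behaviours an implementation chooses, the caller-side rule under the accepted wording is the same: after a successful call with a non-zero base, use only the returned object. A minimal sketch of that pattern, assuming POSIX.1-2008 newlocale() and freelocale(); the helper name and the choice of categories are hypothetical, and the handling of the failure case follows the Linux newlocale(3) man page rather than anything in this note.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>

/* Hypothetical helper: build a locale object that uses the POSIX ("C")
   locale for every category, in two steps so that the second newlocale()
   call has a non-zero base.  Returns (locale_t)0 on failure. */
locale_t make_c_locale(void)
{
    /* Step 1: base is (locale_t)0, so a new object is created. */
    locale_t base = newlocale(LC_CTYPE_MASK, "C", (locale_t)0);
    if (base == (locale_t)0)
        return (locale_t)0;

    /* Step 2: base is non-zero.  On success the result may be base
       modified in place, or a new object (with base freed), so only
       the return value may be used afterwards. */
    locale_t result = newlocale(LC_ALL_MASK, "C", base);
    if (result == (locale_t)0) {
        /* Assumption: on failure base is unchanged and still ours to
           free, as the Linux newlocale(3) man page documents. */
        freelocale(base);
        return (locale_t)0;
    }
    return result;   /* base must not be used or freed from here on */
}
```

The caller eventually releases the returned object with a single freelocale() call, never with a second call on the old base.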

Issue History 
Date ModifiedUsername   FieldChange   
== 
2019-03-29 13:00 schwarze   New Issue
2019-03-29 13:00 schwarze   Name  => Ingo Schwarze   
2019-03-29 13:00 schwarze   Organization  => OpenBSD 
2019-03-29 13:00 schwarze   Section   => newlocale   
2019-03-29 13:00 schwarze   Page Number   => unknown because the
printed spec is not publicly available
2019-03-29 13:00 schwarze   Line Number   => unknown because the
printed spec is not publicly available
2019-03-29 14:12 kre        Note Added: 0004347
2019-03-29 14:32 schwarze   Note Added: 0004348  
2019-03-29 20:14 shware_systems Note Added: 0004349  
2019-03-29 20:25 shware_systems Note Edited: 0004349 
2019-03-30 00:48 kre        Note Added: 0004350
2019-03-30 01:19 Don Cragun Page Number  unknown because the
printed spec is not publicly available => 1392
2019-03-30 01:19 Don Cragun Line Number  unknown because the
printed spec is not publicly available => 46271-46272, 46280-46282
2019-03-30 01:19 Don Cragun Interp Status => --- 
2019-06-24 16:01 Don Cragun Final Accepted Text   => See
http://austingroupbugs.net/view.php?id=1243#c4347.
2019-06-24 16:01 Don Cragun Status   New => Resolved 
2019-06-24 16:01 Don Cragun Resolution   Open => Accepted As
Marked
2019-06-24 16:01 Don Cragun Tag Attached: tc3-2008   
2019-06-24 16:15 eblake Note Added: 0004451  
2019-06-27 15:59 eblake Note Added: 0004455  
==




[1003.1(2016)/Issue7+TC2 0001246]: environ missing.

2019-06-27 Austin Group Bug Tracker


The following issue has been CLOSED. 
== 
http://austingroupbugs.net/view.php?id=1246 
== 
Reported By: dannyniu
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1246
Category:   Base Definitions and Headers
Type:   Omission
Severity:   Objection
Priority:   normal
Status: Closed
Name:   DannyNiu/NJF 
Organization:   Individual 
User Reference: Online Pub 
Section: unistd.h
Page Number: 434
Line Number: 14760
Interp Status:  --- 
Final Accepted Text: 
Resolution: Duplicate
Duplicate:  0
Fixed in Version:   
== 
Date Submitted: 2019-04-16 05:48 UTC
Last Modified:  2019-06-27 16:07 UTC
== 
Summary: environ missing.
==
Relationships   ID  Summary
--
duplicate of    386  environ should be declared in unist...
== 

Issue History 
Date Modified    Username       Field                    Change
== 
2019-04-16 05:48 dannyniu   New Issue
2019-04-16 05:48 dannyniu   Name  => DannyNiu/NJF
2019-04-16 05:48 dannyniu   Organization  => Individual  
2019-04-16 05:48 dannyniu   User Reference=> Online Pub  
2019-04-16 05:48 dannyniu   Section   => unistd.h
2019-04-16 05:48 dannyniu   Page Number   => 434 
2019-04-16 05:48 dannyniu   Line Number   => 14760   
2019-04-16 06:22 kre        Note Added: 0004363
2019-04-16 07:38 geoffclare Note Added: 0004364  
2019-04-16 07:38 geoffclare Relationship added   duplicate of 386
2019-06-27 16:07 geoffclare Interp Status => --- 
2019-06-27 16:07 geoffclare Status   New => Closed   
2019-06-27 16:07 geoffclare Resolution   Open => Duplicate   
==




[1003.1(2016)/Issue7+TC2 0001243]: newlocale(3) wording unintentionally permits ignoring the "base" argument

2019-06-27 Austin Group Bug Tracker


The following issue has been UPDATED. 
== 
http://austingroupbugs.net/view.php?id=1243 
== 
Reported By: schwarze
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1243
Category:   System Interfaces
Type:   Clarification Requested
Severity:   Comment
Priority:   normal
Status: Resolved
Name:   Ingo Schwarze 
Organization:   OpenBSD 
User Reference:  
Section: newlocale
Page Number: 1392
Line Number: 46271-46272, 46280-46282
Interp Status:  ---
Final Accepted Text: See http://austingroupbugs.net/view.php?id=1243#c4455.
Resolution: Accepted As Marked
Fixed in Version:   
== 
Date Submitted: 2019-03-29 13:00 UTC
Last Modified:  2019-06-27 16:04 UTC
== 
Summary: newlocale(3) wording unintentionally permits ignoring the "base" argument
== 

Issue History 
Date Modified    Username       Field                    Change
== 
2019-03-29 13:00 schwarze   New Issue
2019-03-29 13:00 schwarze   Name  => Ingo Schwarze   
2019-03-29 13:00 schwarze   Organization  => OpenBSD 
2019-03-29 13:00 schwarze   Section   => newlocale   
2019-03-29 13:00 schwarze   Page Number   => unknown because the
printed spec is not publicly available
2019-03-29 13:00 schwarze   Line Number   => unknown because the
printed spec is not publicly available
2019-03-29 14:12 kre        Note Added: 0004347
2019-03-29 14:32 schwarze   Note Added: 0004348  
2019-03-29 20:14 shware_systems Note Added: 0004349  
2019-03-29 20:25 shware_systems Note Edited: 0004349 
2019-03-30 00:48 kre        Note Added: 0004350
2019-03-30 01:19 Don Cragun Page Number  unknown because the
printed spec is not publicly available => 1392
2019-03-30 01:19 Don Cragun Line Number  unknown because the
printed spec is not publicly available => 46271-46272, 46280-46282
2019-03-30 01:19 Don Cragun Interp Status => --- 
2019-06-24 16:01 Don Cragun Final Accepted Text   => See
http://austingroupbugs.net/view.php?id=1243#c4347.
2019-06-24 16:01 Don Cragun Status   New => Resolved 
2019-06-24 16:01 Don Cragun Resolution   Open => Accepted As
Marked
2019-06-24 16:01 Don Cragun Tag Attached: tc3-2008   
2019-06-24 16:15 eblake Note Added: 0004451  
2019-06-27 15:59 eblake Note Added: 0004455  
2019-06-27 16:00 eblake Note Edited: 0004455 
2019-06-27 16:02 eblake Note Edited: 0004455 
2019-06-27 16:04 eblake Final Accepted Text  See
http://austingroupbugs.net/view.php?id=1243#c4347. => See
http://austingroupbugs.net/view.php?id=1243#c4455.
==




[1003.1(2016)/Issue7+TC2 0001245]: getpgrp RATIONALE refers to getpgid as "XSI extension"

2019-06-27 Austin Group Bug Tracker


The following issue has been RESOLVED. 
== 
http://austingroupbugs.net/view.php?id=1245 
== 
Reported By: dennisw
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1245
Category:   System Interfaces
Type:   Error
Severity:   Editorial
Priority:   normal
Status: Resolved
Name:   Dennis Wölfing 
Organization:
User Reference:  
Section: getpgrp
Page Number: 1069
Line Number: 36344-36347
Interp Status:  --- 
Final Accepted Text: 
Resolution: Accepted
Fixed in Version:   
== 
Date Submitted: 2019-04-04 14:52 UTC
Last Modified:  2019-06-27 15:49 UTC
== 
Summary: getpgrp RATIONALE refers to getpgid as "XSI extension"
== 

Issue History 
Date ModifiedUsername   FieldChange   
== 
2019-04-04 14:52 dennisw    New Issue
2019-04-04 14:52 dennisw    Name  => Dennis Wölfing
2019-04-04 14:52 dennisw    Section   => getpgrp
2019-04-04 14:52 dennisw    Page Number   => 1069
2019-04-04 14:52 dennisw    Line Number   => 36344-36347
2019-06-27 15:49 Don Cragun Interp Status => --- 
2019-06-27 15:49 Don Cragun Status   New => Resolved 
2019-06-27 15:49 Don Cragun Resolution   Open => Accepted
==




[1003.1(2016)/Issue7+TC2 0001244]: ident string lifetime not specified

2019-06-27 Austin Group Bug Tracker


The following issue NEEDS AN INTERPRETATION. 
== 
http://austingroupbugs.net/view.php?id=1244 
== 
Reported By: wahern
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1244
Category:   System Interfaces
Type:   Clarification Requested
Severity:   Comment
Priority:   normal
Status: Interpretation Required
Name:   William Ahern 
Organization:
User Reference:  
Section: openlog
Page Number: 695
Line Number: 23775-23796
Interp Status:  Pending
Final Accepted Text: http://austingroupbugs.net/view.php?id=1244#c4454
== 
Date Submitted: 2019-04-03 22:02 UTC
Last Modified:  2019-06-27 15:45 UTC
== 
Summary: ident string lifetime not specified
== 

Issue History 
Date Modified    Username       Field                    Change
== 
2019-04-03 22:02 wahern New Issue
2019-04-03 22:02 wahern Name  => William Ahern   
2019-04-03 22:02 wahern Section   => openlog 
2019-04-03 22:02 wahern Page Number   => (page or range of
pages)
2019-04-03 22:02 wahern Line Number   => (Line or range of
lines)
2019-06-24 16:18 Don Cragun Page Number  (page or range of
pages) => 695
2019-06-24 16:18 Don Cragun Line Number  (Line or range of
lines) => 23775-23796
2019-06-24 16:18 Don Cragun Interp Status => --- 
2019-06-27 15:44 geoffclare Note Added: 0004454  
2019-06-27 15:45 geoffclare Interp Status--- => Pending  
2019-06-27 15:45 geoffclare Final Accepted Text   =>
http://austingroupbugs.net/view.php?id=1244#c4454
2019-06-27 15:45 geoffclare Status   New => Interpretation
Required
2019-06-27 15:45 geoffclare Resolution   Open => Accepted As
Marked
==




[1003.1(2016)/Issue7+TC2 0001244]: ident string lifetime not specified

2019-06-27 Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=1244 
== 
Reported By: wahern
Assigned To:
==
Project: 1003.1(2016)/Issue7+TC2
Issue ID:   1244
Category:   System Interfaces
Type:   Clarification Requested
Severity:   Comment
Priority:   normal
Status: New
Name:   William Ahern 
Organization:
User Reference:  
Section: openlog
Page Number: 695
Line Number: 23775-23796
Interp Status:  --- 
Final Accepted Text: 
== 
Date Submitted: 2019-04-03 22:02 UTC
Last Modified:  2019-06-27 15:44 UTC
== 
Summary: ident string lifetime not specified
== 

-- 
 (0004454) geoffclare (manager) - 2019-06-27 15:44
 http://austingroupbugs.net/view.php?id=1244#c4454 
-- 
Interpretation response

The standard is unclear on this issue, and no conformance distinction can
be made between alternative implementations based on this. This is being
referred to the sponsor.

Rationale:
-
None.

Notes to the Editor (not part of this interpretation):
---

On page 695 line 23775 section closelog(), change:

    The ident argument is a string that is prepended to every
    message.

to:

    The ident argument is a pointer to a null-terminated identifier
    that shall be prepended (without the null terminator) to every
    message. The application shall ensure that the string pointed to
    by ident remains valid during the syslog() calls that will
    prepend this identifier; however, it is unspecified whether
    changes made to the string will change the identifier prepended
    by later syslog() calls.
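Under the proposed text, the application must keep the ident string valid while syslog() calls may still prepend it, and cannot rely on later changes to the string either taking effect or being ignored. A minimal sketch that meets both conditions by passing an unmodified string with static storage duration; the identifier "myapp", the option and facility values, and the helper name are illustrative assumptions.

```c
#include <syslog.h>

/* Static storage duration: valid for the whole life of the process, so
   it stays valid for every syslog() call that may prepend it.  It is
   never modified, because whether later syslog() calls would see a
   change is unspecified. */
static const char log_ident[] = "myapp";

/* Hypothetical helper: open the log and write one message. */
void start_logging(void)
{
    openlog(log_ident, LOG_PID, LOG_USER);
    syslog(LOG_INFO, "started");
}

/* Counter-example, not compiled: ident points into an automatic array
   that ceases to exist when the function returns, while later syslog()
   calls may still use it.

void bad_start_logging(void)
{
    char ident[] = "myapp";
    openlog(ident, LOG_PID, LOG_USER);
}
*/
```

A string obtained from malloc() would work equally well, provided it is not freed or changed before closelog().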

Issue History 
Date Modified    Username       Field                    Change
== 
2019-04-03 22:02 wahern New Issue
2019-04-03 22:02 wahern Name  => William Ahern   
2019-04-03 22:02 wahern Section   => openlog 
2019-04-03 22:02 wahern Page Number   => (page or range of
pages)
2019-04-03 22:02 wahern Line Number   => (Line or range of
lines)
2019-06-24 16:18 Don Cragun Page Number  (page or range of
pages) => 695
2019-06-24 16:18 Don Cragun Line Number  (Line or range of
lines) => 23775-23796
2019-06-24 16:18 Don Cragun Interp Status => --- 
2019-06-27 15:44 geoffclare Note Added: 0004454  
==




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Chet Ramey
On 6/27/19 6:51 AM, Geoff Clare wrote:

>>> a='\**'
>>> printf '%s\n' $a
>>>
>>> is a portable script that is meant to list the filenames that
>>> start with "*" in the current directory
>>
>> See 1), there is just one shell that behaves this way.
> 
> And that shell is "bash" (not just "bash5").  All versions I tried do
> it (including bash3 on macOS).

This behavior has been in the bash pattern matcher since the pre-1.0
releases. The oldest version I have built is bash-2.05b, but the code
is there in previous versions.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Chet Ramey
On 6/27/19 2:15 AM, Stephane Chazelas wrote:

> I could be convinced that it makes sense for the ksh93 X(...)
> operators to be allowed if there was one non-anecdotal
> implementation of fnmatch() that implemented it, but I don't
> think there is.

All glibc versions going back a number of years implement ksh and bash
extended matching patterns with FNM_EXTMATCH.
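Chet's point can be illustrated with a minimal sketch of the glibc extension, guarded with #ifdef because FNM_EXTMATCH is not part of POSIX and other C libraries may not define it; the helper names are hypothetical. It also shows why enabling such patterns by default is a change rather than a clarification: without the flag, "@(foo|bar)" is an ordinary pattern that matches only itself.

```c
#define _GNU_SOURCE            /* glibc declares FNM_EXTMATCH only with this */
#include <fnmatch.h>
#include <stdbool.h>

/* True if this C library defines glibc's FNM_EXTMATCH flag. */
bool have_extmatch(void)
{
#ifdef FNM_EXTMATCH
    return true;
#else
    return false;
#endif
}

/* Match using ksh-style extended patterns when FNM_EXTMATCH exists,
   otherwise with standard fnmatch() rules. */
bool match_ext(const char *pattern, const char *name)
{
#ifdef FNM_EXTMATCH
    return fnmatch(pattern, name, FNM_EXTMATCH) == 0;
#else
    return fnmatch(pattern, name, 0) == 0;
#endif
}
```

With glibc, match_ext("@(foo|bar)", "bar") succeeds, while plain fnmatch() with flags 0 matches that pattern only against the literal string @(foo|bar).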




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Geoff Clare
Joerg Schilling  wrote, on 27 Jun 2019:
>
> Geoff Clare  wrote:
> 
> > > > 2.
> > > >
> > > > a='\**'
> > > > printf '%s\n' $a
> > > >
> > > > is a portable script that is meant to list the filenames that
> > > > start with "*" in the current directory
> > > 
> > > See 1), there is just one shell that behaves this way.
> >
> > And that shell is "bash" (not just "bash5").  All versions I tried do
> > it (including bash3 on macOS).
> 
> OK, maybe you have something different in mind. Do you talk about this:
> 
> If there are the files "*abc.c" and "\abc.c" and you run the above command,
> then bash3 prints "*abc.c" while Bourne Shell ksh88 and ksh93 print "\abc.c".

Yes.

> This seems to be a result of the fact that the macro expansion doubles the 
> backslash before it is used for globbing and where quote removal is applied 
> after globbing.

Irrelevant internal detail. All that matters is that the result is
what POSIX requires.

> > This is simply not true in the case of POSIX.2-1992, and I have
> > corrected you on that before.  POSIX.2-1992 deliberately made a number of
> > requirements that forced implementations to change, including some
> > that were inventions (an obvious one being pax).
> 
> But pax is rarely used in contrary to tar and cannot be called a success 
> story.

I don't believe it's true that pax is rarely used, but in any case that's
not relevant to the point I was making, which is that you were wrong to
imply that POSIX.2-1992 did not invent anything.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Stephane CHAZELAS
2019-06-27 14:04:18 +0200, Joerg Schilling:
[...]
> > And kresh (netbsd 8.1) and zsh in sh mode. In zshsh, that's
> 
> I cannot check "kresh" as it does not compile on UNIX.

Note that you can install NetBSD in a VM in a few minutes. I
just did that a few days ago to test that shell's behaviour.
You'd need to do something similar if you wanted to test Solaris
/usr/xpg4/bin/sh, whose source code is not even available.

Whether it compiles on UNIX (whatever UNIX means) is irrelevant:
the POSIX utilities don't have to be compiled, let alone be
written in C, let alone be written in C with source that uses
the POSIX API.

> > because \ is before a glob operator. And for all 3, there is
> > also another unquoted and unescaped * operator. Where zshsh
> > differs from the other 2 would be in:
> 
> With zsh, I get
> 
> \** 
> 
> for a directory that includes the files "\*abc.c" and "\abc.c".
> This does not seem to be correct.
> 
> If you talk about:
> 
>   ZSH_EMULATION=sh /usr/bin/zsh
> 
> when writing "zshsh", then this indeed prints *abc.c
[...]

Yes, I'm talking about zsh in sh emulation; I believe I made that
clear in the email you're replying to.

When not in sh emulation, zsh, like most newer non-POSIX shells
(rc, es, fish), does neither globbing nor word splitting upon
parameter expansion, as that's arguably a much better design.

So even

var='*'
echo $var

would output * like in rc/es/fish. And of course:

var=(*)
echo $var

would list all the files in the current directory, as in
rc/es/fish ("set var *" in fish), with variations in behaviour
among them when the glob doesn't match any file.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Joerg Schilling
Stephane Chazelas  wrote:

> 2019-06-27 11:51:11 +0100, Geoff Clare:
> > Joerg Schilling  wrote, on 27 Jun 2019:
> [...]
> > > > 2.
> > > >
> > > > a='\**'
> > > > printf '%s\n' $a
> > > >
> > > > is a portable script that is meant to list the filenames that
> > > > start with "*" in the current directory
> > > 
> > > See 1), there is just one shell that behaves this way.
> > 
> > And that shell is "bash" (not just "bash5").  All versions I tried do
> > it (including bash3 on macOS).
>
> And kresh (netbsd 8.1) and zsh in sh mode. In zshsh, that's

I cannot check "kresh" as it does not compile on UNIX.

> because \ is before a glob operator. And for all 3, there is
> also another unquoted and unescaped * operator. Where zshsh
> differs from the other 2 would be in:

With zsh, I get

\** 

for a directory that includes the files "\*abc.c" and "\abc.c".
This does not seem to be correct.

If you talk about:

ZSH_EMULATION=sh /usr/bin/zsh

when writing "zshsh", then this indeed prints *abc.c

Jörg

-- 
 EMail: jo...@schily.net (home) Jörg Schilling D-13353 Berlin
        joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Joerg Schilling
Geoff Clare  wrote:

> > > 2.
> > >
> > > a='\**'
> > > printf '%s\n' $a
> > >
> > > is a portable script that is meant to list the filenames that
> > > start with "*" in the current directory
> > 
> > See 1), there is just one shell that behaves this way.
>
> And that shell is "bash" (not just "bash5").  All versions I tried do
> it (including bash3 on macOS).

OK, maybe you have something different in mind. Do you talk about this:

If there are the files "*abc.c" and "\abc.c" and you run the above command,
then bash3 prints "*abc.c" while Bourne Shell ksh88 and ksh93 print "\abc.c".

This seems to be a result of the fact that the macro expansion doubles the 
backslash before it is used for globbing and where quote removal is applied 
after globbing.
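The two results can be reproduced at the pattern-matching level with glob(), assuming (as a simplification) that the expanded value \** reaches the matcher unchanged: by default a backslash in a glob() pattern quotes the next character, while with GLOB_NOESCAPE it is an ordinary character. This sketch creates a scratch directory under /tmp containing the two files; the helper names are hypothetical, and it models only the matching step, not how any particular shell treats the result of $a.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create an empty file called name in the current directory. */
static int touch(const char *name)
{
    FILE *f = fopen(name, "w");
    if (f == NULL)
        return -1;
    return fclose(f);
}

/* Run glob(pattern, flags); copy the first match into out and return
   the number of matches (0 on no match or error). */
static size_t glob_first(const char *pattern, int flags, char *out, size_t outlen)
{
    glob_t g;
    size_t n = 0;
    if (glob(pattern, flags, NULL, &g) == 0) {
        n = g.gl_pathc;
        if (n > 0)
            snprintf(out, outlen, "%s", g.gl_pathv[0]);
        globfree(&g);
    }
    return n;
}

/* Hypothetical demo: in a scratch directory holding "*abc.c" and
   "\abc.c", store what the pattern \** matches with default flags in
   escaped and with GLOB_NOESCAPE in noescape.  Returns 0 if each case
   produced exactly one match, -1 otherwise. */
int demo_backslash_glob(char *escaped, char *noescape, size_t len)
{
    char dir[] = "/tmp/globdemoXXXXXX";
    int rc = -1;

    if (mkdtemp(dir) == NULL || chdir(dir) != 0)
        return -1;
    if (touch("*abc.c") == 0 && touch("\\abc.c") == 0) {
        size_t a = glob_first("\\**", 0, escaped, len);             /* \* is a literal '*' */
        size_t b = glob_first("\\**", GLOB_NOESCAPE, noescape, len); /* '\' is literal */
        rc = (a == 1 && b == 1) ? 0 : -1;
    }

    /* Clean up the scratch directory. */
    unlink("*abc.c");
    unlink("\\abc.c");
    if (chdir("/") == 0)
        rmdir(dir);
    return rc;
}
```

Under these assumptions, escaped ends up holding "*abc.c" (the bash result) and noescape holds "\abc.c" (the Bourne shell, ksh88 and ksh93 result).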

The question here is whether POSIX should make a complex exception just in 
order to cause a specific result.

> This is simply not true in the case of POSIX.2-1992, and I have
> corrected you on that before.  POSIX.2-1992 deliberately made a number of
> requirements that forced implementations to change, including some
> that were inventions (an obvious one being pax).

But pax is rarely used in contrary to tar and cannot be called a success story.

Jörg




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Stephane CHAZELAS
2019-06-27 12:27:55 +0200, Joerg Schilling:
[...]
> > 4 is portable in practice. 5 as well but only because of the
> > buggy fallback string comparison in ksh93.
> 
> So you wrote this because the shell that makes @ special also
> has the fallback?
[...]


Well, it may be tempting to suspect that ksh93 does the fallback
there for backward compatibility, so that

a='@(foo)'; case $a in $a) echo yes; esac

outputs yes as it did in the Bourne shell or ksh88, which
didn't have (or didn't enable) that extended operator in that
case. But we know that the fallback behaviour originally comes
from the Bourne shell and predates ksh88.

Same problem with

a='[a]'
case $a in $a) echo yes; esac

which also outputs yes in those cases.

But yes, I would say it's worth pointing out that it's that
ksh fallback behaviour that has the side effect of making that
code more portable, if only so people don't get the wrong
impression that ksh93 disables @(...) processing in that
case.
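The fallback under discussion can be modelled as: try the word against the pattern, and if that fails, compare the two as plain strings. The sketch below uses fnmatch() and strcmp() for this; it is an illustrative model of the observed behaviour, not code taken from ksh93 or the Bourne shell, and the function names are hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Strict matching: the word must match the pattern. */
bool case_match_strict(const char *word, const char *pattern)
{
    return fnmatch(pattern, word, 0) == 0;
}

/* Model of the fallback: if the pattern does not match, compare the
   word and the pattern as plain strings.  With a='[a]', this is what
   makes  case $a in $a) echo yes; esac  print "yes". */
bool case_match_with_fallback(const char *word, const char *pattern)
{
    return case_match_strict(word, pattern) || strcmp(word, pattern) == 0;
}
```

With the word and pattern both "[a]", strict matching fails (the bracket expression matches only "a") but the fallback succeeds, which is why such code appears portable in practice.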

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Stephane Chazelas
2019-06-27 11:51:11 +0100, Geoff Clare:
> Joerg Schilling  wrote, on 27 Jun 2019:
[...]
> > > 2.
> > >
> > > a='\**'
> > > printf '%s\n' $a
> > >
> > > is a portable script that is meant to list the filenames that
> > > start with "*" in the current directory
> > 
> > See 1), there is just one shell that behaves this way.
> 
> And that shell is "bash" (not just "bash5").  All versions I tried do
> it (including bash3 on macOS).

And kresh (netbsd 8.1) and zsh in sh mode. In zshsh, that's
because \ is before a glob operator. And for all 3, there is
also another unquoted and unescaped * operator. Where zshsh
differs from the other 2 would be in:

a='\d*'
printf '%s\n' $a

Which in zshsh lists the filenames that start with \d and in
bash/kresh the filenames that start with d.

And again, where kresh differs from bash/zshsh would be in

a='\*/*'
printf '%s\n' $a

none of the 3 do globbing in:

a='\*'
printf '%s\n' $a

which is different from all other shells

While only those 3 (and Harald's shell, but I don't know that
Harald's shell is shipped with any system yet) do that second
level of backslash processing upon globbing, there are more
which do it for other cases of pattern matching (ksh93, dash,
busybox sh).

And only in bash5 is \ enough to trigger globbing (the "1" case)
(which at the moment can be seen as a bug (regression) as it's
not documented and so can easily be reverted).
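Since shells disagree on what a backslash coming from an expansion does, a script that wants the filenames starting with "*" can sidestep the question with a bracket expression such as [*], which involves no backslash processing at all. A sketch in POSIX sh, with a hypothetical helper name:

```sh
# Hypothetical helper: print the names in the current directory that
# begin with a literal "*". The bracket expression [*] matches only
# the character "*", so the result does not depend on how a shell
# treats a backslash in the result of an expansion.
list_star_names() {
  pattern='[*]*'
  # $pattern is left unquoted on purpose so that pathname expansion
  # is performed on it.
  set -- $pattern
  # When nothing matches, the pattern is left unchanged; only print
  # names that actually exist.
  if [ -e "$1" ] || [ -L "$1" ]; then
    printf '%s\n' "$@"
  fi
}

list_star_names
```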

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Geoff Clare
Joerg Schilling  wrote, on 27 Jun 2019:
>
> Stephane Chazelas  wrote:
> 
> > Today, by your reading of the spec and I agree it can be seen as
> > a valid reading, the spec is telling me that:
> >
> > 1.
> >
> > a='\.'
> > printf '%s\n' $a
> >
> > is a portable script that is meant to output "."
> 
> I know just one single shell that outputs "." with this code.
> 
> This is bash5. Note that POSIX is a portable source standard and other shells
> that may behave like bash5 currently only compile and work on a single 
> platform.
> 
> My impression is that this is mainly supported by Geoff

That was my initial position, but we have moved on since then.  I am
willing to accept the compromise currently being discussed whereby
pathname expansion only happens when there is an unquoted '*', '?'
or '[' in the pattern, in which case the above would be required
to output '\.'.  I updated the proposal in the etherpad accordingly.

> > 2.
> >
> > a='\**'
> > printf '%s\n' $a
> >
> > is a portable script that is meant to list the filenames that
> > start with "*" in the current directory
> 
> See 1), there is just one shell that behaves this way.

And that shell is "bash" (not just "bash5").  All versions I tried do
it (including bash3 on macOS).

> > 1 and 2 are the reason I raised bug 1234. 1 couldn't be further
> > from the truth. Only bash5 exhibits that behaviour and it's
> > evident it's a bad idea. It's evident that it was not the
> > intention of the spec as no shell at the time it was written did
> 
> This is very important, as POSIX does not claim to make its own inventions.

This is simply not true in the case of POSIX.2-1992, and I have
corrected you on that before.  POSIX.2-1992 deliberately made a number
of requirements that forced implementations to change, including some
that were invention (an obvious one being pax).

> > 2 is slightly more portable, but even in those shells where it
> > does that, that's not because they implement \ processing the
> > way POSIX seems to specify it, and all do it a different way.
> > I'm not opposed to POSIX *allowing* a \ in an unquoted word
> > expansion to have a special meaning when it's preceding *, ? and
> > [ as that's what several implementations do and it's not
> > breaking that many common shell usages.
> 
> I see no real difference to 1). The only portable shell that behaves this way 
> is bash5.

No, all versions of bash back to at least 3.2 behave that way.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Joerg Schilling
Stephane Chazelas  wrote:

Hi,

thank you for starting a new discussion that is based on analysing the overall 
results of the "proposed new behavior".

> Today, by your reading of the spec and I agree it can be seen as
> a valid reading, the spec is telling me that:
>
> 1.
>
> a='\.'
> printf '%s\n' $a
>
> is a portable script that is meant to output "."

I know just one single shell that outputs "." with this code.

This is bash5. Note that POSIX is a portable source standard and other shells
that may behave like bash5 currently only compile and work on a single platform.

My impression is that this is mainly supported by Geoff but does not have a 
wider group of supporters. I would currently say that the related wording 
slipped into the POSIX standard and could be seen as a bug.

> 2.
>
> a='\**'
> printf '%s\n' $a
>
> is a portable script that is meant to list the filenames that
> start with "*" in the current directory

See 1), there is just one shell that behaves this way.


> 1 and 2 are the reason I raised bug 1234. 1 couldn't be further
> from the truth. Only bash5 exhibits that behaviour and it's
> evident it's a bad idea. It's evident that it was not the
> intention of the spec as no shell at the time it was written did

This is very important, as POSIX does not claim to make its own inventions.

> it. Even if POSIX made it very explicit that 1 is required to
> behave as described above, I could probably not call it a
> portable script in a million years, as I'd expect shell
> implementations would rather keep their backward
> compatibility than implement that unreasonable requirement
> (which IMO doesn't help at all with consistency). So the spec is
> wrong and needs to be fixed.

I support that.

> 2 is slightly more portable, but even in those shells where it
> does that, that's not because they implement \ processing the
> way POSIX seems to specify it, and all do it a different way.
> I'm not opposed to POSIX *allowing* a \ in an unquoted word
> expansion to have a special meaning when it's preceding *, ? and
> [ as that's what several implementations do and it's not
> breaking that many common shell usages.

I see no real difference to 1). The only portable shell that behaves this way 
is bash5.

Do you see a major difference because in 2) the backslash is before a glob 
character while it is before an ordinary character in 1)?

> 4 is portable in practice. 5 as well but only because of the
> buggy fallback string comparison in ksh93.

So you wrote this because the shell that makes @ special also has the fallback?

Jörg




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Geoff Clare
Stephane Chazelas  wrote, on 26 Jun 2019:
>
> Or again, forget all about it and treat the ksh93 behaviour as
> non-compliant as is already the case.

I'm starting to think that this is what we should do, given the number
of oddities you have identified and the potential to break existing
applications that use parentheses in find -name, fnmatch(), etc.

The primary aim (of those of us discussing the issue in teleconferences)
in resolving bug 1234 is consistency.  I was hoping that we could bring
some consistency between contexts where *(...) etc. are syntax errors in
POSIX and those where they aren't by limiting which cases can be
considered special.  But that doesn't look workable now.

So here's a new proposal which just clarifies that *(...) etc. can
only be special when they would otherwise be a syntax error.

On page 2382 line 76216 section 2.13.1 change:

An ordinary character is a pattern that shall match itself. It
can be any character in the supported character set except for
NUL, those special shell characters in [xref to 2.2] that require
quoting, and the following three special pattern characters.
Matching shall be based on the bit pattern used for encoding the
character, not on the graphic representation of the character. If
any character (ordinary, shell special, or pattern special) is
quoted, that pattern shall match the character itself. The shell
special characters always require quoting.

to:

An ordinary character is a pattern that shall match itself. Where
characters within the pattern are affected by shell quoting, an
ordinary character can be any character in the supported character
set except for NUL, those special shell characters in [xref to 2.2]
that require quoting, and the three special pattern characters
described below. Where characters within the pattern are not
affected by shell quoting, an ordinary character can be any character
in the supported character set except for NUL and the three special
pattern characters described below. Matching shall be based on the
bit pattern used for encoding the character, not on the graphic
representation of the character. If any character (ordinary, shell
special, or pattern special) is quoted, using either shell quoting
or (where shell quoting is not in effect) a <backslash> escape, that
pattern shall match the character itself. The application shall
ensure that it quotes any character that would otherwise be treated
as special, in order for it to be matched as an ordinary character.

On page 3748 line 128698 section C.2.13.1 change:

Conforming applications are required to quote or escape the shell
special characters (sometimes called metacharacters). If used
without this protection, syntax errors can result or implementation
extensions can be triggered. For example, the KornShell supports a
series of extensions based on parentheses in patterns.

to:

Where characters within a pattern are affected by shell quoting,
conforming applications are required to quote the shell special
characters (sometimes called metacharacters). If used without this
protection, syntax errors can result or implementation extensions
can be triggered.  Some shells support a series of extensions based
on parentheses in patterns that are valid extensions in this
context because they would otherwise cause syntax errors.  However,
this means that they are not allowed by this standard to be
recognized in contexts where those syntax errors would not occur
anyway, such as in:

pattern='a*(b)'; ls -- $pattern

which this standard requires to list files with names beginning
'a' and ending "(b)".  It is recommended that implementations do
not extend pattern matching in the shell in ways that are only
valid extensions because they would otherwise be syntax errors, in
order to avoid inconsistency between different pattern matching
contexts.  One way to provide an extension that is consistent
between different pattern matching contexts in the shell (although
still not consistent with find -name, fnmatch(), etc.) is to enable
the extension only when a non-standard shell option is set, or
when the shell is executed using a command name other than sh.
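The rationale's example can also be illustrated in the case context, which the proposal wants to stay consistent with pathname expansion. A minimal POSIX sh sketch; matches_pattern is a hypothetical helper name, not something from the proposal. As far as I know, dash, and bash with extglob off (its default in scripts), behave as the comments describe, while a shell that applies ksh93-style extended patterns here would be expected to match differently.

```sh
# Hypothetical helper: succeed if string $1 matches the pattern held
# in $2. $2 is deliberately unquoted in the pattern position so that
# *, ? and [...] keep their special meaning.
matches_pattern() {
  case $1 in
    $2) return 0 ;;
  esac
  return 1
}

# With the proposed wording, "(" and ")" coming from an expansion are
# ordinary characters, so a*(b) means: "a", then any string, then "(b)".
pattern='a*(b)'
for name in 'a(b)' 'axx(b)' 'ab' 'abb'; do
  if matches_pattern "$name" "$pattern"; then
    printf '%s: match\n' "$name"
  else
    printf '%s: no match\n' "$name"
  fi
done
```

Under that reading, a(b) and axx(b) match while ab and abb do not.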

-- 
Geoff Clare 



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane Chazelas
2019-06-27 08:59:29 +0100, Harald van Dijk:
[...]
> > 2 is slightly more portable, but even in those shells where it
> > does that, that's not because they implement \ processing the
> > way POSIX seems to specify it, and all do it a different way.
> > I'm not opposed to POSIX *allowing* a \ in an unquoted word
> > expansion to have a special meaning when it's preceding *, ? and
> > [ as that's what several implementations do and it's not
> > breaking that many common shell usages.
> 
> It should not be limited to when it's preceding any specific character,
> though. That is something no shell has done. Shells currently vary in
> whether backslash can function as an escape character during pattern
> matching, but when it can, it does not depend on which character follows it.

That's what zsh does (and did before POSIX). That's the
intention, but as mentioned earlier it's quite buggy
(https://www.zsh.org/mla/workers/2019/msg00465.html). And in
ksh93, again \d is not an escaped d but matches a digit. But I
agree we need to allow \ to be treated specially when in front
of a non-wildcard as that's what several implementations do.

But only when pattern matching is involved. That includes
pathname expansion, but pathname expansion should only be
performed when a word contains unquoted ?, [ or * (not "(" as
even ksh93 doesn't do it).
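The rule described here (pathname expansion only when the word contains an unquoted ?, [ or *) can be sketched as a check on the value of an expansion, where no character is quoted. This is a POSIX sh sketch with a hypothetical helper name; it only tests for the characters and does not model any particular shell's handling of backslash.

```sh
# Hypothetical helper: succeed if the string in $1 contains *, ? or [.
# In the value of an expansion no character is quoted, so under the
# rule discussed here this is when pathname expansion would apply.
# Backslash and parentheses do not count.
has_glob_chars() {
  case $1 in
    *'*'* | *'?'* | *'['*) return 0 ;;
  esac
  return 1
}

for value in '\.' '\**' '@(foo)' 'x[y]' 'a?'; do
  if has_glob_chars "$value"; then
    printf '%s: pathname expansion\n' "$value"
  else
    printf '%s: left as is\n' "$value"
  fi
done
```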

Also note that in netbsd8.1 sh, as already pointed out:

In:

var1='\foo/bar*'
ls -d -- $var1
var2='\foo-bar*'
ls -d -- $var2

\ is only considered an escape operator in the var2 case as the
var1 case splits the word on / and the first part doesn't
contain an unquoted [, ?, *.

That would still be allowed if we made it unspecified what \x
does when a word contains an unquoted */?/[.

That also applies to:

var='\*'
ls -d -- $var

[...]
> If there is no fnmatch() implementation that behaves that way, then agreed
> that it makes sense to just specify that. That pax example in the rationale
> should then also be changed to not escape any parenthesis.
> 
> What did this pax example come from, though? Was that based on a real pax
> implementation that did have special treatment of parentheses, not just an
> invention?
[...]

That's something I also wondered.

I do have a vague recollection that some early glob()
implementations were actually calling "sh" to expand globs (so
for instance glob("*`reboot`*") would reboot, which is currently
allowed by the spec as ` is a shell special character). Could it
be linked to that? I wouldn't expect it to apply to fnmatch()
though.

Or maybe I'm confusing with perl globs that used to call csh.

-- 
Stephane



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Harald van Dijk

On 27/06/2019 07:15, Stephane Chazelas wrote:

> 2019-06-26 23:56:06 +0100, Harald van Dijk:
> [...]
>
> > You are proposing a fundamental change to the design of pattern matching,
> > not a clarification as you previously called it, and you are now discussing
> > how to allow the behaviour of one specific shell that does not behave the
> > way you like, but not the other shells that also do not behave the way you
> > like, when those other shells were not only changed intentionally to get
> > more consistent behaviour, at least in my case as the result of a user
> > request, but also because that more consistent behaviour is required by the
> > current version of POSIX, solely because of theoretical problems with file
> > names specifically crafted to break scripts, file names that are not
> > actually used in the wild.
>
> [...]
>
> I'm not a shell implementer. I'm on the side of the application
> writer, I want to be able to write portable shell scripts, and
> POSIX (*Portable* Operating System *Interface*) is meant to work
> for me. It's meant to tell me what I can and cannot write in my
> script and the behaviour to expect. It's meant to help you the
> implementer write your shell so that it can interpret my
> portable script the way it's meant to.


Oh, I agree that there is a bug. Given that most shells do not behave 
the way POSIX specifies, POSIX should not be requiring that behaviour. 
However, if you wait until after some shells have already implemented 
what is specified, it's too late then to just change the rules to forbid 
it. Your logic works both ways: now those shells have to be taken into 
account. It is not reasonable for POSIX to say that uses are portable 
that in fact are not, or no longer are.


But in fact although the wording you talked about so far did not include 
it, you did raise that point already in your 26/06/2019 14:39 +01:00 
message:



> So the only characters that need quoting (or putting inside [...]
> when the pattern is in the result of some word expansion --
> remember that you need to move the backslash processing out of
> the shell pattern matching as it's a fnmatch()/glob() thing only)
> are ?, [, * and also \ to accommodate shells that have implemented
> some form or another of special processing of \ independently of
> quoting, and ( and ) to accommodate ksh93 (in pattern matching
> only; those are not a problem in pathname expansion).


Sorry for missing it the first time.
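The bracketing approach from the paragraph quoted above can be sketched in POSIX sh: each ?, *, [, ( and ) in a literal string is wrapped in [...], so the resulting pattern matches only that string even when it comes from an expansion. literal_pattern is a hypothetical name. Backslash is deliberately left untouched, since how a backslash from an expansion is treated is exactly what varies between shells, so strings containing \ are not handled by this sketch.

```sh
# Hypothetical helper: print a pattern that matches only the string $1,
# for use where the pattern comes from an expansion and shell quoting
# cannot be applied. Each ?, *, [, ( and ) is wrapped in a bracket
# expression, so "a*b" becomes "a[*]b".
# Limitations: a backslash is left untouched, and trailing newlines
# are removed by the command substitution in the caller.
literal_pattern() {
  printf '%s\n' "$1" | sed 's/[?*[()]/[&]/g'
}

p=$(literal_pattern 'a*b?[c](d)')
printf '%s\n' "$p"            # prints a[*]b[?][[]c][(]d[)]
case 'a*b?[c](d)' in
  $p) echo match ;;           # prints match
esac
```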


> Today, by your reading of the spec and I agree it can be seen as
> a valid reading, the spec is telling me that:
>
> 1.
>
> a='\.'
> printf '%s\n' $a
>
> is a portable script that is meant to output "."
>
> 2.
>
> a='\**'
> printf '%s\n' $a
>
> is a portable script that is meant to list the filenames that
> start with "*" in the current directory
>
> 3.
>
> pattern='*;*'
> case $var in ($pattern) echo yes; esac
>
> is a non-standard, non-portable script with unspecified
> behaviour because shell implementations are free to use that ";"
> as an extended glob operator.
>
> 4.
>
> string='@(foo)'
> echo $string
>
> is a non-standard, non-portable script which is not guaranteed
> to output @(foo).
>
> 5.
>
> string='@(foo)'
> case $string in $string) echo yes; esac
>
> is a non-standard, non-portable script which is not guaranteed
> to output "yes".
>
> 6.
>
> pattern='@(*)'
> case "@(foo)" in $pattern) echo yes; esac
>
> is a non-standard, non-portable script which is not guaranteed
> to output "yes".


Agreed with all of these that that is what I believe POSIX currently 
specifies.



> 1 and 2 are the reason I raised bug 1234. 1 couldn't be further
> from the truth. Only bash5 exhibits that behaviour and it's
> evident it's a bad idea.


If you accept that unquoted backslash behaves that way in some shells, 
even before bash 5, then changing the shell to always treat unquoted 
backslash the same way makes the shell behaviour easier to understand. I 
consider it an improvement over backslash's meaning changing in ways 
that were hard to predict.



> It's evident that it was not the
> intention of the spec as no shell at the time it was written did
> it. Even if POSIX made it very explicit that 1 is required to
> behave as described above, I could probably not call it a
> portable script in a million years, as I'd expect shell
> implementations would rather keep their backward
> compatibility than implement that unreasonable requirement
> (which IMO doesn't help at all with consistency). So the spec is
> wrong and needs to be fixed.


Yes, to document current practice, the spec should effectively say in 
some way that whether and if so to what extent backslash can act as an 
escape character (in addition to a quote character) in shells is 
unspecified.



> 2 is slightly more portable, but even in those shells where it
> does that, that's not because they implement \ processing the
> way POSIX seems to specify it, and all do it a different way.
> I'm not opposed to POSIX *allowing* a \ in an unquoted word
> expansion to have a special meaning when it's preceding *, ? and
> [ as that's what several implementations do and it's not
> breaking that many common shell usages.

Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-27 Thread Stephane Chazelas
2019-06-26 23:56:06 +0100, Harald van Dijk:
[...]
> You are proposing a fundamental change to the design of pattern matching,
> not a clarification as you previously called it, and you are now discussing
> how to allow the behaviour of one specific shell that does not behave the
> way you like, but not the other shells that also do not behave the way you
> like, when those other shells were not only changed intentionally to get
> more consistent behaviour, at least in my case as the result of a user
> request, but also because that more consistent behaviour is required by the
> current version of POSIX, solely because of theoretical problems with file
> names specifically crafted to break scripts, file names that are not
> actually used in the wild.
[...]

I'm not a shell implementer. I'm on the side of the application
writer, I want to be able to write portable shell scripts, and
POSIX (*Portable* Operating System *Interface*) is meant to work
for me. It's meant to tell me what I can and cannot write in my
script and the behaviour to expect. It's meant to help you the
implementer write your shell so that it can interpret my
portable script the way it's meant to.

Today, by your reading of the spec and I agree it can be seen as
a valid reading, the spec is telling me that:

1.

a='\.'
printf '%s\n' $a

is a portable script that is meant to output "."

2.

a='\**'
printf '%s\n' $a

is a portable script that is meant to list the filenames that
start with "*" in the current directory

3.

pattern='*;*'
case $var in ($pattern) echo yes; esac

is a non-standard, non-portable script with unspecified
behaviour because shell implementations are free to use that ";"
as an extended glob operator.

4.

string='@(foo)'
echo $string

is a non-standard, non-portable script which is not guaranteed
to output @(foo).

5.

string='@(foo)'
case $string in $string) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".

6.

pattern='@(*)'
case "@(foo)" in $pattern) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".


1 and 2 are the reason I raised bug 1234. 1 couldn't be further
from the truth. Only bash5 exhibits that behaviour and it's
evident it's a bad idea. It's evident that it was not the
intention of the spec as no shell at the time it was written did
it. Even if POSIX made it very explicit that 1 is required to
behave as described above, I could probably not call it a
portable script in a million years, as I'd expect shell
implementations would rather keep their backward
compatibility than implement that unreasonable requirement
(which IMO doesn't help at all with consistency). So the spec is
wrong and needs to be fixed.

2 is slightly more portable, but even in those shells where it
does that, that's not because they implement \ processing the
way POSIX seems to specify it, and all do it a different way.
I'm not opposed to POSIX *allowing* a \ in an unquoted word
expansion to have a special meaning when it's preceding *, ? and
[ as that's what several implementations do and it's not
breaking that many common shell usages.

3 is portable in practice. And I should be able to rely on it.
I'd rather POSIX doesn't open the door for a shell (or
fnmatch()...) to choose ; to be a new glob operator, I would
rather the sh glob operators stay ?, [] and * (and \ now added
because of those shells that treat it specially), so I know
which to escape (with quoting (or \ in fnmatch()) or [...] when
in word expansions) or to look out for. Several shells have some
of those operators but they are not enabled in posix/sh mode so
they interpret sh scripts like sh is meant to.

4 is portable in practice. 5 as well but only because of the
buggy fallback string comparison in ksh93.

6 is the only one that is true. Yes, there is *one* shell (a
shell generally considered "experimental" and not in wide use)
where that won't work as expected (won't output yes) as that's
one case where ksh93's extended glob operator is conflicting
with sh compatibility. It's not consistent with 4 there. Geoff's
proposing to fix that inconsistency to allow that operator to be
used for pathname expansion, but I believe it would be more
reasonable to fix it by not allowing it for "case" (make 6 a
portable script again) to make the standard consistent and
clear. Then ksh93 could enable those extended operators wherever
it likes when called as ksh, but not when called as sh (at least
not in the result of word expansions; basically reverting to
ksh88 behaviour).

I could be convinced that it makes sense for the ksh93 X(...)
operators to be allowed if there was one non-anecdotal
implementation of fnmatch() that implemented it, but I don't
think there is one. find implementations usually have a -regex
predicate to do things that basic globs can't do instead.

I also like the idea of opening up a way for shell wildcards to
be extended in the future, but it's a dangerous business. Today
in