Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-30 Thread Robert Elz
Date:Thu, 27 Jun 2019 12:27:55 +0200
From:Joerg Schilling 
Message-ID:  <5d149a2b.tueush4pd3wqoutl%joerg.schill...@fokus.fraunhofer.de>

  | Note that POSIX is a portable source standard and other shells that may
  | behave like bash5 currently only compile and work on a single platform.

I haven't been paying much attention here recently (other stuff to do)
and have a lot of unread mail on this list.

But I wanted to address this point in particular when I saw it (it also
came up in some other message I saw in passing) - and apologies one and all
if others have also said what I am about to say, and I just haven't read
your messages yet.

I shall eventually (if needed) return to substantial points of the actual
issues being discussed in this thread sometime later.

First, you're absolutely right that the NetBSD sh isn't (currently)
portable to other systems - though the issue is mostly its build environment,
rather than its code (but yes,  is a nuisance).   Fixing
that is somewhere on my list of things to do one day, but it is not
nearly as high a priority as making the shell work correctly (for my,
and the NetBSD developers' and users' definition of correctly) and
then more efficiently.   [Aside: people I know have managed to build
it on other systems, it is not an impossible task - though it is certainly
not trivial either.]

But all that is 100% irrelevant to anything here - POSIX is so that
applications can be portable, not necessarily in order to make the
systems that implement it portable, or even available at all.   In fact,
when POSIX (and/or the SUS before it) were initially written, the shell
which was mostly used as the basis for the XCU section, was not really
available at all - it was all proprietary sources.

Any of today's POSIX conforming systems can (could) be the same.

There is no requirement, anywhere, that any particular piece of any
of those systems be portable to any other system, or be available for
you to test in any way at all.

To be certified, as I understand it, the whole system needs to be tested
and verified correct (plus all the documentation, blah blah ...) but I
don't believe that you are any part of that process, nor that any of the
sources for the certified system ever need be made available to anyone.

None of that makes such a system less relevant for determining what the
actual expected operations and expectations of the shell in the
wild actually are - that is, what the standard should say.

Further, shells that are actively being distributed and used with systems
available now (particularly those that are installed as /bin/sh on the
various different systems) are much more relevant to use in deciding what
is (and should be) the standard than other random code that is used almost
nowhere - whatever its ancient lineage.   And whether you can get at them
to test their operations is 100% irrelevant.

Lastly, if you really want to test the NetBSD shell, that is easy to do - all
you need to do is install NetBSD somewhere - which is not a difficult process,
as while it doesn't always handle all the newest hardware all that well, it
is highly portable, and runs on just about every emulation environment you
can imagine (XEN, Virtualbox, VMware, Qemu, ...) as well as on a large
variety of real (bare metal) hardware of many different architectures.

kre




Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-30 Thread Harald van Dijk

On 28/06/2019 09:38, Geoff Clare wrote:

Harald van Dijk wrote, on 27 Jun 2019:


On 27/06/2019 10:04, Geoff Clare wrote:

Stephane Chazelas wrote, on 26 Jun 2019:


Or again, forget all about it and treat the ksh93 behaviour as
non-compliant as is already the case.


I'm starting to think that this is what we should do, given the number
of oddities you have identified and the potential to break existing
applications that use parentheses in find -name, fnmatch(), etc.

The primary aim (of those of us discussing the issue in teleconferences)
in resolving bug 1234 is consistency.  I was hoping that we could bring
some consistency between contexts where *(...) etc. are syntax errors in
POSIX and those where they aren't by limiting which cases can be
considered special.  But that doesn't look workable now.

So here's a new proposal which just clarifies that *(...) etc. can
only be special when they would otherwise be a syntax error.


I'm not objecting, but even if you limit it to this, it's still a change,
not a clarification, no? It came as a surprise to some people, but I do not
see anything ambiguous in the current standard.


It's unclear what, precisely, "The shell special characters always
require quoting" is intended to mean.

In particular, XRAT's explanation of it is "Conforming applications
are required to quote or escape the shell special characters
(sometimes called metacharacters). If used without this protection,
syntax errors can result or implementation extensions can be triggered."
The fact that this mentions syntax errors implies that the statement
in 2.13.1 was intended only to apply to patterns that are used directly
in shell commands.


Syntax errors are not limited to shell syntax errors.

I would think this means

  find . -name '('

is allowed to immediately exit with

  find: error: invalid pattern

POSIX does not even limit the concept of "syntax errors" to errors in 
the syntax, see e.g. the "shift" command:



If the n operand is invalid or is greater than "$#", this may be considered a 
syntax error and a non-interactive shell may exit; [...]
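
For the find case above, a minimal sketch of how an application can stay clear of the question by escaping the parenthesis in the pattern. It assumes a find whose -name matching honours backslash escapes the way fnmatch() does, and it uses mktemp -d (not required by Issue 7) for a scratch directory; the directory and file are illustrative only.

```shell
# Scratch directory containing a single file whose name is "(".
# mktemp -d is an assumption; it is widely available but not in Issue 7.
dir=$(mktemp -d) || exit 1
: > "$dir/("

# The backslash escapes "(" inside the pattern, so it can only be
# matched literally, whatever an unescaped "(" might mean to find.
found=$(find "$dir" -name '\(')
printf '%s\n' "$found"

rm -rf "$dir"
```

With the escape in place, the result does not depend on whether an implementation treats a bare "(" as an ordinary character, an extension, or an error.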


[...]

I think I see a small wording issue:


   [...] If any character (ordinary, shell
special, or pattern special) is quoted, using either shell quoting
or (where shell quoting is not in effect) a <backslash> escape, that
pattern shall match the character itself. [...]


You excluded the bits in this proposal that would change the handling of
backslash,


The email you replied to is not the complete proposed resolution of
bug 1234; it is just the parts relating to ksh extended glob patterns.


In that case it is definitely a change, not a clarification.


Less important, under the current wording, backslash escapes the next
character, it does not quote it. The requirements of quoting and escaping
are the same, so perhaps it is okay to change the terminology.


Escaping is a form of quoting.  There are numerous places where the
standard uses "unquoted" to mean that a character is neither quoted
with single- or double-quotes nor escaped with a backslash.
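
A minimal sketch of the three forms mentioned here, assuming a POSIX sh: a pattern character that is single-quoted, double-quoted, or escaped with a backslash in a case pattern matches only itself, while the unquoted character keeps its special meaning. The helper name is_star is hypothetical.

```shell
# List which forms of the pattern "*" match the argument.
# The three quoted forms match only a literal asterisk; the unquoted
# form is a pattern that matches any string.
is_star() {
    r=
    case $1 in '*') r="${r:+$r }single" ;; esac
    case $1 in "*") r="${r:+$r }double" ;; esac
    case $1 in \*)  r="${r:+$r }backslash" ;; esac
    case $1 in *)   r="${r:+$r }unquoted" ;; esac
    printf '%s\n' "$r"
}

is_star '*'    # prints: single double backslash unquoted
is_star abc    # prints: unquoted
```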


Escaping can be a form of quoting, sure.  2.2.1 Escape Character 
(Backslash) is part of 2.2 Quoting, after all. Not all escaping is 
quoting though. I went over all uses of the word "unquoted" in Shell 
Command Language. Every single one refers to shell quoting, and in the 
few cases where other levels of backslash removal also apply, the 
standard does not refer to that as quoting. See 2.6.3, for old-style 
command substitutions, which have an escape mechanism independent of 
shell quoting:



The search for the matching backquote shall be satisfied by the first unquoted 
non-escaped backquote; [...]


This is written this way to say that

  echo `echo \`echo hello\``
       1      2           34

backticks 1 and 4 match, despite backticks 2 and 3 not being quoted.
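
A sketch of the same nesting, assuming a POSIX sh, next to the $(...) form, where the inner substitution needs no escaping:

```shell
# Old style: the inner backquotes are escaped with backslashes, so the
# first and last backquotes delimit the outer substitution.
old=`echo \`echo hello\``

# New style: $(...) nests without any escaping.
new=$(echo $(echo hello))

printf '%s\n' "$old" "$new"
```

Both assignments end up holding the string hello.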

Cheers,
Harald van Dijk



Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

2019-06-30 Thread Harald van Dijk

On 28/06/2019 16:05, Joerg Schilling wrote:

Harald van Dijk wrote:


That aside, I asked you last time you made this claim about POSIX to
back it up. There is no requirement for standard utilities to be
implemented portably. You responded then:


POSIX intends to create portability at source code level.

Code that is not portable does not follow the POSIX way.


That's not a requirement for POSIX implementations, so it's not relevant.


Well, I like to be able to test various shells on the same platform.

This is close to impossible if I need to install a specific OS for every shell.


Agreed that portability is a nice feature to have. It has a cost, and it 
is up to the maintainers to determine whether the feature is worth the 
cost, and if it is, whether it is worth the cost right now. If they 
choose not to focus efforts on portability right now, it is 
understandable that you do not personally test that shell. It's just 
that the conclusion from that should not be "this shell should not be 
considered", it should be "for this shell to be considered, someone else 
will have to provide the details".


Cheers,
Harald van Dijk