Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching
    Date:        Thu, 27 Jun 2019 12:27:55 +0200
    From:        Joerg Schilling
    Message-ID:  <5d149a2b.tueush4pd3wqoutl%joerg.schill...@fokus.fraunhofer.de>

  | Note that POSIX is a portable source standard and other shells that may
  | behave like bash5 currently only compile and work on a single platform.

I haven't been paying much attention here recently (other stuff to do) and have a lot of unread mail on this list. But I wanted to address this point in particular when I saw it (it also came up in some other message I saw in passing) - and apologies one and all if others have also said what I am about to say, and I just haven't read your messages yet. I shall eventually (if needed) return to the substantive points of the actual issues being discussed in this thread.

First, you're absolutely right that the NetBSD sh isn't (currently) portable to other systems - though the issue is mostly its build environment, rather than its code (but yes, it is a nuisance). Fixing that is somewhere on my list of things to do one day, but it is not nearly as high a priority as making the shell work correctly (by my definition of correctly, and that of the NetBSD developers and users) and then more efficiently.

[Aside: people I know have managed to build it on other systems; it is not an impossible task - though it is certainly not trivial either.]

But all that is 100% irrelevant to anything here - POSIX exists so that applications can be portable, not necessarily to make the systems that implement it portable, or even available at all. In fact, when POSIX (and/or the SUS before it) was initially written, the shell that was mostly used as the basis for the XCU section was not really available at all - it was all proprietary source code. Any of today's POSIX-conforming systems can (could) be the same. There is no requirement, anywhere, that any particular piece of any of those systems be portable to any other system, or be available for you to test in any way at all.
To be certified, as I understand it, the whole system needs to be tested and verified correct (plus all the documentation, blah blah ...), but I don't believe that you are any part of that process, nor that any of the sources for the certified system ever need be made available to anyone. None of that makes such a system less relevant for determining what the actual operations and expectations of the shell in the wild are - that is, what the standard should say.

Further, shells that are actively being distributed and used with systems available now (particularly those that are installed as /bin/sh on the various different systems) are much more relevant in deciding what is (and should be) the standard than other random code that is used almost nowhere - whatever its ancient lineage. And whether you can get at them to test their operations is 100% irrelevant.

Lastly, if you really want to test the NetBSD shell, that is easy to do - all you need to do is install NetBSD somewhere. That is not a difficult process: while NetBSD doesn't always handle the newest hardware all that well, it is highly portable, and runs on just about every emulation environment you can imagine (Xen, VirtualBox, VMware, QEMU, ...) as well as on a large variety of real (bare metal) hardware of many different architectures.

kre
Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching
On 28/06/2019 09:38, Geoff Clare wrote:
> Harald van Dijk wrote, on 27 Jun 2019:
>> On 27/06/2019 10:04, Geoff Clare wrote:
>>> Stephane Chazelas wrote, on 26 Jun 2019:
>>>> Or again, forget all about it and treat the ksh93 behaviour as
>>>> non-compliant as is already the case.
>>>
>>> I'm starting to think that this is what we should do, given the number
>>> of oddities you have identified and the potential to break existing
>>> applications that use parentheses in find -name, fnmatch(), etc.
>>>
>>> The primary aim (of those of us discussing the issue in teleconferences)
>>> in resolving bug 1234 is consistency. I was hoping that we could bring
>>> some consistency between contexts where *(...) etc. are syntax errors
>>> in POSIX and those where they aren't by limiting which cases can be
>>> considered special. But that doesn't look workable now. So here's a new
>>> proposal which just clarifies that *(...) etc. can only be special when
>>> they would otherwise be a syntax error.
>>
>> I'm not objecting, but even if you limit it to this, it's still a
>> change, not a clarification, no? It came as a surprise to some people,
>> but I do not see anything ambiguous in the current standard.
>
> It's unclear what, precisely, "The shell special characters always
> require quoting" is intended to mean. In particular, XRAT's explanation
> of it is "Conforming applications are required to quote or escape the
> shell special characters (sometimes called metacharacters). If used
> without this protection, syntax errors can result or implementation
> extensions can be triggered." The fact that this mentions syntax errors
> implies that the statement in 2.13.1 was intended only to apply to
> patterns that are used directly in shell commands.

Syntax errors are not limited to shell syntax errors. I would think this means

  find . -name '('

is allowed to immediately exit with

  find: error: invalid pattern

POSIX does not even limit the concept of "syntax errors" to errors in the syntax; see, e.g., the "shift" command:

  If the n operand is invalid or is greater than "$#", this may be
  considered a syntax error and a non-interactive shell may exit; [...]

>> [...] I think I see a small wording issue:
>>
>>> [...] If any character (ordinary, shell special, or pattern special)
>>> is quoted, using either shell quoting or (where shell quoting is not
>>> in effect) a <backslash> escape, that pattern shall match the
>>> character itself. [...]
>>
>> You excluded the bits in this proposal that would change the handling
>> of backslash,
>
> The email you replied to is not the complete proposed resolution of
> bug 1234; it is just the parts relating to ksh extended glob patterns.

In that case it is definitely a change, not a clarification.

>> Less important, under the current wording, backslash escapes the next
>> character, it does not quote it. The requirements of quoting and
>> escaping are the same, so perhaps it is okay to change the terminology.
>
> Escaping is a form of quoting. There are numerous places where the
> standard uses "unquoted" to mean that a character is neither quoted
> with single- or double-quotes nor escaped with a backslash.

Escaping can be a form of quoting, sure. 2.2.1 Escape Character (Backslash) is part of 2.2 Quoting, after all. Not all escaping is quoting, though. I went over all uses of the word "unquoted" in Shell Command Language. Every single one refers to shell quoting, and in the few cases where other levels of backslash removal also apply, the standard does not refer to that as quoting. See 2.6.3, for old-style command substitutions, which have an escape mechanism independent of shell quoting:

  The search for the matching backquote shall be satisfied by the first
  unquoted non-escaped backquote; [...]

This is written this way to say that in

  echo `echo \`echo hello\``
       1      2           34

backticks 1 and 4 match, despite backticks 2 and 3 not being quoted.

Cheers,
Harald van Dijk
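For illustration of the proposal text quoted above (a character quoted either by shell quoting or by a backslash escape matches only itself), here is a minimal sketch assuming any POSIX sh. The function name `check` and the result labels are purely illustrative, not taken from the thread. In a `case` pattern, a single-quoted `*` and a backslash-escaped `\*` both match only a literal asterisk:

```sh
#!/bin/sh
# Hypothetical helper: reports whether $1 matches "a*b" when the '*'
# is protected by shell quoting and when it is protected by a backslash.
check() {
  case $1 in
    a'*'b) q=quoted-match ;;    # '*' single-quoted: matches a literal '*'
    *)     q=quoted-nomatch ;;
  esac
  case $1 in
    a\*b)  e=escaped-match ;;   # \* backslash-escaped: also a literal '*'
    *)     e=escaped-nomatch ;;
  esac
  echo "$1: $q $e"
}

check 'a*b'   # a*b: quoted-match escaped-match
check 'axb'   # axb: quoted-nomatch escaped-nomatch
```

Whether the two forms should be described as the same thing ("quoting") or as different mechanisms with the same effect is exactly the terminology question being discussed; the sketch only shows that, in this shell-pattern context, their effect is the same.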
Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching
On 28/06/2019 16:05, Joerg Schilling wrote:
> Harald van Dijk wrote:
>> That aside, I asked you last time you made this claim about POSIX to
>> back it up. There is no requirement for standard utilities to be
>> implemented portably. You responded then:
>>
>>> POSIX intends to create portability at source code level. Code that
>>> is not portable does not follow the POSIX way.
>>
>> That's not a requirement for POSIX implementations, so it's not
>> relevant.
>
> Well, I like to be able to test various shells on the same platform.
> This is close to impossible if I need to install a specific OS for
> every shell.

Agreed that portability is a nice feature to have. It has a cost, and it is up to the maintainers to determine whether the feature is worth the cost, and if it is, whether it is worth the cost right now. If they choose not to focus efforts on portability right now, it is understandable that you do not personally test that shell. It's just that the conclusion from that should not be "this shell should not be considered"; it should be "for this shell to be considered, someone else will have to provide the details".

Cheers,
Harald van Dijk