Re: sh: aliases in command substitutions
Yes, I do have an idea, since I was on those phone calls. It is your comments that are ill founded. The first unquoted newline terminates the recognition phase/lookahead's mentioned. Substitutions occur afterwards to determine final token classifications, not during this initial pass. That many substitutions can safely occur during this initial pass for various parser algorithms does not make them part of the model. Alias replacements occur during left to right scan of substitutions establishes a token after evaluations is not a keyword or other non-name token and the context according to the grammar to that point permits it to still be a command name, not an argument operand. Your question was 'is the standard really requiring that?' and imo due to the above the answer is 'yes', whether you want to believe it or not. On Monday, April 20, 2020 Robert Elz wrote: Date: Mon, 20 Apr 2020 21:17:12 + (UTC) From: shwaresyst Message-ID: <1050536090.3716059.1587417432...@mail.yahoo.com> | No, those are attempts at speed optimizations; I'm sad to have to reply like this, but do you have any idea at all what you're talking about? | the description before the numbered list of XCU 2.3 has line | delimiting comes first as the logical model to determine tokenizing mode. Yes, it does. Now go read it. Really read it. That distinction is to separate parsing tokens for the grammar, from here docs. Newlines appear at the switches from one mode to another. That's it. | This is continued in list items 4. and 5. 4's sole mention of newlines is that newline joining results in the \newline combination being completely deleted from the input. All 4 is saying is that a quoted string is (part of) a single token, and nothing in it ends the token. 5 doesn't mention newlines at all. That one just says that the various word expansions, once started, continue until they end, and the whole thing is (part of) one token. 
Neither quoted strings nor word expansions (or words containing word expansions) can be aliases, so neither 4 nor 5 is in any way relevant to alias processing. (Parsing the command inside a command substitution means recursive processing of everything - so for that the whole process starts over.) | that substitutions shall not occur during recognition. That's correct, they don't. But aliases are not that. None of the rest of what you say has anything to do with aliases either. Parameter expansions are not aliases (in ${CC} CC is not an alias). Please read 2.3.1 properly. In particular, where it says: After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name. [...various conditions omitted, not relevant here] the word shall be replaced by the value of the alias It isn't 100% clear from that (but I believe it is in updated text that some bug number or other applies to this) that "replaced by" means that the word (which was detected to be an alias) is deleted, and the value of the alias is treated as replacement input, and put through the tokeniser as if it had been in the original input stream. This is also why aliases cannot be defined and used "close together" - the alias command has to have been executed before the use of it is parsed, for it to be effective. (unalias too). None of this is in dispute (there are some issues with technical details of how things get processed in some obscure cases, but none of that is relevant here). And once again, none of this is in any way even slightly relevant to the question I asked. kre
Re: sh: aliases in command substitutions
    Date:        Mon, 20 Apr 2020 21:17:12 + (UTC)
    From:        shwaresyst
    Message-ID:  <1050536090.3716059.1587417432...@mail.yahoo.com>

  | No, those are attempts at speed optimizations;

I'm sad to have to reply like this, but do you have any idea at all
what you're talking about?

  | the description before the numbered list of XCU 2.3 has line
  | delimiting comes first as the logical model to determine tokenizing mode.

Yes, it does. Now go read it. Really read it. That distinction is to
separate parsing tokens for the grammar, from here docs. Newlines appear
at the switches from one mode to another. That's it.

  | This is continued in list items 4. and 5.

4's sole mention of newlines is that newline joining results in the
\newline combination being completely deleted from the input. All 4 is
saying is that a quoted string is (part of) a single token, and nothing
in it ends the token.

5 doesn't mention newlines at all. That one just says that the various
word expansions, once started, continue until they end, and the whole
thing is (part of) one token.

Neither quoted strings nor word expansions (or words containing word
expansions) can be aliases, so neither 4 nor 5 is in any way relevant to
alias processing. (Parsing the command inside a command substitution
means recursive processing of everything - so for that the whole process
starts over.)

  | that substitutions shall not occur during recognition.

That's correct, they don't. But aliases are not that.

None of the rest of what you say has anything to do with aliases either.
Parameter expansions are not aliases (in ${CC} CC is not an alias).

Please read 2.3.1 properly. In particular, where it says:

        After a token has been delimited, but before applying the
        grammatical rules in Section 2.10, a resulting word that is
        identified to be the command name word of a simple command
        shall be examined to determine whether it is an unquoted,
        valid alias name.
        [...various conditions omitted, not relevant here]
        the word shall be replaced by the value of the alias

It isn't 100% clear from that (but I believe it is in updated text that
some bug number or other applies to this) that "replaced by" means that
the word (which was detected to be an alias) is deleted, and the value of
the alias is treated as replacement input, and put through the tokeniser
as if it had been in the original input stream.

This is also why aliases cannot be defined and used "close together" -
the alias command has to have been executed before the use of it is
parsed, for it to be effective. (unalias too).

None of this is in dispute (there are some issues with technical details
of how things get processed in some obscure cases, but none of that is
relevant here).

And once again, none of this is in any way even slightly relevant to the
question I asked.

kre
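The remark that aliases cannot be defined and used "close together" can be checked with a small experiment. What follows is only a sketch, assuming that the `sh` found on the PATH is a POSIX-style shell that expands aliases when it reads a script from standard input; the alias names `hi` and `hi2` are made up for the illustration.

```sh
# Case 1: the alias is defined on one line and used on the next, so the
# alias command has already been executed when the second line is parsed.
later=$(sh 2>&1 <<'EOF'
alias hi='echo hello'
hi
EOF
)

# Case 2: definition and use share a single line. The whole line is parsed
# before the alias command runs, so hi2 is looked up as an ordinary command.
same_line_status=0
same_line=$(sh 2>&1 <<'EOF'
alias hi2='echo hello'; hi2
EOF
) || same_line_status=$?

printf 'next line: %s\n' "$later"
printf 'same line: status %s, output: %s\n' "$same_line_status" "$same_line"
```

The script is fed on standard input so that the shell parses and executes one line at a time, which is the situation the message describes.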
Re: sh: aliases in command substitutions
No, those are attempts at speed optimizations; the description before the numbered list of XCU 2.3 has line delimiting comes first as the logical model to determine tokenizing mode. This is continued in list items 4. and 5., that substitutions shall not occur during recognition. This makes it a requirement that a secondary pass, as the logical model, may be necessary to fully evaluate a token according to the grammar that applies for determining whether an alias name should be looked up. This model takes into account the result of a substitution may need to be classified as an assignment word or redirection when the grammar says a command prefix or keyword is the legal tokens, not a command or alias name only. This isn't obvious, but there are many scripts that rely on $CC to provide the command name for a compiler, as an example. This can't be checked whether it holds an actual name until recognition of the line as a whole has been completed. On Monday, April 20, 2020 Robert Elz wrote: Date: Mon, 20 Apr 2020 18:01:49 + (UTC) From: shwaresyst Message-ID: <1837359500.1041757.1587405709...@mail.yahoo.com> | It seems to me that what is missing, in XCU 2.3.1, is a statement that use | of keywords in alias bodies is unspecified behavior. That isn't "missing" because it isn't unspecified. What's more there is no dispute at all that this works, and works in all shells. | Alias expansion occurs after this line is identified, No, it doesn't. It occurs immediately after a word has been recognised in the command position - just the same as keyword recognition - and when a previous alias expansion has caused the next word to be a potential alias. Alias expansion (XCU 2.3.1) is in the Token recognition (XCU 2.3) section of the standard for a reason, it is not a word expansion (XCU 2.6)). 
But this is a general discussion of aliases, which is also not the point of my query (unless this turns into a "remove aliases entirely" discussion) which was very specific to alias recognition in command substitutions that are quoted. Joseph's message helps provide context, and it may be that now the "historically shells have not done this" is no longer true, and the standard should revert to its earlier form. kre
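The case mentioned above, where a previous alias expansion makes the next word a candidate for alias expansion, is the one where the alias value ends in a blank. A minimal sketch, again assuming `sh` is a POSIX-style shell that expands aliases in scripts read from standard input; the alias names `say`, `say2` and `word` are hypothetical.

```sh
# The value of "say" ends in a blank, so the following word is also
# checked for alias substitution.
chained=$(sh 2>&1 <<'EOF'
alias say='echo '
alias word='expanded'
say word
EOF
)

# The value of "say2" has no trailing blank, so "word" stays an ordinary
# argument and is not replaced.
unchained=$(sh 2>&1 <<'EOF'
alias say2='echo'
alias word='expanded'
say2 word
EOF
)

printf 'trailing blank:    %s\n' "$chained"
printf 'no trailing blank: %s\n' "$unchained"
```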
Re: sh: aliases in command substitutions
It seems to me that what is missing, in XCU 2.3.1, is a statement that use of keywords in alias bodies is unspecified behavior. Even outside double quotes an initial scan collecting tokens to form a logical line distinct from a potential io-here body will have to treat an alias name as a command name and following arguments. Alias expansion occurs after this line is identified, in the context of seeing whether this line has multiple commands separated by semi-colons. For this to be reliable keywords establishing contexts where the meaning of overloaded operators such as ')' need to be disambiguated need to be recognizable as such on that initial pass, not only on a subsequent one. On Monday, April 20, 2020 Joerg Schilling wrote: Robert Elz wrote: > (lines 74718-22, Issue 7 TC2 - 2018 edition) says ... > > The input characters within the quoted string that are also enclosed > between "$(" and the matching ')' shall not be affected by the > double-quotes, but rather shall define that command whose output > replaces the "$(...)" when the word is expanded. The tokenizing rules > in Section 2.3, not including the alias substitutions in Section 2.3.1, > shall be applied recursively to find the matching ')'. ... > Not even the broken pdksh, which seems to match that ')' after the second > "foo" as terminating the command substitution, but then processes the > alias anyway (later) and cannot find a valid case statement within the > truncated command substitution, so generates a syntax error. > > But perhaps that is actually what the standard says must happen - we > don't use the alias for finding the matching ')', but then do when > parsing the command inside. That would be a recipe for disaster, > but if it is what old versions of ksh did/do then perhaps the standard > really is requiring that? If so, it is time for a change, as nothing > relevant acts like that any more (not mksh, not ksh93, not bosh, ...) 
I believe that David Korn at some time believed that he could write a simple parser for $(cmd) and introduced the opening '(' in case for simple counting symmetry. This however does not work for other reasons. ksh93 uses a recursive parser and Thorsten Glaser rewrote pdksh into mksh and while fixing plenty of bugs, he also started to use a recursive parser for $(cmd). bosh also uses a recursive parser and it would be of interest whether anyone did succeed to implement $(cmd) without using a recursive parser. ksh93 still uses a different method than bosh/mksh: - ksh93 recursively calls the parser to stop at the first superfluous ')' and records all characters read during this attempt. - bosh and mksh recursively call the parser and tell it to stop at a superfluous ')' and then translate the binary syntax tree created by the parser back into a command text. Whether it works to use the alias switch=case is a different thing. If you like this to work, you need to have a lexer that expands aliases before detecting keywords. "bash" does not seem to do this. Jörg -- EMail:jo...@schily.net (home) Jörg Schilling D-13353 Berlin joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'
Re: sh: aliases in command substitutions
Date:Mon, 20 Apr 2020 19:09:13 +0200 From:Joerg Schilling Message-ID: <5e9dd739.mzznyjmgko8+pliv%joerg.schill...@fokus.fraunhofer.de> | Whether it works to use the alias switch=case is a different thing. | If you like this to work, you need to have a lexer that expands | aliases before detecting keywords. No, you'd need that if you wanted to be able to do alias case=whatever but that's entirely different (and non-POSIX). | "bash" does not seem to do this. bash handles the "alias case=switch" just fine in general, it is just its command substitution parser (when unquoted) which is not quite up to the task (bash uses yacc (or bison)). kre
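For reference, the "alias whose value is a keyword" case under discussion looks like this outside any command substitution. This is a sketch only, assuming `sh` is a POSIX-style shell that expands aliases in a script read from standard input; nothing here exercises the command substitution parsers that the thread is actually about.

```sh
# "switch" is replaced by "case", the replacement text is tokenised again,
# and the resulting word is then recognised as the reserved word.
keyword_via_alias=$(sh 2>&1 <<'EOF'
alias switch=case
switch foo in
    foo) echo matched ;;
    *)   echo no match ;;
esac
EOF
)

printf '%s\n' "$keyword_via_alias"
```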
Re: sh: aliases in command substitutions
Robert Elz wrote:

> (lines 74718-22, Issue 7 TC2 - 2018 edition) says ...
>
>       The input characters within the quoted string that are also enclosed
>       between "$(" and the matching ')' shall not be affected by the
>       double-quotes, but rather shall define that command whose output
>       replaces the "$(...)" when the word is expanded. The tokenizing rules
>       in Section 2.3, not including the alias substitutions in Section 2.3.1,
>       shall be applied recursively to find the matching ')'.

...

> Not even the broken pdksh, which seems to match that ')' after the second
> "foo" as terminating the command substitution, but then processes the
> alias anyway (later) and cannot find a valid case statement within the
> truncated command substitution, so generates a syntax error.
>
> But perhaps that is actually what the standard says must happen - we
> don't use the alias for finding the matching ')', but then do when
> parsing the command inside. That would be a recipe for disaster,
> but if it is what old versions of ksh did/do then perhaps the standard
> really is requiring that? If so, it is time for a change, as nothing
> relevant acts like that any more (not mksh, not ksh93, not bosh, ...)

I believe that David Korn at some time believed that he could write a
simple parser for $(cmd) and introduced the opening '(' in case for
simple counting symmetry. This however does not work for other reasons.

ksh93 uses a recursive parser, and when Thorsten Glaser rewrote pdksh
into mksh, fixing plenty of bugs along the way, he also started to use a
recursive parser for $(cmd).

bosh also uses a recursive parser, and it would be of interest whether
anyone has succeeded in implementing $(cmd) without using a recursive
parser.

ksh93 still uses a different method than bosh/mksh:

-       ksh93 recursively calls the parser to stop at the first superfluous
        ')' and records all characters read during this attempt.

-       bosh and mksh recursively call the parser and tell it to stop at a
        superfluous ')' and then translate the binary syntax tree created
        by the parser back into a command text.

Whether it works to use the alias switch=case is a different thing.
If you like this to work, you need to have a lexer that expands
aliases before detecting keywords.

"bash" does not seem to do this.

Jörg

--
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'
Re: aliases in command substitutions
Date:Mon, 20 Apr 2020 10:59:12 -0400 From:Chet Ramey Message-ID: <6802459a-6b68-5bd6-b535-401d5ec6b...@case.edu> | He's right, and it happened 30 years ago: Ah, OK, thanks - so it was originally for that purpose, but isn't needed for that any more (but is retained for back-compat, which makes sense, it is harmless, and for paren matching editors (which is a frill)). None of this has anything to do with whether aliases appearing in command substitutions ought to be processed when seeking the terminating ')' or whether it makes sense for that answer to depend upon whether the command sub was embedded within double quotes or not. kre
Re: aliases in command substitutions
On 4/20/20 9:34 AM, Robert Elz wrote:

>   | but I've always understood the
>   |         case xxx in
>   |         (pattern) ...;;
>   |         esac
>   |
>   | (fully parenthesized pattern) syntax to have been invented precisely
>   | to allow case statements in $() subshell notation,
>
> First, $() is command substitution, not a subshell (not really important)
> and if that was someone's intent, they did a particularly bad job of
> implementing it, as what the standard says is (XCU 2.6.3)

He's right, and it happened 30 years ago:

"An optional open-parenthesis before pattern was added to allow numerous
historical KornShell scripts to conform. At one time, using the leading
parenthesis was required if the case statement were to be embedded within
a $( ) command substitution; this is no longer the case with the POSIX
shell. Nevertheless, many existing scripts use the open-parenthesis, if
only because it makes matching-parenthesis searching easier in vi and
other editors. This is a relatively simple implementation change that is
fully upward compatible for all scripts."

This is from 1991, and I'm certain, though I don't have it with me right
now, that the same text appeared in the 1992 version of the standard.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
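The two spellings of a case pattern inside $( ) that the rationale text refers to can be compared directly. A small sketch, assuming `sh` is a reasonably current POSIX shell; each script is passed as a single-quoted argument to `sh -c`, so that a shell which mis-parses the unparenthesized form reports an error for that one case only.

```sh
# Pattern written with the optional leading '('.
with_paren=$(sh -c 'echo $(case foo in (foo) echo matched ;; esac)' 2>&1)

# Pattern written without it: the ')' ending the pattern must not be
# taken as the end of the command substitution.
without_paren=$(sh -c 'echo $(case foo in foo) echo matched ;; esac)' 2>&1)

printf 'with (   : %s\n' "$with_paren"
printf 'without (: %s\n' "$without_paren"
```

A shell that still needs the leading parenthesis would be expected to report a syntax error for the second form rather than print the same result.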
Re: aliases in command substitutions
    Date:        Mon, 20 Apr 2020 07:12:03 +
    From:        "Schwarz, Konrad"
    Message-ID:  <38be7e5d52c74c9dac140f7de5105...@siemens.com>

  | Not sure if I understand your problem,

I suspect probably not.

  | but I've always understood the
  |         case xxx in
  |         (pattern) ...;;
  |         esac
  |
  | (fully parenthesized pattern) syntax to have been invented precisely
  | to allow case statements in $() subshell notation,

First, $() is command substitution, not a subshell (not really important)
and if that was someone's intent, they did a particularly bad job of
implementing it, as what the standard says is (XCU 2.6.3)

        With the $(command) form, all characters following the open
        parenthesis to the matching closing parenthesis constitute the
        command. Any valid shell script can be used for command, except
        a script consisting solely of redirections which produces
        unspecified results.

Note the "any valid shell script" (with that one exception) - a valid
shell script certainly includes a case statement where the optional '('
is omitted.

My guess has always been that the '(' was invented as a sop to
parenthesis-balancing editors - to make it possible for those things to
assist with balancing parentheses. But that's mere speculation, I wasn't
around at the time. A workaround for broken shells is another
possibility.

But the case statement in my e-mail was just an easily understood
(familiar) example I used to illustrate the real point, which relates to
whether or not aliases are to be processed in double-quoted command
substitutions (whether only for the point of finding the terminating ')'
or including actual processing).

A different example...

        alias nest='('

        nest echo foo )

works everywhere (as it should in any POSIX shell).

        echo $( nest echo foo ) )

works in ash based shells, yash, bosh, mksh, and zsh, but not ksh93 or
bash (or ancient pdksh). That one should work everywhere, the command
substitution contains a valid shell script (which not only is not
entirely redirections, it has no redirections at all).

        echo "$(nest echo foo ) )"

This one works everywhere (including ksh93 and bash) except old pdksh.
But according to the standard, it shouldn't, as the standard prohibits
processing the "nest" alias when looking for the end of the command
substitution. Given that, the command substitution command is (defined
to be)

        nest echo foo

which should fail, either because there is no nest command (if no alias
processing is done at all) or because of the unmatched parentheses if the
nest alias is later processed (this last is what pdksh appears to do.)
In this scenario, the trailing " )" is just more data for the echo
command to write.

That is "my problem" - the standard is requiring processing in a way that
no-one relevant does it (any more). It is time for this part to be
updated (if it hasn't already been.)

I'm not sure I really buy into Harald's "once a syntax error is found,
the shell can do whatever it wants" and I suspect that given a bit of
time to think about it, I could come up with an example which has no
syntax errors but where the interpretation differs depending upon whether
aliases are processed there or not.

At the very least, we need an explanation why aliases aren't to be
processed when looking for the closing ')' in a double quoted command
substitution, but are in an unquoted one. I suspect that the answer is
"because that's how ksh88 implemented it", which in this case is a poor
one - other bugs (and this really cannot be anything except that) in
ksh88 were fixed or worked around in the standard, this one should have
been as well.

Here's another test case to play with:

        alias short='echo foo )'

        ( short

works everywhere.

        echo $( short

works in all ash based shells, bosh, yash, mksh. bash and zsh are very
obviously buggy:

        bash$ alias short
        alias short='echo foo )'
        bash$ ( short
        foo
        bash$ echo $( short
        )
        -bash: shor: command not found
        )

(bash gave a PS2 prompt, and I typed an extra ')'. No idea what it did
with my 't'...

        zsh $ alias short
        short='echo foo )'
        zsh $ echo $( short
        )
        zsh: parse error near `)'
        zsh $ echo $( short

        zsh: command not found: shortecho

Two tests there, the first (like with bash) I typed a ')' at the PS2
prompt, or where I assume one was expected (zsh didn't write it) after
which it complained about the excess ')' (which is close to correct).
If instead I typed a newline where the PS2 prompt might have been, I got
the 2nd response. That's obviously simply a bug.

[I tried the newline response for bash as well, but it simply issued a
new PS2 and waited for more.]

Lastly, with this one

        echo "$( short"

there are no shells that do anything that I would have expected (not even
mine). All wait for more input, some write PS2 (zsh does this time).
There doesn't seem to be any obvious input that will satisfy any of them
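The `nest` test cases above can be replayed against a given shell with a small probe. This is a sketch under assumptions: the variable `SHELL_UNDER_TEST` and the helper `try_case` are names invented here, the script is piped in on standard input so that the alias line has been executed before the test line is parsed, and the shell being probed is assumed to expand aliases in that mode (bash does so non-interactively only in POSIX mode or with `expand_aliases` set, so `SHELL_UNDER_TEST='bash --posix'` would be one way to probe it).

```sh
# Shell to probe; defaults to whatever "sh" is on this system.
shell_under_test=${SHELL_UNDER_TEST:-sh}

try_case() {
    # $1 = label, $2 = command line to parse once the alias exists.
    status=0
    # $shell_under_test is deliberately unquoted so it may carry options.
    output=$(printf '%s\n' "alias nest='('" "$2" | $shell_under_test 2>&1) ||
        status=$?
    printf '%-9s status=%s output=%s\n' "$1" "$status" "$output"
}

try_case plain    'nest echo foo )'
try_case unquoted 'echo $( nest echo foo ) )'
try_case quoted   'echo "$( nest echo foo ) )"'
```

Only the first case is described above as working everywhere; the other two are exactly where shells are reported to differ, so their output is what the probe is meant to reveal.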
Re: XCU 2.14: Exit status by no-argument `return' in shell trap handlers
2020-04-20 1:42 Robert Elz:
> Probably not, bosh is derived from that shell (more or less) and it is
> also A
>
> [...]
>
> So are the FreeBSD and NetBSD shells (which is not surprising, as like
> dash, they're descendants of ash).
>
> You can also add zsh to A:

Thank you for the information. I have also tested ksh93, pdksh and oksh
(OpenBSD KornShell) on FreeBSD. `ksh93' is (B), and `pdksh' and `oksh' do
not implement the special treatment in trap handlers. Here is the current
list:

  (A) zsh (zsh-5.7.1, zsh-5.6.2),
      ash variants (dash-0.5.10.2, busybox-1.28.3, FreeBSD sh, NetBSD sh),
      Bourne sh variants (heirloom-sh-050706-4, bosh-2020/04/18)
  (B) bash-4.4, gwsh,
      ksh variants (ksh-2020.0.0, ksh93.u_1, mksh-R57, mksh-R56)
  (C) yash-2.49
  (D) none
  (Not Implemented) bash-4.3, pdksh-5.2.14.2, oksh-6.6.1, posh-0.13.2

I think it is now rather clear that the current wording of POSIX is
somewhat ambiguous and that different interpretations are possible. In
fact, there is a split, (A) vs (B), in shell implementations.

>> 2020-04-19 23:00 Harald van Dijk:
>>> It does still mean that anyone writing a function needs to beware that
>>> exit and exit $? do not have the same effect, if there is any
>>> possibility that the function will be invoked from a trap action. I
>>> suspect most people writing functions will not be aware of that.

Yes, that is the problem. Actually, the original problem is that I just
want to perform `eval "$PROMPT_COMMAND"` in a trap action in a Bash
script where `$PROMPT_COMMAND' is provided by users. If everything were
under my control, I could just always write `return $?', but the commands
in `$PROMPT_COMMAND' are specified by users who are unlikely to care
about this problem. This specific case is just my personal one, but I
think the behavior (B) can possibly cause similar problems in other shell
scripts in general.

>> 2020-04-19 20:51 Robert Elz:
>>> [...]
>>>
>>> The end result, unless we can get agreement that some
>>> implementations are buggy, and will be fixed (which given the
>>> split seems an unlikely outcome) is likely to simply be that all
>>> of this simply becomes unspecified (or perhaps we could hope,
>>> implementation defined) which will mean even more cases where it
>>> becomes more difficult to write portable reliable code.
>>>
>>> kre

>> 2020-04-19 23:00 Harald van Dijk:
>>> True, and if the intent is that exit and return behave differently
>>> and the standard is updated to clearly require that, I have no
>>> problem changing the shell back to the prior behaviour.

According to these comments, it seems like a deadlock. The standard will
not change until shells change their behavior to match each other, and
shells will not change their behavior until the standard is clarified.

Actually I can understand both sides, but I think we can find a point of
compromise. There are already different implementations, so it is already
difficult to write portable and reliable code. Some shells, such as
`pdksh', `oksh' and `posh', do not even implement the special behavior of
`return' in trap actions at all, so the standard does not describe the
current situation properly. Thus I think the side effect of making it
unspecified is limited. Maybe we can first let it be unspecified and then
wait to see whether the shells will switch from the literal reading (B)
to the more sensible interpretation (A) or not.

- It might be difficult to change the behavior (B) of `ksh' and `mksh'
  because their behavior has been unchanged since at least 1993.

- `bash', on the other hand, implemented the behavior (B) in bash-4.4,
  which is relatively recent in its long history, so maybe we can hope
  for a change.

- Harald: The remaining shell with (B) in the list is `gwsh'. For
  example, if the standard changes its description to `unspecified'
  (instead of clearly requiring (A) or another interpretation), do you
  think you have a chance to change the behavior back?

- I think `yash' with the interpretation (C) will follow the standard
  at any time if the standard clarifies the intended behavior, because
  `yash' aims to strictly support POSIX.

Thank you.

--
Koichi
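The situation being compared, a no-argument `return' in a function that is called from a trap action, can be reproduced with a short script. This is a sketch only: the function names `set_status' and `handler' are invented here, `sh' stands for whichever shell is being examined, and no claim is made about which printed value corresponds to which category in the list above. The script runs in a child shell so that `$$' refers to that child and not to the calling shell.

```sh
report=$(sh 2>&1 <<'EOF'
set_status() { return "$1"; }

handler() {
    set_status 2   # last command executed inside the function
    return         # no argument: which status does the shell report?
}

trap 'handler; echo "handler returned $?"' USR1

# The command that runs immediately before the trap action is kill,
# which exits with status 0 when the signal is sent successfully.
kill -USR1 $$
:
EOF
)

printf '%s\n' "$report"
```

With this setup, a shell that takes the status from the last command inside the function would be expected to print 2, and one that takes it from the command preceding the trap action would be expected to print 0.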