Re: sh: aliases in command substitutions

2020-04-20 Thread shwaresyst

Yes, I do have an idea, since I was on those phone calls. It is your comments 
that are ill founded. The first unquoted newline terminates the recognition 
phase and the lookahead mentioned. Substitutions occur afterwards to determine 
final token classifications, not during this initial pass. That many 
substitutions can safely occur during this initial pass for various parser 
algorithms does not make them part of the model. Alias replacement occurs when 
the left-to-right scan of substitutions establishes that a token, after 
evaluation, is not a keyword or other non-name token, and that the context 
according to the grammar to that point still permits it to be a command name, 
not an argument operand. Your question was 'is the standard really requiring 
that?' and imo, due to the above, the answer is 'yes', whether you want to 
believe it or not.




Re: sh: aliases in command substitutions

2020-04-20 Thread Robert Elz
    Date:        Mon, 20 Apr 2020 21:17:12 + (UTC)
    From:        shwaresyst 
    Message-ID:  <1050536090.3716059.1587417432...@mail.yahoo.com>

  | No, those are attempts at speed optimizations;

I'm sad to have to reply like this, but do you have any idea at all
what you're talking about?

  | the description before the numbered list of XCU 2.3 has line
  | delimiting comes first as the logical model to determine tokenizing mode.

Yes, it does.   Now go read it.   Really read it.   That distinction is
to separate parsing tokens for the grammar, from here docs.   Newlines
appear at the switches from one mode to another.   That's it.

  | This is continued in list items 4. and 5.

4's sole mention of newlines is that newline joining results in the
\newline combination being completely deleted from the input.  All 4
is saying is that a quoted string is (part of) a single token, and
nothing in it ends the token.

5 doesn't mention newlines at all.   That one just says that the
various word expansions, once started, continue until they end, and
the whole thing is (part of) one token.

Neither quoted strings nor word expansions (or words containing word
expansions) can be aliases, so neither 4 nor 5 is in any way relevant
to alias processing.   (Parsing the command inside a command substitution
means recursive processing of everything - so for that the whole process
starts over.)

  | that substitutions shall not occur during recognition.

That's correct, they don't.   But aliases are not that.

None of the rest of what you say has anything to do with aliases either.
Parameter expansions are not aliases (in ${CC} CC is not an alias).

Please read 2.3.1 properly.  In particular, where it says:

 After a token has been delimited, but before applying the grammatical
 rules in Section 2.10, a resulting word that is identified to be the
 command name word of a simple command shall be examined to determine
 whether it is an unquoted, valid alias name.
   [...various conditions omitted, not relevant here]
 the word shall be replaced by the value of the alias

It isn't 100% clear from that (but I believe it is in updated text
that some bug number or other applies to this) that "replaced by"
means that the word (which was detected to be an alias) is deleted,
and the value of the alias is treated as replacement input, and put
through the tokeniser as if it had been in the original input stream.

This is also why aliases cannot be defined and used "close together" - the
alias command has to have been executed before the use of it is parsed, for
it to be effective.   (unalias too).
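
A minimal sketch of that timing rule, for anyone who wants to try it.  The
function name `alias_timing_demo' and the alias name `demo_greet' are made up
for illustration.  `eval' is used only so that each string is parsed as its
own unit of input, and the `shopt' line is there because bash does not expand
aliases in non-interactive shells unless told to.

```sh
# Illustrative only: the function and alias names are hypothetical.
alias_timing_demo() {
    # bash expands aliases in scripts only when expand_aliases is set.
    if [ -n "${BASH_VERSION:-}" ]; then shopt -s expand_aliases; fi

    # Defined and used on the same line: the whole line is parsed before
    # the alias command runs, so demo_greet is not yet an alias.
    same_line=$(eval 'alias demo_greet="echo hello"; demo_greet' 2>/dev/null) || true

    # Defined in one parse unit, used in the next: the alias is known by
    # the time the second string is parsed.
    next_line=$(eval 'alias demo_greet="echo hello"'; eval 'demo_greet') || true

    echo "same line: [$same_line]"
    echo "next line: [$next_line]"
}

alias_timing_demo
```

In a shell that follows the rule described above, the first result should be
empty (the command is simply not found) and the second should be "hello".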

None of this is in dispute (there are some issues with technical details of
how things get processed in some obscure cases, but none of that is relevant
here).

And once again, none of this is in any way even slightly relevant to the
question I asked.

kre




Re: sh: aliases in command substitutions

2020-04-20 Thread shwaresyst

No, those are attempts at speed optimizations; the description before the 
numbered list of XCU 2.3 has line delimiting comes first as the logical model 
to determine tokenizing mode. This is continued in list items 4. and 5., that 
substitutions shall not occur during recognition. 

This makes it a requirement that, in the logical model, a secondary pass may be 
necessary to fully evaluate a token according to the applicable grammar when 
determining whether an alias name should be looked up. This model takes into 
account that the result of a substitution may need to be classified as an 
assignment word or redirection when the grammar says a command prefix or 
keyword is a legal token, not only a command or alias name.

This isn't obvious, but there are many scripts that rely on $CC to provide the 
command name for a compiler, as an example. Whether it holds an actual command 
name can't be checked until recognition of the line as a whole has been 
completed.

On Monday, April 20, 2020 Robert Elz  wrote:
    Date:        Mon, 20 Apr 2020 18:01:49 + (UTC)
    From:        shwaresyst 
    Message-ID:  <1837359500.1041757.1587405709...@mail.yahoo.com>

  | It seems to me that what is missing, in XCU 2.3.1, is a statement that use
  | of keywords in alias bodies is unspecified behavior.

That isn't "missing" because it isn't unspecified.  What's more there is
no dispute at all that this works, and works in all shells.

  | Alias expansion occurs after this line is identified,

No, it doesn't.  It occurs immediately after a word has been recognised
in the command position - just the same as keyword recognition - and when
a previous alias expansion has caused the next word to be a potential alias.
Alias expansion (XCU 2.3.1) is in the Token recognition (XCU 2.3) section
of the standard for a reason; it is not a word expansion (XCU 2.6).
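
As a hedged illustration of the "next word" case: XCU 2.3.1 says that when an
alias value ends in a <blank>, the following word is also checked for alias
substitution.  The names below (`alias_chain_demo', `demo_say', `demo_word')
are invented for the sketch, and `eval' is used only so the alias uses are
parsed after the definitions have been executed.

```sh
# Illustrative only: the function and alias names are hypothetical.
alias_chain_demo() {
    # bash expands aliases in scripts only when expand_aliases is set.
    if [ -n "${BASH_VERSION:-}" ]; then shopt -s expand_aliases; fi

    alias demo_word='expanded'

    # No trailing blank: only the word in command position is examined.
    alias demo_say='echo'
    plain=$(eval 'demo_say demo_word')

    # Trailing blank in the value: the next word is examined as well.
    alias demo_say='echo '
    chained=$(eval 'demo_say demo_word')

    echo "plain: $plain"
    echo "chained: $chained"
}

alias_chain_demo
```

The first call should print the literal word `demo_word'; the second should
print `expanded', because the trailing blank made the next word a candidate.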

But this is a general discussion of aliases, which is also not the point
of my query (unless this turns into a "remove aliases entirely" discussion)
which was very specific to alias recognition in command substitutions that
are quoted.

Joseph's message helps provide context, and it may be that now the
"historically shells have not done this" is no longer true, and the
standard should revert to its earlier form.

kre



Re: sh: aliases in command substitutions

2020-04-20 Thread shwaresyst

It seems to me that what is missing, in XCU 2.3.1, is a statement that use of 
keywords in alias bodies is unspecified behavior. 

Even outside double quotes an initial scan collecting tokens to form a logical 
line distinct from a potential io-here body will have to treat an alias name as 
a command name and following arguments. Alias expansion occurs after this line 
is identified, in the context of seeing whether this line has multiple commands 
separated by semi-colons. For this to be reliable, keywords that establish 
contexts in which the meaning of overloaded operators such as ')' must be 
disambiguated need to be recognizable as such on that initial pass, not only on 
a subsequent one.



Re: sh: aliases in command substitutions

2020-04-20 Thread Robert Elz
    Date:        Mon, 20 Apr 2020 19:09:13 +0200
    From:        Joerg Schilling 
    Message-ID:  <5e9dd739.mzznyjmgko8+pliv%joerg.schill...@fokus.fraunhofer.de>

  | Whether it works to use the alias switch=case is a different thing.

  | If you like this to work, you need to have a lexer that expands
  | aliases before detecting keywords.

No, you'd need that if you wanted to be able to do

alias case=whatever

but that's entirely different (and non-POSIX).

  | "bash" does not seem to do this.

bash handles the "alias case=switch" just fine in general, it is just
its command substitution parser (when unquoted) which is not quite up
to the task (bash uses yacc (or bison)).

kre




Re: sh: aliases in command substitutions

2020-04-20 Thread Joerg Schilling
Robert Elz  wrote:

> (lines 74718-22, Issue 7 TC2 - 2018 edition) says ...
>
> The input characters within the quoted string that are also enclosed
> between "$(" and the matching ')' shall not be affected by the
> double-quotes, but rather shall define that command whose output
> replaces the "$(...)" when the word is expanded. The tokenizing rules
> in Section 2.3, not including the alias substitutions in Section 2.3.1,
> shall be applied recursively to find the matching ')'.
...

> Not even the broken pdksh, which seems to match that ')' after the second
> "foo" as terminating the command substitution, but then processes the
> alias anyway (later) and cannot find a valid case statement within the
> truncated command substitution, so generates a syntax error.
>
> But perhaps that is actually what the standard says must happen - we
> don't use the alias for finding the matching ')', but then do when
> parsing the command inside.  That would be a recipe for disaster,
> but if it is what old versions of ksh did/do then perhaps the standard
> really is requiring that?   If so, it is time for a change, as nothing
> relevant acts like that any more (not mksh, not ksh93, not bosh, ...)

I believe that David Korn at some time believed that he could write a simple 
parser for $(cmd) and introduced the opening '(' in case for simple counting 
symmetry.

This however does not work for other reasons.

ksh93 uses a recursive parser, and Thorsten Glaser, while rewriting pdksh into 
mksh and fixing plenty of bugs, also started to use a recursive parser for 
$(cmd).

bosh also uses a recursive parser, and it would be of interest whether anyone 
has succeeded in implementing $(cmd) without using a recursive parser.

ksh93 still uses a different method than bosh/mksh:

-    ksh93 recursively calls the parser to stop at the first superfluous ')'
     and records all characters read during this attempt.

-    bosh and mksh recursively call the parser and tell it to stop at a
     superfluous ')' and then translate the binary syntax tree created by
     the parser back into a command text.

Whether it works to use the alias switch=case is a different thing.

If you like this to work, you need to have a lexer that expands aliases before
detecting keywords. "bash" does not seem to do this.

Jörg

-- 
 EMail:jo...@schily.net                    (home) Jörg Schilling D-13353 Berlin
    joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/



Re: aliases in command substitutions

2020-04-20 Thread Robert Elz
    Date:        Mon, 20 Apr 2020 10:59:12 -0400
    From:        Chet Ramey 
    Message-ID:  <6802459a-6b68-5bd6-b535-401d5ec6b...@case.edu>

  | He's right, and it happened 30 years ago:

Ah, OK, thanks - so it was originally for that purpose, but isn't
needed for that any more (but is retained for back-compat, which makes
sense, it is harmless, and for paren matching editors (which is a frill)).

None of this has anything to do with whether aliases appearing in
command substitutions ought to be processed when seeking the terminating
')' or whether it makes sense for that answer to depend upon whether
the command sub was embedded within double quotes or not.

kre



Re: aliases in command substitutions

2020-04-20 Thread Chet Ramey
On 4/20/20 9:34 AM, Robert Elz wrote:

>   | but I've always understood the
>   | case xxx in
>   | (pattern) ...;;
>   | esac
>   |
>   | (fully parenthesized pattern) syntax to have been invented precisely
>   | to allow case statements in $() subshell notation,
> 
> First, $() is command substitution, not a subshell (not really important)
> and if that was someone's intent, they did a particularly bad job of
> implementing it, as what the standard says is (XCU 2.6.3)

He's right, and it happened 30 years ago:

"An optional open-parenthesis before pattern was added to allow numerous
historical KornShell scripts to conform. At one time, using the leading
parenthesis was required if the case statement were to be embedded within a
$( ) command substitution; this is no longer the case with the POSIX shell.
Nevertheless, many existing scripts use the open-parenthesis, if only
because it makes matching-parenthesis searching easier in vi and other
editors. This is a relatively simple implementation change that is fully
upward compatible for all scripts."

This is from 1991, and I'm certain, though I don't have it with me right
now, that the same text appeared in the 1992 version of the standard.
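
For reference, a small sketch of the two spellings that rationale is talking
about.  It assumes a shell whose command substitution parser handles the
unparenthesized pattern (per the text above, any POSIX shell should); the
function name `case_in_comsub_demo' and the words matched are illustrative
only.

```sh
# Illustrative only: the function name and the data are hypothetical.
case_in_comsub_demo() {
    # Pattern with the optional leading '(' (balanced parentheses).
    with_paren=$(case apple in (a*) echo fruit ;; esac)

    # Pattern without it: the ')' after a* must not end the $( ).
    without_paren=$(case apple in a*) echo fruit ;; esac)

    echo "with '(': $with_paren"
    echo "without '(': $without_paren"
}

case_in_comsub_demo
```

Both lines should report `fruit' in a shell that parses the command
substitution as a shell script rather than by counting parentheses.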


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: aliases in command substitutions

2020-04-20 Thread Robert Elz
    Date:        Mon, 20 Apr 2020 07:12:03 +
    From:        "Schwarz, Konrad" 
    Message-ID:  <38be7e5d52c74c9dac140f7de5105...@siemens.com>

  | Not sure if I understand your problem,

I suspect probably not.

  | but I've always understood the
  | case xxx in
  | (pattern) ...;;
  | esac
  |
  | (fully parenthesized pattern) syntax to have been invented precisely
  | to allow case statements in $() subshell notation,

First, $() is command substitution, not a subshell (not really important)
and if that was someone's intent, they did a particularly bad job of
implementing it, as what the standard says is (XCU 2.6.3)

    With the $(command) form, all characters following the open
    parenthesis to the matching closing parenthesis constitute the
    command. Any valid shell script can be used for command, except a
    script consisting solely of redirections which produces unspecified
    results.

Note the "any valid shell script" (with that one exception) - a valid shell
script certainly includes a case statement where the optional '(' is
omitted.

My guess has always been that the '(' was invented as a sop to parenthesis
balancing editors - to make it possible for those things to assist with
balancing parentheses.   But that's mere speculation, I wasn't around at
the time.  A workaround for broken shells is another possibility.

But the case statement in my e-mail was just an easily understood
(familiar) example I used to illustrate the real point, which relates
to whether or not aliases are to be processed in double-quoted command
substitutions (whether only for the point of finding the terminating ')'
or including actual processing).

A different example...

alias nest='('
nest echo foo )

works everywhere (as it should in any posix shell).

echo $( nest echo foo ) )

works in ash based shells, yash, bosh, mksh, and zsh, but not ksh93
or bash (or ancient pdksh).

That one should work everywhere, the command substitution contains
a valid shell script (which not only is not entirely redirections,
it has no redirections at all).

echo "$(nest echo foo ) )"

This one works everywhere (including ksh93 and bash) except old pdksh.
But according to the standard, shouldn't, as the standard prohibits
processing the "nest" alias when looking for the end of the command
substitution.

Given that, the command substitution command is (defined to be)

nest echo foo

which should fail, either because there is no nest command
(if no alias processing is done at all) or because of the
unmatched parentheses if the nest alias is later processed
(this last is what pdksh appears to do.)

In this scenario, the trailing " )" is just more data for the
echo command to write.

That is "my problem" - the standard is requiring processing in
a way that no-one relevant does it (any more).  It is time for
this part to be updated (if it hasn't already been.)
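
The three `nest' cases above can be tried from a script with something like
the following sketch.  The helper name `probe_nest' is invented here; it
evaluates one line with the alias defined and prints whatever the shell
produces, or "error" if the shell rejects the line.  It makes no claim about
what any particular shell will answer for the two command substitution cases,
since that is exactly what differs.

```sh
# Hypothetical probe: evaluates one snippet with the 'nest' alias defined
# and prints its output, or "error" if the snippet fails.
probe_nest() {
    (
        # bash expands aliases in scripts only when expand_aliases is set.
        if [ -n "${BASH_VERSION:-}" ]; then shopt -s expand_aliases; fi
        alias nest='('
        # eval parses the snippet only now, after the alias exists.
        eval "$1"
    ) 2>/dev/null || echo "error"
}

probe_nest 'nest echo foo )'             # plain use of the alias
probe_nest 'echo $( nest echo foo ) )'   # unquoted command substitution
probe_nest 'echo "$(nest echo foo ) )"'  # double-quoted command substitution
```

Only the first line is expected to print `foo' in every POSIX shell; the
other two report whatever the shell running the script happens to do.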

I'm not sure I really buy into Harald's "once a syntax error
is found, the shell can do whatever it wants" and I suspect
that given a bit of time to think about it, I could come up with
an example which has no syntax errors but where the interpretation
differs depending upon whether aliases are processed there or
not.

At the very least, we need an explanation why aliases aren't to be
processed when looking for the closing ')' in a double quoted command
substitution, but are in an unquoted one.   I suspect that the answer
is "because that's how ksh88 implemented it", which in this case is a
poor one - other bugs (and this really cannot be anything except that)
in ksh88 were fixed or worked around in the standard, this one should
have been as well.

Here's another test case to play with:

alias short='echo foo )'
( short

works everywhere.

echo $( short

works in all ash based shells, bosh, yash, mksh.

bash and zsh are very obviously buggy:

bash$ alias short
alias short='echo foo )'
bash$ ( short
foo
bash$ echo $( short
)
-bash: shor: command not found

)

(bash gave a PS2 prompt, and I typed an extra ')'.   No idea
what it did with my 't'...)

zsh $ alias short
short='echo foo )'
zsh $ echo $( short
)
zsh: parse error near `)'
zsh $ echo $( short
zsh: command not found: shortecho

Two tests there, the first (like with bash) I typed a ')' at the
PS2 prompt, or where I assume one was expected (zsh didn't write it)
after which it complained about the excess ')' (which is close to
correct).   If instead I typed a newline where the PS2 prompt might
have been, I got the 2nd response.   That's obviously simply a bug.
[I tried the newline response for bash as well, but it simply issued
a new PS2 and waited for more.]
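
To repeat the `short' tests non-interactively across several shells, a sketch
along these lines may help.  The helper name `run_short_case' and the list of
shell names are assumptions made for the example; shells that are not
installed are skipped.  Input is taken from /dev/null, so a shell that would
have issued PS2 sees end of input instead of waiting, which means the
interactive behaviour shown above is not reproduced exactly.

```sh
# Hypothetical helper: run one line that uses the 'short' alias in the
# shell named by $1.  The exit status is deliberately ignored; only the
# output (including any error message) is of interest.
run_short_case() {
    # $1 = shell to test, $2 = the single line that uses the alias
    "$1" -c '
if [ -n "${BASH_VERSION:-}" ]; then shopt -s expand_aliases; fi
alias short="echo foo )"
'"$2"'
' </dev/null 2>&1 || true
}

# The shell names below are just candidates; missing ones are skipped.
for candidate in sh dash bash mksh yash zsh; do
    command -v "$candidate" >/dev/null 2>&1 || continue
    echo "== $candidate"
    run_short_case "$candidate" '( short'
    run_short_case "$candidate" 'echo $( short'
done
```

The alias is defined on its own line of the -c string so that it has been
executed before the line using it is parsed.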

Lastly, with this one

echo "$( short"

there are no shells that do anything that I would have expected (not
even mine).   All wait for more input, some write PS2 (zsh does this time).
There doesn't seem to be any obvious input that will satisfy any of them

Re: XCU 2.14: Exit status by no-argument `return' in shell trap handlers

2020-04-20 Thread Koichi Murase
2020-04-20 1:42 Robert Elz :
> Probably not, bosh is derived from that shell (more or less) and it is
> also A
>
> [...]
>
> So are the FreeBSD and NetBSD shells (which is not surprising, as like
> dash, they're descendants of ash).
>
> You can also add zsh to A:

Thank you for the information.  I have also tested ksh93, pdksh and
oksh (OpenBSD KornShell) on FreeBSD.  `ksh93' is (B), and `pdksh' and
`oksh' do not implement the special treatment in trap handlers.  Here
is the current list:

  (A) zsh (zsh-5.7.1, zsh-5.6.2)
      ash variants (dash-0.5.10.2, busybox-1.28.3, FreeBSD sh, NetBSD sh),
      Bourne sh variants (heirloom-sh-050706-4, bosh-2020/04/18)
  (B) bash-4.4, gwsh,
      ksh variants (ksh-2020.0.0, ksh93.u_1, mksh-R57, mksh-R56)
  (C) yash-2.49
  (D) none
  (Not Implemented) bash-4.3, pdksh-5.2.14.2, oksh-6.6.1, posh-0.13.2

I think it is now rather clear that the current wording of POSIX is
somewhat ambiguous and that different interpretations are possible.  In
fact, there is a split, (A) vs (B), in shell implementations.

>> 2020-04-19 23:00 Harald van Dijk :
>>> It does still mean that anyone writing a function needs to beware that
>>> exit and exit $? do not have the same effect, if there is any
>>> possibility that the function will be invoked from a trap action. I
>>> suspect most people writing functions will not be aware of that.

Yes, that is the problem.  Actually, the original problem is that I
just want to perform `eval "$PROMPT_COMMAND"` in a trap action in a Bash
script where `$PROMPT_COMMAND' is provided by users.  If everything
were under my control, I could just always write `return $?', but the
commands in `$PROMPT_COMMAND' are specified by users who are unlikely
to care about this problem.  This specific case is just my personal
one, but I think the behavior (B) can cause similar problems
in other shell scripts in general.
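
A rough sketch of how the difference can be observed.  The helper name
`probe_trap_return' and the function names `f' and `g' are invented for the
example, and the sketch assumes the shell under test runs the USR1 trap once
the `kill' command has completed.  It reports the status seen after a
no-argument `return' and after `return $?', each in a function called from a
trap action; it does not itself decide which category a shell belongs to.

```sh
# Hypothetical probe: in the shell named by $1, call two functions from a
# trap action and print the status each one returns.
probe_trap_return() {
    "$1" -c '
f() { false; return; }      # no-argument return after a failing command
g() { false; return $?; }   # explicit return $? after a failing command
trap "f; echo plain=\$?; g; echo explicit=\$?" USR1
kill -USR1 $$               # exits 0, so the status before the trap is 0
:
' </dev/null 2>/dev/null
}

probe_trap_return sh
```

The `explicit=' line should always show 1.  The `plain=' line is the one
that varies: 1 if the function's own last status is used, 0 if the status
from before the trap action started is used instead.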

>> 2020-04-19 20:51 Robert Elz :
>>> [...]
>>>
>>> The end result, unless we can get agreement that some
>>> implementations are buggy, and will be fixed (which given the
>>> split seems an unlikely outcome) is likely to simply be that all
>>> of this simply becomes unspecified (or perhaps we could hope,
>>> implementation defined) which will mean even more cases where it
>>> becomes more difficult to write portable reliable code.
>>>
>>> kre

>> 2020-04-19 23:00 Harald van Dijk :
>>> True, and if the intent is that exit and return behave differently
>>> and the standard is updated to clearly require that, I have no
>>> problem changing the shell back to the prior behaviour.

According to these comments, it seems like a deadlock. The standard
will not change until shells change their behavior to match each
other, and shells will not change their behavior until the standard is
clarified.  Actually I can understand both sides, but I think we can
find a point of compromise.

There are already different implementations, so it is already difficult
to write portable and reliable code.  Some shells, such as
`pdksh', `oksh' and `posh', do not even implement the special behavior of
`return' in trap actions at all, so the standard does not describe the
current situation properly.  Thus I think the side effect of making it
unspecified is limited.  Maybe we can first let it be unspecified and
then wait to see whether the shells will switch the behavior from the
literal reading (B) to the more sensible interpretation (A) or not.

- It might be difficult to change the behavior (B) of `ksh' and `mksh'
  because their behavior has been unchanged at least since 1993.

- `bash', on the other hand, implemented the behavior (B) in bash-4.4,
  which is relatively recent in its long history, so maybe we can hope
  for a change.

- Harald: The remaining shell with (B) in the list is `gwsh'.  For
  example, if the standard changes its description to `unspecified'
  (instead of clearly requiring (A) or other interpretation), do you
  think you have a chance to change the behavior back?

- I think `yash', with the interpretation (C), will follow the standard
  as soon as the standard clarifies the intended behavior, because
  `yash' aims to strictly support POSIX.

Thank you.

--
Koichi