Re: [1003.1(2013)/Issue7+TC1 0000953]: Alias expansion is under-specified

2019-01-09 Thread Harald van Dijk

On 09/01/2019 12:29, Austin Group Bug Tracker wrote:

[...] > --
  (0004201) geoffclare (manager) - 2019-01-09 12:29
  http://austingroupbugs.net/view.php?id=953#c4201
--
This is a proposed new resolution which addresses comments made since
http://austingroupbugs.net/view.php?id=953#c3113 both here and on the mailing
list.  There have been a
lot of comments, so if I missed anything please reply on the
mailing list and (if I agree) I will edit this note.
[...]
On page 2351 line 74901-74904 (XCU 2.5.3 Shell Variables)
change:

This variable, when and only when an interactive shell
is invoked, shall be subjected to parameter expansion (see Section 2.6.2)
by the shell and the resulting value shall be used as a pathname of a file
containing shell commands to execute in the current
environment.

to:

This variable, when and only when
an interactive shell is invoked, shall be subjected to parameter expansion
(see Section 2.6.2) by the shell and the resulting value shall be used as a
pathname of a file.  Before any interactive commands are read, the shell
shall tokenize (see [xref to XCU 2.3 Token Recognition]) the contents of
the file, parse the tokens as a program (see [xref to XCU 2.10 Shell
Grammar]), and execute the resulting commands in the current environment.
(In other words, the contents of the ENV file are not parsed as a single
compound_list, unlike the contents of a dot script.  This
distinction matters because it influences when aliases take
effect.)


This last bit was part of an earlier version, but it no longer fits now 
that the contents of a dot script are no longer required to be parsed as 
a compound_list. The rest of the comment still makes perfect sense if 
you take out the ", unlike the contents of a dot script" bit, I think.
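
As a rough illustration of why the parsing unit influences when aliases
take effect, the sketch below runs a small script through sh. It is not an
ENV file (those are only read by interactive shells). It assumes that sh is
a POSIX-style shell which reads a script file one command at a time and
expands aliases in scripts (dash, or bash invoked as sh, for example), and
that mktemp is available. The names greet_a and greet_b are made up for
the example.

```shell
# Assumption: "sh" reads the script command by command and expands aliases.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# Defined on one line, used on the next: already parsed as an alias.
alias greet_a='echo hello'
greet_a
# Defined and used inside one brace group: the whole group is parsed
# before the alias command runs, so greet_b is still an unknown command.
{ alias greet_b='echo hi'; greet_b; } 2>/dev/null || echo "greet_b not yet an alias"
# A later command sees the alias.
greet_b
EOF
env_out=$(sh "$tmp")
rm -f "$tmp"
printf '%s\n' "$env_out"
```

With the shells assumed above this should print "hello", then
"greet_b not yet an alias", then "hi".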


Cheers,
Harald van Dijk



Re: [1003.1(2013)/Issue7+TC1 0000953]: Alias expansion is under-specified

2019-01-09 Thread Robert Elz
Date:Wed, 9 Jan 2019 17:35:10 +
From:Stephane Chazelas 
Message-ID:  <20190109173510.xn4hdeqphbffb...@chaz.gmail.com>

  | I'd rather POSIX forbade applications to use "while", "until",
  | "do", "select", "time", etc in alias names, or leave it
  | unspecified whether aliases for those are expanded.

A lot of what you say I think, which I believe to be mostly correct,
comes down to the issue of for whom the standard is intended.

As long as it is expected that the audience is script writers, then
the doc should tell them what they can expect will work, and
what will not (or may not) so that portable applications can be
created.   I think the current wording does that, as shells do actually
allow aliases to be created for keywords (in all shells for the ones
that are also English words - or similar, like fi etc, and in some shells
even for the others (! { ...)).

I don't think we are in a position to forbid anything, even if we wanted,
but I assume you mean "would result in unspecified behaviour" if
an application makes an alias for a keyword - I'd have no real problem
with that.

  |  but more about not requiring limitations of the original implementation
  | when they're not justified.

If that were the objective, then the audience of the standard would need
to be the shell implementors, rather than the script writers.   And in that
case I assume the objective would be to allow exactly what you wanted
to forbid just above (though in your message, the two quoted occurred in
the alternate sequence) and instead allow the shell to expand aliases that
are keywords, everywhere.   I'd have no problem with that either.

As long as we're not explicitly covering both audiences (with different
text for each when required) we cannot really do both however.  Many
(most, perhaps even all) shells do not allow an alias to replace an
actual keyword (as distinct to a word with the same spelling used elsewhere)
so we cannot suggest that it even might be OK.   Nor can we tell the
shells not to expand words that would be keywords when used elsewhere
as currently users have the ability to do that, and we cannot break
existing conforming applications.

So, rock, meet the hard place...

kre

ps: the one incorrect (but irrelevant for your points) part of your message
was the "alias 'while=until'" ... since "until" is a keyword, that's not an 
alias that expands to what could be a simple command, and any use of
it (in any context) would be unspecified.   However you made no use
anywhere of that "until", it could have just as easily been "foo", so
this is an insignificant issue.




[OT] builtin to eval code with arguments (Was: Alias implementations being invalidated by proposed new wording?)

2019-01-09 Thread Stephane Chazelas
2019-01-09 18:24:47 +0100, Joerg Schilling:
[...]
> They also had the idea of implementing a shell builtin that behaves like:
> 
>   sh -c cmd args
> 
> and thus could support parameterized macros.
[...]

That can only really be used for "parameterized macros" that
could be done as functions.

In POSIX shells, you can write that builtin as a function:

eval_with_args() {
  eval "shift; $1"
}

eval_with_args 'shell code' args
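
A small usage sketch of the function above, mirroring the es and zsh
examples in this message. The code string and the arguments a, b, c are
only illustrative; the sh -c line is included for comparison and needs an
extra word ("sh" here) to fill $0.

```shell
# Function as given in the message.
eval_with_args() {
  eval "shift; $1"
}

# After the shift, $1 and $2 inside the evaluated code are "a" and "b".
pair=$(eval_with_args 'echo "$1, $2"' a b c)
printf '%s\n' "$pair"

# The same code run through sh -c; the first argument after the code is $0.
sh_pair=$(sh -c 'echo "$1, $2"' sh a b c)
printf '%s\n' "$sh_pair"
```

Both lines print "a, b". Unlike sh -c, the function runs the code in the
current shell, so variable assignments made by the code persist.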

Though more practical would be the lambdas of "es" (based on
the Unix variant of "rc"):

$ es -c '@ {echo $1, $2} a b c'
a, b

Or:

$ es -c '@ x y {echo $x, $y(2)} a b c'
a, c

Or the anonymous functions of zsh:

$ zsh -c '(){echo $1, $2} a b c'
a, b

But there would be little point in declaring aliases for that.
You'd define normal "named" functions instead.

-- 
Stephane



Re: [1003.1(2013)/Issue7+TC1 0000953]: Alias expansion is under-specified

2019-01-09 Thread Stephane Chazelas
One concern I have is that if I understand correctly, it
*allows* application to do:

alias 'while=until'

(though doesn't for other keywords like "{", "!")

and then *requires* implementations to expand "while" in

alias 'echo_expand=echo '
echo_expand while

and *requires* implementations *not* to expand "while" in

while true; do ...; done

Which prevents implementations from doing the kind of alias
expansion done by csh or zsh (more useful IMO, as it is then
similar to what the C preprocessor macro expansion does and was
I believe the original intention for aliases; that can be useful
for all sorts of code instrumentation though quite limited
without parameterized aliases).

Also, again, ksh88 (and so the POSIX sh of most commercial
Unices) does allow "select" (a keyword, as allowed by POSIX) to
be aliased.

Currently, in POSIX mode, zsh doesn't do alias expansion for
keywords, including in the echo_expand case. It's not about zsh,
I'm sure zsh will align to whatever POSIX requires for its POSIX
mode, but more about not requiring limitations of the original
implementation when they're not justified.

I'd rather POSIX forbade applications to use "while", "until",
"do", "select", "time", etc in alias names, or leave it
unspecified whether aliases for those are expanded.

-- 
Stephane



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Joerg Schilling
Robert Elz  wrote:

>   | Then in 1980, former AT&T people that created the company "Charles River
>   | Data Systems" and the first UNIX clone "UNOS" created an alias implementation
>   | concept that sits in the lexer and expands text. This is the most powerful
>   | alias concept that has been implemented for expansion in the lexer.
>
> That's interesting - I had the misfortune to use unos for a (short)
> while, and don't recall ever knowing of that.

They also had the idea of implementing a shell builtin that behaves like:

sh -c cmd args

and thus could support parameterized macros.

>   | For this reason, it is natural not to implement a special meaning for "\ ".
>   | ksh88 and ksh93 seem to be the only shells that implement a special
>   | meaning for "\ " here.
>
> Which special meaning do you refer to there?

Not to expand further aliases.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



[1003.1(2016)/Issue7+TC2 0001224]: Conflict between 2.9.1 and 2.10.2 re simple command terminator

2019-01-09 Thread Austin Group Bug Tracker


The following issue has been SUBMITTED. 
== 
http://austingroupbugs.net/view.php?id=1224 
== 
Reported By:geoffclare
Assigned To:
== 
Project:1003.1(2016)/Issue7+TC2
Issue ID:   1224
Category:   Shell and Utilities
Type:   Error
Severity:   Editorial
Priority:   normal
Status: New
Name:   Geoff Clare 
Organization:   The Open Group 
User Reference:  
Section:2.9.1 
Page Number:2365 
Line Number:75483 
Interp Status:  --- 
Final Accepted Text: 
== 
Date Submitted: 2019-01-09 15:37 UTC
Last Modified:  2019-01-09 15:37 UTC
== 
Summary:Conflict between 2.9.1 and 2.10.2 re simple command
terminator
Description: 
Section 2.9.1 says:

A ``simple command'' is a sequence of optional variable
assignments and redirections, in any sequence, optionally followed
by words and redirections, terminated by a control operator.

This suggests that a simple command includes the terminating control
operator (in the same way that a line includes the terminating <newline>),
but this conflicts with the grammar in 2.10.2 where the simple_command
production does not include the terminator.

Since the grammar has precedence over the text syntax description,
the erroneous text "terminated by a control operator" can be removed
from 2.9.1 as an editorial change without affecting the requirements of
the standard.
Desired Action: 
Delete ", terminated by a control operator".
== 

Issue History 
Date ModifiedUsername   FieldChange   
== 
2019-01-09 15:37 geoffclare New Issue
2019-01-09 15:37 geoffclare Name  => Geoff Clare 
2019-01-09 15:37 geoffclare Organization  => The Open Group  
2019-01-09 15:37 geoffclare Section   => 2.9.1   
2019-01-09 15:37 geoffclare Page Number   => 2365
2019-01-09 15:37 geoffclare Line Number   => 75483   
2019-01-09 15:37 geoffclare Interp Status => --- 
==




Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Chet Ramey
On 1/7/19 6:55 AM, Joerg Schilling wrote:

> The way I have the teleconference in mind where we set up the new text, the
> above commands cause undefined results because the shell is _allowed_ but
> not required to parse scripts as a whole under some conditions.

I think Geoff's proposed resolution from today does that. The original and
revised proposals for bug 953 did not.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [1003.1(2013)/Issue7+TC1 0000953]: Alias expansion is under-specified

2019-01-09 Thread Geoff Clare
Robert Elz  wrote, on 09 Jan 2019:
>
> There are just a couple of minor points that I have with your
> wording, one where I think a little more clarity is needed, and
> one where your wording isn't quite correct.
> 
> 
>   | ... change to:
> 
>   | After a TOKEN has been delimited,
> 
> This is where I think a little extra clarity would help, and
> I'd change that to be
> 
>   After a token of type TOKEN [xref XCU 2.10.1] has
>   been delimited,
> 
> just to make it clear that TOKEN is a specific type of token,
> and not just a weird typographical convention (it helps readers
> interpret the meaning more easily).

The sentence before this (at the end of 2.3) is "Once a token is
delimited, it is categorized as required by the grammar in [xref to
2.10]", so I'd like to go with:

After a token has been categorized as type TOKEN (see [xref to 2.10.1]),

>   | If the value of the alias replacing the TOKEN ends in a <blank> that would
>   | be unquoted after substitution, and optionally if it ends in a <blank>
>   | that would be quoted after substitution, the shell shall check the next
>   | TOKEN in the input for alias substitution;
> 
> This is where the wording is incorrect, it is not the next TOKEN, which
> would imply simply skipping intermediate operators, etc, but the next
> token, if and only if, it is a TOKEN, that it is considered for alias 
> substitution.
[...]
> 
> So I would change
>   shall check the next TOKEN in the input
> into
>   shall check the next token in the input, if it is a TOKEN,

Good catch - I'll make that change.

I've also just noticed that 2.10.1 and 2.10.2 have TOKEN in bold everywhere,
so I suppose I should do the same.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Robert Elz
Date:Wed, 9 Jan 2019 13:27:58 +
From:"Schwarz, Konrad" 
Message-ID:  


  | I think it would reduce confusion if it were explicitly mandated.

That is not what this group does, and not what any standards
group should do - the objective is to work out what is the
accepted standard, and document it, so that others can rely
upon that (both to use, and to produce new implementations that
will satisfy users' expectations).

kre



Re: [1003.1(2013)/Issue7+TC1 0000953]: Alias expansion is under-specified

2019-01-09 Thread Robert Elz
Date:Wed, 9 Jan 2019 12:29:45 +
From:Austin Group Bug Tracker 
Message-ID:  <95df9c99cbc201dbbf9de3d53079d...@austingroupbugs.net>

  |  please reply on the
  | mailing list and (if I agree) I will edit this note.

I wish the part of all of this that really belongs in the resolution
of issue 1055 had been left for that one rather than all included
here, and then, one assumes also all included there - as that
issue covers more than aliases, yet has the exact same issues.

That said, I don't disagree with the proposed resolution of any
of that issue in your new wording as it affects aliases, it just
ought all be worded more generally so it applies to everything.


There are just a couple of minor points that I have with your
wording, one where I think a little more clarity is needed, and
one where your wording isn't quite correct.


  | ... change to:

  | After a TOKEN has been delimited,

This is where I think a little extra clarity would help, and
I'd change that to be

After a token of type TOKEN [xref XCU 2.10.1] has
been delimited,

just to make it clear that TOKEN is a specific type of token,
and not just a weird typographical convention (it helps readers
interpret the meaning more easily).


  | If the value of the alias replacing the TOKEN ends in a <blank> that would
  | be unquoted after substitution, and optionally if it ends in a <blank> that
  | would be quoted after substitution, the shell shall check the next TOKEN in
  | the input for alias substitution;

This is where the wording is incorrect, it is not the next TOKEN, which
would imply simply skipping intermediate operators, etc, but the next
token, if and only if, it is a TOKEN, that it is considered for alias 
substitution.

There is exactly one chance for this particular alias lookup, blink
and you miss it - the very next token needs to be a valid alias,
otherwise the whole thing stops.   Only whitespace that produces
no tokens can intervene (in the original input) between the TOKEN
that was the alias with a value ending with a blank, and the
prospective new alias TOKEN.

Even a comment appearing between ends it, not because of the
comment, which the lexer just drops, but because a comment only
ends when a newline is seen, and that's an operator token, and
so cannot be aliased.   (Of course, usually then the next word would
be looked up as an alias anyway, as the word in a command word
position, but that was not because of the trailing blank.)

So I would change
shall check the next TOKEN in the input
into
shall check the next token in the input, if it is a TOKEN,
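
The basic trailing-blank rule being discussed can be sketched as follows.
This only shows the uncontroversial case (the very next token is a plain
word); it assumes sh is a POSIX-style shell that expands aliases in scripts
and that mktemp is available. The alias names say, say_noblank and name are
made up for the example.

```shell
# Assumption: "sh" expands aliases when reading a script file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alias say='echo '          # value ends in a blank
alias say_noblank='echo'   # value does not end in a blank
alias name='world'
say name          # next token is checked for aliases: runs "echo world"
say_noblank name  # next token is left alone: runs "echo name"
EOF
blank_out=$(sh "$tmp")
rm -f "$tmp"
printf '%s\n' "$blank_out"
```

This prints "world" and then "name".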

kre



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Chet Ramey
On 1/9/19 8:27 AM, Schwarz, Konrad wrote:
>> -Original Message-
>> Expressly making it defined that
>>  alias foo='whatever \ '
>> which does end in a space (but otherwise is the exact same thing as the 
>> previous one) also does not expand aliases in the following
>> word
>> seems redundant to me.   Since several shells (but not all) do expand
>> aliases in this case, it seems to me the best thing to do is to leave this 
>> as unspecified, such that no-one sane will ever use it (if
>> something is needed, just use the previous form -- but better is not to use 
>> aliases at all.)
> 
> Coming from ksh, I've always understood the alias mechanism to work at the 
> lexical level (macro expansion with rescanning); the quoting behavior above 
> is the most natural in that context.
> 
> I think it would reduce confusion if it were explicitly mandated.

Regardless, the fact that existing shells do it differently is reason
enough to not mandate a particular behavior.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Robert Elz
Date:Wed, 9 Jan 2019 13:55:16 +0100
From:Joerg Schilling 
Message-ID:  <5c35ef34.clu1godeocvhzivr%joerg.schill...@fokus.fraunhofer.de>

  | Well, the original Bourne Shell did not have aliases.

Yes, I know that...

  | I believe that csh introduced an alias concept in 1979 that works completely
  | different from what ksh implemented much later.

Yes, I think the ashell version was rather simpler, but certainly the csh
aliases were quite different.

  | The difference, I believe is that alias expansion happens at a different 
  | location in csh.

Yes, it has the whole command line of the command containing the alias
available, and can use parts of that command (or of previous commands,
using the history mechanisms) as part of the generated expansion.

  | Then in 1980, former AT&T people that created the company "Charles River
  | Data Systems" and the first UNIX clone "UNOS" created an alias implementation
  | concept that sits in the lexer and expands text. This is the most powerful
  | alias concept that has been implemented for expansion in the lexer.

That's interesting - I had the misfortune to use unos for a (short)
while, and don't recall ever knowing of that.

  | For this reason, it is natural not to implement a special meaning for "\ ".
  | ksh88 and ksh93 seem to be the only shells that implement a special meaning
  | for "\ " here.

Which special meaning do you refer to there?

  | > the replacement test would be, effectively
  | >
  | >   "ls -CF" .
  | >
  | > (the quotes would not be there, but the ls -CF
  | > part would be a single 6 character word) and that
  | > would be the command word of the generated command.
  |
  | This does not happen as the lexer is called again and creates two word
  | tokens from the alias replacement.

Yes, I know that, but that isn't what the current published standard says
(which is why bug report 953 was filed I assume, what the standard said
about aliases was nothing like reality.)


  | > In the example in question here, the original text is
  | >
  | >   3>&1 command
  | >
  | > and we have "alias 3=4"

  | "3" is not a word that is in a position of a potential command name.

As far as the lexer is concerned it is.

  | If the lexer did parse the input in a way that does not connect "3" to the
  | IO redirection,

It is connected, in the sense that it is an IO_NUMBER, but that is a word.

  | it would be alias expanded,

Did you mean "not" there?

  | since the knowledge about "a word at a 
  | position suitable for a command name" is not in the lexer.

Yes, it is, it has to be to implement aliasing the way the
standard requires (assuming that aliasing is done in the
lexer, which it almost always is).  It might not be available
in your shell, but it generally is in others.   The grammar
(well, the parser, using the grammar) makes it known when
it is fetching a token which could be the command name of
a simple command (then the lexer uses that info to decide
whether to do an alias lookup - usually the lexer also does
keyword lookups as well, and returns different token names
for the different keywords, but that's just an implementation
choice).

None of this matters as long as Geoff's recent new proposed
wording as the resolution of 953 is (mostly) accepted, as all
of these issues are cleaned up - it is no longer "word" that
is expanded, but TOKEN, which is a subset of word that does
not include the IO_NUMBER.
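
A sketch of the "3>&1 command" case, to show what existing shells do with
an alias named 3. It assumes sh is a POSIX-style shell that expands aliases
in scripts, accepts 3 as an alias name, and recognises the digit before the
redirection operator as an IO_NUMBER (dash and bash invoked as sh are
believed to behave this way); mktemp is also assumed. The alias value is
chosen only to make an expansion visible.

```shell
# Assumption: "sh" expands aliases in scripts and allows the alias name "3".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alias 3='echo aliased'
3>&1 echo redirected   # "3" is the IO_NUMBER of a redirection: no alias lookup
3 plain                # "3" is an ordinary command word: alias is expanded
EOF
io_out=$(sh "$tmp")
rm -f "$tmp"
printf '%s\n' "$io_out"
```

With the shells assumed above this prints "redirected" and then
"aliased plain".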

kre




RE: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Schwarz, Konrad
> -Original Message-
> Expressly making it defined that
>   alias foo='whatever \ '
> which does end in a space (but otherwise is the exact same thing as the 
> previous one) also does not expand aliases in the following
> word
> seems redundant to me.   Since several shells (but not all) do expand
> aliases in this case, it seems to me the best thing to do is to leave this as 
> unspecified, such that no-one sane will ever use it (if
> something is needed, just use the previous form -- but better is not to use 
> aliases at all.)

Coming from ksh, I've always understood the alias mechanism to work at the 
lexical level (macro expansion with rescanning); the quoting behavior above is 
the most natural in that context.

I think it would reduce confusion if it were explicitly mandated.



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Joerg Schilling
Robert Elz  wrote:

> Date:Tue, 8 Jan 2019 23:01:05 +
> From:Stephane Chazelas 
> Message-ID:  <20190108230105.43xiupnfx4qwy...@chaz.gmail.com>
>
>   | aliases come from csh which did not do that expansion after
>   | trailing blank thing.
>
> Actually from ashell, the precursor of csh - that I knew, though csh
> aliases were quite a different thing, closer in some respects to sh
> functions than sh aliases (but like much of csh, it really all was a
> bit of a mess.)

Thank you for this hint, so aliases are from late 1977 already.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Joerg Schilling
Robert Elz  wrote:

> Date:Tue, 8 Jan 2019 16:51:04 +
> From:Geoff Clare 
> Message-ID:  <20190108165104.GA31969@lt2.masqnet>
>
>
>   | Given Chet's reply, it looks like there may be more shells that do expand
>   | than don't.  In which case I wonder why that "unquoted" text got added
>   | in 2016.
>
> I don't know  the history of aliases in sh (Joerg?) it may be that they 

Well, the original Bourne Shell did not have aliases.

I believe that csh introduced an alias concept in 1979 that works completely
different from what ksh implemented much later.

The difference, I believe is that alias expansion happens at a different 
location in csh.

Then in 1980, former AT&T people that created the company "Charles River Data
Systems" and the first UNIX clone "UNOS" created an alias implementation 
concept that sits in the lexer and expands text. This is the most powerful
alias concept that has been implemented for expansion in the lexer.

I reimplemented that concept in 1984 and added it to my (non Bourne based) 
shell from that time.

ksh88 came up with a similar concept in the lexer.

What was not in use at that time, but what ksh88 invented in this context, is:

-   automated termination of alias expansion for a specific alias if this 
alias already has been expanded from the same source "word".

-   Aliases that end in a space do not switch off alias expansion for
further words in the command.
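
The first of these two features (no re-expansion of an alias that is
already being expanded) can be sketched like this. It assumes sh is a
POSIX-style shell that expands aliases in scripts and that mktemp is
available; the alias value is only an illustration.

```shell
# Assumption: "sh" expands aliases when reading a script file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# The value starts with the alias's own name. The inner "echo" is not
# expanded again, so the substitution terminates after one step.
alias echo='echo prefix:'
echo hi
EOF
rec_out=$(sh "$tmp")
rm -f "$tmp"
printf '%s\n' "$rec_out"
```

This prints "prefix: hi" rather than looping.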

In Summer 2012, I added my alias implementation from 1984 to the Bourne Shell 
and I did this without first looking at the ksh source.

Then I discovered that ksh has the two additional features mentioned above and 
implemented them. After that, I did some testing and did see that my alias 
implementation is more powerful than the one in ksh but compatible to what ksh
does - except for the "\ " problem that I have not been aware of.

For a description see:

http://schillix.sourceforge.net/man/man1/bosh.1.html

Section "Aliases" that currently starts at page 7, the "alias" command currently
starting at page 36 and the "unalias" command currently starting at page 68.

Given that my implementation has been done without looking at ksh, it seems
that what bosh and ksh do very similarly is the "natural behavior" of such an
implementation.

> originally appeared back when any quoted character "looked different"
> internally to the same character, unquoted, and that the test was just
>   ch == ' '
> and not
>   (ch & ~SQUOTE)  == ' ' 
>
> so only unquoted spaces worked..   And then that behaviour was retained
> in derived shells, even after the quoting encoding method was altered,
> and that those shells were the ones mostly considered when the text was
> written.

In former times, this was the case. Now the lexer uses a mix of wide characters 
(where a quoted char is still a char with the top bit on) and multi byte chars 
(where "\ " is propagated the way it was read).

Since the location in the lexer that deals with aliases does not deal with 
single characters (except for the peek()ed char that was seen after the word 
that is going to be alias expanded), but rather uses words in strings, this is 
a place that uses "multi byte chars".

For this reason, it is natural not to implement a special meaning for "\ ".
ksh88 and ksh93 seem to be the only shells that implement a special meaning for 
"\ " here.

> Treated literally the quoted words above would mean that if we have
>
>   alias foo=bar
>
> and the input is
>
>   foo 1 2 3
>
> then the "foo" being in a command word position, and also a defined alias
> would simply be replaced by the word "bar" and we'd be done.
>
> But that would mean that in the example in the alias page in XCU 4,
> where
>   alias lf='ls -CF'
> if the input is
>
>   lf .
>
> the replacement test would be, effectively
>
>   "ls -CF" .
>
> (the quotes would not be there, but the ls -CF
> part would be a single 6 character word) and that
> would be the command word of the generated command.

This does not happen as the lexer is called again and creates two word tokens 
from the alias replacement.
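
A sketch of that re-tokenisation: the alias value below contains a blank,
and after substitution it is seen as two words (a command name and its
first argument), not as one word with an embedded space. It assumes sh is a
POSIX-style shell that expands aliases in scripts and that mktemp is
available; the alias name show is made up.

```shell
# Assumption: "sh" expands aliases when reading a script file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alias show="printf '%s,'"
# After substitution the lexer sees: printf '%s,' a b
show a b
EOF
retok_out=$(sh "$tmp")
rm -f "$tmp"
printf '%s\n' "$retok_out"
```

This prints "a,b,". If the value were kept as a single word, the shell
would instead look for a command whose name contains a space.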

> In the example in question here, the original text is
>
>   3>&1 command
>
> and we have "alias 3=4"
>
> "3" is a valid alias name (alias names are not required to start
> with an alpha) and when the tokeniser is run there, we are starting
> in the state where we are at the command word position (you have
> to assume this here from the context, but take it as an axiom
> for this example).   There the first token produced by the lexer
> is the IO_NUMBER "3", which is a word according to XBD 3.446,
> and thus, according to the (current and proposed) spec for alias
> processing, should be subject to alias replacement.

"3" is not a word that is in a position of a potential command name.

If the lexer did parse the input in a way that does not connect "3" to the IO 
redirection, it would be alias expanded, since the knowledge about 

[1003.1(2013)/Issue7+TC1 0000953]: Alias expansion is under-specified

2019-01-09 Thread Austin Group Bug Tracker


A NOTE has been added to this issue. 
== 
http://austingroupbugs.net/view.php?id=953 
== 
Reported By:wpollock
Assigned To:ajosey
== 
Project:1003.1(2013)/Issue7+TC1
Issue ID:   953
Category:   Shell and Utilities
Type:   Clarification Requested
Severity:   Objection
Priority:   normal
Status: Interpretation Required
Name:   Wayne Pollock 
Organization:
User Reference:  
Section:2.3.1 Alias Substitution 
Page Number:2322 
Line Number:73690-73705 
Interp Status:  Pending 
Final Accepted Text:See http://austingroupbugs.net/view.php?id=953#c3113

== 
Date Submitted: 2015-06-04 00:22 UTC
Last Modified:  2019-01-09 12:29 UTC
== 
Summary:Alias expansion is under-specified
==
Relationships   ID  Summary
--
related to  736 grammatically accept zero or more Shell...
related to  0001048 deprecate alias and unalias
related to  0001055 unspecified how much is parsed before e...
== 

-- 
 (0004201) geoffclare (manager) - 2019-01-09 12:29
 http://austingroupbugs.net/view.php?id=953#c4201 
-- 
This is a proposed new resolution which addresses comments made since
http://austingroupbugs.net/view.php?id=953#c3113 both here and on the mailing
list.  There have been a
lot of comments, so if I missed anything please reply on the
mailing list and (if I agree) I will edit this note.

All page and line numbers are for the 2016 and 2018 editions.

On page 2348 line 74794-74805 (XCU 2.3.1 Alias Substitution),
change:

After a token has been delimited, but before applying
the grammatical rules in Section 2.10, a resulting word that is identified
to be the command name word of a simple command shall be examined to
determine whether it is an unquoted, valid alias name.  However, reserved
words in correct grammatical context shall not be candidates for alias
substitution.  A valid alias name (see XBD Section 3.10) shall be one that
has been defined by the alias utility and not subsequently undefined
using unalias.  Implementations also may provide predefined valid
aliases that are in effect when the shell is invoked. To prevent infinite
loops in recursive aliasing, if the shell is not currently processing an
alias of the same name, the word shall be replaced by the value of the
alias; otherwise, it shall not be replaced.

If the value of the alias replacing the word ends in a <blank>, the shell
shall check the next command word for alias substitution; this process
shall continue until a word is found that is not a valid alias or an alias
value does not end in a <blank>.

to:

After a TOKEN
has been delimited, including (recursively) any token resulting from an
alias substitution, the TOKEN shall be subject to alias substitution if:

-   the TOKEN does not contain any quoting characters,
-   the TOKEN is a valid alias name (see XBD Section 3.10),
-   an alias with that name is in effect,
-   the TOKEN did not result from an alias substitution of the same alias
    name at any earlier recursion level,
-   the TOKEN is not recognized as a reserved word (see [xref to 2.4
    Reserved Words] and the examples in [xref to XRAT C.2.3.1]), and
-   the TOKEN will be parsed as the command name word of a simple command
    when the grammatical rules in Section 2.10 are applied.

An implementation may defer the effect of a change to an alias but the change
shall take effect no later than the completion of the currently executing
complete_command (see [xref to XCU 2.10 Shell Grammar]).  Changes to
aliases shall not take effect out of order.  Implementations may provide
predefined aliases that are in effect when the shell is invoked.

If the value of the alias is not a simple command (see [xref to 2.9.1]), or
contains any of:

-   a comment
-   a variable assignment
-   a redirection
-   unbalanced single-quotes or double-quotes

(except within a command substitution), the behavior is unspecified. When a
TOKEN is subject to alias substitution, the value of the alias shall be
processed to form tokens (see [xref to 2.3]) and the resulting tokens shall
replace the TOKEN.

If the value of the alias 

Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Robert Elz
Date:Wed, 9 Jan 2019 10:44:00 + (UTC)
From:Shware Systems 
Message-ID:  <886936614.8618146.1547030640...@mail.yahoo.com>

  | Alias bodies may include entire or partial compound statements, expansions,
  | redirections, and unclosed strings of the <">, <'>, or <$'> sort, that 
depend on
  | or can be modified by context after the alias when that is appended to the 
alias body;

They can (they can contain anything) - though I find it hard to imagine how
what comes after the alias can modify anything that the alias generated (unless
you mean token combination - the turning of '&' that ends an alias into '&&'
when the delimiter character of the alias had been another '&').

But...

  | the addition covers the latter cases too, not <\ > only.

no, it doesn't, as all those other cases have been made explicitly
unspecified, and it would be bizarre for the standard to specify what
should happen in a case where the results are already unspecified.

This change was made in the proposed resolution of issue 953 almost
3 years ago now, and is one part of that proposed resolution which I
do not believe that anyone disputes.

  |  Your example requires a second <'> follow the alias name in all scripts
  | using it in that following context, because it introduces an unclosed
  | string.

No it didn't, you did not look carefully enough.   I am only giving examples
of uses which will remain specified in the new text - to do otherwise would
be foolish (even for me).

  |  There is no closing quote presumed at the end of an alias body,

No, but there was one explicitly there.

  | and no implementation precludes a <;> that effectively terminates the
  | command the alias represents.

Sorry, I have no idea what that means, or how it is relevant to the
current discussion - perhaps you could give an example?

  | It isn't a question of whether it's sane or not to use aliases this way,
  | it's what is actually permitted by implementations

No, what matters (here) is what is specified to work by the standard.
Anything that results in unspecified behaviour we can leave for the
implementations to work out for themselves.

  | when the text of the recursively expanded alias body has the
  | following text appended to it, whether the following text comes from
  | lookahead tokens or the original source line.

Again, I am lost trying to determine the relevance of that part.

And this is from your included copy of my message (with most
of it removed)

  | On Wednesday, January 9, 2019 Robert Elz  wrote:

  | next word be subject to alias expansion, all that is needed is to define it
  | like
  |   alias foo="whatever ' '"
  | and then the last character is not a space (it is a single quote)

I know it is hard to read, but the following text actually said what was
there, the alias value is the word "whatever" followed by an unquoted
space, followed by the single quoted string ' ' (with both opening and
closing quotes present).  That's what causes "the last character is not a
space" as stated ... it is a single quote.

If that terminating quote had not been there, and the alias value had
been just

whatever ' 

(there is a space after that single quote, but nothing else) then it would
not be possible to correctly parse that as a simple command, as it
contains an unterminated string (ie: syntax error), and consequently,
the behaviour would be unspecified.

This allows implementations to handle it however they like - which
is what users who attempt this kind of thing need to deal with, as
different shells process this kind of thing differently.
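
As a rough sketch (not part of the original message), the difference
between the two values can be checked by looking at their final
character.  This only inspects the strings; it does not model how any
shell tokenises an alias value.  The helper names last_char and
ends_in_blank are made up for the illustration.

```shell
# Print the final character of a string, using only POSIX parameter
# expansion.
last_char() {
    printf '%s' "${1#"${1%?}"}"
}

# Succeed if the string ends in a <blank> (space or tab).
ends_in_blank() {
    case $1 in
        *' ' | *"$(printf '\t')") return 0 ;;
        *) return 1 ;;
    esac
}

# The value from the message: "whatever", an unquoted space, then the
# single-quoted string ' ' with both quotes present.
terminated="whatever ' '"

# The variant with the closing quote removed (unspecified behaviour).
unterminated="whatever ' "

for value in "$terminated" "$unterminated"; do
    if ends_in_blank "$value"; then
        printf '[%s] ends in a blank\n' "$value"
    else
        printf '[%s] ends in [%s]\n' "$value" "$(last_char "$value")"
    fi
done
```

Only the second value ends in a blank, and that is the one that cannot
be parsed as a simple command.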

If you haven't done so recently, you should go and review the proposed
resolution of issue 953, so you understand the constraints.

kre



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Robert Elz
Date:Wed, 9 Jan 2019 10:05:21 +
From:Geoff Clare 
Message-ID:  <20190109100521.GC690@lt2.masqnet>

  | It's not obvious to me.  Alias lookup has already been done for the
  | word in that position in the input and there is nothing to suggest
  | the shell has to go back and repeat it for the replacement word.

There is also nothing to suggest that it should not - the lexer has very
little world view, all it has is its input stream of characters, and some
idea whether or not it is at a command word position - that it has previously
looked up an alias for the current position is not something it will
necessarily know, after all, that word has now been deleted.

When it tokenises the value of the alias, nothing has changed in the
state of the world, other than that we are "processing an alias" for the
word that was there before (in the old way of expressing it).   The only
difference to what was done the previous time, is that we no longer
expand that particular alias (or any others we also happen to be
currently processing.)

If the wording in the standard isn't making this clear, then it needs
fixing so that it does, as we are after all, documenting what shells
actually do, and this is something that they all do.
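
A minimal sketch of that behaviour, written as a toy model rather than
taken from any shell's source.  The alias table (ll and ls), the
function names, and the splitting of the command line on the first
space are all assumptions made for the illustration; a real lexer
deals with quoting, operators and reserved words as well.

```shell
# Hypothetical alias table: print the value of alias $1, or fail if
# there is no such alias.
alias_value() {
    case $1 in
        ll) printf '%s' 'ls -l' ;;
        ls) printf '%s' 'ls -F' ;;
        *)  return 1 ;;
    esac
}

# Expand the command-name word of the line in $1.  $2 is a
# space-separated list of the aliases currently being processed; a
# word that appears in that list is not expanded again.
expand_command() {
    line=$1 active=$2
    word=${line%% *}          # first word (naive: splits on a space)
    rest=${line#"$word"}      # everything after the first word

    case " $active " in
        *" $word "*) printf '%s' "$line"; return ;;
    esac

    if value=$(alias_value "$word"); then
        # The replacement is still in command-name position, so its
        # first word is looked up again, with $word now marked active.
        expand_command "$value$rest" "$active $word"
    else
        printf '%s' "$line"
    fi
}

expand_command 'll /tmp' ''
printf '\n'
```

In this model "ll /tmp" becomes "ls -l /tmp", the leading "ls" is then
expanded once to "ls -F", and the expansion stops there because ls is
already being processed.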

kre



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Geoff Clare
Robert Elz  wrote, on 09 Jan 2019:
>
>   | This surprised me.  I was previously unaware that the first word in
>   | the alias value is subject to recursive alias expansion.  There is
>   | nothing in the standard to suggest this happens!
> 
> There certainly isn't in the current (published) text, which is part of
> what is wrong with it (but really that's just a part..)
[...]
> The proposed new wording from 953 does not have this problem,
> as it is clear that the alias value is subject to tokenisation, and
> when that happens, the first token (which because of the restrictions
> we're placing on the value of the alias) must in a conformant script
> be a word, will still be in the "command name position" (we are
> yet to return anything to the grammar which could change that) and
> so is "obviously" subject to alias lookup.

It's not obvious to me.  Alias lookup has already been done for the
word in that position in the input and there is nothing to suggest
the shell has to go back and repeat it for the replacement word.

>   | If this is because IO_NUMBER is not expanded, this change would make no
>   | difference to the behaviour required by the standard (because we're saying
>   | the behaviour is unspecified if the alias value contains a redirection).
> 
> No, the IO_NUMBER of concern is not in the alias value (string), it is in the
> original text.

Okay, thanks for clarifying.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Geoff Clare
Chet Ramey  wrote, on 08 Jan 2019:
>
> On 1/8/19 11:51 AM, Geoff Clare wrote:
> > Robert Elz  wrote, on 08 Jan 2019:
> >>
> >>   | I would prefer that we not leave it unspecified when an alias ends
> >>   | with "\ ".
> >>   | If there is a shell which does recursive alias expansion in this case,
> >>   | we should ask the authors/maintainers whether they are willing to change
> >>   | it to behave like other shells.
> >>
> >> I don't know of any, but I haven't tested them either ...
> > 
> > Given Chet's reply, it looks like there may be more shells that do expand
> > than don't.  In which case I wonder why that "unquoted" text got added
> > in 2016.
> 
> Based on the comments in issue 953, it happened on a phone conference,
> so there's likely no record unless the etherpad still happens to exist.

The relevant etherpad does still exist, but as far as I can see
doesn't provide an answer to why this was added.  It has a list of
issues to address, which includes:

 * DONE should we add "unquoted" to "ends in a <blank>" so that it
   becomes "ends in an unquoted <blank> (after substitution)"?

The "DONE" shows that it was discussed and a decision made, but the
etherpad doesn't record the reasons for the decision.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Alias implementations being invalidated by proposed new wording?

2019-01-09 Thread Geoff Clare
Shware Systems  wrote, on 09 Jan 2019:
>
> On Tuesday, January 8, 2019 Robert Elz  wrote:
> 
>> ps: (and this bit might be relevant to the discussions) - it is really
>> hard to imagine a use for an alias with a definition that ends "\ "
>> (the only way to get a quoted space as the final char in what is
>> to be the specified cases) so in practice I don't think it matters
>> at all what decision is made about that one.
>
> Yes, the uses that were discussed are corner cases, but the consensus
> was, and pretty strongly, not having it would lead to data loss
> with some operators so the change was added. I don't remember which
> operators were problematic at this point. This affects built-in
> aliases more than ones defined in a script, as an end user may invoke
> these without realizing it if the alias name is the same as a common
> utility.

This makes no sense.  The meeting decided to make the behaviour
unspecified if an alias value contains an operator, so there would
be no reason to require a feature that is only useful when that is
the case.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England