Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-14 Thread Robert Elz via austin-group-l at The Open Group
Date: Tue, 13 Apr 2021 10:16:26 +0100
From: Harald van Dijk 
Message-ID:  <7ab68758-b423-ae1b-4451-cd02c4b6b...@gigawatt.nl>

I think we have probably largely converged about this issue (with
Chet's assistance) so this will probably be my last message about it.

  | ...your hypothetical example is not one of them, IMO. There is nothing 
  | reasonable about saying that an activity that continues throughout the 
  | execution of the script counts as start-up activity.

Of course it could be: a start-up activity is an activity that commences
at start-up time; how long it takes to complete is not specified.   If the
text had intended to say "might be added at shell initialisation" it could
have said that.

What matters here is that we have a hash table that is effectively a cache,
which means things come and go more or less unpredictably (not so if you
look at the details of the implementation, but to an outside observer).
E.g.: one possible implementation, when scanning PATH, would open and read
each directory in turn, stopping as soon as it finds the command sought
(with appropriate permissions, etc.), and I mean immediately, without
reading the rest of the directory currently being scanned, while adding
every regular file with 'x' permission located during that scan to the
cache / hash table.   That is, what gets added can depend upon the order
of files in that final directory, which in turn depends upon how the
directory was created and how the implementation manages it.

Do remember that the standard is not legislation, it is an attempt to
specify what the implementations actually do, so readers know what they
can rely upon.

In this case, how the hash table works is largely unknowable from outside
the implementation, so the standard is very wishy-washy about how things
get added, and what can be expected to be there or not be there, while
simultaneously alerting users to its likely presence, and to the need to
deal with its ramifications in the odd case where something changes
which would invalidate cached data without the shell being aware that
has happened.

There is nothing better that is really possible - "the result of an
unspecified start-up activity" is just a way of saying "things might
happen that cannot be explained by anything else in here, deal with it".
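For what it's worth, the only portable window onto this cache is the
`hash` utility itself; a minimal sketch (output deliberately not shown,
since its format, and what gets cached when, vary by shell):

```shell
# Sketch: observing the shell's command cache through the standard
# `hash` utility.  Output format and caching policy vary by shell.
sh -c '
    ls >/dev/null   # running an external command usually records its path
    hash            # report remembered locations (format varies)
    hash -r         # forget everything remembered so far
'
```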

  | Your interpretation 
  | would mean *all* activity can be considered start-up activity just by 
  | virtue of being performed on a different thread where that different 
  | thread was launched at shell startup,

Yes.   But the only thing that gets to vary here is what is in or not
in the cache, nothing else depends upon this.

  | and then of course as that is just 
  | an implementation detail invisible to a user of the shell, it doesn't 
  | actually have to be performed on a different thread.

Of course.

  | If that were the intent, why would the standard say "shell start-up
  | activity" in the first place?

I wasn't there when it was written, but I'd guess they were mostly
thinking of pre-seeding of the hash table from PATH ... but were wise
enough to realise that can happen any time PATH is changed, or any time
the shell notices that the mod time of any directory in PATH has changed,
or for almost any other reason, and so wrote it in words that were not
highly specific, to allow for all of these kinds of variations, and more.

Most of the rest of the issues in this e-mail are now resolved, I believe.

There are just two outcomes that I need to be clear about:

1) it is not wrong for a shell to continue to scan PATH when exec
fails (other than ENOEXEC) and exec some later file with the same
name which does succeed.   That's the original algorithm, it is
what the standard is trying to convey.

An extra point on this though: I also have no real problem
with shells that decide to stop on the first file with 'x' permission
found in PATH, whether the exec succeeds or not.   The times when
that ever makes a difference in real-world environments are so rare
as to be irrelevant, and no-one should ever be depending on something
like that working (inserting a #!/bad "gcc" somewhere early on PATH,
and assuming it will be ignored).   As Mark said (the one thing he
got correct), it is conforming for a shell to treat that as
a shell script containing only a comment (the #! there makes it
unspecified, which means the shell can treat it that way) on systems
that don't support #! executables.

Just don't demand that shells act that way.
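A hedged sketch of the case described above, using a made-up interpreter
path: handed to sh directly, the unusable #! line is just a comment.

```shell
# Sketch: a file whose #! line names a nonexistent interpreter.  Invoked
# via execve() it fails; handed to sh directly, the #! line is merely a
# comment, so the "script" runs and does nothing.
f=$(mktemp)
printf '#!/no/such/interpreter\n' > "$f"
chmod +x "$f"
sh "$f" && echo "sh ran it as a comment-only script"
rm -f "$f"
```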

2) It is impossible (given current interfaces) for any command to ever
correctly predict what will be executed from a PATH search, and be
100% accurate.   Specifying "command -v" or "type" or anything else
similar in a way which pretends that is possible is a mistake.  The only
way to know for sure is to attempt to execute it, and if the command
happens to be "halt" and you happen to be root at the time, you probably
would not like the results if all you were attempting to do 

Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-13 Thread Harald van Dijk via austin-group-l at The Open Group

On 13/04/2021 14:43, Chet Ramey wrote:
On 4/13/21 5:16 AM, Harald van Dijk via austin-group-l at The Open Group 
wrote:


Please note again that POSIX's Command Search and Execution doesn't 
say "continue until execve() doesn't fail". It says "Otherwise, the 
command shall be searched for using the PATH environment variable as 
described in XBD Environment Variables", and then what happens to the 
result of that search. It very clearly separates the search from the 
attempt to execute. 


The complicating factor is POSIX's definition of "executable file."

You search "until an executable file with the specified name and
appropriate execution permissions is found."

An executable file is a "regular file acceptable as a new process image
file by the equivalent of the exec family of functions."


Good spot. We know that it is intended for command search and execution 
to also find shell scripts that are then interpreted by the shell, shell 
scripts that execve() cannot process, and all shells implement this as 
intended rather than as specified. This must be a defect in the current 
wording.


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-13 Thread Chet Ramey via austin-group-l at The Open Group

On 4/13/21 5:16 AM, Harald van Dijk via austin-group-l at The Open Group wrote:

Please note again that POSIX's Command Search and Execution doesn't say 
"continue until execve() doesn't fail". It says "Otherwise, the command 
shall be searched for using the PATH environment variable as described in 
XBD Environment Variables", and then what happens to the result of that 
search. It very clearly separates the search from the attempt to execute. 


The complicating factor is POSIX's definition of "executable file."

You search "until an executable file with the specified name and
appropriate execution permissions is found."

An executable file is a "regular file acceptable as a new process image
file by the equivalent of the exec family of functions."

And the only way to determine that is by trying to execute it using one
of "the exec family of functions."

That said, this is the most marginal of corner cases, notwithstanding that
bash has a distinct option to handle it (disabled by default).

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-13 Thread Harald van Dijk via austin-group-l at The Open Group

On 12/04/2021 21:57, Robert Elz wrote:

 Date: Mon, 12 Apr 2021 18:42:03 +0100
 From: Harald van Dijk 
 Message-ID:  

   | No, not anything. It still has to be shell start-up activity.

And your definition of what is a shell start-up activity comes from  ?


When words don't have a specific definition, we have to assume they are 
used in their ordinary English meaning. That may in some cases not 
always be clear and unambiguous. I can imagine some examples where some 
people would reasonably say it counts as start-up activity, and others 
would reasonably say it doesn't. That said...



   | The starting a thread would be shell start-up activity. The actions
   | performed on that thread while some other thread is running the script
   | clearly aren't.

Nonsense.  They are the result of the start-up activity.  Nothing defines
how long that is allowed to take to complete, does it?   Nothing specifies
what else is allowed to execute in parallel, does it?


...your hypothetical example is not one of them, IMO. There is nothing 
reasonable about saying that an activity that continues throughout the 
execution of the script counts as start-up activity. Your interpretation 
would mean *all* activity can be considered start-up activity just by 
virtue of being performed on a different thread where that different 
thread was launched at shell startup, and then of course as that is just 
an implementation detail invisible to a user of the shell, it doesn't 
actually have to be performed on a different thread. If that were the 
intent, why would the standard say "shell start-up activity" in the 
first place?



   | It cannot do this either, parsing the whole script in advance is not
   | only not allowed

Of course it is allowed.

   | (it would break the use of aliases defined in the script, at the least)

Apart from that being (IMO) a good thing, who cares?   We're building a
hash table; if a few entries are added (words that turn out to be aliases)
that aren't needed, no problem.  If some get missed, either from alias
expansions or the results of other expansions, then big deal, those can
be looked up and added later.


In the context of a shell, parsing a script already has a clear and 
unambiguous meaning, and it seems you're using it to mean something 
else. I take no responsibility for the miscommunication that happened 
because of that.


Consider the case where the shell is reading from standard input, 
standard input is a pipe, and the first line of standard input is 
'exit'. With your idea, the shell cannot avoid reading beyond that first 
line, and cannot restore the state to pretend it had not read beyond 
that first line.


I am aware that many shells, including mine, do already read beyond that 
first line. I consider that a bug in my shell.
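The guaranteed part of that behaviour can be sketched as follows (what is
*not* shown, because it varies by shell, is whether the bytes after `exit`
are consumed from the pipe at all):

```shell
# Sketch: a script arriving on a pipe.  Whatever the shell buffers
# internally, nothing after `exit` may execute; the subtle question in
# the discussion above is whether the bytes after `exit` are consumed
# at all, which matters if another process shares the stream.
printf 'echo before\nexit\necho after\n' | sh   # prints only "before"
```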



   | I am not convinced that that is the intent at all.

Recall, once upon a time there was no hash table, the code simply
walked $PATH trying exec until either it succeeded, or $PATH ran out.

The hash table was an invention to speed that up, not change the
algorithm.   The results should be unchanged.

The language in the standard might not be written as clearly as
we might like, but it is attempting to describe that (aside from
the hash command, it should probably say nothing about the hash
table, or remembering results from previous scans of PATH at all).


Please note again that POSIX's Command Search and Execution doesn't say 
"continue until execve() doesn't fail". It says "Otherwise, the command 
shall be searched for using the PATH environment variable as described 
in XBD Environment Variables", and then what happens to the result of 
that search. It very clearly separates the search from the attempt to 
execute. If shells combine the two in a way that changes the outcome, 
that is a conformance bug in those shells. If what POSIX says does not 
match historical practice, that is a separate issue, that should be 
explained in the rationale at the very least.
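The separation (or lack of it) is easy to probe with a sketch like this;
the directories are created fresh, the file names are invented for
illustration, and which answers you get depends on the shell:

```shell
# Sketch: two files named demo on PATH; the first has an unusable
# interpreter line, the second is a working script.  Shells disagree on
# whether lookup ("command -v") and execution pick the same one.
d1=$(mktemp -d); d2=$(mktemp -d)
printf '#!/no/such/interpreter\n' > "$d1/demo"
printf '#!/bin/sh\necho real demo ran\n' > "$d2/demo"
chmod +x "$d1/demo" "$d2/demo"
PATH=$d1:$d2:$PATH
command -v demo   # many shells report $d1/demo here...
demo || true      # ...while execution may fail, or fall through to
                  # $d2/demo, depending on the shell
```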



   | This code is shared between ordinary command execution and the exec
   | builtin.

True, but that's irrelevant.   When there is a command for the exec
builtin (that is, when it isn't being used just to make redirections
persist in the shell, etc.) it behaves identically to command execution,
just skipping the alias/function/builtin tests.   There is no difference
at all in how the PATH search happens.   When exec is used the shell
doesn't actually fork; it just jumps directly to the child side of the
normally post-fork code, so the code fragment I showed would really
start something like

if (exec_was_used || fork() == 0) { 

(and of course, the real code would be saving the result from fork()
for the parent code, but we're not concerned with that stuff here).


We were talking about dash internals. In dash, the ordinary command 
execution and the exec builtin do not do lookups the same way.


And as Jörg noticed elsethread, in ksh they do not 

Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Robert Elz via austin-group-l at The Open Group
Date: Mon, 12 Apr 2021 18:42:03 +0100
From: Harald van Dijk 
Message-ID:  

  | No, not anything. It still has to be shell start-up activity.

And your definition of what is a shell start-up activity comes from  ?

  | The starting a thread would be shell start-up activity. The actions 
  | performed on that thread while some other thread is running the script 
  | clearly aren't.

Nonsense.  They are the result of the start-up activity.  Nothing defines
how long that is allowed to take to complete, does it?   Nothing specifies
what else is allowed to execute in parallel, does it?

  | It cannot do this either, parsing the whole script in advance is not 
  | only not allowed

Of course it is allowed.

  | (it would break the use of aliases defined in the script, at the least)

Apart from that being (IMO) a good thing, who cares?   We're building a
hash table; if a few entries are added (words that turn out to be aliases)
that aren't needed, no problem.  If some get missed, either from alias
expansions or the results of other expansions, then big deal, those can
be looked up and added later.

  | I am not convinced that that is the intent at all.

Recall, once upon a time there was no hash table, the code simply
walked $PATH trying exec until either it succeeded, or $PATH ran out.

The hash table was an invention to speed that up, not change the
algorithm.   The results should be unchanged.

The language in the standard might not be written as clearly as
we might like, but it is attempting to describe that (aside from
the hash command, it should probably say nothing about the hash
table, or remembering results from previous scans of PATH at all).

  | This code is shared between ordinary command execution and the exec 
  | builtin.

True, but that's irrelevant.   When there is a command for the exec
builtin (that is, when it isn't being used just to make redirections
persist in the shell, etc.) it behaves identically to command execution,
just skipping the alias/function/builtin tests.   There is no difference
at all in how the PATH search happens.   When exec is used the shell
doesn't actually fork; it just jumps directly to the child side of the
normally post-fork code, so the code fragment I showed would really
start something like

if (exec_was_used || fork() == 0) { 

(and of course, the real code would be saving the result from fork()
for the parent code, but we're not concerned with that stuff here).

  | The former needs no second PATH lookup if the hashing was done 
  | correctly, the latter does: no hashing happens for exec, as it would be 
  | useless.

You mean adding the results to the hash table would be useless - true, but
exec still uses the results of earlier lookups in the hash table, and in the
implementation it can be easier to just use the same code path for everything,
the cost of making a meaningless hash table entry is nothing, whereas
having a different search strategy complicates the code, and that is
a real cost (maintenance cost).

Incidentally, I just ran another test, for fun.  I thought this one would
be useless and reveal nothing, and it almost did, but not quite.

This is /tmp/exec-test (between the <><> lines).

<><><><><><><><><>

rm -fr /tmp/P1 /tmp/P2

mkdir /tmp/P1 /tmp/P2

for d in P1 P2
do
cat <<-EOF >/tmp/$d/cmd
#! /bin/sh

printf '%s\n' 'executing $d/cmd'
EOF

chmod a+x /tmp/$d/cmd
done

PATH=/tmp/P1:/tmp/P2; export PATH
hash cmd
/bin/rm -fr /tmp/P1
exec cmd

<><><><><><><><><>

I then ran:

for shell in fbsh sh bash bosh dash yash ksh ksh93 mksh; do
printf "$shell: "
$shell /tmp/exec-test
done

In that, sh is nbsh (an oldish version, but for this it makes no difference)
and ksh is a version of pdksh, ksh93 is 93u.

The results.

fbsh: executing P2/cmd
sh: executing P2/cmd
bash: /tmp/exec-test: line 20: /tmp/P1/cmd: No such file or directory
bosh: executing P2/cmd
dash: executing P2/cmd
yash: executing P2/cmd
ksh: executing P2/cmd
ksh93: executing P2/cmd
mksh: executing P2/cmd

I am astounded.   But if you didn't think "exec cmd" would use the
hash table, there's absolute proof that at least one does (my guess is
that most, if not all, of the others do as well, they just do it correctly).

After that, I replaced the final line of the script ("exec cmd") with
"command -v cmd" and ran the loop again:

fbsh: /tmp/P1/cmd
sh: /tmp/P1/cmd
bash: /tmp/P1/cmd
bosh: /tmp/P1/cmd
dash: /tmp/P1/cmd
yash: /tmp/P2/cmd
ksh: /tmp/P2/cmd
ksh93: /tmp/P1/cmd
mksh: /tmp/P2/cmd

which I think illustrates something as well.


  | The code looks like the simplest way to shoehorn both into a 
  | single function based on the assumption that the first execve() in the 
  | ordinary command execution case would not fail. The fact that it can 
  | fail tells us nothing about what was intended to happen in such a case.

Sorry, I have no idea what 

Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Harald van Dijk via austin-group-l at The Open Group

On 12/04/2021 20:22, (Joerg Schilling) wrote:

Harald van Dijk  wrote:


On 12/04/2021 12:47, (Joerg Schilling) wrote:

Do you have a private variant of ksh93v?

I get the same behavior from  ksh88, the ksh93 from OpenSolaris and ksh93v.


I don't. I was testing with ksh built from
. I will try to figure
out why I am getting different results from you.


OK, it depends on usage, so you may have tested the "wrong" way:

ksh93 -c 'PATH=/tmp/:$PATH; gcc'
gcc: no input files

ksh93 -ic 'PATH=/tmp/:$PATH; gcc'
ksh93: gcc: not found [Datei oder Verzeichnis nicht gefunden]

ksh93 -c 'PATH=/tmp/:$PATH; gcc; hash'
ksh93[1]: gcc: not found [Datei oder Verzeichnis nicht gefunden]
gcc=/tmp/gcc

ksh93 -c 'PATH=/tmp/:$PATH; gcc; echo'
ksh93[1]: gcc: not found [Datei oder Verzeichnis nicht gefunden]

this looks strange...


Ah, good catch, thanks. Multiple shells have an optimisation where a 
command invocation at the last step, when no traps are active, is 
implicitly performed by exec without a fork, as if the exec builtin had 
been used. ksh apparently has a mismatch between the command lookup in 
the two cases.


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Oğuz via austin-group-l at The Open Group
12 Nisan 2021 Pazartesi tarihinde David A. Wheeler via austin-group-l at
The Open Group  yazdı:

>
> > On Apr 12, 2021, at 1:51 PM, Oğuz  wrote:
> > Taking "always double-quote your dollar variables", "eval is evil, avoid
> it", etc. as "the rule" is cargo cult programming. Average programmer's
> incompetence doesn't make the shell broken or unsafe or anything like that
> and doesn't justify parroting nonsensical advice like those.
>
> Double-quoting is VERY VERY good advice, which is why it’s so widely
> recommended & often required. For another example, Google requires it for
> their code  and
> Googlers are not stupid. Half of all programmers are BELOW average, and if
> your code lives over time, your code is likely to be maintained by them. In
> addition, even top software developers make mistakes. Assuming that “I
> cannot ever make a mistake” borders on arrogance; everyone has a bad day.
>
> “Cargo cult programming” means you do something without understanding the
> reasons for it. But in this case, we know EXACTLY why it’s done, and there
> are good reasons for it, so no cargo cult is present. You may think you
> can’t ever make mistakes, so double-quoting is not needed, but I frankly
> don’t believe you.
>
> It is wise to write code in a way that *assumes* that humans make
> mistakes, and reduce (1) the likelihood of mistakes and (2) consequences of
> those mistakes. If it doesn’t matter if your code is correct, then sure,
> don’t bother. If it *matters* that the code is correct, then take steps to
> increase that likelihood.
>
> BUT: this seems far afield of what a standards body (especially this
> group) normally does,


You started it by advertising crap software.


> There’s already “command -v COMMAND”, which is already in POSIX and
> returns true if it can find an executable COMMAND (and described in the
> spec). It may not have *exactly* the semantics the requestor wanted, but in
> *practice* I think it works very well for typical use cases. Why would
> something more exotic need to be standardized? I haven’t seen *why* it
> would matter.


No, `command' does not check if it's executable. Besides, given an external
utility name, the pathname it prints and the pathname the shell uses to
invoke that utility may not be the same on all shells (see earlier
replies). Hence the request, I guess.

Why rely on the shell and not the operating system's package manager is a
question I don't have an answer to.


>
> --- David A. Wheeler
>
>
>

-- 
Oğuz


Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Joerg Schilling via austin-group-l at The Open Group
Harald van Dijk  wrote:

> On 12/04/2021 12:47, (Joerg Schilling) wrote:
> > Do you have a private variant of ksh93v?
> > 
> > I get the same behavior from  ksh88, the ksh93 from OpenSolaris and ksh93v.
> 
> I don't. I was testing with ksh built from 
> . I will try to figure 
> out why I am getting different results from you.

OK, it depends on usage, so you may have tested the "wrong" way:

ksh93 -c 'PATH=/tmp/:$PATH; gcc'
gcc: no input files

ksh93 -ic 'PATH=/tmp/:$PATH; gcc'   
ksh93: gcc: not found [Datei oder Verzeichnis nicht gefunden]

ksh93 -c 'PATH=/tmp/:$PATH; gcc; hash'
ksh93[1]: gcc: not found [Datei oder Verzeichnis nicht gefunden]
gcc=/tmp/gcc

ksh93 -c 'PATH=/tmp/:$PATH; gcc; echo'   
ksh93[1]: gcc: not found [Datei oder Verzeichnis nicht gefunden]

this looks strange...

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread David A. Wheeler via austin-group-l at The Open Group


> On Apr 12, 2021, at 1:51 PM, Oğuz  wrote:
> Taking "always double-quote your dollar variables", "eval is evil, avoid it", 
> etc. as "the rule" is cargo cult programming. Average programmer's 
> incompetence doesn't make the shell broken or unsafe or anything like that 
> and doesn't justify parroting nonsensical advice like those.

Double-quoting is VERY VERY good advice, which is why it’s so widely 
recommended & often required. For another example, Google requires it for their 
code  and Googlers are not 
stupid. Half of all programmers are BELOW average, and if your code lives over 
time, your code is likely to be maintained by them. In addition, even top 
software developers make mistakes. Assuming that “I cannot ever make a mistake” 
borders on arrogance; everyone has a bad day.

“Cargo cult programming” means you do something without understanding the 
reasons for it. But in this case, we know EXACTLY why it’s done, and there are 
good reasons for it, so no cargo cult is present. You may think you can’t ever 
make mistakes, so double-quoting is not needed, but I frankly don’t believe you.

It is wise to write code in a way that *assumes* that humans make mistakes, and 
reduce (1) the likelihood of mistakes and (2) consequences of those mistakes. 
If it doesn’t matter if your code is correct, then sure, don’t bother. If it 
*matters* that the code is correct, then take steps to increase that likelihood.
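For the record, the effect the advice guards against is easy to
demonstrate (the filename is invented for illustration):

```shell
# Sketch: word splitting of an unquoted parameter expansion under the
# default IFS.
f='my file.txt'
set -- $f          # unquoted: splits on whitespace into two words
echo "$# words"    # prints "2 words"
set -- "$f"        # quoted: stays one word
echo "$# words"    # prints "1 words"
```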

BUT: this seems far afield of what a standards body (especially this group) 
normally does, so I’ll get back to the “command -v and friends discussion”.

There’s already “command -v COMMAND”, which is already in POSIX and returns 
true if it can find an executable COMMAND (and described in the spec). It may 
not have *exactly* the semantics the requestor wanted, but in *practice* I 
think it works very well for typical use cases. Why would something more exotic 
need to be standardized? I haven’t seen *why* it would matter.

--- David A. Wheeler




Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Chet Ramey via austin-group-l at The Open Group

On 4/12/21 12:05 PM, Robert Elz via austin-group-l at The Open Group wrote:


Anything that the system can run, no matter how it does that, is acceptable.

If a system noticed a VAX format a.out, it could load a vax simulator, and
run the binary that way, without the user even noticing.  If it wanted.


You just described basically how macOS runs Intel binaries on M1 hardware,
and how Intel hardware ran PowerPC binaries before that. No mystery here.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Oğuz via austin-group-l at The Open Group
12 Nisan 2021 Pazartesi tarihinde David A. Wheeler 
yazdı:

>
>
> On Apr 12, 2021, at 10:57 AM, Oğuz  wrote:
> 12 Nisan 2021 Pazartesi tarihinde David A. Wheeler via austin-group-l at
> The Open Group  yazdı:
>>
>> If you want a robust shell script, I recommend that you try out the tool
>> “shellcheck”.
>> That checks a shell script against a set of recommended practices (e.g.,
>> use “$variable” not $variable).
>>
>
> If it makes that suggestion no matter what context `$variable' is used in,
> I don't see how it'll help make a shell script "robust”.
>
>
> It’s very common advice to recommend using double-quotes on variable
> expansions unless you
> have a *good* reason to do otherwise in shell scripts, because it prevents
> word splitting
> on a variable reference, and in most cases you do NOT want the word
> splitting. Examples:
> * For example, the "Advanced Bash-Scripting Guide” says,
>   “When referencing a variable, it is generally advisable to enclose its
> name in double quotes.”
>   https://tldp.org/LDP/abs/html/quotingvar.html
> * The “Quotes” page says:
>   "When in doubt, double-quote every expansion in your shell commands.”
>   https://mywiki.wooledge.org/Quotes#I.27m_Too_Lazy_to_Read.
> 2C_Just_Tell_Me_What_to_Do
> * https://www.tecmint.com/useful-tips-for-writing-bash-scripts-in-linux/
> * https://levelup.gitconnected.com/9-tips-for-writing-safer-
> shell-scripts-b0c185da9bae#729e
>

Whether it is common advice or not doesn't mean anything; people say
stupid things on the internet all the time.


> Yes, there are rare cases where you *do* want word-splitting. In those
> cases, omit the double-quotes,
> and shellcheck lets you disable checks in specific uses when you actually
> *did* want this.
>

Huh, now I see what those `SC12345' comments in terribly written, barely
working shell scripts mean.

> Following the rule “always double-quote unless you have special reasons”
> means that you don’t have to do
> program-wide analysis to think about word-splitting in most variable
> references.
> As a result, it tends to make creating *reliable* scripts easier.
>

I don't need a tool to remind me of that.

Taking "always double-quote your dollar variables", "eval is evil, avoid
it", etc. as "the rule" is cargo cult programming. Average programmer's
incompetence doesn't make the shell broken or unsafe or anything like that
and doesn't justify parroting nonsensical advice like those.


>
> --- David A. Wheeler
>
>
>

-- 
Oğuz


Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Harald van Dijk via austin-group-l at The Open Group

On 12/04/2021 12:47, (Joerg Schilling) wrote:

Harald van Dijk  wrote:


If they are mistakes, they are widespread mistakes. As hinted in the
links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing
as files with execute permission, but /bin/gcc as a text file containing
#!/bad so that any attempt to execute it will fail, there are a lot of
shells where command -v gcc returns /bin/gcc, but running gcc actually
executes /usr/bin/gcc instead without reporting any error: this
behaviour is common to bosh, dash and variants (including mine), ksh,
and zsh.


My tests show that ksh, bash, yash, mksh do not find gcc in that case.


Huh. My tests with ksh were with 93v, it's possible different versions
behave differently.


Do you have a private variant of ksh93v?

I get the same behavior from  ksh88, the ksh93 from OpenSolaris and ksh93v.


I don't. I was testing with ksh built from 
. I will try to figure 
out why I am getting different results from you.



[...]

I don't think command -v should do more, I think ordinary command lookup
should do less. The behaviour of shells of continuing command lookup
after a failed execve() is not supported by what POSIX says in "Command
Search and Execution". Command lookup is supposed to stop as soon as "an
executable file with the specified name and appropriate execution
permissions is found" (per the referenced "Other Environment Variables",
"PATH"). In my example that results in /bin/gcc. The shell should
attempt to execute /bin/gcc, and once that fails, stop.


Given that #!/bad causes an ENOENT, this seems to be a misinterpretation on
your part. The shell cannot distinguish the misconfigured script from a
missing file.


It can. bash already does. You are correct that it is not possible to check 
this just by looking at execve()'s error code, but there are other ways 
to handle this.
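A sketch of the distinction (the interpreter path is made up; the exact
diagnostic, and whether a shell bothers to make the distinction at all,
varies between shells):

```shell
# Sketch: execve() fails with ENOENT in both cases below, but a shell
# can stat() the file and read its #! line to tell them apart, as bash
# does with its "bad interpreter" diagnostic.
f=$(mktemp)
printf '#!/no/such/interpreter\n' > "$f"
chmod +x "$f"
"$f" 2>&1 || true              # file exists, its interpreter does not
no-such-command-xyz 2>&1 || true   # the file itself does not exist
rm -f "$f"
```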


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utility][improving robustness of POSIX shells]

2021-04-12 Thread Harald van Dijk via austin-group-l at The Open Group

On 12/04/2021 00:25, Robert Elz wrote:

 Date: Sun, 11 Apr 2021 22:27:19 +0100
 From: Harald van Dijk 
 Message-ID:  <79b98e30-46ba-d468-153f-c1a2a0416...@gigawatt.nl>

   | Okay, but that is a technicality. The pre-seeding is only permitted at
   | startup time,

No, what it says is "an unspecified shell start-up activity".
"unspecified" means it can be anything.


No, not anything. It still has to be shell start-up activity.


  Anything includes starting
a thread which monitors what commands are about to be executed and
loads the hash table just in time.   Or one which populates the hash table
with every possible command every tenth of a micro-second.   Anything.
It is unspecified.


The starting a thread would be shell start-up activity. The actions 
performed on that thread while some other thread is running the script 
clearly aren't.



   | so cannot depend on the contents of the script.

Of course, it can, the script is available at startup time of the
shell, the startup activity can read the entire script, parse it,
find all the command names and possible command names, and add them
to the hash table.


It cannot do this either, parsing the whole script in advance is not 
only not allowed (it would break the use of aliases defined in the 
script, at the least) but also impossible as command names need not be 
named literally inside the script.



Alternatively, it can examine PATH and load
every executable in every directory in PATH into the hash table.
zsh (seems to) do something like the latter.


This is something that I agree is valid for a shell to do. It does not 
make any fundamental difference.


Incidentally, I only see this in zsh's interactive mode. I am not sure 
whether this depends on interactive mode directly, or on another option 
automatically turned on or off in interactive mode.
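For illustration, the full-preload variant amounts to enumerating every executable regular file in every PATH directory (a sketch; `seed_hash` is a name invented here, and a real shell would store first-hit-wins entries instead of sorting):

```sh
#!/bin/sh
# Enumerate everything a zsh-style preload would put in the hash table:
# every executable regular file in every PATH directory.
seed_hash() (
    IFS=:
    for dir in $PATH; do
        for f in "${dir:-.}"/*; do
            # keep regular, executable files only; print the bare command name
            [ -f "$f" ] && [ -x "$f" ] && printf '%s\n' "${f##*/}"
        done
    done | sort -u
)
```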



   | I want to say this is a theoretical concern, that there are no shells
   | where hash -r is implemented as doing anything other than clearing the
   | hash table. I cannot prove this but will be quite disappointed if any
   | turn out to do something else.

zsh comes close: it appears to empty the hash table on "hash -r", but
then do anything at all, and it fills up again.  And I mean fills.   And I
understand that - if you're going to search the directories in PATH
over and over again, every time a command is executed, better to read
them once, and remember what they contain - no more useless I/O.
(I vaguely recall deciding that zsh read as many directories as needed
to find the command, and then stopped - getting a "command not found"
would result in everything possible from PATH now being in the hash table.)


Yes, that is exactly what it is doing.


   | > That is, find an entry for cmd in PATH for which exec() succeeds.
   | > Only fail if there is none.
   |
   | Yes, that is what dash is doing.

The way PATH searches should be done.

   | Well, that is sort of what dash does. dash takes an extra integer that
   | specifies which PATH component was hashed and uses that as the starting
   | point for the search,

I know.  This is irrelevant here.  If this algorithm doesn't produce the
required results, that would be a bug, and like most bugs, if it is
considered serious enough, it can be fixed.

The important issue, is that the intent is to examine each element in
PATH, until we get success from exec(), (or ENOEXEC with a file we're
willing to treat as a script, and so exec a shell to interpret it).
So, if there is a /bin/gcc that is "#!/bad" and a later one in path
that is a real executable, we should exec the later one, right?


I am not convinced that that is the intent at all.

This code is shared between ordinary command execution and the exec 
builtin. The former needs no second PATH lookup if the hashing was done 
correctly, the latter does: no hashing happens for exec, as it would be 
useless. The code looks like the simplest way to shoehorn both into a 
single function based on the assumption that the first execve() in the 
ordinary command execution case would not fail. The fact that it can 
fail tells us nothing about what was intended to happen in such a case.
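Whatever the intent was, the divergence itself is easy to reproduce (the paths below are temporary directories created for the demo; which behaviour the last line shows is exactly the shell-dependent point in dispute):

```sh
#!/bin/sh
# Recreate the /bin/gcc vs /usr/bin/gcc scenario with temporary directories:
# an executable "#!"-broken script early in PATH, a working script later.
demo=$(mktemp -d)
mkdir "$demo/early" "$demo/late"
printf '#!/nonexistent\n'       > "$demo/early/tool"
printf '#!/bin/sh\necho real\n' > "$demo/late/tool"
chmod +x "$demo/early/tool" "$demo/late/tool"

# Every shell's command -v reports the first PATH match...
PATH="$demo/early:$demo/late" command -v tool

# ...but actually running "tool" either fails outright (bash, ksh, yash,
# mksh) or silently falls through to the later entry (dash, bosh, zsh).
PATH="$demo/early:$demo/late" tool 2>/dev/null || true
```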


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread David A. Wheeler via austin-group-l at The Open Group


> On Apr 12, 2021, at 10:57 AM, Oğuz wrote:
> On Monday, 12 April 2021, David A. Wheeler via austin-group-l at The
> Open Group wrote:
> If you want a robust shell script, I recommend that you try out the tool 
> “shellcheck”.
> That checks a shell script against a set of recommended practices (e.g., use 
> “$variable” not $variable).
> 
> If it makes that suggestion no matter what context `$variable' is used in, I 
> don't see how it'll help make a shell script "robust”.

It’s very common advice to recommend using double-quotes on variable expansions 
unless you
have a *good* reason to do otherwise in shell scripts, because it prevents word 
splitting
on a variable reference, and in most cases you do NOT want the word splitting. 
Examples:
* For example, the "Advanced Bash-Scripting Guide” says,
  “When referencing a variable, it is generally advisable to enclose its name 
in double quotes.”
  https://tldp.org/LDP/abs/html/quotingvar.html
* The “Quotes” page says:
  "When in doubt, double-quote every expansion in your shell commands.”
  
https://mywiki.wooledge.org/Quotes#I.27m_Too_Lazy_to_Read.2C_Just_Tell_Me_What_to_Do
* https://www.tecmint.com/useful-tips-for-writing-bash-scripts-in-linux/
* 
https://levelup.gitconnected.com/9-tips-for-writing-safer-shell-scripts-b0c185da9bae#729e

Yes, there are rare cases where you *do* want word-splitting. In those cases, 
omit the double-quotes,
and shellcheck lets you disable checks in specific uses when you actually *did* 
want this.

Following the rule “always double-quote unless you have special reasons” means 
that you don’t have to do
program-wide analysis to think about word-splitting in most variable references.
As a result, it tends to make creating *reliable* scripts easier.
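The rule is easy to demonstrate in plain POSIX sh (variable names here are arbitrary):

```sh
#!/bin/sh
# Field splitting: an unquoted expansion becomes three words, while a quoted
# expansion stays one word with its internal spacing intact.
var='one two   three'
unquoted=$(printf '<%s>' $var)    # printf reuses its format per argument
quoted=$(printf '<%s>' "$var")
printf '%s\n' "$unquoted"         # prints: <one><two><three>
printf '%s\n' "$quoted"           # prints: <one two   three>
```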

--- David A. Wheeler




Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Joerg Schilling via austin-group-l at The Open Group
"Robert Elz via austin-group-l at The Open Group" 
 wrote:

> Date:Mon, 12 Apr 2021 15:07:28 + (UTC)
> From:shwaresyst 
> Message-ID:  <1662152200.1116623.1618240048...@mail.yahoo.com>
> 
>   | Then that is conformance bugs in those kernels,
> 
> Rubbish.
> 
>   | in that files of this type are not load images exec() is to handle
> 
> There is no specification at all of what file types exec() is to handle.
> 
> Anything that the system can run, no matter how it does that, is acceptable.

Correct; on Solaris, the examples include precompiled ksh93 scripts.

You can compile a ksh93 script with the command "shcomp" and the result starts 
with:

^k^s^h\0

The Solaris kernel detects this and starts ksh93 the right way with that binary.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 11 Apr 2021 14:02:15 +0200
From:"Joerg Schilling via austin-group-l at The Open Group" 

Message-ID:  <20210411120215.wchk4%sch...@schily.net>


  | If command -v should become able to do more, we would need to invent a
  | way to execute _any_ utility (regardless of whether it is a binary or
  | script) to execute in a harmless way without side-effects.

The way to do that would be to add an exectest*() set of system calls to the 
exec*() family (one for each of the exec*() calls) which does exactly the same
thing as the exec*() system calls, up to the point where the kernel has
decided that it is going to run the binary, and is just about to destroy
the current image and replace it with a new one.  Instead of doing that,
the exectest*() system calls would simply return 0.   If any error was
detected before that point, then the error return would be identical to
that from exec*().   Almost no code would be needed to make this work,
aside from libc sys call stubs, and lots of documentation.

I was going to have just one new sys call (and in the implementation,
rather than libc, there might just be one, or two perhaps, required in
any case) with just the path name to test as the arg, which is all that
would be needed, or used, by command -v, but other applications might want
to determine whether they can exec a combination of binary+args+env
(not get E2BIG from too many args, etc) so, if something new were to
be invented, it should start out being truly useful, not just half.
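No such syscall exists today; the closest a caller can get is an access-style permission check, and the sketch below shows precisely the gap the proposed exectest*() calls would close: the permission check passes even though the exec would fail.

```sh
#!/bin/sh
# The gap an exectest()-style call would close: execute permission says yes,
# yet the execve() this file triggers fails (bad "#!" interpreter, ENOENT).
d=$(mktemp -d)
printf '#!/nonexistent-interpreter\n' > "$d/cmd"
chmod +x "$d/cmd"

[ -x "$d/cmd" ] && echo "access check: looks executable"
"$d/cmd" 2>/dev/null || echo "exec attempt: fails anyway"
```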

Personally, I think the actual problem here is more a noise ("what would
happen if ...?") type thing than any real-world problem, so I'm not
about to waste cycles attempting to fix something that doesn't really
need fixing, other than to assuage someone's "that's weird" feeling.

kre




Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Robert Elz via austin-group-l at The Open Group
Date:Mon, 12 Apr 2021 15:07:28 + (UTC)
From:shwaresyst 
Message-ID:  <1662152200.1116623.1618240048...@mail.yahoo.com>

  | Then that is conformance bugs in those kernels,

Rubbish.

  | in that files of this type are not load images exec() is to handle

There is no specification at all of what file types exec() is to handle.

Anything that the system can run, no matter how it does that, is acceptable.

If a system noticed a VAX format a.out, it could load a vax simulator, and
run the binary that way, without the user even noticing.  If it wanted.

  | that are usable with dl*().

Almost nothing exec() loads is usable with dl*().   Those functions
only work with specially prepared dynamic objects, complete with symbol
tables.   exec() is not nearly so limited.

  | The allowance is for magics differentiating formats of that nature,
  | as I see the intent,

No-one cares what you think is or was the intent.

There is actually no requirement for magic numbers (or anything like
it) at all.   The kernel is free to evaluate a potential executable file
to see whether it meets that kernel's requirements to be an executable
file however it deems appropriate.   Needless to say the code generation
tools would need to generate something compatible, but none of that is
relevant here.

  | not one bypassing what the shell is supposed to determine

None of this has anything to do with what the shell is supposed to determine,
even if one were to imagine that such a thing had ever been specified, by 
anyone.

The true questions of this thread concern the shell, but your totally
off target side issues do not.

  | and in the process making illegal what the shell description asserts
  | is required to be possible.

Absolute nonsense.

  | The way to get shebang processing is as I outlined by adding to set,

And how does that allow other programs (ones that aren't shells) to
exec scripts?  That is anything which is not whatever you consider
to be an acceptable binary format.   Which would include a shell
script.

No-one cares that you have some personal crusade against #!.
It has been around a very long time, is in use essentially everywhere,
and is (relying upon what I was told, when I was shown the code,
and then added it to BSD, long long ago) one of Dennis Ritchie's better ideas.

If you want to keep up your (totally bizarre, and factually absurd)
crusade, please do it in a different thread (start a new one) which
we can all then simply ignore - nothing specific to #! is in any
way related to the issues in this one.

kre



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread shwaresyst via austin-group-l at The Open Group
Then that is conformance bugs in those kernels, to me, in that files of this 
type are not load images exec() is to handle that are usable with dl*(). The 
allowance is for magics differentiating formats of that nature, as I see the 
intent, not one bypassing what the shell is supposed to determine and in the 
process making illegal what the shell description asserts is required to be 
possible. The way to get shebang processing is as I outlined by adding to set, 
not trying to take advantage of the current language of exec() being too 
permissive.

 
 
  On Mon, Apr 12, 2021 at 9:04 AM, Joerg Schilling via austin-group-l at The
Open Group wrote:

"shwaresyst via austin-group-l at The Open Group" wrote:

> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword or 
> command continues. This precludes "#!" being recognized as any of those. 
> There is NO allowance for '!' being the second character as reserved for 
> implementation extensions.

#!/bad of course is a normal comment from the view of a normal shell. 
An exception is my old "bsh" (not bosh) on a historic UNIX without support for
#! in the kernel.

On all recent platforms, #! is just another *magic number* that is handled by 
the kernel only.

POSIX of course does not limit what magics are recognised by the kernel.

Jörg

-- 
EMail:jo...@schily.net                  Jörg Schilling D-13353 Berlin
                    Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/

  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Oğuz via austin-group-l at The Open Group
On Monday, 12 April 2021, David A. Wheeler via austin-group-l at
The Open Group wrote:
>
> If you want a robust shell script, I recommend that you try out the tool
> “shellcheck”.
> That checks a shell script against a set of recommended practices (e.g.,
> use “$variable” not $variable).
>

If it makes that suggestion no matter what context `$variable' is used in,
I don't see how it'll help make a shell script "robust".


-- 
Oğuz


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread David A. Wheeler via austin-group-l at The Open Group


> On Apr 10, 2021, at 5:54 AM, Jan Hafer via austin-group-l at The Open Group 
>  wrote:
> ...
> 2. In an ideal scenario the semantic of a word can be make constant, so no 
> other script or shell invocation running afterwards can change it (this would 
> compare to best practices in compiled languages and can be cheaply analyzed 
> before execution of a script).

That sounds like a nightmare, not an ideal.

...
> Does POSIX have any opinion or recommendation how to make SHELL scripting 
> robust?

POSIX, as a specification, generally just states requirements.

If you want a robust shell script, I recommend that you try out the tool 
“shellcheck”.
That checks a shell script against a set of recommended practices (e.g., use 
“$variable” not $variable).
Sadly, shellcheck does not provide a way to directly analyze the shell scripts 
within Makefiles.

--- David A. Wheeler




Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Joerg Schilling via austin-group-l at The Open Group
Harald van Dijk  wrote:

> That is an implementation detail. As far as POSIX is concerned, there is 
> only a single command search when a command is executed, so "a 
> subsequent invocation" can only refer to the shell script attempting to 
> execute the same command again at a later time. POSIX does not even 
> require the shell to fork at all, the shell may use some other 
> system-specific way of creating a new process. This isn't hypothetical, 
> such other system-specific ways of creating new processes were the 
> reason posix_spawn was added, and posix_spawn appears to be used by at 
> least one shell (ksh).

This was possible in ksh93 because ksh93 installed the needed infrastructure 
for Win-DOS in order to support UWIN.

Do not expect other shells to be able to follow that easily.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Joerg Schilling via austin-group-l at The Open Group
Robert Elz  wrote:

> Actually, in my, and I suspect most, implementations, even the first
> will invoke the "subsequent" clause, as the (parent) shell first searches
> PATH to find the executable, and enters it in the hash table.  Then it
> forks, and the child repeats the whole thing (after redirects etc have
> all been done).  This one is the subsequent search, which starts out
> with what is already in the hash table (assuming the command was found
> at all) and then if that fails, goes ahead and looks for another.

This is the method from the Bourne Shell and I also expect this to be used.
The Bourne shell, btw, uses the hash as the start value for the search in the
forked child.
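The remembered-location machinery both messages describe can be poked at through the standard `hash` utility (the listing format is implementation-defined, so what the last line prints varies by shell):

```sh
#!/bin/sh
# Exercise the command hash table: clear it, trigger one lookup, list it.
hash -r            # forget every remembered location
ls / > /dev/null   # a successful PATH search enters "ls" into the table
hash               # implementation-defined listing of remembered commands
```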

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Chet Ramey via austin-group-l at The Open Group

On 4/11/21 4:17 PM, shwaresyst via austin-group-l at The Open Group wrote:

conforming applications can not rely on unspecified behaviors, so having a 
use beyond that specified makes the shell nonconforming. Calling it out 
like that simply acknowledges a lot of shell implementations choose to make 
themselves nonconforming, I do not see it as an endorsement or allowance. 


This is just wrong. By this definition, every shell is non-conforming.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Joerg Schilling via austin-group-l at The Open Group
"shwaresyst via austin-group-l at The Open Group" 
 wrote:

> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword or 
> command continues. This precludes "#!" being recognized as any of those. 
> There is NO allowance for '!' being the second character as reserved for 
> implementation extensions.

#!/bad of course is a normal comment from the view of a normal shell. 
An exception is my old "bsh" (not bosh) on a historic UNIX without support for
#! in the kernel.

On all recent platforms, #! is just another *magic number* that is handled by 
the kernel only.

POSIX of course does not limit what magics are recognised by the kernel.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-12 Thread Joerg Schilling via austin-group-l at The Open Group
Harald van Dijk  wrote:

> >> If they are mistakes, they are widespread mistakes. As hinted in the
> >> links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing
> >> as files with execute permission, but /bin/gcc as a text file containing
> >> #!/bad so that any attempt to execute it will fail, there are a lot of
> >> shells where command -v gcc returns /bin/gcc, but running gcc actually
> >> executes /usr/bin/gcc instead without reporting any error: this
> >> behaviour is common to bosh, dash and variants (including mine), ksh,
> >> and zsh.
> > 
> > My tests show that ksh, bash, yash, mksh do not find gcc in that case.
> 
> Huh. My tests with ksh were with 93v, it's possible different versions 
> behave differently.

Do you have a private variant of ksh93v?

I get the same behavior from  ksh88, the ksh93 from OpenSolaris and ksh93v.
 
> I am assuming that by "do not find gcc" you mean "do not find 
> /usr/bin/gcc" here.

As mentioned in another post, the error is ENOENT, caused by executing the 
#!/bad script. So unlike the Bourne Shell, ksh does not use the hash as a start 
index for further searching but rather as a definite value.

> > I believe what bosh and dash do is the best behavior. None of the known 
> > shells
> > opens the file with "command -v something" and thus cannot know whether the
> > content is a script, a useless #! script or even a binary for the wrong
> > architecture.
> 
> Earlier, you did not see the problem that prompted this thread, and now 

Well, after 40 years of UNIX history there are so many exceptions to remember 
that you do not always recall every detail, and the OP could at least have 
given a hint.

> you say that the behaviour where command -v lookup does not match 
> execution lookup is the best behaviour. I trust that you do see now the 
> problem that prompted this thread: there is, in these shells at least, 
> no reliable way to perform command lookup separate from execution.

This is indeed a problem that would need a coordinated change in all 
shells.

Fortunately, the problem is only present in misconfigured environments. It can 
be avoided by admins...

> I don't think command -v should do more, I think ordinary command lookup 
> should do less. The behaviour of shells of continuing command lookup 
> after a failed execve() is not supported by what POSIX says in "Command 
> Search and Execution". Command lookup is supposed to stop as soon as "an 
> executable file with the specified name and appropriate execution 
> permissions is found" (per the referenced "Other Environment Variables", 
> "PATH"). In my example that results in /bin/gcc. The shell should 
> attempt to execute /bin/gcc, and once that fails, stop.

Given that #!/bad causes an ENOENT, this seems to be a misinterpretation on 
your part. The shell cannot distinguish the misconfigured script from a 
missing file.

> > There is still a problem: only bosh and ksh could in theory add the right
> > entry into the hash, since they are using vfork() and could report back the
> > final result via shared memory. I have had that possibility in mind for bosh
> > since I introduced vfork() support to bosh in 2014.
> 
> That's an interesting thought. The approach taken by the other shells 
> avoids the problem entirely and makes this unnecessary though.

Not by all shells, see e.g. dash.

The Bourne Shell and dash behavior seems to be the expected behavior since I 
would not first look at how the hash entries are created in order to judge on 
the real execute behavior.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 11 Apr 2021 22:27:19 +0100
From:Harald van Dijk 
Message-ID:  <79b98e30-46ba-d468-153f-c1a2a0416...@gigawatt.nl>

  | Okay, but that is a technicality. The pre-seeding is only permitted at 
  | startup time,

No, what it says is "an unspecified shell start-up activity".
"unspecified" means it can be anything.   Anything includes starting
a thread which monitors what commands are about to be executed and
loads the hash table just in time.   Or one which populates the hash table
with every possible command every tenth of a micro-second.   Anything.
It is unspecified.

  | so cannot depend on the contents of the script.

Of course, it can, the script is available at startup time of the
shell, the startup activity can read the entire script, parse it,
find all the command names and possible command names, and add them
to the hash table.   Alternatively, it can examine PATH and load
every executable in every directory in PATH into the hash table.
zsh (seems to) do something like the latter.

  | Replace gcc by any utility that is not hashed at startup

There are none (or none that can be found by a PATH search).

  | Actually, if hashing commands is only allowed "as a result of this 
  | specific search or as part of an unspecified shell start-up activity",

unspecified remember...

  | then after "hash -r" has executed, before a new command search has been 
  | performed, the hash table must be empty.

Not unless the specification for hash says so, and it doesn't.

  | I want to say this is a theoretical concern, that there are no shells 
  | where hash -r is implemented as doing anything other than clearing the 
  | hash table. I cannot prove this but will be quite disappointed if any
  | turn out to do something else.

zsh comes close: it appears to empty the hash table on "hash -r", but
then do anything at all, and it fills up again.  And I mean fills.   And I
understand that - if you're going to search the directories in PATH
over and over again, every time a command is executed, better to read
them once, and remember what they contain - no more useless I/O.
(I vaguely recall deciding that zsh read as many directories as needed
to find the command, and then stopped - getting a "command not found"
would result in everything possible from PATH now being in the hash table.)

  | > That is, find an entry for cmd in PATH for which exec() succeeds.
  | > Only fail if there is none.
  |
  | Yes, that is what dash is doing.

The way PATH searches should be done.

  | Well, that is sort of what dash does. dash takes an extra integer that 
  | specifies which PATH component was hashed and uses that as the starting 
  | point for the search,

I know.  This is irrelevant here.  If this algorithm doesn't produce the
required results, that would be a bug, and like most bugs, if it is
considered serious enough, it can be fixed.

The important issue, is that the intent is to examine each element in
PATH, until we get success from exec(), (or ENOEXEC with a file we're
willing to treat as a script, and so exec a shell to interpret it).
So, if there is a /bin/gcc that is "#!/bad" and a later one in path
that is a real executable, we should exec the later one, right?

The dash (ash in general, at least originally) optimisation is simply
to note that if we read the directory, and didn't find the command name
there, then there's no point attempting to exec it from that directory,
that must fail.   If between reading the directory and when the exec
attempt was made, someone inserted the command into one of the directories
that had been read, then we have a race - and as usual, sometimes one
wins, sometimes the other - if you're willing to bias the conditions
(handicap) you can force one result or the other (or just make one or
the other more likely), or you can make it even more unpredictable.
Nothing is wrong here, races have unpredictable results.

kre



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 11 Apr 2021 20:17:09 + (UTC)
From:shwaresyst 
Message-ID:  <1360977422.847706.1618172229...@mail.yahoo.com>

  | We are talking about the shell, not some bastardization of execve(),
  | that sees it's not a directly loadable process image so treats it as
  | a script.

shells only do that when the error is ENOEXEC.

In the cases we were discussing, it was not.   Had there been an ENOEXEC
error, then the shells would (all I believe) have attempted to run the
file as a script.  But that wasn't the error, so they (rightly) did not.

The use of #!/bad (and similar) is simply an easy way (on most kernels,
ie: those which support #!) to get the exec to fail with an error other
than ENOEXEC.   It is not the only way, name or path too long are also
possibilities, so is a symlink loop, ... there are many possibilities.

For those shells implementing shebang as an extension

There are a couple that do that, but that's completely irrelevant, that
only happens when the kernel doesn't support #! (all that matter do), and
the shell is trying to do what any modern kernel would (should) do.
Posix might not mandate #! support but the marketplace does.

So:
   it is still them piping the body of the script after the shebang line,
   without any token expansion, to an alternate interpreter via an exec()
   of some sort.

This is completely immaterial as no-one here is in any way considering
this kind of case, we're not getting ENOEXEC errors.

   Second, conforming applications can not rely on unspecified behaviors,

Of course.

   so having a use beyond that specified makes the shell nonconforming.

Nonsense.

Shells are allowed to implement extensions.   They don't become non-conforming
because of that.   The reference shell (ksh88) implements extensions after all.

   Some conforming script authors may simply want the first line to be a
# IMPORTANT USAGE NOTE 
   headline,

That's a contradiction.  A conforming script cannot start that way.
You have already been told why.   It can start \n#!!! if it wants.
It can even start \b#!!! if it wants to pretend (at least to people
who look via "cat file") as if it starts #!!!.   It cannot start #!anything.

  | What the standard does allow as an extension,
  | and I would support adding to the standard, is adding an option
  | to turn off token expansion in here-doc bodies,

What does this have to do with the current discussion?

  | This allows the effect of shebang to be accomplished anywhere in a script,

Nonsense.   #! is not really for when shells run commands (though it
helps), it is for when other utilities run commands

find /where/ever -name something -exec my_cmd {} \;

where "my_cmd" is awk, or perl, or python, or tcl, or ...

I wasn't here when any austin-group discussions on #! were being held,
but it is hard these days to think of any good reason for it not to be
included, with the possible exception that executable formats in general
are not specified.  If that was it, I would think an exception for this
one case would make sense.

However #! has ***nothing*** to do with the current issue, it's just
a tool to use for demonstrating what happens.   The same issues can
arise in lots of other ways.   Please stop confusing things.

If you don't understand what we're talking about, please just observe
and try to learn something (feel free to ask questions).

kre



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Harald van Dijk via austin-group-l at The Open Group

On 11/04/2021 22:05, Robert Elz wrote:

 Date:Sun, 11 Apr 2021 19:46:36 +0100
 From:Harald van Dijk 
 Message-ID:  <9ab286f9-125d-55a4-a65f-08d4af04d...@gigawatt.nl>

   | Sure, that's why I then switched to a different example that did not
   | have an earlier "command -v" to point out how this leads to inconsistent
   | behaviour.

But while it is possible to (at least probabilistically - it is a hash table
after all, effectively a cache) ensure that an entry exists, it is not
possible to ensure that one doesn't.

Recall this part from POSIX (still 2.9.1.1 1. e. i.)

Once a utility has been searched for and found (either as a
result of this specific search or as part of an unspecified
shell start-up activity),

That is, a shell is permitted to pre-seed the hash table at startup time,
and if allowed then, exactly when it happens between when main() of the
shell is first called, and when a lookup for a command is actually done,
is unknowable.   That means it is OK for the shell to pre-seed the hash
table for a command when the command name is seen, and then it will be
there when the search for that command is done.


Okay, but that is a technicality. The pre-seeding is only permitted at 
startup time, so cannot depend on the contents of the script. Replace 
gcc by any utility that is not hashed at startup and you will still have 
the same problem. Or, as you say, clear the hash table explicitly.



Even hash -r (which removes everything) doesn't guarantee that everything
isn't immediately replaced (with up to date values of course) before that
command even finishes.


Actually, if hashing commands is only allowed "as a result of this 
specific search or as part of an unspecified shell start-up activity", 
then after "hash -r" has executed, before a new command search has been 
performed, the hash table must be empty.


I want to say this is a theoretical concern: there are no shells 
where hash -r is implemented as doing anything other than clearing the 
hash table. I cannot prove this, but I will be quite disappointed if any 
turn out to do something else.
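For concreteness, the interaction between searching, executing, and hash -r can be sketched in portable sh. The command name and temporary directory below are invented for illustration, and hash-table behaviour beyond what POSIX mandates varies between shells:

```shell
# Sketch: how the command-location cache is filled and cleared.
# "democmd" and the temporary directory are invented names.
dir=$(mktemp -d)
printf '#!/bin/sh\necho hello\n' > "$dir/democmd"
chmod +x "$dir/democmd"
PATH=$dir:$PATH

command -v democmd      # the search may enter democmd in the hash table
democmd                 # executing it certainly does
hash -r                 # empty the table; per the quoted wording, nothing
                        # may repopulate it until a new search happens
democmd                 # a fresh search finds and re-enters the command

rm -rf "$dir"
```

Whether command -v seeds the table is itself implementation-dependent, which is part of the point under discussion.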



But all of this is really irrelevant, it is based upon a flawed assumption
about what is happening (and even what should happen).

What dash and the others, I presume, are doing, is not really
the "subsequent command" thing (that was just an interesting argument to
make), it is rather an implementation of the original Bourne shell
strategy (pre hash table), which was, more or less (not this code,
I don't write algol68, just a similar effect):
[...] 
That is, find an entry for cmd in PATH for which exec() succeeds.

Only fail if there is none.


Yes, that is what dash is doing.


The addition of the hash table should allow that algorithm to run
faster (with the occasional problem when after a hash entry is
created, someone inserts an entry earlier in PATH than it was before)
but it should not normally change the outcome of that algorithm.


Well, that is sort of what dash does. dash takes an extra integer that 
specifies which PATH component was hashed and uses that as the starting 
point for the search, but otherwise it is the same algorithm. So if 
PATH=/a:/b:/c and the hash table says x is found in /b, the search in 
the shell child will look for /b/x, and if that fails, /c/x. It will not 
search for /a/x unless the hash table is cleared.


This does not seem useful to me. If the command is no longer present in 
/b, it should be checked in all PATH components. Commands may 
legitimately move from /usr/bin to /bin by system upgrades just as well 
as the other way around.
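The starting-point strategy described above can be modelled in plain sh. This is an illustrative reimplementation of the idea, not dash's actual code; search_from and its parameter names are invented:

```shell
# search_from IDX NAME PATHLIST
#   Print the first NAME with execute permission found in the
#   colon-separated PATHLIST, starting at the (1-based) IDX'th
#   component - i.e. a hash entry that remembers which component
#   matched and never looks at earlier ones.
search_from() {
    sf_idx=$1 sf_name=$2 sf_i=0
    sf_oldifs=$IFS
    IFS=:
    for sf_dir in $3; do
        sf_i=$((sf_i + 1))
        # Skip components before the remembered one, as described above.
        if [ "$sf_i" -ge "$sf_idx" ] && [ -x "$sf_dir/$sf_name" ]; then
            IFS=$sf_oldifs
            printf '%s\n' "$sf_dir/$sf_name"
            return 0
        fi
    done
    IFS=$sf_oldifs
    return 1
}
```

With PATH=/a:/b:/c and a remembered index of 2 (the /b component), this search never reconsiders /a/x, which is exactly the behaviour being criticised.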


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Joerg Schilling via austin-group-l at The Open Group
"shwaresyst via austin-group-l at The Open Group" 
 wrote:

> We are talking about the shell, not some bastardization of execve(), that 
> sees it's not a directly loadable process image so treats it as a script. For 
> those shells implementing shebang as an extension it is still them piping the 
> body of the script after the shebang line, without any token expansion, to an 
> alternate interpreter via an exec() of some sort. Second, conforming 
> applications can not rely on unspecified behaviors, so having a use beyond 
> that specified makes the shell nonconforming. Calling it out like that simply 
> acknowledges a lot of shell implementations choose to make themselves 
> nonconforming, I do not see it as an endorsement or allowance. The 
> requirement explicitly specified behavior shall be implemented as specified 
> takes priority. Some conforming script authors may simply want the first line 
> to be a # IMPORTANT USAGE NOTE headline, or similar, not want a 
> utility named "!!!" to be exec'd.

You are mistaken again.

The only platform that worked the way you describe was my old shell "bsh" when
run on UNOS (the first UNIX clone).  But that was in the 1980s, and there is no 
other similar platform.

Today, #!/path is always handled by the kernel.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Joerg Schilling via austin-group-l at The Open Group
"shwaresyst via austin-group-l at The Open Group" 
 wrote:

> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword or 
> command continues. This precludes "#!" being recognized as any of those. 
> There is NO allowance for '!' being the second character as reserved for 
> implementation extensions.

No, sorry, but #!/path is a kernel extension that is permitted by POSIX.

The shells handle such a line as a comment.

Also note that the error code from exec*() for a file that contains #!/bad is 
not ENOEXEC, but ENOENT. This is why the shells continue to search for a 
potential executable in PATH when they actually try to execute the binaries.
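This is easy to observe (on Linux and most Unix kernels; the exact error message and exit status vary by shell, and the file name below is invented):

```shell
# A file with execute permission whose #! interpreter does not exist:
# the kernel's execve() fails with ENOENT, the same errno as for a
# file that is missing altogether.
dir=$(mktemp -d)
printf '#!/no/such/interpreter\necho never reached\n' > "$dir/badcmd"
chmod +x "$dir/badcmd"

"$dir/badcmd"            # fails; "never reached" is not printed
echo "exit status: $?"   # nonzero; commonly 126 or 127

rm -rf "$dir"
```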

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Harald van Dijk via austin-group-l at The Open Group

On 11/04/2021 21:17, shwaresyst wrote:
The requirement explicitly specified behavior shall be implemented as 
specified takes priority. Some conforming script authors may simply want 
the first line to be a # IMPORTANT USAGE NOTE headline, or similar, not 
want a utility named "!!!" to be exec'd.


If you are really saying that when POSIX says "If the first line of a 
file of shell commands starts with the characters "#!", the results are 
unspecified.", it actually means the results are well-defined, you are 
either seriously deluded, or trolling. I cannot tell which and have no 
interest in wasting time figuring it out.


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 11 Apr 2021 19:46:36 +0100
From:Harald van Dijk 
Message-ID:  <9ab286f9-125d-55a4-a65f-08d4af04d...@gigawatt.nl>

  | Sure, that's why I then switched to a different example that did not 
  | have an earlier "command -v" to point out how this leads to inconsistent 
  | behaviour.

But while it is possible to (at least probabilistically - it is a hash table
after all, effectively a cache) ensure that an entry exists, it is not
possible to ensure that one doesn't.

Recall this part from POSIX (still 2.9.1.1 1. e. i.)

Once a utility has been searched for and found (either as a
result of this specific search or as part of an unspecified
shell start-up activity),

That is, a shell is permitted to pre-seed the hash table at startup time,
and if allowed then, exactly when it happens between when main() of the
shell is first called, and when a lookup for a command is actually done,
is unknowable.   That means it is OK for the shell to pre-seed the hash
table for a command when the command name is seen, and then it will be
there when the search for that command is done.

Even hash -r (which removes everything) doesn't guarantee that everything
isn't immediately replaced (with up to date values of course) before that
command even finishes.

  | Ha, that's bad enough for an interactive shell, but for a 
  | non-interactive shell script that executes gcc and exits if it fails, 
  | retrying wouldn't even work.

Hmm - if the shell exits after the first one fails (which isn't usual,
"command not found" is just a "may exit") then who cares?   It fails, and
we're done, what would have happened had we tried again will never be known.

But all of this is really irrelevant, it is based upon a flawed assumption
about what is happening (and even what should happen).

What dash and the others, I presume, are doing, is not really
the "subsequent command" thing (that was just an interesting argument to
make), it is rather an implementation of the original Bourne shell
strategy (pre hash table), which was, more or less (not this code,
I don't write algol68, just a similar effect):

if (fork() == 0) {
        /* do redirects, etc - omitted here */
        p = copystr(lookup("PATH"));
        err = 0;
        do {
                q = strchr(p, ':');
                if (q != NULL)
                        *q++ = '\0';
                sprintf(buf, "%s/%s", p, cmd);
                /* ignore relative paths here */
                execve(buf, args, env);
                /* if we get here, the exec failed */
                if (err == 0)           /* more complex test really */
                        err = errno;
                /* ignore trying to exec /bin/sh on ENOEXEC here */
        } while ((p = q) != NULL);
        fprintf(stderr, "sh: %s: command not found: %s\n",
            cmd, strerror(err));
        exit(err == N || err == M ? 126 : 127); /* specifics omitted */
}

[Aside: I know there's lots of errors and omissions there,
but you get the model].

That is, find an entry for cmd in PATH for which exec() succeeds.
Only fail if there is none.

The addition of the hash table should allow that algorithm to run
faster (with the occasional problem when after a hash entry is
created, someone inserts an entry earlier in PATH than it was before)
but it should not normally change the outcome of that algorithm.

That's what was originally done, that's what we should still be
doing, and that's what the shells that go on to the 2nd gcc or cmd
actually do.   It makes no difference (should make no difference)
whether the name was in the hash table before the command was invoked
or not.

If any of the shells which do not copy the ksh/bash behaviour aren't
doing that, then I'd agree, those are broken.   Those that do copy it
are simply broken.

The command utility (with -v) (and which, whence, type, ...) cannot exec
the command so all it can do is find the first entry in PATH which matches.

When loading the hash table, the shell has the same limitations.
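That limitation is easy to demonstrate (directory and command names invented for illustration):

```shell
# Two PATH components: the first copy of "cmd" has execute permission
# but a bad interpreter, the second is genuinely runnable.
base=$(mktemp -d)
mkdir "$base/P1" "$base/P2"
printf '#!/no/such/interpreter\n' > "$base/P1/cmd"
printf '#!/bin/sh\necho ran P2\n'  > "$base/P2/cmd"
chmod +x "$base/P1/cmd" "$base/P2/cmd"

# command -v stops at the first entry with execute permission; it has
# no way to know that exec() of that file would fail.
PATH=$base/P1:$base/P2 command -v cmd   # reports the P1 copy

rm -rf "$base"
```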

  | and no scenario in which I am seeing the dash behaviour as clearly better.

Sorry, I am not an optometrist, and cannot assist with your vision problems.

kre



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
We are talking about the shell, not some bastardization of execve(), that sees 
it's not a directly loadable process image so treats it as a script. For those 
shells implementing shebang as an extension it is still them piping the body of 
the script after the shebang line, without any token expansion, to an alternate 
interpreter via an exec() of some sort. Second, conforming applications can not 
rely on unspecified behaviors, so having a use beyond that specified makes the 
shell nonconforming. Calling it out like that simply acknowledges a lot of 
shell implementations choose to make themselves nonconforming, I do not see it 
as an endorsement or allowance. The requirement explicitly specified behavior 
shall be implemented as specified takes priority. Some conforming script 
authors may simply want the first line to be a # IMPORTANT USAGE NOTE 
headline, or similar, not want a utility named "!!!" to be exec'd.
What the standard does allow as an extension, and I would support adding to the 
standard, is adding an option to turn off token expansion in here-doc bodies, 
and back on, via set. This allows the effect of shebang to be accomplished 
anywhere in a script, at the expense of a few extra characters for the here 
delimiter and set commands, without any other changes to tokenizing or the 
grammar. 
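For comparison, the expansion-suppression half of this is already expressible in the current standard: quoting any part of a here-document delimiter disables all expansions in the body. A minimal sketch:

```shell
# Quoting the here-document delimiter (XCU 2.7.4) suppresses parameter
# expansion, command substitution and arithmetic expansion in the body.
cat <<'EOF'
$HOME is not expanded here, nor is $(date).
EOF
```

This covers the "no token expansion" effect, though not the proposed ability to toggle it via set.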
 
  On Sun, Apr 11, 2021 at 12:15 PM, Harald van Dijk wrote:   
On 11/04/2021 17:09, shwaresyst via austin-group-l at The Open Group wrote:
> No, it's not nonsense. The definition of comment has all characters, 
> including '!', shall be ignored until newline or end-of-file being 
> conforming. Then tokenization which might discover an operator, keyword 
> or command continues. This precludes "#!" being recognized as any of 
> those. There is NO allowance for '!' being the second character as 
> reserved for implementation extensions.

This is wrong on two counts. The first is that you're assuming that this 
will be interpreted by a shell. If execve() succeeds (and the #! line 
does not name a shell), it will not be interpreted by a shell at all, 
and the shell syntax for comments is irrelevant. The second is about 
what happens when it does get interpreted by a shell: POSIX allows 
shells to treat files starting with "#!" specially: "If the first line 
of a file of shell commands starts with the characters "#!", the results 
are unspecified."

Cheers,
Harald van Dijk
  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Harald van Dijk via austin-group-l at The Open Group

On 11/04/2021 17:50, Robert Elz wrote:

 Date:Sun, 11 Apr 2021 17:04:05 +0100
 From:Harald van Dijk 
 Message-ID:  <92113e70-5605-10f4-8e57-47c9f64cd...@gigawatt.nl>


   | This only applies when a remembered location exists at all, though.

Yes, but in the examples I showed, it did (you can see that from the
output of the hash command before the attempt to execute cmd).  It was
put there by "command -v".   I haven't checked again, but I think all
shells do that.


Sure, that's why I then switched to a different example that did not 
have an earlier "command -v" to point out how this leads to inconsistent 
behaviour.



   | Then, if you accept that, for consistency, "the shell shall repeat the
   | search" can only mean to repeat the full search and again stop at the
   | first file with execute permissions, as it would be batshit crazy to
   | have a shell that, when presented with "gcc; gcc", for the first gcc
   | issues an error because /bin/gcc cannot be executed, and for the second
   | gcc to find /usr/bin/gcc because /bin/gcc failed to execute.

Actually, in my, and I suspect most, implementations, even the first
will invoke the "subsequent" clause, as the (parent) shell first searches
PATH to find the executable, and enters it in the hash table.  Then it
forks, and the child repeats the whole thing (after redirects etc have
all been done).  This one is the subsequent search, which starts out
with what is already in the hash table (assuming the command was found
at all) and then if that fails, goes ahead and looks for another.


That is an implementation detail. As far as POSIX is concerned, there is 
only a single command search when a command is executed, so "a 
subsequent invocation" can only refer to the shell script attempting to 
execute the same command again at a later time. POSIX does not even 
require the shell to fork at all, the shell may use some other 
system-specific way of creating a new process. This isn't hypothetical, 
such other system-specific ways of creating new processes were the 
reason posix_spawn was added, and posix_spawn appears to be used by at 
least one shell (ksh).



   | I am pretty sure you are not suggesting that that is reasonable,

Don't be too sure, I would not object to an implementation that did
work the way you described, and I suspect most users wouldn't either.
We're a pragmatic bunch, if something goes wrong the first time, and
fixes itself the second time (and subsequently), people tend to be
fairly happy.   Not deliriously, just fairly...


Ha, that's bad enough for an interactive shell, but for a 
non-interactive shell script that executes gcc and exits if it fails, 
retrying wouldn't even work.



   | I think that is easier to explain than the other way around, myself.
   | Suppose PATH is intentionally modified so that an uClibc-linked version
   | of GCC appears first in $PATH, but the user messed up,

For almost everything we do, we can find instances where the results
are sub-optimal.   Throwing away everything where that could occur leaves
us with almost nothing.   The best way to avoid this would be to remove
PATH completely (not revert to the Thompson shell fixed search path, but
require all commands to be always specified by full pathname).   I doubt
that would be well received as a solution however.


If it were a case of choosing your poison, then sure, but we do now have 
multiple benefits in this thread of the bash behaviour, and no scenario 
in which I am seeing the dash behaviour as clearly better. If possible, 
I will stick with choosing no poison.


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Joerg Schilling via austin-group-l at The Open Group
"Stephane Chazelas via austin-group-l at The Open Group" 
 wrote:

> 2021-04-10 22:12:47 +0200, Joerg Schilling via austin-group-l at The Open 
> Group:
> > "Jan Hafer via austin-group-l at The Open Group" 
> >  wrote:
> > 
> > > For a short recap why: There are `which, type, command, whence, where, 
> > > whereis, whatis, hash` used in shells. Worse, the semantics of `which` 
> > > is shell-dependent.
> > 
> > which   is a csh script and unrelated to Bourne or POSIX shells.
> > It therefore cannot give useful results in a standard
> > shell environment.
> > 
> > Even worse: On Linux, "which" may be a program with different
> > behavior.
> 
> The OS kernel is hardly relevant here. Various Linux-based OSes

Did I write Linux kernel?

If you tell other people they are not 100% precise, please carefully read 
what you are replying to.

> use various implementations of "which". On Debian-based systems,
> these days, it's implemented as a POSIX sh script (regardless of
> whether Linux (most common by far), kFreeBSD, Hurd, Illumos...
> is used as the kernel)

Are you saying that Illumos replaced the original csh script with something 
incompatible? I cannot confirm that.

> > 
> > type    is built into the shell since 1976. What problems do you
> > have with it?
> 
> No, actually type was added to the Bourne shell in SVR2 released
> in 1984, and had that problem that it would not return failure
> when failing to find a command (a bug which survived well into
> the 90s on some OSes IIRC).

OK, I forgot to first check Sven Mascheck's pages and just wrote what I had in mind.
 
> The fact that "which" came first largely explains why it's still
> more popular (even if more broken and less useful in shells
> other than tcsh/zsh) than "type".

I did use "which" in the early 1980s, but at that time, the Bourne Shell was 
not a nice interactive shell, so I used my old "bsh". In 1986, SunOS switched 
to the SYSV Bourne Shell and that had "type". That is really a long time ago.

Most people who currently believe "which" is a good idea were wearing diapers
in 1986. So that does not seem to be the problem.

I guess the reason for the problem we see today is bad advice from 
the internet.

> > command is POSIX standard. What problems do you have with it?
> 
> Technically, a "command" builtin was added to zsh first in 1990.
> POSIX.2 introduced a "command" builtin with different
> semantics for sh in 1992.

Interesting: command indeed was added to the POSIX variant of ksh88 in 1995.
I thought it was a ksh88 invention.

> Most of that and much more was already mentioned at
> https://unix.stackexchange.com/questions/85249/why-not-use-which-what-to-use-then
> as referenced in the OP's original message.

That was too long to read.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 11 Apr 2021 17:04:05 +0100
From:Harald van Dijk 
Message-ID:  <92113e70-5605-10f4-8e57-47c9f64cd...@gigawatt.nl>


  | This only applies when a remembered location exists at all, though.

Yes, but in the examples I showed, it did (you can see that from the
output of the hash command before the attempt to execute cmd).  It was
put there by "command -v".   I haven't checked again, but I think all
shells do that.

  | Then, if you accept that, for consistency, "the shell shall repeat the 
  | search" can only mean to repeat the full search and again stop at the 
  | first file with execute permissions, as it would be batshit crazy to 
  | have a shell that, when presented with "gcc; gcc", for the first gcc 
  | issues an error because /bin/gcc cannot be executed, and for the second 
  | gcc to find /usr/bin/gcc because /bin/gcc failed to execute.

Actually, in my, and I suspect most, implementations, even the first
will invoke the "subsequent" clause, as the (parent) shell first searches
PATH to find the executable, and enters it in the hash table.  Then it
forks, and the child repeats the whole thing (after redirects etc have
all been done).  This one is the subsequent search, which starts out
with what is already in the hash table (assuming the command was found
at all) and then if that fails, goes ahead and looks for another.

  | I am pretty sure you are not suggesting that that is reasonable,

Don't be too sure, I would not object to an implementation that did
work the way you described, and I suspect most users wouldn't either.
We're a pragmatic bunch, if something goes wrong the first time, and
fixes itself the second time (and subsequently), people tend to be
fairly happy.   Not deliriously, just fairly...

  | I think that is easier to explain than the other way around, myself. 
  | Suppose PATH is intentionally modified so that an uClibc-linked version 
  | of GCC appears first in $PATH, but the user messed up,

For almost everything we do, we can find instances where the results
are sub-optimal.   Throwing away everything where that could occur leaves
us with almost nothing.   The best way to avoid this would be to remove
PATH completely (not revert to the Thompson shell fixed search path, but
require all commands to be always specified by full pathname).   I doubt
that would be well received as a solution however.

kre



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Stephane Chazelas via austin-group-l at The Open Group
2021-04-10 22:12:47 +0200, Joerg Schilling via austin-group-l at The Open Group:
> "Jan Hafer via austin-group-l at The Open Group" 
>  wrote:
> 
> > For a short recap why: There are `which, type, command, whence, where, 
> > whereis, whatis, hash` used in shells. Worse, the semantics of `which` 
> > is shell-dependent.
> 
> which is a csh script and unrelated to Bourne or POSIX shells.
>   It therefore cannot give useful results in a standard
>   shell environment.
> 
>   Even worse: On Linux, "which" may be a program with different
>   behavior.

The OS kernel is hardly relevant here. Various Linux-based OSes
use various implementations of "which". On Debian-based systems,
these days, it's implemented as a POSIX sh script (regardless of
whether Linux (most common by far), kFreeBSD, Hurd, Illumos...
is used as the kernel)

> 
> type  is built into the shell since 1976. What problems do you
>   have with it?

No, actually type was added to the Bourne shell in SVR2 released
in 1984, and had that problem that it would not return failure
when failing to find a command (a bug which survived well into
the 90s on some OSes IIRC).

The fact that "which" came first largely explains why it's still
more popular (even if more broken and less useful in shells
other than tcsh/zsh) than "type".

> command   is POSIX standard. What problems do you have with it?

Technically, a "command" builtin was added to zsh first in 1990.
POSIX.2 introduced a "command" builtin with different
semantics for sh in 1992.

> whence    is a ksh-specific command and thus non-portable
> 
> where ??? what is that?

A builtin of tcsh (since 1991) and zsh. In zsh, it's the same as
which -a, "which" being the same as whence -c.

> whereis   does not exist on a typical UNIX system

whereis was added to 3BSD at the same time as which.

> 
> whatis    is a command that behaves like "man -k"
[...]

The type builtin was renamed to whatis in research Unix V8 sh
(1985), based on SVR2's shell and extended.

That's different from 2BSD's whatis command (1979, by Bill Joy,
csh/vi's author) that grep'ed /usr/lib/whatis, a man page
index (itself originally generated by a makewhatis csh script).

Most of that and much more was already mentioned at
https://unix.stackexchange.com/questions/85249/why-not-use-which-what-to-use-then
as referenced in the OP's original message.

-- 
Stephane



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Harald van Dijk via austin-group-l at The Open Group

On 11/04/2021 17:09, shwaresyst via austin-group-l at The Open Group wrote:
No, it's not nonsense. The definition of comment has all characters, 
including '!', shall be ignored until newline or end-of-file being 
conforming. Then tokenization which might discover an operator, keyword 
or command continues. This precludes "#!" being recognized as any of 
those. There is NO allowance for '!' being the second character as 
reserved for implementation extensions.


This is wrong on two counts. The first is that you're assuming that this 
will be interpreted by a shell. If execve() succeeds (and the #! line 
does not name a shell), it will not be interpreted by a shell at all, 
and the shell syntax for comments is irrelevant. The second is about 
what happens when it does get interpreted by a shell: POSIX allows 
shells to treat files starting with "#!" specially: "If the first line 
of a file of shell commands starts with the characters "#!", the results 
are unspecified."


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
No, it's not nonsense. The definition of comment has all characters, including 
'!', shall be ignored until newline or end-of-file being conforming. Then 
tokenization which might discover an operator, keyword or command continues. 
This precludes "#!" being recognized as any of those. There is NO allowance for 
'!' being the second character as reserved for implementation extensions.

 
 
  On Sun, Apr 11, 2021 at 11:37 AM, Robert Elz wrote:       
Date:        Sun, 11 Apr 2021 10:46:48 + (UTC)
    From:        shwaresyst 
    Message-ID:  <1413127944.766378.1618138008...@mail.yahoo.com>

  | That's bugs in those shells for POSIX mode then, that I see.

That's nonsense.

  | The conforming behavior is /usr/gcc is found and succeeds at doing nothing,

Nonsense.

That would be a conforming behaviour, it is not "the" conforming behaviour.

POSIX does not define what format a file must be to succeed in being
exec'd by one of the exec*() commands.  The system can have a thousand
different types that work, if it wants, and #! executables are one of
those.  That they're not required to work by POSIX doesn't mean they're
not allowed to work.

For the rest of your message, the reply I just made to Harald's message
applies.

kre

  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Harald van Dijk via austin-group-l at The Open Group

On 11/04/2021 16:33, Robert Elz wrote:

 Date:Sun, 11 Apr 2021 13:25:46 +0100
 From:Harald van Dijk 
 Message-ID:  


   | > My tests show that ksh, bash, yash, mksh do not find gcc in that case.
   |
   | Huh. My tests with ksh were with 93v, it's possible different versions
   | behave differently.

I see the same results as Joerg.  I'm using ksh93u.


Interesting. Will need to re-test with that later.


[...]
Note that POSIX says (this is from 8 D1.1 XCU 2.9.1.1 1. e. i.)

Once a utility has been searched for and found (either as a result
of this specific search or as part of an unspecified shell start-up
activity), an implementation may remember its location and need not
search for the utility again unless the PATH variable has been
the subject of an assignment.

Aside from the lack of mention of hash -r there, that much is fine.  It
goes on:

If the remembered location fails for a subsequent invocation,
the shell shall repeat the search to find the new location for
the utility, if any.

Note: "fails" not "utility is not found at" or similar, and "the shell
shall".

What it means in these circumstances to "repeat the search to find the
new location for the utility, if any" is less clear - but a reasonable
interpretation (adopted by about half the shells) is that it should look
through PATH, see if it can find a copy of the utility that does not
fail to invoke, and invoke that one.   Also note that it does not say
that it is OK to replace the remembered location with that of the newly
located command.


This only applies when a remembered location exists at all, though. If 
no remembered location exists, the invocation is not a "subsequent 
invocation" and the paragraph does not apply.


Then, if you accept that, for consistency, "the shell shall repeat the 
search" can only mean to repeat the full search and again stop at the 
first file with execute permissions, as it would be batshit crazy to 
have a shell that, when presented with "gcc; gcc", for the first gcc 
issues an error because /bin/gcc cannot be executed, and for the second 
gcc to find /usr/bin/gcc because /bin/gcc failed to execute. I am pretty 
sure you are not suggesting that that is reasonable, but I think that is 
a bad consequence of your interpretation of the wording.


And if "shall repeat the search" does refer to the exact same search 
that was initially performed, then "Once a utility has been searched for 
and found [...] an implementation may remember its location" arguably 
applies to that repeated search as well, but that is less clear. You 
have asked questions about that later on. They are good questions to 
think about. I am not sure about those yet, so am skipping them for now.



I agree with that.   Nothing else is rationally possible, except failing
to exec the command (like bash and the ksh's do), but it is hard to
explain how failing to run a command when one that is runnable exists
in $PATH, is a better outcome than running it.


I think that is easier to explain than the other way around, myself. 
Suppose PATH is intentionally modified so that an uClibc-linked version 
of GCC appears first in $PATH, but the user messed up, the dynamic 
linker of uClibc is actually not yet installed, or is installed in the 
wrong location. It is clearly the user's intention to execute the 
uClibc-linked version, and attempting to execute that and reporting the 
error is what bash and others would do. Silently executing some other 
version that the user didn't want is, in my opinion, doing the user a 
disservice.


(Disclaimer: I am not certain whether all shells would treat this 
exactly the same way as the '#!/bad' example.)


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 11 Apr 2021 10:46:48 + (UTC)
From:shwaresyst 
Message-ID:  <1413127944.766378.1618138008...@mail.yahoo.com>

  | That's bugs in those shells for POSIX mode then, that I see.

That's nonsense.

  | The conforming behavior is /usr/gcc is found and succeeds at doing nothing,

Nonsense.

That would be a conforming behaviour, it is not "the" conforming behaviour.

POSIX does not define what format a file must be to succeed in being
exec'd by one of the exec*() commands.   The system can have a thousand
different types that work, if it wants, and #! executables are one of
those.   That they're not required to work by POSIX doesn't mean they're
not allowed to work.

For the rest of your message, the reply I just made to Harald's message
applies.

kre



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Robert Elz via austin-group-l at The Open Group
Date:Sun, 11 Apr 2021 13:25:46 +0100
From:Harald van Dijk 
Message-ID:  


  | > My tests show that ksh, bash, yash, mksh do not find gcc in that case.
  |
  | Huh. My tests with ksh were with 93v, it's possible different versions 
  | behave differently.

I see the same results as Joerg.  I'm using ksh93u.

  | I am assuming that by "do not find gcc" you mean "do not find 
  | /usr/bin/gcc" here.

They give an error (what it is varies) from the attempt to execute
/bin/gcc.

I did a slightly different test (not mangling /bin...)

   $ ls -l /tmp/P?/cmd; cat /tmp/P?/cmd
   -rwxr-xr-x  1 kre  wheel  40 Apr 11 21:45 /tmp/P1/cmd
   -rwxr-xr-x  1 kre  wheel  37 Apr 11 21:46 /tmp/P2/cmd

   #! /not-found
   echo This is /tmp/P1/cmd

   #! /bin/sh
   echo This is /tmp/P2/cmd

(I manually added the blank lines in the output there, for this e-mail,
to make it easier to see the results.)

And then ran
$SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd;
 type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'

(I actually ran that without the newline in the middle, except for mksh
which otherwise screwed the terminal display of the command, but that should
make no difference either way).


fbsh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is a tracked alias for /tmp/P1/cmd
/tmp/P1/cmd
This is /tmp/P2/cmd
/tmp/P1/cmd

nbsh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is a tracked alias for /tmp/P1/cmd
/tmp/P1/cmd
This is /tmp/P2/cmd
/tmp/P1/cmd

dash $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is a tracked alias for /tmp/P1/cmd
/tmp/P1/cmd
This is /tmp/P2/cmd
/tmp/P1/cmd

bosh $  $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is /tmp/P1/cmd
This is /tmp/P2/cmd
1   1   /tmp/P1/cmd

yash $   $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd: an external command at /tmp/P1/cmd
/tmp/P1/cmd
/home/kre/bin/yash: cannot execute command `cmd' (/tmp/P1/cmd): No such file or 
directory
/tmp/P1/cmd

pdksh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is a tracked alias for /tmp/P1/cmd
cmd=/tmp/P1/cmd
/bin/ksh: cmd: No such file or directory
cmd=/tmp/P1/cmd


mksh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd;
 type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is a tracked alias for /tmp/P1/cmd
cmd=/tmp/P1/cmd
/usr/pkg/bin/mksh: /tmp/P1/cmd: No such file or directory
cmd=/tmp/P1/cmd


ksh93 $  $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is a tracked alias for /tmp/P1/cmd
cmd=/tmp/P1/cmd
/usr/pkg/bin/ksh93: cmd: not found [No such file or directory]
cmd=/tmp/P1/cmd


zsh $  $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is /tmp/P1/cmd
zsh:1: /tmp/P1/cmd: bad interpreter: /not-found: no such file or directory
This is /tmp/P2/cmd
cmd=/tmp/P1/cmd

bash5 $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; 
hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
/tmp/P1/cmd
cmd is /tmp/P1/cmd
/usr/pkg/bin/bash: /tmp/P1/cmd: /not-found: bad interpreter: No such file or 
directory
   1    /tmp/P1/cmd


Note that POSIX says (this is from Issue 8 draft 1.1, XCU 2.9.1.1 1. e. i.)

Once a utility has been searched for and found (either as a result
of this specific search or as part of an unspecified shell start-up
activity), an implementation may remember its location and need not
search for the utility again unless the PATH variable has been
the subject of an assignment.

Aside from the lack of mention of hash -r there, that much is fine.  It
goes on:

If the remembered location fails for a subsequent invocation,
the shell shall repeat the search to find the new location for
the utility, if any.

Note: "fails", not "utility is not found at" or similar, and "the shell
shall".

What it means in these circumstances to "repeat the search to find the
new location for the utility, if any" is less clear - but a reasonable
interpretation (adopted by about half the shells) is that it should look
through PATH, see if it can find a copy of the utility that does not
fail to invoke, and invoke that one.   Also note that it does not say
that it is OK to replace the remembered location with that of the newly
located command.
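The shell-by-shell divergence tabulated above can be reproduced with a short script that rebuilds the same fixture (a sketch: it uses a temporary directory in place of /tmp/P1 and /tmp/P2, and the output of the `cmd` step varies by shell):

```shell
#!/bin/sh
# Rebuild kre's fixture: P1/cmd has a broken interpreter, P2/cmd works.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/P1" "$tmp/P2"
printf '#! /not-found\necho This is P1/cmd\n' > "$tmp/P1/cmd"
printf '#! /bin/sh\necho This is P2/cmd\n'   > "$tmp/P2/cmd"
chmod +x "$tmp/P1/cmd" "$tmp/P2/cmd"

# Lookup reports P1's copy; whether running `cmd` then errors out or
# silently falls through to P2, and what stays hashed, is shell-specific.
sh -c "PATH=$tmp/P1:$tmp/P2; command -v cmd; cmd; command -v cmd" 2>&1

rm -rf "$tmp"
```

Substituting dash, mksh, zsh, etc. for the inner sh reproduces the rows of the table.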

  | > [... and] dash execute the correct gcc binary, but still have the
  | > wrong script path in their hash after calling gcc.

Arguably not 

Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Harald van Dijk via austin-group-l at The Open Group
On 11/04/2021 13:02, Joerg Schilling via austin-group-l at The Open 
Group wrote:

"Harald van Dijk via austin-group-l at The Open Group" 
 wrote:


If they are mistakes, they are widespread mistakes. As hinted in the
links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing
as files with execute permission, but /bin/gcc as a text file containing
#!/bad so that any attempt to execute it will fail, there are a lot of
shells where command -v gcc returns /bin/gcc, but running gcc actually
executes /usr/bin/gcc instead without reporting any error: this
behaviour is common to bosh, dash and variants (including mine), ksh,
and zsh.


My tests show that ksh, bash, yash, mksh do not find gcc in that case.


Huh. My tests with ksh were with 93v, it's possible different versions 
behave differently.


I am assuming that by "do not find gcc" you mean "do not find 
/usr/bin/gcc" here.



   bosh and
dash execute the correct gcc binary, but still have the wrong script path in
their hash after calling gcc.

I believe what bosh and dash do is the best behavior. None of the known shells
opens the file with "command -v something" and thus cannot know whether the
content is a script, a useless #! script or even a binary for the wrong
architecture.


Earlier, you did not see the problem that prompted this thread, and now 
you say that the behaviour where command -v lookup does not match 
execution lookup is the best behaviour. I trust that you do see now the 
problem that prompted this thread: there is, in these shells at least, 
no reliable way to perform command lookup separate from execution.



This is a result of the layering that has been introduced in the past 50 years
of UNIX.

If command -v should become able to do more, we would need to invent a way
for _any_ utility (regardless of whether it is a binary or a script) to
execute in a harmless way without side-effects.


I don't think command -v should do more, I think ordinary command lookup 
should do less. The behaviour of shells of continuing command lookup 
after a failed execve() is not supported by what POSIX says in "Command 
Search and Execution". Command lookup is supposed to stop as soon as "an 
executable file with the specified name and appropriate execution 
permissions is found" (per the referenced "Other Environment Variables", 
"PATH"). In my example that results in /bin/gcc. The shell should 
attempt to execute /bin/gcc, and once that fails, stop.
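The search rule Harald quotes can be sketched as a plain PATH walk that stops at the first executable regular file (a simplification: it skips empty PATH members, which actually denote the current directory, and ignores names containing a slash):

```shell
#!/bin/sh
# Sketch of the "Command Search and Execution" lookup rule: walk $PATH
# and stop at the first executable regular file; any later exec failure
# is then reported, not papered over by continuing the search.
lookup() (
    set -f                        # no pathname expansion of $PATH words
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || continue # simplification: skip empty members
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 127                      # conventional "not found" status
)

lookup sh    # prints the first match for sh on the current PATH
```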


This is what the other shells do, including bash, and what I intend to 
implement in mine.



There is still a problem: only bosh and ksh could in theory add the right
entry into the hash, since they are using vfork() and could report back the
final result via shared memory. I have had that possibility in mind for bosh
since I introduced vfork() support to bosh in 2014.


That's an interesting thought. The approach taken by the other shells 
avoids the problem entirely and makes this unnecessary though.


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Joerg Schilling via austin-group-l at The Open Group
"Harald van Dijk via austin-group-l at The Open Group" 
 wrote:

> If they are mistakes, they are widespread mistakes. As hinted in the 
> links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing 
> as files with execute permission, but /bin/gcc as a text file containing 
> #!/bad so that any attempt to execute it will fail, there are a lot of 
> shells where command -v gcc returns /bin/gcc, but running gcc actually 
> executes /usr/bin/gcc instead without reporting any error: this 
> behaviour is common to bosh, dash and variants (including mine), ksh, 
> and zsh.

My tests show that ksh, bash, yash, mksh do not find gcc in that case. bosh and 
dash execute the correct gcc binary, but still have the wrong script path in 
their hash after calling gcc.

I believe what bosh and dash do is the best behavior. None of the known shells 
opens the file with "command -v something" and thus cannot know whether the 
content is a script, a useless #! script or even a binary for the wrong 
architecture. 

This is a result of the layering that has been introduced in the past 50 years 
of UNIX.

If command -v should become able to do more, we would need to invent a way 
for _any_ utility (regardless of whether it is a binary or a script) to 
execute in a harmless way without side-effects.

There is still a problem: only bosh and ksh could in theory add the right 
entry into the hash, since they are using vfork() and could report back the 
final result via shared memory. I have had that possibility in mind for bosh 
since I introduced vfork() support to bosh in 2014.

If that was implemented and command -v was used with a well known command like
gcc, there could be a way to get the finally correct result from command -v:

1)  call "gcc --version 2>&1 > /dev/null"

2)  if that resulted in $? == 0, call: "command -v gcc"
The output could now report what is actually used, in case the
finally used binary path was reported back via shared memory.
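As a sketch, Joerg's two-step probe would look like this; the helper name probe_path is invented here, the redirections are reordered to >/dev/null 2>&1 so that both streams are discarded, and the hash-updating behaviour the second step relies on is hypothetical - no current shell implements it:

```shell
#!/bin/sh
# Hypothetical probe (probe_path is an invented helper): run the command
# once in a harmless mode, then ask the shell where it actually ran from.
# Only a shell that writes the finally-used path back into its hash table
# would make the second step accurate.
probe_path() {
    if "$1" --version >/dev/null 2>&1; then   # assumes $1 accepts --version
        command -v "$1"
    fi
}

probe_path grep    # assumes a grep that accepts --version; gcc is analogous
```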

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Oğuz via austin-group-l at The Open Group
On Sun, Apr 11, 2021 at 1:47 PM shwaresyst via austin-group-l at The Open
Group  wrote:

> That's bugs in those shells for POSIX mode then, that I see. The
> conforming behavior is /usr/gcc is found and succeeds at doing nothing,
> since it contains just a comment line. Other elements of path never get
> checked. Even in non-POSIX mode, trying to process it as a shebang with
> "/bad" as a ENOEXEC because not present, or other reason, does not imply
> the rest of the path should be searched, it should simply return a failure
> code.


I agree with this. Most of those shells also hash `/bin/gcc' despite
executing `/usr/bin/gcc'. This must have been discussed in #1161, but bash
still seems to be the only one that fails on `/bin/gcc' and doesn't execute
`/usr/bin/gcc'.


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread shwaresyst via austin-group-l at The Open Group
That's bugs in those shells for POSIX mode then, that I see. The conforming 
behavior is /usr/gcc is found and succeeds at doing nothing, since it contains 
just a comment line. Other elements of path never get checked. Even in 
non-POSIX mode, trying to process it as a shebang with "/bad" as a ENOEXEC 
because not present, or other reason, does not imply the rest of the path 
should be searched, it should simply return a failure code.
 
 
On Sun, Apr 11, 2021 at 6:07 AM, Harald van Dijk via austin-group-l at The
Open Group wrote:

On 10/04/2021 17:08, Robert Elz via austin-group-l at The Open Group wrote:
>      Date:        Sat, 10 Apr 2021 11:54:34 +0200
>      From:        "Jan Hafer via austin-group-l at The Open Group" 
>
>      Message-ID:  <15c15a5b-2808-3c14-7218-885e704cc...@rwth-aachen.de>
> 
>    | my inquiry is a question about the potential unexpected behavior of the
>    | shell execution environment on names. It is related to shortcomings of
>    | the command utility.
> 
> I'm not sure I understand.  I read the rest of the message, and I
> couldn't find anything really about any shortcomings, other than perhaps
> some mistakes in interpretation, and usage.

If they are mistakes, they are widespread mistakes. As hinted in the 
links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing 
as files with execute permission, but /bin/gcc as a text file containing 
#!/bad so that any attempt to execute it will fail, there are a lot of 
shells where command -v gcc returns /bin/gcc, but running gcc actually 
executes /usr/bin/gcc instead without reporting any error: this 
behaviour is common to bosh, dash and variants (including mine), ksh, 
and zsh.

Cheers,
Harald van Dijk

  


Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-11 Thread Harald van Dijk via austin-group-l at The Open Group

On 10/04/2021 17:08, Robert Elz via austin-group-l at The Open Group wrote:

 Date:Sat, 10 Apr 2021 11:54:34 +0200
 From:"Jan Hafer via austin-group-l at The Open Group" 

 Message-ID:  <15c15a5b-2808-3c14-7218-885e704cc...@rwth-aachen.de>

   | my inquiry is a question about the potential unexpected behavior of the
   | shell execution environment on names. It is related to shortcomings of
   | the command utility.

I'm not sure I understand.   I read the rest of the message, and I
couldn't find anything really about any shortcomings, other than perhaps
some mistakes in interpretation, and usage.


If they are mistakes, they are widespread mistakes. As hinted in the 
links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing 
as files with execute permission, but /bin/gcc as a text file containing 
#!/bad so that any attempt to execute it will fail, there are a lot of 
shells where command -v gcc returns /bin/gcc, but running gcc actually 
executes /usr/bin/gcc instead without reporting any error: this 
behaviour is common to bosh, dash and variants (including mine), ksh, 
and zsh.


Cheers,
Harald van Dijk



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-10 Thread Paul Smith via austin-group-l at The Open Group
On Sat, 2021-04-10 at 22:12 +0200, Joerg Schilling via austin-group-l
at The Open Group wrote:
> where   ??? what is that?

I don't know if this is what Jan was thinking of, but "where" exists on
Windows and is equivalent to "which" etc. there.



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-10 Thread Joerg Schilling via austin-group-l at The Open Group
"Jan Hafer via austin-group-l at The Open Group"  
wrote:

> For a short recap why: There are `which, type, command, whence, where, 
> whereis, whatis, hash` used in shells. Worse, the semantics of `which` 
> is shell-dependent.

which   is a csh script and unrelated to Bourne or POSIX shells.
It therefore cannot give useful results in a standard
shell environment.

Even worse: On Linux, "which" may be a program with different
behavior.

type    is built into the shell since 1976. What problems do you
have with it?

command is POSIX standard. What problems do you have with it?

whence  is a ksh specific command and thus non-portable

where   ??? what is that?

whereis does not exist on a typical UNIX system

whatis  is a command that behaves like "man -k"

hash    is POSIX standard; it allows querying all hash entries or
adding a potential hash entry.

> As for `command -v`:
> It is not possible to know the path of the executable without executing 
> it. This is however slow, since the shell environment has the paths in 
> memory or it is even cached. It can also have potential unexpected or 
> dangerous side effects, since command behavior (without arguments) and 
> command arguments can be deviating.

This is not correct.

"command -v" does not execute the command; it imitates the PATH lookup
procedure without running anything.
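That command -v never runs the file is easy to demonstrate: give a lookup target a visible side effect and then look it up (a sketch using a temporary directory):

```shell
#!/bin/sh
# The marker file would exist if `command -v` ever executed the script.
tmp=$(mktemp -d) || exit 1
printf '#!/bin/sh\ntouch "%s/ran"\n' "$tmp" > "$tmp/probe"
chmod +x "$tmp/probe"

PATH=$tmp:$PATH command -v probe          # prints the path under $tmp
[ ! -e "$tmp/ran" ] && echo "lookup did not execute the script"

rm -rf "$tmp"
```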

> 1. Are there plans to allow introspection to the shell environment as to 
> prevent failures and enforce basic semantics of words?
> I am thinking of ways to differentiate 1.aliases, 2.functions, 
> 3.builtins and 4.executables in and not in $PATH both machine and 
> human-readable where sufficiently simple (otherwise human readable).
> This list may be incomplete.

This does not seem to be related to the statements you made above.

If you would like help, you could make it easier by explaining in more
depth what you would like to do and what exact problems you have.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ 
http://sourceforge.net/projects/schilytools/files/



Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-10 Thread Robert Elz via austin-group-l at The Open Group
Date:Sat, 10 Apr 2021 23:08:40 +0700
From:"Robert Elz via austin-group-l at The Open Group" 

Message-ID:  <20226.1618070...@jinx.noi.kre.to>

Now to add to something I said in the previous reply (so the rest
of the audience here don't think I have suddenly changed views...)

This message (unlike the previous one) has nothing to do with your message,
only mine.

  | One of the features of sh is that a
  | name can be an alias, a function, a built command, and several different
  | filesystem commands, all at the same time.   It is neither possible nor
  | desirable to change that.

Actually, it would be possible, and highly desirable, to change that, but
in a limited way, by simply deleting aliases.   Then there would be one less
thing a word could be.But I seem to be in a minority with this desire
(but apart from "well they exist, so we should document them" no-one has
supplied any good reason for keeping aliases.)

kre

ps: to make this perhaps a little relevant to the thread, don't forget that
in shell, a word can also be just a word, it does not have to be any kind of
command at all, and it can be difficult (if not impossible) to look at a
script and tell.  Consider this simple example

read cmd
$cmd echo foo

What is "echo" there?   If your answer doesn't depend upon what was
entered in response to the "read" command, then you're wrong.
This ability is useful, but how can you expect a shell to ever parse
that script, without running it, and have any idea at all what it means?
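Two runs of that fragment make the point (the wrapper lines are illustrative; `:` is the null utility, which ignores its operands):

```shell
# cmd reads as empty: the expansion vanishes and "echo" is the command.
printf '\n'  | sh -c 'read cmd; $cmd echo foo'    # prints: foo
# cmd reads as ":": the null utility ignores its operands, so "echo"
# is now just a word and nothing is printed.
printf ':\n' | sh -c 'read cmd; $cmd echo foo'
```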




Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-10 Thread Robert Elz via austin-group-l at The Open Group
Date:Sat, 10 Apr 2021 11:54:34 +0200
From:"Jan Hafer via austin-group-l at The Open Group" 

Message-ID:  <15c15a5b-2808-3c14-7218-885e704cc...@rwth-aachen.de>

  | my inquiry is a question about the potential unexpected behavior of the 
  | shell execution environment on names. It is related to shortcomings of 
  | the command utility.

I'm not sure I understand.   I read the rest of the message, and I
couldn't find anything really about any shortcomings, other than perhaps
some mistakes in interpretation, and usage.

  | For a short recap why: There are `which, type, command, whence, where, 
  | whereis, whatis, hash` used in shells. Worse, the semantics of `which` 
  | is shell-dependent.

Yes, the prevalence of these things is somewhat of an issue, but there
is nothing we can do about that really.   People are free to reinvent the
wheel (make it rounder).

'which' started life, long long ago, as a csh utility (a script written in
csh, for use by csh users).   There were enough of them, who grew
familiar with using it, that when they switched to better shells (when
those became available) they continued to use which.  I know that
even I do on occasion.   This wasn't helped by which (kind of) working
even for users of other shells (kind of, in that the original still
looked in the user's .cshrc for assistance, despite that not being
anything used by the shell the user was using).   Since then there
have been a myriad different versions of which.   Best to forget it now
(unless you're still a csh user perhaps).

'type' (which you neglected to mention other than by including it in
this list) is the POSIX sh way to do what which was designed for, for
csh users, and is what ought to be used.

'whence' was a (ksh I think) alternative to which, built into some
Bourne style shells, also best now forgotten.   (In some of those
shells, their whence implementation provides their implementation
of 'type' I believe .. but I might be wrong.)

'where' I don't know at all, what's that?

'whereis' is different, it was designed (back in csh days) to tell
where a standard utility is located - rather than where a utility
of a particular name would be run from for a particular user.
It doesn't belong in this list at all, it does a different job.

'whatis' is even further removed, that does man page lookups.

'hash' has an entirely different function, which just happens to
sometimes be able to reveal similar information to type (but only
sometimes) - it should not really be considered in this list
either (though it is more related than whereis or whatis).
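The difference is visible in a session like this sketch (the exact format of hash output varies between shells, as the table earlier in the thread shows):

```shell
#!/bin/sh
# `type` answers "what would this name resolve to" without running it;
# `hash` lists only locations the shell has already remembered.
sh -c '
    type ls              # path lookup (and report), no execution
    ls / >/dev/null      # a real execution; ls may now be remembered
    hash                 # implementation-specific dump of remembered paths
'
```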

  | As for `command -v`:
  | It is not possible to know the path of the executable without executing 
  | it.

This depends entirely on what information you're actually seeking.
It is certainly true that to know for sure whether a particular
executable runs or not, an attempt must be made to run it.   Only
the kernel gets to make the yes/no decision on that, and only an
actual exec() (or posix_spawn) attempt will invoke the appropriate
checks.   (eg: one issue might be that a binary that would run
perfectly fine elsewhere, fails on this particular system, because
it lacks sufficient resources; a 1GB binary (mostly data) cannot run
on a system with 512MB of ram and no swap space, for example, but
would run just fine on the same system if it had 4GB of ram.)

But that's not what people usually want to know, what is usually
desired is, first, what kind of object is "name" (alias, function,
built-in command, filesystem command) and in the last case, what
path looks to be where it will be located.

If "name" has been executed, we will know what path name was used
to do that; otherwise we find an executable-looking file called
"name" somewhere in $PATH and that's it.  So "command -v" certainly
doesn't try to execute anything, but the answer it gives is not
guaranteed to be correct.  This is true of all of the (relevant) commands
listed: none of them will actually execute the command once they have
located a path just to verify that it actually works - to do that would
need far more info (like a file to compile if looking for "cc")
than is present in the arg list of any of these commands.

  | This is however slow, since the shell environment has the paths in 
  | memory or it is even cached.

Sorry, I cannot parse that sentence in a way that makes it at all
relevant to what you seem to be saying.

  | It can also have potential unexpected or 
  | dangerous side effects, since command behavior (without arguments) and 
  | command arguments can be deviating.

Yes, exactly, that's why it is never done.   However, if the user has
previously executed the command, and now wants to know what file was
executed (say the command dumped core, and the user needs the binary
to use with a debugger and the core file, so needs the path name of
the binary) then that information can be made available.   But no
shell (or stand-alone command) is ever going to attempt to run 

[Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

2021-04-10 Thread Jan Hafer via austin-group-l at The Open Group

Dear Ladies and Gentlemen,

my inquiry is a question about the potential unexpected behavior of the 
shell execution environment on names. It is related to shortcomings of 
the command utility.

Related page for command utility semantics:
https://pubs.opengroup.org/onlinepubs/009604599/utilities/command.html

Take for example this question with high activity:
https://unix.stackexchange.com/questions/85249/why-not-use-which-what-to-use-then
and this answer gives a relative good POSIX shell history:
https://unix.stackexchange.com/a/85250/273935

For a short recap why: There are `which, type, command, whence, where, 
whereis, whatis, hash` used in shells. Worse, the semantics of `which` 
is shell-dependent.

As for `command -v`:
It is not possible to know the path of the executable without executing 
it. This is however slow, since the shell environment has the paths in 
memory or it is even cached. It can also have potential unexpected or 
dangerous side effects, since command behavior (without arguments) and 
command arguments can be deviating.


1. Are there plans to allow introspection to the shell environment as to 
prevent failures and enforce basic semantics of words?
I am thinking of ways to differentiate 1.aliases, 2.functions, 
3.builtins and 4.executables in and not in $PATH both machine and 
human-readable where sufficiently simple (otherwise human readable).

This list may be incomplete.

2. In an ideal scenario the semantic of a word can be made constant, so 
no other script or shell invocation running afterwards can change it 
(this would compare to best practices in compiled languages and can be 
cheaply analyzed before execution of a script).
What is the POSIX model for static settings and behavior, which should 
not be overwritten during execution of the operating system?
Does POSIX have any opinion or recommendation how to make SHELL 
scripting robust?


Sincerely,
Jan Philipp Hafer

PS: I am not able to view the current draft status after email 
confirmation here https://www.opengroup.org/austin/login.html

How frequently are credentials and email subscriber status information 
synchronised between the file hosting server and the general server?