Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
Date:        Sun, 11 Apr 2021 22:27:19 +0100
From:        Harald van Dijk
Message-ID:  <79b98e30-46ba-d468-153f-c1a2a0416...@gigawatt.nl>

| Okay, but that is a technicality. The pre-seeding is only permitted at
| startup time,

No, what it says is "an unspecified shell start-up activity".
"unspecified" means it can be anything. Anything includes starting a
thread which monitors what commands are about to be executed and loads
the hash table just in time. Or one which populates the hash table with
every possible command every tenth of a micro-second. Anything. It is
unspecified.

| so cannot depend on the contents of the script.

Of course, it can, the script is available at startup time of the shell,
the startup activity can read the entire script, parse it, find all the
command names and possible command names, and add them to the hash
table. Alternatively, it can examine PATH and load every executable in
every directory in PATH into the hash table. zsh (seems to) do something
like the latter.

| Replace gcc by any utility that is not hashed at startup

There are none (or none that can be found by a PATH search).

| Actually, if hashing commands is only allowed "as a result of this
| specific search or as part of an unspecified shell start-up activity",

unspecified remember...

| then after "hash -r" has executed, before a new command search has been
| performed, the hash table must be empty.

Not unless the specification for hash says so, and it doesn't.

| I want to say this is a theoretical concern, that there are no shells
| where hash -r is implemented as doing anything other than clearing the
| hash table. I cannot prove this but will be quite disappointed if any
| turn out to do something else.

zsh comes close, it appears to empty the hash table on "hash -r", but do
anything at all, and it fills up again. And I mean fills.

And I understand that - if you're going to search the directories in
PATH over and over again, every time a command is executed, better to
read them once, and remember what they contain - no more useless I/O.
(I vaguely recall deciding that zsh read as many directories as needed
to find the command, and then stopped - getting a "command not found"
would result in everything possible from PATH now being in the hash
table.)

| > That is, find an entry for cmd in PATH for which exec() succeeds.
| > Only fail if there is none.
|
| Yes, that is what dash is doing.

The way PATH searches should be done.

| Well, that is sort of what dash does. dash takes an extra integer that
| specifies which PATH component was hashed and uses that as the starting
| point for the search,

I know. This is irrelevant here. If this algorithm doesn't produce the
required results, that would be a bug, and like most bugs, if it is
considered serious enough, it can be fixed.

The important issue is that the intent is to examine each element in
PATH, until we get success from exec() (or ENOEXEC with a file we're
willing to treat as a script, and so exec a shell to interpret it).

So, if there is a /bin/gcc that is "#!/bad" and a later one in PATH that
is a real executable, we should exec the later one, right?

The dash (ash in general, at least originally) optimisation is simply to
note that if we read the directory, and didn't find the command name
there, then there's no point attempting to exec it from that directory,
that must fail.

If between reading the directory and when the exec attempt was made,
someone inserted the command into one of the directories that had been
read, then we have a race - and as usual, sometimes one wins, sometimes
the other - if you're willing to bias the conditions (handicap) you can
force one result or the other (or just make one or the other more
likely), or you can make it even more unpredictable. Nothing is wrong
here, races have unpredictable results.

kre
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
Date:        Sun, 11 Apr 2021 20:17:09 + (UTC)
From:        shwaresyst
Message-ID:  <1360977422.847706.1618172229...@mail.yahoo.com>

| We are talking about the shell, not some bastardization of execve(),
| that sees it's not a directly loadable process image so treats it as
| a script.

shells only do that when the error is ENOEXEC. In the cases we were
discussing, it was not. Had there been an ENOEXEC error, then the shells
would (all I believe) have attempted to run the file as a script. But
that wasn't the error, so they (rightly) did not.

The use of #!/bad (and similar) is simply an easy way (on most kernels,
ie: those which support #!) to get the exec to fail with an error other
than ENOEXEC. It is not the only way, name or path too long are also
possibilities, so is a symlink loop, ... there are many possibilities.

| For those shells implementing shebang as an extension

There are a couple that do that, but that's completely irrelevant, that
only happens when the kernel doesn't support #! (all that matter do),
and the shell is trying to do what any modern kernel would (should) do.
Posix might not mandate #! support but the marketplace does.

So:

| it is still them piping the body of the script after the shebang line,
| without any token expansion, to an alternate interpreter via an exec()
| of some sort.

This is completely immaterial as no-one here is in any way considering
this kind of case, we're not getting ENOEXEC errors.

| Second, conforming applications can not rely on unspecified behaviors,

Of course.

| so having a use beyond that specified makes the shell nonconforming.

Nonsense. Shells are allowed to implement extensions. They don't become
non-conforming because of that. The reference shell (ksh88) implements
extensions after all.

| Some conforming script authors may simply want the first line to be a
| # IMPORTANT USAGE NOTE headline,

That's a contradiction. A conforming script cannot start that way. You
have already been told why. It can start \n#!!! if it wants. It can even
start \b#!!! if it wants to pretend (at least to people who look via
"cat file") as if it starts #!!!. It cannot start #!anything.

| What the standard does allow as an extension,
| and I would support adding to the standard, is adding an option
| to turn off token expansion in here-doc bodies,

What does this have to do with the current discussion?

| This allows the effect of shebang to be accomplished anywhere in a script,

Nonsense. #! is not really for when shells run commands (though it
helps), it is for when other utilities run commands

    find /where/ever -name something -exec my_cmd {} \;

where "my_cmd" is awk, or perl, or python, or tcl, or ...

I wasn't here when any austin-group discussions on #! were being held,
but it is hard these days to think of any good reason for it not to be
included, with the possible exception that executable formats in general
are not specified. If that was it, I would think an exception for this
one case would make sense.

However #! has ***nothing*** to do with the current issue, it's just a
tool to use for demonstrating what happens. The same issues can arise in
lots of other ways. Please stop confusing things. If you don't
understand what we're talking about, please just observe and try to
learn something (feel free to ask questions).

kre
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On 11/04/2021 22:05, Robert Elz wrote:
> Date:        Sun, 11 Apr 2021 19:46:36 +0100
> From:        Harald van Dijk
> Message-ID:  <9ab286f9-125d-55a4-a65f-08d4af04d...@gigawatt.nl>
>
> | Sure, that's why I then switched to a different example that did not
> | have an earlier "command -v" to point out how this leads to inconsistent
> | behaviour.
>
> But while it is possible to (at least probabilistically - it is a hash
> table after all, effectively a cache) ensure that an entry exists, it is
> not possible to ensure that one doesn't.
>
> Recall this part from POSIX (still 2.9.1.1 1. e. i.)
>
>     Once a utility has been searched for and found (either as a
>     result of this specific search or as part of an unspecified
>     shell start-up activity),
>
> That is, a shell is permitted to pre-seed the hash table at startup
> time, and if allowed then, exactly when it happens between when main()
> of the shell is first called, and when a lookup for a command is
> actually done, is unknowable. That means it is OK for the shell to
> pre-seed the hash table for a command when the command name is seen, and
> then it will be there when the search for that command is done.

Okay, but that is a technicality. The pre-seeding is only permitted at
startup time, so cannot depend on the contents of the script. Replace
gcc by any utility that is not hashed at startup and you will still have
the same problem. Or, as you say, clear the hash table explicitly.

> Even hash -r (which removes everything) doesn't guarantee that
> everything isn't immediately replaced (with up to date values of course)
> before that command even finishes.

Actually, if hashing commands is only allowed "as a result of this
specific search or as part of an unspecified shell start-up activity",
then after "hash -r" has executed, before a new command search has been
performed, the hash table must be empty.

I want to say this is a theoretical concern, that there are no shells
where hash -r is implemented as doing anything other than clearing the
hash table. I cannot prove this but will be quite disappointed if any
turn out to do something else.

> But all of this is really irrelevant, it is based upon a flawed
> assumption about what is happening (and even what should happen).
>
> What dash and the others, I presume, are doing, is not really the
> "subsequent command" thing (that was just an interesting argument to
> make), it is rather an implementation of the original Bourne shell
> strategy (pre hash table), which was, more or less (not this code, I
> don't write algol68, just a similar effect):
>
> [...]
>
> That is, find an entry for cmd in PATH for which exec() succeeds.
> Only fail if there is none.

Yes, that is what dash is doing.

> The addition of the hash table should allow that algorithm to run faster
> (with the occasional problem when after a hash entry is created, someone
> inserts an entry earlier in PATH than it was before) but it should not
> normally change the outcome of that algorithm.

Well, that is sort of what dash does. dash takes an extra integer that
specifies which PATH component was hashed and uses that as the starting
point for the search, but otherwise it is the same algorithm. So if
PATH=/a:/b:/c and the hash table says x is found in /b, the search in
the shell child will look for /b/x, and if that fails, /c/x. It will not
search for /a/x unless the hash table is cleared.

This does not seem useful to me. If the command is no longer present in
/b, it should be checked in all PATH components. Commands may
legitimately move from /usr/bin to /bin by system upgrades just as well
as the other way around.

Cheers,
Harald van Dijk
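
The PATH=/a:/b:/c example can be made concrete with a small C sketch.
This is only an illustration of the behaviour as described in the
message, not dash's actual source: the names search_from, probe_fn and
only_in_a are invented here, and the probe callback stands in for the
real exec attempt so that the example runs without touching the file
system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns 1 if "dir/cmd" could be executed,
 * 0 otherwise. In a real shell this would be the exec attempt itself. */
typedef int (*probe_fn)(const char *dir, const char *cmd);

/*
 * Search dirs[start .. ndirs-1] for cmd and return the index of the
 * first directory where the probe succeeds, or -1 if none does.
 * Passing the hashed index as start mimics the behaviour described
 * above: components before the hashed one are never looked at.
 * Passing 0 gives the full search.
 */
int search_from(const char *const dirs[], size_t ndirs, size_t start,
                const char *cmd, probe_fn probe)
{
    assert(dirs != NULL && cmd != NULL && probe != NULL);

    for (size_t i = start; i < ndirs; i++) {
        if (probe(dirs[i], cmd))
            return (int)i;
    }
    return -1;
}

/* Example probe: the command was hashed in /b but now exists only in /a. */
int only_in_a(const char *dir, const char *cmd)
{
    (void)cmd; /* the command name does not matter for this scenario */
    return dir[0] == '/' && dir[1] == 'a' && dir[2] == '\0';
}
```

With dirs set to /a, /b, /c and a hashed index of 1, the call
search_from(dirs, 3, 1, "x", only_in_a) fails even though /a/x exists;
restarting from index 0 (the effect of clearing the hash table) finds it.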
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
"shwaresyst via austin-group-l at The Open Group" wrote:

> We are talking about the shell, not some bastardization of execve(), that
> sees it's not a directly loadable process image so treats it as a script. For
> those shells implementing shebang as an extension it is still them piping the
> body of the script after the shebang line, without any token expansion, to an
> alternate interpreter via an exec() of some sort. Second, conforming
> applications can not rely on unspecified behaviors, so having a use beyond
> that specified makes the shell nonconforming. Calling it out like that simply
> acknowledges a lot of shell implementations choose to make themselves
> nonconforming, I do not see it as an endorsement or allowance. The
> requirement explicitly specified behavior shall be implemented as specified
> takes priority. Some conforming script authors may simply want the first line
> to be a # IMPORTANT USAGE NOTE headline, or similar, not want a
> utility named "!!!" to be exec'd.

You are mistaken again.

The only platform that worked like you describe is my old shell "bsh"
when run on UNOS (the first UNIX clone). But this was in the 1980s and
there is no other similar platform.

Today, #!/path is always handled by the kernel.

Jörg

-- 
EMail:jo...@schily.net  Jörg Schilling D-13353 Berlin
Blog: http://schily.blogspot.com/
URL:  http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
"shwaresyst via austin-group-l at The Open Group" wrote:

> No, it's not nonsense. The definition of comment has all characters,
> including '!', shall be ignored until newline or end-of-file being
> conforming. Then tokenization which might discover an operator, keyword or
> command continues. This precludes "#!" being recognized as any of those.
> There is NO allowance for '!' being the second character as reserved for
> implementation extensions.

No, sorry but #!/path is a kernel extension that is permitted by POSIX.
The shells handle such a line as a comment.

Also note that the error code from exec*() for a file that contains
#!/bad is not ENOEXEC, but ENOENT. This is why the shells continue to
search for a potential executable in PATH when they actually try to
execute the binaries.

Jörg
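
The distinction between ENOEXEC and every other exec failure can be
written down as a tiny C decision function. This is a hedged sketch of
the policy described in the thread for shells that keep searching PATH;
the enum and the function name after_exec_failure are made up for this
example and do not come from any particular shell.

```c
#include <assert.h>
#include <errno.h>

/* What a shell following the "keep searching" policy does after a
 * failed exec attempt on one PATH candidate (illustrative names). */
enum exec_failure_action {
    RUN_AS_SHELL_SCRIPT, /* ENOEXEC: hand the file to a shell to interpret */
    TRY_NEXT_PATH_ENTRY  /* any other error: move on to the next PATH entry */
};

enum exec_failure_action after_exec_failure(int err)
{
    assert(err != 0); /* only meaningful after exec has actually failed */
    return err == ENOEXEC ? RUN_AS_SHELL_SCRIPT : TRY_NEXT_PATH_ENTRY;
}
```

Under this policy a file starting with #!/bad (ENOENT on kernels that
handle #!), an over-long name (ENAMETOOLONG) or a symlink loop (ELOOP)
all lead to the next PATH entry being tried, never to the file being
run as a script.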
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On 11/04/2021 21:17, shwaresyst wrote:
> The requirement explicitly specified behavior shall be implemented as
> specified takes priority. Some conforming script authors may simply want
> the first line to be a # IMPORTANT USAGE NOTE headline, or similar,
> not want a utility named "!!!" to be exec'd.

If you are really saying that when POSIX says "If the first line of a
file of shell commands starts with the characters "#!", the results are
unspecified.", it actually means the results are well-defined, you are
either seriously deluded, or trolling. I cannot tell which and have no
interest in wasting time figuring it out.

Cheers,
Harald van Dijk
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
Date:        Sun, 11 Apr 2021 19:46:36 +0100
From:        Harald van Dijk
Message-ID:  <9ab286f9-125d-55a4-a65f-08d4af04d...@gigawatt.nl>

| Sure, that's why I then switched to a different example that did not
| have an earlier "command -v" to point out how this leads to inconsistent
| behaviour.

But while it is possible to (at least probabilistically - it is a hash
table after all, effectively a cache) ensure that an entry exists, it is
not possible to ensure that one doesn't.

Recall this part from POSIX (still 2.9.1.1 1. e. i.)

    Once a utility has been searched for and found (either as a
    result of this specific search or as part of an unspecified
    shell start-up activity),

That is, a shell is permitted to pre-seed the hash table at startup
time, and if allowed then, exactly when it happens between when main()
of the shell is first called, and when a lookup for a command is
actually done, is unknowable. That means it is OK for the shell to
pre-seed the hash table for a command when the command name is seen, and
then it will be there when the search for that command is done.

Even hash -r (which removes everything) doesn't guarantee that
everything isn't immediately replaced (with up to date values of course)
before that command even finishes.

| Ha, that's bad enough for an interactive shell, but for a
| non-interactive shell script that executes gcc and exits if it fails,
| retrying wouldn't even work.

Hmm - if the shell exits after the first one fails (which isn't usual,
"command not found" is just a "may exit") then who cares? It fails, and
we're done, what would have happened had we tried again will never be
known.

But all of this is really irrelevant, it is based upon a flawed
assumption about what is happening (and even what should happen).

What dash and the others, I presume, are doing, is not really the
"subsequent command" thing (that was just an interesting argument to
make), it is rather an implementation of the original Bourne shell
strategy (pre hash table), which was, more or less (not this code, I
don't write algol68, just a similar effect):

    if (fork() == 0) {
        /* do redirects, etc - omitted here */
        p = copystr(lookup("PATH"));
        err = 0;
        do {
            q = strchr(p, ':');
            if (q != NULL)
                *q++ = '\0';
            sprintf(buf, "%s/%s", p, cmd);
                /* ignore relative paths here */
            execve(buf, args, env);
            /* if we get here, the exec failed */
            if (err == 0)   /* more complex test really */
                err = errno;
            /* ignore trying to exec /bin/sh on ENOEXEC here */
        } while ((p = q) != NULL);
        fprintf(stderr, "sh: %s: command not found: %s\n",
            cmd, strerror(err));
        exit(err == N || err == M ? 126 : 127);
            /* specifics omitted */
    }

[Aside: I know there's lots of errors and omissions there, but you get
the model].

That is, find an entry for cmd in PATH for which exec() succeeds.
Only fail if there is none.

The addition of the hash table should allow that algorithm to run faster
(with the occasional problem when after a hash entry is created, someone
inserts an entry earlier in PATH than it was before) but it should not
normally change the outcome of that algorithm.

That's what was originally done, that's what we should still be doing,
and that's what the shells that go on to the 2nd gcc or cmd actually do.
It makes no difference (should make no difference) whether the name was
in the hash table before the command was invoked or not.

If any of the shells which do not copy the ksh/bash behaviour aren't
doing that, then I'd agree, those are broken. Those that do copy it are
simply broken.

The command utility (with -v) (and which, whence, type, ...) cannot exec
the command so all it can do is find the first entry in PATH which
matches. When loading the hash table, the shell has the same
limitations.

| and no scenario in which I am seeing the dash behaviour as clearly better.

Sorry, I am not an optometrist, and cannot assist with your vision
problems.

kre
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
We are talking about the shell, not some bastardization of execve(),
that sees it's not a directly loadable process image so treats it as a
script. For those shells implementing shebang as an extension it is
still them piping the body of the script after the shebang line, without
any token expansion, to an alternate interpreter via an exec() of some
sort.

Second, conforming applications can not rely on unspecified behaviors,
so having a use beyond that specified makes the shell nonconforming.
Calling it out like that simply acknowledges a lot of shell
implementations choose to make themselves nonconforming, I do not see it
as an endorsement or allowance. The requirement explicitly specified
behavior shall be implemented as specified takes priority. Some
conforming script authors may simply want the first line to be a
# IMPORTANT USAGE NOTE headline, or similar, not want a utility
named "!!!" to be exec'd.

What the standard does allow as an extension, and I would support adding
to the standard, is adding an option to turn off token expansion in
here-doc bodies, and back on, via set. This allows the effect of shebang
to be accomplished anywhere in a script, at the expense of a few extra
characters for the here delimiter and set commands, without any other
changes to tokenizing or the grammar.

On Sun, Apr 11, 2021 at 12:15 PM, Harald van Dijk wrote:
> On 11/04/2021 17:09, shwaresyst via austin-group-l at The Open Group wrote:
> > No, it's not nonsense. The definition of comment has all characters,
> > including '!', shall be ignored until newline or end-of-file being
> > conforming. Then tokenization which might discover an operator, keyword
> > or command continues. This precludes "#!" being recognized as any of
> > those. There is NO allowance for '!' being the second character as
> > reserved for implementation extensions.
>
> This is wrong on two counts. The first is that you're assuming that this
> will be interpreted by a shell. If execve() succeeds (and the #! line
> does not name a shell), it will not be interpreted by a shell at all,
> and the shell syntax for comments is irrelevant. The second is about
> what happens when it does get interpreted by a shell: POSIX allows
> shells to treat files starting with "#!" specially: "If the first line
> of a file of shell commands starts with the characters "#!", the results
> are unspecified."
>
> Cheers,
> Harald van Dijk
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On 11/04/2021 17:50, Robert Elz wrote:
> Date:        Sun, 11 Apr 2021 17:04:05 +0100
> From:        Harald van Dijk
> Message-ID:  <92113e70-5605-10f4-8e57-47c9f64cd...@gigawatt.nl>
>
> | This only applies when a remembered location exists at all, though.
>
> Yes, but in the examples I showed, it did (you can see that from the
> output of the hash command before the attempt to execute cmd). It was
> put there by "command -v". I haven't checked again, but I think all
> shells do that.

Sure, that's why I then switched to a different example that did not
have an earlier "command -v" to point out how this leads to inconsistent
behaviour.

> | Then, if you accept that, for consistency, "the shell shall repeat the
> | search" can only mean to repeat the full search and again stop at the
> | first file with execute permissions, as it would be batshit crazy to
> | have a shell that, when presented with "gcc; gcc", for the first gcc
> | issues an error because /bin/gcc cannot be executed, and for the second
> | gcc to find /usr/bin/gcc because /bin/gcc failed to execute.
>
> Actually, in my, and I suspect most, implementations, even the first
> will invoke the "subsequent" clause, as the (parent) shell first
> searches PATH to find the executable, and enters it in the hash table.
> Then it forks, and the child repeats the whole thing (after redirects
> etc have all been done). This one is the subsequent search, which starts
> out with what is already in the hash table (assuming the command was
> found at all) and then if that fails, goes ahead and looks for another.

That is an implementation detail. As far as POSIX is concerned, there is
only a single command search when a command is executed, so "a
subsequent invocation" can only refer to the shell script attempting to
execute the same command again at a later time. POSIX does not even
require the shell to fork at all, the shell may use some other
system-specific way of creating a new process. This isn't hypothetical,
such other system-specific ways of creating new processes were the
reason posix_spawn was added, and posix_spawn appears to be used by at
least one shell (ksh).

> | I am pretty sure you are not suggesting that that is reasonable,
>
> Don't be too sure, I would not object to an implementation that did work
> the way you described, and I suspect most users wouldn't either. We're a
> pragmatic bunch, if something goes wrong the first time, and fixes
> itself the second time (and subsequently), people tend to be fairly
> happy. Not deliriously, just fairly...

Ha, that's bad enough for an interactive shell, but for a
non-interactive shell script that executes gcc and exits if it fails,
retrying wouldn't even work.

> | I think that is easier to explain than the other way around, myself.
> | Suppose PATH is intentionally modified so that an uClibc-linked version
> | of GCC appears first in $PATH, but the user messed up,
>
> For almost everything we do, we can find instances where the results are
> sub-optimal. Throwing away everything where that could occur leaves us
> with almost nothing. The best way to avoid this would be to remove PATH
> completely (not revert to the Thompson shell fixed search path, but
> require all commands to be always specified by full pathname). I doubt
> that would be well received as a solution however.

If it were a case of choosing your poison, then sure, but we do now have
multiple benefits in this thread of the bash behaviour, and no scenario
in which I am seeing the dash behaviour as clearly better. If possible,
I will stick with choosing no poison.

Cheers,
Harald van Dijk
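
For readers unfamiliar with posix_spawn, here is a minimal C sketch of
starting a command through it instead of an explicit fork() and exec().
It is an illustration only and says nothing about how ksh actually uses
the interface; the helper name spawn_and_wait is invented for this
example. How a failed lookup is reported is left open on purpose: POSIX
lets an implementation report it either as a non-zero return value from
posix_spawnp() or as a child that exits with status 127.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Run argv[0] (looked up via PATH by posix_spawnp) with the caller's
 * environment and wait for it to finish.
 * Returns the child's exit status, or -1 if spawning or waiting failed
 * or the child was terminated by a signal.
 */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    /* No file actions and default attributes: the simplest possible call. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait with an argv of { "true", NULL } should return 0
on a system where the true utility is reachable through PATH.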
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
"Stephane Chazelas via austin-group-l at The Open Group" wrote:

> 2021-04-10 22:12:47 +0200, Joerg Schilling via austin-group-l at The Open
> Group:
> > "Jan Hafer via austin-group-l at The Open Group" wrote:
> >
> > > For a short recap why: There are `which, type, command, whence, where,
> > > whereis, whatis, hash` used in shells. Worse, the semantics of `which`
> > > is shell-dependent.
> >
> > which is a csh script and unrelated to Bourne or POSIX shells.
> > It therefore cannot give useful results in a standard
> > shell environment.
> >
> > Even worse: On Linux, "which" may be a program with different
> > behavior.
>
> The OS kernel is hardly relevant here. Various Linux-based OSes

Did I write Linux kernel?

If you tell other people they are not 100% precise, please carefully
read what you are replying to.

> use various implementations of "which". On Debian-based systems,
> these days, it's implemented as a POSIX sh script (regardless of
> whether Linux (most common by far), kFreeBSD, Hurd, Illumos...
> is used as the kernel)

Do you like to say Illumos did replace the original csh script by
something that is incompatible? I cannot confirm that.

> > type is built into the shell since 1976. What problems do you
> > have with it?
>
> No, actually type was added to the Bourne shell in SVR2 released
> in 1984, and had that problem that it would not return failure
> when failing to find a command (a bug which survived well into
> the 90s on some OSes IIRC).

OK, I forgot to first check Sven Mascheck and just wrote what I had in
mind.

> The fact that "which" came first largely explains why it's still
> more popular (even if more broken and less useful in shells
> other than tcsh/zsh) than "type".

I did use "which" in the early 1980s, but at that time, the Bourne Shell
was not a nice interactive shell, so I used my old "bsh".

In 1986, SunOS switched to the SYSV Bourne Shell and that had "type".
That is really a long time ago. Most people who currently believe
"which" is a good idea, did use diapers in 1986. So that does not seem
to be the problem. I guess the reason for the problem we see today is
caused by bad advice from the internet.

> > command is POSIX standard. What problems do you have with it?
>
> Technically, a "command" builtin was added to zsh first in 1990.
> POSIX.2 introduced a "command" builtin with different
> semantics for sh in 1992.

Interesting: command indeed was added to the POSIX variant of ksh88 in
1995. I thought it was a ksh88 invention.

> Most of that and much more was already mentioned at
> https://unix.stackexchange.com/questions/85249/why-not-use-which-what-to-use-then
> as referenced in the OP's original message.

That was too long to read.

Jörg
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
Date:        Sun, 11 Apr 2021 17:04:05 +0100
From:        Harald van Dijk
Message-ID:  <92113e70-5605-10f4-8e57-47c9f64cd...@gigawatt.nl>

| This only applies when a remembered location exists at all, though.

Yes, but in the examples I showed, it did (you can see that from the
output of the hash command before the attempt to execute cmd). It was
put there by "command -v". I haven't checked again, but I think all
shells do that.

| Then, if you accept that, for consistency, "the shell shall repeat the
| search" can only mean to repeat the full search and again stop at the
| first file with execute permissions, as it would be batshit crazy to
| have a shell that, when presented with "gcc; gcc", for the first gcc
| issues an error because /bin/gcc cannot be executed, and for the second
| gcc to find /usr/bin/gcc because /bin/gcc failed to execute.

Actually, in my, and I suspect most, implementations, even the first
will invoke the "subsequent" clause, as the (parent) shell first
searches PATH to find the executable, and enters it in the hash table.
Then it forks, and the child repeats the whole thing (after redirects
etc have all been done). This one is the subsequent search, which starts
out with what is already in the hash table (assuming the command was
found at all) and then if that fails, goes ahead and looks for another.

| I am pretty sure you are not suggesting that that is reasonable,

Don't be too sure, I would not object to an implementation that did work
the way you described, and I suspect most users wouldn't either. We're a
pragmatic bunch, if something goes wrong the first time, and fixes
itself the second time (and subsequently), people tend to be fairly
happy. Not deliriously, just fairly...

| I think that is easier to explain than the other way around, myself.
| Suppose PATH is intentionally modified so that an uClibc-linked version
| of GCC appears first in $PATH, but the user messed up,

For almost everything we do, we can find instances where the results are
sub-optimal. Throwing away everything where that could occur leaves us
with almost nothing. The best way to avoid this would be to remove PATH
completely (not revert to the Thompson shell fixed search path, but
require all commands to be always specified by full pathname). I doubt
that would be well received as a solution however.

kre
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
2021-04-10 22:12:47 +0200, Joerg Schilling via austin-group-l at The Open Group:
> "Jan Hafer via austin-group-l at The Open Group" wrote:
>
> > For a short recap why: There are `which, type, command, whence, where,
> > whereis, whatis, hash` used in shells. Worse, the semantics of `which`
> > is shell-dependent.
>
> which is a csh script and unrelated to Bourne or POSIX shells.
> It therefore cannot give useful results in a standard
> shell environment.
>
> Even worse: On Linux, "which" may be a program with different
> behavior.

The OS kernel is hardly relevant here. Various Linux-based OSes
use various implementations of "which". On Debian-based systems,
these days, it's implemented as a POSIX sh script (regardless of
whether Linux (most common by far), kFreeBSD, Hurd, Illumos...
is used as the kernel)

> type is built into the shell since 1976. What problems do you
> have with it?

No, actually type was added to the Bourne shell in SVR2 released
in 1984, and had that problem that it would not return failure
when failing to find a command (a bug which survived well into
the 90s on some OSes IIRC).

The fact that "which" came first largely explains why it's still
more popular (even if more broken and less useful in shells
other than tcsh/zsh) than "type".

> command is POSIX standard. What problems do you have with it?

Technically, a "command" builtin was added to zsh first in 1990.
POSIX.2 introduced a "command" builtin with different
semantics for sh in 1992.

> whence is a ksh specific command and thus non-portable
>
> where ??? what is that?

A builtin of tcsh (since 1991) and zsh. In zsh, it's the same as
which -a, "which" being the same as whence -c.

> whereis does not exist on a typical UNIX system

whereis was added to 3BSD at the same time as which.

> whatis is a command that behaves like "man -k"
[...]

The type builtin was renamed to whatis in research Unix V8 sh
(1985), based on SVR2's shell and extended. That's different
from 2BSD's whatis command (1979, by Bill Joy, csh/vi's author)
that grep'ed /usr/lib/whatis, a man page index (itself
originally generated by a makewhatis csh script).

Most of that and much more was already mentioned at
https://unix.stackexchange.com/questions/85249/why-not-use-which-what-to-use-then
as referenced in the OP's original message.

-- 
Stephane
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On 11/04/2021 17:09, shwaresyst via austin-group-l at The Open Group wrote:
> No, it's not nonsense. The definition of comment has all characters, including '!', shall be ignored until newline or end-of-file being conforming. Then tokenization which might discover an operator, keyword or command continues. This precludes "#!" being recognized as any of those. There is NO allowance for '!' being the second character as reserved for implementation extensions.

This is wrong on two counts. The first is that you're assuming that this will be interpreted by a shell. If execve() succeeds (and the #! line does not name a shell), it will not be interpreted by a shell at all, and the shell syntax for comments is irrelevant.

The second is about what happens when it does get interpreted by a shell: POSIX allows shells to treat files starting with "#!" specially:

    "If the first line of a file of shell commands starts with the characters "#!", the results are unspecified."

Cheers,
Harald van Dijk
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
No, it's not nonsense. The definition of comment has all characters, including '!', shall be ignored until newline or end-of-file being conforming. Then tokenization which might discover an operator, keyword or command continues. This precludes "#!" being recognized as any of those. There is NO allowance for '!' being the second character as reserved for implementation extensions. On Sun, Apr 11, 2021 at 11:37 AM, Robert Elz wrote: Date: Sun, 11 Apr 2021 10:46:48 + (UTC) From: shwaresyst Message-ID: <1413127944.766378.1618138008...@mail.yahoo.com> | That's bugs in those shells for POSIX mode then, that I see. That's nonsense. | The conforming behavior is /usr/gcc is found and succeeds at doing nothing, Nonsense. That would be a conforming behaviour, it is not "the" conforming behaviour. POSIX does not define what format a file must be to succeed in being exec'd by one of the exec*() commands. The system can have a thousand different types that work, if it wants, and #! executables are one of those. That they're not required to work by POSIX doesn't mean they're not allowed to work. For the rest of your message, the reply I just made to Harald's message applies. kre
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On 11/04/2021 16:33, Robert Elz wrote:
>     Date:        Sun, 11 Apr 2021 13:25:46 +0100
>     From:        Harald van Dijk
>     Message-ID:
>
>   | > My tests show that ksh, bash, yash, mksh do not find gcc in that case.
>   |
>   | Huh. My tests with ksh were with 93v, it's possible different versions
>   | behave differently.
>
> I see the same results as Joerg. I'm using ksh93u.

Interesting. Will need to re-test with that later.

[...]

> Note that POSIX says (this is from 8 D1.1 XCU 2.9.1.1 1. e. i.)
>
>     Once a utility has been searched for and found (either as a result of this specific search or as part of an unspecified shell start-up activity), an implementation may remember its location and need not search for the utility again unless the PATH variable has been the subject of an assignment.
>
> Aside from the lack of mention of hash -r there, that much is fine. It goes on:
>
>     If the remembered location fails for a subsequent invocation, the shell shall repeat the search to find the new location for the utility, if any.
>
> Note: "fails" not "utility is not found at" or similar, and "the shell shall".
>
> What it means in these circumstances to "repeat the search to find the new location for the utility, if any" is less clear - but a reasonable interpretation (adopted by about half the shells) is that it should look through PATH, see if it can find a copy of the utility that does not fail to invoke, and invoke that one.
>
> Also note that it does not say that it is OK to replace the remembered location with that of the newly located command.

This only applies when a remembered location exists at all, though. If no remembered location exists, the invocation is not a "subsequent invocation" and the paragraph does not apply.

Then, if you accept that, for consistency, "the shell shall repeat the search" can only mean to repeat the full search and again stop at the first file with execute permissions, as it would be batshit crazy to have a shell that, when presented with "gcc; gcc", for the first gcc issues an error because /bin/gcc cannot be executed, and for the second gcc to find /usr/bin/gcc because /bin/gcc failed to execute. I am pretty sure you are not suggesting that that is reasonable, but I think that is a bad consequence of your interpretation of the wording.

And if "shall repeat the search" does refer to the exact same search that was initially performed, then "Once a utility has been searched for and found [...] an implementation may remember its location" arguably applies to that repeated search as well, but that is less clear. You have asked questions about that later on. They are good questions to think about. I am not sure about those yet, so am skipping them for now.

> I agree with that. Nothing else is rationally possible, except failing to exec the command (like bash and the ksh's do), but it is hard to explain how failing to run a command when one that is runnable exists in $PATH, is a better outcome than running it.

I think that is easier to explain than the other way around, myself. Suppose PATH is intentionally modified so that an uClibc-linked version of GCC appears first in $PATH, but the user messed up, the dynamic linker of uClibc is actually not yet installed, or is installed in the wrong location. It is clearly the user's intention to execute the uClibc-linked version, and attempting to execute that and reporting the error is what bash and others would do. Silently executing some other version that the user didn't want is, in my opinion, doing the user a disservice. (Disclaimer: I am not certain whether all shells would treat this exactly the same way as the '#!/bad' example.)

Cheers,
Harald van Dijk
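The "gcc; gcc" consistency argument above can be checked with a small probe. This is only a sketch under assumptions: it relies on mktemp -d being available and on /bad not existing on the test system, and the directory names P1/P2 and the command name cmd are hypothetical stand-ins for /bin, /usr/bin and gcc.

```shell
# Hypothetical scratch layout standing in for /bin and /usr/bin.
probe=$(mktemp -d) || exit 1
mkdir "$probe/P1" "$probe/P2"
printf '#!/bad\n' > "$probe/P1/cmd"              # found first, cannot be executed
printf '#!/bin/sh\nexit 0\n' > "$probe/P2/cmd"   # found second, runs fine
chmod +x "$probe/P1/cmd" "$probe/P2/cmd"

# Run the same command name twice in one shell and record both exit statuses.
# Only builtins are used while PATH is restricted to the scratch directories.
statuses=$(
    PATH="$probe/P1:$probe/P2"
    cmd 2>/dev/null; first=$?
    cmd 2>/dev/null; second=$?
    echo "$first $second"
)
rm -rf "$probe"

echo "first and second invocation: $statuses"
```

Whatever a given shell does with the unexecutable first entry, the two numbers should match; a shell that failed on the first invocation and succeeded on the second would be showing the inconsistency described above.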
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
    Date:        Sun, 11 Apr 2021 10:46:48 + (UTC)
    From:        shwaresyst
    Message-ID:  <1413127944.766378.1618138008...@mail.yahoo.com>

  | That's bugs in those shells for POSIX mode then, that I see.

That's nonsense.

  | The conforming behavior is /usr/gcc is found and succeeds at doing nothing,

Nonsense. That would be a conforming behaviour, it is not "the" conforming behaviour. POSIX does not define what format a file must be to succeed in being exec'd by one of the exec*() commands. The system can have a thousand different types that work, if it wants, and #! executables are one of those. That they're not required to work by POSIX doesn't mean they're not allowed to work.

For the rest of your message, the reply I just made to Harald's message applies.

kre
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
    Date:        Sun, 11 Apr 2021 13:25:46 +0100
    From:        Harald van Dijk
    Message-ID:

  | > My tests show that ksh, bash, yash, mksh do not find gcc in that case.
  |
  | Huh. My tests with ksh were with 93v, it's possible different versions
  | behave differently.

I see the same results as Joerg. I'm using ksh93u.

  | I am assuming that by "do not find gcc" you mean "do not find
  | /usr/bin/gcc" here.

They give an error (what it is varies) from the attempt to execute /bin/gcc.

I did a slightly different test (not mangling /bin...)

    $ ls -l /tmp/P?/cmd; cat /tmp/P?/cmd
    -rwxr-xr-x 1 kre wheel 40 Apr 11 21:45 /tmp/P1/cmd
    -rwxr-xr-x 1 kre wheel 37 Apr 11 21:46 /tmp/P2/cmd

    #! /not-found
    echo This is /tmp/P1/cmd

    #! /bin/sh
    echo This is /tmp/P2/cmd

(I manually added the blank lines in the output there, for this e-mail, to make it easier to see the results).

And then ran

    $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'

(I actually ran that without the newline in the middle, except for mksh which otherwise screwed the terminal display of the command, but that should make no difference either way).

    fbsh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is a tracked alias for /tmp/P1/cmd
    /tmp/P1/cmd
    This is /tmp/P2/cmd
    /tmp/P1/cmd

    nbsh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is a tracked alias for /tmp/P1/cmd
    /tmp/P1/cmd
    This is /tmp/P2/cmd
    /tmp/P1/cmd

    dash $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is a tracked alias for /tmp/P1/cmd
    /tmp/P1/cmd
    This is /tmp/P2/cmd
    /tmp/P1/cmd

    bosh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is /tmp/P1/cmd
    This is /tmp/P2/cmd
    1 1 /tmp/P1/cmd

    yash $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd: an external command at /tmp/P1/cmd
    /tmp/P1/cmd
    /home/kre/bin/yash: cannot execute command `cmd' (/tmp/P1/cmd): No such file or directory
    /tmp/P1/cmd

    pdksh $ /grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is a tracked alias for /tmp/P1/cmd
    cmd=/tmp/P1/cmd
    /bin/ksh: cmd: No such file or directory
    cmd=/tmp/P1/cmd

    mksh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is a tracked alias for /tmp/P1/cmd
    cmd=/tmp/P1/cmd
    /usr/pkg/bin/mksh: /tmp/P1/cmd: No such file or directory
    cmd=/tmp/P1/cmd

    ksh93 $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is a tracked alias for /tmp/P1/cmd
    cmd=/tmp/P1/cmd
    /usr/pkg/bin/ksh93: cmd: not found [No such file or directory]
    cmd=/tmp/P1/cmd

    zsh $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is /tmp/P1/cmd
    zsh:1: /tmp/P1/cmd: bad interpreter: /not-found: no such file or directory
    This is /tmp/P2/cmd
    cmd=/tmp/P1/cmd

    bash5 $ $SHELL -c 'PATH=/tmp/P1:/tmp/P2; command -v cmd; type cmd; hash|/usr/bin/grep cmd; cmd; hash | /usr/bin/grep cmd'
    /tmp/P1/cmd
    cmd is /tmp/P1/cmd
    /usr/pkg/bin/bash: /tmp/P1/cmd: /not-found: bad interpreter: No such file or directory
       1    /tmp/P1/cmd

Note that POSIX says (this is from 8 D1.1 XCU 2.9.1.1 1. e. i.)

    Once a utility has been searched for and found (either as a result of this specific search or as part of an unspecified shell start-up activity), an implementation may remember its location and need not search for the utility again unless the PATH variable has been the subject of an assignment.

Aside from the lack of mention of hash -r there, that much is fine. It goes on:

    If the remembered location fails for a subsequent invocation, the shell shall repeat the search to find the new location for the utility, if any.

Note: "fails" not "utility is not found at" or similar, and "the shell shall".

What it means in these circumstances to "repeat the search to find the new location for the utility, if any" is less clear - but a reasonable interpretation (adopted by about half the shells) is that it should look through PATH, see if it can find a copy of the utility that does not fail to invoke, and invoke that one.

Also note that it does not say that it is OK to replace the remembered location with that of the newly located command.

  | > [... and] dash execute the correct gcc binary, but still have the
  | > wrong script path in their hash after calling gcc.

Arguably not
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On 11/04/2021 13:02, Joerg Schilling via austin-group-l at The Open Group wrote:
> "Harald van Dijk via austin-group-l at The Open Group" wrote:
>
> > If they are mistakes, they are widespread mistakes. As hinted in the links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing as files with execute permission, but /bin/gcc as a text file containing #!/bad so that any attempt to execute it will fail, there are a lot of shells where command -v gcc returns /bin/gcc, but running gcc actually executes /usr/bin/gcc instead without reporting any error: this behaviour is common to bosh, dash and variants (including mine), ksh, and zsh.
>
> My tests show that ksh, bash, yash, mksh do not find gcc in that case.

Huh. My tests with ksh were with 93v, it's possible different versions behave differently. I am assuming that by "do not find gcc" you mean "do not find /usr/bin/gcc" here.

> bosh and dash execute the correct gcc binary, but still have the wrong script path in their hash after calling gcc.
>
> I believe what bosh and dash do is the best behavior. None of the known shells opens the file with "command -v something" and thus cannot know whether the content is a script, a useless #! script or even a binary for the wrong architecture.

Earlier, you did not see the problem that prompted this thread, and now you say that the behaviour where command -v lookup does not match execution lookup is the best behaviour. I trust that you do see now the problem that prompted this thread: there is, in these shells at least, no reliable way to perform command lookup separate from execution.

> This is a result of the layering that has been introduced in the past 50 years of UNIX. If command -v should become able to do more, we would need to invent a way to execute _any_ utility (regardless of whether it is a binary or script) to execute in a harmless way without side-effects.

I don't think command -v should do more, I think ordinary command lookup should do less. The behaviour of shells of continuing command lookup after a failed execve() is not supported by what POSIX says in "Command Search and Execution". Command lookup is supposed to stop as soon as "an executable file with the specified name and appropriate execution permissions is found" (per the referenced "Other Environment Variables", "PATH"). In my example that results in /bin/gcc. The shell should attempt to execute /bin/gcc, and once that fails, stop. This is what the other shells do, including bash, and what I intend to implement in mine.

> There is still a problem: only bosh and ksh could in theory add the right entry into the hash, since they are using vfork() and could report back the final result via shared memory. I have that probability in mind for bosh since I introduced vfork() support to bosh in 2014.

That's an interesting thought. The approach taken by the other shells avoids the problem entirely and makes this unnecessary though.

Cheers,
Harald van Dijk
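The "stop at the first executable file" lookup argued for above can be sketched in portable sh. This is an illustration only, not any shell's actual implementation; first_in_path is a hypothetical helper name, and it ignores builtins, functions, and names containing a slash.

```shell
# first_in_path NAME: print the first PATH entry that holds an executable
# regular file called NAME, then stop. The body is a subshell, so the changes
# to IFS and the noglob option do not leak into the caller.
first_in_path() (
    IFS=:
    set -f                          # no pathname expansion while PATH is split
    for dir in $PATH; do
        candidate=${dir:-.}/$1      # an empty PATH element means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            exit 0                  # first match wins; later entries are never tried
        fi
    done
    exit 1                          # nothing suitable anywhere in PATH
)
```

Under this reading, the shell would try to execute exactly the path printed here and report an error if that attempt fails, which keeps lookup and execution in agreement.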
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
"Harald van Dijk via austin-group-l at The Open Group" wrote:

> If they are mistakes, they are widespread mistakes. As hinted in the
> links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing
> as files with execute permission, but /bin/gcc as a text file containing
> #!/bad so that any attempt to execute it will fail, there are a lot of
> shells where command -v gcc returns /bin/gcc, but running gcc actually
> executes /usr/bin/gcc instead without reporting any error: this
> behaviour is common to bosh, dash and variants (including mine), ksh,
> and zsh.

My tests show that ksh, bash, yash, mksh do not find gcc in that case.

bosh and dash execute the correct gcc binary, but still have the wrong script path in their hash after calling gcc.

I believe what bosh and dash do is the best behavior. None of the known shells opens the file with "command -v something" and thus cannot know whether the content is a script, a useless #! script or even a binary for the wrong architecture.

This is a result of the layering that has been introduced in the past 50 years of UNIX. If command -v should become able to do more, we would need to invent a way to execute _any_ utility (regardless of whether it is a binary or script) to execute in a harmless way without side-effects.

There is still a problem: only bosh and ksh could in theory add the right entry into the hash, since they are using vfork() and could report back the final result via shared memory. I have that probability in mind for bosh since I introduced vfork() support to bosh in 2014.

If that was implemented and command -v was used with a well known command like gcc, there could be a way to get the finally correct result from command -v:

1) call "gcc --version 2>&1 > /dev/null"

2) if that resulted in $? == 0, call: "command -v gcc"

The output now could report what is actually used, in case that the finally used binary path was reported back via shared memory.
Jörg -- EMail:jo...@schily.net Jörg Schilling D-13353 Berlin Blog: http://schily.blogspot.com/ URL: http://cdrecord.org/private/ http://sourceforge.net/projects/schilytools/files/
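The two-step procedure in the message above could be wrapped up roughly as follows. This is a sketch, not an existing utility: resolve_after_run is a hypothetical name, it assumes the command accepts --version without side effects (not true of every utility), and it discards both output streams rather than using the exact redirection quoted in the message. On shells without the proposed feedback into the hash table, the second step still reports the first PATH match rather than the file that actually ran.

```shell
# resolve_after_run NAME
# Step 1: run "NAME --version" and give up if it does not succeed.
# Step 2: ask command -v where NAME is; prints the path on success.
resolve_after_run() {
    "$1" --version >/dev/null 2>&1 || return 1
    command -v "$1"
}
```

A caller would use it as, for example, `path=$(resolve_after_run gcc) || echo "gcc is not usable"`.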
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On Sun, Apr 11, 2021 at 1:47 PM shwaresyst via austin-group-l at The Open Group wrote:

> That's bugs in those shells for POSIX mode then, that I see. The
> conforming behavior is /usr/gcc is found and succeeds at doing nothing,
> since it contains just a comment line. Other elements of path never get
> checked. Even in non-POSIX mode, trying to process it as a shebang with
> "/bad" as a ENOEXEC because not present, or other reason, does not imply
> the rest of the path should be searched, it should simply return a failure
> code.

I agree with this. Most of those shells also hash `/bin/gcc' despite executing `/usr/bin/gcc'. This must have been discussed in #1161, but bash still seems to be the only one that fails on `/bin/gcc' and doesn't execute `/usr/bin/gcc'.
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
That's bugs in those shells for POSIX mode then, that I see. The conforming behavior is /usr/gcc is found and succeeds at doing nothing, since it contains just a comment line. Other elements of path never get checked. Even in non-POSIX mode, trying to process it as a shebang with "/bad" as a ENOEXEC because not present, or other reason, does not imply the rest of the path should be searched, it should simply return a failure code. On Sun, Apr 11, 2021 at 6:07 AM, Harald van Dijk via austin-group-l at The Open Group wrote: On 10/04/2021 17:08, Robert Elz via austin-group-l at The Open Group wrote: > Date: Sat, 10 Apr 2021 11:54:34 +0200 > From: "Jan Hafer via austin-group-l at The Open Group" > > Message-ID: <15c15a5b-2808-3c14-7218-885e704cc...@rwth-aachen.de> > > | my inquiry is a question about the potential unexpected behavior of the > | shell execution environment on names. It is related to shortcomings of > | the command utility. > > I'm not sure I understand. I read the rest of the message, and I > couldn't find anything really about any shortcomings, other than perhaps > some mistakes in interpretation, and usage. If they are mistakes, they are widespread mistakes. As hinted in the links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing as files with execute permission, but /bin/gcc as a text file containing #!/bad so that any attempt to execute it will fail, there are a lot of shells where command -v gcc returns /bin/gcc, but running gcc actually executes /usr/bin/gcc instead without reporting any error: this behaviour is common to bosh, dash and variants (including mine), ksh, and zsh. Cheers, Harald van Dijk
Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]
On 10/04/2021 17:08, Robert Elz via austin-group-l at The Open Group wrote:
>     Date:        Sat, 10 Apr 2021 11:54:34 +0200
>     From:        "Jan Hafer via austin-group-l at The Open Group"
>     Message-ID:  <15c15a5b-2808-3c14-7218-885e704cc...@rwth-aachen.de>
>
>   | my inquiry is a question about the potential unexpected behavior of the
>   | shell execution environment on names. It is related to shortcomings of
>   | the command utility.
>
> I'm not sure I understand. I read the rest of the message, and I couldn't find anything really about any shortcomings, other than perhaps some mistakes in interpretation, and usage.

If they are mistakes, they are widespread mistakes. As hinted in the links, with PATH=/bin:/usr/bin, /bin/gcc and /usr/bin/gcc both existing as files with execute permission, but /bin/gcc as a text file containing #!/bad so that any attempt to execute it will fail, there are a lot of shells where command -v gcc returns /bin/gcc, but running gcc actually executes /usr/bin/gcc instead without reporting any error: this behaviour is common to bosh, dash and variants (including mine), ksh, and zsh.

Cheers,
Harald van Dijk
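The scenario in the message above can be reproduced without touching /bin. A minimal sketch, assuming mktemp -d is available and that /bad does not exist on the test system; the directory names P1 and P2 and the command name cmd are hypothetical stand-ins for /bin, /usr/bin and gcc.

```shell
demo=$(mktemp -d) || exit 1
mkdir "$demo/P1" "$demo/P2"

# First PATH entry: has execute permission, but its interpreter is missing.
printf '#!/bad\n' > "$demo/P1/cmd"
# Second PATH entry: a script that really runs.
printf '#!/bin/sh\necho "ran P2"\n' > "$demo/P2/cmd"
chmod +x "$demo/P1/cmd" "$demo/P2/cmd"

# What lookup reports (command substitution keeps the PATH change local).
looked_up=$(PATH="$demo/P1:$demo/P2"; command -v cmd)
# What execution does: shell-dependent, either P2's output or a failure.
ran=$(PATH="$demo/P1:$demo/P2"; cmd 2>/dev/null)
ran_status=$?

echo "command -v reports: $looked_up"
echo "execution printed:  ${ran:-nothing} (exit status $ran_status)"
rm -rf "$demo"
```

If the second line shows "ran P2", the shell in use silently skipped the entry that command -v had reported, which is the mismatch being described.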
[Online Pubs 0001465]: trap synopsis missing newline
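A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1465
======================================================================
Reported By:                mihai_moldovan
Assigned To:
======================================================================
Project:                    Online Pubs
Issue ID:                   1465
Category:                   Shell and Utilities
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Mihai Moldovan
Organization:
User Reference:
URL:                        https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#trap
Section:                    trap
======================================================================
Date Submitted:             2021-04-11 06:18 UTC
Last Modified:              2021-04-11 08:16 UTC
======================================================================
Summary:                    trap synopsis missing newline
======================================================================

----------------------------------------------------------------------
 (0005310) Don Cragun (manager) - 2021-04-11 08:16
 https://austingroupbugs.net/view.php?id=1465#c5310
----------------------------------------------------------------------
It displays correctly in the PDF of the 2017 version of the standard (2008 + TC1 + TC2) on P2420 L77484-77485.

Issue History
Date Modified     Username        Field      Change
======================================================================
2021-04-11 06:18  mihai_moldovan  New Issue
2021-04-11 06:18  mihai_moldovan  Name       => Mihai Moldovan
2021-04-11 06:18  mihai_moldovan  URL        => https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#trap
2021-04-11 06:18  mihai_moldovan  Section    => trap
2021-04-11 08:16  Don Cragun      Note Added: 0005310
======================================================================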
[Online Pubs 0001465]: trap synopsis missing newline
The following issue has been SUBMITTED.
======================================================================
https://www.austingroupbugs.net/view.php?id=1465
======================================================================
Reported By:                mihai_moldovan
Assigned To:
======================================================================
Project:                    Online Pubs
Issue ID:                   1465
Category:                   Shell and Utilities
Type:                       Error
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       Mihai Moldovan
Organization:
User Reference:
URL:                        https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#trap
Section:                    trap
======================================================================
Date Submitted:             2021-04-11 06:18 UTC
Last Modified:              2021-04-11 06:18 UTC
======================================================================
Summary:                    trap synopsis missing newline
Description:
The trap utility has two different forms, which are described in the text later on. However, the synopsis currently lists just one weird form, since both are concatenated there.

Desired Action:
Add newline between the two different forms of the trap utility.
======================================================================

Issue History
Date Modified     Username        Field      Change
======================================================================
2021-04-11 06:18  mihai_moldovan  New Issue
2021-04-11 06:18  mihai_moldovan  Name       => Mihai Moldovan
2021-04-11 06:18  mihai_moldovan  URL        => https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#trap
2021-04-11 06:18  mihai_moldovan  Section    => trap
======================================================================
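For readers unfamiliar with the two forms the report refers to, a short illustration may help. This is a sketch only; the variable names are hypothetical, and each form is run inside a command substitution so that the EXIT action fires when that subshell ends.

```shell
# Form "trap [action condition...]": install an action for the EXIT condition.
with_action=$(trap 'echo cleanup ran' EXIT; echo body)

# Form "trap n [condition...]": when the first operand is an unsigned decimal
# integer, every operand is taken as a condition and reset to its default.
# Here 0 names EXIT, so the action installed just before is removed again.
after_reset=$(trap 'echo cleanup ran' EXIT; trap 0; echo body)

printf 'with action:\n%s\n\nafter reset:\n%s\n' "$with_action" "$after_reset"
```

Written on one line without a separator, the two synopsis forms read like a single command taking both a number and an action, which is the confusion the report asks to have fixed.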