A while ago it was suggested (perhaps in some private mail) that all of the builtin sh commands should have man pages of their own, rather than only the ones which are also implemented as external commands.
As things are growing (including the sh man page), and partly inspired by the possible sh enhancement described below, I have been wondering more whether this would not be an improvement.

We wouldn't necessarily need to document every sh builtin command in a separate page. I'd not do most of the special builtins, as one exclusion, and perhaps not a few more that are too simple to need it (like, say, inputrc, jobid and jobs, just as examples). But I suspect that cd, fc, getopts, hash, read and ulimit at least, and (from below) wait, could all usefully have man pages of their own, with a consequent reduction in the size of sh(1). Just the getopts section of sh(1) is kind of long ...

I would expect each of these man pages to contain a specific warning that the command described is a built-in command from the shell, and that details might differ from shell to shell. It would also refer readers to their own shell's man page, and explain that what is documented in the page in question should only be relied upon if the relevant shell's man page references it.

The man pages could contain sections for each shell we want to document in the NetBSD base (there are sh, ksh and csh) if that seems useful, and if the shell in question implements the relevant built-in, to document the specific variations that a particular shell implements.

First question: does this sound reasonable? (For this, you can assume that I will be doing the work, at least initially, for sh(1) references, so by agreeing it is a good idea you would not be volunteering!)

Second, assuming this happens (even if you say "no" to the first question, please answer this one as if the answer to the previous is affirmative), what should the man pages be called? They could just be cd.1 getopts.1 (etc), or we could invent a new suffix, perhaps 1S or 1sh, and have cd.1S (etc) to mark these pages as distinct from the normal xxx(1), which generally documents something that can be found via a $PATH search. Or something different.
[For the 1S approach, someone (else) would need to fix the man (etc) commands/config to make it work.]

Second issue: you might have guessed from the hint above that the proposed extension is to wait (the sh command), which would require its section of sh(1) to get bigger...

The following (or something like it) is what I am thinking of (discussion of why follows the text). Note: this is yet to be spell checked, grammar checked, ... so you can just ignore (or not) that kind of issue. On the other hand, if none of what is here makes any sense at all (ie: you cannot work out what it is all about) then do say so ... I know what I mean it to say, so when I read it, it clearly says that!

   wait [-n] [-p var] [job ...]

      Wait for the specified jobs to complete and return the exit status of the last job to exit, or 127 if none of the jobs are a current child of the shell.

      If no jobs argument is given, wait for all jobs to complete and then return an exit status of zero (including when there were no jobs, and so nothing exited.)

      With the -n option, wait instead for any one of the given jobs, or if none are given, any job, to complete, and return the exit status of that job. If none of the given job arguments is a current child of the shell, or if no job arguments are given and the shell has no unwaited for children, then the exit status will be 127.

      The -p var option allows the process (or job) identifier of the job for which the exit status is returned to be obtained. The variable named (which must not be readonly) will be unset initially, and then set to the identifier from the arg list (if given) of the job that exited, or the process identifier of the job to exit when used with -n and no job arguments. Note that -p with neither -n nor job arguments is useless, as in that case no job status is returned, the variable named is simply unset.

      If the wait is interrupted by a signal, its exit status will be greater than 128.

      Once waited upon, by specific process number or job-id, or by a wait with no arguments, knowledge of the child is removed from the system, and it cannot be waited upon again.

(In that, the first 2 paragraphs are intended to describe the status quo, the next two are the proposed extension, the final 2 paragraphs also document the current wait command.)

The sh(1) wait command dates from the very early days of unix (not sure how far back, before my time, and that's saying something, but certainly early to mid 70's) and has changed very little since. The only enhancement since then has been the addition of the job (or pid) args, so it is possible to wait until (one or more) specific process(es) have finished, rather than just everything, which was all that was possible before.

[An aside: our current man page says it allows just "wait [job]" as if only one job arg is permitted - that isn't posix conformant, and isn't what is implemented either, so the doc needs fixing in any case.]

In the meantime the (now) wait(2) family of syscalls has been one of the most extended of all, with wait3() followed by wait4() and waitpid(), and more recently waitid() and wait6(). All of the new ones have an options arg, with flag bits, which have also been extended (new options invented) over time. It is (way beyond) time for wait(1) (the sh built-in) to do some catching up.

This is also inspired by (at least) one of my scripts that wants to run processes, then when one finishes (any one) start a replacement, so I have the need to wait for any process to exit, not any specific process, and certainly not all of them (but sometimes perhaps, one of a subset of those running.) For this, bash already has "wait -n" though I am not aware of any other shells yet to copy it.
Bash's -n is (I believe) currently an alternative to the list of processes (or jobs) as an arg, but I see no reason that the two should be exclusive: "wait -n" simply waits for any child to complete, while "wait -n p1 p2 p3" can wait for any of the listed processes to exit.

Bash is lacking a mechanism to discover which process exited, however, which makes "wait -n" a little less useful (not useless, there are always ways, like sending kill -0 at all known children and finding out which one now returns ESRCH), but it seemed to me that since the shell internally knows which process terminated, it should provide some means for the script to find out more easily. Hence the "-p var" (-p for "pid").

In the current implementation (and yes, all of this is implemented already) the result returned in var is actually the arg string that was passed as the operand when "job..." operands are given. That is both generally easier to do, and also seems to be more consistent: if you say "wait -n -p job %1 %2 %3", after it returns (without error) $job will be one of "%1", "%2" or "%3". This could be changed to always be the decimal pid (the value that was available in $! when the background job started) if that seems better (that was what I coded initially), or we could have an option to choose (I'd kind of prefer not that.)

I have discussed this (briefly, and a while ago) with Chet Ramey, and he has it on his "things to consider for bash later when there is time" list, and -p was (as I recall) agreed as a reasonable option name.

I'd appreciate opinions on this: is it a reasonable thing to do? Is it being done in the right way if it is?
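For comparison, the kill -0 workaround mentioned above might look roughly like this in portable sh. This is only a sketch: the function name wait_any, its calling convention and the one second polling interval are all made up for illustration, and it assumes the shell reaps finished children on its own so that kill -0 eventually fails for them.

```sh
# wait_any VAR PID...
# Hypothetical helper (not part of any shell): poll the given pids with
# kill -0 until one of them no longer exists, store that pid in the
# variable named VAR, and return that job's exit status.
wait_any() {
    _wa_var=$1
    shift
    while :; do
        for _wa_pid in "$@"; do
            if ! kill -0 "$_wa_pid" 2>/dev/null; then
                eval "$_wa_var=\$_wa_pid"
                wait "$_wa_pid"     # collect the remembered status
                return              # returns the status from wait
            fi
        done
        sleep 1                     # arbitrary polling interval
    done
}

# Usage: start two jobs and find out which one finishes first.
sleep 3 & slow=$!
(sleep 1; exit 7) & fast=$!
wait_any done_pid "$slow" "$fast"
rc=$?
[ "$done_pid" = "$fast" ] && echo "the fast job finished first, status $rc"
wait "$slow"                        # collect the other job as well
```

The polling, and the delay it adds, is exactly the kind of thing that a "wait -n -p var" inside the shell would make unnecessary.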
I also notice now (and will fix soon in my uncommitted copy) that the text:

   Wait for the specified jobs to complete and return the exit status of the last job to exit,

from above is not actually what should happen (nor what does happen). sh(1) waits for each job to exit, in the order given, and then returns the status from the last of them (the last on the command line.) That's what posix specifies, and that is what we do. (While each job will produce a 127 return code if the child named does not exist (either never did as a child of this shell, or has already been waited for, or in an interactive shell has had its status reported as "Done", or anything else that indicates it no longer exists), that 127 is only observable when it happens from the final job listed.)

To perhaps avoid one kind of obvious question: while the wait command and the wait system call are obviously related, the command does not necessarily perform the system call, nor is the system call only used (wrt background jobs) when the command is invoked. sh(1) cleans up child processes (zombies) whenever it notices that they have finished, but then remembers the status to report when the script does a wait command. If all jobs listed on the command line have already been detected as finished, then no wait system call will be performed. The "does not exist" in the previous paragraph relates to what the shell has in its data struct about the process, not what is in the kernel process table.

So, on all of this, your thoughts please. In the meantime, the code for this needs more extensive testing before it is committed (if there is reasonable agreement that doing so is a good thing), and the doc (obviously) needs improvements, so I'll be doing that...

kre

ps: while I have been typing this, I can also see uses for a -u (maybe -w) option, kind of equivalent to the WNOWAIT option to the wait*(2) functions.
That is, to not clean up after doing this wait, so it can be repeated later. That might be useful in SIGCHLD trap handlers, for example. I think I'll add that and see how well it works...

I kind of doubt that options to act like WNOHANG or WUNTRACED would be useful to scripts, but please feel free to disagree. (If a WNOHANG kind of thing seems useful, perhaps better would be a -t timeout instead, with -t 0 working of course.)

Anyway, all of this potential feeds back into the questions that started this e-mail, all so long ago...
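For anyone who wants to check the operand ordering described earlier (wait reports the status of the last job on the command line, not of the last job to exit), a few lines of portable sh should be enough. This is just a sketch: the variable names are arbitrary, and using $$ as a pid that is certainly not a child of the shell is only a convenient trick.

```sh
# The first operand is the job that finishes last in time; the second
# operand finishes almost immediately.
(sleep 2; exit 3) & slow=$!
(exit 5) & quick=$!

# wait takes the operands in order and reports the status of the last
# operand on the command line, not of the last job to exit.
wait "$slow" "$quick"
last_operand_status=$?
echo "wait slow quick -> $last_operand_status"

# The shell's own pid is never one of its children, so waiting for it
# gives the "no such child" status.
wait "$$" 2>/dev/null
not_a_child_status=$?
echo "wait \$\$ -> $not_a_child_status"
```

With a posix conformant wait the first status should be 5 (from the quick job, even though the slow one exited later) and the second should be 127.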