Re: When can shells remove "known" process IDs from the list?
Chet and I can continue this conversation off list; what is being discussed now has nothing at all to do with anything related to posix. kre
Re: When can shells remove "known" process IDs from the list?
On 5/13/22 5:37 PM, Robert Elz wrote:

> Date: Sat, 14 May 2022 03:56:32 +0700
> From: "Robert Elz via austin-group-l at The Open Group"
> Message-ID: <2459.1652475...@jinx.noi.kre.to>
>
> | | Show your work.
> | I no longer remember the exact command I used (cannot even locate the
> | message you're quoting from),
>
> I finally did ... This is what I see:

I don't see that.

$ echo $BASH_VERSION
5.1.16(2)-release
$ sleep 20 | sleep 20 & sleep 30 | sleep 30 & jobs -l ; pstree $$ ; ps jT
[1] 22954
[2] 22956
[1]- 22953 Running                 sleep 20
     22954                       | sleep 20 &
[2]+ 22955 Running                 sleep 30
     22956                       | sleep 30 &
-+= 22938 chet ./bash
 |--- 22953 chet sleep 20
 |--- 22954 chet sleep 20
 |--- 22955 chet sleep 30
 |--- 22956 chet sleep 30
 \-+- 22957 chet pstree 22938
   \--- 22958 root ps -axwwo user,pid,ppid,pgid,command
USER   PID  PPID  PGID SESS JOBC STAT TT      TIME COMMAND
root   811   544   811    0    0 Ss   s019 0:00.05 login -pfl chet /bin/ba
chet   814   811   814    0    1 S    s019 0:00.09 -bash
chet 22938   814 22938    0    1 S+   s019 0:00.04 ./bash
chet 22953 22938 22938    0    1 S+   s019 0:00.00 sleep 20
chet 22954 22938 22938    0    1 S+   s019 0:00.00 sleep 20
chet 22955 22938 22938    0    1 S+   s019 0:00.00 sleep 30
chet 22956 22938 22938    0    1 S+   s019 0:00.00 sleep 30
root 22959 22938 22938    0    1 R+   s019 0:00.00 ps jT
$ kill %1
$ ps jT
USER   PID  PPID  PGID SESS JOBC STAT TT      TIME COMMAND
root   811   544   811    0    0 Ss   s019 0:00.05 login -pfl chet /bin/ba
chet   814   811   814    0    1 S    s019 0:00.09 -bash
chet 22938   814 22938    0    1 S+   s019 0:00.04 ./bash
chet 22955 22938 22938    0    1 S+   s019 0:00.00 sleep 30
chet 22956 22938 22938    0    1 S+   s019 0:00.00 sleep 30
root 22960 22938 22938    0    1 R+   s019 0:00.00 ps jT
$

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
On 5/13/22 4:56 PM, Robert Elz wrote:

> Date: Fri, 13 May 2022 11:22:20 -0400
> From: Chet Ramey
> Message-ID:
>
> | Show your work.
> |
> | I tested this on macOS 12 and RHEL 7, using interactive shells with job
> | control enabled,
>
> That is likely the difference. The question was about what happens
> when job control is not enabled.

The same thing. This example uses bash-5.2-beta on macOS 10.15, but the
same thing happens with bash-5.1.16.

$ ./bash
$ set +m
$ sleep 20 | sleep 20 &
[1] 22755
jenna.local(2)$ pstree $$
-+= 22753 chet ./bash
 |--- 22754 chet sleep 20
 |--- 22755 chet sleep 20
 \-+- 22756 chet pstree 22753
   \--- 22757 root ps -axwwo user,pid,ppid,pgid,command
$ kill %1
$ ps ax | grep sleep
22759 s018 S+ 0:00.00 grep sleep
$ sleep 20 | sleep 20 & pstree $$
[1] 22787
-+= 22753 chet ./bash
 |--- 22786 chet sleep 20
 |--- 22787 chet sleep 20
 \-+- 22788 chet pstree 22753
   \--- 22789 root ps -axwwo user,pid,ppid,pgid,command
$ kill %1
$ ps axuw | grep sleep
chet 22791 0.0 0.0 4408552764 s018 S+ 10:25AM 0:00.00 grep sleep

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
Date: Sat, 14 May 2022 03:56:32 +0700
From: "Robert Elz via austin-group-l at The Open Group"
Message-ID: <2459.1652475...@jinx.noi.kre.to>

| | Show your work.
| I no longer remember the exact command I used (cannot even locate the
| message you're quoting from),

I finally did ... This is what I see:

bash5 $ echo $BASH_VERSION
5.1.16(1)-release
bash5 $ jobs
bash5 $ set +m
bash5 $ sleep 20 | sleep 20 & sleep 30 | sleep 30 & jobs -l; ps jT
[1] 1868
[2] 1847
[1]- 29632 Running                 sleep 20
      1868                       | sleep 20 &
[2]+  2715 Running                 sleep 30
      1847                       | sleep 30 &
USER PID PPID PGID SESS JOBC STAT TTY TIME COMMAND
kre 355 1847 5699 d0d6d70 S+ pts/26 0:00.00 sleep 30
kre 410 29632 5699 d0d6d70 S+ pts/26 0:00.00 sleep 20
kre 1687 1868 5699 d0d6d70 S+ pts/26 0:00.00 sleep 20
kre 1847 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
kre 1868 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
kre 2715 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
kre 4319 2715 5699 d0d6d70 R+ pts/26 0:00.00 sleep 30 (bash)
kre 5333 5699 5699 d0d6d70 O+ pts/26 0:00.00 ps -jT
kre 5699 3620 5699 d0d6d70 Ss+ pts/26 0:00.03 -bash
kre 29632 5699 5699 d0d6d70 S+ pts/26 0:00.00 -bash
bash5 $ echo $$
5699
bash5 $

Note that pids 29632 and 1868 (which jobs claims are "sleep") are
actually bash, the sleep processes are 410 and 1687. Similarly for
job 2. Everything is in process group 5699 (the interactive shell's pid).

When one kills %1 processes 29632 and 1868 get killed, processes 410
and 1687 do not.

You can decide whether the extra interposed bash processes are
intentional or not, as I said in the previous message, that is not wrong.
The inability to signal the (unknown) grandchildren is expected (the same
kind of thing would happen if the command were "make" and there's a whole
tree of make, compiler, linker, ... processes running - this is unavoidable).

kre
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 13 May 2022 11:22:20 -0400 From:Chet Ramey Message-ID: | Show your work. | | I tested this on macOS 12 and RHEL 7, using interactive shells with job | control enabled, That is likely the difference. The question was about what happens when job control is not enabled. When job control is enabled, the kill kills that job's process group, and all of it gets signalled. Without job control, that's not possible, the shell can only kill its known children, their children (absent relaying of the signal down the tree) never see it. I no longer remember the exact command I used (cannot even locate the message you're quoting from), which caused bash to fork a sub-shell, in which to run the pipeline, rather than running it directly from the parent - but that's not really the point, doing that was not wrong, whatever provoked it, it simply meant that the parent shell did not know the actual processes running in the pipe, so could not do anything to them. kre
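The point about grandchildren can be reproduced without bash's jobs table at all. The following is a minimal sketch, not anything posted in the thread: orphan_demo is a hypothetical name, and it assumes a POSIX sh running a script (so job control is off) plus mktemp. The parent knows only the PID of the intermediate shell, signals only that, and then checks whether the grandchild is still there.

```shell
#!/bin/sh
# Hypothetical demo (not from the thread): signal a known child and see
# whether its own child, which the parent never learned about, survives.
orphan_demo() {
    pidfile=$(mktemp) || return 1
    # The child is a shell that starts a grandchild, records the
    # grandchild's PID, then waits. Its output is discarded so that
    # nothing keeps our stdout open.
    sh -c 'sleep 30 & echo "$!" > "$1"; wait' sh "$pidfile" >/dev/null 2>&1 &
    child=$!
    # Wait until the grandchild's PID has been written.
    while [ ! -s "$pidfile" ]; do sleep 1; done
    grandchild=$(cat "$pidfile")
    rm -f "$pidfile"
    # No job control, so no process group to signal: only the one
    # PID the parent knows about receives SIGTERM.
    kill "$child"
    wait "$child" 2>/dev/null || :
    if kill -0 "$grandchild" 2>/dev/null; then
        echo "grandchild survived"
        kill "$grandchild"    # clean up the orphan
    else
        echo "grandchild gone"
    fi
}
```

Calling orphan_demo from a script is expected to report that the grandchild survived, which is the "make and its compiler tree" situation in miniature.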
Re: When can shells remove "known" process IDs from the list?
On 5/5/22 7:46 AM, Geoff Clare via austin-group-l at The Open Group wrote:

> [Robert intended to send the mail I'm replying to to the list, but it
> was only sent to me. I've quoted it in full.]
>
> Robert Elz wrote, on 05 May 2022:
>
>> This leaves just bash of the shells I have to test.
>> bash is odd, at first glance it seems to act like the ksh's,
>> zsh & fbsh do. But it doesn't. This seems to be because bash,
>> in a pipeline like
>>     sleep 20 | sleep 20 &
>> creates a subshell for the '&' first, and then creates a new
>> subshell environment for each side of the pipe. None of the
>> other shells do that, the processes in the pipeline are in
>> subshell environments (in most anyway) but the same one as the
>> one created for the async process execution - that is, the sleep
>> processes are direct children of the parent shell, not grandchildren
>> as they are in bash. When given "kill %1" it then seems to work
>> just like those other shells, but all that is actually killed is
>> the forked copy of itself, leaving the sleep processes running, orphaned.

Show your work.

I tested this on macOS 12 and RHEL 7, using interactive shells with job
control enabled, running the latest bash devel version, and could not
reproduce it.

The Linux version of pstree shows the process group; the macOS version
doesn't have that option. Both show the sleep processes are direct
descendants of the parent shell, but even if they aren't, bash clearly
does not leave the sleep processes orphaned.
macOS 12:

$ sleep 20 | sleep 20 &
[1] 16711
$ pstree $$
-+= 16694 chet ./bash
 |--= 16710 chet sleep 20
 |--- 16711 chet sleep 20
 \-+= 16712 chet pstree 16694
   \--- 16713 root ps -axwwo user,pid,ppid,pgid,command
$ kill %1
$ ps axuw | grep sleep
chet 16717 0.0 0.0 34142704632 s027 U+ 11:04AM 0:00.00 grep sleep
[1]+ Terminated: 15 sleep 20 | sleep 20

RHEL 7:

$ sleep 20 | sleep 20 &
[1] 106739
$ pstree -g $$
bash(106427)─┬─pstree(106743)
             ├─sleep(106738)
             └─sleep(106738)
$ kill %1
$ ps axuw | grep sleep
chet 106753 0.0 0.0 112812 960 pts/1 R+ 10:59 0:00 grep sleep
[1]+ Terminated sleep 20 | sleep 20

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
chet.ra...@case.edu wrote in
 <217874a6-64d5-184b-68e8-0bedb322f...@case.edu>:
 |On 5/13/22 10:27 AM, Geoff Clare via austin-group-l at The Open Group \
 |wrote:
 |> Chet Ramey wrote, on 13 May 2022:
 |>> On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group \
 |>> wrote:
 |>>> The definition of "Job" is:
 ...
 |>> Why not? This is what allows jobs/kill/wait to use job control notation
 |>> in operands even when job control is not currently enabled. I'd argue
 |>> that that was intended.
 |>
 |> My reading is that all the standard requires here is that if one or
 |> more jobs are created with job control enabled, and job control is
 |> subsequently disabled, you can still use "jobs" to list those jobs,
 |> and %n etc. with "kill" to refer to those jobs.
 |
 |Of course; it relies on your assertion that the standard requires job
 |control to be enabled to create a job and put it in the jobs list. I've
 |already said what I think about that, and most, if not all, shells behave
 |differently.

Not to mention the ones where "set -m" is broken somewhere deep
within. After running against the wall of reliable asynchronous
process interaction from within a sh(1)ell script some years ago,
i had to rewrite it all a bit differently, and one core point now
is

   [ -n "${JOBMON}" ] && set -m >/dev/null 2>&1
   ( # Place the job in its own directory to ease file management
      trap '' EXIT HUP INT QUIT TERM USR1 USR2
      ${mkdir} t.${JOBS}.d && cd t.${JOBS}.d &&
         eval t_${1} ${JOBS} ${1} &&
         ${rm} -f ../t.${JOBS}.id
   ) > t.${JOBS}.io &1 /dev/null 2>&1
   JOBLIST="${JOBLIST} ${i}"
   printf '%s\n%s\n' ${i} ${1} > t.${JOBS}.id

   # ..until we should sync or reach the maximum concurrent number
   [ ${JOBS} -lt ${JOBNO} ] && return

This works reliably on all tested systems (*BSD, Linux of several
kind, SunOS 5.{9,10,11}) with all tested (installed) shells.
(Beside the one with actually broken set -m, i have to say

   printf >&2 '%s!
$JOBMON: $SHELL %s incapable, disabled!%s\n' \
      "${COLOR_ERR_ON}" "${SHELL}" "${COLOR_ERR_OFF}"
   printf >&2 '%s! No process groups available, killed tests may '\
'leave process "zombies"!%s\n' \
      "${COLOR_ERR_ON}" "${COLOR_ERR_OFF}"

but that just cannot be helped.)
Of course it is still a mess that requires synchronization files
etc., but without this it just will not do. It is still racy

   jtimeout() {
      i=0
      while [ ${i} -lt ${JOBS} ]; do
         i=`add ${i} 1`
         if [ -f t.${i}.id ] &&
               read pid < t.${i}.id >/dev/null 2>&1 &&
               kill -0 ${pid} >/dev/null 2>&1; then
            j=${pid}
            [ -n "${JOBMON}" ] && j=-${j}
            kill -KILL ${j} >/dev/null 2>&1
         else
            ${rm} -f t.${i}.id
         fi
      done
   }

But only a bit.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
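The choice jtimeout makes between signalling one process and signalling a whole process group can be isolated into a helper. This is only a sketch under assumptions: kill_target is a hypothetical name that does not appear in the script quoted above, and it simply mirrors the j=-${j} step, producing a negated PID (a process group operand) when job control was enabled for the job and the plain PID otherwise.

```shell
#!/bin/sh
# Hypothetical helper (not part of the quoted script).
# kill_target PID JOBMON
#   PID     process ID recorded for the job
#   JOBMON  non-empty if the job was started under "set -m"
# Prints the operand to pass to kill: "-PID" addresses the job's whole
# process group, a bare PID addresses only that single process.
kill_target() {
    if [ -n "$2" ]; then
        printf '%s\n' "-$1"
    else
        printf '%s\n' "$1"
    fi
}

# Possible use, in the style of the quoted jtimeout:
#   kill -KILL "$(kill_target "${pid}" "${JOBMON}")" >/dev/null 2>&1
```

Without a process group of its own, only the recorded PID can be addressed, which is why the quoted warning says killed tests may leave processes behind.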
Re: When can shells remove "known" process IDs from the list?
On 5/13/22 10:27 AM, Geoff Clare via austin-group-l at The Open Group wrote: Chet Ramey wrote, on 13 May 2022: On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group wrote: The definition of "Job" is: A set of processes, comprising a shell pipeline, and any processes descended from it, that are all in the same process group. Notice it says "that are all in the same process group". In the case of a background command started with job control disabled, the processes all have the same process group as the parent shell. By a strict reading, this counts as a job, but I don't think that was intended. Why not? This is what allows jobs/kill/wait to use job control notation in operands even when job control is not currently enabled. I'd argue that that was intended. My reading is that all the standard requires here is that if one or more jobs are created with job control enabled, and job control is subsequently disabled, you can still use "jobs" to list those jobs, and %n etc. with "kill" to refer to those jobs. Of course; it relies on your assertion that the standard requires job control to be enabled to create a job and put it in the jobs list. I've already said what I think about that, and most, if not all, shells behave differently. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
Chet Ramey wrote, on 13 May 2022: > > On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group wrote: > > > The definition of "Job" is: > > > > A set of processes, comprising a shell pipeline, and any processes > > descended from it, that are all in the same process group. > > > > Notice it says "that are all in the same process group". In the > > case of a background command started with job control disabled, the > > processes all have the same process group as the parent shell. > > By a strict reading, this counts as a job, but I don't think that > > was intended. > > Why not? This is what allows jobs/kill/wait to use job control notation > in operands even when job control is not currently enabled. I'd argue > that that was intended. My reading is that all the standard requires here is that if one or more jobs are created with job control enabled, and job control is subsequently disabled, you can still use "jobs" to list those jobs, and %n etc. with "kill" to refer to those jobs. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
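Whether a background command shares the parent shell's process group can be checked directly. The sketch below is an illustration only: pgid_of and shares_shell_group are hypothetical names, and it assumes a ps that accepts the POSIX "-o pgid= -p PID" form. In a script, where job control is disabled by default, it is expected to report the same process group, which is the case the quoted definition of "Job" is being argued over.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread).

# Print the process group ID of the given PID, without padding.
pgid_of() {
    ps -o pgid= -p "$1" | tr -d ' '
}

# Start a background command and compare its process group with the
# invoking shell's process group.
shares_shell_group() {
    sleep 5 >/dev/null 2>&1 &
    child=$!
    if [ "$(pgid_of "$child")" = "$(pgid_of "$$")" ]; then
        echo "same process group"
    else
        echo "own process group"
    fi
    kill "$child" 2>/dev/null
}
```

Running the same function in a shell where "set -m" has taken effect would be expected to take the other branch, since each job is then placed in a process group of its own.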
Re: When can shells remove "known" process IDs from the list?
On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group wrote:

>> You are over reaching in the way you are reading that text.
>
> I strongly disagree.

If you have to work that hard to make your case, it's a good indication
that the existing language is wrong -- or at least insufficient -- and
needs to be changed.

>> There is no such thing as a known process ID that is not a job.

Bash allows process substitutions to set $!, so users can wait for them,
but they are not jobs. Process substitution is, of course, an extension.

> The definition of "Job" is:
>
>   A set of processes, comprising a shell pipeline, and any processes
>   descended from it, that are all in the same process group.
>
> Notice it says "that are all in the same process group". In the
> case of a background command started with job control disabled, the
> processes all have the same process group as the parent shell.
> By a strict reading, this counts as a job, but I don't think that
> was intended.

Why not? This is what allows jobs/kill/wait to use job control notation
in operands even when job control is not currently enabled. I'd argue
that that was intended.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
On 5/12/22 10:03 AM, Geoff Clare via austin-group-l at The Open Group wrote:

>>> The normative text relating to creation of job numbers/IDs is all
>>> conditional on job control being enabled.
>>
>> Where is that? It's not in the definition of Job ID, it's not in 2.9.3
>> Asynchronous Lists, it's not in the `jobs' description, it's not part of
>> the definition of Background Job or Foreground Job, it's not in any
>> of fg/bg/kill/wait. I feel like I'm missing something obvious here.
>
> You're looking in (some of) the right places, but missing the
> significance of what's written there.

If we're going to make basic concepts dependent on obscure language in the
standard that requires the reader to make the proper set of inferences, the
standard has failed. It's worse that it fails to capture what the majority
of shells do in practice.

This set of examples you give, which you might assert are definitive, is
not all that compelling. If the standard wants to specify something, why
can't it just say so in plain language? Why make it a puzzle to be solved?
If you have to work this hard to make your case, it's probably not that
obvious.

>> So for the known IDs list, it's pretty much `wait' and `jobs', right?
>
> The phrase kre used was "when their termination status has been
> reported to the user - however that happens". That includes
> information written by an interactive shell before it writes a prompt.
> Although the standard says this information is about the exit status
> of "the background job", it is also, by association, information about
> the exit status of a process in the known process IDs list.

Another reason that the language relating the two things, and describing
how they interact, needs to be clear and unambiguous, and handle all four
scenarios.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
On 5/11/22 6:31 PM, Robert Elz wrote: | For neither the first nor the last time. Including now. People can disagree. | > I think they should remain independent. | Sure, I agree. I don't. I cannot think of a single reason why the shell should be forced to maintain two separate lists of its child processes. The jobs table needs to have them, so processes in the job can be identified as they finish. Duplicating that in another table, for no particular reason I can imagine makes no sense to me. Still, if others want to implement it that way, I don't object - but the standard has never required that, and should not, absent some very good reason, be changed to require it now. It's going to take more work on the standard to make it be that way, then. There will have to be more specific language about when and how the jobs list is created, when jobs are added and removed, when and how jobs correspond to known process IDs, and whether or not removing IDs from that list just means removing the job from the table. If we're going to require job control to be enabled to maintain a jobs list, at least a visible one, then we have to have something else to use. It may be the jobs list internally, if we end up fixing all the places in the standard that are underspecified, and that would probably work. It's my impression that the known IDs list is a remnant from the time when job control was optional, and you didn't need to implement job control unless you implemented the UPE. You still needed a way to keep track of background processes, and the known IDs list was it. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
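What the known IDs list buys a script can be seen with job control off: the status of a background command stays retrievable by PID after the command has already finished. A minimal sketch follows, assuming a POSIX sh; remember_status is a hypothetical name and the one-second pause is just a crude way to make sure the child has exited before wait is called.

```shell
#!/bin/sh
# Hypothetical demo (not from the thread): the shell has to remember
# the exit status of $! until the script asks for it.
remember_status() {
    sh -c 'exit 7' &
    pid=$!
    sleep 1                   # the child has exited by now
    rc=0
    wait "$pid" || rc=$?      # status is still retrievable by PID
    echo "status=$rc"
}
```

The function prints status=7. However a shell stores that information internally, one table or two, it has to be kept somewhere until wait collects it or the status has otherwise been reported.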
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 13 May 2022 10:20:49 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220513092049.GB17043@localhost> | [Robert Cc'ed this to austin-grou...@netbsd.org which presumably bounced. | I'm taking that as indication that he intended it to go to this list, | and am quoting it in full.] Oops. And yes, I did, and thanks. Didn't even notice that this one hadn't appeared on the list (I ignore bounce messages). | However, what the standard requires here does not match existing | practice in some shells and so the standard should change. OK, let's just agree on that, whatever our opinions of what it currently says. | It's not clear at all, and I would say the opposite is implied. | The definition of "Job" is: | | A set of processes, comprising a shell pipeline, and any processes | descended from it, that are all in the same process group. | | Notice it says "that are all in the same process group". Yes, I did. | In the case of a background command started with job control disabled, | the processes all have the same process group Exactly. That meets the definition, doesn't it? | as the parent shell. Not relevant. | By a strict reading, this counts as a job, but I don't think that | was intended. Intended or not, that's what the standard says. It also largely matches what is implemented. | In any case we already know that the current definition of "job" is | very wrong, so using it to support either position is futile. "very wrong" I think is too much - it is very close to the implementations. But given the last clause, we probably need to wait upon proposed new definitions, and specs for the relevant usages, to see if those are a closer fit to reality. kre
Re: When can shells remove "known" process IDs from the list?
[Robert Cc'ed this to austin-grou...@netbsd.org which presumably bounced.
I'm taking that as indication that he intended it to go to this list,
and am quoting it in full.]

Robert Elz wrote, on 12 May 2022:
>
> | The standard needs to specify them separately because, as per the
> | mail I just sent in reply to Chet, job numbers identify process groups
> | and therefore cannot identify asynchronous commands started with job
> | control disabled.
>
> You are over reaching in the way you are reading that text.

I strongly disagree.

> The job notation needs to be able to refer to jobs with
> process groups, and for some shells (incl NetBSD for the
> kill command, currently) but there is nothing preventing
> a shell from extending the jobs table (and job id notation)
> to cover non job control jobs. And aside from bosh, all
> shells do.

The standard clearly requires that the job number written by "jobs"
identifies a process group. A shell which includes a "job" in the
jobs output that does not have its own process group does not conform
to that requirement.

However, what the standard requires here does not match existing
practice in some shells and so the standard should change.

> | However, it is an internal implementation detail how that is managed.
>
> Agreed, but when one or the other is to be deleted, when those
> are not specified identically, it makes a noticeable difference.
>
> There is also the question of whether a script can wait for
> a process that is in an async non-trivial pipeline, which is
> not the one which is $!. In the jobs table, all the pids
> need to be maintained (all children of this shell) so the
> parent can associate them with the correct pipeline so
> termination of the job can be determined (not just termination
> of the process which happened to be $! when the pipeline
> started), and so the pipefail option can work correctly.
>
> I would prefer if the standard did not require retaining a known
> pid once the jobs entry that contains that pid is stated to
> be removed. Nb: not require retaining, not not permit retaining.
>
> | If you want to have one table with some flag to say whether each
> | entry is a job or a "known process ID that's not a job", that's fine.
>
> There is no such thing as a known process ID that is not a job.
> That is quite clear in the XBD definition of a job.

It's not clear at all, and I would say the opposite is implied.
The definition of "Job" is:

  A set of processes, comprising a shell pipeline, and any processes
  descended from it, that are all in the same process group.

Notice it says "that are all in the same process group". In the
case of a background command started with job control disabled, the
processes all have the same process group as the parent shell.
By a strict reading, this counts as a job, but I don't think that
was intended.

In any case we already know that the current definition of "job" is
very wrong, so using it to support either position is futile.

> It might not (without an extension to the standard being used)
> be possible to always use the %n (etc) notation to manipulate
> such jobs, but they very clearly are jobs.
>
> Hence no such flag is required. Knowing whether the job was
> started under job control, and hence has a pgrp of its own,
> is required, but that is a different thing entirely.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
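The asymmetry described for asynchronous pipelines can be shown in a few lines: a script is handed only one PID per background pipeline, through $!, and that PID belongs to the last command. The sketch below assumes a POSIX sh with no pipefail-style option in effect; last_in_pipeline is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical demo (not from the thread): $! after an asynchronous
# pipeline refers to the last command, so wait reports its status.
last_in_pipeline() {
    sh -c 'exit 3' | sh -c 'exit 5' &
    rc=0
    wait "$!" || rc=$?
    echo "status=$rc"
}
```

This prints status=5, the exit status of the right-hand command. The PID of the left-hand command is never exposed to the script, although the shell itself still has to track it in order to know when the pipeline as a whole has finished.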
Re: When can shells remove "known" process IDs from the list?
Robert Elz wrote, on 12 May 2022: > > | > I think they should remain independent. > | Sure, I agree. > > I don't. I cannot think of a single reason why the shell should be > forced to maintain two separate lists of its child processes. The jobs > table needs to have them, so processes in the job can be identified as > they finish. Duplicating that in another table, for no particular reason > I can imagine makes no sense to me. Still, if others want to implement > it that way, I don't object - but the standard has never required that, > and should not, absent some very good reason, be changed to require it now. The standard needs to specify them separately because, as per the mail I just sent in reply to Chet, job numbers identify process groups and therefore cannot identify asynchronous commands started with job control disabled. However, it is an internal implementation detail how that is managed. If you want to have one table with some flag to say whether each entry is a job or a "known process ID that's not a job", that's fine. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: When can shells remove "known" process IDs from the list?
Chet Ramey wrote, on 11 May 2022: > > On 5/10/22 12:03 PM, Geoff Clare via austin-group-l at The Open Group wrote: > > >> I'd be interested in your reasoning. The standard simply says that jobs > >> and kill (and wait should be added) work with job %X notation whether > >> or not job control is enabled. > > > > The normative text relating to creation of job numbers/IDs is all > > conditional on job control being enabled. > > Where is that? It's not in the definition of Job ID, it's not in 2.9.3 > Asynchronous Lists, it's not in the `jobs' description, it's not part of > the definition of Background Job or Foreground Job, it's not in any > of fg/bg/kill/wait. I feel like I'm missing something obvious here. You're looking in (some of) the right places, but missing the significance of what's written there. For example, on the jobs page it says in STDOUT that the job number written to standard output is "A number that can be used to identify the process group to the wait, fg, bg, and kill utilities." Since it identifies a process *group*, it's not possible for a job number to identify an asynchronous command that was started with job control disabled (as it won't be run in its own process group). This is confirmed by the OPERANDS sections for kill and wait, which describe the second form for the pid operand as "A job control job ID (see XBD Section 3.204, on page 66) that identifies a background process group to be {signaled/waited for}." Also, you mention "the definition of Job ID", but there is no such definition. The term that is defined is "Job Control Job ID", which implies that a "Job ID" is always something connected with job control. > >> OK. I'm pretty sure everyone already does this for the jobs list. Not sure > >> whether you want it to include the known IDs list. > > > > I think kre intended it apply to the known IDs list as well, and I > > was agreeing with that. > > So for the known IDs list, it's pretty much `wait' and `jobs', right? 
The phrase kre used was "when their termination status has been reported to the user - however that happens". That includes information written by an interactive shell before it writes a prompt. Although the standard says this information is about the exit status of "the background job", it is also, by association, information about the exit status of a process in the known process IDs list. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: When can shells remove "known" process IDs from the list?
Date:Wed, 11 May 2022 09:17:15 -0400 From:"Chet Ramey via austin-group-l at The Open Group" Message-ID: <573bc015-dd85-f86e-b89d-33a0bcc4b...@case.edu> Again, apologies, still very little time for any of this. | For neither the first nor the last time. Including now. | > I think they should remain independent. | Sure, I agree. I don't. I cannot think of a single reason why the shell should be forced to maintain two separate lists of its child processes. The jobs table needs to have them, so processes in the job can be identified as they finish. Duplicating that in another table, for no particular reason I can imagine makes no sense to me. Still, if others want to implement it that way, I don't object - but the standard has never required that, and should not, absent some very good reason, be changed to require it now. In a later message Chet said: | > The normative text relating to creation of job numbers/IDs is all | > conditional on job control being enabled. | Where is that? It's not in the definition of Job ID, it's not in 2.9.3 | Asynchronous Lists, it's not in the `jobs' description, it's not part of the | definition of Background Job or Foreground Job, it's not in any of fg/bg/kill/ | wait. I feel like I'm missing something obvious here. Again, I disagree. You're missing nothing. There has not been anything like Geoff is postulating - there might be in his unpublished new draft text, but there is no reason I can imagine that such a change should be adopted. kre
Re: When can shells remove "known" process IDs from the list?
chet.ra...@case.edu wrote in <195c7c59-8328-4ddc-b936-345f34ab1...@case.edu>: |On 5/10/22 12:03 PM, Geoff Clare via austin-group-l at The Open Group \ |wrote: ... |So for the known IDs list, it's pretty much `wait' and `jobs', right? Great words spoken easily. --steffen | |Der Kragenbaer,The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt)
Re: When can shells remove "known" process IDs from the list?
On 5/10/22 12:03 PM, Geoff Clare via austin-group-l at The Open Group wrote: >> If jobs and kill work, you should probably add wait to this description, or >> add a separate paragraph to the wait rationale. > > If it works with "wait" in all shells (that we care about), then I > agree it would make sense to add it. Just decide whether or not it makes sense. If it makes sense, add it. Shell behavior is only selectively relevant. >> I'd be interested in your reasoning. The standard simply says that jobs >> and kill (and wait should be added) work with job %X notation whether >> or not job control is enabled. > > The normative text relating to creation of job numbers/IDs is all > conditional on job control being enabled. Where is that? It's not in the definition of Job ID, it's not in 2.9.3 Asynchronous Lists, it's not in the `jobs' description, it's not part of the definition of Background Job or Foreground Job, it's not in any of fg/bg/kill/wait. I feel like I'm missing something obvious here. >> OK. I'm pretty sure everyone already does this for the jobs list. Not sure >> whether you want it to include the known IDs list. > > I think kre intended it apply to the known IDs list as well, and I > was agreeing with that. So for the known IDs list, it's pretty much `wait' and `jobs', right? -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
On 5/10/22 11:17 AM, Geoff Clare via austin-group-l at The Open Group wrote: >> Anyway, I agree with disallowing remove-before-prompting. > > Unfortunately that puts you in opposition to kre. For neither the first nor the last time. >> Or make it clear everywhere that removing a job from the jobs list >> means removing its pid from the list of terminated asynchronous lists. > > I think they should remain independent. Sure, I agree. It just means more work specifying when the shell can remove entries from either. I'll wait for your proposal. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
Chet Ramey wrote, on 06 May 2022: > > On 5/5/22 7:46 AM, Geoff Clare via austin-group-l at The Open Group wrote: > > > The fact that the jobs command works with job control disabled is > > mentioned in the rationale on the jobs page: > > > > The jobs utility is not dependent on the job control option, as > > are the seemingly related bg and fg utilities because jobs is > > useful for examining background jobs, regardless of the condition > > of job control. When the user has invoked a set +m command and job > > control has been turned off, jobs can still be used to examine the > > background jobs associated with that current session. Similarly, > > kill can then be used to kill background jobs with kill > > %. > > > > so that's not an "issue". > > If jobs and kill work, you should probably add wait to this description, or > add a separate paragraph to the wait rationale. If it works with "wait" in all shells (that we care about), then I agree it would make sense to add it. > > The reason I think #2 should say "if job control is disabled" is > > because the standard talks separately about the list of "process IDs > > known in the shell environment" and the job list / job IDs. > > I think it needs to talk a little bit more clearly about the jobs list and > what constitutes a job, not to mention how and when one gets created. The changes I've already worked on for bug 1254 add a lot of job control detail in a new "Job Control" section in XCU chapter 2. > > Your testing above seems to be conflating the "known IDs" and the jobs > > list. My reading of the standard is that entries in the jobs list only > > need to be created when job control is enabled, > > I'd be interested in your reasoning. The standard simply says that jobs > and kill (and wait should be added) work with job %X notation whether > or not job control is enabled. The normative text relating to creation of job numbers/IDs is all conditional on job control being enabled. 
When the "jobs" rationale says:

    ... because jobs is useful for examining background jobs,
    regardless of the condition of job control. When the user has
    invoked a set +m command and job control has been turned off, jobs
    can still be used to examine the background jobs associated with
    that current session.

it seems to me that "background jobs associated with that current
session" is referring to jobs that were created before the "set +m".

> > > If someone wants to implement it that way, I have no objection, but it
> > > should not be required. shells should at least be permitted to remove
> > > jobs from the list of remembered stuff when their termination status has
> > > been reported to the user - however that happens.
> >
> > I agree.
>
> OK. I'm pretty sure everyone already does this for the jobs list. Not sure
> whether you want it to include the known IDs list.

I think kre intended it apply to the known IDs list as well, and I
was agreeing with that.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: When can shells remove "known" process IDs from the list?
Chet Ramey wrote, on 06 May 2022: > > > There would seem to be two options to resolve this: > > > > A. Uphold the decision to disallow remove-before-prompting. This > > would mean removing the conflicting text from set -b and updating the > > justification on the wait page to something that holds water. > > (And dash would need to change in order to conform.) > > Are we talking about interactive shells where job control is enabled, > disabled, or both? Or both interactive and non-interactive shells? All interactive shells, regardless of whether job control is enabled. > I guess I'm not clear about the goal of allowing remove-before-prompting > unless it means the jobs list? After the shell has informed the user > that job 1 has terminated, isn't it an error to try something like > `wait %1', since the job may have been removed from the jobs list? It doesn't mean the jobs list. As you say later (and I said in an email you hadn't got to at this point), the known process IDs list and the background jobs list are separate things. I assume the reason to do remove-before-prompting is to keep the list of known process IDs tidy, so as to reduce the chances that the list will reach its size limit when the oldest entry in the list is still running (and presumably therefore still of interest to the user). > Anyway, I agree with disallowing remove-before-prompting. Unfortunately that puts you in opposition to kre. > The standard already requires that the shell notify the user of status > changes in any background jobs before prompting, and requires (or allows, > at least) the shell to remove a terminated job about which the user has > been notified from the jobs table. That is all job-control specific; it doesn't affect the list of known process IDs. > This plus the text in the Asynchronous Lists section implies to me, again, > that there are separate lists of jobs and terminated asynchronous lists, > and shells can remove entries from each one separately. 
That's the model > bash uses. But the set of jobs that the `jobs' builtin works from isn't > really defined anywhere, and I don't think the standard includes the > concept of a jobs list as such, even though we all know what it is. You > can probably synthesize it from different pieces, but I don't think it's > well defined. It will be clearly specified in my proposed changes for bug 1254. > So maybe you update the `wait' rationale to say something about how the > jobs stay in the jobs list until the user is notified of their termination, > say that the list of process IDs known in the current environment is not > the jobs list, and update the appropriate sections to refer to one or the > other. I agree the distinction needs to be pointed out somewhere, but I think it should be more prominent than the rationale for the wait utility. The obvious place is XCU 2.12 (see below). > Or make it clear everywhere that removing a job from the jobs list > means removing its pid from the list of terminated asynchronous lists. I think they should remain independent. > This suggests that the concept of a job needs more detail, and now you need > a definition of the current set of jobs the shell knows about to refer to > from other sections. The concept of a job is already getting major changes in my proposal for bug 1254. I think 2.12 Shell Execution Environment should have a separate entry for the background jobs list, making it clear that it is not the same things as the known process IDs list (which already has an entry). There are already cross-references to 2.12 where the known process IDs are mentioned, so it would also be the natural place to reference for the background jobs list. -- Geoff Clare The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: When can shells remove "known" process IDs from the list?
chet.ra...@case.edu wrote in <88762e56-0276-f936-cf4c-d48c8ddc2...@case.edu>:

|On 4/29/22 4:23 PM, Robert Elz via austin-group-l at The Open Group wrote:
...
|> true & X=$!
...
|They're not jobs! A pid is a pid. It doesn't matter whether it's the pid of
|the job's controlling process (or whatever we want to call it). The
|Asynchronous Lists text says you have to be able to wait for it. This is
|how bash works, too.
|
|This is what happens when you have a jobs list and a list of terminated
|asynchronous lists that are `known in the current shell environment'.
...

A bit off-topic, but it would be nice if scripts were given a hand to
signal children in a safe way. I can wait(1) on a PID maybe, but
timeout(1) is not standardized (nor can it be at this point, I think,
though maybe I should simply open an issue?), and so there is no safe
way to collect multiple PIDs while also being able to kill(1) them when
they exceed a time limit. This can only be done by synchronizing on some
stamp file or the like, but if I kill(1) a PID I do not know whether it
is still my PID or an already reused PID that belongs to another
program. Yet sh(1) does know whether that PID is still our child or not.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
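For illustration only, here is a minimal sketch of the watchdog pattern that scripts tend to fall back on today. `run_with_limit` is a hypothetical helper name, not a standard utility, and the sketch assumes a POSIX sh where a process killed by SIGTERM yields a wait status greater than 128. It does not solve the problem described above: the watchdog signals by PID from a separate process, so it cannot tell whether that PID still belongs to the shell's child.

```shell
# Hypothetical helper: run_with_limit SECONDS COMMAND [ARG...]
# Runs COMMAND in the background and sends it SIGTERM if it is still
# running after SECONDS. Returns COMMAND's wait status.
run_with_limit() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: signal the command once the limit has passed.
    # This kill is by raw PID, so it carries the PID-reuse risk.
    ( sleep "$limit"; kill "$cmd_pid" ) >/dev/null 2>&1 &
    dog_pid=$!
    status=0
    wait "$cmd_pid" || status=$?
    # The command is done: stop the watchdog and collect it.
    # (Its sleep may linger harmlessly until the limit expires.)
    kill "$dog_pid" 2>/dev/null || :
    wait "$dog_pid" 2>/dev/null || :
    return "$status"
}

run_with_limit 2 true && echo "finished in time"
```

The parent shell only ever uses wait on PIDs it started itself, which is the safe half; the unsafe half is the kill inside the watchdog subshell.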
Re: When can shells remove "known" process IDs from the list?
On 5/5/22 7:46 AM, Geoff Clare via austin-group-l at The Open Group wrote: The fact that the jobs command works with job control disabled is mentioned in the rationale on the jobs page: The jobs utility is not dependent on the job control option, as are the seemingly related bg and fg utilities because jobs is useful for examining background jobs, regardless of the condition of job control. When the user has invoked a set +m command and job control has been turned off, jobs can still be used to examine the background jobs associated with that current session. Similarly, kill can then be used to kill background jobs with kill %. so that's not an "issue". If jobs and kill work, you should probably add wait to this description, or add a separate paragraph to the wait rationale. XBD 2.175 defines a job as A set of processes, comprising a shell pipeline, and any processes descended from it, that are all in the same process group. Which says nothing very useful, and I am not sure is even correct. Yes, I made the same point in a previous message. The reason I think #2 should say "if job control is disabled" is because the standard talks separately about the list of "process IDs known in the shell environment" and the job list / job IDs. I think it needs to talk a little bit more clearly about the jobs list and what constitutes a job, not to mention how and when one gets created. Anyway, this also implies the existence of two separate lists. Your testing above seems to be conflating the "known IDs" and the jobs list. My reading of the standard is that entries in the jobs list only need to be created when job control is enabled, I'd be interested in your reasoning. The standard simply says that jobs and kill (and wait should be added) work with job %X notation whether or not job control is enabled. And in any event, that's not how shells work. I do agree that the current text implies two separate lists, and there's insufficient explanation of how they interact. 
It certainly doesn't imply that the `known IDs' stuff is only in effect when job control is not enabled. Independently of this, when job control is disabled all of the requirements relating to "known IDs" still apply and have nothing to do with %... job ID notation. If you make that change. The known IDs description doesn't depend on job control being enabled or disabled. | I think the description of the wait utility should be updated to require | removal from the list. I would agree with that. I wouldn't object. If someone wants to implement it that way, I have no objection, but it should not be required. shells should at least be permitted to remove jobs from the list of remembered stuff when their termination status has been reported to the user - however that happens. I agree. OK. I'm pretty sure everyone already does this for the jobs list. Not sure whether you want it to include the known IDs list. That could be another valid choice, but I would prefer that all shells wait for termination by default. You might, but that's not the current state of the world. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
On 5/3/22 6:52 AM, Geoff Clare via austin-group-l at The Open Group wrote: Robert Elz wrote, on 30 Apr 2022: | However, today it threw a last curve ball when I was working on an | update to the description of set -b ... How many shells actually implement that? They all accept it as an option, but for some it seems to be a no-op. That's one of the changes I was working on when I spotted this problem. Bash implements it. I doubt very many people use it. | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs | remain known until: | | 1. The command terminates and the application waits for the process ID. | | 2. Another asynchronous list is invoked before "$!" (corresponding to | the previous asynchronous list) is expanded in the current execution | environment. Does anyone implement that bit (#2) at all? In a non-interactive shell it might almost be possible, but in an interactive shell, if the job isn't in the list (whether $! has been referenced or not - usually it will not have been) because it has been removed, what is the shell supposed to do if the job stops? Further users (even in scripts) are allowed to use % %- %1 etc to refer to jobs, $! isn't the only way to reference one ("wait %2 should work). I'd suggest that #2 should simply be removed. I think #2 should say "If job control is disabled, ...". Why? You can use job control notation with jobs/kill/wait even if job control isn't enabled, which implies the presence of a job list separate from the list of known IDs. I think the description of the wait utility should be updated to require removal from the list. I agree, both the jobs list and the list of known IDs. [...] And last, also in this area, is the question of stopped jobs and the wait command, and how those two are intended to interact. The wording in my current draft makes clear that wait waits for processes to terminate. 
I could, if desired, add some rationale saying that some implementations
have, as an extension, an option that allows wait to return when a
process stops.

That's not the current behavior. At best, it should be unspecified.

Bash, yash, mksh, dash, the NetBSD sh, and gwsh allow the `wait' builtin
to wait for any process status change (e.g., SIGSTOP). ksh93, FreeBSD sh,
and zsh force the shell to wait until the process terminates.

Bash provides an option (`wait -f') to force a wait for process
termination. I didn't check whether other shells do.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
On 4/29/22 4:23 PM, Robert Elz via austin-group-l at The Open Group wrote: | You can test this by doing | |true & | |wait $!; echo $? | | This should print 0. Then do the same, except with the first command | changed to false &. That should print 1. Yes, in the shells you mention it does, indicating that something different is happening. It is interesting that in bash you can do that wait over and over again, and it keeps returning the 0 status (until one does a plain "wait" command, even the "jobs" command doesn't remove it, though the standard requires that it do so). bash is the only shell that acts like that, whether it is intentional or not I have no idea. It's intentional, and has been in bash for a very long time. As I said in another message, the jobs builtin not removing the pid from the `remembered' list is probably an oversight. I'll fix it in posix mode after bash-5.2 is released. But try a different test true & X=$! (the assignment to X is just in case there is a shell which implements that "no need to retain" stuff when $! is not referenced). Then repeat that line over and over. (Consecutive lines). zsh does something different, once a job has been reported as finished at a prompt, it is removed from the jobs table, and you can no longer do "wait %3" for it, but the pid and status seem to be remembered somewhere else, and wait gets the status from the job. That seems odd to me, it should be possible to use either form to wait on a job. They're not jobs! A pid is a pid. It doesn't matter whether it's the pid of the job's controlling process (or whatever we want to call it). The Asynchronous Lists text says you have to be able to wait for it. This is how bash works, too. This is what happens when you have a jobs list and a list of terminated asynchronous lists that are `known in the current shell environment'. 
bash is different again, it counts up the job numbers, like bosh and yash, but as it reports each earlier one finished, removes it from the jobs table, so the "jobs" command only ever shows (and then removes) the last one started. It still allows wait N to return the status, as many times as you want to do that command, but not wait %n for any but the most recently created one. Right. The ascending job number depends on your policy for assigning new job numbers, and you can only use job control notation to refer to entries in the job list. But bash will let you wait for pid N as long as pid N is in the list of terminated asynchronous processes. The bigger issue is what do you do about users who can be connected to their shell for weeks, running lots of background commands, and never issuing a wait or jobs command? Do you just keep remembering exit status/pid pairs forever? That doesn't sound sustainable to me. Bash bounds the number remembered. It's at least CHILD_MAX, as POSIX specifies, with an upper bound (right now, 32K -- very few sessions start that many asynchronous jobs/processes). It checks for pid reuse: the entry for pid N will always be the status of the most recent asynchronous process with that pid. That might not be perfect, but it works fine in practice. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
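The `true &` / `false &` test discussed above can be written as a small script. This is only a sketch of the behavior 2.9.3.1 describes for a PID whose `$!` has been expanded; `bg_status` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: run a command asynchronously, then collect its
# exit status by PID rather than by job ID.
bg_status() {
    "$@" &
    pid=$!          # expanding $! means the shell must keep this PID known
    status=0
    wait "$pid" || status=$?
    echo "$status"
}

bg_status true      # prints 0
bg_status false     # prints 1
```

Whether a second `wait "$pid"` for the same PID still returns the saved status is exactly the point on which the shells discussed here differ, so the sketch waits only once.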
Re: When can shells remove "known" process IDs from the list?
On 4/29/22 2:38 PM, Robert Elz via austin-group-l at The Open Group wrote: | However, today it threw a last curve ball when I was working on an | update to the description of set -b ... How many shells actually implement that? Bash does. I doubt anyone uses it. | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs | remain known until: | | 1. The command terminates and the application waits for the process ID. | | 2. Another asynchronous list is invoked before "$!" (corresponding to | the previous asynchronous list) is expanded in the current execution | environment. Does anyone implement that bit (#2) at all? I think the FreeBSD shell does. In a non-interactive shell it might almost be possible, but in an interactive shell, if the job isn't in the list (whether $! has been referenced or not - usually it will not have been) because it has been removed, what is the shell supposed to do if the job stops? Further users (even in scripts) are allowed to use % %- %1 etc to refer to jobs, $! isn't the only way to reference one ("wait %2 should work). I'd suggest that #2 should simply be removed. I think the standard implies that the jobs list and the list of terminated process IDs `known in the current environment' are different things. It's not clear. But do note that the definition of the jobs command says: When jobs reports the termination status of a job, the shell shall remove its process ID from the list of those ``known in the current shell execution environment''; see Section 2.9.3.1 (on page 2338). This is one place where the two things overlap. | It also appears that dash still implements remove-before-prompting. Does anyone not? Lots of shells don't. | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to | add a third list item (for interactive shells only) and deleting the | above quoted text from the wait page. This is necessary, we would be making use of the shell too difficult for interactive users otherwise. 
What does "too difficult" mean? The shells that don't do
remove-before-prompting seem to be doing just fine.

While you're considering all of this, you might want to also consider
what is intended to happen if a script does

    trap '' CHLD

and how that is supposed to interact with maintenance of the jobs
command, the wait command, and all else related.

It should be explicitly stated to be unspecified behavior, since SIGCHLD
is necessary to make process handling work.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: When can shells remove "known" process IDs from the list?
On 4/29/22 10:39 AM, Geoff Clare via austin-group-l at The Open Group wrote: I'm responding to these messages in order; sorry if I cover ground that's already been covered. I've been gradually making progress on bug 1254 as a background task. However, today it threw a last curve ball when I was working on an update to the description of set -b ... That description includes this near the end: When the shell notifies the user a job has been completed, it may remove the job's process ID from the list of those known in the current shell execution environment It would be correct if it referred to the `jobs utility' ("...notifies the user via the "jobs" utility that a job..."), which already specifies that, or explicitly said that asynchronous notification behaves as if it used the jobs builtin and inherited its behavior. (Which, to be clear, the bash `jobs' builtin doesn't do. I think that was an oversight, and I'll fix it, at least in posix mode, sometime after bash-5.2 comes out.) I don't think that sentence as written, in this context, describes how bash implements asynchronous notification. This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs remain known until: 1. The command terminates and the application waits for the process ID. 2. Another asynchronous list is invoked before "$!" (corresponding to the previous asynchronous list) is expanded in the current execution environment. Then there is the following in the APPLICATION USAGE for wait: Historical implementations of interactive shells have discarded the exit status of terminated background processes before each shell prompt. Therefore, the status of background processes was usually lost unless it terminated while wait was waiting for it. This could be a serious problem when a job that was expected to run for a long time actually terminated quickly with a syntax or initialization error because the exit status returned was usually zero if the requested process ID was not found. 
This volume of POSIX.1-202x requires the implementation to keep the status of terminated jobs available until the status is requested, so that scripts like: [...] work without losing status on any of the jobs. I don't think any shell removes jobs from the jobs list without notifying the user about their termination status, at least in interactive shells. So this text must be trying to refer to interactive and non-interactive shells and reconcile the different ways the user is notified of terminated jobs, since, unlike historical shells, the POSIX shells have job control and allow it to be used non-interactively? (This also implies that there are separate lists of jobs and terminated asynchronous lists.) There would seem to be two options to resolve this: A. Uphold the decision to disallow remove-before-prompting. This would mean removing the conflicting text from set -b and updating the justification on the wait page to something that holds water. (And dash would need to change in order to conform.) Are we talking about interactive shells where job control is enabled, disabled, or both? Or both interactive and non-interactive shells? I guess I'm not clear about the goal of allowing remove-before-prompting unless it means the jobs list? After the shell has informed the user that job 1 has terminated, isn't it an error to try something like `wait %1', since the job may have been removed from the jobs list? Anyway, I agree with disallowing remove-before-prompting. The standard already requires that the shell notify the user of status changes in any background jobs before prompting, and requires (or allows, at least) the shell to remove a terminated job about which the user has been notified from the jobs table. This plus the text in the Asynchronous Lists section implies to me, again, that there are separate lists of jobs and terminated asynchronous lists, and shells can remove entries from each one separately. That's the model bash uses. 
But the set of jobs that the `jobs' builtin works from isn't really defined anywhere, and I don't think the standard includes the concept of a jobs list as such, even though we all know what it is. You can probably synthesize it from different pieces, but I don't think it's well defined. So maybe you update the `wait' rationale to say something about how the jobs stay in the jobs list until the user is notified of their termination, say that the list of process IDs known in the current environment is not the jobs list, and update the appropriate sections to refer to one or the other. Or make it clear everywhere that removing a job from the jobs list means removing its pid from the list of terminated asynchronous lists. This suggests that the concept of a job needs more detail, and now you need a definition of the current set of jobs the shell knows about to refer to from other sections. Chet -- ``The lyf
Re: When can shells remove "known" process IDs from the list?
[Robert intended to send the mail I'm replying to to the list, but it was only sent to me. I've quoted it in full.] Robert Elz wrote, on 05 May 2022: > > | > How many shells actually implement that? > | > | They all accept it as an option, but for some it seems to be a no-op. > > Oh yes, sorry, that is what I meant - just parsing the option is > trivial, it is actually causing the option to do something that I > was wondering about. > > | I think #2 should say "If job control is disabled, ...". > > That might be an option, but does it really matter? That is, does any > shell that anyone knows of actually make use of this allowance to avoid > saving the exit status of anything? > > If not, then we can just delete that, and not have to worry about the > next issue - which is that, even with job control disabled, I think the > jobs command, and job type notation, all still works. Even if no shell currently implements it, I don't see the point of disallowing an optimisation that the standard currently allows. The fact that the jobs command works with job control disabled is mentioned in the rationale on the jobs page: The jobs utility is not dependent on the job control option, as are the seemingly related bg and fg utilities because jobs is useful for examining background jobs, regardless of the condition of job control. When the user has invoked a set +m command and job control has been turned off, jobs can still be used to examine the background jobs associated with that current session. Similarly, kill can then be used to kill background jobs with kill %. so that's not an "issue". > What enabling > job control does is arrange to place jobs in process groups of their own, > different from the shell (for background jobs) and to manage the process > group of the controlling tty. > > But it is not as clear as it might be what is supposed to work. 
> > XBD 2.175 defines a job as > > A set of processes, comprising a shell pipeline, and any > processes descended from it, that are all in the same process group. > > Which says nothing very useful, and I am not sure is even correct. At this point you start going over old ground. A major part of the work I've been doing on bug 1254 is because this definition is wrong. I started a mailing list thread about it in July 2019, `Bug 1254 gets worse: "Job" definition is wrong', which you contributed to. At some point I will check whether I've accounted for the things you say here, but for now I'm going to skip over most of it and concentrate on the parts related to the question I asked. > Eg: From one shell I can start another, and from that shell, start > several jobs in different process groups. To the initial shell all > of this is one job - though attempts to send signals to it might > reach anyone or no-one if using signals to process groups (the child > shell will be changing its process group to match whichever of its > children is the current foreground job, so there might be nothing in > the pgrp that the initial shell thinks refers to that job). > > For distinguishing jobs in a shell with job control turned off, it > says nothing at all, as ignoring that previous issue (when the job, or > some of its descendants change their pgrp) a shell pipeline is all one > process group, with job control (-m) enabled or not. What differs is > whether other shell pipelines are in the same pgrp as the first one or > not, and that definition doesn't touch on that. So, it is reasonable to > conclude that any shell pipeline is a job, for this definition. > > Then 3.176 defines Job Control, there's nothing particularly relevant > (nor incorrect) about what that says, it isn't useful for present purposes. > > Then 3.177 defines Job Control Job ID > > A handle that is used to refer to a job. 
The job control job ID > can be any of the forms shown in the following table: > > There's no need to reproduce the table here, it is just the various forms > of % notation that are defined. > > Note there's nothing in 3.175 or 3.177 about requiring job control to be > enabled in order to get jobs, or job control job ID handles. > > The jobs command definition just says: > > The jobs utility shall display the status of jobs that were started > in the current shell environment; see Section 2.12 (on page 2348). > > The xref is just about what "current shell environment" means, and isn't > relevant here. Nothing in that about "when job control is enabled" or > anything similar. > > The kill command says: > > OPERANDS > The following operands shall be supported: > > pid One of the following: > > [ omitting #1, that's just a decimal pid, or negative of a pgrp id, > and adds nothing useful here] > > 2. A job control job ID (see XBD Section 3.177, on page 54) > that identifies a background process group to be signaled. >
Re: When can shells remove "known" process IDs from the list?
Robert Elz wrote, on 30 Apr 2022: > > | However, today it threw a last curve ball when I was working on an > | update to the description of set -b ... > > How many shells actually implement that? They all accept it as an option, but for some it seems to be a no-op. That's one of the changes I was working on when I spotted this problem. > | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs > | remain known until: > | > | 1. The command terminates and the application waits for the process ID. > | > | 2. Another asynchronous list is invoked before "$!" (corresponding to > | the previous asynchronous list) is expanded in the current execution > | environment. > > Does anyone implement that bit (#2) at all? In a non-interactive shell it > might almost be possible, but in an interactive shell, if the job isn't in > the list (whether $! has been referenced or not - usually it will not have > been) because it has been removed, what is the shell supposed to do if the > job stops? Further users (even in scripts) are allowed to use % %- %1 > etc to refer to jobs, $! isn't the only way to reference one ("wait %2 should > work). I'd suggest that #2 should simply be removed. I think #2 should say "If job control is disabled, ...". > But do note that the definition of the jobs command says: > > When jobs reports the termination status of a job, the shell shall > remove its process ID from the list of those ``known in the current > shell execution environment''; see Section 2.9.3.1 (on page 2338). > > (quote from I8 Draft 2.1 -- but that text has been there forever, or > seemingly). > Good catch. That should be added to the numbered list in 2.9.3.1. > So that's another way that an entry is removed, and this one is "shall remove" > whereas "remain known until" puts a minimum on how long the job is supposed > to remain known, but doesn't actually require removal. 
For #2 that's
> obvious, shells aren't required to make that optimisation (that's some
> academic view of what was thought should be possible - but isn't in
> practice), but for #1 if the job isn't removed (when wait happens) then
> it could still be there, again, and again, forever

I think the description of the wait utility should be updated to
require removal from the list.

> | My initial reaction to this was that the above quote from set -b is
> | likely a left-over from before the decision to disallow the historical
> | remove-before-prompting behaviour was made.
>
> I doubt that -b is particularly relevant to this, other than that it provides
> an alternate time at which termination status of a process can be shown.
>
> | However, then I spotted that the text from wait, which seems to be an
> | attempt to justify that decision, first says it was historical
> | behaviour for *interactive* shells but then talks about the problems
> | it could cause for *scripts*. So it seems to me that the
> | justification does not stand up to scrutiny.
>
> The justification doesn't, but for scripts I don't recall there ever
> really being an issue - the removal happens when the status of jobs which
> have changed status is reported just before PS1 is written, and
> non-interactive shells (scripts) don't do that.
>
> On the other hand, users of interactive shells are not in the habit of
> issuing wait commands (even jobs commands, without some reason to do so).

I have done it occasionally, when I have a bunch of background jobs
running and I don't care about their individual status, I just want to
be told when they've all finished:

    command1 &
    .
    .
    commandN &
    wait; echo ALL DONE

However, in this particular scenario it wouldn't matter if command1,
say, had already finished and been removed from the list when I typed
the wait command.
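As a minimal, runnable sketch of the wait-for-everything pattern described above: the sleep and (exit 3) commands are hypothetical stand-ins for command1 ... commandN, and the variable name all_status is illustrative only; none of them come from the thread. It relies on the POSIX rule that wait with no operands exits with status zero once all known background jobs have terminated, so the individual statuses are deliberately not examined.

```shell
#!/bin/sh
# Hypothetical stand-ins for command1 ... commandN (not from the thread).
sleep 1 &
sleep 1 &
(exit 3) &      # a job that fails; its status is intentionally ignored here

# With no operands, wait blocks until every known background job has
# terminated, and itself exits with status zero.
wait
all_status=$?

echo "ALL DONE (wait returned $all_status)"
```

Because only the "all finished" notification matters in this pattern, it makes no difference whether an already-finished job is still in the shell's list when wait runs.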
> They expect to be told when a background job has finished (without -b both > working, and set, that might require causing new prompts to appear from time > to time) and simply expect that when a job has been reported as done, it is > done, and no longer exists. > > | It also appears that dash still implements remove-before-prompting. > > Does anyone not? Most shells do not. Harald's reply has the details. > | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to > | add a third list item (for interactive shells only) and deleting the > | above quoted text from the wait page. > > This is necessary, we would be making use of the shell too difficult for > interactive users otherwise. But there is no particular need for an > "interactive only" here, scripts can (though usually don't) use the jobs > command as well (it is a convenient way to get rid of any jobs from the > table that have finished, without knowing what they are, and without > potentially hanging waiting for something still running). The third item I'm referring to would just be for removal before prompting, so obviously
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 29 Apr 2022 20:11:55 +0100 From:"Harald van Dijk via austin-group-l at The Open Group" Message-ID: | >| It also appears that dash still implements remove-before-prompting. | | busybox ash and my shell do as well, but both are derived from dash and | have merely retained dash's behaviour. All ash derived shells work that way. | > Does anyone not? | | bash does not. bosh does not. ksh does not. mksh does not. posh does | not. yash does not. zsh does not. I did a test (not the same one you did) after I sent the mail, and saw that bosh and yash don't. For the other shells, it is not nearly as clearcut what is happening. | You can test this by doing | |true & | |wait $!; echo $? | | This should print 0. Then do the same, except with the first command | changed to false &. That should print 1. Yes, in the shells you mention it does, indicating that something different is happening. It is interesting that in bash you can do that wait over and over again, and it keeps returning the 0 status (until one does a plain "wait" command, even the "jobs" command doesn't remove it, though the standard requires that it do so). bash is the only shell that acts like that, whether it is intentional or not I have no idea. But try a different test true & X=$! (the assignment to X is just in case there is a shell which implements that "no need to retain" stuff when $! is not referenced). Then repeat that line over and over. (Consecutive lines). In ash derived shells (and pdksh) the first will report job 1 starting (assuming you had none already running), the 2nd line will report job 2 starting, and before prompting for the 3rd, report job 1 has finished. The third will be job 1 again, and report job 2 has finished, and that continues over and over again. This is all consistent with how we know that they work. In bosh and yash, the job number just keeps on climbing, even though they report the previous job finished as each subsequent one is started. 
That's also consistent with how they operate. A simple "wait N" for one of the jobs removes that one from the list, then more true& commands add more jobs. A simple "wait" clears up everything. In yash "jobs" reports them all finished and clears everything, as it should. In bosh "jobs" reports them all finished, but clears nothing (the jobs command can be repeated over and over and keeps reporting all the completed jobs). That's clearly broken. zsh does something different, once a job has been reported as finished at a prompt, it is removed from the jobs table, and you can no longer do "wait %3" for it, but the pid and status seem to be remembered somewhere else, and wait gets the status from the job. That seems odd to me, it should be possible to use either form to wait on a job. (I should note that there is something odd about my zsh install - I tend to need to type two newlines after a command to get it executed, both are seen by the shell. Most of the time that's just mildly annoying, when I forget the 2nd, nothing happens, and I have to wake up and remember that zsh is waiting for the 2nd before it will do anything with the command - but in testing like this, where the newlines generate prompts, and the accompanying the prompt is an action we care about, it kind of ruins the test.) ksh93 is similar (without the double newline issue). mksh is almost similar, but in it I saw internal error: j_async: bad nzombie (161) twice (once, then more testing, then again), which does not look good. I don't know what the 161 represents, it was not the same each time, but is not a pid of any of the jobs started. A count? In that one, with this sequence, there are only ever 2 jobs (as in job numbers) assigned, as each is started, the previous one is reported finished, and removed from the jobs table. 
It is possible to wait %n for the job number most recently started, but only that one (were the commands to run for longer, then presumably it would be possible to wait on any not completed and reported as completed). bash is different again, it counts up the job numbers, like bosh and yash, but as it reports each earlier one finished, removes it from the jobs table, so the "jobs" command only ever shows (and then removes) the last one started. It still allows wait N to return the status, as many times as you want to do that command, but not wait %n for any but the most recently created one. | I consider the dash behaviour a bug, but do not want to | fix it in a way that introduces another bug. While removing jobs that have been reported (ie: removing them as soon as possible) might reduce the risk of getting duplicate pids, it doesn't actually solve the problem. In particular, the removal only happens in interactive shells (ones which prompt) so does nothing at all for scripts, which have the same issue. It can also happen in an interactive
Re: When can shells remove "known" process IDs from the list?
On 29/04/2022 19:38, Robert Elz via austin-group-l at The Open Group wrote:
>     Date:        Fri, 29 Apr 2022 15:39:23 +0100
>     From:        "Geoff Clare via austin-group-l at The Open Group"
>     Message-ID:  <20220429143923.GA22521@localhost>
>
>   | It also appears that dash still implements remove-before-prompting.

pdksh does as well. However, pdksh is no longer maintained and the
maintained shells that derive from pdksh have changed this.

busybox ash and my shell do as well, but both are derived from dash and
have merely retained dash's behaviour.

> Does anyone not?

bash does not. bosh does not. ksh does not. mksh does not. posh does
not. yash does not. zsh does not.

You can test this by doing

    true &
    wait $!; echo $?

This should print 0. Then do the same, except with the first command
changed to false &. That should print 1.

For the record, in my shell, a fork of dash, I have retained the dash
behaviour for now because it is unclear to me what the expected
behaviour is if a later background job gets the same PID as a previous
background job. I consider the dash behaviour a bug, but do not want to
fix it in a way that introduces another bug.

Cheers,
Harald van Dijk
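For reference, the test described in the message above can be written as a small script. This is only a sketch: the variable names status_true and status_false are illustrative and not from the thread. Note that a script is a non-interactive shell, so it never prompts and therefore cannot exhibit remove-before-prompting; the script form only shows the result a shell that keeps the status would give. To observe the behaviour under discussion, the two commands have to be typed on separate lines at an interactive prompt.

```shell
#!/bin/sh
# Background job that succeeds; collect its exit status through $!.
true &
wait "$!"
status_true=$?

# Same again with a background job that fails.
false &
wait "$!"
status_false=$?

echo "true:  $status_true"
echo "false: $status_false"
```

In an interactive shell that removes finished jobs before prompting, the wait typed on the second line may no longer find the process ID, which is the difference the test is designed to expose.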
Re: When can shells remove "known" process IDs from the list?
Date:Fri, 29 Apr 2022 15:39:23 +0100 From:"Geoff Clare via austin-group-l at The Open Group" Message-ID: <20220429143923.GA22521@localhost> Sorry, been too busy to participate here much recently, will catch up someday soon (I hope). | However, today it threw a last curve ball when I was working on an | update to the description of set -b ... How many shells actually implement that? | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs | remain known until: | | 1. The command terminates and the application waits for the process ID. | | 2. Another asynchronous list is invoked before "$!" (corresponding to | the previous asynchronous list) is expanded in the current execution | environment. Does anyone implement that bit (#2) at all? In a non-interactive shell it might almost be possible, but in an interactive shell, if the job isn't in the list (whether $! has been referenced or not - usually it will not have been) because it has been removed, what is the shell supposed to do if the job stops? Further users (even in scripts) are allowed to use % %- %1 etc to refer to jobs, $! isn't the only way to reference one ("wait %2 should work). I'd suggest that #2 should simply be removed. But do note that the definition of the jobs command says: When jobs reports the termination status of a job, the shell shall remove its process ID from the list of those ``known in the current shell execution environment''; see Section 2.9.3.1 (on page 2338). (quote from I8 Draft 2.1 -- but that text has been there forever, or seemingly). So that's another way that an entry is removed, and this one is "shall remove" whereas "remain known until" puts a minimum on how long the job is supposed to remain known, but doesn't actually require removal. 
For #2 that's obvious, shells aren't required to make that optimisation (that's some academic view of what was thought should be possible - but isn't in practice), but for #1 if the job isn't removed (when wait happens) then it could still be there, again, and again, forever - even if the system uses the same pid later (days, weeks, months later perhaps) for another job started by the same shell -- against which there is no protection of any kind currently, though a shell could do WNOWAIT waits so zombies remain in the process table, even though the shell has already collected the exit status - but that's difficult to actually code correctly, especially given the definition of how SIGCHLD works, which as best I can tell has to be used as the only thing that would make it even conceivable to use WNOWAIT. Without that, when the shell acts like I believe most, or all do, and cleans up zombies ASAP, just keeping the job in its jobs table, marked terminated, with the status ready to give back when requested, the kernel is free to assign the reclaimed pid to any new process it likes, whenever it likes. | My initial reaction to this was that the above quote from set -b is | likely a left-over from before the decision to disallow the historical | remove-before-prompting behaviour was made. I doubt that -b is particularly relevant to this, other than that it provides an alternate time at which termination status of a process can be shown. | However, then I spotted that the text from wait, which seems to be an | attempt to justify that decision, first says it was historical | behaviour for *interactive* shells but then talks about the problems | it could cause for *scripts*. So it seems to me that the | justification does not stand up to scrutiny. 
The justification doesn't, but for scripts I don't recall there ever
really being an issue - the removal happens when the status of jobs which
have changed status is reported just before PS1 is written, and
non-interactive shells (scripts) don't do that.

On the other hand, users of interactive shells are not in the habit of
issuing wait commands (even jobs commands, without some reason to do so).
They expect to be told when a background job has finished (without -b both
working, and set, that might require causing new prompts to appear from time
to time) and simply expect that when a job has been reported as done, it is
done, and no longer exists.

  | It also appears that dash still implements remove-before-prompting.

Does anyone not?

  | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
  | add a third list item (for interactive shells only) and deleting the
  | above quoted text from the wait page.

This is necessary, we would be making use of the shell too difficult for
interactive users otherwise. But there is no particular need for an
"interactive only" here, scripts can (though usually don't) use the jobs
command as well (it is a convenient way to get rid of any jobs from the
table that have finished, without knowing what they are, and without
potentially hanging waiting for something
Re: When can shells remove "known" process IDs from the list?
It appears to me the set -b wording needs updating, to clarify "may
remove the job's process ID" is intended to exclude the blocking
circumstances listed, and since it's a "may", not "shall", whether those
exclusions are handled properly now is more a quality of implementation
than conformance issue.

On Fri, Apr 29, 2022 at 10:40 AM, Geoff Clare via austin-group-l at The
Open Group wrote:

I've been gradually making progress on bug 1254 as a background task.
However, today it threw a last curve ball when I was working on an
update to the description of set -b ...

That description includes this near the end:

    When the shell notifies the user a job has been completed, it may
    remove the job's process ID from the list of those known in the
    current shell execution environment

This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
remain known until:

    1. The command terminates and the application waits for the
       process ID.

    2. Another asynchronous list is invoked before "$!" (corresponding
       to the previous asynchronous list) is expanded in the current
       execution environment.

Then there is the following in the APPLICATION USAGE for wait:

    Historical implementations of interactive shells have discarded the
    exit status of terminated background processes before each shell
    prompt. Therefore, the status of background processes was usually
    lost unless it terminated while wait was waiting for it. This could
    be a serious problem when a job that was expected to run for a long
    time actually terminated quickly with a syntax or initialization
    error because the exit status returned was usually zero if the
    requested process ID was not found. This volume of POSIX.1-202x
    requires the implementation to keep the status of terminated jobs
    available until the status is requested, so that scripts like:

    [...]

    work without losing status on any of the jobs.

My initial reaction to this was that the above quote from set -b is
likely a left-over from before the decision to disallow the historical
remove-before-prompting behaviour was made.

However, then I spotted that the text from wait, which seems to be an
attempt to justify that decision, first says it was historical
behaviour for *interactive* shells but then talks about the problems
it could cause for *scripts*. So it seems to me that the
justification does not stand up to scrutiny.

It also appears that dash still implements remove-before-prompting.

There would seem to be two options to resolve this:

A. Uphold the decision to disallow remove-before-prompting. This would
   mean removing the conflicting text from set -b and updating the
   justification on the wait page to something that holds water.
   (And dash would need to change in order to conform.)

B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
   add a third list item (for interactive shells only) and deleting the
   above quoted text from the wait page.

I'm particularly interested to get the opinions of shell authors on
this.

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
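To make item 1 of the 2.9.3.1 list concrete, here is a small sketch (not the example elided from the wait page above, and not from the thread) that waits for the same process ID twice. The variable names pid, first and second are illustrative. The first wait should return the job's status. What the second wait returns depends on whether the shell has removed the ID from its list: a shell that removed it reports the ID as unknown (status 127), while a shell that kept the status returns it again. No particular result is claimed for any specific shell.

```shell
#!/bin/sh
# Start one background job and remember its process ID.
true &
pid=$!

# First wait: the ID is still known, so this yields the job's exit status.
wait "$pid"
first=$?

# Second wait for the same ID: 127 if the shell has removed the ID from
# its list of known process IDs, or the saved status if it has kept it.
wait "$pid"
second=$?

echo "first wait:  $first"
echo "second wait: $second"
```

Running this under different shells is one way to see whether a given implementation treats "remain known until" as a minimum lifetime or as a point at which the entry is actually discarded.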