Re: When can shells remove "known" process IDs from the list?

2022-05-16 Thread Robert Elz via austin-group-l at The Open Group
Chet and I can continue this conversation off list, what is
being discussed now has nothing at all to do with anything
related to posix.

kre



Re: When can shells remove "known" process IDs from the list?

2022-05-16 Thread Chet Ramey via austin-group-l at The Open Group

On 5/13/22 5:37 PM, Robert Elz wrote:
>     Date:        Sat, 14 May 2022 03:56:32 +0700
>     From:        "Robert Elz via austin-group-l at The Open Group"
>     Message-ID:  <2459.1652475...@jinx.noi.kre.to>
>
>   |   | Show your work.
>
>   | I no longer remember the exact command I used (cannot even locate the
>   | message you're quoting from),
>
> I finally did ...
>
> This is what I see:

I don't see that.

$ echo $BASH_VERSION
5.1.16(2)-release
$ sleep 20 | sleep 20 & sleep 30 | sleep 30 & jobs -l ; pstree $$ ; ps jT
[1] 22954
[2] 22956
[1]- 22953 Running sleep 20
 22954   | sleep 20 &
[2]+ 22955 Running sleep 30
 22956   | sleep 30 &
-+= 22938 chet ./bash
 |--- 22953 chet sleep 20
 |--- 22954 chet sleep 20
 |--- 22955 chet sleep 30
 |--- 22956 chet sleep 30
 \-+- 22957 chet pstree 22938
   \--- 22958 root ps -axwwo user,pid,ppid,pgid,command
USER   PID  PPID  PGID SESS JOBC STAT TT       TIME COMMAND
root   811   544   811    0    0 Ss   s019  0:00.05 login -pfl chet /bin/ba
chet   814   811   814    0    1 S    s019  0:00.09 -bash
chet 22938   814 22938    0    1 S+   s019  0:00.04 ./bash
chet 22953 22938 22938    0    1 S+   s019  0:00.00 sleep 20
chet 22954 22938 22938    0    1 S+   s019  0:00.00 sleep 20
chet 22955 22938 22938    0    1 S+   s019  0:00.00 sleep 30
chet 22956 22938 22938    0    1 S+   s019  0:00.00 sleep 30
root 22959 22938 22938    0    1 R+   s019  0:00.00 ps jT
$ kill %1
$ ps jT
USER   PID  PPID  PGID SESS JOBC STAT TT       TIME COMMAND
root   811   544   811    0    0 Ss   s019  0:00.05 login -pfl chet /bin/ba
chet   814   811   814    0    1 S    s019  0:00.09 -bash
chet 22938   814 22938    0    1 S+   s019  0:00.04 ./bash
chet 22955 22938 22938    0    1 S+   s019  0:00.00 sleep 30
chet 22956 22938 22938    0    1 S+   s019  0:00.00 sleep 30
root 22960 22938 22938    0    1 R+   s019  0:00.00 ps jT
$


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: When can shells remove "known" process IDs from the list?

2022-05-16 Thread Chet Ramey via austin-group-l at The Open Group

On 5/13/22 4:56 PM, Robert Elz wrote:
>     Date:        Fri, 13 May 2022 11:22:20 -0400
>     From:        Chet Ramey
>     Message-ID:
>
>   | Show your work.
>   |
>   | I tested this on macOS 12 and RHEL 7, using interactive shells with job
>   | control enabled,
>
> That is likely the difference.   The question was about what happens when
> job control is not enabled.

The same thing. This example uses bash-5.2-beta on macOS 10.15, but the
same thing happens with bash-5.1.16.

$ ./bash
$ set +m
$ sleep 20 | sleep 20 &
[1] 22755
jenna.local(2)$ pstree $$
-+= 22753 chet ./bash
 |--- 22754 chet sleep 20
 |--- 22755 chet sleep 20
 \-+- 22756 chet pstree 22753
   \--- 22757 root ps -axwwo user,pid,ppid,pgid,command
$ kill %1
$ ps ax | grep sleep
22759 s018  S+ 0:00.00 grep sleep
$ sleep 20 | sleep 20 & pstree $$
[1] 22787
-+= 22753 chet ./bash
 |--- 22786 chet sleep 20
 |--- 22787 chet sleep 20
 \-+- 22788 chet pstree 22753
   \--- 22789 root ps -axwwo user,pid,ppid,pgid,command
$ kill %1
$ ps axuw | grep sleep
chet 22791   0.0  0.0  4408552764 s018  S+   10:25AM   0:00.00 grep sleep





Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Robert Elz via austin-group-l at The Open Group
    Date:        Sat, 14 May 2022 03:56:32 +0700
    From:        "Robert Elz via austin-group-l at The Open Group"
    Message-ID:  <2459.1652475...@jinx.noi.kre.to>

  |   | Show your work.

  | I no longer remember the exact command I used (cannot even locate the
  | message you're quoting from),

I finally did ...

This is what I see:

bash5 $ echo $BASH_VERSION
5.1.16(1)-release
bash5 $ jobs
bash5 $ set +m
bash5 $ sleep 20 | sleep 20 & sleep 30 | sleep 30 & jobs -l; ps jT
[1] 1868
[2] 1847
[1]- 29632 Running sleep 20
  1868   | sleep 20 &
[2]+  2715 Running sleep 30
  1847   | sleep 30 &
USER   PID  PPID PGID   SESS JOBC STAT TTY   TIME COMMAND
kre    355  1847 5699 d0d6d70 S+   pts/26 0:00.00 sleep 30
kre    410 29632 5699 d0d6d70 S+   pts/26 0:00.00 sleep 20
kre   1687  1868 5699 d0d6d70 S+   pts/26 0:00.00 sleep 20 
kre   1847  5699 5699 d0d6d70 S+   pts/26 0:00.00 -bash 
kre   1868  5699 5699 d0d6d70 S+   pts/26 0:00.00 -bash 
kre   2715  5699 5699 d0d6d70 S+   pts/26 0:00.00 -bash 
kre   4319  2715 5699 d0d6d70 R+   pts/26 0:00.00 sleep 30 (bash)
kre   5333  5699 5699 d0d6d70 O+   pts/26 0:00.00 ps -jT 
kre   5699  3620 5699 d0d6d70 Ss+  pts/26 0:00.03 -bash 
kre  29632  5699 5699 d0d6d70 S+   pts/26 0:00.00 -bash 
bash5 $ echo $$
5699
bash5 $ 

Note that pids 29632 and 1868 (which jobs claims are "sleep") are actually
bash, the sleep processes are 410 and 1687.   Similarly for job 2. Everything
is in process group 5699 (the interactive shell's pid).

When one kills %1, processes 29632 and 1868 get killed; processes 410 and 1687
do not.

You can decide whether the extra interposed bash processes are intentional or
not; as I said in the previous message, that is not wrong.  The inability to
signal the (unknown) grandchildren is expected (the same kind of thing would
happen if the command were "make" and there's a whole tree of make, compiler,
linker, ... processes running - this is unavoidable).

kre





Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Robert Elz via austin-group-l at The Open Group
    Date:        Fri, 13 May 2022 11:22:20 -0400
    From:        Chet Ramey
    Message-ID:


  | Show your work.
  |
  | I tested this on macOS 12 and RHEL 7, using interactive shells with job
  | control enabled,

That is likely the difference.   The question was about what happens when
job control is not enabled.

When job control is enabled, the kill kills that job's process group, and
all of it gets signalled.   Without job control, that's not possible, the
shell can only kill its known children, their children (absent relaying of
the signal down the tree) never see it.
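
The difference can be sketched with a small script. This is only an
illustration under assumptions that are not from the thread: the scratch
file name and the "sh -c" wrapper are hypothetical stand-ins for the extra
shell that sat between the parent and the sleep processes. Job control is
off, as it is by default in a script, so the kill reaches the wrapper alone.

```shell
#!/bin/sh
# Sketch: a signal sent to a known child does not reach that child's children.
pidfile="${TMPDIR:-/tmp}/grandchild.$$"   # hypothetical scratch file
rm -f "$pidfile"

# The wrapper shell is the only process this script knows about; the sleep
# is the wrapper's child, i.e. a grandchild of this script.
sh -c 'sleep 30 & echo "$!" > "$1"; wait' wrapper "$pidfile" &
wrapper=$!

# Wait (up to about 5 seconds) for the wrapper to record the grandchild's pid.
tries=0
while [ ! -s "$pidfile" ] && [ "$tries" -lt 5 ]; do
    sleep 1
    tries=$((tries + 1))
done
read -r grandchild < "$pidfile"

kill "$wrapper"                    # signals the wrapper only
wait "$wrapper" 2>/dev/null || :   # reap it; the status reflects the signal

if kill -0 "$grandchild" 2>/dev/null; then
    survived=yes                   # the sleep never saw the signal
else
    survived=no
fi
echo "grandchild $grandchild survived: $survived"

kill "$grandchild" 2>/dev/null || :   # clean up the orphan
rm -f "$pidfile"
```

With job control enabled the shell would instead signal the job's process
group, which would normally include the sleep as well.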

I no longer remember the exact command I used (cannot even locate the message
you're quoting from) that caused bash to fork a sub-shell in which to run the
pipeline, rather than running it directly from the parent. But that's not
really the point: doing that was not wrong, whatever provoked it. It simply
meant that the parent shell did not know the actual processes running in the
pipe, so could not do anything to them.

kre




Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Chet Ramey via austin-group-l at The Open Group

On 5/5/22 7:46 AM, Geoff Clare via austin-group-l at The Open Group wrote:
> [Robert intended to send the mail I'm replying to to the list, but it
> was only sent to me. I've quoted it in full.]
>
> Robert Elz wrote, on 05 May 2022:
>
>> This leaves just bash of the shells I have to test.   bash is odd, at
>> first glance it seems to act like the ksh's, zsh & fbsh do.   But it
>> doesn't.   This seems to be because a pipeline like
>>
>> sleep 20 | sleep 20 &
>>
>> creates a subshell for the '&' first, and then creates a new subshell
>> environment for each side of the pipe.   None of the other shells do that,
>> the processes in the pipeline are in subshell environments (in most anyway)
>> but the same one as the one created for the async process execution - that
>> is, the sleep processes are direct children of the parent shell, not
>> grandchildren as they are in bash.
>>
>> When given "kill %1" it then seems to work just like those other shells, but
>> all that is actually killed is the forked copy of itself, leaving the sleep
>> processes running, orphaned.


Show your work.

I tested this on macOS 12 and RHEL 7, using interactive shells with job
control enabled, running the latest bash devel version, and could not
reproduce it.

The Linux version of pstree shows the process group; the macOS version
doesn't have that option. Both show the sleep processes are direct
descendants of the parent shell, but even if they aren't, bash clearly does
not leave the sleep processes orphaned.

macOS 12:

$ sleep 20 | sleep 20 &
[1] 16711
$ pstree $$
-+= 16694 chet ./bash
 |--= 16710 chet sleep 20
 |--- 16711 chet sleep 20
 \-+= 16712 chet pstree 16694
   \--- 16713 root ps -axwwo user,pid,ppid,pgid,command
$ kill %1
$ ps axuw | grep sleep
chet 16717   0.0  0.0 34142704632 s027  U+   11:04AM   0:00.00 grep sleep

[1]+  Terminated: 15  sleep 20 | sleep 20

RHEL 7:

$ sleep 20 | sleep 20 &
[1] 106739
$ pstree -g $$
bash(106427)─┬─pstree(106743)
             ├─sleep(106738)
             └─sleep(106738)
$ kill %1
$ ps axuw | grep sleep
chet 106753  0.0  0.0 112812   960 pts/1    R+   10:59   0:00 grep sleep
[1]+  Terminated  sleep 20 | sleep 20




Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Steffen Nurpmeso via austin-group-l at The Open Group
chet.ra...@case.edu wrote in
 <217874a6-64d5-184b-68e8-0bedb322f...@case.edu>:
 |On 5/13/22 10:27 AM, Geoff Clare via austin-group-l at The Open Group \
 |wrote:
 |> Chet Ramey wrote, on 13 May 2022:
 |>> On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group \
 |>> wrote:
 |>>> The definition of "Job" is:
 ...
 |>> Why not? This is what allows jobs/kill/wait to use job control notation
 |>> in operands even when job control is not currently enabled. I'd argue
 |>> that that was intended.
 |> 
 |> My reading is that all the standard requires here is that if one or
 |> more jobs are created with job control enabled, and job control is
 |> subsequently disabled, you can still use "jobs" to list those jobs,
 |> and %n etc. with "kill" to refer to those jobs.
 |
 |Of course; it relies on your assertion that the standard requires job
 |control to be enabled to create a job and put it in the jobs list. I've
 |already said what I think about that, and most, if not all, shells behave
 |differently.

Not to mention the ones where "set -m" is broken somewhere deep
within.

After running against the wall of reliable asynchronous process
interaction from within a sh(1)ell script some years ago, i had to
rewrite it all a bit differently, and one core point now is

   [ -n "${JOBMON}" ] && set -m >/dev/null 2>&1
   (  # Place the job in its own directory to ease file management
  trap '' EXIT HUP INT QUIT TERM USR1 USR2
  ${mkdir} t.${JOBS}.d && cd t.${JOBS}.d &&
 eval t_${1} ${JOBS} ${1} &&
 ${rm} -f ../t.${JOBS}.id
   ) > t.${JOBS}.io &1 /dev/null 2>&1
   JOBLIST="${JOBLIST} ${i}"
   printf '%s\n%s\n' ${i} ${1} > t.${JOBS}.id

   # ..until we should sync or reach the maximum concurrent number
   [ ${JOBS} -lt ${JOBNO} ] && return

This works reliably on all tested systems (*BSD, several kinds of
Linux, SunOS 5.{9,10,11}) with all tested (installed) shells.
(Besides the one with actually broken set -m, i have to say

 printf >&2 '%s! $JOBMON: $SHELL %s incapable, disabled!%s\n' \
"${COLOR_ERR_ON}" "${SHELL}" "${COLOR_ERR_OFF}"
 printf >&2 '%s!  No process groups available, killed tests may '\
'leave process "zombies"!%s\n' \
"${COLOR_ERR_ON}" "${COLOR_ERR_OFF}"

but that just cannot be helped.)

Of course it is still a mess that requires synchronization files
etc., but without this it just will not do.  It is still racy

  jtimeout() {
     i=0
     while [ ${i} -lt ${JOBS} ]; do
        i=`add ${i} 1`
        if [ -f t.${i}.id ] &&
              read pid < t.${i}.id >/dev/null 2>&1 &&
              kill -0 ${pid} >/dev/null 2>&1; then
           j=${pid}
           [ -n "${JOBMON}" ] && j=-${j}
           kill -KILL ${j} >/dev/null 2>&1
        else
           ${rm} -f t.${i}.id
        fi
     done
  }

But only a bit.
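
The choice the loop makes through j, between signalling one process and
signalling its whole process group, can be isolated in a helper. A minimal
sketch only; kill_target and its second argument are hypothetical names and
not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: choose the kill operand for a recorded job pid.
#   $1 = pid read from the job's .id file
#   $2 = non-empty when "set -m" was usable when the job was started
kill_target() {
    if [ -n "$2" ]; then
        printf '%s\n' "-$1"    # negative operand: signal the whole process group
    else
        printf '%s\n' "$1"     # positive operand: signal only that one process
    fi
}

# Intended use, mirroring the loop above:
#   kill -KILL "$(kill_target "$pid" "$JOBMON")" >/dev/null 2>&1
```

The negative form only helps when the job really was started in its own
process group, which is why the recorded state of "set -m" decides it.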

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Chet Ramey via austin-group-l at The Open Group

On 5/13/22 10:27 AM, Geoff Clare via austin-group-l at The Open Group wrote:
> Chet Ramey wrote, on 13 May 2022:
>> On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group wrote:
>>> The definition of "Job" is:
>>>
>>>   A set of processes, comprising a shell pipeline, and any processes
>>>   descended from it, that are all in the same process group.
>>>
>>> Notice it says "that are all in the same process group".  In the
>>> case of a background command started with job control disabled, the
>>> processes all have the same process group as the parent shell.
>>> By a strict reading, this counts as a job, but I don't think that
>>> was intended.
>>
>> Why not? This is what allows jobs/kill/wait to use job control notation
>> in operands even when job control is not currently enabled. I'd argue
>> that that was intended.
>
> My reading is that all the standard requires here is that if one or
> more jobs are created with job control enabled, and job control is
> subsequently disabled, you can still use "jobs" to list those jobs,
> and %n etc. with "kill" to refer to those jobs.

Of course; it relies on your assertion that the standard requires job
control to be enabled to create a job and put it in the jobs list. I've
already said what I think about that, and most, if not all, shells behave
differently.




Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Geoff Clare via austin-group-l at The Open Group
Chet Ramey wrote, on 13 May 2022:
>
> On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group wrote:
> 
> > The definition of "Job" is:
> > 
> >  A set of processes, comprising a shell pipeline, and any processes
> >  descended from it, that are all in the same process group.
> > 
> > Notice it says "that are all in the same process group".  In the
> > case of a background command started with job control disabled, the
> > processes all have the same process group as the parent shell.
> > By a strict reading, this counts as a job, but I don't think that
> > was intended.
> 
> Why not? This is what allows jobs/kill/wait to use job control notation
> in operands even when job control is not currently enabled. I'd argue
> that that was intended.

My reading is that all the standard requires here is that if one or
more jobs are created with job control enabled, and job control is
subsequently disabled, you can still use "jobs" to list those jobs,
and %n etc. with "kill" to refer to those jobs.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Chet Ramey via austin-group-l at The Open Group

On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group wrote:
>> You are over reaching in the way you are reading that text.
>
> I strongly disagree.

If you have to work that hard to make your case, it's a good indication
that the existing language is wrong -- or at least insufficient -- and
needs to be changed.

>> There is no such thing as a known process ID that is not a job.

Bash allows process substitutions to set $!, so users can wait for them,
but they are not jobs. Process substitution is, of course, an extension.

> The definition of "Job" is:
>
>  A set of processes, comprising a shell pipeline, and any processes
>  descended from it, that are all in the same process group.
>
> Notice it says "that are all in the same process group".  In the
> case of a background command started with job control disabled, the
> processes all have the same process group as the parent shell.
> By a strict reading, this counts as a job, but I don't think that
> was intended.

Why not? This is what allows jobs/kill/wait to use job control notation
in operands even when job control is not currently enabled. I'd argue
that that was intended.




Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Chet Ramey via austin-group-l at The Open Group

On 5/12/22 10:03 AM, Geoff Clare via austin-group-l at The Open Group wrote:
>>> The normative text relating to creation of job numbers/IDs is all
>>> conditional on job control being enabled.
>>
>> Where is that? It's not in the definition of Job ID, it's not in 2.9.3
>> Asynchronous Lists, it's not in the `jobs' description, it's not part of
>> the definition of Background Job or Foreground Job, it's not in any
>> of fg/bg/kill/wait. I feel like I'm missing something obvious here.
>
> You're looking in (some of) the right places, but missing the
> significance of what's written there.

If we're going to make basic concepts dependent on obscure language in
the standard that requires the reader to make the proper set of inferences,
the standard has failed. It's worse that it fails to capture what the
majority of shells do in practice.

This set of examples you give, which you might assert are definitive, is
not all that compelling. If the standard wants to specify something, why
can't it just say so in plain language? Why make it a puzzle to be solved?

If you have to work this hard to make your case, it's probably not that
obvious.

>> So for the known IDs list, it's pretty much `wait' and `jobs', right?
>
> The phrase kre used was "when their termination status has been
> reported to the user - however that happens".  That includes information
> written by an interactive shell before it writes a prompt.  Although
> the standard says this information is about the exit status of
> "the background job", it is also, by association, information about
> the exit status of a process in the known process IDs list.

Another reason that the language relating the two things, and describing
how they interact, needs to be clear and unambiguous, and handle all four
scenarios.





Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Chet Ramey via austin-group-l at The Open Group

On 5/11/22 6:31 PM, Robert Elz wrote:
>   | For neither the first nor the last time.
>
> Including now.

People can disagree.

>   | > I think they should remain independent.
>   | Sure, I agree.
>
> I don't.  I cannot think of a single reason why the shell should be
> forced to maintain two separate lists of its child processes.  The jobs
> table needs to have them, so processes in the job can be identified as
> they finish.  Duplicating that in another table, for no particular reason
> I can imagine makes no sense to me.   Still, if others want to implement
> it that way, I don't object - but the standard has never required that,
> and should not, absent some very good reason, be changed to require it now.


It's going to take more work on the standard to make it be that way, then.
There will have to be more specific language about when and how the jobs
list is created, when jobs are added and removed, when and how jobs
correspond to known process IDs, and whether or not removing IDs from that
list just means removing the job from the table. If we're going to require
job control to be enabled to maintain a jobs list, at least a visible one,
then we have to have something else to use. It may be the jobs list
internally, if we end up fixing all the places in the standard that are
underspecified, and that would probably work.

It's my impression that the known IDs list is a remnant from the time when
job control was optional, and you didn't need to implement job control
unless you implemented the UPE. You still needed a way to keep track of
background processes, and the known IDs list was it.

Chet
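
For a script that never enables job control, that bookkeeping is what makes
the following work. A minimal sketch, not taken from the thread; the exit
status 3 is arbitrary.

```shell
#!/bin/sh
( exit 3 ) &          # asynchronous command started without job control
pid=$!                # this pid is now one the shell has to remember

sleep 1               # by now the child has almost certainly exited

status=0
wait "$pid" || status=$?    # the shell must still hold the saved status
echo "exit status of $pid: $status"
```

Whether a second "wait" on the same pid still finds the status is exactly
the sort of detail the thread is arguing about, so the sketch does not try.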



Re: wait and stopped processes (was: When can shells remove "known" process IDs from the list?)

2022-05-13 Thread Chet Ramey via austin-group-l at The Open Group

On 5/11/22 6:56 PM, Robert Elz wrote:
>   | Maybe. And yet I can't recall ever receiving a bug about this.

[...]

> The circumstances to provoke a problem need to be contrived.

Exactly. It's a largely hypothetical scenario.





Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Robert Elz via austin-group-l at The Open Group
    Date:        Fri, 13 May 2022 10:20:49 +0100
    From:        "Geoff Clare via austin-group-l at The Open Group"
    Message-ID:  <20220513092049.GB17043@localhost>

  | [Robert Cc'ed this to austin-grou...@netbsd.org which presumably bounced.
  | I'm taking that as indication that he intended it to go to this list,
  | and am quoting it in full.]

Oops.   And yes, I did, and thanks.   Didn't even notice that this one
hadn't appeared on the list (I ignore bounce messages).

  | However, what the standard requires here does not match existing
  | practice in some shells and so the standard should change.

OK, let's just agree on that, whatever our opinions of what it
currently says.

  | It's not clear at all, and I would say the opposite is implied.
  | The definition of "Job" is:
  |
  | A set of processes, comprising a shell pipeline, and any processes
  | descended from it, that are all in the same process group.
  |
  | Notice it says "that are all in the same process group".

Yes, I did.

  | In the case of a background command started with job control disabled,
  | the processes all have the same process group

Exactly.   That meets the definition, doesn't it?

  | as the parent shell.

Not relevant.

  | By a strict reading, this counts as a job, but I don't think that
  | was intended.

Intended or not, that's what the standard says.   It also largely matches
what is implemented.

  | In any case we already know that the current definition of "job" is
  | very wrong, so using it to support either position is futile.

"very wrong" I think is too much - it is very close to the implementations.

But given the last clause, we probably need to wait upon proposed new
definitions, and specs for the relevant usages, to see if those are a
closer fit to reality.

kre



Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Geoff Clare via austin-group-l at The Open Group
[Robert Cc'ed this to austin-grou...@netbsd.org which presumably bounced.
I'm taking that as indication that he intended it to go to this list,
and am quoting it in full.]

Robert Elz wrote, on 12 May 2022:
>
>   | The standard needs to specify them separately because, as per the
>   | mail I just sent in reply to Chet, job numbers identify process groups
>   | and therefore cannot identify asynchronous commands started with job
>   | control disabled.
> 
> You are over reaching in the way you are reading that text.

I strongly disagree.

> The job notation needs to be able to refer to jobs with
> process groups, and for some shells (incl NetBSD for the
> kill command, currently) but there is nothing preventing
> a shell from extending the jobs table (and job id notation)
> to cover non job control jobs.   And aside from bosh, all
> shells do.

The standard clearly requires that the job number written by "jobs"
identifies a process group.  A shell which includes a "job" in
the jobs output that does not have its own process group does not
conform to that requirement.

However, what the standard requires here does not match existing
practice in some shells and so the standard should change.

>   | However, it is an internal implementation detail how that is managed.
> 
> Agreed, but when one or the other is to be deleted, when those
> are not specified identically, it makes a noticeable difference.
> 
> There is also the question of whether a script can wait for
> a process that is in an async non-trivial pipeline, which is
> not the one which is $!.   In the jobs table, all the pids
> need to be maintained (all children of this shell) so the
> parent can associate them with the correct pipeline so
> termination of the job can be determined (not just termination
> of the process which happened to be $! when the pipeline
> started), and so the pipefail option can work correctly.
> 
> I would prefer if the standard did not require retaining a known
> pid once the jobs entry that contains that pid is stated to
> be removed.  Nb: not require retaining, not not permit retaining.
> 
>   | If you want to have one table with some flag to say whether each
>   | entry is a job or a "known process ID that's not a job", that's fine.
> 
> There is no such thing as a known process ID that is not a job.
> That is quite clear in the XBD definition of a job.

It's not clear at all, and I would say the opposite is implied.
The definition of "Job" is:

A set of processes, comprising a shell pipeline, and any processes
descended from it, that are all in the same process group.

Notice it says "that are all in the same process group".  In the
case of a background command started with job control disabled, the
processes all have the same process group as the parent shell.
By a strict reading, this counts as a job, but I don't think that
was intended.

In any case we already know that the current definition of "job" is
very wrong, so using it to support either position is futile.

> It might not (without an extension to the standard being used)
> be possible to always use the %n (etc) notation to manipulate
> such jobs, but they very clearly are jobs.
> 
> Hence no such flag is required.   Knowing whether the job was
> started under job control, and hence has a pgrp of its own,
> is required, but that is a different thing entirely.




Re: When can shells remove "known" process IDs from the list?

2022-05-12 Thread Geoff Clare via austin-group-l at The Open Group
Robert Elz wrote, on 12 May 2022:
>
>   | > I think they should remain independent.
>   | Sure, I agree.
> 
> I don't.  I cannot think of a single reason why the shell should be
> forced to maintain two separate lists of its child processes.  The jobs
> table needs to have them, so processes in the job can be identified as
> they finish.  Duplicating that in another table, for no particular reason
> I can imagine makes no sense to me.   Still, if others want to implement
> it that way, I don't object - but the standard has never required that,
> and should not, absent some very good reason, be changed to require it now.

The standard needs to specify them separately because, as per the
mail I just sent in reply to Chet, job numbers identify process groups
and therefore cannot identify asynchronous commands started with job
control disabled.

However, it is an internal implementation detail how that is managed.
If you want to have one table with some flag to say whether each
entry is a job or a "known process ID that's not a job", that's fine.




Re: When can shells remove "known" process IDs from the list?

2022-05-12 Thread Geoff Clare via austin-group-l at The Open Group
Chet Ramey wrote, on 11 May 2022:
>
> On 5/10/22 12:03 PM, Geoff Clare via austin-group-l at The Open Group wrote:
> 
> >> I'd be interested in your reasoning. The standard simply says that jobs
> >> and kill (and wait should be added) work with job %X notation whether
> >> or not job control is enabled.
> > 
> > The normative text relating to creation of job numbers/IDs is all
> > conditional on job control being enabled.
> 
> Where is that? It's not in the definition of Job ID, it's not in 2.9.3
> Asynchronous Lists, it's not in the `jobs' description, it's not part of
> the definition of Background Job or Foreground Job, it's not in any
> of fg/bg/kill/wait. I feel like I'm missing something obvious here.

You're looking in (some of) the right places, but missing the
significance of what's written there.  For example, on the jobs page
it says in STDOUT that the job number written to standard output
is "A number that can be used to identify the process group to the
wait, fg, bg, and kill utilities."

Since it identifies a process *group*, it's not possible for a job
number to identify an asynchronous command that was started with
job control disabled (as it won't be run in its own process group).

This is confirmed by the OPERANDS sections for kill and wait, which
describe the second form for the pid operand as "A job control job ID
(see XBD Section 3.204, on page 66) that identifies a background
process group to be {signaled/waited for}."
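
The process group point can be checked from a script without any job
control at all. This is a sketch under an assumption not stated in the
thread: that the shell's kill accepts a negative pid operand to address a
process group, as the common shells do.

```shell
#!/bin/sh
# Job control is off by default in a script, so no "set +m" is needed here.
sleep 30 &
pid=$!

# A negative operand addresses a process group. The check fails when no
# group is led by $pid, i.e. the sleep still runs in the script's own group.
if kill -0 -"$pid" 2>/dev/null; then
    own_group=yes
else
    own_group=no
fi
echo "pid $pid leads its own process group: $own_group"

kill "$pid"                     # signalling the single known pid still works
wait "$pid" 2>/dev/null || :    # reap it; the status reflects the signal
```

So a job number that must identify a process group has nothing of its own
to point at here, while the pid in $! remains perfectly usable.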

Also, you mention "the definition of Job ID", but there is no such
definition. The term that is defined is "Job Control Job ID", which
implies that a "Job ID" is always something connected with job control.

> >> OK. I'm pretty sure everyone already does this for the jobs list. Not sure
> >> whether you want it to include the known IDs list.
> > 
> > I think kre intended it apply to the known IDs list as well, and I
> > was agreeing with that.
> 
> So for the known IDs list, it's pretty much `wait' and `jobs', right?

The phrase kre used was "when their termination status has been
reported to the user - however that happens".  That includes information
written by an interactive shell before it writes a prompt.  Although
the standard says this information is about the exit status of
"the background job", it is also, by association, information about
the exit status of a process in the known process IDs list.




Re: wait and stopped processes (was: When can shells remove "known" process IDs from the list?)

2022-05-11 Thread Robert Elz via austin-group-l at The Open Group
    Date:        Wed, 11 May 2022 09:58:38 -0400
    From:        "Chet Ramey via austin-group-l at The Open Group"
    Message-ID:  <4d0598b4-efb3-d5c2-1267-b8a807399...@case.edu>

  | > It is already what the standard requires, and with good reason.
  |
  | Sure. It simply isn't what many (most) shells do.

You're right about that, given this test (in an interactive shell, with set -m)

date; sleep 30 & X=$! ; ( sleep 5;  kill -STOP $X) &
echo sleep=$X kill=$!; wait $X; jobs -l; date

(which I entered on one line, but wrapped here for e-mail convenience)

All shells but FreeBSD and zsh (--emulate sh) finished in 5 seconds, leaving
a stopped sleep job running.   (We can ignore the NetBSD sh for this, it
is definitely broken - what happens depends upon that "sleep 5", as the
wait behaves differently if the waited upon process is already stopped,
vs if it stops while waiting).

The FreeBSD and zsh shells didn't terminate that command until a SIGCONT
was directed at the sleep process (rather more than 30 seconds after all
of this started).

  | Maybe. And yet I can't recall ever receiving a bug about this.

That is most likely because users generally don't wait in interactive
shells, and in non-interactive shells, 99.9% of the time if a job stops,
its parent shell stops along with it - when they are resumed, they
both resume, and simply continue from where they left off.

The circumstances to provoke a problem need to be contrived.

kre



Re: When can shells remove "known" process IDs from the list?

2022-05-11 Thread Robert Elz via austin-group-l at The Open Group
Date: Wed, 11 May 2022 09:17:15 -0400
From: "Chet Ramey via austin-group-l at The Open Group" 

Message-ID:  <573bc015-dd85-f86e-b89d-33a0bcc4b...@case.edu>

Again, apologies, still very little time for any of this.

  | For neither the first nor the last time.

Including now.

  | > I think they should remain independent.
  | Sure, I agree.

I don't.  I cannot think of a single reason why the shell should be
forced to maintain two separate lists of its child processes.  The jobs
table needs to have them, so processes in the job can be identified as
they finish.  Duplicating that in another table, for no particular reason
I can imagine, makes no sense to me.   Still, if others want to implement
it that way, I don't object - but the standard has never required that,
and should not, absent some very good reason, be changed to require it now.

In a later message Chet said:
| > The normative text relating to creation of job numbers/IDs is all
| > conditional on job control being enabled.

| Where is that? It's not in the definition of Job ID, it's not in 2.9.3
| Asynchronous Lists, it's not in the `jobs' description, it's not part of the
| definition of Background Job or Foreground Job, it's not in any of fg/bg/kill/
| wait. I feel like I'm missing something obvious here. 

Again, I disagree.   You're missing nothing.   There has not been anything
like Geoff is postulating - there might be in his unpublished new draft text,
but there is no reason I can imagine that such a change should be adopted.

kre



Re: When can shells remove "known" process IDs from the list?

2022-05-11 Thread Steffen Nurpmeso via austin-group-l at The Open Group
chet.ra...@case.edu wrote in
 <195c7c59-8328-4ddc-b936-345f34ab1...@case.edu>:
 |On 5/10/22 12:03 PM, Geoff Clare via austin-group-l at The Open Group \
 |wrote:
 ...
 |So for the known IDs list, it's pretty much `wait' and `jobs', right?

Great words spoken easily.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: When can shells remove "known" process IDs from the list?

2022-05-11 Thread Chet Ramey via austin-group-l at The Open Group
On 5/10/22 12:03 PM, Geoff Clare via austin-group-l at The Open Group wrote:

>> If jobs and kill work, you should probably add wait to this description, or
>> add a separate paragraph to the wait rationale.
> 
> If it works with "wait" in all shells (that we care about), then I
> agree it would make sense to add it.

Just decide whether or not it makes sense. If it makes sense, add it.
Shell behavior is only selectively relevant.


>> I'd be interested in your reasoning. The standard simply says that jobs
>> and kill (and wait should be added) work with job %X notation whether
>> or not job control is enabled.
> 
> The normative text relating to creation of job numbers/IDs is all
> conditional on job control being enabled.

Where is that? It's not in the definition of Job ID, it's not in 2.9.3
Asynchronous Lists, it's not in the `jobs' description, it's not part of
the definition of Background Job or Foreground Job, it's not in any
of fg/bg/kill/wait. I feel like I'm missing something obvious here.


>> OK. I'm pretty sure everyone already does this for the jobs list. Not sure
>> whether you want it to include the known IDs list.
> 
> I think kre intended it to apply to the known IDs list as well, and I
> was agreeing with that.

So for the known IDs list, it's pretty much `wait' and `jobs', right?

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: wait and stopped processes (was: When can shells remove "known" process IDs from the list?)

2022-05-11 Thread Chet Ramey via austin-group-l at The Open Group
On 5/10/22 11:50 AM, Geoff Clare via austin-group-l at The Open Group wrote:
> Chet Ramey wrote, on 06 May 2022:
>>
>>>> And last, also in this area, is the question of stopped jobs and the wait
>>>> command, and how those two are intended to interact.
>>>
>>> The wording in my current draft makes clear that wait waits for
>>> processes to terminate.  I could, if desired, add some rationale saying
>>> that some implementations have, as an extension, an option that allows
>>> wait to return when a process stops.
>>
>> That's not the current behavior. At best, it should be unspecified.
> 
> It is already what the standard requires, and with good reason.

Sure. It simply isn't what many (most) shells do.

> I have never, ever, seen a shell script use "wait" in a way that would
> work correctly if the wait returned when the process stopped.  The code
> invariably assumes that wait will not return until the process
> terminates. If it checks $? after the wait, it is always just to
> distinguish between different exit status values.

Maybe. And yet I can't recall ever receiving a bug about this.


> In shells where wait (with no options specified) returns when a
> process stops, that is a horrible misfeature.  Kre has already stated
> he will change NetBSD sh so that it doesn't do that. Hopefully the
> other ash-based shells will follow suit.

There are more shells than ash-based ones that do this. At least four
independent code bases have made the same choice.

> If you only change
> bash in POSIX mode, you will be doing your users a disservice.

I doubt that. There's no evidence that this is a problem for bash users.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: When can shells remove "known" process IDs from the list?

2022-05-11 Thread Chet Ramey via austin-group-l at The Open Group
On 5/10/22 11:17 AM, Geoff Clare via austin-group-l at The Open Group wrote:

>> Anyway, I agree with disallowing remove-before-prompting.
> 
> Unfortunately that puts you in opposition to kre.

For neither the first nor the last time.

>> Or make it clear everywhere that removing a job from the jobs list
>> means removing its pid from the list of terminated asynchronous lists.
> 
> I think they should remain independent.

Sure, I agree. It just means more work specifying when the shell can
remove entries from either. I'll wait for your proposal.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: When can shells remove "known" process IDs from the list?

2022-05-10 Thread Geoff Clare via austin-group-l at The Open Group
Chet Ramey wrote, on 06 May 2022:
>
> On 5/5/22 7:46 AM, Geoff Clare via austin-group-l at The Open Group wrote:
> 
> > The fact that the jobs command works with job control disabled is
> > mentioned in the rationale on the jobs page:
> > 
> >  The jobs utility is not dependent on the job control option, as
> >  are the seemingly related bg and fg utilities because jobs is
> >  useful for examining background jobs, regardless of the condition
> >  of job control. When the user has invoked a set +m command and job
> >  control has been turned off, jobs can still be used to examine the
> >  background jobs associated with that current session. Similarly,
> >  kill can then be used to kill background jobs with kill
> >  %.
> > 
> > so that's not an "issue".
> 
> If jobs and kill work, you should probably add wait to this description, or
> add a separate paragraph to the wait rationale.

If it works with "wait" in all shells (that we care about), then I
agree it would make sense to add it.

> > The reason I think #2 should say "if job control is disabled" is
> > because the standard talks separately about the list of "process IDs
> > known in the shell environment" and the job list / job IDs.
> 
> I think it needs to talk a little bit more clearly about the jobs list and
> what constitutes a job, not to mention how and when one gets created.

The changes I've already worked on for bug 1254 add a lot of job control
detail in a new "Job Control" section in XCU chapter 2.

> > Your testing above seems to be conflating the "known IDs" and the jobs
> > list. My reading of the standard is that entries in the jobs list only
> > need to be created when job control is enabled,
> 
> I'd be interested in your reasoning. The standard simply says that jobs
> and kill (and wait should be added) work with job %X notation whether
> or not job control is enabled.

The normative text relating to creation of job numbers/IDs is all
conditional on job control being enabled.

When the "jobs" rationale says:

... because jobs is useful for examining background jobs, regardless
of the condition of job control. When the user has invoked a set +m
command and job control has been turned off, jobs can still be used
to examine the background jobs associated with that current session.

it seems to me that "background jobs associated with that current session"
is referring to jobs that were created before the "set +m".

> > > If someone wants to implement it that way, I have no objection, but it
> > > should not be required.   shells should at least be permitted to remove
> > > jobs from the list of remembered stuff when their termination status has
> > > been reported to the user - however that happens.
> > 
> > I agree.
> 
> OK. I'm pretty sure everyone already does this for the jobs list. Not sure
> whether you want it to include the known IDs list.

I think kre intended it to apply to the known IDs list as well, and I
was agreeing with that.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



wait and stopped processes (was: When can shells remove "known" process IDs from the list?)

2022-05-10 Thread Geoff Clare via austin-group-l at The Open Group
Chet Ramey wrote, on 06 May 2022:
>
> > > And last, also in this area, is the question of stopped jobs and the wait
> > > command, and how those two are intended to interact.
> > 
> > The wording in my current draft makes clear that wait waits for
> > processes to terminate.  I could, if desired, add some rationale saying
> > that some implementations have, as an extension, an option that allows
> > wait to return when a process stops.
> 
> That's not the current behavior. At best, it should be unspecified.

It is already what the standard requires, and with good reason.

I have never, ever, seen a shell script use "wait" in a way that would
work correctly if the wait returned when the process stopped.  The code
invariably assumes that wait will not return until the process
terminates. If it checks $? after the wait, it is always just to
distinguish between different exit status values.

Even if an application did want to check for the process stopping,
that is not possible (portably) since an exit status of (128+SIGSTOP)
can't be distinguished from a call to exit() with the same value.
Using "kill" or "ps" to check if the process still exists isn't
reliable, as another process could have been given the same pid.
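
To illustrate the point about indistinguishable statuses, here is a
minimal sketch (not from the thread; the variable names are made up).
The child below terminates normally by calling exit with 128+SIGSTOP,
yet the status that wait returns decodes to STOP.  The signal number is
looked up with "kill -l" because it varies between systems.

```sh
# Look up the signal number of SIGSTOP on this system.
stop_num=
n=1
while [ "$n" -le 64 ]; do
    name=$(kill -l "$n" 2>/dev/null) || name=
    if [ "$name" = STOP ]; then
        stop_num=$n
        break
    fi
    n=$((n + 1))
done
: "${stop_num:?could not determine the number of SIGSTOP}"

# This child is never stopped; it simply exits with 128+SIGSTOP.
( exit $((128 + stop_num)) ) &
status=0
wait "$!" || status=$?

# Decoding the status the usual way suggests a stop that never happened.
sig_name=$(kill -l "$status")
echo "wait returned $status, which kill -l decodes as $sig_name"
```

A script that sees this status has no portable way of telling whether
the process stopped or exited.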

In shells where wait (with no options specified) returns when a
process stops, that is a horrible misfeature.  Kre has already stated
he will change NetBSD sh so that it doesn't do that. Hopefully the
other ash-based shells will follow suit.  If you only change
bash in POSIX mode, you will be doing your users a disservice.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: When can shells remove "known" process IDs from the list?

2022-05-10 Thread Geoff Clare via austin-group-l at The Open Group
Chet Ramey wrote, on 06 May 2022:
>
> > There would seem to be two options to resolve this:
> > 
> > A. Uphold the decision to disallow remove-before-prompting.  This
> > would mean removing the conflicting text from set -b and updating the
> > justification on the wait page to something that holds water.
> > (And dash would need to change in order to conform.)
> 
> Are we talking about interactive shells where job control is enabled,
> disabled, or both? Or both interactive and non-interactive shells?

All interactive shells, regardless of whether job control is enabled.

> I guess I'm not clear about the goal of allowing remove-before-prompting
> unless it means the jobs list? After the shell has informed the user
> that job 1 has terminated, isn't it an error to try something like
> `wait %1', since the job may have been removed from the jobs list?

It doesn't mean the jobs list.  As you say later (and I said in an
email you hadn't got to at this point), the known process IDs list and
the background jobs list are separate things.

I assume the reason to do remove-before-prompting is to keep the
list of known process IDs tidy, so as to reduce the chances that the
list will reach its size limit when the oldest entry in the list is
still running (and presumably therefore still of interest to the user).

> Anyway, I agree with disallowing remove-before-prompting.

Unfortunately that puts you in opposition to kre.

> The standard already requires that the shell notify the user of status
> changes in any background jobs before prompting, and requires (or allows,
> at least) the shell to remove a terminated job about which the user has
> been notified from the jobs table.

That is all job-control specific; it doesn't affect the list of known
process IDs.

> This plus the text in the Asynchronous Lists section implies to me, again,
> that there are separate lists of jobs and terminated asynchronous lists,
> and shells can remove entries from each one separately. That's the model
> bash uses. But the set of jobs that the `jobs' builtin works from isn't
> really defined anywhere, and I don't think the standard includes the
> concept of a jobs list as such, even though we all know what it is. You
> can probably synthesize it from different pieces, but I don't think it's
> well defined.

It will be clearly specified in my proposed changes for bug 1254.

> So maybe you update the `wait' rationale to say something about how the
> jobs stay in the jobs list until the user is notified of their termination,
> say that the list of process IDs known in the current environment is not
> the jobs list, and update the appropriate sections to refer to one or the
> other.

I agree the distinction needs to be pointed out somewhere, but I think
it should be more prominent than the rationale for the wait utility.
The obvious place is XCU 2.12 (see below).

> Or make it clear everywhere that removing a job from the jobs list
> means removing its pid from the list of terminated asynchronous lists.

I think they should remain independent.

> This suggests that the concept of a job needs more detail, and now you need
> a definition of the current set of jobs the shell knows about to refer to
> from other sections.

The concept of a job is already getting major changes in my proposal
for bug 1254.

I think 2.12 Shell Execution Environment should have a separate entry
for the background jobs list, making it clear that it is not the same
thing as the known process IDs list (which already has an entry).
There are already cross-references to 2.12 where the known process IDs
are mentioned, so it would also be the natural place to reference for
the background jobs list.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: When can shells remove "known" process IDs from the list?

2022-05-07 Thread Steffen Nurpmeso via austin-group-l at The Open Group
chet.ra...@case.edu wrote in
 <88762e56-0276-f936-cf4c-d48c8ddc2...@case.edu>:
 |On 4/29/22 4:23 PM, Robert Elz via austin-group-l at The Open Group wrote:
 ...
 |>  true & X=$!
 ...
 |They're not jobs! A pid is a pid. It doesn't matter whether it's the pid of
 |the job's controlling process (or whatever we want to call it). The
 |Asynchronous Lists text says you have to be able to wait for it. This is
 |how bash works, too.
 |
 |This is what happens when you have a jobs list and a list of terminated
 |asynchronous lists that are `known in the current shell environment'.
 ...

A bit off-topic, but it would be nice if scripts would be given
a hand to signal children in a safe way.  I can wait(1) on a PID
maybe, but timeout(1) is not standardized (nor can it be already
i think, -- though maybe i should simply open an issue?), and so
there is no safe way to collect multiple PIDs while also being
able to kill(1) them when they exceed a time limit.  This can only
be done by means of synchronization of some stamp file or so, but
if i kill(1) a PID i do not know whether it was my PID or already
a reused PID that belongs to another program.  Yet the sh(1) does
know whether that PID is still our child or not.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: When can shells remove "known" process IDs from the list?

2022-05-06 Thread Chet Ramey via austin-group-l at The Open Group

On 5/5/22 7:46 AM, Geoff Clare via austin-group-l at The Open Group wrote:



The fact that the jobs command works with job control disabled is
mentioned in the rationale on the jobs page:

 The jobs utility is not dependent on the job control option, as
 are the seemingly related bg and fg utilities because jobs is
 useful for examining background jobs, regardless of the condition
 of job control. When the user has invoked a set +m command and job
 control has been turned off, jobs can still be used to examine the
 background jobs associated with that current session. Similarly,
 kill can then be used to kill background jobs with kill
 %.

so that's not an "issue".


If jobs and kill work, you should probably add wait to this description, or
add a separate paragraph to the wait rationale.






XBD 2.175 defines a job as

A set of processes, comprising a shell pipeline, and any
processes descended from it, that are all in the same process group.

Which says nothing very useful, and I am not sure is even correct.


Yes, I made the same point in a previous message.


The reason I think #2 should say "if job control is disabled" is
because the standard talks separately about the list of "process IDs
known in the shell environment" and the job list / job IDs. 


I think it needs to talk a little bit more clearly about the jobs list and
what constitutes a job, not to mention how and when one gets created.

Anyway, this also implies the existence of two separate lists.



Your testing above seems to be conflating the "known IDs" and the jobs
list. My reading of the standard is that entries in the jobs list only
need to be created when job control is enabled,


I'd be interested in your reasoning. The standard simply says that jobs
and kill (and wait should be added) work with job %X notation whether
or not job control is enabled. And in any event, that's not how shells
work.

I do agree that the current text implies two separate lists, and there's
insufficient explanation of how they interact. It certainly doesn't imply
that the `known IDs' stuff is only in effect when job control is not
enabled.


Independently of this, when
job control is disabled all of the requirements relating to "known IDs"
still apply and have nothing to do with %... job ID notation.


If you make that change. The known IDs description doesn't depend on job
control being enabled or disabled.




   | I think the description of the wait utility should be updated to require
   | removal from the list.

I would agree with that.


I wouldn't object.



If someone wants to implement it that way, I have no objection, but it
should not be required.   shells should at least be permitted to remove
jobs from the list of remembered stuff when their termination status has
been reported to the user - however that happens.


I agree.


OK. I'm pretty sure everyone already does this for the jobs list. Not sure
whether you want it to include the known IDs list.



That could be another valid choice, but I would prefer that all shells
wait for termination by default.


You might, but that's not the current state of the world.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: When can shells remove "known" process IDs from the list?

2022-05-06 Thread Chet Ramey via austin-group-l at The Open Group

On 5/3/22 6:52 AM, Geoff Clare via austin-group-l at The Open Group wrote:

Robert Elz wrote, on 30 Apr 2022:


   | However, today it threw a last curve ball when I was working on an
   | update to the description of set -b ...

How many shells actually implement that?


They all accept it as an option, but for some it seems to be a no-op.
That's one of the changes I was working on when I spotted this problem.


Bash implements it. I doubt very many people use it.




   | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
   | remain known until:
   |
   |  1. The command terminates and the application waits for the process ID.
   |
   |  2. Another asynchronous list is invoked before "$!" (corresponding to
   | the previous asynchronous list) is expanded in the current execution
   | environment.

Does anyone implement that bit (#2) at all?  In a non-interactive shell it
might almost be possible, but in an interactive shell, if the job isn't in
the list (whether $! has been referenced or not - usually it will not have
been) because it has been removed, what is the shell supposed to do if the
job stops?   Further, users (even in scripts) are allowed to use % %- %1
etc to refer to jobs, $! isn't the only way to reference one ("wait %2" should
work).   I'd suggest that #2 should simply be removed.


I think #2 should say "If job control is disabled, ...".


Why? You can use job control notation with jobs/kill/wait even if job
control isn't enabled, which implies the presence of a job list separate
from the list of known IDs.
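
A small sketch of that claim (not from the thread, and only expected to
behave this way in shells that keep a jobs list while job control is
off, which the shells discussed here appear to do): a script running
with set +m still reaches its background job through %1.

```sh
set +m                  # job control off, as is the default in scripts

( exit 3 ) &            # first background job of this shell, so %1
status=0
wait %1 || status=$?    # job ID notation rather than a process ID
echo "wait %1 returned $status"
```

The job is addressed without $! ever being expanded, which is the case
the "known IDs" wording alone does not cover.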



I think the description of the wait utility should be updated to require
removal from the list.


I agree, both the jobs list and the list of known IDs.

[...]



And last, also in this area, is the question of stopped jobs and the wait
command, and how those two are intended to interact.


The wording in my current draft makes clear that wait waits for
processes to terminate.  I could, if desired, add some rationale saying
that some implementations have, as an extension, an option that allows
wait to return when a process stops.


That's not the current behavior. At best, it should be unspecified.

Bash, yash, mksh, dash, the NetBSD sh, and gwsh allow the `wait' builtin to
wait for any process status change (e.g., SIGSTOP). ksh93, FreeBSD sh, and
zsh force the shell to wait until the process terminates. Bash provides an
option (`wait -f') to force a wait for process termination. I didn't check
whether other shells do.


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: When can shells remove "known" process IDs from the list?

2022-05-06 Thread Chet Ramey via austin-group-l at The Open Group

On 4/29/22 4:23 PM, Robert Elz via austin-group-l at The Open Group wrote:


   | You can test this by doing
   |
   |true &
   |
   |wait $!; echo $?
   |
   | This should print 0. Then do the same, except with the first command
   | changed to false &. That should print 1.

Yes, in the shells you mention it does, indicating that something different
is happening.   It is interesting that in bash you can do that wait over and
over again, and it keeps returning the 0 status (until one does a plain "wait"
command, even the "jobs" command doesn't remove it, though the standard
requires that it do so).   bash is the only shell that acts like that, whether
it is intentional or not I have no idea.


It's intentional, and has been in bash for a very long time.

As I said in another message, the jobs builtin not removing the pid from
the `remembered' list is probably an oversight. I'll fix it in posix mode
after bash-5.2 is released.
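
For readers who want to compare shells, a minimal sketch of the
repeated-wait test (not from the thread; variable names are made up).
The first wait should report the child's status.  What the second wait
reports is the behaviour being discussed: a shell that still remembers
the pid repeats the status, one that has forgotten it should return 127.

```sh
false &
pid=$!

first=0
wait "$pid" || first=$?     # status of the terminated child

second=0
wait "$pid" || second=$?    # same pid again: remembered, or unknown (127)

echo "first=$first second=$second"
```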



But try a different test

true & X=$!

(the assignment to X is just in case there is a shell which implements that
"no need to retain" stuff when $! is not referenced).

Then repeat that line over and over. (Consecutive lines).




zsh does something different, once a job has been reported as finished
at a prompt, it is removed from the jobs table, and you can no longer do
"wait %3" for it, but the pid and status seem to be remembered somewhere
else, and wait  gets the status from the job.   That seems odd to me,
it should be possible to use either form to wait on a job. 


They're not jobs! A pid is a pid. It doesn't matter whether it's the pid of
the job's controlling process (or whatever we want to call it). The
Asynchronous Lists text says you have to be able to wait for it. This is
how bash works, too.

This is what happens when you have a jobs list and a list of terminated
asynchronous lists that are `known in the current shell environment'.


bash is different again, it counts up the job numbers, like bosh and
yash, but as it reports each earlier one finished, removes it from the
jobs table, so the "jobs" command only ever shows (and then removes) the
last one started.   It still allows wait N to return the status, as many
times as you want to do that command, but not wait %n for any but the
most recently created one.


Right. The ascending job number depends on your policy for assigning new
job numbers, and you can only use job control notation to refer to entries
in the job list. But bash will let you wait for pid N as long as pid N is
in the list of terminated asynchronous processes.


The bigger issue is what do you do about users who can be connected to
their shell for weeks, running lots of background commands, and never
issuing a wait or jobs command?   Do you just keep remembering exit
status/pid pairs forever?   That doesn't sound sustainable to me.


Bash bounds the number remembered. It's at least CHILD_MAX, as POSIX
specifies, with an upper bound (right now, 32K -- very few sessions
start that many asynchronous jobs/processes). It checks for pid reuse: the
entry for pid N will always be the status of the most recent asynchronous
process with that pid. That might not be perfect, but it works fine in
practice.
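
A toy model of such a bounded table, written in plain sh purely for
illustration.  It is not bash's implementation; the names
MAX_REMEMBERED, remember and lookup are hypothetical and the bound of 3
is arbitrary.  It shows the two properties described above: a reused
pid replaces the older entry, and the oldest entry is dropped once the
bound is exceeded.

```sh
MAX_REMEMBERED=3     # hypothetical bound, tiny so eviction is easy to see
remembered=""        # space-separated "pid:status" entries, oldest first

remember() {         # usage: remember PID STATUS
    new=""
    count=0
    for entry in $remembered; do
        # Drop a stale entry if this pid has been reused.
        if [ "${entry%%:*}" = "$1" ]; then
            continue
        fi
        new="$new $entry"
        count=$((count + 1))
    done
    new="$new $1:$2"
    count=$((count + 1))
    # Evict the oldest entries until the bound is respected.
    while [ "$count" -gt "$MAX_REMEMBERED" ]; do
        new=${new# }         # strip a leading space, if any
        new=${new#* }        # drop the first (oldest) entry
        count=$((count - 1))
    done
    remembered=${new# }
}

lookup() {           # usage: lookup PID; prints the status, or returns 127
    for entry in $remembered; do
        if [ "${entry%%:*}" = "$1" ]; then
            echo "${entry#*:}"
            return 0
        fi
    done
    return 127
}
```

For example, after "remember 100 0; remember 101 1; remember 100 2",
lookup 100 prints 2, the status of the most recent process with that
pid.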

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: When can shells remove "known" process IDs from the list?

2022-05-06 Thread Chet Ramey via austin-group-l at The Open Group

On 4/29/22 2:38 PM, Robert Elz via austin-group-l at The Open Group wrote:


   | However, today it threw a last curve ball when I was working on an
   | update to the description of set -b ...

How many shells actually implement that?


Bash does. I doubt anyone uses it.



   | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
   | remain known until:
   |
   |  1. The command terminates and the application waits for the process ID.
   |
   |  2. Another asynchronous list is invoked before "$!" (corresponding to
   | the previous asynchronous list) is expanded in the current execution
   | environment.

Does anyone implement that bit (#2) at all? 


I think the FreeBSD shell does.


In a non-interactive shell it
might almost be possible, but in an interactive shell, if the job isn't in
the list (whether $! has been referenced or not - usually it will not have
been) because it has been removed, what is the shell supposed to do if the
job stops?   Further, users (even in scripts) are allowed to use % %- %1
etc to refer to jobs, $! isn't the only way to reference one ("wait %2" should
work).   I'd suggest that #2 should simply be removed.


I think the standard implies that the jobs list and the list of terminated
process IDs `known in the current environment' are different things. It's
not clear.
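
As a sketch of how a script stays clear of item #2 (not from the
thread; variable names are made up): expanding $! after each
asynchronous list, before the next one is started, keeps every process
ID "known", so each status can still be collected later.

```sh
( sleep 0; exit 2 ) &
pid1=$!                 # $! expanded before the next asynchronous list

( exit 5 ) &
pid2=$!

s1=0
wait "$pid1" || s1=$?
s2=0
wait "$pid2" || s2=$?
echo "pid1 status=$s1, pid2 status=$s2"
```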



But do note that the definition of the jobs command says:

When jobs reports the termination status of a job, the shell shall
remove its process ID from the list of those ``known in the current
shell execution environment''; see Section 2.9.3.1 (on page 2338).


This is one place where the two things overlap.
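
A sketch for checking that overlap in a given shell (not from the
thread; the one-second sleep is only there to let the child finish
first).  If jobs removes the process ID from the known list as the
quoted text requires, the later wait should return 127.  A shell that
keeps a separate record of the pid returns the child's real status
instead.

```sh
false &
pid=$!
sleep 1                 # give the child time to terminate
jobs                    # should report the job's termination status

status=0
wait "$pid" || status=$?
echo "wait after jobs returned $status"
```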



   | It also appears that dash still implements remove-before-prompting.

Does anyone not?


Lots of shells don't.



   | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
   | add a third list item (for interactive shells only) and deleting the
   | above quoted text from the wait page.

This is necessary, we would be making use of the shell too difficult for
interactive users otherwise. 


What does "too difficult" mean? The shells that don't do remove-before-
prompting seem to be doing just fine.



While you're considering all of this, you might want to also consider what
is intended to happen if a script does

trap '' CHLD

and how that is supposed to interact with maintenance of the jobs command,
the wait command, and all else related.


It should be explicitly stated to be unspecified behavior, since SIGCHLD is
necessary to make process handling work.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: When can shells remove "known" process IDs from the list?

2022-05-06 Thread Chet Ramey via austin-group-l at The Open Group

On 4/29/22 10:39 AM, Geoff Clare via austin-group-l at The Open Group wrote:

I'm responding to these messages in order; sorry if I cover ground that's
already been covered.


I've been gradually making progress on bug 1254 as a background task.
However, today it threw a last curve ball when I was working on an
update to the description of set -b ...

That description includes this near the end:

 When the shell notifies the user a job has been completed, it may
 remove the job's process ID from the list of those known in the
 current shell execution environment


It would be correct if it referred to the `jobs utility' ("...notifies the
user via the "jobs" utility that a job..."), which already specifies that,
or explicitly said that asynchronous notification behaves as if it used the
jobs builtin and inherited its behavior.

(Which, to be clear, the bash `jobs' builtin doesn't do. I think that was
an oversight, and I'll fix it, at least in posix mode, sometime after
bash-5.2 comes out.)

I don't think that sentence as written, in this context, describes how bash 
implements asynchronous notification.




This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
remain known until:

  1. The command terminates and the application waits for the process ID.

  2. Another asynchronous list is invoked before "$!" (corresponding to
 the previous asynchronous list) is expanded in the current execution
 environment.

Then there is the following in the APPLICATION USAGE for wait:

 Historical implementations of interactive shells have discarded
 the exit status of terminated background processes before each
 shell prompt. Therefore, the status of background processes was
 usually lost unless it terminated while wait was waiting for it.
 This could be a serious problem when a job that was expected to
 run for a long time actually terminated quickly with a syntax or
 initialization error because the exit status returned was usually
 zero if the requested process ID was not found. This volume of
 POSIX.1-202x requires the implementation to keep the status of
 terminated jobs available until the status is requested, so that
 scripts like:
 [...]
 work without losing status on any of the jobs.


I don't think any shell removes jobs from the jobs list without notifying
the user about their termination status, at least in interactive shells.
So this text must be trying to refer to interactive and non-interactive
shells and reconcile the different ways the user is notified of terminated
jobs, since, unlike historical shells, the POSIX shells have job control
and allow it to be used non-interactively?

(This also implies that there are separate lists of jobs and terminated
asynchronous lists.)


There would seem to be two options to resolve this:

A. Uphold the decision to disallow remove-before-prompting.  This
would mean removing the conflicting text from set -b and updating the
justification on the wait page to something that holds water.
(And dash would need to change in order to conform.)


Are we talking about interactive shells where job control is enabled,
disabled, or both? Or both interactive and non-interactive shells?

I guess I'm not clear about the goal of allowing remove-before-prompting
unless it means the jobs list? After the shell has informed the user
that job 1 has terminated, isn't it an error to try something like
`wait %1', since the job may have been removed from the jobs list?

Anyway, I agree with disallowing remove-before-prompting.

The standard already requires that the shell notify the user of status
changes in any background jobs before prompting, and requires (or allows,
at least) the shell to remove a terminated job about which the user has
been notified from the jobs table.

This plus the text in the Asynchronous Lists section implies to me, again,
that there are separate lists of jobs and terminated asynchronous lists,
and shells can remove entries from each one separately. That's the model
bash uses. But the set of jobs that the `jobs' builtin works from isn't
really defined anywhere, and I don't think the standard includes the
concept of a jobs list as such, even though we all know what it is. You
can probably synthesize it from different pieces, but I don't think it's
well defined.

So maybe you update the `wait' rationale to say something about how the
jobs stay in the jobs list until the user is notified of their termination,
say that the list of process IDs known in the current environment is not
the jobs list, and update the appropriate sections to refer to one or the
other. Or make it clear everywhere that removing a job from the jobs list
means removing its pid from the list of terminated asynchronous lists.

This suggests that the concept of a job needs more detail, and now you need
a definition of the current set of jobs the shell knows about to refer to
from other sections.

Chet
--
``The lyf 

Re: When can shells remove "known" process IDs from the list?

2022-05-05 Thread Geoff Clare via austin-group-l at The Open Group
[Robert intended to send the mail I'm replying to to the list, but it
was only sent to me. I've quoted it in full.]

Robert Elz  wrote, on 05 May 2022:
>
>   | > How many shells actually implement that?
>   |
>   | They all accept it as an option, but for some it seems to be a no-op.
> 
> Oh yes, sorry, that is what I meant - just parsing the option is
> trivial, it is actually causing the option to do something that I
> was wondering about.
> 
>   | I think #2 should say "If job control is disabled, ...".
> 
> That might be an option, but does it really matter?   That is, does any
> shell that anyone knows of actually make use of this allowance to avoid
> saving the exit status of anything?
> 
> If not, then we can just delete that, and not have to worry about the
> next issue - which is that, even with job control disabled, I think the
> jobs command, and  job type notation, all still works.

Even if no shell currently implements it, I don't see the point of
disallowing an optimisation that the standard currently allows.

The fact that the jobs command works with job control disabled is
mentioned in the rationale on the jobs page:

The jobs utility is not dependent on the job control option, as
are the seemingly related bg and fg utilities because jobs is
useful for examining background jobs, regardless of the condition
of job control. When the user has invoked a set +m command and job
control has been turned off, jobs can still be used to examine the
background jobs associated with that current session. Similarly,
kill can then be used to kill background jobs with kill
%.

so that's not an "issue".
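
The point in the quoted rationale, that jobs and kill with a job ID still
work when job control is off, can be tried in a script, since a
non-interactive shell normally runs without set -m. This is only a sketch:
the sleep duration and the variable name kill_status are made up for
illustration, and it assumes the background job is numbered 1 in a fresh
script.

```shell
#!/bin/sh
# Sketch only: a script normally runs with job control (set -m) off.
sleep 30 &                       # background job; assumed to be job 1 here
jobs                             # should still list the job with job control off
kill %1                          # job control job ID used without set -m
kill_status=0
wait %1 || kill_status=$?        # collect the status of the killed job
echo "status after kill: $kill_status"
```

A status above 128 indicates the job was terminated by a signal.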

> What enabling
> job control does is arrange to place jobs in process groups of their own,
> different from the shell (for background jobs) and to manage the process
> group of the controlling tty.
> 
> But it is not as clear as it might be what is supposed to work.
> 
> XBD 3.175 defines a job as
> 
>   A set of processes, comprising a shell pipeline, and any
>   processes descended from it, that are all in the same process group.
> 
> Which says nothing very useful, and I am not sure is even correct.

At this point you start going over old ground. A major part of the
work I've been doing on bug 1254 is because this definition is wrong.
I started a mailing list thread about it in July 2019, `Bug 1254 gets
worse: "Job" definition is wrong', which you contributed to.

At some point I will check whether I've accounted for the things you
say here, but for now I'm going to skip over most of it and concentrate
on the parts related to the question I asked.

> Eg: From one shell I can start another, and from that shell, start
> several jobs in different process groups.   To the initial shell all
> of this is one job - though attempts to send signals to it might
> reach anyone or no-one if using signals to process groups (the child
> shell will be changing its process group to match whichever of its
> children is the current foreground job, so there might be nothing in
> the pgrp that the initial shell thinks refers to that job).
> 
> For distinguishing jobs in a shell with job control turned off, it
> says nothing at all, as ignoring that previous issue (when the job, or
> some of its descendants change their pgrp) a shell pipeline is all one
> process group, with job control (-m) enabled or not.  What differs is
> whether other shell pipelines are in the same pgrp as the first one or
> not, and that definition doesn't touch on that.   So, it is reasonable to
> conclude that any shell pipeline is a job, for this definition.
> 
> Then 3.176 defines Job Control, there's nothing particularly relevant
> (nor incorrect) about what that says, it isn't useful for present purposes.
> 
> Then 3.177 defines Job Control Job ID
> 
>   A handle that is used to refer to a job. The job control job ID
>   can be any of the forms shown in the following table:
> 
> There's no need to reproduce the table here, it is just the various forms
> of % notation that are defined.
> 
> Note there's nothing in 3.175 or 3.177 about requiring job control to be
> enabled in order to get jobs, or job control job ID handles.
> 
> The jobs command definition just says:
> 
>   The jobs utility shall display the status of jobs that were started
>   in the current shell environment; see Section 2.12 (on page 2348).
> 
> The xref is just about what "current shell environment" means, and isn't
> relevant here.   Nothing in that about "when job control is enabled" or
> anything similar.
> 
> The kill command says:
> 
>   OPERANDS
> The following operands shall be supported:
> 
>   pid One of the following:
> 
> [ omitting #1, that's just a decimal pid, or negative of a pgrp id,
> and adds nothing useful here]
> 
>   2. A job control job ID (see XBD Section 3.177, on page 54)
>  that identifies a background process group to be signaled.
>   

Re: When can shells remove "known" process IDs from the list?

2022-05-03 Thread Geoff Clare via austin-group-l at The Open Group
Robert Elz wrote, on 30 Apr 2022:
>
>   | However, today it threw a last curve ball when I was working on an
>   | update to the description of set -b ...
> 
> How many shells actually implement that?

They all accept it as an option, but for some it seems to be a no-op.
That's one of the changes I was working on when I spotted this problem.

>   | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
>   | remain known until:
>   |
>   |  1. The command terminates and the application waits for the process ID.
>   |
>   |  2. Another asynchronous list is invoked before "$!" (corresponding to
>   | the previous asynchronous list) is expanded in the current execution
>   | environment.
> 
> Does anyone implement that bit (#2) at all?  In a non-interactive shell it
> might almost be possible, but in an interactive shell, if the job isn't in
> the list (whether $! has been referenced or not - usually it will not have
> been) because it has been removed, what is the shell supposed to do if the
> job stops?   Further, users (even in scripts) are allowed to use % %- %1
> etc to refer to jobs, $! isn't the only way to reference one ("wait %2" should
> work).   I'd suggest that #2 should simply be removed.

I think #2 should say "If job control is disabled, ...".

> But do note that the definition of the jobs command says:
> 
>   When jobs reports the termination status of a job, the shell shall
>   remove its process ID from the list of those ``known in the current
>   shell execution environment''; see Section 2.9.3.1 (on page 2338).
> 
> (quote from I8 Draft 2.1 -- but that text has been there forever, or seemingly).
> 

Good catch. That should be added to the numbered list in 2.9.3.1.

> So that's another way that an entry is removed, and this one is "shall remove"
> whereas "remain known until" puts a minimum on how long the job is supposed
> to remain known, but doesn't actually require removal.   For #2 that's obvious,
> shells aren't required to make that optimisation (that's some academic view of
> what was thought should be possible - but isn't in practice), but for #1 if
> the job isn't removed (when wait happens) then it could still be there, again,
> and again, forever

I think the description of the wait utility should be updated to require
removal from the list.

>   | My initial reaction to this was that the above quote from set -b is
>   | likely a left-over from before the decision to disallow the historical
>   | remove-before-prompting behaviour was made.
> 
> I doubt that -b is particularly relevant to this, other than that it provides
> an alternate time at which termination status of a process can be shown.
> 
>   | However, then I spotted that the text from wait, which seems to be an
>   | attempt to justify that decision, first says it was historical
>   | behaviour for *interactive* shells but then talks about the problems
>   | it could cause for *scripts*.  So it seems to me that the
>   | justification does not stand up to scrutiny.
> 
> The justification doesn't, but for scripts I don't recall there ever
> really being an issue - the removal happens when the status of jobs which
> have changed status is reported just before PS1 is written, and
> non-interactive shells (scripts) don't do that.
> 
> On the other hand, users of interactive shells are not in the habit of
> issuing wait commands (even jobs commands, without some reason to do so).

I have done it occasionally, when I have a bunch of background jobs
running and I don't care about their individual status, I just want
to be told when they've all finished:

command1 &
.
.
commandN &
wait; echo ALL DONE

However, in this particular scenario it wouldn't matter if command1, say,
had already finished and been removed from the list when I typed the wait
command.
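
As a concrete version of the pattern above, here is a minimal sketch with
sleep standing in for command1 ... commandN (the sleep commands and the
wait_status variable are placeholders, not part of the original example).
With no operands, wait waits for all known children and returns zero, so
it makes no difference if one of them has already finished.

```shell
#!/bin/sh
# Placeholder background commands standing in for command1 ... commandN.
sleep 1 &
sleep 1 &
sleep 1 &
wait                             # no operands: wait for every known child
wait_status=$?
echo "ALL DONE (wait returned $wait_status)"
```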

> They expect to be told when a background job has finished (without -b both
> working, and set, that might require causing new prompts to appear from time
> to time) and simply expect that when a job has been reported as done, it is
> done, and no longer exists.
> 
>   | It also appears that dash still implements remove-before-prompting.
> 
> Does anyone not?

Most shells do not.  Harald's reply has the details.

>   | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
>   | add a third list item (for interactive shells only) and deleting the
>   | above quoted text from the wait page.
> 
> This is necessary, we would be making use of the shell too difficult for
> interactive users otherwise.   But there is no particular need for an
> "interactive only" here, scripts can (though usually don't) use the jobs
> command as well (it is a convenient way to get rid of any jobs from the
> table that have finished, without knowing what they are, and without
> potentially hanging waiting for something still running).

The third item I'm referring to would just be for removal before
prompting, so obviously 

Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread Robert Elz via austin-group-l at The Open Group
Date:Fri, 29 Apr 2022 20:11:55 +0100
From:"Harald van Dijk via austin-group-l at The Open Group" 

Message-ID:  

  | >| It also appears that dash still implements remove-before-prompting.
  |
  | busybox ash and my shell do as well, but both are derived from dash and 
  | have merely retained dash's behaviour.

All ash derived shells work that way.

  | > Does anyone not?
  |
  | bash does not. bosh does not. ksh does not. mksh does not. posh does 
  | not. yash does not. zsh does not.

I did a test (not the same one you did) after I sent the mail, and saw
that bosh and yash don't.   For the other shells, it is not nearly as
clearcut what is happening.

  | You can test this by doing
  |
  |true &
  |
  |wait $!; echo $?
  |
  | This should print 0. Then do the same, except with the first command 
  | changed to false &. That should print 1.

Yes, in the shells you mention it does, indicating that something different
is happening.   It is interesting that in bash you can do that wait over and
over again, and it keeps returning the 0 status (until one does a plain "wait"
command, even the "jobs" command doesn't remove it, though the standard
requires that it do so).   bash is the only shell that acts like that, whether
it is intentional or not I have no idea.

But try a different test

true & X=$!

(the assignment to X is just in case there is a shell which implements that
"no need to retain" stuff when $! is not referenced).

Then repeat that line over and over. (Consecutive lines).

In ash derived shells (and pdksh) the first will report job 1 starting
(assuming you had none already running), the 2nd line will report job 2
starting, and before prompting for the 3rd, report job 1 has finished.
The third will be job 1 again, and report job 2 has finished, and that
continues over and over again.

This is all consistent with how we know that they work.
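
The job numbering and the "finished" reports described here only show up
at an interactive prompt. As a rough non-interactive counterpart, the
sketch below repeats the same "& X=$!" line in a loop and then waits for
each saved pid. It swaps in (exit N) for true so that each job has a
distinguishable status; that substitution and the names pids, statuses
and s are illustrative assumptions, not part of the test above.

```shell
#!/bin/sh
# Start three asynchronous lists, expanding $! after each one.
pids=
for n in 1 2 3; do
    (exit "$n") & X=$!
    pids="$pids $X"
done

# Wait for each saved pid and record the status it reports.
statuses=
for pid in $pids; do
    s=0
    wait "$pid" || s=$?
    statuses="$statuses $s"
done
echo "statuses:$statuses"
```

In a script there is no prompt, so no remove-before-prompting can occur,
and each wait is expected to return the status of its own job.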

In bosh and yash, the job number just keeps on climbing, even though they
report the previous job finished as each subsequent one is started.  That's
also consistent with how they operate.   A simple "wait N" for one of the
jobs removes that one from the list, then more true& commands add more jobs.
A simple "wait" clears up everything.   In yash "jobs" reports them all 
finished and clears everything, as it should.  In bosh "jobs" reports them all
finished, but clears nothing (the jobs command can be repeated over and over
and keeps reporting all the completed jobs).   That's clearly broken.

zsh does something different, once a job has been reported as finished
at a prompt, it is removed from the jobs table, and you can no longer do
"wait %3" for it, but the pid and status seem to be remembered somewhere
else, and wait <pid> gets the status from the job.   That seems odd to me,
it should be possible to use either form to wait on a job.   (I should note
that there is something odd about my zsh install - I tend to need to type
two newlines after a command to get it executed, both are seen by the shell.
Most of the time that's just mildly annoying, when I forget the 2nd, nothing
happens, and I have to wake up and remember that zsh is waiting for the 2nd
before it will do anything with the command - but in testing like this, where
the newlines generate prompts, and accompanying the prompt is an action
we care about, it kind of ruins the test.)

ksh93 is similar (without the double newline issue).

mksh is almost similar, but in it I saw
internal error: j_async: bad nzombie (161)
twice (once, then more testing, then again), which does not look good.
I don't know what the 161 represents, it was not the same each time, but
is not a pid of any of the jobs started.  A count?

In that one, with this sequence, there are only ever 2 jobs (as in job
numbers) assigned, as each is started, the previous one is reported finished,
and removed from the jobs table.  It is possible to wait %n for the job
number most recently started, but only that one (were the commands to run
for longer, then presumably it would be possible to wait on any not completed
and reported as completed).

bash is different again, it counts up the job numbers, like bosh and
yash, but as it reports each earlier one finished, removes it from the
jobs table, so the "jobs" command only ever shows (and then removes) the
last one started.   It still allows wait N to return the status, as many
times as you want to do that command, but not wait %n for any but the
most recently created one.

  | I consider the dash behaviour a bug, but do not want to 
  | fix it in a way that introduces another bug.

While removing jobs that have been reported (ie: removing them as
soon as possible) might reduce the risk of getting duplicate pids,
it doesn't actually solve the problem.   In particular, the removal
only happens in interactive shells (ones which prompt) so does nothing
at all for scripts, which have the same issue.   It can also happen in
an interactive 

Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread Harald van Dijk via austin-group-l at The Open Group

On 29/04/2022 19:38, Robert Elz via austin-group-l at The Open Group wrote:

 Date:Fri, 29 Apr 2022 15:39:23 +0100
 From:"Geoff Clare via austin-group-l at The Open Group" 

 Message-ID:  <20220429143923.GA22521@localhost>

   | It also appears that dash still implements remove-before-prompting.


pdksh does as well. However, pdksh is no longer maintained and the 
maintained shells that derive from pdksh have changed this.


busybox ash and my shell do as well, but both are derived from dash and 
have merely retained dash's behaviour.



Does anyone not?


bash does not. bosh does not. ksh does not. mksh does not. posh does 
not. yash does not. zsh does not.


You can test this by doing

  true &
  
  wait $!; echo $?

This should print 0. Then do the same, except with the first command 
changed to false &. That should print 1.
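
For reference, the two runs can be combined into one script as sketched
below. The variable names are made up for illustration. Note that this is
only the non-interactive form: the remove-before-prompting behaviour
depends on a prompt being issued between the two commands, so the
difference between shells is only visible when the lines are typed
interactively.

```shell
#!/bin/sh
# Non-interactive form of the test: no prompt occurs between & and wait.
true &
status_true=0
wait $! || status_true=$?

false &
status_false=0
wait $! || status_false=$?

echo "true: $status_true, false: $status_false"
```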


For the record, in my shell, a fork of dash, I have retained the dash 
behaviour for now because it is unclear to me what the expected 
behaviour is if a later background job gets the same PID as a previous 
background job. I consider the dash behaviour a bug, but do not want to 
fix it in a way that introduces another bug.


Cheers,
Harald van Dijk



Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread Robert Elz via austin-group-l at The Open Group
Date:Fri, 29 Apr 2022 15:39:23 +0100
From:"Geoff Clare via austin-group-l at The Open Group" 

Message-ID:  <20220429143923.GA22521@localhost>

Sorry, been too busy to participate here much recently, will catch up
someday soon (I hope).

  | However, today it threw a last curve ball when I was working on an
  | update to the description of set -b ...

How many shells actually implement that?

  | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
  | remain known until:
  |
  |  1. The command terminates and the application waits for the process ID.
  |
  |  2. Another asynchronous list is invoked before "$!" (corresponding to
  | the previous asynchronous list) is expanded in the current execution
  | environment.

Does anyone implement that bit (#2) at all?  In a non-interactive shell it
might almost be possible, but in an interactive shell, if the job isn't in
the list (whether $! has been referenced or not - usually it will not have
been) because it has been removed, what is the shell supposed to do if the
job stops?   Further, users (even in scripts) are allowed to use % %- %1
etc to refer to jobs, $! isn't the only way to reference one ("wait %2" should
work).   I'd suggest that #2 should simply be removed.

But do note that the definition of the jobs command says:

When jobs reports the termination status of a job, the shell shall
remove its process ID from the list of those ``known in the current
shell execution environment''; see Section 2.9.3.1 (on page 2338).

(quote from I8 Draft 2.1 -- but that text has been there forever, or seemingly).

So that's another way that an entry is removed, and this one is "shall remove"
whereas "remain known until" puts a minimum on how long the job is supposed
to remain known, but doesn't actually require removal.   For #2 that's obvious,
shells aren't required to make that optimisation (that's some academic view of
what was thought should be possible - but isn't in practice), but for #1 if
the job isn't removed (when wait happens) then it could still be there, again,
and again, forever - even if the system uses the same pid later (days, weeks,
months later perhaps) for another job started by the same shell -- against which
there is no protection of any kind currently, though a shell could do WNOWAIT
waits so zombies remain in the process table, even though the shell has 
already collected the exit status - but that's difficult to actually
code correctly, especially given the definition of how SIGCHLD works, which
as best I can tell has to be used as the only thing that would make it
even conceivable to use WNOWAIT.   Without that, when the shell acts like
I believe most, or all do, and cleans up zombies ASAP, just keeping the
job in its jobs table, marked terminated, with the status ready to give
back when requested, the kernel is free to assign the reclaimed pid to any
new process it likes, whenever it likes.

  | My initial reaction to this was that the above quote from set -b is
  | likely a left-over from before the decision to disallow the historical
  | remove-before-prompting behaviour was made.

I doubt that -b is particularly relevant to this, other than that it provides
an alternate time at which termination status of a process can be shown.

  | However, then I spotted that the text from wait, which seems to be an
  | attempt to justify that decision, first says it was historical
  | behaviour for *interactive* shells but then talks about the problems
  | it could cause for *scripts*.  So it seems to me that the
  | justification does not stand up to scrutiny.

The justification doesn't, but for scripts I don't recall there ever
really being an issue - the removal happens when the status of jobs which
have changed status is reported just before PS1 is written, and
non-interactive shells (scripts) don't do that.

On the other hand, users of interactive shells are not in the habit of
issuing wait commands (even jobs commands, without some reason to do so).
They expect to be told when a background job has finished (without -b both
working, and set, that might require causing new prompts to appear from time
to time) and simply expect that when a job has been reported as done, it is
done, and no longer exists.

  | It also appears that dash still implements remove-before-prompting.

Does anyone not?

  | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
  | add a third list item (for interactive shells only) and deleting the
  | above quoted text from the wait page.

This is necessary, we would be making use of the shell too difficult for
interactive users otherwise.   But there is no particular need for an
"interactive only" here, scripts can (though usually don't) use the jobs
command as well (it is a convenient way to get rid of any jobs from the
table that have finished, without knowing what they are, and without
potentially hanging waiting for something 

Re: When can shells remove "known" process IDs from the list?

2022-04-29 Thread shwaresyst via austin-group-l at The Open Group
It appears to me the set -b wording needs updating, to clarify "may remove the 
job's process ID" is intended to exclude the blocking circumstances listed, and 
since it's a "may", not "shall", whether those exclusions are handled properly 
now is more a quality of implementation than conformance issue.
 
 
On Fri, Apr 29, 2022 at 10:40 AM, Geoff Clare via austin-group-l at The Open
Group wrote:

I've been gradually making progress on bug 1254 as a background task.
However, today it threw a last curve ball when I was working on an
update to the description of set -b ...

That description includes this near the end:

    When the shell notifies the user a job has been completed, it may
    remove the job's process ID from the list of those known in the
    current shell execution environment

This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
remain known until:

 1. The command terminates and the application waits for the process ID.

 2. Another asynchronous list is invoked before "$!" (corresponding to
    the previous asynchronous list) is expanded in the current execution
    environment.

Then there is the following in the APPLICATION USAGE for wait:

    Historical implementations of interactive shells have discarded
    the exit status of terminated background processes before each
    shell prompt. Therefore, the status of background processes was
    usually lost unless it terminated while wait was waiting for it.
    This could be a serious problem when a job that was expected to
    run for a long time actually terminated quickly with a syntax or
    initialization error because the exit status returned was usually
    zero if the requested process ID was not found. This volume of
    POSIX.1-202x requires the implementation to keep the status of
    terminated jobs available until the status is requested, so that
    scripts like:
    [...]
    work without losing status on any of the jobs.

My initial reaction to this was that the above quote from set -b is
likely a left-over from before the decision to disallow the historical
remove-before-prompting behaviour was made.

However, then I spotted that the text from wait, which seems to be an
attempt to justify that decision, first says it was historical
behaviour for *interactive* shells but then talks about the problems
it could cause for *scripts*.  So it seems to me that the
justification does not stand up to scrutiny.

It also appears that dash still implements remove-before-prompting.

There would seem to be two options to resolve this:

A. Uphold the decision to disallow remove-before-prompting.  This
would mean removing the conflicting text from set -b and updating the
justification on the wait page to something that holds water.
(And dash would need to change in order to conform.)

B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
add a third list item (for interactive shells only) and deleting the
above quoted text from the wait page.

I'm particularly interested to get the opinions of shell authors on
this.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England

  


When can shells remove "known" process IDs from the list?

2022-04-29 Thread Geoff Clare via austin-group-l at The Open Group
I've been gradually making progress on bug 1254 as a background task.
However, today it threw a last curve ball when I was working on an
update to the description of set -b ...

That description includes this near the end:

When the shell notifies the user a job has been completed, it may
remove the job's process ID from the list of those known in the
current shell execution environment

This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
remain known until:

 1. The command terminates and the application waits for the process ID.

 2. Another asynchronous list is invoked before "$!" (corresponding to
the previous asynchronous list) is expanded in the current execution
environment.

Then there is the following in the APPLICATION USAGE for wait:

Historical implementations of interactive shells have discarded
the exit status of terminated background processes before each
shell prompt. Therefore, the status of background processes was
usually lost unless it terminated while wait was waiting for it.
This could be a serious problem when a job that was expected to
run for a long time actually terminated quickly with a syntax or
initialization error because the exit status returned was usually
zero if the requested process ID was not found. This volume of
POSIX.1-202x requires the implementation to keep the status of
terminated jobs available until the status is requested, so that
scripts like:
[...]
work without losing status on any of the jobs.

My initial reaction to this was that the above quote from set -b is
likely a left-over from before the decision to disallow the historical
remove-before-prompting behaviour was made.

However, then I spotted that the text from wait, which seems to be an
attempt to justify that decision, first says it was historical
behaviour for *interactive* shells but then talks about the problems
it could cause for *scripts*.  So it seems to me that the
justification does not stand up to scrutiny.

It also appears that dash still implements remove-before-prompting.

There would seem to be two options to resolve this:

A. Uphold the decision to disallow remove-before-prompting.  This
would mean removing the conflicting text from set -b and updating the
justification on the wait page to something that holds water.
(And dash would need to change in order to conform.)

B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
add a third list item (for interactive shells only) and deleting the
above quoted text from the wait page.

I'm particularly interested to get the opinions of shell authors on
this.

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England