Re: wait(2) and SIGCHLD

2020-08-16 Thread Mouse
>>> but isn't what's supposed to happen when a child's parent is
>>> ignoring SIGCHLD - the child should skip zombie state, and simply
>>> be cleaned up.
>> And how is "reparent to init" not an acceptable means of
>> implementing that?
> Acceptable or not, it would seem to not match our own documentation.

Point.  That manpage wording should be updated a little.

>> I thought I'd seen some code that rendered init immune to SIGKILL
>> and possibly SIGSTOP too [...]
> SIGSTOP is one of two signals that a process supposedly should not be
> able to intercept.  Of course, init is special enough that normal
> rules might not apply...

Yes, the code I was thinking of was inside the kernel, where of course
rules like that apply only insofar as the code chooses to let them.

>> Right, they shouldn't be.  But init shouldn't be stopped, either.

>> Similarly, I think it should be impossible to ptrace init, [...]

> How special does one really want init to be?

As special as it needs to be.

I'm not as confident now as I was when I wrote that that ptracing init
should be impossible.  I do think it should be possible to configure a
system such that it's impossible, and that that should be the default.
But, as someone who routinely goes under the hood, I think it could be
very useful to be able to set a system up so that it's possible.

As a data point: I booted a scratch system (4.0.1, because that's all I
have on the most convenient scratch hardware), and neither
"kill -STOP 1" nor "kill -KILL 1" had any effect visible to ps ax.  I
don't know where/how they're getting stopped, but they are.

Mouse


Re: wait(2) and SIGCHLD

2020-08-16 Thread Johnny Billquist

On 2020-08-16 21:17, Mouse wrote:

>>> They don't vanish, they get reparented to init(8) which then wakes
>>> up and reaps them.
>> That probably would work, approximately,
>
> Well, it does work, to at least a first approximation.
>
>> but isn't what's supposed to happen when a child's parent is ignoring
>> SIGCHLD - the child should skip zombie state, and simply be cleaned
>> up.
>
> And how is "reparent to init" not an acceptable means of implementing
> that?

Acceptable or not, it would seem to not match our own documentation.

From the sigaction() man-page:

  SA_NOCLDWAIT   If set, the system will not create a zombie when the
                 child exits, but the child process will be automatically
                 waited for.  The same effect can be achieved by setting
                 the signal handler for SIGCHLD to SIG_IGN.


>> The difference would be detectable if init were sent a SIGSTOP
>> (assuming that isn't one which would cause a system panic)
>
> I don't think it would panic, but I think that, if it really does stop
> init, it's a bug that it does so.  I thought I'd seen some code that
> rendered init immune to SIGKILL and possibly SIGSTOP too (maybe by
> forcing them into init's blocked-signals set? I forget).  But I can't
> seem to find it now.

SIGSTOP is one of two signals that a process supposedly should not be
able to intercept. Of course, init is special enough that normal rules
might not apply...

>> so it would stop reaping children (temporarily) - processes of the
>> type in question should not be showing up as zombies.
>
> Right, they shouldn't be.  But init shouldn't be stopped, either.
>
> Similarly, I think it should be impossible to ptrace init, and I have a
> fuzzy memory that it was on at least one system I tried it on.
>
> I'll be poking around a bit more.

How special does one really want init to be?

  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: SIGCHLD and sigaction()

2020-08-16 Thread Mouse
>> I don't understand what problem queued SIGCHLD was invented to
>> address.

> My impression is that it allows you to get notified of state changes
> of your child processes.  If one signal could announce several state
> changes, how would you know what these state changes are?

You'd call wait4(2) (or waitpid or wait3) with WNOHANG until it
returned 0 (or returned -1 with ECHILD), collecting one child status
change each time.  You'd still need to do more or less the same with
queued SIGCHLD; the only difference is it would let you skip the
WNOHANG and call it exactly once per signal.  (Unless a single child
changed state twice, such as by being stopped and then killed, in which
case the implementation had better queue wait stati as well or you'll
be calling wait too often!)
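
For illustration, a minimal C sketch of the drain loop described above. It uses waitpid() rather than wait4() (either would do here), and the names reap_available() and spawn_children() are hypothetical, made up for this example rather than taken from any system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Collect every child status that is available right now and return how
 * many were collected.  The loop ends when waitpid() returns 0 (children
 * exist, but none has a status to report) or -1 with ECHILD (no children
 * at all).
 */
static int reap_available(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {
            reaped++;           /* one status change collected */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted, just retry */
        break;                  /* 0, or -1 with ECHILD */
    }
    return reaped;
}

/* Hypothetical helper for trying the loop out: fork n children that
 * exit immediately.  A sketch, so fork failure simply aborts. */
static void spawn_children(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        assert(pid >= 0);
        if (pid == 0)
            _exit(0);
    }
}
```

Called from a SIGCHLD handler (or from the main loop after the handler sets a flag), reap_available() picks up every exited child no matter how many SIGCHLDs were merged into one delivery.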

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SIGCHLD and sigaction()

2020-08-16 Thread Edgar Fuß
> I don't understand what problem queued SIGCHLD was invented to address.
My impression is that it allows you to get notified of state changes of your
child processes. If one signal could announce several state changes, how
would you know what these state changes are?


Re: wait(2) and SIGCHLD

2020-08-16 Thread Mouse
>> They don't vanish, they get reparented to init(8) which then wakes
>> up and reaps them.
> That probably would work, approximately,

Well, it does work, to at least a first approximation.

> but isn't what's supposed to happen when a child's parent is ignoring
> SIGCHLD - the child should skip zombie state, and simply be cleaned
> up.

And how is "reparent to init" not an acceptable means of implementing
that?

> The difference would be detectable if init were sent a SIGSTOP
> (assuming that isn't one which would cause a system panic)

I don't think it would panic, but I think that, if it really does stop
init, it's a bug that it does so.  I thought I'd seen some code that
rendered init immune to SIGKILL and possibly SIGSTOP too (maybe by
forcing them into init's blocked-signals set? I forget).  But I can't
seem to find it now.

> so it would stop reaping children (temporarily) - processes of the
> type in question should not be showing up as zombies.

Right, they shouldn't be.  But init shouldn't be stopped, either.

Similarly, I think it should be impossible to ptrace init, and I have a
fuzzy memory that it was on at least one system I tried it on.

I'll be poking around a bit more.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: wait(2) and SIGCHLD

2020-08-16 Thread Christos Zoulas
In article <28808.1597602...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Sun, 16 Aug 2020 16:13:57 - (UTC)
>From:chris...@astron.com (Christos Zoulas)
>Message-ID:  
>
>  | They don't vanish, they get reparented to init(8) which then wakes up
>  | and reaps them.
>
>That probably would work, approximately, but isn't what's supposed to
>happen when a child's parent is ignoring SIGCHLD - the child should
>skip zombie state, and simply be cleaned up.
>
>The difference would be detectable if init were sent a SIGSTOP
>(assuming that isn't one which would cause a system panic)
>so it would stop reaping children (temporarily) - processes of
>the type in question should not be showing up as zombies.

FreeBSD does what we do (reparent to init). Linux has autoreap
which moves the state of the process to DEAD without going through
ZOMBIE and adds it to the dead queue.

christos



Re: wait(2) and SIGCHLD

2020-08-16 Thread Robert Elz
Date:Sun, 16 Aug 2020 16:13:57 - (UTC)
From:chris...@astron.com (Christos Zoulas)
Message-ID:  

  | They don't vanish, they get reparented to init(8) which then wakes up
  | and reaps them.

That probably would work, approximately, but isn't what's supposed to
happen when a child's parent is ignoring SIGCHLD - the child should
skip zombie state, and simply be cleaned up.

The difference would be detectable if init were sent a SIGSTOP
(assuming that isn't one which would cause a system panic)
so it would stop reaping children (temporarily) - processes of
the type in question should not be showing up as zombies.

kre



Re: wait(2) and SIGCHLD

2020-08-16 Thread Christos Zoulas
In article <5919.1597441...@jinx.noi.kre.to>,
Robert Elz   wrote:
>Date:Fri, 14 Aug 2020 20:01:18 +0200
>From:Edgar =?iso-8859-1?B?RnXf?= 
>Message-ID:  <20200814180117.gq61...@trav.math.uni-bonn.de>
>
>  | 3. I don't see where POSIX defines or allows this, but given 2., I'm surely
>  |missing something.
>
>It is specified to work this way in POSIX, though right now I don't
>have the time to go dig out exactly where.
>
>Setting SIGCHLD to SIG_IGN effectively means that you want to ignore
>your children - they then don't report any exit status to their parent,
>but simply vanish when they exit.   Thus when the parent does a wait()
>it has no children, and gets ECHILD.

They don't vanish, they get reparented to init(8) which then wakes up
and reaps them.
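
The part that is visible from userland can be checked with a short C sketch: with SIGCHLD set to SIG_IGN, wait() in the parent fails with ECHILD once the child is gone. The function name is hypothetical, and the sketch cannot tell whether the kernel reaped the child directly or handed it to init(8) first; it only shows what the parent sees.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ignore SIGCHLD, let one child exit, and call wait().  Returns the
 * errno that wait() failed with, or 0 if wait() unexpectedly returned
 * a child.  The previous SIGCHLD disposition is restored on the way out.
 */
static int wait_errno_with_sigchld_ignored(void)
{
    struct sigaction sa, old;
    int status, err;
    pid_t pid, r;

    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old);

    pid = fork();
    assert(pid >= 0);           /* sketch: abort on fork failure */
    if (pid == 0)
        _exit(7);               /* exit status that nobody will collect */

    /* wait() blocks until there are no children left, then fails. */
    errno = 0;
    r = wait(&status);
    err = (r == -1) ? errno : 0;

    sigaction(SIGCHLD, &old, NULL);
    return err;
}
```

On systems that implement the behaviour discussed here, the function should return ECHILD; the child's exit status (7 in this sketch) is never reported to the parent.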

>Leave (or set) SIGCHLD to SIG_DFL and you don't get signals, but child
>processes do report status to their parent.   Catch SIGCHLD and you'll
>get signalled when a child exits (I'm not sure if NetBSD guarantees one
>signal delivery for each exited child or just a signal if there are
>some unspecified number of exited children).
>
>The action on an ignored SIGCHLD is SysV inherited behaviour,
>Bell Labs (v7/32V) and CSRG BSD systems didn't act this way.

Yup, I added this:
1.199 (christos 30-Mar-05): #define P_CLDSIGIGN 0x0008 /* Process is ignoring SIGCHLD */

christos



Re: futexes

2020-08-16 Thread Jason Thorpe


> On Aug 16, 2020, at 5:58 AM, Robert Swindells  wrote:
> 
> 
> Taylor R Campbell  wrote:
>>> Date: Sat, 15 Aug 2020 19:59:24 +0100
>>> From: Robert Swindells 
>>> 
>>> Is anyone working on the proposed solution to kern/55230 ?
>> 
>> thorpej was working on it and has a patch -- I thought it got
>> committed, but I guess not?  There might have been some hard-to-fix
>> bug remaining in it but I forget the details.
> 
> Would it help to have more people testing and/or looking at it ?

The fix is non-trivial, and requires a fundamental change to how futexes work.

That said, I've made the change, and it fixes the unit test, and to exercise it 
more thoroughly, I also started converting several pthread locking interfaces 
to use futex beneath, while also enhancing the test cases for those pthread 
interfaces to verify proper priority ordering.  However, the changes started to 
grow without bound and then I ran into a time crunch.

What I should do is wind my code back to the basic "just fix the futex 
internals" change, and publish there so that others can work on it.  The main 
sticking point is that Linux Java was getting stuck ... all of my unit tests 
were passing, however.  This is why I started working on native users -- to 
provide more coverage and possibly make it easier to debug that problem.

-- thorpej



Re: futexes

2020-08-16 Thread Robert Swindells


Taylor R Campbell  wrote:
>> Date: Sat, 15 Aug 2020 19:59:24 +0100
>> From: Robert Swindells 
>> 
>> Is anyone working on the proposed solution to kern/55230 ?
>
>thorpej was working on it and has a patch -- I thought it got
>committed, but I guess not?  There might have been some hard-to-fix
>bug remaining in it but I forget the details.

Would it help to have more people testing and/or looking at it ?


Re: SIGCHLD and sigaction()

2020-08-16 Thread Mouse
>>> When I install a SIGCHLD handler via sigaction() using SA_SIGINFO,
>>> is it guaranteed that my handler is called (at least) once per
>>> death-of-a-child?
>> "Maybe."  It depends on how portable you want to be.
>> [...]
> While we're on this topic. Unix signals don't exactly work like
> hardware interrupts anyhow, I suspect, and it's a thing that has
> constantly befuddled me.

(Caveat: this is all just my understanding.  I think it's accurate, but
I welcome corrections.)

No, they don't.  To the extent that they do, they work like (latched)
edge-triggered interrupts.  There is no signal analog to a
level-triggered interrupt.  And, as you say, there are some important
differences even beyond that.

> As far as I can tell, there is a problematic race condition in the
> old signal mechanism, and that is the reason (I believe) why the new
> semantics were introduced.

I think you are right.  But it's not just "old" and "new".  There are
really old signal semantics, where the handler is uninstalled, I think
it is, when the signal is delivered, and it's up to the code to
reinstall it.  This is vulnerable to, if nothing else, the race where a
second signal arrives before the handler gets reinstalled, though
that's less important for SIGCHLD (see below).  (There may be an even
older form of signal handling, but if so I know nothing about it.)

This then led to (so-called) reliable signals, where the handler stays
installed (or at least can be set to do so) but signals can be blocked,
and get blocked (or, again, at least can be set to be) when the handler
is called.  However, there is still only a single pending bit per
signal.  I think these came in sometime in the 4BSD era, but I don't
recall details - indeed, I'm not sure I ever knew details.

Turning SIGCHLD into something queued, something more like bytes in a
pipe, is, in my perception, more recent yet.  I'm not sure who invented
them (but see below).

There are traces of the first form in modern NetBSD, in the form of the
SA_RESETHAND bit.  I don't *think* that gives full unreliable-signal
semantics, but I haven't checked in enough detail to be sure - I think
you'd also need an SA_DONTBLOCK flag or something of the sort, or the
handler would have to explicitly unblock the signal.

> You have two child processes.  One exits, and you get into your signal
> handler.  In there you then call wait to reap the child and process
> things.  You then call wait again, and repeat until there are no
> children left to reap, as you only get one signal, even if you get
> multiple children that exit.  When no more unreaped children exist,
> you exit the signal handler, and a new signal can be delivered.

Right.

> However, what happens if the second child exits between the call to
> wait, and the exit from the signal handler?  It would seem the signal
> would get lost, since we are in the process of handling the signal,
> and a new signal is not delivered during this time.

If you're using unreliable signals - the first sort I outlined above -
then yes, there is either this race, or, if you reinstall the handler
before you do your last wait call, a different race.  (Provided your
handler doesn't mind being called recursively between reinstall and
return, this may be tolerable.)

If you're using reliable signals but without queued SIGCHLD, this is
not a problem, because the second SIGCHLD (and any additional later
SIGCHLDs) will set the pending bit for SIGCHLD.  As soon as you return
from the handler (or explicitly unblock the signal), it will be
delivered.
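
A rough C sketch of that behaviour, under some assumptions: SIGCHLD is blocked with sigprocmask() to stand in for "still inside the handler", and waitid() with WNOWAIT is used only to make sure each child has really exited before the pending set is inspected. All function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t sigchld_deliveries;  /* handler invocations */
static int sigchld_was_pending;   /* was SIGCHLD pending before unblocking? */

static void count_sigchld(int sig)
{
    (void)sig;
    sigchld_deliveries++;
}

/*
 * Let n children exit while SIGCHLD is blocked, then unblock it and reap.
 * Returns the number of children reaped.
 */
static int exit_burst_while_blocked(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, pending;
    int reaped = 0;

    sa.sa_handler = count_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, &old_sa);

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    sigchld_deliveries = 0;

    for (int i = 0; i < n; i++) {
        siginfo_t info;
        pid_t pid = fork();

        assert(pid >= 0);       /* sketch: abort on fork failure */
        if (pid == 0)
            _exit(0);
        /* WNOWAIT: wait for the exit but leave the child reapable. */
        waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    }

    sigpending(&pending);
    sigchld_was_pending = sigismember(&pending, SIGCHLD);

    /* A pending SIGCHLD is delivered before this call returns. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    /* One delivery may stand for several exits, so drain in a loop. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;

    sigaction(SIGCHLD, &old_sa, NULL);
    return reaped;
}
```

With non-queued SIGCHLD I would expect sigchld_deliveries to end up at 1 however large n is; with queued SIGCHLD it may be larger. Either way the loop reaps all n children, which is the point being made above.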

> Now, have I misunderstood something about how non-queued signal
> handling works, or is/was there a problem there?

There was, but it was a problem with the oldest of the above three
kinds of signal.  Queued SIGCHLD was not necessary for it - I don't
understand what problem queued SIGCHLD was invented to address.  The
only thing I can think of was that it came from SVR4, which had a NIH
attitude towards BSD reliable signals, so they invented queued SIGCHLD
to get reliable child death handling, and then someone (POSIX maybe?)
decided it would be good to include all of both mechanisms.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: SIGCHLD and sigaction()

2020-08-16 Thread Johnny Billquist

On 2020-08-16 12:49, Johnny Billquist wrote:
> On 2020-08-15 22:46, Mouse wrote:
>>> When I install a SIGCHLD handler via sigaction() using SA_SIGINFO, is
>>> it guaranteed that my handler is called (at least) once per
>>> death-of-a-child?
>>
>> "Maybe."  It depends on how portable you want to be.
>>
>> Historically, "no": in some older systems, a second SIGCHLD delivered
>> when there's already one pending delivery gets lost, same as any other
>> signal.
>>
>> Then someone - POSIX? SVR4? I don't know - decided to invent a flavour
>> of signal that's more like writes to a pipe: multiple of them can be
>> pending at once.  Some systems decided this was sane and implemented
>> it.
>>
>> Personally, I don't like it; I think signals should be much like
>> hardware interrupts in that a second instance happening before the
>> first is serviced gets silently merged.
>
> While we're on this topic. Unix signals don't exactly work like hardware
> interrupts anyhow, I suspect, and it's a thing that has constantly
> befuddled me. As far as I can tell, there is a problematic race
> condition in the old signal mechanism, and that is the reason (I
> believe) why the new semantics were introduced.
>
> The problem goes like this:
>
> You have two child processes. One exits, and you get into your signal
> handler. In there you then call wait to reap the child and process
> things. You then call wait again, and repeat until there are no children
> left to reap, as you only get one signal, even if you get multiple
> children that exit. When no more unreaped children exist, you exit the
> signal handler, and a new signal can be delivered.
>
> However, what happens if the second child exits between the call to
> wait, and the exit from the signal handler? It would seem the signal
> would get lost, since we are in the process of handling the signal, and
> a new signal is not delivered during this time.
>
> In real hardware this usually doesn't happen, because the actual interrupt
> request can be reissued by the device while you are in the interrupt
> handler. There are some hardware interrupt designs, with edge-triggered
> interrupts, where similar problems can exist, and those you have to be
> very careful with how you handle them so you don't get to the same kind
> of race condition.
>
> Now, have I misunderstood something about how non-queued signal handling
> works, or is/was there a problem there?


Reading the current documentation, I would assume that at the call to 
the signal handler, the signal is blocked, and also removed from pending 
signals, so a new event would queue up a new signal to be delivered when 
returning from the signal handler. However, the text above is from 
trying to recall how it used to be going back in time, to when you had 
to re-install the signal handler after each activation. I can't seem to 
find documentation for how it worked back in the day. I can't even 
remember when/where I was reading up on that and thinking there might be 
a problem here, but it was a long time ago. So this is possibly just of 
historical interest.


By the way, I haven't seen any explicit mention of the pending signal 
being cleared at signal handler entry, so that is just my assumption 
right now. If that is wrong, then I would expect there is a race 
condition in there. Maybe someone else knows where that detail is 
documented?


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: SIGCHLD and sigaction()

2020-08-16 Thread Johnny Billquist

On 2020-08-15 22:46, Mouse wrote:
>> When I install a SIGCHLD handler via sigaction() using SA_SIGINFO, is
>> it guaranteed that my handler is called (at least) once per
>> death-of-a-child?
>
> "Maybe."  It depends on how portable you want to be.
>
> Historically, "no": in some older systems, a second SIGCHLD delivered
> when there's already one pending delivery gets lost, same as any other
> signal.
>
> Then someone - POSIX? SVR4? I don't know - decided to invent a flavour
> of signal that's more like writes to a pipe: multiple of them can be
> pending at once.  Some systems decided this was sane and implemented
> it.
>
> Personally, I don't like it; I think signals should be much like
> hardware interrupts in that a second instance happening before the
> first is serviced gets silently merged.

While we're on this topic. Unix signals don't exactly work like hardware
interrupts anyhow, I suspect, and it's a thing that has constantly
befuddled me. As far as I can tell, there is a problematic race
condition in the old signal mechanism, and that is the reason (I
believe) why the new semantics were introduced.

The problem goes like this:

You have two child processes. One exits, and you get into your signal
handler. In there you then call wait to reap the child and process
things. You then call wait again, and repeat until there are no children
left to reap, as you only get one signal, even if you get multiple
children that exit. When no more unreaped children exist, you exit the
signal handler, and a new signal can be delivered.

However, what happens if the second child exits between the call to
wait, and the exit from the signal handler? It would seem the signal
would get lost, since we are in the process of handling the signal, and
a new signal is not delivered during this time.

In real hardware this usually doesn't happen, because the actual interrupt
request can be reissued by the device while you are in the interrupt
handler. There are some hardware interrupt designs, with edge-triggered
interrupts, where similar problems can exist, and those you have to be
very careful with how you handle them so you don't get to the same kind
of race condition.

Now, have I misunderstood something about how non-queued signal handling
works, or is/was there a problem there?


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


re: pmap_activate() with non-curlwp?

2020-08-16 Thread matthew green
Jason Thorpe writes:
> From my reading of the code, it seems that there are no longer any
> circumstances where pmap_activate() will be called with non-curlwp, at
> least in MI code.
> 
> Is this a correct reading?

seems right, and only vax has one MD caller that appears to
not be curlwp but soon-to-be-curlwp.


.mrg.