Re: wait(2) and SIGCHLD
>>> but isn't what's supposed to happen when a child's parent is
>>> ignoring SIGCHLD - the child should skip zombie state, and simply
>>> be cleaned up.
>> And how is "reparent to init" not an acceptable means of
>> implementing that?
> Acceptable or not, it would seem to not match our own documentation.

Point.  That manpage wording should be updated a little.

>> I thought I'd seen some code that rendered init immune to SIGKILL
>> and possibly SIGSTOP too [...]
> SIGSTOP is one of two signals that a process supposedly should not be
> able to intercept.  Of course, init is special enough that normal
> rules might not apply...

Yes, the code I was thinking of was inside the kernel, where of course
rules like that apply only insofar as the code chooses to let them.

>> Right, they shouldn't be.  But init shouldn't be stopped, either.
>> Similarly, I think it should be impossible to ptrace init, [...]
> How special does one really want init to be?

As special as it needs to be.

I'm not as confident now as I was when I wrote that, that ptracing init
should be impossible.  I do think it should be possible to configure a
system such that it's impossible, and that that should be the default.
But, as someone who routinely goes under the hood, I think it could be
very useful to be able to set a system up so that it's possible.

As a data point: I booted a scratch system (4.0.1, because that's all I
have on the most convenient scratch hardware), and neither
"kill -STOP 1" nor "kill -KILL 1" had any effect visible to ps ax.  I
don't know where/how they're getting stopped, but they are.

Mouse
Re: wait(2) and SIGCHLD
On 2020-08-16 21:17, Mouse wrote:
>>> They don't vanish, they get reparented to init(8) which then wakes
>>> up and reaps them.
>> That probably would work, approximately,
>
> Well, it does work, to at least a first approximation.
>
>> but isn't what's supposed to happen when a child's parent is
>> ignoring SIGCHLD - the child should skip zombie state, and simply
>> be cleaned up.
>
> And how is "reparent to init" not an acceptable means of implementing
> that?

Acceptable or not, it would seem to not match our own documentation.
From the sigaction() man-page:

     SA_NOCLDWAIT    If set, the system will not create a zombie when
                     the child exits, but the child process will be
                     automatically waited for.  The same effect can be
                     achieved by setting the signal handler for SIGCHLD
                     to SIG_IGN.

>> The difference would be detectable if init were sent a SIGSTOP
>> (assuming that isn't one which would cause a system panic)
>
> I don't think it would panic, but I think that, if it really does stop
> init, it's a bug that it does so.  I thought I'd seen some code that
> rendered init immune to SIGKILL and possibly SIGSTOP too (maybe by
> forcing them into init's blocked-signals set?  I forget).  But I can't
> seem to find it now.

SIGSTOP is one of two signals that a process supposedly should not be
able to intercept.  Of course, init is special enough that normal rules
might not apply...

>> so it would stop reaping children (temporarily) - processes of the
>> type in question should not be showing up as zombies.
>
> Right, they shouldn't be.  But init shouldn't be stopped, either.
> Similarly, I think it should be impossible to ptrace init, and I have
> a fuzzy memory that it was on at least one system I tried it on.  I'll
> be poking around a bit more.

How special does one really want init to be?

  Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
Re: SIGCHLD and sigaction()
>> I don't understand what problem queued SIGCHLD was invented to
>> address.
> My impression is that it allows you to get notified of state changes
> of your child processes.  If one signal could announce several state
> changes, how would you know what these state changes are?

You'd call wait4(2) (or waitpid or wait3) with WNOHANG until it
returned 0 (or returned -1 with ECHILD), collecting one child status
change each time.

You'd still need to do more or less the same with queued SIGCHLD; the
only difference is it would let you skip the WNOHANG and call it
exactly once per signal.  (Unless a single child changed state twice,
such as by being stopped and then killed, in which case the
implementation had better queue wait stati as well or you'll be calling
wait too often!)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
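The WNOHANG loop described above might look roughly like this. It is a minimal sketch, assuming POSIX waitpid(); both function names are invented for illustration, and spawn_exiting_children() exists only so that there is something to reap.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork n children that exit immediately.
 * Returns the number of children actually started.
 */
int
spawn_exiting_children(int n)
{
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == -1)
            break;              /* could not fork any more */
        if (pid == 0)
            _exit(i);           /* child: exit immediately */
        started++;
    }
    return started;
}

/*
 * Collect every child status change that is available right now,
 * without blocking.  Returns the number of children collected.
 */
int
reap_available_children(void)
{
    int status;
    int count = 0;

    for (;;) {
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {
            count++;            /* one status change collected */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted: just retry */
        /*
         * pid == 0: children exist, but none has changed state.
         * pid == -1 with ECHILD: there are no children left.
         */
        break;
    }
    return count;
}
```

A SIGCHLD handler would call reap_available_children() (or inline the same loop) each time it runs, so a single delivered signal is enough to cover any number of children that exited in the meantime.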
Re: SIGCHLD and sigaction()
> I don't understand what problem queued SIGCHLD was invented to address.

My impression is that it allows you to get notified of state changes of
your child processes.  If one signal could announce several state
changes, how would you know what these state changes are?
Re: wait(2) and SIGCHLD
>> They don't vanish, they get reparented to init(8) which then wakes
>> up and reaps them.
> That probably would work, approximately,

Well, it does work, to at least a first approximation.

> but isn't what's supposed to happen when a child's parent is ignoring
> SIGCHLD - the child should skip zombie state, and simply be cleaned
> up.

And how is "reparent to init" not an acceptable means of implementing
that?

> The difference would be detectable if init were sent a SIGSTOP
> (assuming that isn't one which would cause a system panic)

I don't think it would panic, but I think that, if it really does stop
init, it's a bug that it does so.  I thought I'd seen some code that
rendered init immune to SIGKILL and possibly SIGSTOP too (maybe by
forcing them into init's blocked-signals set?  I forget).  But I can't
seem to find it now.

> so it would stop reaping children (temporarily) - processes of the
> type in question should not be showing up as zombies.

Right, they shouldn't be.  But init shouldn't be stopped, either.
Similarly, I think it should be impossible to ptrace init, and I have a
fuzzy memory that it was on at least one system I tried it on.  I'll be
poking around a bit more.

Mouse
Re: wait(2) and SIGCHLD
In article <28808.1597602...@jinx.noi.kre.to>,
Robert Elz wrote:
>Date: Sun, 16 Aug 2020 16:13:57 - (UTC)
>From: chris...@astron.com (Christos Zoulas)
>Message-ID:
>
>  | They don't vanish, they get reparented to init(8) which then wakes up
>  | and reaps them.
>
>That probably would work, approximately, but isn't what's supposed to
>happen when a child's parent is ignoring SIGCHLD - the child should
>skip zombie state, and simply be cleaned up.
>
>The difference would be detectable if init were sent a SIGSTOP
>(assuming that isn't one which would cause a system panic)
>so it would stop reaping children (temporarily) - processes of
>the type in question should not be showing up as zombies.

FreeBSD does what we do (reparent to init).  Linux has autoreap, which
moves the state of the process to DEAD without going through ZOMBIE and
adds it to the dead queue.

christos
Re: wait(2) and SIGCHLD
    Date:        Sun, 16 Aug 2020 16:13:57 - (UTC)
    From:        chris...@astron.com (Christos Zoulas)
    Message-ID:

  | They don't vanish, they get reparented to init(8) which then wakes up
  | and reaps them.

That probably would work, approximately, but isn't what's supposed to
happen when a child's parent is ignoring SIGCHLD - the child should
skip zombie state, and simply be cleaned up.

The difference would be detectable if init were sent a SIGSTOP
(assuming that isn't one which would cause a system panic)
so it would stop reaping children (temporarily) - processes of
the type in question should not be showing up as zombies.

kre
Re: wait(2) and SIGCHLD
In article <5919.1597441...@jinx.noi.kre.to>,
Robert Elz wrote:
>Date: Fri, 14 Aug 2020 20:01:18 +0200
>From: Edgar Fuß
>Message-ID: <20200814180117.gq61...@trav.math.uni-bonn.de>
>
>  | 3. I don't see where POSIX defines or allows this, but given 2., I'm surely
>  |    missing something.
>
>It is specified to work this way in POSIX, though right now I don't
>have the time to go dig out exactly where.
>
>Setting SIGCHLD to SIG_IGN effectively means that you want to ignore
>your children - they then don't report any exit status to their parent,
>but simply vanish when they exit.  Thus when the parent does a wait()
>it has no children, and gets ECHILD.

They don't vanish, they get reparented to init(8) which then wakes up
and reaps them.

>Leave (or set) SIGCHLD to SIG_DFL and you don't get signals, but child
>processes do report status to their parent.  Catch SIGCHLD and you'll
>get signalled when a child exits (I'm not sure if NetBSD guarantees one
>signal delivery for each exited child or just a signal if there are
>some unspecified number of exited children).
>
>The action on an ignored SIGCHLD is SysV inherited behaviour,
>Bell Labs (v7/32V) and CSRG BSD systems didn't act this way.

Yup, I added this:

1.199 (christos 30-Mar-05): #define P_CLDSIGIGN 0x0008 /* Process is ignoring SIGCHLD */

christos
Re: futexes
> On Aug 16, 2020, at 5:58 AM, Robert Swindells wrote:
>
>
> Taylor R Campbell wrote:
>>> Date: Sat, 15 Aug 2020 19:59:24 +0100
>>> From: Robert Swindells
>>>
>>> Is anyone working on the proposed solution to kern/55230 ?
>>
>> thorpej was working on it and has a patch -- I thought it got
>> committed, but I guess not?  There might have been some hard-to-fix
>> bug remaining in it but I forget the details.
>
> Would it help to have more people testing and/or looking at it ?

The fix is non-trivial, and requires a fundamental change to how
futexes work.

That said, I've made the change, and it fixes the unit test, and to
exercise it more thoroughly, I also started converting several pthread
locking interfaces to use futex beneath, while also enhancing the test
cases for those pthread interfaces to verify proper priority ordering.

However, the changes started to grow without bound and then I ran into
a time crunch.  What I should do is wind my code back to the basic
"just fix the futex internals" change, and publish that so that others
can work on it.

The main sticking point is that Linux Java was getting stuck ... all of
my unit tests were passing, however.  This is why I started working on
native users -- to provide more coverage and possibly make it easier to
debug that problem.

-- thorpej
Re: futexes
Taylor R Campbell wrote:
>> Date: Sat, 15 Aug 2020 19:59:24 +0100
>> From: Robert Swindells
>>
>> Is anyone working on the proposed solution to kern/55230 ?
>
>thorpej was working on it and has a patch -- I thought it got
>committed, but I guess not?  There might have been some hard-to-fix
>bug remaining in it but I forget the details.

Would it help to have more people testing and/or looking at it ?
Re: SIGCHLD and sigaction()
>>> When I install a SIGCHLD handler via sigaction() using SA_SIGINFO,
>>> is it guaranteed that my handler is called (at least) once per
>>> death-of-a-child?
>> "Maybe."  It depends on how portable you want to be.
>> [...]
> While we're on this topic: Unix signals don't exactly work like
> hardware interrupts anyhow, I suspect, and it's a thing that has
> constantly befuddled me.

(Caveat: this is all just my understanding.  I think it's accurate, but
I welcome corrections.)

No, they don't.  To the extent that they do, they work like (latched)
edge-triggered interrupts.  There is no signal analog to a
level-triggered interrupt.  And, as you say, there are some important
differences even beyond that.

> As far as I can tell, there is a problematic race condition in the
> old signal mechanism, and that is the reason (I believe) why the new
> semantics were introduced.

I think you are right.

But it's not just "old" and "new".

There are really old signal semantics, where the handler is
uninstalled, I think it is, when the signal is delivered, and it's up
to the code to reinstall it.  This is vulnerable to, if nothing else,
the race where a second signal arrives before the handler gets
reinstalled, though that's less important for SIGCHLD (see below).
(There may be an even older form of signal handling, but if so I know
nothing about it.)

This then led to (so-called) reliable signals, where the handler stays
installed (or at least can be set to do so) but signals can be blocked,
and get blocked (or, again, at least can be set to be) when the handler
is called.  However, there is still only a single pending bit per
signal.  I think these came in sometime in the 4BSD era, but I don't
recall details - indeed, I'm not sure I ever knew details.

Turning SIGCHLD into something queued, something more like bytes in a
pipe, is, in my perception, more recent yet.  I'm not sure who invented
it (but see below).

There are traces of the first form in modern NetBSD, in the form of the
SA_RESETHAND bit.  I don't *think* that gives full unreliable-signal
semantics, but I haven't checked in enough detail to be sure - I think
you'd also need an SA_DONTBLOCK flag or something of the sort, or the
handler would have to explicitly unblock the signal.

> You have two child processes.  One exits, and you get into your
> signal handler.  In there you then call wait to reap the child and
> process things.  You then call wait again, and repeat until there are
> no children left to reap, as you only get one signal, even if you get
> multiple children that exit.  When no more unreaped children exist,
> you exit the signal handler, and a new signal can be delivered.

Right.

> However, what happens if the second child exits between the call to
> wait, and the exit from the signal handler?  It would seem the signal
> would get lost, since we are in the process of handling the signal,
> and a new signal is not delivered during this time.

If you're using unreliable signals - the first sort I outlined above -
then yes, there is either this race, or, if you reinstall the handler
before you do your last wait call, a different race.  (Provided your
handler doesn't mind being called recursively between reinstall and
return, this may be tolerable.)

If you're using reliable signals but without queued SIGCHLD, this is
not a problem, because the second SIGCHLD (and any additional later
SIGCHLDs) will set the pending bit for SIGCHLD.  As soon as you return
from the handler (or explicitly unblock the signal), it will be
delivered.

> Now, have I misunderstood something about how non-queued signal
> handling works, or is/was there a problem there?

There was, but it was a problem with the oldest of the above three
kinds of signal.  Queued SIGCHLD was not necessary for it - I don't
understand what problem queued SIGCHLD was invented to address.  The
only thing I can think of was that it came from SVR4, which had a NIH
attitude towards BSD reliable signals, so they invented queued SIGCHLD
to get reliable child death handling, and then someone (POSIX maybe?)
decided it would be good to include all of both mechanisms.

Mouse
Re: SIGCHLD and sigaction()
On 2020-08-16 12:49, Johnny Billquist wrote:
> On 2020-08-15 22:46, Mouse wrote:
>>> When I install a SIGCHLD handler via sigaction() using SA_SIGINFO,
>>> is it guaranteed that my handler is called (at least) once per
>>> death-of-a-child?
>>
>> "Maybe."  It depends on how portable you want to be.
>>
>> Historically, "no": in some older systems, a second SIGCHLD delivered
>> when there's already one pending delivery gets lost, same as any
>> other signal.
>>
>> Then someone - POSIX? SVR4? I don't know - decided to invent a
>> flavour of signal that's more like writes to a pipe: multiple of them
>> can be pending at once.  Some systems decided this was sane and
>> implemented it.  Personally, I don't like it; I think signals should
>> be much like hardware interrupts in that a second instance happening
>> before the first is serviced gets silently merged.
>
> While we're on this topic: Unix signals don't exactly work like
> hardware interrupts anyhow, I suspect, and it's a thing that has
> constantly befuddled me.
>
> As far as I can tell, there is a problematic race condition in the old
> signal mechanism, and that is the reason (I believe) why the new
> semantics were introduced.
>
> The problem goes like this:
>
> You have two child processes.  One exits, and you get into your signal
> handler.  In there you then call wait to reap the child and process
> things.  You then call wait again, and repeat until there are no
> children left to reap, as you only get one signal, even if you get
> multiple children that exit.  When no more unreaped children exist,
> you exit the signal handler, and a new signal can be delivered.
>
> However, what happens if the second child exits between the call to
> wait, and the exit from the signal handler?  It would seem the signal
> would get lost, since we are in the process of handling the signal,
> and a new signal is not delivered during this time.
>
> In real hardware this usually doesn't happen, because the actual
> interrupt request can be reissued by the device while you are in the
> interrupt handler.  There are some hardware interrupt designs, with
> edge-triggered interrupts, where similar problems can exist, and with
> those you have to be very careful how you handle them so you don't
> get the same kind of race condition.
>
> Now, have I misunderstood something about how non-queued signal
> handling works, or is/was there a problem there?

Reading the current documentation, I would assume that at the call to
the signal handler, the signal is blocked, and also removed from
pending signals, so a new event would queue up a new signal to be
delivered when returning from the signal handler.  However, the text
above is from trying to recall how it used to be going back in time, to
when you had to re-install the signal handler after each activation.  I
can't seem to find documentation for how it worked back in the day.  I
can't even remember when/where I was reading up on that and thinking
there might be a problem here, but it was a long time ago.  So this is
possibly just of historical interest.

By the way, I haven't seen any explicit mention of the pending signal
being cleared at signal handler entry, so that is just my assumption
right now.  If that is wrong, then I would expect there is a race
condition in there.  Maybe someone else knows where that detail is
documented?

  Johnny
Re: SIGCHLD and sigaction()
On 2020-08-15 22:46, Mouse wrote:
>> When I install a SIGCHLD handler via sigaction() using SA_SIGINFO,
>> is it guaranteed that my handler is called (at least) once per
>> death-of-a-child?
>
> "Maybe."  It depends on how portable you want to be.
>
> Historically, "no": in some older systems, a second SIGCHLD delivered
> when there's already one pending delivery gets lost, same as any
> other signal.
>
> Then someone - POSIX? SVR4? I don't know - decided to invent a
> flavour of signal that's more like writes to a pipe: multiple of them
> can be pending at once.  Some systems decided this was sane and
> implemented it.  Personally, I don't like it; I think signals should
> be much like hardware interrupts in that a second instance happening
> before the first is serviced gets silently merged.

While we're on this topic: Unix signals don't exactly work like
hardware interrupts anyhow, I suspect, and it's a thing that has
constantly befuddled me.

As far as I can tell, there is a problematic race condition in the old
signal mechanism, and that is the reason (I believe) why the new
semantics were introduced.

The problem goes like this:

You have two child processes.  One exits, and you get into your signal
handler.  In there you then call wait to reap the child and process
things.  You then call wait again, and repeat until there are no
children left to reap, as you only get one signal, even if you get
multiple children that exit.  When no more unreaped children exist, you
exit the signal handler, and a new signal can be delivered.

However, what happens if the second child exits between the call to
wait, and the exit from the signal handler?  It would seem the signal
would get lost, since we are in the process of handling the signal, and
a new signal is not delivered during this time.

In real hardware this usually doesn't happen, because the actual
interrupt request can be reissued by the device while you are in the
interrupt handler.  There are some hardware interrupt designs, with
edge-triggered interrupts, where similar problems can exist, and with
those you have to be very careful how you handle them so you don't get
the same kind of race condition.

Now, have I misunderstood something about how non-queued signal
handling works, or is/was there a problem there?

  Johnny
re: pmap_activate() with non-curlwp?
Jason Thorpe writes:
> From my reading of the code, it seems that there are no longer any
> circumstances where pmap_activate() will be called with non-curlwp, at
> least in MI code.
>
> Is this a correct reading?

seems right, and only vax has one MD caller that appears to not be
curlwp but soon-to-be-curlwp.

.mrg.