Re: "wait" loses signals
> Date:Mon, 24 Feb 2020 06:44:12 -0800 > From:"Daniel Colascione" > Message-ID: > > | That executing traps except in case you lose one rare race is > painfully > | obvious. > > Maybe you misunderstand the issue, no traps are lost, if they were > that would indeed be a bug, the trap will always be executed in the > cases in question, the only issue is when that happens. They're not executed before the wait as is supposed to happen though, so we can hang when we shouldn't. > | This opposition to doing more than the bare minimum that the standard > | requires makes this task all the much harder. > > I am not at all opposed to doing more than the standard requires, the > shell I maintain does more (not nearly as many addons as bash, but > considerably more than bash - and in some areas we're ahead, we already > have a wait command where it is possible to wait for any one of a set > of processes (or jobs) and be told which one completed, for example). > > I'm also not opposed to doing less when the standard is nonsense, which > it is in a couple of places. > > But "I want x" or "I think it should be y" aren't good enough reasons to > change something, and making the shell useful for (very primitive) IPC > isn't a good reason for making updates. Yes, it is, because people find this style of IPC useful today, and it's worthwhile to make this use reliable. > | Making people go elsewhere *on purpose* by refusing to fix bugs is not > | good software engineering. > > Of course. I don't see a bug. You can interpret any random bit of brokenness as a feature. Whether the behavior is a "bug" or not is irrelevant: bash _should_ be handling these traps as early as possible, because that simplifies the programming model without hurting anything else. > | We're talking about fixing an existing shell feature, not adding a new > one. > > OK, here's an alternative, I want the shell to be able to do arithmetic on > arbitrarily large (and small) numbers. 
> All that is needed to fix it is to link in the bignum library and use it
> (and extend the parser a little to handle real numbers).

This situation is more like bash supporting arbitrary-precision addition and giving the wrong answer when the number is prime. "Oh, we never promised support for _prime_ sums. It's not a bug. It's just a thing the shell doesn't do."

> | This moralistic outlook is not helpful. It doesn't *matter* whether a
> | program is right or wrong or making unjustified assumptions or not.
>
> That is unbelievable. That is all that matters. If the program is
> wrong, the program needs to be fixed, not the world altered so that the
> program suddenly works.

You want to increase the number of correct programs in the world. Sometimes the fix is to declare incorrect programs broken and have people fix them. Other times, in situations like this one, it's better to just change the infrastructure so that the program is correct.

> | Punishing programs does not make the world better.
>
> It does. The bad ones fail, and are replaced by better ones.

Computer security was even more of a horrible nightmare than it is today back when people had this attitude. "Why should we use stack hardening? If a program writes beyond the end of an array, that's a bug in the program." Nice sentiment. Doesn't work.
Re: "wait" loses signals
Date:Mon, 24 Feb 2020 06:44:12 -0800 From:"Daniel Colascione" Message-ID: | That executing traps except in case you lose one rare race is painfully | obvious. Maybe you misunderstand the issue, no traps are lost, if they were that would indeed be a bug, the trap will always be executed in the cases in question, the only issue is when that happens. | I refuse to let the standard cap the quality of a shell's implementation. So you should. No-one is suggesting that there is any reason that any shell cannot do this better, if the authors feel the cost trade off is worth the benefit. | Missing signals [...] Since this appears to be based upon a misunderstanding, I will ignore that. | A standard is a bare minimum. That's close enough to correct. | This opposition to doing more than the bare minimum that the standard | requires makes this task all the much harder. I am not at all opposed to doing more than the standard requires, the shell I maintain does more (not nearly as many addons as bash, but considerably more than bash - and in some areas we're ahead, we already have a wait command where it is possible to wait for any one of a set of processes (or jobs) and be told which one completed, for example). I'm also not opposed to doing less when the standard is nonsense, which it is in a couple of places. But "I want x" or "I think it should be y" aren't good enough reasons to change something, and making the shell useful for (very primitive) IPC isn't a good reason for making updates. | Making people go elsewhere *on purpose* by refusing to fix bugs is not | good software engineering. Of course. I don't see a bug. | We're talking about fixing an existing shell feature, not adding a new one. OK, here's an alternative, I want the shell to be able to do arithmetic on arbitrarily large (and small) numbers. All that is needed to fix it is to link in the bignum library and use it (and extend the parser a little to handle real numbers). 
Can I call it a bug that bash only does arithmetic on integers, and has a limit on their size (64 bits I believe), and demand that Chet fix it? Know that I am perfectly aware that the standard doesn't require what I want, but remember that is the bare minimum, we can do better (bash already does, 32 bits is all that is required, as I remember).

  | This moralistic outlook is not helpful. It doesn't *matter* whether a
  | program is right or wrong or making unjustified assumptions or not.

That is unbelievable. That is all that matters. If the program is wrong, the program needs to be fixed, not the world altered so that the program suddenly works.

  | Punishing programs does not make the world better.

It does. The bad ones fail, and are replaced by better ones.

kre
Re: "wait" loses signals
On 24/02/2020 08:59, Robert Elz wrote: har...@gigawatt.nl said: | In the same way, I think that except when overridden by 2.11, the "when" | in "Otherwise, the argument action shall be read and executed by the | shell when one of the corresponding conditions arises." should be | interpreted as "as soon as". The only way to do that literally would be to run the trap from the signal handler, as that is "as soon as" the condition arises. But I think we all know that is simply not possible. So let's read that as "as soon as possible after" instead. Sure. That's getting more reasonable, but someone needs to decide just what is possible - will running the trap handler mess up the shell's internal state while a new command is parsed and executed? Eg: what if we had VAR=$(grep -c some_string file*.c) and a (trapped) signal arrives while grep is running (more correctly, while the process running the command substitution, which runs grep, is running). We know we cannot interrupt the wait for that foreground process to run the trap handler, so we don't - but do we execute the trap handler before we assign the answer to VAR ? Although 2.11 that you referred to states "When a signal for which a trap has been set is received while the shell is waiting for the completion of a utility executing a foreground command", that is not what any shell implements. Instead, what shells implement is more like "while the shell is waiting for the completion of a foreground command". Consider for instance (sleep 5): the sleep command run in a subshell. The parent shell is not waiting for the completion of a utility executing a foreground command, the parent shell is waiting for the completion of the subshell, which is not a utility. Nevertheless, shells do not run any trap action until after the subshell has completed. This is just sloppy wording in the standard. 
It is probably written this way so that it is clear that given { foo; bar; }, if a signal is received while foo is running, any trap action runs before bar. The whole compound command shouldn't be considered the foreground command, only foo should be. In your example, I would expect the whole of VAR=$(...) to be considered the foreground command that the shell is waiting for, and that is what almost all shells do. A notable exception is zsh. This kind of thing is why shells in general only normally even look to see if there is a trap handler waiting to run after completing executing commands, not in the middle of one. The relevance of this is that if a signal arrives while the wait command is executing (or as Chet suggested, while doing whatever housekeeping is needed to prepare to run it, like looking to see what command comes next) but before the relevant wait*() system call is running, the trap won't be run until after the wait command completes. That's the way shells have always worked, and the way the standard (for that very reason) says can be relied upon by scripts - which is much of its purpose, to tell script writers what they can expect will work, and what will not necessarily work. You say "have always worked", but I'd like to point out that this whole thing started because I was looking at code that Herbert Xu had changed in dash to avoid this race back in 2009. That's over 10 years ago now. The behaviour of dash before that, and several shells now, can not, or at least not now, be said to be how shells have always worked. Cheers, Harald van Dijk
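The `{ foo; bar; }` case above can be tried out with a small script. This is only a sketch, assuming a POSIX-like sh in which `$$` is the shell running the script; the variable `order` and the helper subshell that sends USR1 are illustrative and not from the thread. A trapped signal arrives while the shell waits for the foreground `sleep` ("foo"), and the trap action is expected to run once that command finishes, before the next command in the group ("bar").

```shell
#!/bin/sh
# Illustrative only: 'order' records the sequence in which things ran.
order=""
trap 'order="$order trap"' USR1

# Hypothetical helper: signal this shell after 1 second, that is, while
# the foreground sleep below is still running.
( sleep 1; kill -USR1 "$$" ) &

{
    sleep 2                # "foo": the foreground command being waited for
    order="$order bar"     # "bar": runs after any pending trap action
}

wait                       # reap the helper
echo "order:$order"
```

In shells that defer the trap as described, the recorded order is "trap" followed by "bar", and the script still takes the full two seconds because the foreground wait is not interrupted.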
Re: "wait" loses signals
On 2/24/20 5:18 PM, Chet Ramey wrote:
> The first case is trickier: there's always going to be a window between
> the time the shell checks for pending traps and the time the wait builtin
> starts to run. You can't really close it unless you're willing to run the
> trap out of the signal handler, which everyone agrees is a bad idea, but
> you can squeeze it down to practically nothing.

dash uses something along these lines:

    sigfillset(&mask);
    sigprocmask(SIG_SETMASK, &mask, &mask);  /* block everything; mask now holds the old mask */
    while (!pending_sig)
        sigsuspend(&mask);                   /* atomically restore the old mask and sleep */
    sigprocmask(SIG_SETMASK, &mask, NULL);   /* back to the old mask */
    if (pending_sig)
        handle_signals(pending_sig);
    pid = waitpid(... WNOHANG);

It sleeps in sigsuspend(), not in waitpid(). This way we wait for both signals *and* children (by virtue of getting SIGCHLD for them).
Re: "wait" loses signals
On 2/24/20 7:58 AM, Daniel Colascione wrote:
> No, it's not that much trouble to fix the bug. The techniques for fixing
> this kind of signal race are well-known. In particular, instead of
> waitpid, you use a self-pipe and signal the pipe in the signal handler,
> and you have a signal handler for SIGCHLD.

You've just substituted a real IPC mechanism (pipes) for the one people are trying to make signals into.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
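At the script level, the contrast between signals and "a real IPC mechanism (pipes)" can be sketched with a FIFO. This is not the self-pipe technique Daniel describes for the shell's own C code; it is a hedged illustration, assuming `mktemp -d` and `mkfifo` are available, of a script receiving a notification through a pipe instead of a trap. The names `dir`, `fifo` and `msg` are made up for the example.

```shell
#!/bin/sh
# Illustrative only: notify the parent through a FIFO rather than a signal.
dir=$(mktemp -d) || exit 1
fifo="$dir/notify"
mkfifo "$fifo"

# Hypothetical worker: does some work, then reports through the FIFO.
( sleep 1; echo "done" > "$fifo" ) &

# Blocks until the worker writes. There is no window in which the message
# can be missed: the writer itself blocks until a reader has opened the FIFO.
read -r msg < "$fifo"
echo "worker said: $msg"

wait                       # reap the worker
rm -rf "$dir"
```

Unlike a trapped signal, the message here is queued by the kernel, so it does not matter whether the reader or the writer gets there first.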
Re: "wait" loses signals
On 2/24/20 3:59 AM, Robert Elz wrote:
> The relevance of this is that if a signal arrives while the wait command
> is executing (or as Chet suggested, while doing whatever housekeeping is
> needed to prepare to run it, like looking to see what command comes next)
> but before the relevant wait*() system call is running, the trap won't
> be run until after the wait command completes.

There are two separate cases here: if the signal arrives before the wait command has begun executing (during `housekeeping') or if it arrives after the wait command has begun running but before it calls whatever system call it uses to wait for children.

The second case is relatively easy to solve; Jilles wrote a message detailing the alternatives. Bash uses the longjmp-out-of-the-trap-signal-handler mechanism. The trap handler only has to know that the wait builtin is running and that there's a valid saved environment to longjmp to.

The first case is trickier: there's always going to be a window between the time the shell checks for pending traps and the time the wait builtin starts to run. You can't really close it unless you're willing to run the trap out of the signal handler, which everyone agrees is a bad idea, but you can squeeze it down to practically nothing.

I think I've got a way to close that and make signals that arrive in that first case act as if they arrived `while the shell is waiting by means of the wait utility'. It's not much code and not disruptive. With that, bash runs the original test script (100,000 iterations) on RHEL7 and macOS without a `stray' sleep. It's in the git devel branch.

I'm going to defer the question of whether or not that's the `right' thing to do -- people have been trying to make signals into an IPC mechanism since Berkeley introduced `reliable signals'.

Can we all take a breath now?
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: "wait" loses signals
> Date:Mon, 24 Feb 2020 04:58:31 -0800 > From:"Daniel Colascione" > Message-ID: <07d1441d41280e6f9535048d6485.squir...@dancol.org> > > | That is a poor excuse for not fixing bugs. > > Only if they are bugs. That executing traps except in case you lose one rare race is painfully obvious. > > | Maybe you can torture the standards into confessing that this > | behavior isn't a bug. > > No torture required. Once again, the standard documents the way users > can expect shells to behave. I refuse to let the standard cap the quality of a shell's implementation. Missing signals this way is pure negative. It doesn't add to any capability or help any user. It can only make computing unreliable and hurt real users trying to automate things with shell. > That is what a standard is - a common set > of agreed operations A standard is a bare minimum. > attempt to get shells to change this way of > working, and if you can get a suitable set to agree, and implement > something > new, that more meets your needs, then perhaps that might one day become > the > standard, This opposition to doing more than the bare minimum that the standard requires makes this task all the much harder. > | This behavior nevertheless surprises people > > Lots of things surprise people. Sometimes people deserve to be surprised. This isn't one of those times. > | and nevertheless precludes various things > | people want to do with a shell. > > That was my point, that you just labelled a poor excuse. Not everything > is suitable for implementation in sh. Sometimes you simply have to go > elsewhere. Making people go elsewhere *on purpose* by refusing to fix bugs is not good software engineering. > Wanting to do it in shell doesn't make it reasonable or > possible. It is reasonable and possible. All that's needed is to make an existing operation that's almost perfectly reliable in fact perfectly reliable, and as I've mentioned, it's not that hard. 
> I want the shell to feed my dog, where is the dogfood option?

We're talking about fixing an existing shell feature, not adding a new one.

> | Don't you think it's better that programs
> | work reliably than that they don't?
>
> Yes, when they are written correctly.

By fixing this bug, we make a class of programs correct automatically.

> | Of course something working intuitively 99.9% of the time and
> | hanging 0.1% of the time is a bug.
>
> Nonsense. An alternative explanation is that your intuition is wrong,
> and that it often works that way is just by chance.

We're talking about a documented feature that users expect to work a certain way and that almost always *does* work that way and that diverges from this behavior only under rare circumstances. Not the same as spacebar heating.

> The program is
> broken because it is making unjustified assumptions about how things are
> specified to work.

This moralistic outlook is not helpful. It doesn't *matter* whether a program is right or wrong or making unjustified assumptions or not. Punishing programs does not make the world better. When a piece of infrastructure can transform these programs from incorrect to correct at next to zero cost, it behooves that infrastructure component to do that.

> This is the kind of common error that people who
> program (in any language) by guesswork often make "I saw Fred did this,
> and I tried it, and it worked for me like I thought it would, so it
> must do this similar thing like I think it will too". Rubbish.

Ever hear of the "pit of success" [1]? It's the idea that software gets better when you make the intuitive thing happen to be the correct thing. Why should we require a degree of cleverness greater than what a domain requires? Why shouldn't we, to the greatest extent possible, let "I saw Fred do this" lead people to good patterns? Like I said before, making things difficult on purpose doesn't actually achieve anything.
[1] https://docs.microsoft.com/en-us/archive/blogs/brada/the-pit-of-success > > | I've never understood the philosophy of people who want to leave > | bugs unfixed. > > Nor have I, except sometimes perhaps when it comes to costs. But the > issue here is whether this is a bug. Your belief that it is does not make > it so. Your belief that this behavior is acceptable doesn't make it so --- except under a pointlessly literal interpretation of the standards. > | No, it's not that much trouble to fix the bug. > > It isn't, if it needs fixing - but any fix for this will slow the shell > (for what that matters, but some people care). Further there are simpler > cheaper techniques than the one described. The fix for this issue will not meaningfully affect the speed of the shell. Instead of waiting on waitpid directly, we wait on a pipe. Plenty of programs do this already. Micro-optimizing for system call count will hardly slow the shell: other factors matter a lot
Re: "wait" loses signals
    Date: Mon, 24 Feb 2020 04:58:31 -0800
    From: "Daniel Colascione"
    Message-ID: <07d1441d41280e6f9535048d6485.squir...@dancol.org>

  | That is a poor excuse for not fixing bugs.

Only if they are bugs.

  | Maybe you can torture the standards into confessing that this
  | behavior isn't a bug.

No torture required. Once again, the standard documents the way users can expect shells to behave. That is what a standard is - a common set of agreed operations (or whatever is appropriate for the object being standardised). It does not (or should not) ever invent new stuff and require it. Shells have always worked this way, so that is how the standard is written - that is what users can expect to happen (that is why it is called a "standard" after all).

Once again, you are free to attempt to get shells to change this way of working, and if you can get a suitable set to agree, and implement something new, that more meets your needs, then perhaps that might one day become the standard, and later appear in the standards document. New and/or changed features do happen, especially when they don't break backwards compatibility, which this wouldn't.

  | This behavior nevertheless surprises people

Lots of things surprise people.

  | and nevertheless precludes various things
  | people want to do with a shell.

That was my point, that you just labelled a poor excuse. Not everything is suitable for implementation in sh. Sometimes you simply have to go elsewhere. Wanting to do it in shell doesn't make it reasonable or possible. I want the shell to feed my dog, where is the dogfood option?

  | Don't you think it's better that programs
  | work reliably than that they don't?

Yes, when they are written correctly.

  | Of course something working intuitively 99.9% of the time and
  | hanging 0.1% of the time is a bug.

Nonsense. An alternative explanation is that your intuition is wrong, and that it often works that way is just by chance.
The program is broken because it is making unjustified assumptions about how things are specified to work. This is the kind of common error that people who program (in any language) by guesswork often make "I saw Fred did this, and I tried it, and it worked for me like I thought it would, so it must do this similar thing like I think it will too". Rubbish. | I've never understood the philosophy of people who want to leave | bugs unfixed. Nor have I, except sometimes perhaps when it comes to costs. But the issue here is whether this is a bug. Your belief that it is does not make it so. | No, it's not that much trouble to fix the bug. It isn't, if it needs fixing - but any fix for this will slow the shell (for what that matters, but some people care). Further there are simpler cheaper techniques than the one described. | If we had a pwaitpid (like pselect) we could use that too. Yes, if. If that existed a fix would be almost cost free. If. I suspect that before you can get bash (note: I am no authority and have no voice in these decisions, I work on a different shell) to make use of something like that it would need to be implemented in quite a lot of systems, including the commercial ones, which tend to be very conservative about adding new features for fun. kre
Re: "wait" loses signals
> There are lots of programming languages around, they each have their
> particular niche - the reason their inventors created them in the first
> place. Use an appropriate one, rather than attempting to shoehorn some
> feature that is needed into a language that was never intended for it -
> just because you happen to be a big fan of that language. Spread your
> wings, learn a new one

That is a poor excuse for not fixing bugs. Maybe you can torture the standards into confessing that this behavior isn't a bug. This behavior nevertheless surprises people and nevertheless precludes various things people want to do with a shell. Don't you think it's better that programs work reliably than that they don't?

Of course something working intuitively 99.9% of the time and hanging 0.1% of the time is a bug. It's not appropriate to treat that 0.1% hang as some kind of cosmic punishment for using shell in a manner you find inappropriate: remember when we believed in mechanism, not policy? Nor is the presence of the bug in other shells adequate justification for leaving this one in a bad state. I've never understood the philosophy of people who want to leave bugs unfixed.

No, it's not that much trouble to fix the bug. The techniques for fixing this kind of signal race are well-known. In particular, instead of waitpid, you use a self-pipe and signal the pipe in the signal handler, and you have a signal handler for SIGCHLD. If we had a pwaitpid (like pselect) we could use that too. If I could get Chet to look at my patches, I'd fix it myself.
Re: "wait" loses signals
    Date: Mon, 24 Feb 2020 11:50:55 +0100
    From: Denys Vlasenko
    Message-ID: <47762f41-e393-30cd-50ed-43c6bdd29...@redhat.com>

  | This is racy. Even if you try to code it as tightly as possible:

Absolutely, I agree. The question is more whether it really matters.

  | Standard does not say that. It says "when the shell is waiting for an
  | asynchronous command to complete", it does not say "when the shell is
  | waiting in a waitpid() syscall".

That's because the standard has no notion of "system calls", just functions, but the shell is not actually waiting (it is doing something else) until the system call causes it to pause if the desired (or any) child is not ready for reaping.

  | Yes, you are right, you can argue that shell is minimally fulfilling
  | standard's requirement if it does something like my code example.

It doesn't even need to do that. As I said, the standard's primary purpose is to advise script writers what they can depend upon the shell providing. And a wait utility that is race free wrt traps is not one of those things. That's because what scripts can rely upon is based upon what shells implement (or implemented at the time - with some more recent additions for some more modern functionality that has been widely adopted). Even now, as was demonstrated, most shells have this "issue" - hence the standard simply cannot tell users that they can rely on something else.

Any attempt to read it otherwise than that is simply wrong, and obviously so (though sometimes it is possible to argue that the wording used does not express the intent obviously enough - or occasionally - at all, but when that happens, all you will ever get as the best possible result is corrected wording that says what it intended to say in the first place).

The standard also serves to advise shell authors what they need to do to provide a shell which should run all conformant shell applications, but it would be grossly unfair (and improper) to require of new shells something that old ones didn't do.
But that side of it is less relevant to this discussion, except that it doesn't tell shell authors to make sure there are no race conditions wrt traps in the wait utility (it would do that in quite different language than this, but that would be the point, if it were there). | I am arguing that it can be made better: That part is arguable | it can be coded so that signal has no time window to arrive before | waitpid() but have its trap delayed to after "wait" builtin ends | (which might be "never", mind you). It can be so coded, but when done (correctly, and assuming a trapped signal has arrived) it won't be never, the signal will interrupt the sys call that actually pauses (which will most likely not be wait*() in this case, but that's irrelevant) and the wait would correctly exit. A few shells have done that. The question is whether it is worth going to that extra effort - or in other words, is it really better. As best I can tell, it only really matters to shell scripts attempting to use signals/traps as an IPC mechanism, and that I simply don't believe they should be doing - programs that need that kind of functionality should be written in a language that provides more suitable mechanisms (and usually not only for simple one bit message passing that a signal offers). There are lots of programming languages around, they each have their particular niche - the reason their inventors created them in the first place. Use an appropriate one, rather than attempting to shoehorn some feature that is needed into a language that was never intended for it - just because you happen to be a big fan of that language. Spread your wings, learn a new one - the hard part about any programming isn't the programming language, it is getting the desired concepts and structures straight - do that and any competent programmer can make a working program in any suitable language (ie: not expecting anyone to write an operating system in COBOL) fairly quickly. 
They'll make it better after they get used to the idioms of the language, but providing the method needed to solve the problem is known first (that's usually the hard part, for anything non trivial) the actual coding into a working, if not necessarily ideal, form is simple. kre
Re: "wait" loses signals
On 2/24/20 9:59 AM, Robert Elz wrote:
> And that is, when the wait/waitpid/wait3/wait4/waitid/wait6 (whatever the
> shell uses) system call returns EINTR, the wait utility exited with a
> status indicating it was interrupted by that signal (status > 128 means
> 128+SIGno) and runs the trap.

This is racy, even if you try to code it as tightly as possible:

    if (got_sigs) { handle signals }
    got_sigs = 0;
    pid = waitpid(...);  /* without WNOHANG */
    if (pid < 0 && errno == EINTR) { handle signals }

Signals can be delivered not only while the waitpid() syscall is in the kernel, but also when we are only about to enter the kernel. In that case the shell's signal handler will set the flag variable, but then we enter the kernel *and sleep*.

> Because that is what shells actually did - the alternative being to simply
> restart the wait on EINTR like many other system calls that are interrupted
> by signals are conventionally restarted. Like it or not, that's what
> shells did, what most still do, and what the standard says must be done.

Standard does not say that. It says "when the shell is waiting for an asynchronous command to complete", it does not say "when the shell is waiting in a waitpid() syscall".

Yes, you are right, you can argue that shell is minimally fulfilling standard's requirement if it does something like my code example.

I am arguing that it can be made better: it can be coded so that signal has no time window to arrive before waitpid() but have its trap delayed to after "wait" builtin ends (which might be "never", mind you).
Re: "wait" loses signals
Date:Fri, 21 Feb 2020 10:07:25 -0500 From:Chet Ramey Message-ID: | That's just not reasonable. You're saying signals that are received before | the wait builtin begins executing (say, while the command is being parsed, | or the shell is doing some other bookkeeping task) should be considered | to have arrived while the wait builtin is executing. I'm pretty sure that's | not consistent with the letter or the spirit of the standard. It quite clearly isn't consistent, what the standard says is: When the shell is waiting, by means of the wait utility, for asynchronous commands to complete, the reception of a signal for which a trap has been set shall cause the wait utility to return immediately with an exit status >128, immediately after which the trap associated with that signal shall be taken. Note: "when the shell us waiting for an asynchronous command to complete" (when that happens as a result of the user/script executing the wait utility) then ... What Denys is failing to realise, is that the standard describes what shells do (or more accurately perhaps, did, in the late 1980's or early 1990's) not what someone might want them to do. And that is, when the wait/waitpid/wait3/wait4/waitid/wait6 (whatever the shell uses) system call returns EINTR, the wait utility exited with a status indicating it was interrupted by that signal (status > 128 means 128+SIGno) and runs the trap. Because that is what shells actually did - the alternative being to simply restart the wait on EINTR like many other system calls that are interrupted by signals are conventionally restarted. Like it or not, that's what shells did, what most still do, and what the standard says must be done. Apart from that, and not interrupting a wait for a foreground process, the standard says very little about when traps should be run, and sorry Harald, but your "as soon as" from ... 
har...@gigawatt.nl said: | In the same way, I think that except when overridden by 2.11, the "when" | in "Otherwise, the argument action shall be read and executed by the | shell when one of the corresponding conditions arises." should be | interpreted as "as soon as". The only way to do that literally would be to run the trap from the signal handler, as that is "as soon as" the condition arises. But I think we all know that is simply not possible. So let's read that as "as soon as possible after" instead. That's getting more reasonable, but someone needs to decide just what is possible - will running the trap handler mess up the shell's internal state while a new command is parsed and executed? Eg: what if we had VAR=$(grep -c some_string file*.c) and a (trapped) signal arrives while grep is running (more correctly, while the process running the command substitution, which runs grep, is running). We know we cannot interrupt the wait for that foreground process to run the trap handler, so we don't - but do we execute the trap handler before we assign the answer to VAR ? This kind of thing is why shells in general only normally even look to see if there is a trap handler waiting to run after completing executing commands, not in the middle of one. The relevance of this is that if a signal arrives while the wait command is executing (or as Chet suggested, while doing whatever housekeeping is needed to prepare to run it, like looking to see what command comes next) but before the relevant wait*() system call is running, the trap won't be run until after the wait command completes. That's the way shells have always worked, and the way the standard (for that very reason) says can be relied upon by scripts - which is much of its purpose, to tell script writers what they can expect will work, and what will not necessarily work. 
Now the standard doesn't preclude a shell from looking for pending traps as frequently as it wants to, every second line of C code in the shell could be if (traps_pending) run_trap_handler(); But most shell authors (I believe) wouldn't consider that reasonable. The standard also doesn't preclude a shell from taking extra measures to push the arrival of a signal in the wait utility down to occur in the wait system call (or whatever replaces it). Old shells didn't do that, as there simply was no mechanism for that, and using SIGCHLD was always problematic because of its quite different implementation of different (now ancient) systems, hence we have what we have. The standard is not a legislature, and does not change the rules just because what is there doesn't look reasonable, or you don't like it. If you want things changed, convince the major shell maintainers that this race condition is something they should make their shell go slower to fix (because that's really all it takes on modern systems) and wait for them to comply. When most major shells (perhaps all major shells, and some of the others) have implemented what you want
Re: "wait" loses signals
On 2/21/20 4:07 PM, Chet Ramey wrote:
> On 2/21/20 9:44 AM, Denys Vlasenko wrote:
>>>> Yes, and here we are "after command", specifically after "{...} &" command.
>>>> Since we got a trapped signal, we must run its trap.
>>>
>>> Did you look at the scenario in my message?
>>
>> What scenario?
>
> The scenario in the message you replied to.
>
>> As I said, there are just two possibilities:
>> signal is received before the point when shell checks for received
>> signals after "{...} &" command;
>> or signal is received after that point, and thus signal is
>> considered to be received "inside wait builtin".
>
> That's just not reasonable.

Yes it is.

> You're saying signals that are received before
> the wait builtin begins executing (say, while the command is being parsed,
> or the shell is doing some other bookkeeping task) should be considered
> to have arrived while the wait builtin is executing.

OF COURSE!
How else do you think this can possibly be seen?

> I'm pretty sure that's
> not consistent with the letter or the spirit of the standard.

IOW, you think that between "command 1 finished executing"
and "command 2 starts executing" there can be sort of
signal black hole time period, where signals can be simply ignored.

Now *this* is just not reasonable, since this would make traps unreliable.
Re: "wait" loses signals
On 2/21/20 9:44 AM, Denys Vlasenko wrote:

>>> Yes, and here we are "after command", specifically after "{...} &" command.
>>> Since we got a trapped signal, we must run its trap.
>>
>> Did you look at the scenario in my message?
>
> What scenario?

The scenario in the message you replied to.

> As I said, there are just two possibilities:
> signal is received before the point when shell checks for received
> signals after "{...} &" command;
> or signal is received after that point, and thus signal is
> considered to be received "inside wait builtin".

That's just not reasonable. You're saying signals that are received before
the wait builtin begins executing (say, while the command is being parsed,
or the shell is doing some other bookkeeping task) should be considered
to have arrived while the wait builtin is executing. I'm pretty sure that's
not consistent with the letter or the spirit of the standard.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: "wait" loses signals
On 2/20/20 4:27 PM, Chet Ramey wrote:
> On 2/20/20 3:02 AM, Denys Vlasenko wrote:
>> On 2/19/20 9:30 PM, Chet Ramey wrote:
>>> On 2/19/20 5:29 AM, Denys Vlasenko wrote:
>>>> A bug report from Harald van Dijk:
>>>>
>>>> test2.sh:
>>>> trap 'kill $!; exit' TERM
>>>> { kill $$; exec sleep 9; } &
>>>> wait $!
>>>>
>>>> The above script ought exit quickly, and not leave a stray
>>>> "sleep" child:
>>>> (1) if "kill $$" signal is delivered before "wait",
>>>> then TERM trap will kill the child, and exit.
>>>
>>> This strikes me as a shaky assumption, dependent on when the shell receives
>>> the SIGTERM and when it runs traps.
>>
>> The undisputable fact is that after shell forks a child
>> to run the "{...} &" subshell, it will receive the SIGTERM signal.
>>
>> And since it has a trap for it, it should be run.
>>
>>> (There's nothing in POSIX that says
>>> when pending traps are processed. Bash runs them after commands.)
>>
>> Yes, and here we are "after command", specifically after "{...} &" command.
>> Since we got a trapped signal, we must run its trap.
>
> Did you look at the scenario in my message?

What scenario?
As I said, there are just two possibilities:
signal is received before the point when shell checks for received
signals after "{...} &" command;
or signal is received after that point, and thus signal is
considered to be received "inside wait builtin".

In both cases, trap should be run.

> Keep in mind that you can't run the trap out of the signal handler.

Yes, running anything remotely complex out of signal handlers
is a bad idea: signals can arrive somewhere in the middle of stdio,
or memory allocation, or something similarly critical.
Reentering one of those can deadlock.

Properly-written programs are careful to record signal reception
in a flag variable, or a pipe, etc, then return from signal handler,
and act on it later, not in a signal handler.
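
The record-then-act pattern described above can also be applied at script level. What follows is only a minimal sketch, assuming a POSIX sh; the names term_seen and child are hypothetical and do not come from the thread. The trap action only sets a flag, and the main loop polls that flag between short foreground commands, so the script does not depend on "wait" being interrupted at the right moment. The price is up to one second of latency per poll.

```shell
#!/bin/sh
# Sketch only: term_seen and child are illustrative names.

term_seen=0
trap 'term_seen=1' TERM          # the trap action only records the signal

# Same background job as test2.sh: signal the parent, then sleep.
{ kill "$$"; exec sleep 9; } &
child=$!

# Poll the flag between short foreground commands instead of blocking
# in wait. Pending traps run after each command completes, so the flag
# is noticed within roughly one second of the signal arriving.
while [ "$term_seen" -eq 0 ] && kill -0 "$child" 2>/dev/null; do
    sleep 1
done

# Act on the recorded signal outside the trap action.
if [ "$term_seen" -eq 1 ]; then
    kill "$child" 2>/dev/null || true
fi
wait "$child" 2>/dev/null || true  # reap the child either way

echo "term_seen=$term_seen"
```

Polling trades responsiveness for independence from the timing window discussed in this thread; it is a workaround for script authors, not a statement about what any shell should do.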
Re: "wait" loses signals
On 20/02/2020 15:55, Robert Elz wrote:
>     Date:        Thu, 20 Feb 2020 09:16:05 +
>     From:        Harald van Dijk
>     Message-ID:
>
>   | In that case, I think we can interpret the "when" in the description
>   | of the trap command literally except when 2.11 overrides it.
>
> I think it should be interpreted just like its normal English usage,
> as in:
>     when I win the lottery I am going to buy a Ferrari
> or
>     I am going to buy a Ferrari when I win the lottery
> (which both say the same thing).

These are both ambiguous statements. The meaning of both depends on
context and emphasis, and because context and emphasis are missing in a
standalone written sentence, we are left to infer it. The word order may
lead to a different inference for the two sentences.

> It doesn't mean that the instant the lottery winnings arrive
> (tomorrow please!) I will be at the luxury imported car dealers,
> rather it states a pre-condition which will trigger an event which
> is to follow, sometime, thereafter.

I can see at least three different meanings.

A: Jake bought a Porsche when he won the lottery.
   When I win the lottery, I am going to buy a Ferrari.
   [if/after]

A: What are you going to do when you win the lottery?
B: When I win the lottery, I am going to buy a Ferrari.
   [as soon as]

A: How come you have five Ferraris in your garage?
B: When I win the lottery, I am going to buy a Ferrari.
   [whenever; said by someone who has already won the lottery five times]

> Thus
>     When one of the corresponding conditions arises
> (standards speak for "when a signal has been delivered")
>     the argument action shall be read and executed...
> is "sometime after a signal has been delivered, run the trap action".

Based on how the word is used elsewhere in the standard, I think the "as
soon as" meaning is more likely here. Two random examples elsewhere from
the standard:

  File Read, Write, and Creation
  When a file that does not exist is created, [...]
  1. The user ID of the file shall be set to the effective user ID of
     the calling process.
It would be absurd to claim that the user ID might be initially set to some completely unrelated user ID, and then changed to the effective user ID of the calling process some time later. 2.5.1 Positional Parameters [...] Positional parameters are initially assigned when the shell is invoked (see sh), [...] It would be equally absurd to claim that this allows sh -c 'echo $1' - hello to print a blank line because the initial assignment of the positional parameters may happen after the first expansion of $1. In the same way, I think that except when overridden by 2.11, the "when" in "Otherwise, the argument action shall be read and executed by the shell when one of the corresponding conditions arises." should be interpreted as "as soon as". Cheers, Harald van Dijk
Re: "wait" loses signals
    Date:        Thu, 20 Feb 2020 09:16:05 +
    From:        Harald van Dijk
    Message-ID:

  | In that case, I think we can interpret the "when" in the description
  | of the trap command literally except when 2.11 overrides it.

I think it should be interpreted just like its normal English usage,
as in:
    when I win the lottery I am going to buy a Ferrari
or
    I am going to buy a Ferrari when I win the lottery
(which both say the same thing).

It doesn't mean that the instant the lottery winnings arrive
(tomorrow please!) I will be at the luxury imported car dealers,
rather it states a pre-condition which will trigger an event which
is to follow, sometime, thereafter.

Thus
    When one of the corresponding conditions arises
(standards speak for "when a signal has been delivered")
    the argument action shall be read and executed...
is "sometime after a signal has been delivered, run the trap action".

kre
Re: "wait" loses signals
On 2/20/20 3:02 AM, Denys Vlasenko wrote:
> On 2/19/20 9:30 PM, Chet Ramey wrote:
>> On 2/19/20 5:29 AM, Denys Vlasenko wrote:
>>> A bug report from Harald van Dijk:
>>>
>>> test2.sh:
>>> trap 'kill $!; exit' TERM
>>> { kill $$; exec sleep 9; } &
>>> wait $!
>>>
>>> The above script ought exit quickly, and not leave a stray
>>> "sleep" child:
>>> (1) if "kill $$" signal is delivered before "wait",
>>> then TERM trap will kill the child, and exit.
>>
>> This strikes me as a shaky assumption, dependent on when the shell receives
>> the SIGTERM and when it runs traps.
>
> The undisputable fact is that after shell forks a child
> to run the "{...} &" subshell, it will receive the SIGTERM signal.
>
> And since it has a trap for it, it should be run.
>
>> (There's nothing in POSIX that says
>> when pending traps are processed. Bash runs them after commands.)
>
> Yes, and here we are "after command", specifically after "{...} &" command.
> Since we got a trapped signal, we must run its trap.

Did you look at the scenario in my message? Keep in mind that you can't
run the trap out of the signal handler.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: "wait" loses signals
On 20/02/2020 01:55, Robert Elz wrote:
>     Date:        Wed, 19 Feb 2020 23:53:56 +
>     From:        Harald van Dijk
>     Message-ID:  <9b9d435b-3d2f-99bd-eb3d-4a676ce89...@gigawatt.nl>
>
>   | POSIX says in the description of the trap command "Otherwise, the
>   | argument action shall be read and executed by the shell when one of the
>   | corresponding conditions arises." Because it says "when", not "after",
>   | if interpreted literally, it does not even allow waiting until the
>   | current command finishes executing.
>
> You need to look at XCU 2.11 not just the description of the trap command
> itself.

Ah, thanks, that makes an exception for when the shell is waiting for a
command to complete. It's the same as what bash documents.

In that case, I think we can interpret the "when" in the description of
the trap command literally except when 2.11 overrides it.

Cheers,
Harald van Dijk
Re: "wait" loses signals
On 2/19/20 9:30 PM, Chet Ramey wrote:
> On 2/19/20 5:29 AM, Denys Vlasenko wrote:
>> A bug report from Harald van Dijk:
>>
>> test2.sh:
>> trap 'kill $!; exit' TERM
>> { kill $$; exec sleep 9; } &
>> wait $!
>>
>> The above script ought exit quickly, and not leave a stray
>> "sleep" child:
>> (1) if "kill $$" signal is delivered before "wait",
>> then TERM trap will kill the child, and exit.
>
> This strikes me as a shaky assumption, dependent on when the shell receives
> the SIGTERM and when it runs traps.

The undisputable fact is that after shell forks a child
to run the "{...} &" subshell, it will receive the SIGTERM signal.

And since it has a trap for it, it should be run.

> (There's nothing in POSIX that says
> when pending traps are processed. Bash runs them after commands.)

Yes, and here we are "after command", specifically after "{...} &" command.
Since we got a trapped signal, we must run its trap.
Re: "wait" loses signals
    Date:        Wed, 19 Feb 2020 23:53:56 +
    From:        Harald van Dijk
    Message-ID:  <9b9d435b-3d2f-99bd-eb3d-4a676ce89...@gigawatt.nl>

  | POSIX says in the description of the trap command "Otherwise, the
  | argument action shall be read and executed by the shell when one of the
  | corresponding conditions arises." Because it says "when", not "after",
  | if interpreted literally, it does not even allow waiting until the
  | current command finishes executing.

You need to look at XCU 2.11 not just the description of the trap command
itself.

kre
Re: "wait" loses signals
On 19/02/2020 20:30, Chet Ramey wrote: On 2/19/20 5:29 AM, Denys Vlasenko wrote: A bug report from Harald van Dijk: test2.sh: trap 'kill $!; exit' TERM { kill $$; exec sleep 9; } & wait $! The above script ought exit quickly, and not leave a stray "sleep" child: (1) if "kill $$" signal is delivered before "wait", then TERM trap will kill the child, and exit. This strikes me as a shaky assumption, dependent on when the shell receives the SIGTERM and when it runs traps. (There's nothing in POSIX that says when pending traps are processed. Bash runs them after commands.) The bash documentation says traps will not be executed until the command completes if it receives a signal while waiting for the command to complete, but it does not say the same for when it receives a signal before waiting for a command to complete. This may be an oversight in the documentation. POSIX says in the description of the trap command "Otherwise, the argument action shall be read and executed by the shell when one of the corresponding conditions arises." Because it says "when", not "after", if interpreted literally, it does not even allow waiting until the current command finishes executing. I realise that that is definitely not the way it is meant to be interpreted, but I am not sure what is. I consider the assumption that the test script is supposed to work a reasonable one, but it is possible that this is considered strictly a QoI issue. But to be clear, regardless of what POSIX requires, I was less concerned with prodding other shell authors into changing their shells and more with seeing what I can do in my shell. I want to have a shell that is capable of handling scripts like this, but it is fine with me if other shells do not share that as a goal. Thanks for looking into this despite your scepticism on the validity of the test. Your description of what happens in bash when this ends up sleeping probably applies to all shells that behave the same way. Cheers, Harald van Dijk
Re: "wait" loses signals
On 2/19/20 5:29 AM, Denys Vlasenko wrote:
> A bug report from Harald van Dijk:
>
> test2.sh:
> trap 'kill $!; exit' TERM
> { kill $$; exec sleep 9; } &
> wait $!
>
> The above script ought exit quickly, and not leave a stray
> "sleep" child:
> (1) if "kill $$" signal is delivered before "wait",
> then TERM trap will kill the child, and exit.

This strikes me as a shaky assumption, dependent on when the shell receives
the SIGTERM and when it runs traps. (There's nothing in POSIX that says
when pending traps are processed. Bash runs them after commands.)

> (2) if "kill $$" signal is delivered to "wait",
> it must be interrupted by the signal,
> then TERM trap will kill the child, and exit.

This is well-defined by POSIX.

>
> The helper to loop the above:
>
> test1.sh:
> i=1
> while test "$i" -lt 10; do
> echo "$i"
> "$@" test2.sh
> i=$((i + 1))
> done
>
> To run: sh test1.sh
>
> bash 4.4.23 fails pretty quickly:
>
> $ sh test1.sh bash
> 1
> ...
> 581
> _

It seems inherently racy. I ran this with a lightly-instrumented bash and
discovered that signals that arrived when `wait' was running were always
processed correctly and killed the process. There were a few times when
the signal arrived while `wait' was not running, and some of these cases
did not interrupt wait or cause trap execution.

Consider this scenario.

1. Bash forks and starts the background process
2. The parent fork returns
3. The parent bash checks for traps, and finds none
4. SIGTERM arrives, the trap signal handler sets a `pending trap' flag for
   SIGTERM
5. The parent shell runs the `wait' builtin.
6. `wait' is not interrupted by a signal, runs to completion, and the trap
   runs

The window for this is extremely small. I just ran the scripts on RHEL7 and
had to go through the loop script multiple times before I saw the 9-second
sleep. I saw it more often on Mac OS X, so the scheduler probably plays a
role.
Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
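
For contrast with the racy scenario in the message above, here is a hedged sketch of case (2), the one described as well-defined by POSIX: the signal arrives while the shell is already blocked in "wait". It assumes a POSIX sh, and the names trap_ran, status and child are hypothetical. A helper subshell delays the signal by one second so that it lands inside "wait" rather than in the small window before it.

```shell
#!/bin/sh
# Sketch only: trap_ran, status and child are illustrative names.

trap_ran=0
trap 'trap_ran=1' TERM

sleep 5 &
child=$!

# Send TERM to this shell after a delay, so that it arrives while the
# shell is already waiting for the child.
( sleep 1; kill -TERM "$$" ) &

# wait should return early with a status greater than 128, and the trap
# action should run immediately afterwards.
status=0
wait "$child" || status=$?

# Clean up the child that wait stopped waiting for.
kill "$child" 2>/dev/null || true
wait "$child" 2>/dev/null || true

echo "wait status=$status trap_ran=$trap_ran"
```

The one-second delay only makes the well-defined case overwhelmingly likely. It does not close the window between the shell's check for pending traps and its entry into the wait system call, which is the subject of this thread.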
"wait" loses signals
A bug report from Harald van Dijk:

test2.sh:
    trap 'kill $!; exit' TERM
    { kill $$; exec sleep 9; } &
    wait $!

The above script ought exit quickly, and not leave a stray
"sleep" child:
(1) if "kill $$" signal is delivered before "wait",
    then TERM trap will kill the child, and exit.
(2) if "kill $$" signal is delivered to "wait",
    it must be interrupted by the signal,
    then TERM trap will kill the child, and exit.

The helper to loop the above:

test1.sh:
    i=1
    while test "$i" -lt 10; do
        echo "$i"
        "$@" test2.sh
        i=$((i + 1))
    done

To run: sh test1.sh

bash 4.4.23 fails pretty quickly:

    $ sh test1.sh bash
    1
    ...
    581
    _

Under strace, it seems that "wait" enters wait4() syscall
and waits for the child.
(The fact that the pause is 9 seconds is another hint).