Re: SIGINT handling during async functions
On 2/6/23 10:26 PM, Martin D Kealey wrote:

> By orthogonal, I meant these things should ideally be managed by separate
> controls:
>
>  1. ignoring signals (or not)
>  2. redirecting file descriptors
>  3. immediately waiting on the process (or not)
>  4. creating new process groups
>  5. sending a signal to about-to-be orphaned children when the shell exits
>
> In particular I'm thinking of options along the lines of:
>
>   nohup --no-redir --[block/default/keep]=[INT,QUIT,HUP,...]
>
> (exact names not important; hopefully --long-options are self-explanatory)

I feel like this will turn into something like daemon(1), but if someone
wants to take a shot -- using a new name, obviously -- let's talk about it.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: SIGINT handling during async functions
On Fri, 3 Feb 2023 at 07:17, Chet Ramey wrote:

> On 1/28/23 5:56 AM, Martin D Kealey wrote:
> > Firstly, let's just leave aside "POSIX requires this" for a bit.
>
> Be that as it may, POSIX exists and this is a requirement. It's also how
> other shells behave.

Of course. I'm only contemplating making changes in extended mode, not
POSIX mode.

> > I contend that it's inconsistent for the actions of "nohup" and "&" to
> > NOT be fully orthogonal.
>
> Maybe, but their historical behavior has always differed: `nohup' ignores
> SIGHUP, and background processes ignore SIGINT/SIGQUIT. You could say those
> are "fully orthogonal," setting aside the sometimes-confusing manipulation
> of input and output FDs. Is the latter what you mean by orthogonality?

Sorry, nohup was a terrible way to illustrate this, since it conflates
other things that I wasn't considering; rather I meant something more along
the lines of "some hypothetical command structured like nohup but which
affects SIGINT & SIGQUIT instead of SIGHUP, and which doesn't redirect
stdio" (and which doesn't make the job automatically backgrounded, though
that already applies to nohup).

By orthogonal, I meant these things should ideally be managed by separate
controls:

 1. ignoring signals (or not)
 2. redirecting file descriptors
 3. immediately waiting on the process (or not)
 4. creating new process groups
 5. sending a signal to about-to-be orphaned children when the shell exits

The problem, as I see it, is that there's no shell syntax that *only* does
#3. Yes, one could write a shell function, but it'd be pretty contorted, and
waste a lot of effort for things that would be undone when using other
features at the same time, leading to a situation where people would only
bother to use it when they're *not* going to give explicit signal
dispositions; so it's still not *practically* orthogonal.

> > In the meantime, « shopt -s background_without_magic » (*2) gets my vote,
>
> I don't see any advantage over the mechanism above.

The value proposition in making changes isn't that "this can't already be
done *somehow*", but rather that the unorthogonality of the current features
is suboptimal language design, and poor for user understanding.

Rather than a global shopt setting that stops "&" from blocking SIGINT &
SIGQUIT, which I'll grant is a hard sell, perhaps an entirely new notation
would be possible, using some combination of "&" with other punctuation that
isn't already defined, such as "&;" or "&|".

> > along with incorporating « nohup » as a built-in (so that Bash can
> > guarantee its behaviour, and add options to improve its internal
> > orthogonality.).
>
> What guarantees would you like?

I put those the wrong way round. To add extensions, and guarantee that
they're available to every Bash script, nohup would have to be a built-in.

In particular I'm thinking of options along the lines of:

  nohup --no-redir --[block/default/keep]=[INT,QUIT,HUP,...]

(exact names not important; hopefully --long-options are self-explanatory)

> > *1: I have very occasionally had interactive single-user shell running on
> > /dev/console, which doesn't appear to count as a tty because it doesn't
> > respond to tcsetpgrp.
>
> Try running something in a Docker container; that doesn't guarantee a
> controlling terminal.

That's a very good point, and I suspect it's for the same underlying reason:
that inside the container, the "top" process has pid 1 or pgrp 0 or
somesuch, and somewhere this is interpreted as "set the terminal so that it
has no pgrp".

-Martin
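As a rough illustration of a control that touches only item 1 of the list above (signal dispositions), here is a minimal sketch written as a shell function. The name `sigwrap` and its `ignore`/`default` modes are hypothetical; this is not an existing command nor a proposed bash feature. It assumes the command is an external program (it uses `exec`), and it deliberately does no redirection, no backgrounding and no process-group handling.

```shell
#!/bin/sh
# Hypothetical helper (not part of bash or coreutils): run a command with
# the listed signals either ignored or reset to their default action.
# It does not redirect stdio, background the command, or create a new
# process group.
# usage: sigwrap ignore|default "SIG1 SIG2 ..." command [args...]
sigwrap() {
    mode=$1 sigs=$2
    shift 2
    (
        case $mode in
            ignore)  trap '' $sigs ;;   # SIG_IGN is inherited across exec
            default) trap - $sigs ;;    # ask for SIG_DFL
            *) echo "sigwrap: unknown mode: $mode" >&2; exit 2 ;;
        esac
        exec "$@"
    )
}

# The child sends SIGINT to itself; with the signal ignored it carries on.
out=$(sigwrap ignore "INT QUIT" sh -c 'kill -INT $$; echo survived')
echo "$out"
```

Because the dispositions are set in a subshell, the calling shell's own traps are untouched, and the wrapper can be combined with `&` or redirections independently.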
Re: SIGINT handling during async functions
On 1/28/23 5:56 AM, Martin D Kealey wrote:

> Firstly, let's just leave aside "POSIX requires this" for a bit. I know
> that the requirement is there, and I think it is one of those broken things
> that ought to have been dropped from POSIX, or at least reduced to optional
> rather than required.

Be that as it may, POSIX exists and this is a requirement. It's also how
other shells behave.

> > On 1/21/23 7:55 AM, Tycho Kirchner wrote:
> > > Please consider a script launching several commands in background
> > > and waiting for their completion:
>
> (note these last 4 words; I suspect they exclude the "common case
> experience" of people who think that it's only natural to want to insulate
> new daemons from tty signals)

Sure.

> I contend that it's inconsistent for the actions of "nohup" and "&" to NOT
> be fully orthogonal.

Maybe, but their historical behavior has always differed: `nohup' ignores
SIGHUP, and background processes ignore SIGINT/SIGQUIT. You could say those
are "fully orthogonal," setting aside the sometimes-confusing manipulation
of input and output FDs. Is the latter what you mean by orthogonality?

> And I contend that a daemon that unexpectedly dies is much more obvious
> than a bunch of internal processes that are unexpectedly left running; you
> have to proactively check for orphaned processes, and their continued
> action may cause weird bugs.

You can get background processes that have SIGINT and SIGQUIT set to SIG_DFL
today with the (buggy) existing bash behavior. The same POSIX-blessed
technique will work in fixed future versions:

    { trap - SIGINT SIGQUIT ; program; } &   # `exec program' if you prefer

instead of

    program &

and this has the advantage -- or not -- of granularity.

> I haven't encountered an interactive shell in a tty without job control in
> the last 35 years. (*1)
> I contend that it's past time this POSIX misfeature was retired.

You absolutely can have that discussion with the POSIX group; since job
control remains an optional POSIX feature, you might want to incorporate a
proposal to make it mandatory.

> In the meantime, « shopt -s background_without_magic » (*2) gets my vote,

I don't see any advantage over the mechanism above.

> along with incorporating « nohup » as a built-in (so that Bash can
> guarantee its behaviour, and add options to improve its internal
> orthogonality.).

What guarantees would you like? Or, what do you consider the essential parts
of nohup's behavior that should be guaranteed that are not now? nohup's been
around for what, 40+ years now; its behavior is pretty well known. There's
little advantage to making it a builtin other than to nohup builtins, and
you basically can do that already.

> *1: I have very occasionally had interactive single-user shell running on
> /dev/console, which doesn't appear to count as a tty because it doesn't
> respond to tcsetpgrp.

Try running something in a Docker container; that doesn't guarantee a
controlling terminal.

-- 
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: SIGINT handling during async functions
Firstly, let's just leave aside "POSIX requires this" for a bit. I know that
the requirement is there, and I think it is one of those broken things that
ought to have been dropped from POSIX, or at least reduced to optional
rather than required.

On Tue, 24 Jan 2023 at 07:35, Chet Ramey wrote:

> On 1/21/23 7:55 AM, Tycho Kirchner wrote:
> > Please consider a script launching several commands in background
> > and waiting for their completion:

(note these last 4 words; I suspect they exclude the "common case
experience" of people who think that it's only natural to want to insulate
new daemons from tty signals)

> > cmd1 &
> > cmd2 &
> > wait
> >
> > [...] In my experience, what the user usually wants in such a case is
> > to abort cmd1, cmd2 as well as the script having launched them.
>
> Odd, my experience is the opposite. I have run commands asynchronously
> from scripts quite often in my previous lives, with the intent of
> insulating them from signals.

Which behaviour seems "intuitive" probably depends on which of two patterns
one uses more often:

 1. create a daemon that is intended to continue running after the script
    finishes; or
 2. create a number of cooperating parallel processes that are entirely
    internal to the script, which *should* exit at or before the end of the
    script.

I'm old enough to have used the Bourne shell before it had tty job control,
so I can see why *in the interactive case* it makes sense to prevent tty
signals from affecting any "background" process launched directly from the
interactive shell.

I contend that it's inconsistent for the actions of "nohup" and "&" to NOT
be fully orthogonal.

And I contend that a daemon that unexpectedly dies is much more obvious than
a bunch of internal processes that are unexpectedly left running; you have
to proactively check for orphaned processes, and their continued action may
cause weird bugs.

I haven't encountered an interactive shell in a tty without job control in
the last 35 years. (*1)
I contend that it's past time this POSIX misfeature was retired.

In the meantime, « shopt -s background_without_magic » (*2) gets my vote,
along with incorporating « nohup » as a built-in (so that Bash can guarantee
its behaviour, and add options to improve its internal orthogonality.).

-Martin

*1: I have very occasionally had interactive single-user shell running on
/dev/console, which doesn't appear to count as a tty because it doesn't
respond to tcsetpgrp.

*2: or perhaps with finer granularity « shopt -u bg_block_signals
bg_null_stdin »
Re: SIGINT handling during async functions
On 1/21/23 7:55 AM, Tycho Kirchner wrote:

> On 16.01.23 at 18:26, Chet Ramey wrote:
> > The fix is to add enough state machinery to detect this situation and
> > behave in a way that can satisfy both the standard and the later
> > interpretation, while being careful not to undo this work later. This is
> > obviously not how bash worked in the past.
>
> Thanks for the explanation. While editing the state machinery I would like
> to suggest to add a new shopt, let's call it keepsigint, which a user may
> set to preserve the SIGINT trap set in the parent shell for all
> asynchronous commands.

Is this really what you want, since none of the scenarios you describe use
it? I suggest that what you are asking for is a way to set the signal
disposition to SIG_DFL instead of SIG_IGN.

> While the POSIX behavior to ignore SIGINT for background processes if job
> control is disabled makes total sense for interactive shells, for scripts
> to me it often appears not constructive.

How often do you have job control disabled in interactive shells? It seems
to me that scripts are the primary motivation for this behavior.

> Please consider a script launching several commands in background and
> waiting for their completion:
>
> cmd1 &
> cmd2 &
> wait
>
> If the user having launched this script from the interactive terminal
> aborts it by hitting Ctrl+C, by default, the shell sends SIGINT to the
> process group (pgid) of the script. However, while cmd1 and cmd2 get their
> signal, they usually (if they don't override it) ignore it due to above
> POSIX requirement. In my experience, what the user usually wants in such a
> case is to abort cmd1, cmd2 as well as the script having launched them.

Odd, my experience is the opposite. I have run commands asynchronously from
scripts quite often in my previous lives, with the intent of insulating them
from signals. It's pretty clear that the historical bash behavior, which
ends up the way you want, is not correct.

Anyway, if your goal is to allow CMD to have SIG_DFL for SIGINT and SIGQUIT,
the POSIX way to do it is similar to your fourth option.

    { trap - SIGINT ; exec cmd1; } &
    { trap - SIGINT ; exec cmd2; } &

which interp 751 says has to work (you can pick and choose your use of
`exec' depending on what you're running, of course). It obviously doesn't
work in bash-5.2, but bash-5.2 doesn't have that new option, either, and
already makes such asynchronous commands interruptible.

As to `sanity': I would argue that expecting any behavior other than to have
asynchronous commands with SIGINT and SIGQUIT set to SIG_IGN is not a
reasonable expectation. The Bourne family of shells has always behaved that
way. Now, you might have to work around it, but the workaround should be
possible, not the default.

-- 
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
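The idiom quoted above can be wrapped in a small function so that it need not be retyped for every command. This is only a sketch: `bg_default_sig` is a made-up name, and, as the message itself notes, whether the reset actually takes effect depends on the shell and its version. The example therefore only shows that the wrapper starts the command asynchronously and that its exit status is still available through `wait`.

```shell
#!/bin/sh
# Hypothetical wrapper around the POSIX idiom discussed in the thread:
# start a command asynchronously after asking for SIGINT and SIGQUIT to be
# reset to SIG_DFL.  The observable effect is shell- and version-dependent.
bg_default_sig() {
    { trap - INT QUIT; "$@"; } &
}

bg_default_sig sh -c 'exit 7'
status=0
wait "$!" || status=$?    # $! is the PID of the async list started above
echo "child exited with status $status"
```

Running the command as `"$@"` rather than `exec "$@"` keeps the wrapper usable with shell functions, at the cost of one extra process for external commands.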
Re: SIGINT handling during async functions
On Sat, Jan 21, 2023 at 01:55:27PM +0100, Tycho Kirchner wrote:
> cmd1 &
> cmd2 &
> wait
>
> If the user having launched this script from the interactive terminal aborts
> it by hitting Ctrl+C, by default, the shell sends SIGINT to the process group
> (pgid) of the script. However, while cmd1 and cmd2 get their signal, they
> usually (if they don't override it) ignore it due to above POSIX requirement.
> In my experience, what the user usually wants in such a case is to abort
> cmd1, cmd2 as well as the script having launched them.

A given user might *want* that, but that's not what is going to happen,
nor what is supposed to happen.

If a user wants that behavior, they will need to set up a trap of their
own, store the PIDs of the background processes, and kill them in the trap.

There is nothing that should be changed in bash with regard to this.
Bash is *already* doing a better job than POSIX requires, with its EXIT
traps that actually do what one expects.
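The approach described above (store the PIDs, kill them from a trap) could look roughly like the following. This is a minimal sketch, not code from the thread: the function names `start_job`, `kill_jobs` and `install_traps` are invented for illustration, and SIGTERM is used because children started with `&` ignore SIGINT when job control is off.

```shell
#!/bin/sh
# Sketch: remember the PIDs of background jobs and kill them from a trap.
# All function names here are illustrative only.
pids=""

start_job() {
    "$@" &
    pids="$pids $!"
}

kill_jobs() {
    [ -n "$pids" ] || return 0
    # SIGTERM, because the children ignore SIGINT (job control is off).
    kill -TERM $pids 2>/dev/null
    wait $pids 2>/dev/null
    pids=""
    return 0
}

install_traps() {
    # On Ctrl+C: stop the children, then die from SIGINT ourselves so the
    # caller can tell that the script was interrupted.
    trap 'kill_jobs; trap - INT; kill -INT "$$"' INT
    trap 'kill_jobs' EXIT
}
```

A script would call `install_traps` once, then `start_job cmd1`, `start_job cmd2`, and finally `wait`.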
Re: SIGINT handling during async functions
On 16.01.23 at 18:26, Chet Ramey wrote:
> The fix is to add enough state machinery to detect this situation and
> behave in a way that can satisfy both the standard and the later
> interpretation, while being careful not to undo this work later. This is
> obviously not how bash worked in the past.

Thanks for the explanation. While editing the state machinery I would like
to suggest to add a new shopt, let's call it keepsigint, which a user may
set to preserve the SIGINT trap set in the parent shell for all asynchronous
commands.

While the POSIX behavior to ignore SIGINT for background processes if job
control is disabled makes total sense for interactive shells, for scripts to
me it often appears not constructive. Please consider a script launching
several commands in background and waiting for their completion:

    cmd1 &
    cmd2 &
    wait

If the user having launched this script from the interactive terminal aborts
it by hitting Ctrl+C, by default, the shell sends SIGINT to the process
group (pgid) of the script. However, while cmd1 and cmd2 get their signal,
they usually (if they don't override it) ignore it due to above POSIX
requirement. In my experience, what the user usually wants in such a case is
to abort cmd1, cmd2 as well as the script having launched them.

Of course there are ways to kill cmd1 and cmd2 (and possible grandchildren)
explicitly, e.g. by sending an additional TERM signal to the process group,
e.g. (at the top of the script)

    trap 'trap "" TERM; env kill -TERM -- -$$; exit 130' INT

However, this is usually only safe when we are the process group leader
(otherwise we might kill our parent as well!), so we need an additional

    [ $$ -eq $(($(ps -o pgid= -p "$$"))) ] || exec setsid --wait "${BASH_SOURCE[0]}" "$@"

at the top of our script to create a new process group if necessary.
Further, applications may react differently on TERM and INT, making the
"signal conversion" undesirable in the general case. Finally, asynchronously
running bash scripts may print "Terminated" messages which are usually not
of interest for a user having aborted the command manually.

Another option would be to enable jobcontrol within the script and kill the
commands that way, e.g.

    set -m
    cmd1 &
    jobs=($(jobs -p))
    env kill -INT -- "${jobs[@]/#/-}"

However, jobcontrol disables the possibility to suspend the "whole script"
with Ctrl+Z and bears the risk to eventually lose some jobs, while without
jobcontrol, killing the single pgid kills all leftovers with high certainty.

A third way is to launch cmd1 and cmd2 with

    env --default-signal=SIGINT,SIGQUIT cmd1 &

so they do not ignore SIGINT. That's fine, but has to be repeated for every
command. Further, process substitutions and functions cannot be called that
way.

A fourth way is to explicitly set the INT trap within an async command group
before executing the command, like

    { trap 'true' INT; exec cmd1; } &

Personally I regularly use below __async__ function for async commands,
command groups, process substitutions and functions and I'm fine with that.
But all these four options require some typing (and reading) overhead and
just don't feel "sane". I think bash would really benefit from a
'keepsigint' option. What are your thoughts about that?

Thanks and kind regards
Tycho

_

    __async__(){
        local int_trap
        int_trap="$(trap -p INT)"
        [ -z "$int_trap" ] && int_trap="trap -- 'exit 130' SIGINT"
        if [ "${#@}" -eq 0 ]; then
            # Already running async, just set parent's INT handler.
            eval "$int_trap"; return
        fi
        if [[ $(type -t "$1") == file ]]; then
            # exec into external file so pid is same as if called like 'cmd &'
            { eval "$int_trap"; exec "$@"; } &
        else
            { eval "$int_trap"; "$@"; } &
        fi
    }

    # Usage examples (the function has to be defined first):
    __async__ bash -c 'echo first; trap -p; sleep 6'; wait
    { __async__; exec bash -c 'echo second; trap -p; sleep 6'; } & wait
    foofunc(){ bash -c 'echo foofunc; trap -p; sleep 6'; }; __async__ foofunc; wait
    cat <(__async__; exec bash -c 'echo psub; trap -p; sleep 6';)
Re: SIGINT handling during async functions
On 1/12/23 6:34 PM, Tycho Kirchner wrote:

> Hi,
> we found quite some inconsistency and weirdness in the handling of SIGINT's
> during async function calls and were wondering, whether those are expected.
> All calls were executed from a script with jobcontrol turned off (set +m)
> while pressing Ctrl+C shortly afterwards.

Thanks for the report. The basic issue is that the process started to
execute the background command (`asynchronous list') does have the SIGINT
and SIGQUIT dispositions set to SIG_IGN, but the processes it creates don't.

The issue is that the processes in this list have to ignore SIGINT ("the
commands in the list shall inherit from the shell a signal action of ignored
(SIG_IGN) for the SIGINT and SIGQUIT signals" from the normative standard
text) but they have to be allowed to use trap to change the signal
dispositions (POSIX interp 751).

The first problem is that if, say, a shell forked to run the shell function
tries to initialize its signals and finds SIGINT ignored, it will assume
that SIGINT should be `hard ignored', and non-interactive shells are not
allowed to change that ("Signals that were ignored on entry to a
non-interactive shell cannot be trapped or reset"). This is what happens
with the `trap -p' and why the trap seems to change. We had a rousing
discussion about precisely what "on entry" means a few years back.

The second problem is figuring out how to set the SIGINT disposition in the
child, since it's no longer a simple "what did I inherit from my parent?"

So what do we do about that? Well, you want to preserve the original
disposition of SIGINT in the child process that sets the handler to SIG_IGN,
or figure out a different way that the child process can change that
disposition, so an inherited value of SIG_IGN doesn't prevent a shell from
setting a new trap. You also want to prevent the shell from setting the
SIGINT disposition of processes it forks to this preserved previous
disposition, since they're all supposed to get SIG_IGN by default (but see
below!).

The fix is to add enough state machinery to detect this situation and behave
in a way that can satisfy both the standard and the later interpretation,
while being careful not to undo this work later. This is obviously not how
bash worked in the past.

It gets tricky. Say a shell forked to run this asynchronous list runs trap
to change the SIGINT disposition. It inherited SIG_IGN from its parent, but
now the processes it forks need SIGINT to be set to SIG_DFL instead of
SIG_IGN ("traps caught by the shell shall be set to the default values"). So
this new state sort of ripples across different operations.

> The main INT handler is never executed in foofunc (is that expected?)

Yes, subshells never inherit traps.

> while the new (default) handler either aborts command execution in case of
> 'foofunc &' or continues execution in case of '{ foofunc; } &'.

Inconsistent handling of the above requirements.

> While on 'foofunc &' 'trap -p' at the beginning of foofunc (wrongly)
> prints the main handler,

That's not wrong; the shell has to preserve the trap strings while changing
the disposition and only change the string if a new trap is set. It's all
very ad-hoc.

> in case of '{ foofunc; } &' it suddenly prints the ignore handler
> "trap -- '' SIGINT" and remains indeed uninterruptible.
> Thus printing the trap apparently changes bash's behavior.

Kind of. Reinitializing the signals reveals the real handler, and `trap -p'
just displays it. It's setting SIGINT to be `hard ignored' that is the
problem here.

-- 
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: SIGINT handling during async functions
    Date:        Fri, 13 Jan 2023 08:29:25 +0100
    From:        Tycho Kirchner
    Message-ID:  <6df2fd46-18e8-775d-a670-bd29ffdf3...@mail.de>

  | However, did you actually actually put the short snippets into a script,

No, I didn't, and now I have, I see what you mean, bash does look to be
doing something wrong wrt the state of the signals in the subshell it forks
(sometimes). That's weird.

  | __
  | bash -c 'echo first; trap -p' & wait
  | { bash -c 'echo second; trap -p'; } & wait
  | { trap -p >/dev/null; bash -c 'echo third; trap -p'; } & wait
  | __
  | $ ./test.sh
  | first
  | trap -- '' SIGINT
  | trap -- '' SIGQUIT

In that case, you're running bash -c asynchronously (in the background), so
SIGINT and SIGQUIT are ignored, the child bash starts in that environment,
the trap -p shows that those signals remain ignored, all is behaving as it
should.

  | second

But that one is wrong, running the same thing, inside a group, should change
nothing at all. The group (if it is actually run at all, since it contains
only one command in this case, it could simply be optimised away, which
would produce identical code to execute as the first case, though that's not
required) is run as a subshell (async), which means it (the subshell
created) should have SIGINT and SIGQUIT ignored, just the same as in the
first case. Nothing should be changing that when that group invokes bash -c,
so those signals should be remaining ignored when that process is invoked
(that it happens to be another instance of bash is irrelevant for that), so
that bash should start in the same state as the one in the first test. Yet
it clearly doesn't.

Note that it is the bash running the script that is doing odd things, not
the "bash -c" invoked within it. Run the script with a different shell
(bosh, zsh, ksh93, dash, the FreeBSD and NetBSD shells) and everything acts
the same (the script still running "bash -c ...") for all 3 tests (though
some shells require removing the '-p' arg to trap in the 3rd case, at least
in the versions I have, as they do not (yet, in the versions I have anyway)
support "trap -p"). That changes nothing when the script is run with bash
(using just "trap" there instead of "trap -p") so I mostly left it that way.

[Aside: bosh also ignores SIGTTIN in an async command (when job control is
disabled) which is probably a good idea, but isn't required by anything, but
that difference is irrelevant here - it ignores it in all 3 cases, along
with SIGINT and SIGQUIT]

  | third
  | trap -- '' SIGINT

And that one is even stranger. For some reason in this case, when invoked
(the trap -p sending its results to /dev/null in your script is actually
writing that same output - only SIGINT is being ignored there, which
explains why only SIGINT is ignored inside the "bash -c") though why having
a trap command there is making that kind of difference (it does mean that
the group cannot simply be optimised away however), apparently causing only
SIGINT to be ignored (since it wasn't in the 2nd case, though both it and
SIGQUIT should have been) I can't guess.

You're quite correct, this is all badly broken.

And note, it has been broken (just not quite the same way) for a very long
time:

    jacaranda$ bash2 /tmp/test.sh
    first
    trap -- '' SIGINT
    trap -- '' SIGQUIT
    second
    third

That's different, but still broken, but actually better than bash 5, since
at least the results from the 2nd and 3rd tests are the same, the added trap
command in the 3rd test is changing nothing (In all cases, for all tests,
the "bash -c" invoked inside the script is bash5 - but since that simple
code is doing exactly what it should, that's irrelevant, that could be
replaced by any shell that supports "trap -p").

  | So, even in this simple case, differences are observable.

Yes, they are. Apologies for my hasty response, I was concentrating on the
wrong issues (as some kind of explanation - it was the early hours of the
morning, for me, I should have been asleep, but I just had to read mail one
more time...)

And just for the record, I'm running bash 5.2.15(1)-release on NetBSD
10.99.1 (amd64 processor - or x86_64 if you prefer - same as yours, just
different OS). The bash2 I ran was 2.05b.0(1)-release

kre
Re: SIGINT handling during async functions
On 13.01.23 at 03:02, Robert Elz wrote:

>     Date:        Fri, 13 Jan 2023 00:34:02 +0100
>     From:        Tycho Kirchner
>     Message-ID:  <7d59c17d-792e-0ac7-fd86-b3b2e7d4b...@mail.de>
>
>   | we found quite some inconsistency and weirdness
>   | in the handling of SIGINT's during async function calls
>
> Not inconsistent or weird, and has nothing to do with function calls.
>
>   | and were wondering, whether those are expected.
>
> Expected and required.
>
>   | The main INT handler is never executed in foofunc
> [...]
>   | Thus printing the trap apparently changes bash's behavior.
>
> Nonsense (the conclusion).
>
> When an async command (any command, not just functions, or blocks enclosed
> in { } ) is run with job control disabled, SIGINT is ignored for that async
> command. (SIGQUIT too). That has been the way shells work since before
> either the Bourne shell (and all later shells based upon it, like bash) or
> job control, were invented.
>
> That is all you are seeing here.
>
> kre

Dear Robert Elz,

thanks for the quick response. However, did you actually put the short
snippets into a script, execute it and verify that their behavior is the
same? In particular, did you check whether the respective 'sleep' commands
kept running after hitting Ctrl+C? On my test system, the 'sleep 3' within
foofunc **is** killed in the first three code snippets, proving your
statements wrong. **Only** in case of the 4th snippet, where the trap is
printed at the beginning of foofunc, the 'sleep 3' command keeps running
after hitting Ctrl+C.

Let me give another example. Put the following commands into a script
test.sh and execute it.

__
bash -c 'echo first; trap -p' & wait
{ bash -c 'echo second; trap -p'; } & wait
{ trap -p >/dev/null; bash -c 'echo third; trap -p'; } & wait
__
$ ./test.sh
first
trap -- '' SIGINT
trap -- '' SIGQUIT
second
third
trap -- '' SIGINT
__

So, even in this simple case, differences are observable.

Kind regards
Tycho
Re: SIGINT handling during async functions
Oh, the differences in what trap -p is printing are because of special-case
handling for trap in a subshell environment, when the trap command is the
first (maybe only) command executed (details vary between shells). That is
mostly intended to allow T=$(trap -p) to work, but is usually applied to any
subshell environment (it is simpler that way). An async command is a
subshell environment.

When you do foofunc& the trap command thus prints the trap from the parent's
environment, but when you embed that in a group, the traps get reset to
those for the subshell before the trap command gets to run, so you see that
instead.

Everything is working as intended.

kre
Re: SIGINT handling during async functions
    Date:        Fri, 13 Jan 2023 00:34:02 +0100
    From:        Tycho Kirchner
    Message-ID:  <7d59c17d-792e-0ac7-fd86-b3b2e7d4b...@mail.de>

  | we found quite some inconsistency and weirdness
  | in the handling of SIGINT's during async function calls

Not inconsistent or weird, and has nothing to do with function calls.

  | and were wondering, whether those are expected.

Expected and required.

  | The main INT handler is never executed in foofunc
[...]
  | Thus printing the trap apparently changes bash's behavior.

Nonsense (the conclusion).

When an async command (any command, not just functions, or blocks enclosed
in { } ) is run with job control disabled, SIGINT is ignored for that async
command. (SIGQUIT too). That has been the way shells work since before
either the Bourne shell (and all later shells based upon it, like bash) or
job control, were invented.

That is all you are seeing here.

kre
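The rule stated above can be observed with a short script. This is a minimal sketch that assumes it is run non-interactively (job control off) and covers only the plain `cmd &` form, which is the case nobody in the thread disputes. The SIGINT is sent to the child's PID alone, not to the process group, so the script itself is unaffected.

```shell
#!/bin/sh
# Run as a script, i.e. with job control disabled.  A command started with
# & has SIGINT ignored, so a SIGINT aimed at it does not stop it and it
# still gets to print its message.
tmp=$(mktemp)
sh -c 'sleep 2; echo finished' >"$tmp" &
pid=$!
sleep 1                   # give the child time to start
kill -INT "$pid"          # sent to the child only, not to the process group
status=0
wait "$pid" || status=$?
result=$(cat "$tmp")
rm -f "$tmp"
echo "status=$status output=$result"
```

If the child had SIGINT at its default disposition it would be killed before writing anything, and `wait` would report a status above 128.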
SIGINT handling during async functions
Hi,

we found quite some inconsistency and weirdness in the handling of SIGINT's
during async function calls and were wondering, whether those are expected.
All calls were executed from a script with jobcontrol turned off (set +m)
while pressing Ctrl+C shortly afterwards.

In summary: The main INT handler is never executed in foofunc (is that
expected?) while the new (default) handler either aborts command execution
in case of 'foofunc &' or continues execution in case of '{ foofunc; } &'.
While on 'foofunc &' 'trap -p' at the beginning of foofunc (wrongly) prints
the main handler, in case of '{ foofunc; } &' it suddenly prints the ignore
handler "trap -- '' SIGINT" and remains indeed uninterruptible. Thus
printing the trap apparently changes bash's behavior.

Tested bash versions (on Debian Bullseye):

    GNU bash, Version 5.1.4(1)-release (x86_64-pc-linux-gnu)
    GNU bash, Version 5.2.2(1)-release (x86_64-pc-linux-gnu)

Thanks and kind regards
Tycho

    t='echo INT ${FUNCNAME[0]-main} >&2'
    trap "$t" INT
    foofunc(){ sleep 3; echo foo >&2; }
    foofunc &
    sleep 5
    -->
    INT main
    # foofunc INT-handler is reset to default ('foo' is not printed).
    # Note that 'trap -p' within foofunc wrongly prints above INT handler.

    t='echo INT ${FUNCNAME[0]-main} >&2'
    trap "$t" INT
    foofunc(){ trap "$t" INT; sleep 3; echo foo >&2; }
    foofunc &
    sleep 5
    -->
    INT main
    INT foofunc
    foo
    # foofunc custom INT-handler works.

    t='echo INT ${FUNCNAME[0]-main} >&2'
    trap "$t" INT
    foofunc(){ sleep 3; echo foo >&2; }
    { foofunc; } &
    sleep 5
    -->
    INT main
    foo
    # As opposed to 'foofunc &', foo _is_ printed, so apparently we have a
    # different default trap handler here.

    t='echo INT ${FUNCNAME[0]-main} >&2'
    trap "$t" INT
    foofunc(){ trap -p; sleep 3; echo foo >&2; }
    { foofunc; } &
    sleep 5
    -->
    trap -- '' SIGINT
    ^CINT main
    $ foo
    # Here, when the trap is printed, INT is reported as "ignored" and foofunc
    # becomes indeed uninterruptible. So, 'trap -p' changes bash's behavior.
Re: SIGINT handling
On 9/21/15 5:07 PM, Stephane Chazelas wrote:

> The problem is that here the parent's SIGINT handler is run upon
> the return from waitpid(), just after. My patch doesn't rely on
> EINTR from waitpid() (which doesn't happen here, waitpid() returns
> with the pid of the child that did an exit() upon receiving
> SIGINT), just on the "status" returned by the child, so doesn't
> have the problem.

I wonder if the kernel is restarting the waitpid() even though the signal
handler was installed without SA_RESTART.

> What do you suggest we do to fix that issue?

I think your additional test for wait_sigint_received coupled with a check
that the child died for some other reason than SIGINT, in addition to the
EINTR test, is a reasonable fix.

>> This still counts as catching and handling the SIGINT, and the shell
>> should not act as if the foreground process died as a result of one.
>
> That's the point I'm arguing on.
>
> If the command handled SIGINT and returned with 130, I argue it
> is considering itself and telling its parent as having been
> "interrupted"

No, it's not. If a shell exits with status 130, it's saying that the last
command it executed was killed by SIGINT, or happened to exit with status
130 for some random reason.

It's very possible for a non-interactive shell to restore its original
SIGINT handler and resend SIGINT to itself, if it's concerned about telling
its parent that it's been interrupted.

-- 
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
2015-09-24 14:53:16 -0400, Chet Ramey: > On 9/24/15 9:57 AM, Stephane Chazelas wrote: > > > IMO, the best approach would be to give up on WCE altogether > > which is more source of frustration anyway than it has ever > > helped. I live very well with a /bin/sh (dash) and interactive > > shell (zsh) that don't do it. > > We'll agree to disagree. [...] Now that we're settled on WCE, would you agree that a=$(cmd-that-catches-sigint) should behave like (cmd-that-catches-sigint) (as in, not exit the shell as per WCE)? What about $PIPESTATUS? In: cmd-that-catches-sigint | cmd-that-does-not or cmd-that-does-not | cmd-that-catches-sigint Should we exit on SIGINT or leave that command run in background? Should pipefail have an influence on the behaviour? What about lastpipe? What about when using the wait builtin? Why should: cmd & wait "$!" be treated differently from cmd ? Because cmd's stdin is /dev/null and so is unlikely to be an interactive command? So we admit WCE is a kludge -- Stephane
Re: SIGINT handling
On 9/24/15 9:57 AM, Stephane Chazelas wrote: > IMO, the best approach would be to give up on WCE altogether > which is more source of frustration anyway than it has ever > helped. I live very well with a /bin/sh (dash) and interactive > shell (zsh) that don't do it. We'll agree to disagree. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
On 9/22/15 8:18 AM, Greg Wooledge wrote: > On Mon, Sep 21, 2015 at 10:07:55PM +0100, Stephane Chazelas wrote: >> Maybe the test scenario was not clear: >> >> bash -c 'cmd; echo hi' >> >> is run from an interactive shell, cmd is a long running >> application (the problem that sparked this discussion was with >> ping and I showed examples with an inline-script calling sleep) > > Just for the record, ping is the *classic* example of an incorrectly > written application that traps SIGINT but doesn't kill itself with > SIGINT afterward. (This seems to be true on multiple systems -- at > the very least, HP-UX and Linux pings both suffer from it.) > > A loop like this works as expected: > > while true; do > sleep 1 > done > > A loop like this does not: > > while true; do > ping -c 1 some.host # or on HP-UX, ping some.host -n 1 > done If you decide, as bash has, to allow the foreground job to determine what to do with SIGINT, you have to cope with software like ping. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
Given that the bug was introduced by Linus' patch (to fix a bug that anyway is in all shell implementations that do WCE) and that it's caused by a behaviour that seems to be specific to the Linux kernel (that the kernel seems to be messing up the order of delivery of the SIGCHLD (or return from waitpid()) and SIGINT), we may want to bring the issue up to him. Here, the behaviour could be seen as a kernel bug, since the child should clearly die *after* the SIGINT has been issued to the parent (since the ^C should insert the SIGINT in the signal queue of both parent and child processes at the same time) so it's wrong for SIGINT to be handled *after* waitpid() returns. But of course one can also argue that the order of signal delivery is not guaranteed in general anyway. IMO, the best approach would be to give up on WCE altogether which is more source of frustration anyway than it has ever helped. I live very well with a /bin/sh (dash) and interactive shell (zsh) that don't do it. WCE may be good in a perfect world where everything does it (everything that calls waitpid() without using system(3)), but if not, I hardly see the point. What's the point of bash doing it when sh, find -exec, xargs, watch, git (like in that emacs bug report) don't do it? It seems to me that finding another way to address it (like emacs' approach of putting itself on its own in a new foreground job if it's not already a process group leader) for the rare cases where it's useful (like the vi -> :! case) would be better. -- Stephane
Re: SIGINT handling
2015-09-24 09:36:08 +0100, Pádraig Brady: [...] > > (gdb) handle SIGINT nostop pass [...] > > In case it's relevant, I'm not entirely sure of gdb's signal handling: > https://sourceware.org/bugzilla/show_bug.cgi?id=18364 Yes, I wondered about that. I'd expect the "handle SIGINT nostop pass", to take gdb out of the loop, but I've not verified it and I suspect ptracing could have side effects. It's easy to corroborate with printfs though here which I just did: $ ./bash -c './a; echo x' ^Cwait_sigint_received=1 pid=-1 wait_sigint_received=1 pid=956 x $ ./bash -c './a; echo x' ^Cwait_sigint_received=1 pid=958 $ diff -pu jobs.c\~ jobs.c --- jobs.c~ 2015-09-20 20:03:14.692119372 +0100 +++ jobs.c 2015-09-24 11:49:03.963122465 +0100 @@ -3262,6 +3262,7 @@ itrace("waitchld: waitpid returns %d blo require the child to actually die due to SIGINT to act on the SIGINT we received; otherwise we assume the child handled it and let it go. */ + fprintf(stderr, "wait_sigint_received=%d pid=%d\n", wait_sigint_received, pid); if (pid < 0 && errno == EINTR && wait_sigint_received) child_caught_sigint = 1; -- Stephane
Re: SIGINT handling
On 24/09/15 07:20, Stephane Chazelas wrote: > 2015-09-24 07:01:23 +0100, Stephane Chazelas: >> 2015-09-23 21:27:00 -0400, Chet Ramey: >>> On 9/19/15 5:31 PM, Stephane Chazelas wrote: >>> In case it was caused by some Debian patch, I recompiled the code of 4.3.42 from gnu.org and the one from the devel branch on the git repository (commit bash-20150911 snapshot) and still: $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' ^Chi $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' ^Chi $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' ^C $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' ^Chi Sometimes (and the frequency of occurrences is erratic, generally roughly 80% of "hi"s but at times, I don't see a "hi" in a while), the "hi" doesn't show up. Note that I press ^C well after sleep has started. >>> >>> It would be nice to see a system call trace for this so we can check >>> what's going on with the timing. >> >> I don't have them logged but I did several tests in gdb >> with "handle SIGINT nostop pass" and as I said before, >> Upon the test that sets child_caught_sigint, waitpid() has not >> returned with EINTR and wait_sigint_received has been set. >> >> If I break on the SIGINT handler, I see the call trace at the >> return of the "syscall". >> >> I can try and get you a call trace later today. > [...] > > (gdb) handle SIGINT nostop pass > SIGINT is used by the debugger. > Are you sure you want to change it? (y or n) y > SignalStop Print Pass to program Description > SIGINTNoYes Yes Interrupt > (gdb) break wait_sigint_handler > Breakpoint 1 at 0x443a70: file jobs.c, line 2241. > (gdb) run > Starting program: bash-4.3/bash -c ./a\;\ echo\ x > ^C > Program received signal SIGINT, Interrupt. 
> > Breakpoint 1, wait_sigint_handler (sig=2) at jobs.c:2241 > 2241{ > (gdb) bt > #0 wait_sigint_handler (sig=2) at jobs.c:2241 > #1 > #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, > stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at > ../sysdeps/unix/sysv/linux/waitpid.c:31 > #3 0x00445f3d in waitchld (block=block@entry=1, wpid=5337) at > jobs.c:3224 > #4 0x0044733b in wait_for (pid=5337) at jobs.c:2485 > #5 0x00437992 in execute_command_internal > (command=command@entry=0x70bb88, asynchronous=asynchronous@entry=0, > pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, > fds_to_close=fds_to_close@entry=0x70bde8) at execute_cmd.c:829 > #6 0x00437b0e in execute_command (command=0x70bb88) at > execute_cmd.c:390 > #7 0x00435f23 in execute_connection (fds_to_close=0x70bdc8, > pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bd88) at > execute_cmd.c:2494 > #8 execute_command_internal (command=0x70bd88, > asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, > pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bdc8) > at execute_cmd.c:945 > #9 0x0047955b in parse_and_execute (string=, > from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at > evalstring.c:387 > #10 0x004205d7 in run_one_command (command=) at > shell.c:1348 > #11 0x0041f524 in main (argc=3, argv=0x7fffe258, > env=0x7fffe278) at shell.c:695 > (gdb) frame 2 > #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, > stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at > ../sysdeps/unix/sysv/linux/waitpid.c:31 > 31 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory. 
> (gdb) disassemble > Dump of assembler code for function __libc_waitpid: >0x776bc300 <+0>: mov0x2f14cd(%rip),%r9d# > 0x779ad7d4 <__libc_multiple_threads> >0x776bc307 <+7>: test %r9d,%r9d >0x776bc30a <+10>:jne0x776bc336 <__libc_waitpid+54> >0x776bc30c <+12>:xor%r10d,%r10d >0x776bc30f <+15>:movslq %edx,%rdx >0x776bc312 <+18>:movslq %edi,%rdi >0x776bc315 <+21>:mov$0x3d,%eax >0x776bc31a <+26>:syscall > => 0x776bc31c <+28>:cmp$0xf000,%rax >0x776bc322 <+34>:ja 0x776bc325 <__libc_waitpid+37> >0x776bc324 <+36>:retq >0x776bc325 <+37>:mov0x2ebb3c(%rip),%rdx# > 0x779a7e68 >0x776bc32c <+44>:neg%eax >0x776bc32e <+46>:mov%eax,%fs:(%rdx) >0x776bc331 <+49>:or $0x,%rax > (gdb) fin > Run till exit from #2 0x776bc31c in __libc_waitpid > (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, > options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 > 0x00445f3d in waitchld (block=block@entry=1, wpid=5481) at jobs.c:3224 > 3224 pid = WAITPID (-1, &status, waitpid_flags); > V
Re: SIGINT handling
2015-09-24 07:01:23 +0100, Stephane Chazelas: > 2015-09-23 21:27:00 -0400, Chet Ramey: > > On 9/19/15 5:31 PM, Stephane Chazelas wrote: > > > > > In case it was caused by some Debian patch, I recompiled the > > > code of 4.3.42 from gnu.org and the one from the devel branch on > > > the git repository (commit bash-20150911 snapshot) and still: > > > > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^Chi > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^Chi > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^C > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > ^Chi > > > > > > Sometimes (and the frequency of occurrences is erratic, > > > generally roughly 80% of "hi"s but at times, I don't see a "hi" > > > in a while), the "hi" doesn't show up. Note that I press ^C well > > > after sleep has started. > > > > It would be nice to see a system call trace for this so we can check > > what's going on with the timing. > > I don't have them logged but I did several tests in gdb > with "handle SIGINT nostop pass" and as I said before, > Upon the test that sets child_caught_sigint, waitpid() has not > returned with EINTR and wait_sigint_received has been set. > > If I break on the SIGINT handler, I see the call trace at the > return of the "syscall". > > I can try and get you a call trace later today. [...] (gdb) handle SIGINT nostop pass SIGINT is used by the debugger. Are you sure you want to change it? (y or n) y SignalStop Print Pass to program Description SIGINTNoYes Yes Interrupt (gdb) break wait_sigint_handler Breakpoint 1 at 0x443a70: file jobs.c, line 2241. (gdb) run Starting program: bash-4.3/bash -c ./a\;\ echo\ x ^C Program received signal SIGINT, Interrupt. 
Breakpoint 1, wait_sigint_handler (sig=2) at jobs.c:2241 2241{ (gdb) bt #0 wait_sigint_handler (sig=2) at jobs.c:2241 #1 #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 #3 0x00445f3d in waitchld (block=block@entry=1, wpid=5337) at jobs.c:3224 #4 0x0044733b in wait_for (pid=5337) at jobs.c:2485 #5 0x00437992 in execute_command_internal (command=command@entry=0x70bb88, asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bde8) at execute_cmd.c:829 #6 0x00437b0e in execute_command (command=0x70bb88) at execute_cmd.c:390 #7 0x00435f23 in execute_connection (fds_to_close=0x70bdc8, pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bd88) at execute_cmd.c:2494 #8 execute_command_internal (command=0x70bd88, asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bdc8) at execute_cmd.c:945 #9 0x0047955b in parse_and_execute (string=, from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at evalstring.c:387 #10 0x004205d7 in run_one_command (command=) at shell.c:1348 #11 0x0041f524 in main (argc=3, argv=0x7fffe258, env=0x7fffe278) at shell.c:695 (gdb) frame 2 #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 31 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory. 
(gdb) disassemble Dump of assembler code for function __libc_waitpid: 0x776bc300 <+0>: mov 0x2f14cd(%rip),%r9d # 0x779ad7d4 <__libc_multiple_threads> 0x776bc307 <+7>: test %r9d,%r9d 0x776bc30a <+10>: jne 0x776bc336 <__libc_waitpid+54> 0x776bc30c <+12>: xor %r10d,%r10d 0x776bc30f <+15>: movslq %edx,%rdx 0x776bc312 <+18>: movslq %edi,%rdi 0x776bc315 <+21>: mov $0x3d,%eax 0x776bc31a <+26>: syscall => 0x776bc31c <+28>: cmp $0xf000,%rax 0x776bc322 <+34>: ja 0x776bc325 <__libc_waitpid+37> 0x776bc324 <+36>: retq 0x776bc325 <+37>: mov 0x2ebb3c(%rip),%rdx # 0x779a7e68 0x776bc32c <+44>: neg %eax 0x776bc32e <+46>: mov %eax,%fs:(%rdx) 0x776bc331 <+49>: or $0x,%rax (gdb) fin Run till exit from #2 0x776bc31c in __libc_waitpid (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7fffdbc8, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 0x00445f3d in waitchld (block=block@entry=1, wpid=5481) at jobs.c:3224 3224 pid = WAITPID (-1, &status, waitpid_flags); Value returned is $5 = 5337 (gdb) p wait_sigint_received $6 = 1 In the other (working) cases, the difference is that waitpid() returns -1 EINTR instead. Note that Bart on the zsh mailing
Re: SIGINT handling
2015-09-23 21:27:00 -0400, Chet Ramey: > On 9/19/15 5:31 PM, Stephane Chazelas wrote: > > > In case it was caused by some Debian patch, I recompiled the > > code of 4.3.42 from gnu.org and the one from the devel branch on > > the git repository (commit bash-20150911 snapshot) and still: > > > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^Chi > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^Chi > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^C > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > ^Chi > > > > Sometimes (and the frequency of occurrences is erratic, > > generally roughly 80% of "hi"s but at times, I don't see a "hi" > > in a while), the "hi" doesn't show up. Note that I press ^C well > > after sleep has started. > > It would be nice to see a system call trace for this so we can check > what's going on with the timing. I don't have them logged but I did several tests in gdb with "handle SIGINT nostop pass" and as I said before, Upon the test that sets child_caught_sigint, waitpid() has not returned with EINTR and wait_sigint_received has been set. If I break on the SIGINT handler, I see the call trace at the return of the "syscall". I can try and get you a call trace later today. > > Can you reproduce this on anything other than Debian? I'm wondering > whether it's a Linux-4 kernel phenomenon. Plus I don't have any > Debian machines laying around. It's hard to reproduce on an idle system. It's relatively easy to reproduce on a busy one and if the "cmd" exits shortly after receiving its SIGINT. I can reproduce on a Ubuntu 14.04 with an older kernel (3.13). I can't reproduce on FreeBSD (in a VM though). cmd == #include main() {signal(2,_exit);pause();} $ tar zcf - / >& /dev/null & [1] 4417 $ tar zcf - / >& /dev/null & [2] 4419 $ tar zcf - / >& /dev/null & [3] 4421 $ bash -c './a.out; echo x' ^Cx $ bash -c './a.out; echo x' ^C Works on second attempt. -- Stephane
Re: SIGINT handling
On 9/20/15 11:52 AM, Stephane Chazelas wrote: > When the above code exits without printing "hi", we see this > call stack for instance (breakpoint on kill() in gdb): > > #0 kill () at ../sysdeps/unix/syscall-template.S:81 > #1 0x0045dd8e in termsig_handler (sig=) at sig.c:588 > #2 0x0045ddef in termsig_handler (sig=) at sig.c:554 > #3 0x004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539 > #4 waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316 > #5 0x0044733b in wait_for (pid=20802) at jobs.c:2485 > #6 0x00437992 in execute_command_internal > (command=command@entry=0x70aa48, asynchronous=asynchronous@entry=0, > pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, > fds_to_close=fds_to_close@entry=0x70bb68) at execute_cmd.c:829 > #7 0x00437b0e in execute_command (command=0x70aa48) at > execute_cmd.c:390 > #8 0x00435f23 in execute_connection (fds_to_close=0x70bb48, > pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bb08) at > execute_cmd.c:2494 > #9 execute_command_internal (command=0x70bb08, > asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, > pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb48) > at execute_cmd.c:945 > #10 0x0047955b in parse_and_execute (string=, > from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at > evalstring.c:387 > #11 0x004205d7 in run_one_command (command=) at > shell.c:1348 > #12 0x0041f524 in main (argc=3, argv=0x7fffe198, > env=0x7fffe1b8) at shell.c:695 > > That is, SIGINT is being handled *after* the SIGINT handler has > been restored to its default of exiting the shell. An alternate explanation is that somehow the shell is forgetting that SIGINT is trapped. I don't see how or why that would happen, but I don't have enough information to determine whether that's the case. 
> Now, I'm not sure how to best fix that as I suppose we don't get > any guarantee of when SIGINT will be delivered (it may be why > ksh93 ignores SIGINT altogether and relies solely on > WIFSIGNALED) > > The above scenario suggests SIGCHLD is being delivered before > SIGINT which is strange. I'd expect SIGINT to be inserted by the > kernel in both cmd and bash queues upon CTRL-C, and the SIGCHLD > would necessarily come after those SIGINT. Could it be that > SIGCHLD jumps the queue? The above scenario doesn't suggest that SIGCHLD is being delivered at all. The shell is doing a blocking waitpid for a specific pid. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRU c...@case.edu http://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
On 9/19/15 5:31 PM, Stephane Chazelas wrote: > In case it was caused by some Debian patch, I recompiled the > code of 4.3.42 from gnu.org and the one from the devel branch on > the git repository (commit bash-20150911 snapshot) and still: > > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^Chi > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^Chi > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^C > $ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > ^Chi > > Sometimes (and the frequency of occurrences is erratic, > generally roughly 80% of "hi"s but at times, I don't see a "hi" > in a while), the "hi" doesn't show up. Note that I press ^C well > after sleep has started. It would be nice to see a system call trace for this so we can check what's going on with the timing. Can you reproduce this on anything other than Debian? I'm wondering whether it's a Linux-4 kernel phenomenon. Plus I don't have any Debian machines laying around. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
2015-09-22 12:04:45 -0600, Bob Proulx: > Greg Wooledge wrote: > > Just for the record, ping is the *classic* example of an incorrectly > > written application that traps SIGINT but doesn't kill itself with > > SIGINT afterward. (This seems to be true on multiple systems -- at > > the very least, HP-UX and Linux pings both suffer from it.) > > The command I run into the problem most with is 'rsync' in a loop. > > EXIT VALUES >0 Success > ... >20 Received SIGUSR1 or SIGINT > > Which forces me to write such things this way. > > rsync ... > rc=$? > if [ $rc -eq 20 ]; then > kill -INT $$ > fi > if [ $rc -ne 0 ]; then > echo "Error: failed: ..." 1>&2 > exit 1 > fi [...] Another (generic) work-around as mentioned at http://unix.stackexchange.com/a/230568 and here is to add: trap ' trap - INT kill -s INT "$$" ' INT That doesn't work properly if there are subshells though. That basically turns a WCE shell to WUE (for very simple scripts). For SIGQUIT, you'd probably want to disable core dumps as well: trap ' trap - QUIT ulimit -c 0 kill -s QUIT "$$" ' QUIT -- Stephane
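Editor's illustration, a sketch under assumptions rather than part of the original message: one plausible reason the trap above misbehaves with subshells is that "$$" keeps the PID of the main shell even inside a subshell, so kill -s INT "$$" issued there targets the parent script and not the subshell itself. The variable names main_pid, dollar_pid and real_pid are invented for this example; BASHPID is a bash-specific variable.

```shell
#!/bin/bash
# Ask a child bash for three numbers:
#   1. $$ in its main shell
#   2. $$ as seen inside a command-substitution subshell
#   3. $BASHPID inside that same subshell (the subshell's real PID)
pids=$(bash -c 'echo "$$ $(echo "$$ $BASHPID")"')
set -- $pids
main_pid=$1
dollar_pid=$2
real_pid=$3

echo "main shell PID:        $main_pid"
echo "\$\$ inside subshell:    $dollar_pid"
echo "BASHPID inside subshell: $real_pid"
```

In bash, a subshell that wants to signal itself would use "$BASHPID" instead of "$$".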
Re: SIGINT handling
Greg Wooledge wrote: > Just for the record, ping is the *classic* example of an incorrectly > written application that traps SIGINT but doesn't kill itself with > SIGINT afterward. (This seems to be true on multiple systems -- at > the very least, HP-UX and Linux pings both suffer from it.) The command I run into the problem most with is 'rsync' in a loop. EXIT VALUES 0 Success ... 20 Received SIGUSR1 or SIGINT Which forces me to write such things this way. rsync ... rc=$? if [ $rc -eq 20 ]; then kill -INT $$ fi if [ $rc -ne 0 ]; then echo "Error: failed: ..." 1>&2 exit 1 fi Bob
Re: SIGINT handling
2015-09-22 15:18:32 +0100, Stephane Chazelas: > 2015-09-22 09:41:35 -0400, Chet Ramey: > [...] > > > AFAICT emacs starts a new process group (and makes it the > > > foreground process group). > > > > Maybe, if it's being run from an interactive shell or in a separate > > X window. On the other hand, run this script with `dash': > [...] > > It does that unconditionaly (since 94 at least), but that's > under a #ifdef BSD_PGRPS in the emacs source. Strangely enough, > that BSD_PGRPS is not defined anymore for freebsd or netbsd > though it is for gnu-linux [...] To add on that, the code was removed at some point altogether http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=58eb6cf0f77547d29f4fddca922eb6f98c0ffb28 in emacs-24.0.96 and then added back without the #ifdef BSD_PGRPS http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=322aea6ddf7ec7fd71410d98ec1de69f219aff3e in emacs-24.2.90 So versions 24.0.96 to 24.2 must have been broken under gnu-linux as well, and newer versions (24.2.90 and above) should be OK including on FreeBSD|OS/X (so no need to report it as a bug to the emacs maintainers). -- Stephane
Re: SIGINT handling
On 9/22/15 11:28 AM, Stephane Chazelas wrote: > 2015-09-22 15:18:32 +0100, Stephane Chazelas: >> 2015-09-22 09:41:35 -0400, Chet Ramey: >> [...] AFAICT emacs starts a new process group (and makes it the foreground process group). >>> >>> Maybe, if it's being run from an interactive shell or in a separate >>> X window. On the other hand, run this script with `dash': >> [...] >> >> It does that unconditionaly (since 94 at least), but that's >> under a #ifdef BSD_PGRPS in the emacs source. Strangely enough, >> that BSD_PGRPS is not defined anymore for freebsd or netbsd >> though it is for gnu-linux > [...] > > To add on that, the code was removed at some point altogether > http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=58eb6cf0f77547d29f4fddca922eb6f98c0ffb28 > in emacs-24.0.96 and then added back without the #ifdef > BSD_PGRPS > http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=322aea6ddf7ec7fd71410d98ec1de69f219aff3e > in emacs-24.2.90 > > So versions 24.0.96 to 24.2 must have been broken under > gnu-linux as well, and newer versions (24.2.90 and above) should > be OK including on FreeBSD|OS/X (so no need to report it as a > bug to the emacs maintainers). I don't use GNU emacs; it's not that big a deal. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
2015-09-22 16:28:16 +0100, Stephane Chazelas: [...] > To add on that, the code was removed at some point altogether > http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=58eb6cf0f77547d29f4fddca922eb6f98c0ffb28 > in emacs-24.0.96 and then added back without the #ifdef > BSD_PGRPS > http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=322aea6ddf7ec7fd71410d98ec1de69f219aff3e > in emacs-24.2.90 [...] And here's the bug that prompted for reinserting that code, which is relevant to this discussion: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=12697 -- Stephane
Re: SIGINT handling
2015-09-22 09:41:35 -0400, Chet Ramey: [...] > > AFAICT emacs starts a new process group (and makes it the > > foreground process group). > > Maybe, if it's being run from an interactive shell or in a separate > X window. On the other hand, run this script with `dash': [...] It does that unconditionally (since 94 at least), but that's under a #ifdef BSD_PGRPS in the emacs source. Strangely enough, that BSD_PGRPS is not defined anymore for freebsd or netbsd though it is for gnu-linux. It seems it's because the meaning of that macro has changed over time. I suspect it used to mean "whether job control was available", but now it's to decide whether to use setpgrp or setpgid. The part that puts emacs on its own foreground process group (narrow_foreground_group) does use setpgrp() (and calls tcsetpgrp()) but after a: #ifdef HAVE_SETPGID #if !defined (USG) || defined (BSD_PGRPS) #undef setpgrp #define setpgrp setpgid #endif #endif So in any case, it is calling setpgid(). It just seems like a bug/oversight that narrow_foreground_group is not done on BSD, which causes the problem you observe. -- Stephane
Re: SIGINT handling
2015-09-22 09:41:35 -0400, Chet Ramey: [...] > > AFAICT emacs starts a new process group (and makes it the > > foreground process group). > > Maybe, if it's being run from an interactive shell or in a separate > X window. On the other hand, run this script with `dash': > > echo before > emacs -nw /tmp/qux > echo after > > If you use ^G to abort an editing command in emacs, you won't see `after' > displayed and the script will exit with status 130, even though emacs > clearly doesn't die due to SIGINT. [...] It works for me (on Debian, displays both before and after) as emacs starts in a new process group. The problem seems to be with some ports of emacs to OS/X and was already discussed at http://www.zsh.org/mla/workers/2009/msg00926.html about the MacPorts version of Emacs that doesn't seem to be starting the new process group. -- Stephane
Re: SIGINT handling
On 9/21/15 5:24 PM, Stephane Chazelas wrote: > 2015-09-21 15:34:28 -0400, Chet Ramey: >> On 9/21/15 5:48 AM, Stephane Chazelas wrote: >> >>> I'm not sure I prefer that WCE approach over WUE. Wouldn't it be >>> preferable that applications that intercept SIGINT/QUIT/TSTP for >>> anything other than clean-up before exit/suspend implement job >>> control themselves instead (like vi's :! should create a process >>> group and make that the foreground process group of the >>> terminal so pressing ^C in sh -c vi, :!sleep 10, only sends the >>> SIGINT to sleep)? >> >> The classic example is emacs remapping the terminal intr key to ^G >> and using SIGINT as its internal abort-command signal. > [...] > > AFAICT emacs starts a new process group (and makes it the > foreground process group). Maybe, if it's being run from an interactive shell or in a separate X window. On the other hand, run this script with `dash': echo before emacs -nw /tmp/qux echo after If you use ^G to abort an editing command in emacs, you won't see `after' displayed and the script will exit with status 130, even though emacs clearly doesn't die due to SIGINT. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
2015-09-22 08:18:08 -0400, Greg Wooledge: [...] > You might already have been aware of this; I'm not sure. But in any case, > it makes a tremendous difference what "cmd" is in your example. You > can't generalize it. Hi Greg, Yes, this whole thread is about the behaviour of non-interactive bash with commands that call exit() upon SIGINT. It was initially a follow-up on https://unix.stackexchange.com/questions/230421/unable-to-stop-a-bash-script-with-ctrlc/230568#230568 which was about ping specifically. It's true that with shells implementing WCE, the behaviour of ping is unfortunate, but I don't think we can say that ping is to blame, more WCE. ping cannot exit other than on error or when killed. It seems reasonable for it to exit (after printing the statistics) if there was no error upon CTRL-C. Note that the iputils version does an exit(!nreceived || (deadline && nreceived < npackets)); It is returning information to the caller which it couldn't do if it killed itself. That allows system("ping something") for instance to make use of the return status (system(3) ignores SIGINT in the parent). The WCE behaviour is the cause of a number of bugs like that so I'm not sure it's such a great idea. -- Stephane
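Editor's illustration of the exit expression quoted above, as a small sketch: the shell function ping_exit_status is a hypothetical helper, not part of iputils, that evaluates the same boolean expression with shell arithmetic so the resulting status can be checked for a few sample inputs. The argument order (nreceived, deadline, npackets) is an assumption made for this example.

```shell
#!/bin/bash
# Hypothetical helper mirroring the quoted C expression:
#   exit(!nreceived || (deadline && nreceived < npackets));
ping_exit_status() {
    local nreceived=$1 deadline=$2 npackets=$3
    echo $(( !nreceived || (deadline && nreceived < npackets) ))
}

ping_exit_status 0 0 5     # no replies at all: prints 1
ping_exit_status 3 0 5     # some replies, no deadline: prints 0
ping_exit_status 3 10 5    # deadline set, fewer replies than requested: prints 1
ping_exit_status 5 10 5    # deadline set, all replies received: prints 0
```

The point made in the message is that this status carries information ("did anything come back?") that would be lost if ping re-raised SIGINT on itself.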
Re: SIGINT handling
On Mon, Sep 21, 2015 at 10:07:55PM +0100, Stephane Chazelas wrote: > Maybe the test scenario was not clear: > > bash -c 'cmd; echo hi' > > is run from an interactive shell, cmd is a long running > application (the problem that sparked this discussion was with > ping and I showed examples with an inline-script calling sleep) Just for the record, ping is the *classic* example of an incorrectly written application that traps SIGINT but doesn't kill itself with SIGINT afterward. (This seems to be true on multiple systems -- at the very least, HP-UX and Linux pings both suffer from it.) A loop like this works as expected: while true; do sleep 1 done A loop like this does not: while true; do ping -c 1 some.host # or on HP-UX, ping some.host -n 1 done You might already have been aware of this; I'm not sure. But in any case, it makes a tremendous difference what "cmd" is in your example. You can't generalize it.
Re: SIGINT handling
2015-09-22 07:41:09 +0100, Stephane Chazelas: [...] > I wonder how FreeBSD sh addresses that. > > BTW, ksh93 has the problem (the 2011 one) as well as in: > > ksh93 -c 'while :; do /bin/true; done' > > Sometimes is not interrupted by the first ^C. (same with bash > with my patch applied). [...] Looks like FreeBSD sh doesn't address it either, ^C also fails to interrupt at times there as well. -- Stephane
Re: SIGINT handling
2015-09-21 22:07:55 +0100, Stephane Chazelas: [...] > Can you please clarify why the check for EINTR was needed? > > What do you suggest we do to fix that issue? [...] > The thing is that thread was about the opposite problem at the > other end of the spectrum so we need to find the right way to do > it so we don't cause one problem or the other. [...] OK, I get it now, that other thread was about a totally different scenario where ^C is pressed in between waitpid() returning for a normal exit and bash restoring the normal handler for SIGINT, which explains the check for EINTR, which is intended as a race-free check that SIGINT was received before the child died. Now, that check for EINTR is wrong as well as it introduces that other bug, so it could very well be that the only thing we can do is reduce that window above to a minimum or give up on WCE. Unless there's a clever thing that can be done in SIGCHLD and SIGINT handlers. I wonder how FreeBSD sh addresses that. BTW, ksh93 has the problem (the 2011 one) as well as in: ksh93 -c 'while :; do /bin/true; done' Sometimes is not interrupted by the first ^C. (same with bash with my patch applied). Note that the WCE/WUE was discussed in 2009 on the zsh mailing list: http://www.zsh.org/mla/workers/2009/msg00943.html where the order of delivery for SIGCHLD and SIGINT was already noted. It looks like the zsh maintainers are no big fans of WCE either. -- Stephane
Re: SIGINT handling
2015-09-21 22:24:03 +0100, Stephane Chazelas:
[...]
> If it didn't, we could not use it in scripts of shells that
> don't do WCE *but also in non-shell scripts* (perl, python,
> ruby...) or non-scripts.
[...]

For completeness: perl's and python's system(), like system(3), ignore
SIGINT in the calling process, so that's "WUNE" (wait and
unconditionally not exit).

python's subprocess.call() does "IUE" (for "immediate unconditional
exit").

--
Stephane
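A rough way to approximate the "WUNE" behaviour from a shell script is to install a do-nothing SIGINT handler around the command. This is only a sketch: run_wune is a hypothetical helper, it catches SIGINT rather than truly ignoring it as system(3) does, and it assumes the script has no INT trap of its own worth preserving.

```sh
#!/bin/sh
# Sketch only: run_wune is a hypothetical helper, not part of any shell.
run_wune() {
    # A do-nothing handler. Unlike trap "" INT, a *caught* signal is
    # reset to its default disposition in the command being run, so the
    # child can still be interrupted while this shell carries on.
    trap : INT
    "$@"
    rc=$?           # plain sh has no "local"; rc is a global here
    trap - INT      # note: this discards any INT trap set earlier
    return "$rc"
}

run_wune sh -c 'exit 3' || echo "child exited with status $?"
```

With this wrapper a ^C during the command should still reach the child, while the calling script waits for it and then continues with whatever status the child reported.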
Re: SIGINT handling
2015-09-21 15:34:28 -0400, Chet Ramey:
> On 9/21/15 5:48 AM, Stephane Chazelas wrote:
>
> > I'm not sure I prefer that WCE approach over WUE. Wouldn't it be
> > preferable that applications that intercept SIGINT/QUIT/TSTP for
> > anything other than clean-up before exit/suspend implement job
> > control themselves instead (like vi's :! should create a process
> > group and make that the foreground process group of the
> > terminal so pressing ^C in sh -c vi, :!sleep 10, only sends the
> > SIGINT to sleep)?
>
> The classic example is emacs remapping the terminal intr key to ^G
> and using SIGINT as its internal abort-command signal.
[...]

AFAICT emacs starts a new process group (and makes it the foreground
process group).

UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
chazelas 12232  5595 12232 12232  0 15:00 pts/13   00:00:00 /bin/zsh
chazelas 13609 12232 13609 12232  0 22:14 pts/13   00:00:00 sh -c emacs; echo test
chazelas 13610 13609 13610 12232  0 22:14 pts/13   00:00:00 emacs

From strace:

13766 setpgid(0, 0)                     = 0
13766 ioctl(3, TIOCSPGRP, [13766])      = 0

If it didn't, we could not use it in scripts of shells that don't do
WCE *but also in non-shell scripts* (perl, python, ruby...) or
non-scripts.

A real-life problem though is things like:

    sh -c 'vi; echo hi'

where, if you run :!sleep 10 and interrupt it with Ctrl-C, the
"echo hi" is not run in shells that don't do WCE (and non-shell
scripts and non-scripts that don't do it either).

--
Stephane
Re: SIGINT handling
2015-09-21 15:04:46 -0400, Chet Ramey: > On 9/20/15 3:45 PM, Stephane Chazelas wrote: > > 2015-09-20 17:12:45 +0100, Stephane Chazelas: > > [...] > >> I thought the termsig_handler was being invoked upon SIGINT as > >> the SIGINT handler, but it is being called explicitely by > >> set_job_status_and_cleanup so the problem is elsewhere. > >> > >> child_caught_sigint is 0 while if I understand correctly it > >> should be 1 for a cmd that calls exit() upon SIGINT. So that's > >> probably probably where we should be looking. > > [...] > > > > I had another look. > > > > If we're to beleive gdb, child_caught_sigint is 0 because > > waitpid() returns without EINTR even though wait_sigint_received > > is 1. > > > > The only reasonable explanation I can think of is that the child > > handles its SIGINT first, exits which updates its state and > > causes bash the parent to be scheduled, and waitpid() returns > > (without EINT) and after that bash's SIGINT handler kicks in too > > late. > > Absent kernel problems, there are four scenarios for the child process > reacting to SIGINT: > > 1. The SIGINT arrives before the child begins executing. > > 2. The SIGINT arrives while the child is executing. > > 3. The SIGINT arrives while the child is exiting successfully. > > 4. The SIGINT arrives after the child has exited but before the > parent's waitpid() returns. > > In the first two cases, the shell's waitpid() should return -1, but the > first case will probably return ECHILD while the second returns EINTR. > In the third case, there's not really anything the shell can do, since > there's nothing to distinguish that case from one where the child catches > SIGINT and exits successfully, and your patch doesn't change things. > The fourth case will, in practice, be indistinguishable from the third > case, since the kernel is usually `greedy' and will not return EINTR if > there is something to report. 
The problem is that here the parent's SIGINT handler is run upon the
return from waitpid(), just after.

My patch doesn't rely on EINTR from waitpid() (which doesn't happen
here: waitpid() returns with the pid of the child that did an exit()
upon receiving SIGINT), just on the "status" returned by the child, so
it doesn't have the problem.

There would still be a problem if SIGINT was handled even later (after
we test for it), but I could not reproduce that. Given that the SIGCHLD
should come before SIGINT, it would seem reasonable to assume SIGINT
should be handled at the latest upon the return of waitpid().

Can you please clarify why the check for EINTR was needed?

What do you suggest we do to fix that issue?

> In all these cases, I assume that bash has called waitchld() and
> waiting_for_child == 1. If it's not, the signal handler treats the
> signal as it would normally, if it were not waiting for a child to
> exit.

Maybe the test scenario was not clear:

    bash -c 'cmd; echo hi'

is run from an interactive shell; cmd is a long-running application
(the problem that sparked this discussion was with ping, and I showed
examples with an inline script calling sleep) that has a handler on
SIGINT that calls exit().

Upon pressing ^C a few seconds after starting that, so at a time where
bash is doing waitpid() and cmd is doing something like sleep(), the
tty line discipline sends SIGINT to the foreground process group, so
to both bash and cmd.

Now, the whole problem is caused by cmd calling exit() straight upon
receiving that SIGINT. So everything happens at the same time.

In the case where "hi" is not output, we have the events in this
order:

- SIGINT is sent to bash and cmd
- cmd handles its SIGINT and calls exit()
- bash's waitpid() returns without being interrupted, with the
  child's status being 0 (in any case not WIFSIGNALED() with SIGINT).
- Straight upon return of that syscall (gdb shows a call trace for the
  SIGINT handler in the __waitpid() libc wrapper), bash's SIGINT
  handler is executed.
- we return from the handler in __waitpid(), and then:

      if (pid < 0 && errno == EINTR && wait_sigint_received)
        child_caught_sigint = 1;

  because waitpid() did *not* return with EINTR, we have
  child_caught_sigint = 0 (even though the child clearly caught SIGINT
  as it returned with WIFEXITED), so even though wait_sigint_received
  is 1, bash makes the wrong decision, decides the child did not catch
  SIGINT and kills itself with SIGINT (and so doesn't run echo hi).

Things that don't look right in the code either are things like where
those conditions are asserted and tested and the handler set and reset,
but then again I don't have the full picture.

> > Anyway, this patch makes the problem go away for me (and
> > addresses my problem #2 about exit code 130 not being treated
> > as an interrupted child). It might break things though if there
> > was a real reason for bash to check for waitpid()'s EINTR.
>
> You should read
>
> http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00088.html
>
> for a
Re: SIGINT handling
On 9/21/15 5:48 AM, Stephane Chazelas wrote:

> I'm not sure I prefer that WCE approach over WUE. Wouldn't it be
> preferable that applications that intercept SIGINT/QUIT/TSTP for
> anything other than clean-up before exit/suspend implement job
> control themselves instead (like vi's :! should create a process
> group and make that the foreground process group of the
> terminal so pressing ^C in sh -c vi, :!sleep 10, only sends the
> SIGINT to sleep)?

The classic example is emacs remapping the terminal intr key to ^G
and using SIGINT as its internal abort-command signal.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
On 9/20/15 3:45 PM, Stephane Chazelas wrote:
> 2015-09-20 17:12:45 +0100, Stephane Chazelas:
> [...]
>> I thought the termsig_handler was being invoked upon SIGINT as
>> the SIGINT handler, but it is being called explicitly by
>> set_job_status_and_cleanup so the problem is elsewhere.
>>
>> child_caught_sigint is 0 while if I understand correctly it
>> should be 1 for a cmd that calls exit() upon SIGINT. So that's
>> probably where we should be looking.
> [...]
>
> I had another look.
>
> If we're to believe gdb, child_caught_sigint is 0 because
> waitpid() returns without EINTR even though wait_sigint_received
> is 1.
>
> The only reasonable explanation I can think of is that the child
> handles its SIGINT first, exits which updates its state and
> causes bash the parent to be scheduled, and waitpid() returns
> (without EINTR) and after that bash's SIGINT handler kicks in too
> late.

Absent kernel problems, there are four scenarios for the child process
reacting to SIGINT:

1. The SIGINT arrives before the child begins executing.

2. The SIGINT arrives while the child is executing.

3. The SIGINT arrives while the child is exiting successfully.

4. The SIGINT arrives after the child has exited but before the
   parent's waitpid() returns.

In the first two cases, the shell's waitpid() should return -1, but the
first case will probably return ECHILD while the second returns EINTR.
In the third case, there's not really anything the shell can do, since
there's nothing to distinguish that case from one where the child catches
SIGINT and exits successfully, and your patch doesn't change things.
The fourth case will, in practice, be indistinguishable from the third
case, since the kernel is usually `greedy' and will not return EINTR if
there is something to report.

In all these cases, I assume that bash has called waitchld() and
waiting_for_child == 1. If it's not, the signal handler treats the
signal as it would normally, if it were not waiting for a child to
exit.
> > Anyway, this patch makes the problem go away for me (and > addresses my problem #2 about exit code 130 not being treated > as an interrupted child). It might break things though if there > was a real reason for bash to check for waitpid()'s EINTR. You should read http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00088.html for a summary of why the test for waitpid() returning -1/EINTR exists. Linus's posts, at least the ones where there's more light than heat, are good reading. > With that patch applied, > > ./bash -c 'sh -c "trap exit INT; sleep 120; :"; echo hi' > ./bash -c 'mksh -c "sleep 120; :"; echo hi' > > Does *not* output "hi" (as mksh or sh do a exit(130) which is > regarded as them being "interrupted by that SIGINT", or at least > reporting that the child they want to report the status of > (sleep) has been killed by a SIGINT). This still counts as catching and handling the SIGINT, and the shell should not act as if the foreground process died as a result of one. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
On Mon, Sep 21, 2015 at 10:48:07AM +0100, Stephane Chazelas wrote: > 2015-09-19 21:36:28 +0100, Stephane Chazelas: > > 2015-09-18 16:14:39 +0100, Stephane Chazelas: > > [...] > > > In: > > > > > > bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > > > > > If I press Ctrl-C, I still see "hi". > > [...] > > > > Jilles provided with the explanation at > > http://unix.stackexchange.com/a/230731 > > with a link to: > > http://www.cons.org/cracauer/sigint.html > [...] > Note that bash (and ksh, contrary to FreeBSD sh) is not > consistent in its handling of that "WCE" (for "wait and > cooperative exit") approach in that pressing ^C in: > bash -c ' > var=$(sh -c "trap \"\" INT; sleep 3; echo result) > echo "$var" > ' > kills bash, leaving the "sh" and "sleep" running unattended in > background. > Same for: > bash -O lastpipe -c ' > sh -c "trap \"\" INT; sleep 3; echo test" | read var; echo done' > One could also argue, that to be consistent, SIGTSTP and SIGQUIT > should be treated similarly (strangely enough > http://www.cons.org/cracauer/sigint.html doesn't mention SIGTSTP). Agreed for SIGQUIT, but not for SIGTSTP. For SIGTSTP, either the shell has job control enabled or it does not. If it does, SIGTSTP stops the job and continues the shell; if it does not, SIGTSTP stops the whole job including the shell. > I'm not sure I prefer that WCE approach over WUE. Wouldn't it be > preferable that applications that intercept SIGINT/QUIT/TSTP for > anything other than clean-up before exit/suspend implement job > control themselves instead (like vi's :! should create a process > group and make that the foreground process group of the > terminal so pressing ^C in sh -c vi, :!sleep 10, only sends the > SIGINT to sleep)? This kind of job control manipulation is very hard to get right in the general case. 
FreeBSD's su does it, and it needed various iterations to fix hanging processes or unexpected logouts, some of which only occur when the application is started from certain shells. Also, it is not possible to fix generally cases like su SOMEUSER -c 'while sleep 0.1; do echo @@@; done' | less where there are other processes in the same process group as the one doing job control manipulations. If su changes the tty's foreground process group, it will prevent less from reconfiguring terminal modes. -- Jilles Tjoelker
Re: SIGINT handling
2015-09-21 17:35:36 +0200, Jilles Tjoelker:
[...]
> This kind of job control manipulation is very hard to get right in the
> general case. FreeBSD's su does it, and it needed various iterations to
> fix hanging processes or unexpected logouts, some of which only occur
> when the application is started from certain shells.
>
> Also, it is not possible to fix generally cases like
> su SOMEUSER -c 'while sleep 0.1; do echo @@@; done' | less
> where there are other processes in the same process group as the one
> doing job control manipulations. If su changes the tty's foreground
> process group, it will prevent less from reconfiguring terminal modes.
[...]

What was the rationale for adding that to "su"? I'd have expected job
control to be only done by interactive applications.

--
Stephane
Re: SIGINT handling
2015-09-21 17:35:36 +0200, Jilles Tjoelker:
[...]
> > One could also argue, that to be consistent, SIGTSTP and SIGQUIT
> > should be treated similarly (strangely enough
> > http://www.cons.org/cracauer/sigint.html doesn't mention SIGTSTP).
>
> Agreed for SIGQUIT, but not for SIGTSTP. For SIGTSTP, either the shell
> has job control enabled or it does not. If it does, SIGTSTP stops the
> job and continues the shell; if it does not, SIGTSTP stops the whole job
> including the shell.
[...]

Not sure what you mean; we may not be talking of the same thing.

What I meant: in

    sh -c '(trap "" INT; sleep 10); echo done'

if you send ^C, nothing happens.

In:

    sh -c '(trap "" TSTP; sleep 10); echo done'

if you press ^Z, the "sh" is suspended, but "sleep" keeps running in
background.

One could argue they should be treated the same (that sh shouldn't
suspend itself if the process it's currently waiting for has not been
suspended, just like for ^C it should not die of SIGINT if the process
it's currently waiting for has not died of SIGINT).

You may be talking of:

    sh -mc 'sleep 10; echo "$?"'

where SIGINT upon ^C, or SIGTSTP upon ^Z, is sent to sleep only
(not sh).

--
Stephane
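The first example above works the way it does because an ignored signal stays ignored in everything the subshell starts. A minimal sketch that shows this without a terminal; the function name is made up for illustration, and the child sends itself SIGINT in place of a ^C from the tty:

```sh
#!/bin/sh
# Sketch only: the function name is hypothetical.
ignored_int_is_inherited() {
    (
        trap "" INT
        # The child inherits the ignored disposition, so the SIGINT it
        # sends to itself is discarded and the echo still runs.
        sh -c 'kill -INT "$$"; echo "child survived SIGINT"'
    )
}

ignored_int_is_inherited
```

Since the child never dies of SIGINT, a shell following the "wait and cooperative exit" approach has no reason to exit either, which is why nothing appears to happen on ^C.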
Re: SIGINT handling
2015-09-19 21:36:28 +0100, Stephane Chazelas:
> 2015-09-18 16:14:39 +0100, Stephane Chazelas:
> [...]
> > In:
> >
> > bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
> >
> > If I press Ctrl-C, I still see "hi".
> [...]
>
> Jilles provided with the explanation at
> http://unix.stackexchange.com/a/230731
>
> with a link to:
> http://www.cons.org/cracauer/sigint.html
[...]

Note that bash (and ksh, contrary to FreeBSD sh) is not consistent in
its handling of that "WCE" (for "wait and cooperative exit") approach,
in that pressing ^C in:

    bash -c '
      var=$(sh -c "trap \"\" INT; sleep 3; echo result")
      echo "$var"
    '

kills bash, leaving the "sh" and "sleep" running unattended in
background.

Same for:

    bash -O lastpipe -c '
      sh -c "trap \"\" INT; sleep 3; echo test" | read var; echo done'

One could also argue that, to be consistent, SIGTSTP and SIGQUIT
should be treated similarly (strangely enough,
http://www.cons.org/cracauer/sigint.html doesn't mention SIGTSTP).

I'm not sure I prefer that WCE approach over WUE. Wouldn't it be
preferable that applications that intercept SIGINT/QUIT/TSTP for
anything other than clean-up before exit/suspend implement job
control themselves instead (like vi's :! should create a process
group and make that the foreground process group of the terminal, so
pressing ^C in sh -c vi, :!sleep 10, only sends the SIGINT to sleep)?

--
Stephane
Re: SIGINT handling
2015-09-20 17:12:45 +0100, Stephane Chazelas:
[...]
> I thought the termsig_handler was being invoked upon SIGINT as
> the SIGINT handler, but it is being called explicitly by
> set_job_status_and_cleanup so the problem is elsewhere.
>
> child_caught_sigint is 0 while if I understand correctly it
> should be 1 for a cmd that calls exit() upon SIGINT. So that's
> probably where we should be looking.
[...]

I had another look.

If we're to believe gdb, child_caught_sigint is 0 because waitpid()
returns without EINTR even though wait_sigint_received is 1.

The only reasonable explanation I can think of is that the child
handles its SIGINT first and exits, which updates its state and causes
bash, the parent, to be scheduled; waitpid() then returns (without
EINTR), and only after that does bash's SIGINT handler kick in, too
late.

Anyway, this patch makes the problem go away for me (and addresses my
problem #2 about exit code 130 not being treated as an interrupted
child). It might break things though if there was a real reason for
bash to check for waitpid()'s EINTR.

With that patch applied,

    ./bash -c 'sh -c "trap exit INT; sleep 120; :"; echo hi'
    ./bash -c 'mksh -c "sleep 120; :"; echo hi'

do *not* output "hi" (as mksh or sh do an exit(130), which is regarded
as them being "interrupted by that SIGINT", or at least reporting that
the child they want to report the status of (sleep) has been killed by
a SIGINT).

And

    ./bash -c 'sh -c "trap exit\ 0 INT; sleep 120; :"; echo hi'

*consistently* outputs "hi" (the zero exit status cancels the aborting
of bash).

--- jobs.c~	2015-09-20 20:03:14.692119372 +0100
+++ jobs.c	2015-09-20 20:37:01.510892045 +0100
@@ -3257,21 +3257,15 @@ itrace("waitchld: waitpid returns %d blo
       CHECK_TERMSIG;
       CHECK_WAIT_INTR;
 
-      /* If waitpid returns -1/EINTR and the shell saw a SIGINT, then we
-	 assume the child has blocked or handled SIGINT.  In that case, we
-	 require the child to actually die due to SIGINT to act on the
-	 SIGINT we received; otherwise we assume the child handled it and
-	 let it go. */
-      if (pid < 0 && errno == EINTR && wait_sigint_received)
-	child_caught_sigint = 1;
-
       if (pid <= 0)
 	continue;	/* jumps right to the test */
 
-      /* If the child process did die due to SIGINT, forget our assumption
-	 that it caught or otherwise handled it. */
-      if (WIFSIGNALED (status) && WTERMSIG (status) == SIGINT)
-        child_caught_sigint = 0;
+      /* If we received a SIGINT, but the child did not die of a SIGINT and
+         did not report a 128+SIGINT exit status, we assume the child handled
+         it and let it go. */
+      child_caught_sigint = wait_sigint_received &&
+        ! ((WIFSIGNALED (status) && WTERMSIG (status) == SIGINT) ||
+           (WIFEXITED (status) && WEXITSTATUS (status) == 128 + SIGINT));
 
       /* children_exited is used to run traps on SIGCHLD.  We don't want to
          run the trap if a process is just being continued. */

--
Stephane
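To compare the two rules side by side, here is a toy model in sh of the decision before and after the patch. It is a simplification based only on the code and description in this thread, not on the rest of jobs.c: the function names, the "sig:N" / "exit:N" encoding of a wait status and the assumption that SIGINT is signal 2 are all made up for illustration.

```sh
#!/bin/sh
# Toy model only; every name below is hypothetical.
# A wait status is written "sig:N" (killed by signal N) or "exit:N".
SIGINT_NUM=2    # assumption: SIGINT is signal 2, as on common systems

# old_rule EINTR SIGINT_RECEIVED STATUS
# EINTR is 1 if waitpid() returned -1/EINTR before the child was reaped.
old_rule() {
    caught=0
    if [ "$1" -eq 1 ] && [ "$2" -eq 1 ]; then caught=1; fi
    if [ "$3" = "sig:$SIGINT_NUM" ]; then caught=0; fi
    if [ "$2" -eq 1 ] && [ "$caught" -eq 0 ]; then
        echo "shell exits"
    else
        echo "shell continues"
    fi
}

# patched_rule SIGINT_RECEIVED STATUS
patched_rule() {
    caught=0
    if [ "$1" -eq 1 ]; then
        case $2 in
            "sig:$SIGINT_NUM" | "exit:$((128 + SIGINT_NUM))") caught=0 ;;
            *) caught=1 ;;
        esac
    fi
    if [ "$1" -eq 1 ] && [ "$caught" -eq 0 ]; then
        echo "shell exits"
    else
        echo "shell continues"
    fi
}

old_rule 0 1 exit:0      # the race: no EINTR, child exited normally
patched_rule 1 exit:0    # same child status under the patched rule
```

In this model the old rule depends on whether waitpid() happened to be interrupted, while the patched rule depends only on the child's status, at the price of treating an exit status of 130 as an interruption.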
Re: SIGINT handling
[...] > When the above code exits without printing "hi", we see this > call stack for instance (breakpoint on kill() in gdb): > > #0 kill () at ../sysdeps/unix/syscall-template.S:81 > #1 0x0045dd8e in termsig_handler (sig=) at sig.c:588 > #2 0x0045ddef in termsig_handler (sig=) at sig.c:554 > #3 0x004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539 > #4 waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316 > #5 0x0044733b in wait_for (pid=20802) at jobs.c:2485 > #6 0x00437992 in execute_command_internal > (command=command@entry=0x70aa48, asynchronous=asynchronous@entry=0, > pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1, > fds_to_close=fds_to_close@entry=0x70bb68) at execute_cmd.c:829 > #7 0x00437b0e in execute_command (command=0x70aa48) at > execute_cmd.c:390 > #8 0x00435f23 in execute_connection (fds_to_close=0x70bb48, > pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bb08) at > execute_cmd.c:2494 > #9 execute_command_internal (command=0x70bb08, > asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1, > pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb48) > at execute_cmd.c:945 > #10 0x0047955b in parse_and_execute (string=, > from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at > evalstring.c:387 > #11 0x004205d7 in run_one_command (command=) at > shell.c:1348 > #12 0x0041f524 in main (argc=3, argv=0x7fffe198, > env=0x7fffe1b8) at shell.c:695 > > That is, SIGINT is being handled *after* the SIGINT handler has > been restored to its default of exiting the shell. [...] Sorry, please disregard that. I thought the termsig_handler was being invoked upon SIGINT as the SIGINT handler, but it is being called explicitely by set_job_status_and_cleanup so the problem is elsewhere. child_caught_sigint is 0 while if I understand correctly it should be 1 for a cmd that calls exit() upon SIGINT. So that's probably probably where we should be looking. -- Stephane
Re: SIGINT handling
2015-09-19 21:28:24 -0400, Chet Ramey: > On 9/19/15 5:31 PM, Stephane Chazelas wrote: > > 2015-09-19 16:42:28 -0400, Chet Ramey: > > [...] > >> I'm surprised you've managed to avoid the dozen or so discussions on the > >> topic. > >> > >> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html > > [...] > > > > Thanks for the links. I still think the comments on the second > > article I sent > > (http://thread.gmane.org/gmane.comp.shells.bash.bugs/24178/focus=24183) > > still hold though and from a quick read I don't see those points > > being mentioned in the past discussions (but that was a quick > > read). > > > > I notice that you mention the race conditions have been fixed, > > but I'm still seeing some non-deterministic behaviour. > > I can't reproduce this on Mac OS X and RHEL 6 and 7, the systems I have > readily available today. > > The shell notes when it sees SIGINT and whether or not waitpid returns > -1/EINTR. If the sleep exits due to SIGINT, even after the waitpid > returns -1, the shell assumes it didn't catch and handle the SIGINT and > the shell calls the trap handler. [...] To clarify, In bash -c 'sh -c "trap exit INT; sleep 99; :"; echo hi' The command under test is "bash", not "sh". The "sh" is just there as a cmd that does exit() upon receiving SIGINT. It's just: bash -c 'cmd; echo hi' You can replace "cmd" with: perl -e '$SIG{INT}= sub{exit}; sleep' (or mksh -c 'sleep 10; :' (which does an exit(130) upon receiving SIGINT)) The problem here is that when you press CTRL-C, SIGINT is sent to all the processes in the process group, so to "bash" and "cmd". Now, bash works as expected only if it handles its own SIGINT before the child has caught its own one and exited. 
When the above code exits without printing "hi", we see this call
stack for instance (breakpoint on kill() in gdb):

#0  kill () at ../sysdeps/unix/syscall-template.S:81
#1  0x0045dd8e in termsig_handler (sig=) at sig.c:588
#2  0x0045ddef in termsig_handler (sig=) at sig.c:554
#3  0x004466bb in set_job_status_and_cleanup (job=0) at jobs.c:3539
#4  waitchld (block=block@entry=1, wpid=20802) at jobs.c:3316
#5  0x0044733b in wait_for (pid=20802) at jobs.c:2485
#6  0x00437992 in execute_command_internal
    (command=command@entry=0x70aa48, asynchronous=asynchronous@entry=0,
    pipe_in=pipe_in@entry=-1, pipe_out=pipe_out@entry=-1,
    fds_to_close=fds_to_close@entry=0x70bb68) at execute_cmd.c:829
#7  0x00437b0e in execute_command (command=0x70aa48) at
    execute_cmd.c:390
#8  0x00435f23 in execute_connection (fds_to_close=0x70bb48,
    pipe_out=-1, pipe_in=-1, asynchronous=0, command=0x70bb08) at
    execute_cmd.c:2494
#9  execute_command_internal (command=0x70bb08,
    asynchronous=asynchronous@entry=0, pipe_in=pipe_in@entry=-1,
    pipe_out=pipe_out@entry=-1, fds_to_close=fds_to_close@entry=0x70bb48)
    at execute_cmd.c:945
#10 0x0047955b in parse_and_execute (string=,
    from_file=from_file@entry=0x4b5f96 "-c", flags=flags@entry=4) at
    evalstring.c:387
#11 0x004205d7 in run_one_command (command=) at shell.c:1348
#12 0x0041f524 in main (argc=3, argv=0x7fffe198,
    env=0x7fffe1b8) at shell.c:695

That is, SIGINT is being handled *after* the SIGINT handler has been
restored to its default of exiting the shell.

Now, I'm not sure how to best fix that, as I suppose we don't get any
guarantee of when SIGINT will be delivered (it may be why ksh93
ignores SIGINT altogether and relies solely on WIFSIGNALED).

The above scenario suggests SIGCHLD is being delivered before SIGINT,
which is strange. I'd expect SIGINT to be inserted by the kernel in
both the cmd and bash queues upon CTRL-C, and the SIGCHLD would
necessarily come after those SIGINTs. Could it be that SIGCHLD jumps
the queue?
Note that I'm not seeing that as often on every system. It seems I can make it more likely by making the system busier. -- Stephane
Re: SIGINT handling
On 9/19/15 5:31 PM, Stephane Chazelas wrote: > 2015-09-19 16:42:28 -0400, Chet Ramey: > [...] >> I'm surprised you've managed to avoid the dozen or so discussions on the >> topic. >> >> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html > [...] > > Thanks for the links. I still think the comments on the second > article I sent > (http://thread.gmane.org/gmane.comp.shells.bash.bugs/24178/focus=24183) > still hold though and from a quick read I don't see those points > being mentioned in the past discussions (but that was a quick > read). > > I notice that you mention the race conditions have been fixed, > but I'm still seeing some non-deterministic behaviour. I can't reproduce this on Mac OS X and RHEL 6 and 7, the systems I have readily available today. The shell notes when it sees SIGINT and whether or not waitpid returns -1/EINTR. If the sleep exits due to SIGINT, even after the waitpid returns -1, the shell assumes it didn't catch and handle the SIGINT and the shell calls the trap handler. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
2015-09-19 16:42:28 -0400, Chet Ramey:
[...]
> I'm surprised you've managed to avoid the dozen or so discussions on the
> topic.
>
> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html
[...]

Thanks for the links. I still think the comments on the second
article I sent
(http://thread.gmane.org/gmane.comp.shells.bash.bugs/24178/focus=24183)
still hold though, and from a quick read I don't see those points
being mentioned in the past discussions (but that was a quick read).

I notice that you mention the race conditions have been fixed, but
I'm still seeing some non-deterministic behaviour.

In case it was caused by some Debian patch, I recompiled the code of
4.3.42 from gnu.org and the one from the devel branch on the git
repository (commit bash-20150911 snapshot) and still:

$ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
^Chi
$ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
^Chi
$ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
^C
$ ./bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
^Chi

Sometimes the "hi" doesn't show up. The frequency of occurrences is
erratic: generally roughly 80% of runs print "hi", but at times I
don't see a "hi" in a while.

Note that I press ^C well after sleep has started.

On Linux 4.1.0-1-amd64 core2 duo, bash compiled with
gcc (Debian 5.2.1-16) 5.2.1 20150903, linked with:

GNU C Library (Debian GLIBC 2.19-19) stable release version 2.19, by Roland McGrath et al.
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.5.
Compiled on a Linux 4.0.7 system on 2015-07-09.
Available extensions:
	crypt add-on version 2.1 by Michael Glad and others
	GNU Libidn by Simon Josefsson
	Native POSIX Threads Library by Ulrich Drepper et al
	BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC

--
Stephane
Re: SIGINT handling
On 9/18/15 11:14 AM, Stephane Chazelas wrote: > Hello. > > In: > > bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi' > > If I press Ctrl-C, I still see "hi". > > On Solaris with 4.1.11(2)-release (i386-pc-solaris2.11), that > seems to be consistent. > > On Debian with 4.3.42(1)-release (x86_64-pc-linux-gnu), that > seems to happen only in something like 80% of the time. > > For bash to exit upon receiving that SIGINT, the currently > running process has to die itself as well of SIGINT (or the > currently running command to be builtin). > > That sounds like a bad idea, especially considering that it > doesn't exit either if the process returns with exit code 130 > upon receiving that SIGINT. For instance: > > For instance, in: > > bash -c 'mksh -c "sleep 10; :"; echo hi' > > Upon pressing Ctrl-C, mksh handles the SIGINT and exits with > 130 (as opposed to dying of a SIGINT), so bash doesn't exit > (sometimes only on Debian). I'm surprised you've managed to avoid the dozen or so discussions on the topic. http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: SIGINT handling
2015-09-18 16:14:39 +0100, Stephane Chazelas:
[...]
> In:
>
> bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'
>
> If I press Ctrl-C, I still see "hi".
[...]

Jilles provided the explanation at
http://unix.stackexchange.com/a/230731
with a link to:
http://www.cons.org/cracauer/sigint.html

Which makes sense.

Now, IMO a few things could be improved:

1. It would be nice if it could be clearly documented.

2. If the shell received SIGINT, then I'd argue the currently running
   process returning with a "status" such that

       WIFEXITED(status) && WEXITSTATUS(status) == SIGINT + 0200

   should be another case where bash (and AT&T ksh and FreeBSD sh)
   should exit as well (by killing themselves with SIGINT or
   exit(SIGINT + 0200)). That's my:

   > That sounds like a bad idea, especially considering that it
   > doesn't exit either if the process returns with exit code 130
   > upon receiving that SIGINT. For instance:
   >
   > For instance, in:
   >
   > bash -c 'mksh -c "sleep 10; :"; echo hi'
   >
   > Upon pressing Ctrl-C, mksh handles the SIGINT and exits with
   > 130 (as opposed to dying of a SIGINT), so bash doesn't exit
   > (sometimes only on Debian).

3. There still seems to be a bug in bash in that

   > On Debian with 4.3.42(1)-release (x86_64-pc-linux-gnu), that
   > seems to happen only in something like 80% of the time.

Cheers,
Stephane
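Part of what makes point 2 tempting is that, from a script's point of view, the two outcomes are already indistinguishable: a child killed by SIGINT and a child that calls exit(SIGINT + 0200) both show up as the same `$?`. A small sketch of that, where status_of is a hypothetical helper and the child sends itself SIGINT in place of a real ^C:

```sh
#!/bin/sh
# Sketch only: status_of is a hypothetical helper.
status_of() {
    # Print the exit status of the command given as arguments.
    "$@" && echo 0 || echo "$?"
}

killed=$(status_of sh -c 'kill -INT "$$"')   # child dies of SIGINT
exited=$(status_of sh -c 'exit 130')         # child exits with 128 + 2
echo "killed by SIGINT: $killed, exit 130: $exited"
kill -l "$killed"                            # status back to a signal name
```

Only the raw wait status seen by the parent's waitpid() (WIFSIGNALED versus WIFEXITED) tells the two apart, and that is the level at which the check discussed in this thread operates.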
SIGINT handling
Hello.

In:

    bash -c 'sh -c "trap exit INT; sleep 10; :"; echo hi'

If I press Ctrl-C, I still see "hi".

On Solaris with 4.1.11(2)-release (i386-pc-solaris2.11), that seems
to be consistent.

On Debian with 4.3.42(1)-release (x86_64-pc-linux-gnu), that seems to
happen only something like 80% of the time.

For bash to exit upon receiving that SIGINT, the currently running
process has to die of SIGINT itself as well (or the currently running
command has to be a builtin).

That sounds like a bad idea, especially considering that it doesn't
exit either if the process returns with exit code 130 upon receiving
that SIGINT.

For instance, in:

    bash -c 'mksh -c "sleep 10; :"; echo hi'

Upon pressing Ctrl-C, mksh handles the SIGINT and exits with 130 (as
opposed to dying of a SIGINT), so bash doesn't exit (sometimes only on
Debian).

ksh93 seems to be doing something similar (even worse):
http://unix.stackexchange.com/a/230568/22565

Why? What's the rationale behind that? It seems it's not documented
and contradicts the documentation.

--
Stephane