Re: Possible race condition in bash
On 2/3/21 7:54 AM, Nikolay Borisov wrote: Have you had a chance to look into this? There is a fix in the devel branch. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: Possible race condition in bash
On 21.11.20 г. 23:15 ч., Chet Ramey wrote: > On 11/21/20 2:32 PM, Andreas Schwab wrote: >> On Nov 21 2020, Chet Ramey wrote: >> >>> but since the shell always ignores SIGTERM, >> >> Even a non-interactive shell? > > No, you're right, it's SIGQUIT the shell always ignores. It catches SIGTERM > if there is an EXIT trap so it can run the trap on EXIT before exiting. > >>> I'd also try it with one of the bash-5.1 testing releases, since I did >>> some reworking of how the shell handles SIGTERM back in April. >> >> 5.1-rc3 has the same issue. > > OK, I instrumented bash a little bit, and discovered that the second > child doesn't start running, or at least doesn't get to the point in the > subshell initialization where it resets the signal handlers, until after > the parent sends SIGTERM. > > That means the trap handler is still set to catch SIGTERM by the time > the signal arrives. However, the child is not allowed to run the trap > handler; it's supposed to reset the dispositions to the default. This > is the race condition. > > It's a dilemma. I could block SIGTERM (all trapped signals, really) until > the child resets the signal dispositions, but that seems like a long time > to keep signals blocked. I'll look at it some more after bash-5.1 is > released. Hello, Have you had a chance to look into this? > > Chet >
Re: Possible race condition in bash
On 11/21/20 1:15 PM, Chet Ramey wrote: On 11/21/20 2:32 PM, Andreas Schwab wrote: On Nov 21 2020, Chet Ramey wrote: but since the shell always ignores SIGTERM, Even a non-interactive shell? No, you're right, it's SIGQUIT the shell always ignores. It catches SIGTERM if there is an EXIT trap so it can run the trap on EXIT before exiting. I'd also try it with one of the bash-5.1 testing releases, since I did some reworking of how the shell handles SIGTERM back in April. 5.1-rc3 has the same issue. OK, I instrumented bash a little bit, and discovered that the second child doesn't start running, or at least doesn't get to the point in the subshell initialization where it resets the signal handlers, until after the parent sends SIGTERM. That means the trap handler is still set to catch SIGTERM by the time the signal arrives. However, the child is not allowed to run the trap handler; it's supposed to reset the dispositions to the default. This is the race condition. It's a dilemma. I could block SIGTERM (all trapped signals, really) until the child resets the signal dispositions, but that seems like a long time to keep signals blocked. I'll look at it some more after bash-5.1 is released. Plenty of other programs keep signals blocked until reset. I don't see the problem with the delay: if the SIGTERM is ignored, the delay between SIGTERM and kill is infinite. :-) Since the delay between process-kill and actual process death can be arbitrarily long anyway (that's the contract with the kernel), I don't see a problem with the blocking.
Re: Possible race condition in bash
On 11/21/20 2:32 PM, Andreas Schwab wrote: > On Nov 21 2020, Chet Ramey wrote: > >> but since the shell always ignores SIGTERM, > > Even a non-interactive shell? No, you're right, it's SIGQUIT the shell always ignores. It catches SIGTERM if there is an EXIT trap so it can run the trap on EXIT before exiting. >> I'd also try it with one of the bash-5.1 testing releases, since I did >> some reworking of how the shell handles SIGTERM back in April. > > 5.1-rc3 has the same issue. OK, I instrumented bash a little bit, and discovered that the second child doesn't start running, or at least doesn't get to the point in the subshell initialization where it resets the signal handlers, until after the parent sends SIGTERM. That means the trap handler is still set to catch SIGTERM by the time the signal arrives. However, the child is not allowed to run the trap handler; it's supposed to reset the dispositions to the default. This is the race condition. It's a dilemma. I could block SIGTERM (all trapped signals, really) until the child resets the signal dispositions, but that seems like a long time to keep signals blocked. I'll look at it some more after bash-5.1 is released. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: Possible race condition in bash
On Nov 21 2020, Chet Ramey wrote: > but since the shell always ignores SIGTERM, Even a non-interactive shell? > I'd also try it with one of the bash-5.1 testing releases, since I did > some reworking of how the shell handles SIGTERM back in April. 5.1-rc3 has the same issue. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: Possible race condition in bash
On 21.11.20 г. 20:50 ч., Chet Ramey wrote: > On 11/21/20 1:35 PM, Nikolay Borisov wrote: > >> I can see setting of SIGTERM handler for both 2 subshells _after_ receiving >> the signal. What exactly should I be looking at? > > That's your race condition. > So the kernel initializes the signal struct of the child in copy_sighand which does: memcpy(sig->action, current->sighand->action, sizeof(sig->action)); At this point the child has the same handlers as the ones in the parent. As per the posix deifnition of how the environment is initialized those signals shall be restored to SIG_DFL. HOwever I can see that the signals are being delivered to the children _before_ they reset signals to SIG_DFL: 12394 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} --- 12393 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} --- 12394 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigprocmask(SIG_SETMASK, [], 12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigprocmask resumed> NULL, 8) = 0 12394 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12394 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12394 rt_sigaction(SIGHUP, {sa_handler=0x55b5e4479d80, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12394 <... rt_sigaction resumed> {sa_handler=0x55b5e4476400, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12394 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, {sa_handler=0x55b5e4476400, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12394 rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, So shouldn't the child execute the trap of the parent in this case?
Re: Possible race condition in bash
On 21.11.20 г. 20:50 ч., Chet Ramey wrote: > On 11/21/20 1:35 PM, Nikolay Borisov wrote: > >> I can see setting of SIGTERM handler for both 2 subshells _after_ receiving >> the signal. What exactly should I be looking at? > > That's your race condition. > But shouldn't the default environment not ignore this signal? ALso what would be the correct way to synchronize ?
Re: Possible race condition in bash
On 11/21/20 1:35 PM, Nikolay Borisov wrote: > I can see setting of SIGTERM handler for both 2 subshells _after_ receiving > the signal. What exactly should I be looking at? That's your race condition. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: Possible race condition in bash
On 11/21/20 1:25 PM, Andreas Schwab wrote: > On Nov 21 2020, Chet Ramey wrote: > >> This is a potential race condition, but it's not with the shell. > > The race with the trap command in the function is obvious, but that > should not result in the signal to be ignored. There should always be a > trap active, either the main one or the one in the function, and neither > ignores the signal. No. "A subshell environment shall be created as a duplicate of the shell environment, except that signal traps that are not being ignored shall be set to the default action." In this case, bash resets the signal disposition to the what it had when the shell started. That should be SIG_DFL, but since the shell always ignores SIGTERM, it's not clear from the description here what the signal handler gets set to. That's why I suggested looking at strace output. I'd also try it with one of the bash-5.1 testing releases, since I did some reworking of how the shell handles SIGTERM back in April. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Re: Possible race condition in bash
On 21.11.20 г. 20:09 ч., Chet Ramey wrote: > On 11/21/20 3:06 AM, Nikolay Borisov wrote: > >> The output is: >> >> my pid 12186 >> 12186 subfun xxx 0 >> funpid=12186 twopid=12187 mypid=12185 >> killing 12186 12187 >> waiting on everything >> my pid 12187 >> 12187 subfun yyy 0 >> received term for xxx >> 12187 subfun yyy 1 >> 12187 subfun yyy 2 >> 12187 subfun yyy 3 >> >> >> and 12187 keeps printing. It would seem there $! can return a pid of the >> subshell before it's in a state that it can be killed. > > This is a potential race condition, but it's not with the shell. The > parent shell sets $! as soon as the child pid returns from fork(), and > that doesn't have anything to do with when the child gets scheduled or > runs. You might have better luck running this under strace or truss and > seeing what happens with the timing: whether the child sets SIGTERM to > the trap handler before the signal arrives, or whether the child resets > the SIGTERM handler to what the parent had before the trap before the > signal arrives. So I straced the execution of a run which hangs (keeps printing) and here is what I see (full log can be found at https://termbin.com/umhnv): 12392 write(1, "funpid=12393 twopid=12394 mypid="..., 38) = 38 12392 write(1, "killing 12393 12394\n", 20) = 20 12392 kill(12393, SIGTERM) = 0 12392 kill(12394, SIGTERM) = 0 12392 write(1, "waiting on everything\n", 22) = 22 So 12392 is the pid of the main shell which spawns 2 subshells with pids 12393 and 12394, sends SIGTERM to them. And they seem to receive the signal but ignore it: 2392 wait4(-1, 12394 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} --- 12393 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} --- 12394 rt_sigreturn({mask=[INT CHLD]} 12393 rt_sigreturn({mask=[INT CHLD]} 12394 <... rt_sigreturn resumed> ) = 0 12393 <... rt_sigreturn resumed> ) = 0 12394 getpid() = 12394 12393 getpid( 12394 close(255 12393 <... getpid resumed> )= 12393 12394 <... close resumed> ) = 0 12394 rt_sigprocmask(SIG_SETMASK, [], 12393 close(255 12394 <... rt_sigprocmask resumed> NULL, 8) = 0 12393 <... close resumed> ) = 0 12394 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigprocmask(SIG_SETMASK, [], 12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigprocmask resumed> NULL, 8) = 0 12394 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12394 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12394 rt_sigaction(SIGHUP, {sa_handler=0x55b5e4479d80, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12393 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 12394 <... rt_sigaction resumed> {sa_handler=0x55b5e4476400, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0 12394 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, {sa_handler=0x55b5e4476400, sa
Re: Possible race condition in bash
On Nov 21 2020, Chet Ramey wrote: > This is a potential race condition, but it's not with the shell. The race with the trap command in the function is obvious, but that should not result in the signal to be ignored. There should always be a trap active, either the main one or the one in the function, and neither ignores the signal. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: Possible race condition in bash
On 11/21/20 3:06 AM, Nikolay Borisov wrote: > The output is: > > my pid 12186 > 12186 subfun xxx 0 > funpid=12186 twopid=12187 mypid=12185 > killing 12186 12187 > waiting on everything > my pid 12187 > 12187 subfun yyy 0 > received term for xxx > 12187 subfun yyy 1 > 12187 subfun yyy 2 > 12187 subfun yyy 3 > > > and 12187 keeps printing. It would seem there $! can return a pid of the > subshell before it's in a state that it can be killed. This is a potential race condition, but it's not with the shell. The parent shell sets $! as soon as the child pid returns from fork(), and that doesn't have anything to do with when the child gets scheduled or runs. You might have better luck running this under strace or truss and seeing what happens with the timing: whether the child sets SIGTERM to the trap handler before the signal arrives, or whether the child resets the SIGTERM handler to what the parent had before the trap before the signal arrives. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/
Possible race condition in bash
Hello, It's possible to have the following script never terminate: #!/bin/bash subfun() { trap "echo received term for $*;exit 1" SIGTERM echo my pid $BASHPID i=0 while true; do echo $BASHPID subfun $* $i i=$((i+1)) sleep 5 done } trap 'echo main; exit 1' 0 1 2 3 15 subfun xxx & funpid=$! subfun yyy & twopid=$! echo funpid=$funpid twopid=$twopid mypid=$$ echo killing $funpid $twopid kill $funpid $twopid echo waiting on everything wait echo done The output is: my pid 12186 12186 subfun xxx 0 funpid=12186 twopid=12187 mypid=12185 killing 12186 12187 waiting on everything my pid 12187 12187 subfun yyy 0 received term for xxx 12187 subfun yyy 1 12187 subfun yyy 2 12187 subfun yyy 3 and 12187 keeps printing. It would seem there $! can return a pid of the subshell before it's in a state that it can be killed. This has to do with setting of the trap handler, because if the trap handler of the main shell is changed to : trap 'echo main; exit 1' EXIT - Then it never hangs. I.e the more trap handlers that subshells have to reset to their defaults the higher the chances this race condition is hit. Also if I change the sleep in the subshell function to anything below 5 I also seem to be unable to hit it. Also sometimes when it doesn't hang I get the following output: my pid 12283 12283 subfun xxx 0 funpid=12283 twopid=12284 mypid=12282 killing 12283 12284 waiting on everything received term for xxx done main It misses a "Received term for yyy" line. I reproduced this behavior both with version 4.4.20(1)-release (that comes with ubuntu 18.04) as well as 5.0.18(1)-release built from source. Any ideas what might be causing it?