Re: Possible race condition in bash

2021-02-03 Thread Chet Ramey

On 2/3/21 7:54 AM, Nikolay Borisov wrote:


Have you had a chance to look into this?


There is a fix in the devel branch.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Possible race condition in bash

2021-02-03 Thread Nikolay Borisov



On 21.11.20 г. 23:15 ч., Chet Ramey wrote:
> On 11/21/20 2:32 PM, Andreas Schwab wrote:
>> On Nov 21 2020, Chet Ramey wrote:
>>
>>> but since the shell always ignores SIGTERM,
>>
>> Even a non-interactive shell?
> 
> No, you're right, it's SIGQUIT the shell always ignores. It catches SIGTERM
> if there is an EXIT trap so it can run the trap on EXIT before exiting.
> 
>>> I'd also try it with one of the bash-5.1 testing releases, since I did
>>> some reworking of how the shell handles SIGTERM back in April.
>>
>> 5.1-rc3 has the same issue.
> 
> OK, I instrumented bash a little bit, and discovered that the second
> child doesn't start running, or at least doesn't get to the point in the
> subshell initialization where it resets the signal handlers, until after
> the parent sends SIGTERM.
> 
> That means the trap handler is still set to catch SIGTERM by the time
> the signal arrives. However, the child is not allowed to run the trap
> handler; it's supposed to reset the dispositions to the default. This
> is the race condition.
> 
> It's a dilemma. I could block SIGTERM (all trapped signals, really) until
> the child resets the signal dispositions, but that seems like a long time
> to keep signals blocked. I'll look at it some more after bash-5.1 is
> released.

Hello,

Have you had a chance to look into this?

> 
> Chet
> 



Re: Possible race condition in bash

2020-11-21 Thread Daniel Colascione

On 11/21/20 1:15 PM, Chet Ramey wrote:

On 11/21/20 2:32 PM, Andreas Schwab wrote:

On Nov 21 2020, Chet Ramey wrote:


but since the shell always ignores SIGTERM,


Even a non-interactive shell?


No, you're right, it's SIGQUIT the shell always ignores. It catches SIGTERM
if there is an EXIT trap so it can run the trap on EXIT before exiting.


I'd also try it with one of the bash-5.1 testing releases, since I did
some reworking of how the shell handles SIGTERM back in April.


5.1-rc3 has the same issue.


OK, I instrumented bash a little bit, and discovered that the second
child doesn't start running, or at least doesn't get to the point in the
subshell initialization where it resets the signal handlers, until after
the parent sends SIGTERM.

That means the trap handler is still set to catch SIGTERM by the time
the signal arrives. However, the child is not allowed to run the trap
handler; it's supposed to reset the dispositions to the default. This
is the race condition.

It's a dilemma. I could block SIGTERM (all trapped signals, really) until
the child resets the signal dispositions, but that seems like a long time
to keep signals blocked. I'll look at it some more after bash-5.1 is
released.


Plenty of other programs keep signals blocked until reset. I don't see 
the problem with the delay: if the SIGTERM is ignored, the delay between 
SIGTERM and kill is infinite. :-) Since the delay between process-kill 
and actual process death can be arbitrarily long anyway (that's the 
contract with the kernel), I don't see a problem with the blocking.




Re: Possible race condition in bash

2020-11-21 Thread Chet Ramey
On 11/21/20 2:32 PM, Andreas Schwab wrote:
> On Nov 21 2020, Chet Ramey wrote:
> 
>> but since the shell always ignores SIGTERM,
> 
> Even a non-interactive shell?

No, you're right, it's SIGQUIT the shell always ignores. It catches SIGTERM
if there is an EXIT trap so it can run the trap on EXIT before exiting.

>> I'd also try it with one of the bash-5.1 testing releases, since I did
>> some reworking of how the shell handles SIGTERM back in April.
> 
> 5.1-rc3 has the same issue.

OK, I instrumented bash a little bit, and discovered that the second
child doesn't start running, or at least doesn't get to the point in the
subshell initialization where it resets the signal handlers, until after
the parent sends SIGTERM.

That means the trap handler is still set to catch SIGTERM by the time
the signal arrives. However, the child is not allowed to run the trap
handler; it's supposed to reset the dispositions to the default. This
is the race condition.

It's a dilemma. I could block SIGTERM (all trapped signals, really) until
the child resets the signal dispositions, but that seems like a long time
to keep signals blocked. I'll look at it some more after bash-5.1 is
released.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Possible race condition in bash

2020-11-21 Thread Andreas Schwab
On Nov 21 2020, Chet Ramey wrote:

> but since the shell always ignores SIGTERM,

Even a non-interactive shell?

> I'd also try it with one of the bash-5.1 testing releases, since I did
> some reworking of how the shell handles SIGTERM back in April.

5.1-rc3 has the same issue.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Possible race condition in bash

2020-11-21 Thread Nikolay Borisov



On 21.11.20 г. 20:50 ч., Chet Ramey wrote:
> On 11/21/20 1:35 PM, Nikolay Borisov wrote:
> 
>> I can see setting of SIGTERM handler for both 2 subshells _after_ receiving 
>> the signal. What exactly should I be looking at?
> 
> That's your race condition.
> 

So the kernel initializes the signal struct of the child in copy_sighand which 
does: 


memcpy(sig->action, current->sighand->action, sizeof(sig->action));

At this point the child has the same handlers as the ones in the parent. 
As per the posix deifnition of how the environment is initialized those signals 
shall be restored to SIG_DFL. HOwever I can see that the signals are being 
delivered 
to the children _before_ they reset signals to SIG_DFL: 

12394 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} 
---
12393 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} 
---

 
12394 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12393 rt_sigprocmask(SIG_SETMASK, [],   
12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigprocmask resumed> NULL, 8) = 0 
12394 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12393 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12394 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12393 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12394 rt_sigaction(SIGHUP, {sa_handler=0x55b5e4479d80, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0},   
12393 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12394 <... rt_sigaction resumed> {sa_handler=0x55b5e4476400, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12394 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0},  
{sa_handler=0x55b5e4476400, sa_mask=[], sa_flags=SA_RESTORER, 
sa_restorer=0x7f8c881fdfd0}, 8) = 0
12394 rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 



So shouldn't the child execute the trap of the parent in this case?



Re: Possible race condition in bash

2020-11-21 Thread Nikolay Borisov



On 21.11.20 г. 20:50 ч., Chet Ramey wrote:
> On 11/21/20 1:35 PM, Nikolay Borisov wrote:
> 
>> I can see setting of SIGTERM handler for both 2 subshells _after_ receiving 
>> the signal. What exactly should I be looking at?
> 
> That's your race condition.
> 

But shouldn't the default environment not ignore this signal? ALso what
would be the correct way to synchronize ?



Re: Possible race condition in bash

2020-11-21 Thread Chet Ramey
On 11/21/20 1:35 PM, Nikolay Borisov wrote:

> I can see setting of SIGTERM handler for both 2 subshells _after_ receiving 
> the signal. What exactly should I be looking at?

That's your race condition.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Possible race condition in bash

2020-11-21 Thread Chet Ramey
On 11/21/20 1:25 PM, Andreas Schwab wrote:
> On Nov 21 2020, Chet Ramey wrote:
> 
>> This is a potential race condition, but it's not with the shell.
> 
> The race with the trap command in the function is obvious, but that
> should not result in the signal to be ignored.  There should always be a
> trap active, either the main one or the one in the function, and neither
> ignores the signal.

No.

"A subshell environment shall be created as a duplicate of the shell
environment, except that signal traps that are not being ignored shall be
set to the default action."

In this case, bash resets the signal disposition to the what it had when
the shell started. That should be SIG_DFL, but since the shell always
ignores SIGTERM, it's not clear from the description here what the signal
handler gets set to. That's why I suggested looking at strace output.

I'd also try it with one of the bash-5.1 testing releases, since I did
some reworking of how the shell handles SIGTERM back in April.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Possible race condition in bash

2020-11-21 Thread Nikolay Borisov



On 21.11.20 г. 20:09 ч., Chet Ramey wrote:
> On 11/21/20 3:06 AM, Nikolay Borisov wrote:
> 
>> The output is:
>>
>> my pid 12186
>> 12186 subfun xxx 0
>> funpid=12186 twopid=12187 mypid=12185
>> killing 12186 12187
>> waiting on everything
>> my pid 12187
>> 12187 subfun yyy 0
>> received term for xxx
>> 12187 subfun yyy 1
>> 12187 subfun yyy 2
>> 12187 subfun yyy 3
>>
>>
>> and 12187 keeps printing. It would seem there $! can return a pid of the
>> subshell before it's in a state that it can be killed. 
> 
> This is a potential race condition, but it's not with the shell. The
> parent shell sets $! as soon as the child pid returns from fork(), and
> that doesn't have anything to do with when the child gets scheduled or
> runs. You might have better luck running this under strace or truss and
> seeing what happens with the timing: whether the child sets SIGTERM to
> the trap handler before the signal arrives, or whether the child resets
> the SIGTERM handler to what the parent had before the trap before the
> signal arrives.

So I straced the execution of a run which hangs (keeps printing) and here is 
what I see 
(full log can be found at https://termbin.com/umhnv): 


12392 write(1, "funpid=12393 twopid=12394 mypid="..., 38) = 38  
12392 write(1, "killing 12393 12394\n", 20) = 20
12392 kill(12393, SIGTERM)  = 0 
12392 kill(12394, SIGTERM)  = 0 
12392 write(1, "waiting on everything\n", 22) = 22   

So 12392 is the pid of the main  shell which spawns 2 subshells 
with pids 12393 and 12394, sends SIGTERM to them. 
And they seem  to receive the signal but ignore it: 


2392 wait4(-1, 
12394 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} 
---
12393 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=12392, si_uid=0} 
---
12394 rt_sigreturn({mask=[INT CHLD]}
12393 rt_sigreturn({mask=[INT CHLD]}
12394 <... rt_sigreturn resumed> )  = 0 
12393 <... rt_sigreturn resumed> )  = 0 
12394 getpid()  = 12394 
12393 getpid(   
12394 close(255 
12393 <... getpid resumed> )= 12393 
12394 <... close resumed> ) = 0 
12394 rt_sigprocmask(SIG_SETMASK, [],   
12393 close(255 
12394 <... rt_sigprocmask resumed> NULL, 8) = 0 
12393 <... close resumed> ) = 0 
12394 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12393 rt_sigprocmask(SIG_SETMASK, [],   
12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigprocmask resumed> NULL, 8) = 0 
12394 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12393 rt_sigaction(SIGTSTP, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12394 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12393 rt_sigaction(SIGTTIN, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12394 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12394 rt_sigaction(SIGHUP, {sa_handler=0x55b5e4479d80, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0},   
12393 rt_sigaction(SIGTTOU, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 
12394 <... rt_sigaction resumed> {sa_handler=0x55b5e4476400, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12393 <... rt_sigaction resumed> {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0}, 8) = 0
12394 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER, sa_restorer=0x7f8c881fdfd0},  
{sa_handler=0x55b5e4476400, sa

Re: Possible race condition in bash

2020-11-21 Thread Andreas Schwab
On Nov 21 2020, Chet Ramey wrote:

> This is a potential race condition, but it's not with the shell.

The race with the trap command in the function is obvious, but that
should not result in the signal to be ignored.  There should always be a
trap active, either the main one or the one in the function, and neither
ignores the signal.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Possible race condition in bash

2020-11-21 Thread Chet Ramey
On 11/21/20 3:06 AM, Nikolay Borisov wrote:

> The output is:
> 
> my pid 12186
> 12186 subfun xxx 0
> funpid=12186 twopid=12187 mypid=12185
> killing 12186 12187
> waiting on everything
> my pid 12187
> 12187 subfun yyy 0
> received term for xxx
> 12187 subfun yyy 1
> 12187 subfun yyy 2
> 12187 subfun yyy 3
> 
> 
> and 12187 keeps printing. It would seem there $! can return a pid of the
> subshell before it's in a state that it can be killed. 

This is a potential race condition, but it's not with the shell. The
parent shell sets $! as soon as the child pid returns from fork(), and
that doesn't have anything to do with when the child gets scheduled or
runs. You might have better luck running this under strace or truss and
seeing what happens with the timing: whether the child sets SIGTERM to
the trap handler before the signal arrives, or whether the child resets
the SIGTERM handler to what the parent had before the trap before the
signal arrives.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Possible race condition in bash

2020-11-21 Thread Nikolay Borisov
Hello,


It's possible to have the following script never terminate:

#!/bin/bash



subfun() {

trap "echo received term for $*;exit 1" SIGTERM

echo my pid $BASHPID

i=0

while true; do

echo $BASHPID subfun $* $i

i=$((i+1))

sleep 5

done

}

trap 'echo main; exit 1' 0 1 2 3 15


subfun xxx &

funpid=$!

subfun yyy &

twopid=$!

echo funpid=$funpid twopid=$twopid mypid=$$

echo killing $funpid $twopid

kill $funpid $twopid

echo waiting on everything

wait

echo done

The output is:

my pid 12186
12186 subfun xxx 0
funpid=12186 twopid=12187 mypid=12185
killing 12186 12187
waiting on everything
my pid 12187
12187 subfun yyy 0
received term for xxx
12187 subfun yyy 1
12187 subfun yyy 2
12187 subfun yyy 3


and 12187 keeps printing. It would seem there $! can return a pid of the
subshell before it's in a state that it can be killed. This has to do
with setting of the trap handler, because if the trap handler of the
main shell is changed to : trap 'echo main; exit 1' EXIT - Then it never
hangs. I.e the more trap handlers that subshells have to reset to their
defaults the higher the chances this race condition is hit. Also if I
change the sleep in the subshell function to anything below 5 I also
seem to be unable to hit it. Also sometimes when it doesn't hang I get
the following output:

my pid 12283
12283 subfun xxx 0
funpid=12283 twopid=12284 mypid=12282
killing 12283 12284
waiting on everything
received term for xxx
done
main

It misses a "Received term for yyy" line.


I reproduced this behavior both with version 4.4.20(1)-release (that
comes with ubuntu 18.04) as well as 5.0.18(1)-release built from source.
Any ideas what might be causing it?