Hi all,

just in case anybody faces this problem at some point...

I found the solution, along with a set of good examples, at
http://mywiki.wooledge.org/SignalTrap

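Applied to my problem (full code in the quoted mail below), it comes down to
running

"sh son.sh & wait $!"

instead of plain "sh son.sh" in the father script. While bash is waiting for
a foreground command, it defers any trap until that command finishes; the
"wait" builtin, on the other hand, returns as soon as a trapped signal
arrives, so the trap runs right away.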
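
For anyone who wants to try the pattern outside Slurm, here is a minimal,
self-contained sketch. The stand-ins are my own assumptions and not part of
the original scripts: "sleep 30" plays the role of son.sh, and the delayed
"kill -TERM $$" plays the role of scancel. It sticks to plain POSIX shell
constructs, so I would expect it to behave the same under bash and sh.

```shell
#!/bin/bash
# Sketch only. "sleep 30" stands in for son.sh and the delayed kill stands in
# for scancel; both are assumptions made for illustration.

trapped=no
func_trap() {
    echo "father: trapped $1"
    trapped=yes
}
trap 'func_trap TERM' TERM

sleep 30 &                         # in father.sh this would be: sh son.sh &
child=$!

( sleep 1; kill -TERM $$ ) &       # simulate the signal arriving from outside

# wait is interruptible: it returns as soon as the trapped signal arrives,
# the handler runs, and the script carries on without waiting 30 seconds.
status=0
wait "$child" || status=$?

kill "$child" 2>/dev/null || true  # clean up the stand-in child
echo "father: wait returned with status $status"
```

In father.sh itself the same change amounts to replacing the final
"sh son.sh" line with "sh son.sh &" followed by "wait $!".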

Best regards,

Manuel

2017-07-04 15:55 GMT+02:00 Manuel Rodríguez Pascual
<[email protected]>:
>
> Hi all,
>
> While developing a Slurm plugin I've run into a funny problem. I guess it is
> not strictly related to Slurm but rather to system administration, but maybe
> someone can point me in the right direction.
>
> I have 2 machines, one with CentOS 7 and one with BullX (based on CentOS 6).
> When I send a signal to finish a running task, the behaviours are different.
>
> It can be seen with 2 nested scripts, based on slurm_trap.sh by Mike Dacre
> (https://gist.github.com/MikeDacre/10ae23dcd3986793c3fd ). The code is at the
> bottom of the mail. As can be seen, both father and son trap SIGTERM, among
> other signals (SIGKILL cannot be trapped). The execution consists of "father"
> calling "son", and "son" waiting forever until it is killed.
>
>
> As you can see in the execution results (bottom of the mail), one of the 
> machines executes the functions stated in "trap", but the other does not. 
> Moreover, this second machine does execute the functions in trap when only a 
> single script is executed, not two nested ones.
>
> Do you have an explanation for this? Is it possible to ensure that the
> "trap" command will always be executed?
>
> Thanks for your help,
>
> Manuel
>
> -----
> -----
> -bash-4.2$ more father.sh
>
> #!/bin/bash
>
> trap_with_arg() {
>     func="$1" ; shift
>     for sig ; do
>         trap "$func $sig" "$sig"
>     done
> }
>
> func_trap() {
>     echo father: trapped $1
> }
>
> trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
>
> cat /dev/zero > /dev/null &
>
> sh son.sh
> -bash-4.2$ more son.sh
> #!/bin/bash
>
>
> trap_with_arg() {
>     func="$1" ; shift
>     for sig ; do
>         trap "$func $sig" "$sig"
>     done
> }
>
> func_trap() {
>     echo son: trapped $1
> }
>
> trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
>
> cat /dev/zero > /dev/null &
> wait
> -----
> -----
>
>
> Output in CentOS7:
> -bash-4.2$ sbatch  father.sh
> Submitted batch job 1563
> -bash-4.2$ scancel 1563
> -bash-4.2$ more slurm-1563.out
> slurmstepd: error: *** JOB 1563 ON acme12 CANCELLED AT 2017-07-04T15:39:00 ***
> son: trapped TERM
> son: trapped EXIT
> father: trapped TERM
> father: trapped EXIT
>
> Output in BullX:
> ~/signalTests> sbatch  father.sh
> Submitted batch job 233
> ~/signalTests> scancel 233
> ~/signalTests> more slurm-233.out
> slurmstepd: error: *** JOB 233 ON taurusi5089 CANCELLED AT 2017-07-04T15:43:54 ***
>
> Output in BullX, just son:
> ~/signalTests> sbatch -- son.sh
> Submitted batch job 235
> ~/signalTests> scancel 235
> ~/signalTests> more slurm-235.out
> slurmstepd: error: *** JOB 235 ON taurusi4061 CANCELLED AT 2017-07-04T15:48:29 ***
> son: trapped TERM
> son: trapped EXIT
>
>
>
>
>
