Hi all, just in case anybody faces this problem at some point...
I found the solution, together with a set of good examples, at
http://mywiki.wooledge.org/SignalTrap. Applied to my problem (full code
below), it comes down to running "sh son.sh & wait $!" in the father script
instead of a plain "sh son.sh".

Best regards,

Manuel

2017-07-04 15:55 GMT+02:00 Manuel Rodríguez Pascual <[email protected]>:
>
> Hi all,
>
> Developing a Slurm plugin I've come to a funny problem. I guess it is not
> strictly related to Slurm but just system administration, but maybe someone
> can point me in the right direction.
>
> I have 2 machines, one with CentOS 7 and one with BullX (based on CentOS 6).
> When I send a signal to finish a running task, the behaviours are different.
>
> It can be seen with 2 nested scripts, based on slurm_trap.sh by Mike Drake
> (https://gist.github.com/MikeDacre/10ae23dcd3986793c3fd ). The code is at the
> bottom of the mail. As can be seen, both father and son are capturing SIGTERM
> and SIGKILL. The execution consists of "father" calling "son", and "son"
> waiting forever until it is killed.
>
>
> As you can see in the execution results (bottom of the mail), one of the
> machines executes the functions stated in "trap", but the other does not.
> Moreover, this second machine does execute the functions in trap when only a
> single script is executed, not two nested ones.
>
> Have you got an explanation for this? Is it possible to ensure that the
> "trap" command will always be executed?
>
> Thanks for your help,
>
> Manuel
>
> -----
> -----
> -bash-4.2$ more father.sh
>
> #!/bin/bash
>
> trap_with_arg() {
>     func="$1" ; shift
>     for sig ; do
>         trap "$func $sig" "$sig"
>     done
> }
>
> func_trap() {
>     echo father: trapped $1
> }
>
> trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
>
> cat /dev/zero > /dev/null &
>
> sh son.sh
> -bash-4.2$ more son.sh
> #!/bin/bash
>
>
> trap_with_arg() {
>     func="$1" ; shift
>     for sig ; do
>         trap "$func $sig" "$sig"
>     done
> }
>
> func_trap() {
>     echo son: trapped $1
> }
>
> trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
>
> cat /dev/zero > /dev/null &
> wait
> -----
> -----
>
>
> Output in CentOS7:
> -bash-4.2$ sbatch father.sh
> Submitted batch job 1563
> -bash-4.2$ scancel 1563
> -bash-4.2$ more slurm-1563.out
> slurmstepd: error: *** JOB 1563 ON acme12 CANCELLED AT 2017-07-04T15:39:00 ***
> son: trapped TERM
> son: trapped EXIT
> father: trapped TERM
> father: trapped EXIT
>
> Output in BullX:
> ~/signalTests> sbatch father.sh
> Submitted batch job 233
> ~/signalTests> scancel 233
> ~/signalTests> more slurm-233.out
> slurmstepd: error: *** JOB 233 ON taurusi5089 CANCELLED AT 2017-07-04T15:43:54 ***
>
> Output in BullX, just son:
> ~/signalTests> sbatch -- son.sh
> Submitted batch job 235
> ~/signalTests> scancel 235
> ~/signalTests> more slurm-235.out
> slurmstepd: error: *** JOB 235 ON taurusi4061 CANCELLED AT 2017-07-04T15:48:29 ***
> son: trapped TERM
> son: trapped EXIT
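
For anyone who wants to try the "sh son.sh & wait $!" pattern without a
cluster, here is a minimal sketch. It relies on documented bash behaviour: a
trap is deferred while bash is waiting for a foreground command to finish,
whereas a trapped signal makes the "wait" builtin return at once (with a
status above 128) and the trap runs right after. The stand-ins are
assumptions for illustration only and are not part of the original scripts:
"sleep 30" plays the role of son.sh, and a helper subshell plays the role of
scancel by sending TERM to the script after one second.

```shell
#!/bin/bash
# Sketch only: a parent whose TERM trap fires while its child is still running.
# Assumptions (not from the original scripts): 'sleep 30' replaces son.sh,
# and a helper subshell replaces scancel.

trapped=""

func_trap() {
    trapped="$1"
    echo "father: trapped $1"
}

trap 'func_trap TERM' TERM

# Stand-in for "sh son.sh &": the child runs in the background, so the
# shell blocks in the 'wait' builtin instead of in a foreground command.
sleep 30 &
child=$!

# Stand-in for scancel: send TERM to this script one second from now.
# $$ still expands to the main script's PID inside the subshell.
( sleep 1; kill -TERM $$ ) &
killer=$!

# 'wait' returns as soon as the trapped signal arrives, then the trap runs.
status=0
wait "$child" || status=$?

# The signal went to the parent only, so stop the child explicitly.
kill "$child" 2>/dev/null || true
wait "$child" 2>/dev/null || true
wait "$killer" 2>/dev/null || true

echo "wait returned status $status"
```

In the real father.sh the only change needed is the one quoted at the top:
replace "sh son.sh" with "sh son.sh & wait $!". Whether the trap should also
forward the signal to the child depends on how the job is cancelled; this
sketch simply kills its stand-in child itself.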
