Hi all,

While developing a Slurm plugin I've run into a funny problem. I guess it is not strictly related to Slurm but rather to general system administration, but maybe someone can point me in the right direction.

I have 2 machines, one with CentOS 7 and one with BullX (based on CentOS 6). When I send a signal to finish a running task, the behaviours are different.

It can be seen with 2 nested scripts, based on slurm_trap.sh by Mike Dacre (https://gist.github.com/MikeDacre/10ae23dcd3986793c3fd ). The code is at the bottom of the mail. As can be seen, both father and son trap SIGTERM, among other signals (SIGKILL is not in the list, since it cannot be trapped). The execution consists of "father" calling "son", and "son" waiting forever until it is killed.

As you can see in the execution results (bottom of the mail), the CentOS 7 machine executes the functions set with "trap", but the BullX machine does not. Moreover, the BullX machine does execute the functions in "trap" when only a single script is submitted, rather than two nested ones.

Have you got an explanation for this? Is it possible to ensure that the "trap" command will always be executed?

Thanks for your help,

Manuel

----- -----

-bash-4.2$ more father.sh
#!/bin/bash
trap_with_arg() {
    func="$1" ; shift
    for sig ; do
        trap "$func $sig" "$sig"
    done
}
func_trap() {
    echo father: trapped $1
}
trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
cat /dev/zero > /dev/null &
sh son.sh

-bash-4.2$ more son.sh
#!/bin/bash
trap_with_arg() {
    func="$1" ; shift
    for sig ; do
        trap "$func $sig" "$sig"
    done
}
func_trap() {
    echo son: trapped $1
}
trap_with_arg func_trap 0 1 USR1 EXIT HUP INT QUIT PIPE TERM
cat /dev/zero > /dev/null &
wait

----- -----

Output in CentOS7:

-bash-4.2$ sbatch father.sh
Submitted batch job 1563
-bash-4.2$ scancel 1563
-bash-4.2$ more slurm-1563.out
slurmstepd: error: *** JOB 1563 ON acme12 CANCELLED AT 2017-07-04T15:39:00 ***
son: trapped TERM
son: trapped EXIT
father: trapped TERM
father: trapped EXIT

Output in BullX:

~/signalTests> sbatch father.sh
Submitted batch job 233
~/signalTests> scancel 233
~/signalTests> more slurm-233.out
slurmstepd: error: *** JOB 233 ON taurusi5089 CANCELLED AT 2017-07-04T15:43:54 ***

Output in BullX, just son:

~/signalTests> sbatch -- son.sh
Submitted batch job 235
~/signalTests> scancel 235
~/signalTests> more slurm-235.out
slurmstepd: error: *** JOB 235 ON taurusi4061 CANCELLED AT 2017-07-04T15:48:29 ***
son: trapped TERM
son: trapped EXIT
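
On the last question, one detail that may be relevant (not a confirmed explanation of the difference between the two machines): bash does not run a trap while it is waiting for a foreground command such as "sh son.sh" to finish, whereas the "wait" builtin is interrupted by a trapped signal and the trap runs right away. The sketch below assumes that only the father receives SIGTERM, and has it start the son in the background, wait on it, and forward the signal. The temporary file layout, the forward_term helper, the son_pid variable, and the sleep durations are all illustrative stand-ins, not part of the original scripts.

```shell
#!/bin/bash
# Sketch only: paths, helper names and delays are illustrative assumptions.
workdir=$(mktemp -d)

# Stand-in for son.sh: traps TERM, then waits on a background sleep.
cat > "$workdir/son.sh" <<'EOF'
#!/bin/bash
trap 'echo "son: trapped TERM"; kill "$pid" 2>/dev/null; exit 0' TERM
sleep 30 &
pid=$!
wait "$pid"
EOF

# Stand-in for father.sh: runs the son in the background and waits on it,
# so the TERM trap can run immediately instead of being deferred until a
# foreground child exits.
cat > "$workdir/father.sh" <<'EOF'
#!/bin/bash
forward_term() {
    echo "father: trapped TERM"
    kill -TERM "$son_pid" 2>/dev/null   # pass the signal on to the son
    wait "$son_pid"                     # let the son finish its own trap
    exit 0
}
trap forward_term TERM
bash "$(dirname "$0")/son.sh" &
son_pid=$!
wait "$son_pid"
EOF

# Start the father, then signal only the father (assumption: this mimics a
# cancel that reaches just the batch shell).
bash "$workdir/father.sh" > "$workdir/out.txt" 2>&1 &
father_pid=$!
sleep 1                      # give both scripts time to install their traps
kill -TERM "$father_pid"
wait "$father_pid"

output=$(cat "$workdir/out.txt")
echo "$output"
rm -rf "$workdir"
```

Run under these assumptions, this should print "father: trapped TERM" followed by "son: trapped TERM". Whether it helps under Slurm depends on which processes scancel actually signals on each machine, which would need to be checked separately.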
