Hi,

I'm using Slurm 15.08.10 and I'm trying to understand scancel better.
According to the man page:

  -b, --batch
    Signal only the batch step (the shell script), but not any other
    steps nor any children of the shell script. This is useful when
    the shell script has to trap  the  signal  and  take  some
    application defined  action.  This is not applicable if step_id
    is specified.  NOTE: The shell itself may exit upon receipt of
    many signals.  You may avoid this by explicitly trap signals
    within the shell script (e.g. "trap <arg> <signals>"). See the
    shell documentation for details.  Also see the -f, --full option.

  -f, --full
    Signal  all steps associated with the job including any batch
    step (the shell script plus all of its child processes).  By
    default, signals other than SIGKILL are not sent to the batch
    step.  Also see the -b, --batch option.

I take this to mean that `scancel -b 1234` should result in the
batch script of job 1234 receiving SIGTERM and nothing else, and that in
the default case, `scancel 1234`, all steps except the batch step should
receive SIGTERM. In my example there are no other steps, so I'm not sure
what would receive SIGTERM.

I would like to catch SIGTERM in a batch script where the batch step is
the only step in order to create a file indicating an incomplete job
(for integration with snakemake) before canceling a child process. So
this is the script:

--------------------------------------------------------------------------
#!/bin/bash
set -x

# propagate the TERM signal to child
trap 'kill -TERM $PID; touch job_cancelled' SIGTERM

sleep 10m &
PID=$!
wait $PID
trap - SIGTERM
wait $PID
exit $?
--------------------------------------------------------------------------

submitted with `sbatch batch.sh`.

If I cancel with `scancel -b 1234`, the trap is *not* executed. If I cancel
with `scancel 1234`, it is executed, which is the opposite of what I would
have expected.
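
To separate the script's own behavior from Slurm's, the trap logic can be
checked locally. The following is only a rough sketch under a few
assumptions: it does not involve Slurm at all, it uses plain `kill` in
place of scancel to signal the script's shell and nothing else, it
shortens the sleep to 30 seconds, and it works in a temporary directory.
The names `workdir`, `script_pid` and `trap_fired` are made up for
illustration.

```shell
#!/bin/bash
# Local sanity check (no Slurm involved): send SIGTERM to the script's
# shell only and report whether the trap ran.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same structure as the batch script above, with a shorter sleep.
cat > batch.sh <<'EOF'
#!/bin/bash
trap 'kill -TERM $PID; touch job_cancelled' SIGTERM
sleep 30 &
PID=$!
wait $PID
trap - SIGTERM
wait $PID
exit $?
EOF

bash batch.sh &
script_pid=$!
sleep 1                     # give the script time to install its trap
kill -TERM "$script_pid"    # signal the shell itself, not the sleep child
wait "$script_pid"
status=$?

# The trap creates job_cancelled, so its presence shows the trap ran.
if [ -e job_cancelled ]; then trap_fired=yes; else trap_fired=no; fi
echo "trap fired: $trap_fired, exit status: $status"
```

If the trap fires here, the script handles a SIGTERM delivered to the
shell alone, and the open question is which processes scancel actually
signals in each mode.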

Did I misunderstand something, or is the documentation off?

Thanks in advance for any pointers.

Best,
Wolfgang

--
Wolfgang Resch, PhD
Computational Biologist
HPC @ NIH staff
Twitter: @nih_hpc
301.451.4345
