I'm trying to have a batch script receive SIGUSR1 60 seconds before its
walltime is reached.  The Python program that runs in the job handles the
signal.  When I run my jobs interactively and send "kill -s USR1 <pid>",
the Python code responds as I'd expect.  However, when I run either
interactively or via a batch script and use scancel to send the USR1
signal, nothing seems to happen.  I even added this to my batch script just
to see whether signals are being delivered:

trap 'echo "SIGNAL CAUGHT"' SIGUSR1

When I run 'scancel --signal=USR1 --batch <jobID>', nothing prints.

I've used "#SBATCH --signal=USR1" in my batch scripts.  I'm unsure whether
I'm missing something that is key to making these signals work.  We are
using cgroups for both ProctrackType and TaskPlugin.  The Python code I'm
running is not executed via srun.

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]
