I've found that launching the Python application with srun allows it to receive the signals from SLURM. I'm unsure whether that's the intended behavior, but it works.
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Fri, Jan 16, 2015 at 1:59 PM, Trey Dockendorf <[email protected]> wrote:

> I'm attempting to have a batch script receive SIGUSR1 60 seconds before
> walltime is reached. I have the python program that runs in the job
> handling the signals. When I run my jobs interactively and send "kill -s
> USR1 <pid>" the python code responds as I'd expect. However when I run
> either interactively or via a batch script and use scancel to send the USR1
> signal nothing seems to happen. I even added this to my batch script just
> to see if signals are being sent
>
> trap 'echo "SIGNAL CAUGHT"' SIGUSR1
>
> I try 'scancel --signal=USR1 --batch <jobID>' and nothing prints.
>
> I've used "#SBATCH --signal=USR1" in my batch scripts. I'm unsure if
> there is something I'm missing that is the key to making these signals
> work. We are using cgroups for ProctrackType and TaskPlugin. The python
> code I'm running is not executed via srun.
>
> Thanks,
> - Trey
>
> =============================
>
> Trey Dockendorf
> Systems Analyst I
> Texas A&M University
> Academy for Advanced Telecommunications and Learning Technologies
> Phone: (979)458-2396
> Email: [email protected]
> Jabber: [email protected]
>
