Michael, thanks for the info. I wasn't aware of a bug that affected this. Once I used srun to launch my actual Python script, the signal catching worked as expected. I was also using the long signal name in my bash traps, which is likely why I never saw anything caught when using --signal=B:USR1.
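
For anyone who finds this thread later, here is a rough sketch of the batch-side setup discussed below. It is not the script from the job described in this thread. It assumes the trap uses the short signal name and that the workload is started in the background and waited on, since bash only runs a trap once the current foreground command returns. The on_usr1 function, the caught variable, and the sleep stand-in are hypothetical placeholders.

```shell
#!/bin/bash
#SBATCH --signal=B:USR1@60   # B: signals only the batch shell, 60 s before the time limit

caught=0   # hypothetical flag so the rest of the script can see that the signal arrived

# Hypothetical handler; replace the body with real cleanup or checkpoint logic.
on_usr1() {
    caught=1
    echo "SIGNAL CAUGHT"
}

# Short signal name, as suggested in this thread.
trap on_usr1 USR1

# Stand-in for the real workload. Running it in the background and using
# "wait" lets the trap fire promptly; a foreground command would delay the
# trap until that command exits.
sleep 1 &
wait $!
```

Once submitted with sbatch, this can be exercised by hand with 'scancel --signal=USR1 --batch <jobID>'. As far as I understand, B: targets only the batch shell, so a program that needs to see the signal itself would still have to be launched with srun and signaled as a job step, or have the handler forward the signal to it.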
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Mon, Jan 26, 2015 at 11:32 AM, Michael Gutteridge <[email protected]> wrote:

> FWIW: I'm able to send, trap, and process signals as one would expect.
> Not sure if our versions match up (I'm on 14.03.7), but a basic bash script
> using "trap" is receiving and processing those signals properly when I use
> "--batch" in scancel. It also seems to work properly when I have
> "--signal=" set, though (depending on your job) I think you have to set
> that to signal the batch process via:
>
> sbatch --signal=B:USR1@60
>
> Or similar. There's more discussion in this bug:
>
> http://bugs.schedmd.com/show_bug.cgi?id=333
>
> So the version you're running may affect the behavior you are seeing.
>
> Also: may or may not be the issue, but trap takes the short signal name as
> the condition, so "trap foo USR1" not "trap foo SIGUSR1"
>
> Happy hunting.
>
> Michael
>
>
> On Fri, Jan 16, 2015 at 2:00 PM, Trey Dockendorf <[email protected]>
> wrote:
>
>> I've found that using srun to launch the python application allows it
>> to receive the signals from SLURM. Unsure if that's the intended behavior,
>> but it works.
>>
>> - Trey
>>
>> On Fri, Jan 16, 2015 at 1:59 PM, Trey Dockendorf <[email protected]>
>> wrote:
>>
>>> I'm attempting to have a batch script receive SIGUSR1 60 seconds before
>>> walltime is reached. I have the python program that runs in the job
>>> handling the signals. When I run my jobs interactively and send "kill -s
>>> USR1 <pid>" the python code responds as I'd expect. However when I run
>>> either interactively or via a batch script and use scancel to send the USR1
>>> signal nothing seems to happen. I even added this to my batch script just
>>> to see if signals are being sent
>>>
>>> trap 'echo "SIGNAL CAUGHT"' SIGUSR1
>>>
>>> I try 'scancel --signal=USR1 --batch <jobID>' and nothing prints.
>>>
>>> I've used "#SBATCH --signal=USR1" in my batch scripts. I'm unsure if
>>> there is something I'm missing that is the key to making these signals
>>> work. We are using cgroups for ProctrackType and TaskPlugin. The python
>>> code I'm running is not executed via srun.
>>>
>>> Thanks,
>>> - Trey
