FWIW: I'm able to send, trap, and process signals as one would expect. Not
sure if our versions match up (I'm on 14.03.7), but a basic bash script
using "trap" is receiving and processing those signals properly when I use
"--batch" in scancel. It also seems to work properly when I have
"--signal=" set, though (depending on your job) I think you need the "B:"
prefix so that the signal goes to the batch shell itself:
    sbatch --signal=B:USR1@60
or similar. There's more discussion in this bug:
http://bugs.schedmd.com/show_bug.cgi?id=333
So the version you're running may affect the behavior you are seeing.
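
In case it helps, here is a rough sketch of the kind of batch script I mean. It assumes bash, and assumes the workload is started directly from the script rather than via srun. The function name run_and_forward_usr1 and the my_app.py placeholder are made up for illustration, and I have not checked how your Slurm version treats the "B:" form, so treat it as a starting point only:

```shell
#!/bin/bash
#SBATCH --signal=B:USR1@60   # USR1 to the batch shell only, about 60 s before the time limit

# Run a command in the background and relay USR1 to it.
# bash only runs a trap once the current foreground command has returned,
# so the command is backgrounded and the shell sits in "wait", which a
# trapped signal does interrupt.
run_and_forward_usr1() {
    local pid status got_signal=0
    "$@" &
    pid=$!
    trap 'got_signal=1; echo "SIGNAL CAUGHT"; kill -s USR1 "$pid" 2>/dev/null' USR1
    wait "$pid"
    status=$?
    # If the trap cut "wait" short, wait again for the real exit status.
    while [ "$got_signal" -eq 1 ]; do
        got_signal=0
        wait "$pid"
        status=$?
    done
    trap - USR1
    return "$status"
}

# Hypothetical usage in the batch script (my_app.py is a placeholder name):
#   run_and_forward_usr1 python my_app.py
```

The point of backgrounding plus "wait" is that a script blocked on a foreground command will not run its trap until that command exits, which can look exactly like the signal never arriving.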
Also: may or may not be the issue, but POSIX trap takes the signal name
without the SIG prefix, so "trap foo USR1" is the portable form rather than
"trap foo SIGUSR1" (bash accepts either, plain sh may not).
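
A quick way to check how a given signal name behaves, without involving Slurm at all, is something like the following. It is only a sketch: check_trap_name is a name I made up, and it assumes bash is on the PATH. It runs the trap in a child bash so the test signal never reaches your login shell:

```shell
#!/bin/bash
# Register a USR1 handler under the given signal name in a child bash,
# have that child signal itself, and print whatever the handler prints.
check_trap_name() {
    # $1 is the name handed to trap; $$ inside the quotes is the child's PID.
    bash -c 'trap "echo SIGNAL CAUGHT" "$1"; kill -s USR1 $$; :' _ "$1"
}

check_trap_name USR1
check_trap_name SIGUSR1
```

With bash I would expect both calls to print "SIGNAL CAUGHT". If your batch script runs under a different shell, swap that shell in for bash in the function and see whether the SIG-prefixed form is still accepted.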
Happy hunting.
Michael
On Fri, Jan 16, 2015 at 2:00 PM, Trey Dockendorf <[email protected]> wrote:
> I've found that using srun to launch the python application allows it to
> receive the signals from SLURM. Unsure if that's the intended behavior,
> but it works.
>
> - Trey
>
> =============================
>
> Trey Dockendorf
> Systems Analyst I
> Texas A&M University
> Academy for Advanced Telecommunications and Learning Technologies
> Phone: (979)458-2396
> Email: [email protected]
> Jabber: [email protected]
>
> On Fri, Jan 16, 2015 at 1:59 PM, Trey Dockendorf <[email protected]>
> wrote:
>
>> I'm attempting to have a batch script receive SIGUSR1 60 seconds before
>> walltime is reached. I have the python program that runs in the job
>> handling the signals. When I run my jobs interactively and send "kill -s
>> USR1 <pid>" the python code responds as I'd expect. However when I run
>> either interactively or via a batch script and use scancel to send the USR1
>> signal nothing seems to happen. I even added this to my batch script just
>> to see if signals are being sent
>>
>> trap 'echo "SIGNAL CAUGHT"' SIGUSR1
>>
>> I try 'scancel --signal=USR1 --batch <jobID>' and nothing prints.
>>
>> I've used "#SBATCH --signal=USR1" in my batch scripts. I'm unsure if
>> there is something I'm missing that is the key to making these signals
>> work. We are using cgroups for ProctrackType and TaskPlugin. The python
>> code I'm running is not executed via srun.
>>
>> Thanks,
>> - Trey