Michael, thanks for the info. I wasn't aware of a bug that affected this.
Once I used srun to launch my actual Python script, the signal catching
worked as expected.  I was also using the long signal name (SIGUSR1) in my
bash traps, which is likely why I never saw anything caught when using
--signal=B:USR1.
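
For the archive, here is a minimal sketch of the pattern discussed in this
thread. It is not my actual job script. It assumes bash, the `sleep` is only
a stand-in for the real workload, and the delayed `kill` merely simulates the
signal SLURM would send near the time limit, so the script can be tried
outside of a job. The names `on_usr1`, `caught`, and `work_pid` are made up
for illustration.

```shell
#!/bin/bash
#SBATCH --signal=B:USR1@60   # ask SLURM to signal the batch shell ~60 s before the time limit

caught=0

# Handler for USR1. A real job might checkpoint or forward the signal here.
on_usr1() {
    caught=1
    echo "SIGNAL CAUGHT"
}

# Use the short signal name, as Michael suggested.
trap on_usr1 USR1

# Stand-in for the real workload. In a real job this might be something
# like "srun python my_app.py &" (hypothetical script name).
sleep 5 &
work_pid=$!

# Simulation only: send USR1 to this shell after one second.
# Inside a SLURM job the scheduler (or scancel) would do this instead.
( sleep 1; kill -USR1 "$$" ) &

# The shell runs a trap only after the current foreground command ends,
# so the workload is started in the background and waited on. "wait"
# returns as soon as the trapped signal arrives, then the handler runs.
wait "$work_pid"

# Clean up the stand-in workload if it is still running.
kill "$work_pid" 2>/dev/null

echo "caught=$caught"
```

The background-plus-wait arrangement matters because a batch shell blocked
on a foreground command will not run its trap until that command finishes,
which can look like the signal was never delivered.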

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: [email protected]
Jabber: [email protected]

On Mon, Jan 26, 2015 at 11:32 AM, Michael Gutteridge <
[email protected]> wrote:

>  FWIW: I'm able to send, trap, and process signals as one would expect.
> Not sure if our versions match up (I'm on 14.03.7), but a basic bash script
> using "trap" is receiving and processing those signals properly when I use
> "--batch" in scancel.  It also seems to work properly when I have
> "--signal=" set, though (depending on your job) I think you have to set
> that to signal the batch process via:
>
>     sbatch --signal=B:USR1@60
>
> Or similar.  There's more discussion in this bug:
>
> http://bugs.schedmd.com/show_bug.cgi?id=333
>
> So the version you're running may affect the behavior you are seeing.
>
> Also: may or may not be the issue, but trap takes the short signal name as
> the condition, so "trap foo USR1" not "trap foo SIGUSR1"
>
> Happy hunting.
>
> Michael
>
>
> On Fri, Jan 16, 2015 at 2:00 PM, Trey Dockendorf <[email protected]>
> wrote:
>
>>  I've found that using srun to launch the python application allows it
>> to receive the signals from SLURM.  Unsure if that's the intended behavior,
>> but it works.
>>
>> - Trey
>>
>> On Fri, Jan 16, 2015 at 1:59 PM, Trey Dockendorf <[email protected]>
>> wrote:
>>
>>> I'm attempting to have a batch script receive SIGUSR1 60 seconds before
>>> walltime is reached.  I have the python program that runs in the job
>>> handling the signals.  When I run my jobs interactively and send "kill -s
>>> USR1 <pid>" the python code responds as I'd expect.  However when I run
>>> either interactively or via a batch script and use scancel to send the USR1
>>> signal nothing seems to happen.  I even added this to my batch script just
>>> to see if signals are being sent
>>>
>>> trap 'echo "SIGNAL CAUGHT"' SIGUSR1
>>>
>>> I try 'scancel --signal=USR1 --batch <jobID>' and nothing prints.
>>>
>>> I've used "#SBATCH --signal=USR1" in my batch scripts.  I'm unsure if
>>> there is something I'm missing that is the key to making these signals
>>> work.  We are using cgroups for ProctrackType and TaskPlugin.  The python
>>> code I'm running is not executed via srun.
>>>
>>> Thanks,
>>> - Trey
>>>
>>
>>
>
