Hi,

On 02.06.2016 at 17:01, Julien Nicoulaud wrote:

> Hi all,
> 
> I have a queue configured for USR2 notification on qdel, with 10 seconds 
> delay.
> 
> It works fine for batch jobs, I can see the USR2 signal is sent 10 seconds 
> before the KILL.
> 
> But when using parallel jobs, I have several issues:
> 
> 1) Subjobs submitted using qrsh -inherit are killed right away (every time).
> Is there a way to pass -notify on to subtasks?
> 
> 2) The master job also gets killed right away "randomly" (like 1 out of 10 
> times), just after being sent the USR2 signal.
> 
> I pasted simple reproducer scripts here: 
> https://gist.github.com/nicoulaj/91a18d5c0ed952cbd027bae53bbbedbd
>  - test.sh is the submit command
>  - test-master.sh is the parallel job script
>  - test-slave.sh is the parallel subtask script
> 
> I found this old issue, which makes me think SGE is supposed to handle 
> this correctly: https://arc.liv.ac.uk/trac/SGE/ticket/660

USR2 is not sent to the job script alone, but to its complete process group. 
The sleep 1 in the while loop runs as a separate process on every iteration. 
Hence, depending on when USR2 arrives, there may or may not be a forked 
sleep 1 running. If there is one, it still has the default action for USR2 
(terminate), so it is killed and the loop condition fails. You can try:

while (trap '' USR2; exec sleep 1); do
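
To see the difference outside of SGE, here is a minimal sketch that sends USR2 
to a backgrounded sleep twice: once with the signal ignored in the subshell 
before the exec, once with the default action. The function name 
run_sleep_with_trap and the sleep durations are only illustrative. The empty 
action '' is what makes the child ignore the signal (an ignored signal stays 
ignored across exec), whereas - only resets it to the default.

```shell
#!/bin/sh
# Run "sleep 2" in a background subshell, send it USR2 after one second,
# and return the exit status reported by wait.
# $1 is the trap action applied in the subshell before the exec:
#   ''  ignore the signal (the ignored state survives exec)
#   -   reset to the default action (terminate)
run_sleep_with_trap() {
    (trap "$1" USR2; exec sleep 2) &
    child=$!
    sleep 1                 # give the subshell time to set the trap and exec
    kill -USR2 "$child"
    wait "$child"           # last command, so its status is the function's status
}

run_sleep_with_trap ''
status_ignored=$?

run_sleep_with_trap -
status_default=$?

echo "USR2 ignored: sleep exit status $status_ignored"
echo "USR2 default: sleep exit status $status_default"
```

The first status should be 0, because sleep runs to completion, and the second 
non-zero, because sleep is terminated by the signal. In the job script itself 
the shell's own USR2 handler stays in place; only the sleep in the loop 
condition is shielded.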

-- Reuti


> Any ideas?
> 
> Best regards,
> Julien
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

