Hi,

On 02.06.2016 at 17:01, Julien Nicoulaud wrote:
> Hi all,
>
> I have a queue configured for USR2 notification on qdel, with 10 seconds
> delay.
>
> It works fine for batch jobs, I can see the USR2 signal is sent 10 seconds
> before the KILL.
>
> But when using parallel jobs, I have several issues:
>
> 1) Subjobs submitted using qrsh -inherit are killed right away (every time)
>    Is there a way to inherit -notify to subtasks?
>
> 2) The master job also gets killed right away "randomly" (like 1 out of 10
>    times), just after being sent the USR2 signal.
>
> I pasted simple reproducer scripts here:
> https://gist.github.com/nicoulaj/91a18d5c0ed952cbd027bae53bbbedbd
> - test.sh is the submit command
> - test-master.sh is the parallel job script
> - test-slave.sh is the parallel subtask script
>
> I could find this old issue, which makes me think SGE is supposed to handle
> this correctly: https://arc.liv.ac.uk/trac/SGE/ticket/660

The USR2 is not sent to the job script only, but to the complete process
group. And the sleep 1 in the while loop creates another process. Hence,
depending on when the USR2 arrives, there may or may not be a forked
sleep 1 running, and if there is one, it will use the default behavior
for USR2 (termination).

You can try:

    while (trap - usr2; exec sleep 1); do

-- Reuti

> Any idea?
>
> Best regards,
> Julien
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
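
For anyone who wants to see the mechanism described in the reply above outside of SGE, here is a minimal sketch. It only assumes a POSIX shell. The helper name probe and the 2-second sleep are made up for illustration, and the signal is sent directly to the child instead of to the whole process group as SGE does. The second case (ignoring USR2 with trap '') is not something suggested in this thread; it is included only as a contrast, because an ignored signal, unlike a trapped one, is inherited by forked commands.

```shell
#!/bin/sh
# Sketch: a forked `sleep` does not inherit the shell's USR2 *handler*,
# but it does inherit an *ignored* USR2. `probe` is a hypothetical helper.

# Start `sleep 2` in the background, send it USR2 after one second, and
# print the name of the signal that ended it, or "exited" if it ran to the end.
probe() {
    sleep 2 &
    pid=$!
    sleep 1                      # give the child time to start
    kill -s USR2 "$pid" 2>/dev/null
    wait "$pid"
    rc=$?
    if [ "$rc" -gt 128 ]; then
        kill -l "$rc"            # map exit status 128+N to the signal name
    else
        echo "exited"
    fi
}

# Case 1: the shell has a handler. The child gets the default action.
trap 'echo "shell: caught USR2"' USR2
with_handler=$(probe)

# Case 2 (assumption, not from the thread): the shell ignores USR2.
# The ignore disposition is inherited, so the child survives.
trap '' USR2
with_ignore=$(probe)

trap - USR2                      # restore the default disposition

echo "handler set in shell:  sleep -> $with_handler"
echo "USR2 ignored in shell: sleep -> $with_ignore"
```

In the first case the sleep is terminated by USR2 even though the shell itself has a handler installed, which matches the "random" kills of the master script: it depends on whether a sleep happens to be running when the signal reaches the process group.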