Indeed, I realize both my reproducer scripts and the real ones had subprocesses without traps.
Thank you!

Julien

2016-06-27 23:21 GMT+02:00 Reuti <re...@staff.uni-marburg.de>:

> Hi,
>
> On 02.06.2016 at 17:01, Julien Nicoulaud wrote:
>
> > Hi all,
> >
> > I have a queue configured for USR2 notification on qdel, with a 10 second delay.
> >
> > It works fine for batch jobs: I can see the USR2 signal is sent 10 seconds before the KILL.
> >
> > But when using parallel jobs, I have several issues:
> >
> > 1) Subjobs submitted using qrsh -inherit are killed right away (every time).
> > Is there a way to inherit -notify to subtasks?
> >
> > 2) The master job also gets killed right away "randomly" (like 1 out of 10 times), just after being sent the USR2 signal.
> >
> > I pasted simple reproducer scripts here: https://gist.github.com/nicoulaj/91a18d5c0ed952cbd027bae53bbbedbd
> > - test.sh is the submit command
> > - test-master.sh is the parallel job script
> > - test-slave.sh is the parallel subtask script
> >
> > I could find this old issue, which makes me think SGE is supposed to handle this correctly: https://arc.liv.ac.uk/trac/SGE/ticket/660
>
> The USR2 is not sent to the job script only, but to the complete process group. And the sleep 1 in the while loop will create another process. Hence, depending on the time the USR2 arrives, there may or may not be a forked sleep 1, which will use the default behavior. You can try:
>
> while (trap - usr2; exec sleep 1); do
>
> -- Reuti
>
> > Any idea?
> >
> > Best regards,
> > Julien
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
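
For the archive, here is a minimal sketch of the "subprocess without trap" effect, outside of SGE. It is not taken from the gist: the function name usr2_exit_status and the timings are made up for illustration, and the signal is sent to the child directly rather than to a whole process group. It relies on two shell behaviors: a handler installed with trap is not inherited by a child process (the child falls back to the default action, which for USR2 is termination), whereas a signal ignored with trap '' stays ignored across exec.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not from the original scripts): compare how a
# child "sleep" reacts to USR2 depending on its disposition at exec time.

# Start "sleep 2" in a subshell after applying the given trap action to USR2,
# send USR2 to that child, and print the exit status seen by the parent.
#   action '-'  resets USR2 to the default disposition (terminate)
#   action ''   ignores USR2; an ignored signal stays ignored after exec
usr2_exit_status() {
    action=$1
    ( trap "$action" USR2; exec sleep 2 ) &
    pid=$!
    sleep 1                          # give the child time to exec sleep
    kill -USR2 "$pid" 2>/dev/null    # signal only this child
    wait "$pid"
    echo $?
}

echo "default disposition: exit status $(usr2_exit_status -)"
echo "USR2 ignored:        exit status $(usr2_exit_status '')"
```

With the default disposition the child should be terminated by the signal, so the reported status is expected to be above 128; with USR2 ignored, sleep should run to completion and report 0. In a job script, that would mean either ignoring USR2 in the helper processes or making sure each of them installs its own handler.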