Hi all,

I have a queue configured for USR2 notification on qdel, with a 10-second
delay.

This works fine for batch jobs: I can see the USR2 signal being sent 10
seconds before the KILL.
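
For context, a job script along these lines is enough to observe the
notification. This is only a minimal sketch, not the actual reproducer: the
on_usr2 handler, the notified variable and the sleep workload are
placeholders, and it assumes the job is submitted with qsub -notify.

```shell
#!/bin/bash
# Hypothetical job script: react to the USR2 notification sent before KILL.
# Assumes submission with "qsub -notify"; the delay comes from the queue's
# "notify" attribute.

notified=0

on_usr2() {
    # Placeholder cleanup: a real job would save its state here.
    notified=1
    echo "USR2 received, cleaning up before the KILL arrives"
}
trap on_usr2 USR2

# Placeholder workload. Running it in the background and using "wait" lets
# bash run the trap as soon as the signal arrives, instead of only after
# the foreground command has finished.
sleep 1 &
wait $!
```

With a foreground workload, bash defers the trap until the running command
returns, which can easily use up the whole notification delay.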

With parallel jobs, however, I run into two issues:

1) Subtasks started with qrsh -inherit are killed right away (every time).
   Is there a way to have subtasks inherit -notify?

2) The master job also gets killed right away "randomly" (about 1 time out
   of 10), just after being sent the USR2 signal.

I pasted simple reproducer scripts here:
https://gist.github.com/nicoulaj/91a18d5c0ed952cbd027bae53bbbedbd
 - test.sh is the submit command
 - test-master.sh is the parallel job script
 - test-slave.sh is the parallel subtask script

I found this old issue, which makes me think SGE is supposed to handle
this correctly: https://arc.liv.ac.uk/trac/SGE/ticket/660

Any ideas?

Best regards,
Julien