Hi all,

I'm setting up a tightly integrated parallel environment for my
application using the "qrsh -inherit" method, but I can't find the right
way to terminate the qrsh sub-tasks. Whatever method I try, the parent job
always ends with an "Unable to run job N" message, and the qmaster log contains:

*tightly integrated parallel task 159.1 task 1.vbox-centos6-3 failed -
killing job*


Does anyone know the right way to handle this?

In case it helps, I have shared my test scripts here:
https://gist.github.com/3479264

   - test.sh: submits master.sh as an N-slot parallel job
   - master.sh:
      - Launches N-1 worker.sh with "qrsh -inherit" in the background
      - Works for a while
      - Sends TERM to qrsh processes
   - worker.sh: works until killed
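
For readers who do not want to open the gist, here is a rough sketch of the flow that master.sh follows as described above. It is not the actual script: the function names, the WORKER and WORK_SECONDS variables, the local fallback when qrsh is not on the PATH, and the assumption that the master occupies the first slot listed in $PE_HOSTFILE are all illustrative. It only reproduces the launch-then-TERM pattern and is not meant as a fix for the error.

```shell
#!/bin/sh
# Hypothetical sketch of master.sh; the real script in the gist may differ.

WORKER=${WORKER:-./worker.sh}       # assumed path to the worker script
WORK_SECONDS=${WORK_SECONDS:-60}    # assumed duration of the master's own work
pids=""                             # PIDs of the backgrounded launcher processes

# Start one worker on the given host, in the background.
# Inside an SGE job this goes through "qrsh -inherit"; if qrsh is not
# available the worker runs locally, so the sketch can be tried off-cluster.
start_worker() {
    if command -v qrsh >/dev/null 2>&1; then
        qrsh -inherit "$1" "$WORKER" </dev/null &
    else
        "$WORKER" </dev/null &
    fi
    pids="$pids $!"
}

# Read a hostfile in $PE_HOSTFILE format ("host slots queue ...") and start
# one worker per slot, skipping the first slot (assumed to be the master's).
start_workers() {
    skip_master=1
    while read -r host slots rest; do
        n=0
        while [ "$n" -lt "$slots" ]; do
            n=$((n + 1))
            if [ "$skip_master" -eq 1 ]; then
                skip_master=0
                continue
            fi
            start_worker "$host"
        done
    done < "$1"
}

# Send TERM to every launcher process started above, then reap them.
stop_workers() {
    if [ -n "$pids" ]; then
        # $pids is left unquoted on purpose so it splits into separate PIDs.
        kill -TERM $pids 2>/dev/null
    fi
    wait
}

# Only run the full sequence when inside an SGE parallel job.
if [ -n "${PE_HOSTFILE:-}" ]; then
    start_workers "$PE_HOSTFILE"
    sleep "$WORK_SECONDS"           # stand-in for the master's real work
    stop_workers
fi
```

With a 4-slot allocation this starts three workers, and the TERM signal goes to the local qrsh client processes rather than to worker.sh directly.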

By the way, I'm using SGE 6.2u5.

Any help on this is welcome!

Regards,
Julien