Hi all,

I'm setting up a tightly integrated parallel environment for my application using the "qrsh -inherit" method, but I can't find the right way to terminate the qrsh sub-tasks. Whatever method I try, the parent job always ends with an "Unable to run job N" message, and the qmaster log contains:

*tightly integrated parallel task 159.1 task 1.vbox-centos6-3 failed - killing job*

Does anyone know the right way to handle this?

In case it helps, I shared my test scripts here: https://gist.github.com/3479264

- test.sh: submits master.sh as an N-slot parallel job
- master.sh:
  - launches N-1 worker.sh with "qrsh -inherit" in the background
  - works for a while
  - sends TERM to the qrsh processes
- worker.sh: works until killed

By the way, I'm using SGE 6.2u5.

Any help on this is welcome!

Regards,
Julien
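
For readers who cannot open the gist, the master.sh steps described above can be sketched roughly as follows. This illustrates the described flow only. It is not the gist's actual code, and it does not by itself avoid the "failed - killing job" outcome reported above. The LAUNCH and WORKER variables and the three function names are hypothetical, introduced for this sketch. It also assumes the first slot listed in $PE_HOSTFILE is the one occupied by the master script.

```shell
#!/bin/sh
# Sketch only: mirrors the master.sh steps described in the post.

# Hypothetical knobs. Inside a real SGE parallel job LAUNCH would stay at
# its default; overriding it lets the flow be tried without a cluster.
LAUNCH=${LAUNCH:-qrsh -inherit}
WORKER=${WORKER:-./worker.sh}

WORKER_PIDS=""

# Print one hostname per granted slot. Each line of the PE hostfile
# starts with "<hostname> <slots>"; the remaining fields are ignored.
list_slots() {
    while read -r host slots _rest; do
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Start one background launcher per slot, skipping the first slot
# (assumed here to belong to the master script itself).
start_workers() {
    for host in $(list_slots "$1" | tail -n +2); do
        # $LAUNCH is intentionally unquoted so "qrsh -inherit" splits in two.
        $LAUNCH "$host" "$WORKER" &
        WORKER_PIDS="$WORKER_PIDS $!"
    done
}

# Send TERM to the local launcher processes, then report how each one exited.
stop_workers() {
    for pid in $WORKER_PIDS; do
        kill -TERM "$pid" 2>/dev/null || true
    done
    for pid in $WORKER_PIDS; do
        status=0
        wait "$pid" || status=$?
        echo "launcher $pid exited with status $status"
    done
}

# Intended use inside the job script:
#   start_workers "$PE_HOSTFILE"
#   sleep 60            # stand-in for the master's real work
#   stop_workers
```

Printing each launcher's exit status makes it possible to compare what the local qrsh processes return with what the qmaster log reports for the corresponding task.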
