Hi,

On 26.08.2012 at 15:42, Julien Nicoulaud wrote:

> I'm working on setting up a tightly integrated parallel environment for my 
> application using the "qrsh -inherit" method, but I can't find the right way 
> to terminate the qrsh sub-tasks. Whatever method I try, the parent job always 
> ends with "Unable to run job N"

You get this message only if you submit the job with `-sync y`; otherwise it
doesn't appear in any log file. But I don't see the issue of workers running
forever: they are killed when the complete job exits, although not gracefully
but with a `kill`.

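Maybe you can set this in the cluster configuration (`qconf -mconf`):

    execd_params      ENABLE_ADDGRP_KILL=TRUE

With it, the execd uses the job's additional group ID to find and kill all of
the job's processes.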

==

The usual way to shut down slave tasks is your own protocol: implement one and
use it to tell your worker.sh "Hey, kill yourself."
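One possible shape for such a protocol, as a minimal sketch: it assumes the
master and the workers can all see one shared directory (e.g. an NFS-mounted
job directory), and the stop-file path and the function names are made up for
illustration. The master creates the stop file instead of sending TERM to the
qrsh processes; each worker checks for it between units of work and exits
normally, so `qrsh -inherit` should see a clean exit rather than a killed task.

```shell
#!/bin/bash
# Minimal sketch of a "please stop" protocol between master.sh and worker.sh.
# Assumption: both sides can see the same directory (e.g. a shared
# filesystem); the stop-file name and function names are hypothetical.

# Worker side: do work in small steps and check between steps whether the
# master asked us to stop.
worker_loop() {
    local stop_file="$1"
    while [ ! -e "$stop_file" ]; do
        sleep 1   # placeholder for one unit of real work
    done
    echo "worker: stop requested, exiting normally"
    return 0      # normal exit status instead of death by signal
}

# Master side: instead of sending TERM to the qrsh processes, create the
# stop file and then wait for the workers to finish on their own.
request_stop() {
    touch "$1"
}

# Local demo without SGE, with the worker as a plain background process.
stop_dir=$(mktemp -d)
stop_file="$stop_dir/stop"
worker_loop "$stop_file" &
worker_pid=$!
sleep 2                  # the "master" works for a while
request_stop "$stop_file"
wait "$worker_pid"
status=$?
rm -rf "$stop_dir"
echo "worker exit status: $status"
```

In master.sh the worker would instead be started once per slave host with
something like `qrsh -inherit "$host" ./worker.sh "$stop_file" &`, followed
later by `touch "$stop_file"` and a plain `wait`.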

==

In principle, signal handling is supported: the sge_execd can tell the
sge_shepherd to signal its children, and a "normal" binary can implement
handlers to react to the signal properly. With a tight integration via
`qrsh -inherit ...` there is a special situation, though: the "qrsh_starter"
also receives the signal, and it simply exits, which forces the job to end.
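For the "normal binary" case, a script can install a handler with `trap`. The
sketch below catches TERM, runs a placeholder cleanup and exits with status 0;
the function names are hypothetical. As described above, under
`qrsh -inherit` this does not stop the qrsh_starter from exiting on the same
signal, so it is no replacement for your own shutdown protocol.

```shell
#!/bin/bash
# Sketch: a worker.sh-style loop that catches SIGTERM and exits cleanly.
# Caveat: with "qrsh -inherit" the qrsh_starter also receives the signal and
# simply exits, so SGE may still end the task regardless of this handler.

on_term() {
    echo "worker: got TERM, cleaning up"
    # hypothetical cleanup goes here (flush results, remove temp files)
    exit 0
}

worker() {
    # Traps are not inherited by the background subshell, so set it here.
    trap on_term TERM
    while true; do
        # Sleep in the background and wait for it: bash runs a trap only
        # after a foreground command finishes, but "wait" returns at once
        # when a trapped signal arrives.
        sleep 1 &
        wait $!
    done
}

# Local demo without SGE: start the worker, then signal it.
worker &
worker_pid=$!
sleep 1                  # give the worker time to install its trap
kill -TERM "$worker_pid"
wait "$worker_pid"
status=$?
echo "worker exit status: $status"
```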

-- Reuti


> message and the qmaster log contains:
> 
> tightly integrated parallel task 159.1 task 1.vbox-centos6-3 failed - killing job
> 
> Does anyone know the right way to handle this?
> 
> If this can help, I shared my test scripts here: 
> https://gist.github.com/3479264
>       • test.sh: submits master.sh as an N-slot parallel job
>       • master.sh:
>               • Launches N-1 worker.sh with "qrsh -inherit" in the background
>               • Works for a while
>               • Sends TERM to qrsh processes
>       • worker.sh: works until killed
> By the way, I'm using SGE 6.2u5.
> 
> Any help on this is welcome!
> 
> Regards,
> Julien
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

