Thanks William, Reuti and Dave.

I will try the pointers made here.

Joseph

On 09/20/2012 02:13 AM, Reuti wrote:
On 20.09.2012 at 02:08, Joseph Farran wrote:

What is the recommended way, and/or do scripts exist, for cleaning up once a job 
completes/dies/crashes on a node?

I would prefer to do this via the epilog script.
So the job crashes but continues running?


I am having a situation where programs go south and are left running on nodes 
even though Grid Engine thinks they are no longer running and they do not exist 
in the GE queue.

I can't be the first asking for such a thing, so I don't want to re-invent the 
wheel if some script or way already exists for doing this that works.
Are the processes jumping out of the process tree, so that they are no longer 
bound to the sge_shepherd? One thing you can try is:

$ qconf -sconf
...
execd_params                 ENABLE_ADDGRP_KILL=TRUE
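
To check whether the setting above is already active, something like the 
following might do. It is only a minimal sketch: the function name 
has_addgrp_kill is made up for illustration, it reads `qconf -sconf` output on 
stdin, and it assumes execd_params fits on a single line (a value wrapped with 
a trailing backslash would not be matched).

```shell
#!/bin/sh
# Hypothetical helper, not part of Grid Engine.
# Reads `qconf -sconf` output on stdin and succeeds (exit 0) only if the
# execd_params line contains ENABLE_ADDGRP_KILL=TRUE (case-insensitive).
# Assumption: execd_params is not wrapped across several lines.
has_addgrp_kill() {
    grep -Eiq '^execd_params[[:space:]].*ENABLE_ADDGRP_KILL=TRUE'
}
```

It could be used as `qconf -sconf | has_addgrp_kill && echo enabled`. If the 
parameter is missing, `qconf -mconf` opens the global configuration in an 
editor so it can be added to execd_params.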

-- Reuti


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
