Reuti <[email protected]> writes:

>> I am having a situation where programs go south and are left running on 
>> nodes when Grid Engine thinks they are no longer running and do not exist
>> in the GE queue.
>> 
>> I can't be the first asking for such a thing, so I don't want to
>> re-invent the wheel if some script or way already exists for doing
>> this that works.

The first thing to do is to try to avoid it, particularly by trying to
ensure parallel jobs use tight integration.

> Are the processes jumping out of the process tree, so that they are no
> longer bound to the sge_shepherd? One thing you can try is:
>
> $ qconf -sconf
> ...
> execd_params                 ENABLE_ADDGRP_KILL=TRUE

And see http://arc.liv.ac.uk/SGE/howto/remove_orphaned_processes.html if
necessary.
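
To check whether the setting quoted above is already active, a minimal
sketch along these lines may help. It assumes that `qconf -sconf` prints
the global configuration with `execd_params` on a single line, as in the
excerpt; `has_addgrp_kill` is a hypothetical helper name, not part of
Grid Engine.

```shell
# Hypothetical helper (not shipped with Grid Engine): reads configuration
# text on stdin and succeeds if the execd_params line contains
# ENABLE_ADDGRP_KILL=TRUE (matched case-insensitively).
has_addgrp_kill() {
    grep -Eiq '^execd_params[[:space:]].*ENABLE_ADDGRP_KILL=TRUE'
}

# Intended use on a host with the Grid Engine client tools installed:
#   qconf -sconf | has_addgrp_kill && echo "ENABLE_ADDGRP_KILL is set"
```

If the check fails, the parameter can be added to `execd_params` by
editing the global configuration with `qconf -mconf`.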

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/