Reuti <[email protected]> writes:

>> I am having a situation where programs go south and are left running on
>> nodes when Grid Engine thinks they are no longer running and do not exist
>> in the GE queue.
>>
>> I can't be the first asking for such a thing, so I don't want to
>> re-invent the wheel if some script or way already exists for doing
>> this that works.

The first thing to do is to try to avoid it, particularly by ensuring
that parallel jobs use tight integration.

> Are the processes jumping out of the process tree and are no longer bound to
> the sge_shepherd? One thing you can try is:
>
> $ qconf -sconf
> ...
> execd_params ENABLE_ADDGRP_KILL=TRUE

And see http://arc.liv.ac.uk/SGE/howto/remove_orphaned_processes.html if
necessary.
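
To check quickly whether that setting is already in place, something like the following could be used. It is only a sketch: has_addgrp_kill is a hypothetical helper name (not part of Grid Engine), it only recognises the value TRUE (in any letter case), and the sample text is made up to mirror the qconf output quoted above.

```shell
#!/bin/sh
# Sketch: check whether ENABLE_ADDGRP_KILL is switched on in Grid Engine
# configuration text supplied on stdin (for example from `qconf -sconf`).
# has_addgrp_kill is a hypothetical helper name, not part of Grid Engine.

has_addgrp_kill() {
    # Succeeds if the text contains ENABLE_ADDGRP_KILL=TRUE (any letter case)
    # followed by a comma, whitespace, or end of line.
    grep -qiE 'ENABLE_ADDGRP_KILL=true([,[:space:]]|$)'
}

# Demonstration on sample text shaped like the qconf output quoted above.
sample='execd_params                 ENABLE_ADDGRP_KILL=TRUE'
if printf '%s\n' "$sample" | has_addgrp_kill; then
    echo "ENABLE_ADDGRP_KILL is enabled"
else
    echo "ENABLE_ADDGRP_KILL is not enabled"
fi
```

On a real cluster, assuming qconf is on the PATH, the input would come from `qconf -sconf | has_addgrp_kill`. If the parameter is missing, it can be added to execd_params by editing the configuration with `qconf -mconf`.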
