Stuart Barkley <[email protected]> writes:

> If a node dies or is rebooted SGE does not do anything about hung jobs
> when the node comes back online.  The jobs continue to appear in
> the queue as if they were running.
>
> This may be related to my using diskless nodes where the local spool
> directory is cleared on reboot.  I will be looking into putting the
> execd spool files on a shared directory in the future which may
> address this problem.

I don't think it will.  I see the same behaviour with a shared spool, at
least on nodes running tightly-integrated parallel jobs, and I think
others have reported it in the archives.  I thought an issue had already
been filed, but apparently not, so I'll file one.
