Stuart Barkley <[email protected]> writes:

> If a node dies or is rebooted SGE does not do anything about hung jobs
> when the node comes back online.  The jobs continue to appear in
> the queue as if they were running.
>
> This may be related to my using diskless nodes where the local spool
> directory is cleared on reboot.  I will be looking into putting the
> execd spool files on a shared directory in the future which may
> address this problem.

I don't think it will.  I see the same with a shared spool, at least for
nodes running tightly-integrated parallel jobs, and I think others have
reported it in the archives.  I thought there was an issue filed already, but
apparently not.  I'll file it, at least.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
