On 11.08.2011 at 15:29, Dave Love wrote:

> Stuart Barkley <[email protected]> writes:
> 
>> If a node dies or is rebooted SGE does not do anything about hung jobs
>> when the node comes back online.  The jobs continue to appear in
>> the queue as if they were running.
>> 
>> This may be related to my using diskless nodes where the local spool
>> directory is cleared on reboot.  I will be looking into putting the
>> execd spool files on a shared directory in the future which may
>> address this problem.
> 
> I don't think it will.  I see the same with a shared spool, at least for
> nodes running tightly-integrated parallel jobs, and I think others have
> in the archives.  I thought there was an issue filed already, but
> apparently not.  I'll file it, at least.

I think the message in the subject line appears when there is still something in the spool 
directory of the node, like "$SGE_ROOT/default/spool/node01/jobs/00/0000/515", 
while there is nothing in "active_jobs" any longer. So there is nothing it can kill.

Clearing the node's "jobs" directory may resolve it.
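Before clearing anything, the mismatch can be listed first. Below is a minimal Python sketch, not an SGE tool. It assumes the spool layout in the example path above: the job id is split across jobs/<dir>/<dir>/<entry> (so 00/0000/515 is job 515), and entries in active_jobs are named <job>.<task>. Both layout details are assumptions, as are the script name find_stale_jobs.py and the helper names. It only reports job ids that are spooled under jobs/ with no matching active_jobs entry; it deletes nothing.

```python
import os
import sys


def spooled_job_ids(jobs_dir):
    """Collect job ids from an execd 'jobs' spool directory.

    Assumption (from the example path jobs/00/0000/515): the job id is split
    across two directory levels plus the entry name, so joining the parts
    ("00" + "0000" + "515") gives the id (515). Any ".<suffix>" on the
    entry name is ignored.
    """
    ids = set()
    for root, dirs, files in os.walk(jobs_dir):
        rel = os.path.relpath(root, jobs_dir)
        levels = [] if rel == "." else rel.split(os.sep)
        if len(levels) != 2:
            continue  # only the third level holds per-job entries (assumed)
        for name in dirs + files:
            digits = "".join(levels) + name.split(".")[0]
            if digits.isdigit():
                ids.add(int(digits))
        dirs[:] = []  # do not descend below the per-job entries
    return ids


def active_job_ids(active_dir):
    """Collect job ids from 'active_jobs', assuming entries named <job>.<task>."""
    ids = set()
    if not os.path.isdir(active_dir):
        return ids
    for name in os.listdir(active_dir):
        head = name.split(".")[0]
        if head.isdigit():
            ids.add(int(head))
    return ids


def stale_job_ids(spool_dir):
    """Job ids still spooled under jobs/ with no matching active_jobs entry."""
    jobs = spooled_job_ids(os.path.join(spool_dir, "jobs"))
    active = active_job_ids(os.path.join(spool_dir, "active_jobs"))
    return jobs - active


if __name__ == "__main__":
    # Report only; nothing is removed.
    if len(sys.argv) != 2:
        print("usage: find_stale_jobs.py SPOOL_DIR "
              "(e.g. $SGE_ROOT/default/spool/node01)", file=sys.stderr)
    else:
        for job_id in sorted(stale_job_ids(sys.argv[1])):
            print(job_id)
```

For example, `python3 find_stale_jobs.py $SGE_ROOT/default/spool/node01` prints one job id per line. Comparing those ids with what `qstat` still shows as running on that node before removing anything by hand keeps the cleanup conservative.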

-- Reuti


> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

