Eric,

Did you ever get to the root of this problem?

Prentice

On 11/12/2014 10:26 AM, Peskin, Eric wrote:
All,

Does SGE have to use NFS or can it work locally on each node?
If parts of it have to be on NFS, what is the minimal subset?
How much of this changes if you want redundant masters?

We have a cluster running CentOS 6.3, Bright Cluster Manager 6.0, and
SGE 2011.11.  Specifically, SGE is provided by a Bright package:
sge-2011.11-360_cm6.0.x86_64

Twice, we have lost all the running SGE jobs when the cluster failed over
from one head node to the other.  =( That is not supposed to happen.
Since then, we have also had many individual jobs get lost.  The latter
situation correlates with messages in the system logs saying:

abrt[9007]: File '/cm/shared/apps/sge/2011.11/bin/linux-x64/sge_execd' seems to be deleted
That file lives on an NFS mount on our Isilon storage.
Surely, the executables don't have to be on NFS?
Interestingly, we are using local spooling; the spool directory on each node
is /cm/local/apps/sge/var/spool, which is indeed local.
But $SGE_ROOT, /cm/shared/apps/sge/2011.11, lives on NFS.
Does any of it need to?
Maybe just the var part would need to:  /cm/shared/apps/sge/var ?

Thanks,
Eric
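
For anyone wanting to confirm which of the paths above sit on NFS and which are local, here is a minimal sketch. It is not part of SGE or Bright; it just matches each path against a mount table in /proc/mounts format. The function name fs_type_for and the sample mount table are hypothetical, including the Isilon export name and the assumption that /cm/shared is the mount point. On a real node you would pass in the contents of /proc/mounts instead.

```python
# Hypothetical mount table in /proc/mounts format; the export name and the
# /cm/shared mount point are assumptions made for illustration only.
SAMPLE_MOUNTS = """
/dev/sda1 / ext4 rw 0 0
isilon:/ifs/cm /cm/shared nfs rw 0 0
"""


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /cm/shared does not match /cm/shared2.
        prefix = mount_point.rstrip("/") + "/"
        if (path.rstrip("/") + "/").startswith(prefix):
            # ">=" lets a later entry for the same mount point win.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Paths taken from the message above.
for p in ("/cm/shared/apps/sge/2011.11", "/cm/local/apps/sge/var/spool"):
    print(p, "->", fs_type_for(p, SAMPLE_MOUNTS))
```

With the sample table, the $SGE_ROOT path resolves to the nfs mount and the spool directory to the local root filesystem. To check a real node, call fs_type_for(path, open("/proc/mounts").read()).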



_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users


--
Prentice

