On 12.11.2014 at 17:26, Peskin, Eric wrote:

> All,
>
> Does SGE have to use NFS or can it work locally on each node?
> If parts of it have to be on NFS, what is the minimal subset?

Usually it's sufficient to have the spool directories local. We have never had problems with sge_execd being inaccessible.

At one point I experimented with staging the complete /usr/sge onto the node while it boots. IIRC, /usr/sge was a symbolic link to an NFS mount of SGE, and after the copy finished I replaced it with a symbolic link to a local directory, so the path itself stayed the same the whole time.

-- Reuti

> How much of this changes if you want redundant masters?
>
> We have a cluster running CentOS 6.3, Bright Cluster Manager 6.0, and SGE
> 2011.11. Specifically, SGE is provided by a Bright package:
> sge-2011.11-360_cm6.0.x86_64
>
> Twice, we have lost all the running SGE jobs when the cluster failed over
> from one head node to the other. =( Not supposed to happen.
> Since then, we have also had many individual jobs get lost. The latter
> situation correlates with messages in the system logs saying
>
>> abrt[9007]: File '/cm/shared/apps/sge/2011.11/bin/linux-x64/sge_execd' seems
>> to be deleted
>
> That file lives on an NFS mount on our Isilon storage.
> Surely, the executables don't have to be on NFS?
> Interestingly, we are using local spooling; the spool directory on each node is
> /cm/local/apps/sge/var/spool, which is indeed local.
> But the $SGE_ROOT, /cm/shared/apps/sge/2011.11, lives on NFS.
> Does any of it need to?
> Maybe just the var part would need to: /cm/shared/apps/sge/var ?
>
> Thanks,
> Eric
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
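
For anyone who wants to try the staging approach from the reply above, here is a rough sketch of the copy-then-swap step in Python. It is only an illustration of the idea, not the script that was actually used: the function name stage_and_swap and all paths in the demo are made up, and it assumes the stable path (e.g. /usr/sge) is already a symbolic link pointing at the NFS copy.

```python
import os
import shutil
import tempfile


def stage_and_swap(link_path, nfs_source, local_target):
    """Copy nfs_source to local_target, then repoint link_path at the copy.

    Assumption: link_path is an existing symbolic link, not a real directory.
    """
    # Start from a clean local directory so stale files do not survive.
    if os.path.exists(local_target):
        shutil.rmtree(local_target)
    # symlinks=True keeps symlinks inside the tree as symlinks.
    shutil.copytree(nfs_source, local_target, symlinks=True)

    # Create the new link under a temporary name, then rename it over the
    # old link. The rename is atomic on POSIX, so link_path never dangles.
    tmp_link = link_path + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(local_target, tmp_link)
    os.replace(tmp_link, link_path)


if __name__ == "__main__":
    # Demo in a scratch directory; these paths are placeholders, not real
    # SGE locations.
    base = tempfile.mkdtemp()
    nfs = os.path.join(base, "nfs_sge")
    os.makedirs(os.path.join(nfs, "bin"))
    with open(os.path.join(nfs, "bin", "sge_execd"), "w") as f:
        f.write("placeholder\n")

    link = os.path.join(base, "sge")
    os.symlink(nfs, link)

    stage_and_swap(link, nfs, os.path.join(base, "local_sge"))
    print(os.readlink(link))
```

Note that a daemon that was already started from the NFS copy keeps using the NFS files until it is restarted, so the swap would have to happen before sge_execd starts, which fits doing it during boot.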