We've been running a "shared-nothing" SGE deployment for the last 3
years with no real issues. Upgrading software can be a pain on a large
population, but it's manageable. We also set up an active-passive HA
failover cluster for the SGE queue master, since the SGE master/shadow
configuration won't work if there is no shared SGE_ROOT.
The HA cluster has 2 nodes with a shared disk for the SGE software
and configuration. We use Heartbeat (don't ask -- it's too complicated)
but would recommend a modern cluster platform such as Pacemaker (on
Linux). The resource group has a floating IP address in addition to the
shared disk and the qmaster service. We set
"$SGE_ROOT/$SGE_CELL/common/act_qmaster" to the host name of the
floating IP on all hosts. We also set the host_aliases to map the
floating IP to each of the cluster nodes, but I don't think that's
actually necessary.
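
For illustration, here is a minimal Python sketch of the two files mentioned above. It is not our actual tooling: the function names and the example host names (sge-master, node1, node2) are made up, and it assumes the floating host name goes first on the host_aliases line, followed by the names it should absorb.

```python
from pathlib import Path


def act_qmaster_path(sge_root, sge_cell):
    """Location of act_qmaster under $SGE_ROOT/$SGE_CELL/common."""
    return Path(sge_root) / sge_cell / "common" / "act_qmaster"


def write_act_qmaster(sge_root, sge_cell, floating_host):
    """Write the floating host name (hypothetical value) into act_qmaster."""
    path = act_qmaster_path(sge_root, sge_cell)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(floating_host + "\n")
    return path


def host_aliases_line(floating_host, node_hosts):
    """Build one host_aliases line: unique name first, then its aliases.

    Assumption: the floating name is the unique name and the physical
    cluster nodes are listed as aliases of it.
    """
    return " ".join([floating_host, *node_hosts])
```

With a shared-nothing layout, write_act_qmaster would have to be run on every host, since each one has its own copy of the common directory.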
Malcolm.
On 02/03/2012 15:03, Simon Matthews wrote:
On Thu, Mar 1, 2012 at 8:00 PM, Rayson Ho <[email protected]> wrote:
On Thu, Mar 1, 2012 at 10:55 PM, Simon Matthews <[email protected]> wrote:
> I installed "iotop" and it shows multiple nfsd processes driving a lot of
> I/O. I have always assumed that the qmaster and the execution clients need
> to share a common SGE_ROOT directory. Is this true?
You don't need to have a shared SGE_ROOT, see:
http://gridscheduler.sourceforge.net/howto/nfsreduce.html
Great! I'll reconfigure things to use a local spool directory -- that
should eliminate much of the issue.
Simon
And for SGE 6.2u5 or below, you can't have BerkeleyDB on NFS (unless
it is NFSv4). For Grid Engine 2011.11, you can place your BerkeleyDB
spool directory on any version of NFS.
Rayson
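
A rough way to check which filesystem type a spool directory sits on is to match its path against the mount table. The Python sketch below is an illustration only, not part of Grid Engine: the function names are made up, it assumes Linux-style /proc/mounts text (where NFSv3 mounts normally appear as "nfs" and NFSv4 mounts as "nfs4"), and it does not decode escaped characters in mount points.

```python
def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains path."""
    best = None
    wanted = path.rstrip("/") + "/"
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if wanted.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None


def is_nfs(fs_type):
    """True for the fs type strings Linux typically reports for NFS."""
    return fs_type in ("nfs", "nfs4")
```

On a live Linux host the input would come from open("/proc/mounts").read(), and the path checked would be the BerkeleyDB spool directory.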
> If not, then I can make
> each execution machine have a local SGE_ROOT directory, which will eliminate
> the I/O from nfsd.
>
> Simon
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users