We've been running a "shared-nothing" SGE deployment for the last 3 years with no real issues. Upgrading software can be a pain on a large population, but it's manageable. We also set up an active-passive HA failover cluster for the SGE queue master, since the SGE master/shadow configuration won't work if there is no shared SGE_ROOT.

The HA cluster has 2 nodes with a shared disk for the SGE software and configuration. We use Heartbeat (don't ask -- it's too complicated) but would recommend a modern cluster platform such as Pacemaker (on Linux). The resource group has a floating IP address in addition to the shared disk and the qmaster service. We set "$SGE_ROOT/$SGE_CELL/common/act_qmaster" to the host name of the floating IP on all hosts. We also set host_aliases to map the floating IP to each of the cluster nodes, but I don't think that's actually necessary.
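
Since every host carries its own copy of act_qmaster in a shared-nothing setup, it can be worth checking that each copy names the floating host. The following is only a rough sketch of such a check, not something taken from our deployment: the host name "sge-master.example.com", the fallback root "/opt/sge", and both function names are assumptions made up for illustration.

```python
import os
from pathlib import Path


def act_qmaster_path(sge_root, sge_cell="default"):
    """Build the path $SGE_ROOT/$SGE_CELL/common/act_qmaster."""
    return Path(sge_root) / sge_cell / "common" / "act_qmaster"


def check_act_qmaster(sge_root, sge_cell, expected_host):
    """Return (ok, found): does the local act_qmaster name expected_host?

    found is the host name read from the file, or None if the file is missing.
    Host names are compared case-insensitively.
    """
    path = act_qmaster_path(sge_root, sge_cell)
    try:
        found = path.read_text().strip()
    except FileNotFoundError:
        return False, None
    return found.lower() == expected_host.lower(), found


if __name__ == "__main__":
    # Hypothetical values: substitute your own floating host name and paths.
    root = os.environ.get("SGE_ROOT", "/opt/sge")
    cell = os.environ.get("SGE_CELL", "default")
    ok, found = check_act_qmaster(root, cell, "sge-master.example.com")
    print("OK" if ok else f"MISMATCH: act_qmaster contains {found!r}")
```

Run on each execution host (for example from whatever tool you already use to push software updates), it flags any host whose local copy still points at one of the physical cluster nodes.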

Malcolm.


On 02/03/2012 15:03, Simon Matthews wrote:


On Thu, Mar 1, 2012 at 8:00 PM, Rayson Ho wrote:

    On Thu, Mar 1, 2012 at 10:55 PM, Simon Matthews wrote:
    > I installed "iotop" and it shows multiple nfsd processes driving a lot of
    > I/O. I have always assumed that the qmaster and the execution clients need
    > to share a common SGE_ROOT directory. Is this true?

    You don't need to have a shared SGE_ROOT, see:

    http://gridscheduler.sourceforge.net/howto/nfsreduce.html


Great! I'll reconfigure things to use a local spool directory -- that should eliminate much of the issue.

Simon


    And for SGE 6.2u5 or below, you can't have BerkeleyDB on NFS (unless
    it is NFSv4). For Grid Engine 2011.11, you can place your BerkeleyDB
    spool directory on any version of NFS.

    Rayson



    > If not, then I can make
    > each execution machine have a local SGE_ROOT directory, which will eliminate
    > the I/O from nfsd.
    >
    > Simon
    >
    >




_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
