SGE is fine on 1Gb fabrics, and I don't know of anyone who uses 10Gb for
SGE unless it's a combined network fabric carrying storage and
application traffic along with SGE traffic on the same links, or unless
they are running all-new gear with 10Gb for everything and maybe a 1Gb
NIC held back for iLO/DRAC/IPMI/provisioning use.

The commonly accepted point at which you'd hit a scaling limit on an
Ethernet network would most likely be determined not by Grid Engine
traffic but by:
- Network filesystem traffic for shared storage
- Application message passing traffic

I don't see SGE native traffic as a huge consumer of bandwidth or
network resources in most cases. It's the "other stuff" that blows out
the network.

And there is no one-size-fits-all answer there, as people's HPC
footprints vary wildly in how they are used and what they are
architected for.

SGE can run at massive scale over a 1Gb network fabric without issues.
The only time a 1Gb network becomes the bottleneck is when you try to
stuff NFS and application traffic down the same pipe. And even then
you'd hit performance and job throughput problems before you hit a
scaling limit wall. If you've got SGE running on a mostly free 1Gb
fabric (maybe it's your admin or provisioning network, etc.) you'd be
fine even at large scale.
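
To make the "same pipe" point concrete, here is a rough back-of-envelope
sketch in Python. It assumes all traffic converges on a single 1Gb link
(say, the NIC of a server hosting both qmaster and NFS). The function name
and the per-node rates are made-up placeholders for illustration, not
measurements; plug in your own numbers.

```python
# Back-of-envelope link budget. All per-node rates are hypothetical
# placeholders, not measured values.

LINK_CAPACITY_MBPS = 1000.0  # nominal capacity of one 1Gb Ethernet link


def link_utilization(nodes, per_node_mbps):
    """Return the fraction of link capacity used by each traffic class.

    nodes          -- number of hosts sending traffic over the shared link
    per_node_mbps  -- dict mapping a traffic class name to its average
                      rate per node, in megabits per second
    """
    shares = {
        name: nodes * rate / LINK_CAPACITY_MBPS
        for name, rate in per_node_mbps.items()
    }
    # The right-hand side is evaluated before "total" is added to the dict.
    shares["total"] = sum(shares.values())
    return shares


if __name__ == "__main__":
    # Hypothetical: 500 nodes, tiny scheduler chatter, modest NFS load.
    usage = link_utilization(500, {"sge": 0.02, "nfs": 4.0})
    for name, fraction in usage.items():
        print(f"{name:>5}: {fraction:.0%} of link capacity")
```

Anything above 100% for "total" means the link is oversubscribed on
average. With placeholder rates like these, the scheduler's share is
noise and the storage share is what pushes the link over.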

The sorts of tuning you'd do to run "big SGE" on a 1Gb fabric would be to:
- Tune the qmaster host to handle the number of endpoints expected
- Make darn sure application traffic and storage traffic are on a
different network
- If you have to share the 1GbE with other traffic, then configure SGE
for local spooling. The danger here is performance impact, not scaling.
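
For the local spooling item, one way to check what a cluster is currently
doing is to read the `execd_spool_dir` setting from `qconf -sconf` and see
whether it sits under a shared mount. This is a minimal sketch, assuming
"local spooling" here means the execd spool directory lives on local disk.
The helper names and the list of shared mount prefixes are hypothetical,
and the parser does not handle long values that `qconf` wraps across
lines.

```python
import subprocess


def parse_sge_conf(text):
    """Parse 'key   value' lines as printed by `qconf -sconf` into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#global:' style header
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def spool_is_local(conf, shared_prefixes):
    """Return True if execd_spool_dir is outside every shared prefix.

    Returns None when the setting is absent from this configuration.
    shared_prefixes is a site-specific list of NFS mount points that you
    supply yourself (hypothetical values in the usage note below).
    """
    spool = conf.get("execd_spool_dir")
    if spool is None:
        return None
    spool = spool.rstrip("/")
    for prefix in shared_prefixes:
        prefix = prefix.rstrip("/")
        if spool == prefix or spool.startswith(prefix + "/"):
            return False
    return True


def read_conf(host=None):
    """Run `qconf -sconf` (global) or `qconf -sconf <host>` and parse it."""
    cmd = ["qconf", "-sconf"] + ([host] if host else [])
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sge_conf(result.stdout)
```

On a submit or admin host, something like
`spool_is_local(read_conf(), ["/opt/sge", "/home"])` would tell you whether
the global setting points at a shared path. The two prefixes are only
examples.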
My $.02
Lane, William <mailto:william.l...@cshs.org>
September 24, 2015 at 6:04 PM
If a cluster is running on a relatively slow networking backbone
(say gigabit Ethernet or 10Gb Ethernet as opposed to InfiniBand), is
there any commonly accepted point at which increasing the number of
nodes in a queue negatively affects the performance of the queue?
Is there any general rule about how many nodes to have in a queue
based on a given network backbone?
-Bill L.
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users