I'd like to get some opinions about the installation and management of
InfiniBand from people who have experience with it.
WARNING: long post (sorry!)
We are installing InfiniBand at a site; the setup is very small:
-infiniband to be used for MPI only
-compute cluster that is CPU intensive with very low I/O; we do all I/O on a
NetApp over 1 Gb Ethernet
-18 nodes
-one switch only, one cable only per node (no redundancy)
-one single "subnet", no virtual fabric
My experience with IB is close to non-existent: a three-day class from the
hardware vendor and some time spent playing with it afterwards. I have
also talked a bit about how to set things up with the software vendor who pushed
us to get IB (engineering software that requires MPI).
What I was looking at doing (and tried):
-every node is completely identical, save for its IP address, which
kickstart/DHCP manages. No backups of any node; each can be re-created in about
15 minutes. All data lives on the NetApp.
-install the vendor version of the OFED drivers in an automated fashion as
part of the kickstart of each node
-install the hardware vendor's "special software" for management, reporting and
debugging on each node as part of the kickstart, but without starting a fabric
manager on any node at all.
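As a sketch, the driver side of this is just a %post fragment in the kickstart profile. The repo and package names below are placeholders; the actual names depend on which OFED distribution the vendor ships:

```shell
%post
# Hypothetical %post fragment: install the vendor's OFED packages from an
# internal yum repo (package names are placeholders for the vendor's own).
yum -y install kernel-ib libibverbs librdmacm infiniband-diags

# Make sure the IB stack comes up at boot; openibd is the standard OFED
# init script.
chkconfig openibd on
%end
```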
This "just works"! My nodes come up, they are automatically part of the
fabric, which is managed by the switch. If we need to debug, or to create a
report for the hardware vendor's tech support, we log in to any node, start
the fabric manager (again, there is no virtual fabric, just one subnet), and
use the tools. When we're done, we shut down the fabric manager.
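For what it's worth, that on-demand debugging session needs nothing beyond the standard OFED diagnostics on whichever node we happen to pick. The command to start the vendor's fabric manager varies by product, so it is left out here:

```shell
# Run from any node once a subnet manager is up (here, the one on the
# switch). These are the standard infiniband-diags tools.
ibstat        # local HCA: port state, LID, link width and speed
ibhosts       # hosts visible on the fabric
iblinkinfo    # per-link state and speed across the fabric
ibdiagnet     # full fabric sweep with an error report
```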
What the hardware vendor is recommending:
-we install the OS without any IB driver
-we install their "special software" on the "head node" (HN), which means:
-the HN needs passwordless ssh to all the nodes
-the HN is now different and needs backups
-we push the IB drivers from the HN
-reboot all the nodes
-install IPoIB, because "we need it for management"
-IPoIB needs a common /etc/hosts file which needs to be pushed to all the
nodes, a new subnet, etc...
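For context, the per-node IPoIB piece itself is small; on a RHEL-style system it is roughly an ifcfg file like the one below (the device name is the usual ib0, but the addressing is a made-up example):

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-ib0;
# the 10.20.0.0/24 subnet is a placeholder, not our real plan.
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.20.0.11
NETMASK=255.255.255.0
```

The objection isn't that this file is hard to write; it's the extra subnet, the pushed /etc/hosts, and the non-MPI traffic it invites onto the fabric.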
The vendor's arguments are that their method is better because:
-it is "best practice"
-it is easy to drive through their special software's menu system
-this is their recommended setup and will make it easier for us to work with
their technical support
-it will be very easy to upgrade their driver to all the nodes when we need to
do so
-even though the advantages are not obvious now, our cluster will grow to
hundreds of nodes, at which point it will make sense
-they do this every day, they know!
My arguments against it are:
-it adds a lot more complexity without giving us any advantage at this point
-I see IPoIB as a liability: it could be used to transfer large files or
whatnot, limiting the performance of a fabric that was put in place
specifically for MPI. That could hurt MPI performance and ultimately make
our jobs fail; we put IB in because MPI over Ethernet wasn't fast enough
for our apps.
-I need to do more testing, but I am pretty sure their special software works
fine over Ethernet, without IPoIB.
-the stated intention is not to grow our cluster, but to upgrade the nodes as
faster hardware becomes available
-although the vendor probably does a lot of installs, it is unlikely that they
manage a production cluster, let alone do so as part of a UNIX team which
manages more than just that cluster.
-if we end up growing to a significant number of nodes, then we will deal with
it then. At that point we will have to deal with a lot of other issues, like
using multiple switches, probably managing the subnets. It is likely that this
will become a project on its own, which will take care of a new/different
method of managing it (if needed).
-it is not reasonable to use a combine to mow the grass in our backyard,
because we think we might turn it into a farm.
-it is very unlikely that we will change the IB driver before we upgrade to a
new OS. Our experience with our previous cluster (no IB) was no upgrades
whatsoever until the software vendor told us to upgrade the whole OS.
-I'd still feel more comfortable rebuilding my OS and doing thorough testing
if I really had to upgrade the IB drivers (which are kernel modules), rather
than using a script to push the drivers onto the nodes.
-The special software, by default, will re-install the IB driver on all the
nodes. I see this as an additional risk: it would be very easy to push it to
all the nodes and break jobs that have been running for days, when all we
wanted was to add/rebuild one node.
-it is unlikely, but if we were ever to use a different HCA on some new
nodes, their method falls apart, while I can easily create a different
kickstart profile to manage it.
-new members of the UNIX team can easily be trained on the basic concepts of
IB and the three or four commands needed to diagnose easy issues. They can
build/re-build a node with no knowledge of IB at all. With the vendor's
method, they now need to become familiar with their special software.
I need a reality check here, and would really appreciate feedback from people
experienced in managing IB. Thanks.
--
Yves. http://www.SollerS.ca/
http://images.SollerS.ca/
xmpp:[email protected]
_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/