I'd like to get some opinions about the installation and management of
InfiniBand from people who have experience with it.
WARNING: long post (sorry!)
We are installing InfiniBand at a site; the setup is very small:
-infiniband to be used for MPI only
-compute cluster that is CPU intensive with very low I/O; we do all I/O on a
NetApp over 1 Gb Ethernet
-18 nodes
-one switch only, one cable only per node (no redundancy)
-one single "subnet", no virtual fabric
My experience with IB is close to non-existent: a three-day class from the
hardware vendor and some time spent playing with it afterwards. I have
also talked a bit about how to set things up with the software vendor who pushed
us to get IB (engineering software that requires MPI).
What I was looking at doing (and tried):
-every node is completely identical, save for its IP address, which
kickstart/DHCP manages. No backups of any node; each can be re-created in about
15 minutes. All data lives on the NetApp.
-install the vendor version of the OFED drivers in an automated fashion as
part of the kickstart of each node
-install the hardware vendor's "special software" for management, reporting and
debugging on each node as part of the kickstart, but without starting a fabric
manager on any node at all.
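As a sketch, the driver side of this is just a %post fragment in the kickstart profile. The repo and package names below are placeholders; the actual names depend on which OFED distribution the vendor ships:

```shell
%post
# Hypothetical %post fragment: install the vendor's OFED packages from an
# internal yum repo (package names are placeholders for the vendor's own).
yum -y install kernel-ib libibverbs librdmacm infiniband-diags

# Make sure the IB stack comes up at boot; openibd is the standard OFED
# init script.
chkconfig openibd on
%end
```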
This "just works"! My nodes come up, they are automatically part of the
fabric, which is managed by the switch. If we need to debug, or to create a
report for the hardware vendor's tech support, we log in to any node, start
the fabric manager (again, there is no virtual fabric, just one subnet), and
use the tools. When we're done, we shut down the fabric manager.
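For what it's worth, that on-demand debugging session needs nothing beyond the standard OFED diagnostics on whichever node we happen to pick. The command to start the vendor's fabric manager varies by product, so it is left out here:

```shell
# Run from any node once a subnet manager is up (here, the one on the
# switch). These are the standard infiniband-diags tools.
ibstat        # local HCA: port state, LID, link width and speed
ibhosts       # hosts visible on the fabric
iblinkinfo    # per-link state and speed across the fabric
ibdiagnet     # full fabric sweep with an error report
```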
What the hardware vendor is recommending:
-we install the OS without any IB driver
-we install their "special software" on the "head node" (HN), which means:
-the HN needs passwordless ssh to all the nodes
-the HN is now different and needs backups
-we push the IB drivers from the HN
-reboot all the nodes
-install IPoIB, because "we need it for management"
-IPoIB needs a common /etc/hosts file which needs to be pushed to all the
nodes, a new subnet, etc...
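For context, the per-node IPoIB piece itself is small; on a RHEL-style system it is roughly an ifcfg file like the one below (the device name is the usual ib0, but the addressing is a made-up example):

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-ib0;
# the 10.20.0.0/24 subnet is a placeholder, not our real plan.
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.20.0.11
NETMASK=255.255.255.0
```

The objection isn't that this file is hard to write; it's the extra subnet, the pushed /etc/hosts, and the non-MPI traffic it invites onto the fabric.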
The vendor's arguments are that their method is better because:
-it is "best practice"
-it is easy to drive through their special software's menu system
-this is their recommended setup and will make it easier for us to work with
their technical support
-it will be very easy to upgrade their driver to all the nodes when we need to
do so
-even though the advantages are not obvious now, our cluster will grow to
hundreds of nodes, at which point it will make sense
-they do this every day, they know!
My arguments against it are:
-it adds a lot more complexity without giving us any advantage at this point
-I see IPoIB as a liability: it could be used to transfer large files or
whatnot, limiting the performance of a fabric that was put in place
specifically for MPI. That could hurt MPI performance and ultimately make
our jobs fail; we put IB in because MPI over Ethernet wasn't fast enough
for our apps.
-I need to do more testing, but I am pretty sure their special software works
fine over Ethernet, without IPoIB.
-the stated intention is not to grow our cluster, but to upgrade the nodes as
faster hardware becomes available
-although the vendor probably does a lot of installs, it is unlikely that they
manage a production cluster, let alone do so as part of a UNIX team which
manages more than just that cluster.
-if we end up growing to a significant number of nodes, then we will deal with
it then. At that point we will have to deal with a lot of other issues, like
using multiple switches, probably managing the subnets. It is likely that this
will become a project on its own, which will take care of a new/different
method of managing it (if needed).
-it is not reasonable to use a combine to mow the grass in our backyard,
because we think we might turn it into a farm.
-it is very unlikely that we will change the IB driver before we upgrade to a
new OS. Our experience with our previous cluster (no IB) was no upgrades
whatsoever until the software vendor told us to upgrade the whole OS.
-I'd still feel more comfortable rebuilding my OS and doing thorough testing
if I really had to upgrade the IB drivers (which are kernel modules), rather
than using a script to push the drivers onto the nodes.
-The special software, by default, will re-install the IB driver on all the
nodes. I see this as an additional risk: it would be very easy to push it to
all the nodes and break jobs that have been running for days, when all we
wanted was to add/rebuild one node.
-it is unlikely, but if we were ever to use a different HCA on some new
nodes, their method falls apart, while I can easily create a different
kickstart profile to manage it.
-new members of the UNIX team can easily be trained on the basic concepts of
IB and the three or four commands needed to diagnose easy issues. They can
build/re-build a node with no knowledge of IB at all. With the vendor's
method, they now need to become familiar with their special software.
I need a reality check here, and would really appreciate feedback from people
experienced in managing IB. Thanks.
--
Yves. http://www.SollerS.ca/
http://images.SollerS.ca/
xmpp:[email protected]
_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/