Agree with Rémi but adding a few comments below.

On Sat, 23 May 2015 08:49:13 -0700 Rémi Palancher <[email protected]> wrote:
> On 23/05/2015 01:29, Trevor Gale wrote:
> > Hello all,
> >
> > I am new to Slurm and HPC in general, and am trying to learn how
> > other pieces of software interact with Slurm. I was wondering if any
> > changes need to be made to slurm.conf when using an InfiniBand
> > network (e.g. using LIDs or GUIDs instead of IPs for node addresses)
> > and if you use the MLNX_OFED InfiniBand software available on
> > Mellanox's website. Also I was wondering what MPI works best with
> > Slurm and an IB network.
>
> Hi Trevor,
>
> Actually, Slurm does not do much with IB. Whatever the underlying
> network (IB, Ethernet, and so on), Slurm components require IPv4
> addresses to communicate; it is not able to use low-level protocols
> below IP-over-IB.

That is all true, but Slurm does handle topology for the IB network.
You can create a topology config file, which allows better-than-random
job placement (YMMV; a rough sketch follows at the end of this mail).

> If you have Mellanox hardware it is better to use the Mellanox flavor
> of OFED to get the full performance of the latest Mellanox
> technologies (MXM, FCA, etc.) and to get technical support from them.

Customers may value the simplicity of updates with the
distribution-provided IB stack over the latest and greatest features of
MLNX_OFED.

> Most major implementations of MPI (Open MPI, MPICH, Intel MPI and
> probably others) are tightly integrated with Slurm, especially when
> PMI support is enabled.

There are many levels of integration with Slurm, including but not
limited to:

* the MPI does not understand Slurm but is fed an auto-created hostfile
* the MPI understands the Slurm environment and launches ranks on the
  allocated nodes, but via e.g. ssh (or some other non-Slurm mechanism)
* the MPI understands the Slurm environment and uses srun to launch on
  the other nodes (for example Intel MPI with "mpiexec.hydra
  -bootstrap slurm")
* srun is used to run the MPI binary directly, with remote process
  launch via PMI

(Rough command sketches for these modes also follow at the end of this
mail.)

/Peter K
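PS: a minimal topology sketch, assuming the topology/tree plugin and a
two-level fabric. The switch and node names below are made up; map them
to your real leaf/spine switches (ibnetdiscover output is a good
starting point):

    # slurm.conf
    TopologyPlugin=topology/tree

    # topology.conf -- hypothetical switch and node names
    SwitchName=leaf1 Nodes=node[001-018]
    SwitchName=leaf2 Nodes=node[019-036]
    SwitchName=spine1 Switches=leaf[1-2]

With topology/tree the scheduler tries to pack a job's nodes under as
few leaf switches as possible, which is where the better-than-random
placement comes from.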
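PPS: rough command sketches for the integration levels above. The
binary name ./my_app is just a placeholder and exact flags differ
between MPI and Slurm versions, so treat these as illustrations only:

    # level 1: feed a plain hostfile built from the allocation
    scontrol show hostnames "$SLURM_JOB_NODELIST" > hosts
    mpirun --hostfile hosts ./my_app    # Open MPI style hostfile flag

    # level 3: Intel MPI letting hydra bootstrap its daemons via srun
    mpiexec.hydra -bootstrap slurm ./my_app

    # level 4: srun launches the ranks itself, wired up through PMI
    srun --mpi=pmi2 ./my_app
    srun --mpi=list    # shows which PMI plugins your build supports

Level 2 (the MPI reads the SLURM_* environment but still launches over
ssh) has no single canonical command; it depends on how the MPI was
built.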
