Hi, I'm trying out the MPI integration in slurm 2.5.7, and I stumbled upon something weird with mvapich2 and pmi2.
While the MPI guide at http://slurm.schedmd.com/mpi_guide.html#mvapich2 says that one should link with "-lpmi" and use "srun --mpi=none" instead of pmi2 that is recommended for mpich, mvapich2 is related to mpich and recent versions should thus support the new pmi2 as well.

Now, our mvapich2 version 1.9 installation has not been built with pmi2 support; mpirun -info shows:

    Process Manager: pmi
    Launchers available: ssh rsh fork slurm ll lsf sge manual persist
    Resource management kernels available: user slurm ll lsf sge pbs cobalt

Looking at the mpi library with readelf shows there are no symbols named "PMI2*", plenty of "PMI*" symbols though.

However, just for kicks I did launch a test job with "srun --mpi=pmi2", and surprisingly, it appears to work. For comparison, the documented "srun --mpi=none" and linking the application with "-lpmi" also works, while other more or less nonsensical combinations don't work, as expected.

Any idea what's going on? Is this some kind of backwards compatibility in the pmi2 support and it's supposed to work, or does it somehow work just by chance and will likely break in the future?

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || [email protected]
