Hi,

I'm trying out the MPI integration in slurm 2.5.7, and I stumbled upon 
something weird with mvapich2 and pmi2.

The MPI guide at http://slurm.schedmd.com/mpi_guide.html#mvapich2
says that with mvapich2 one should link with "-lpmi" and use
"srun --mpi=none", rather than the pmi2 plugin that is recommended for
mpich. However, mvapich2 is derived from mpich, so I would expect recent
versions to support the new pmi2 as well.

Now, our mvapich2 version 1.9 installation has not been built with pmi2 
support; mpirun -info shows:

     Process Manager:                         pmi
     Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
     Resource management kernels available:   user slurm ll lsf sge pbs cobalt

Looking at the mpi library with readelf shows that there are no symbols
named "PMI2*", though there are plenty of "PMI*" symbols.

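For anyone who wants to repeat the symbol check, it can be sketched
roughly as below. This is only a minimal sketch: it assumes the column
layout printed by "readelf -Ws", and the function names and the library
path given on the command line are hypothetical, not part of slurm or
mvapich2.

```python
import subprocess
import sys


def classify_pmi_symbols(readelf_output):
    """Split symbol names from `readelf -Ws` output into PMI_* and PMI2* sets."""
    pmi1, pmi2 = set(), set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # A symbol row has 8 columns: Num Value Size Type Bind Vis Ndx Name.
        if len(fields) < 8:
            continue
        # Drop any symbol-version suffix such as "@@VERS_1".
        name = fields[7].split("@")[0]
        if name.startswith("PMI2"):
            pmi2.add(name)
        elif name.startswith("PMI"):
            pmi1.add(name)
    return pmi1, pmi2


def read_symbols(library_path):
    """Run readelf on the given library (path supplied by the caller)."""
    result = subprocess.run(
        ["readelf", "-Ws", library_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 2:
    pmi1, pmi2 = classify_pmi_symbols(read_symbols(sys.argv[1]))
    print("PMI symbols: ", len(pmi1))
    print("PMI2 symbols:", len(pmi2))
```

Run against the MPI shared library, a zero count on the second line
would correspond to what I describe above: no "PMI2*" symbols at all.
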
However, just for kicks I did launch a test job with "srun --mpi=pmi2", 
and surprisingly, it appears to work. For comparison, the documented 
"srun --mpi=none" and linking the application with "-lpmi" also works, 
while other more or less nonsensical combinations don't work, as 
expected. Any idea what's going on? Is this some kind of backwards 
compatibility in the pmi2 support and it's supposed to work, or does it 
somehow work just by chance and will likely break in the future?


-- 
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || [email protected]
