I am testing Intel MPI under Slurm and have the recommended method working, i.e.,
    I_MPI_PMI_LIBRARY=<slurmdir>/lib64/libpmi.so srun myimpiprog

However, Intel MPI recommends another startup method:

    salloc -N 1 mpiexec.hydra -bootstrap jmi -n 2 ./myimpiprog

I'm not sure what the pros and cons of JMI are, but launching fails: JMI seems to invoke srun with the short node names (as in my slurm.conf) at first, then switches to the FQDNs, and srun fails with "requested node configuration is not available". I'd like to work out whether the short name -> FQDN switch is Slurm or JMI/PMI weirdness.

I am testing with Slurm 14.03.3-2 and Intel MPI 4.1.3.0249/5.0.0.016.

Here's the debug output from Intel MPI with JMI; note the switch from short names to FQDNs. hostname on the nodes returns the short name, and SLURM_NODELIST also shows the short names.

    [mpiexec@builder] Launch arguments: /opt/intel/impi/4.1.3/bin64/pmi_proxy --control-port builder:38756 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk slurm --launcher jmi --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 313688655 --proxy-id -1
    [jmi-slurm@builder] Launch arguments: srun --nodelist builder,ruchba -N 2 -n 2 /opt/intel/impi/4.1.3/bin64/pmi_proxy --control-port builder:38756 --debug --pmi-connect lazy-cache --pmi-aggregate -s 0 --rmk slurm --launcher jmi --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 313688655 --proxy-id -1
    [mpiexec@builder] STDIN will be redirected to 1 fd(s): 8
    [proxy:0:0@builder] Start PMI_proxy 0
    [jmi-slurm@builder] Launch arguments: srun --nodelist builder.hpc8888.com -N 1 -n 1 ./hello
    [jmi-slurm@builder] Launch arguments: srun --nodelist builder.hpc8888.com -N 1 -n 1 ./hello
    [proxy:0:0@builder] STDIN will be redirected to 1 fd(s): 8
    [proxy:0:1@ruchba] Start PMI_proxy 1
    [jmi-slurm@ruchba] Launch arguments: srun --nodelist ruchba.hpc8888.com -N 1 -n 1 ./hello
    [jmi-slurm@ruchba] Launch arguments: srun --nodelist ruchba.hpc8888.com -N 1 -n 1 ./hello

- Anthony