Hi,
We have a cluster with 64 compute nodes, each with four 12-core processors, 
connected via GigE and Mellanox ConnectX-2 InfiniBand. So not small, but not 
huge either.

If I run a "null" mvapich2 program (just MPI_Init and MPI_Finalize) linked 
against SLURM's libpmi:
        time srun -p all -n 3000 ./mpinothing
it takes 45-90 seconds to finish, depending on the particular run.

I can run the same code not linked against SLURM's PMI:
        time salloc -p all -n 3000 mpiexec.hydra -bootstrap slurm ./mpinothing 
and it completes pretty consistently in about 17 seconds. 

Is this to be expected?

I've tried tweaking the PMI environment variables without any significant 
change.  Output of "scontrol show config" is attached.
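For context, these are the PMI-1 knobs SLURM documents for scaling large step launches (variable names from the SLURM MPI guide; the values below are examples/defaults, not necessarily the ones from my runs):

```shell
# PMI tuning variables documented by SLURM (example values).
export PMI_FANOUT=32          # srun's fan-out when sending PMI messages to tasks
export PMI_FANOUT_OFF_HOST=1  # 1 = relay PMI messages via tasks on other hosts
export PMI_TIME=500           # usec spacing between tasks' PMI sends to srun
time srun -p all -n 3000 ./mpinothing
```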

Cheers,
Andy Roosen

Attachment: scontrol.txt
