Thanks for the response.

I'm using 2.2.1. We'll try comparing some "real work" jobs. Mostly, we weren't 
sure if 60 seconds was a "reasonable" amount of overhead.

Cheers,
Andy

On Feb 10, 2011, at 1:04 PM, Jette, Moe wrote:

> Andy,
> 
> SLURM's implementation of PMI uses MPI_Init to collect and redistribute 
> all of the key-value pairs to all of the tasks, which involves a fair bit 
> of data movement. Other PMI implementations may not distribute this 
> information as part of MPI_Init, but only as it is needed, which results 
> in a faster startup at the cost of delays when the tasks first start 
> moving data.
> 
> Hongjia Cao at NUDT made some scalability improvements to this logic for 
> the Tianhe-1A computer (fastest computer in the world); they are included 
> in SLURM v2.2. If you are not running v2.2, upgrading should help with 
> scalability.
> 
> Moe
> 
> ________________________________________
> From: [email protected] [[email protected]] On 
> Behalf Of Andrew Roosen [[email protected]]
> Sent: Monday, February 07, 2011 10:19 AM
> To: [email protected]
> Subject: [slurm-dev] slow MVAPICH2 startup with SLURM PMI
> 
> Hi,
> We have a cluster with 64 compute nodes, each with 4x12-core processors 
> connected via GigE and Mellanox ConnectX-2 InfiniBand. So not small, but 
> not huge, either.
> 
> If I run a "null" mvapich2 program (just MPI_Init and MPI_Finalize) linked 
> against SLURM's libpmi:
>        time srun -p all -n 3000 ./mpinothing
> it takes about a minute or so to finish (depending on the particular run; 
> 45-90s).
> 
> I can run the same code not linked against SLURM's PMI:
>        time salloc -p all -n 3000 mpiexec.hydra -bootstrap slurm ./mpinothing
> and it completes pretty consistently in about 17 seconds.
> 
> Is this to be expected?
> 
> I've tried tweaking the PMI environment variables without any significant 
> change. "scontrol show config" attached.
> 
> Cheers,
> Andy Roosen
> 
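
Moe's contrast between the two strategies (eager all-to-all redistribution of the key-value space at MPI_Init versus fetching keys on demand) can be sketched with a toy cost model. This is purely illustrative: the function names and transfer counts below are invented for the sketch and are not SLURM's actual implementation.

```python
# Toy model of the two PMI key-value exchange strategies described above.
# "Eager" pushes every task's published key to every task during MPI_Init;
# "lazy" publishes at init but fetches a key only when it is first needed.

def eager_exchange(ntasks):
    """Every key-value pair redistributed to every task during MPI_Init."""
    kvs = {f"task{i}": f"addr{i}" for i in range(ntasks)}
    # Each of the ntasks tasks receives a full copy of the KVS,
    # so the up-front cost grows as O(ntasks**2).
    init_transfers = ntasks * len(kvs)
    return init_transfers

def lazy_exchange(ntasks, lookups_per_task):
    """Keys fetched on demand; MPI_Init itself moves almost nothing."""
    init_transfers = ntasks  # each task only publishes its own key
    demand_transfers = ntasks * lookups_per_task  # paid at first communication
    return init_transfers, demand_transfers

eager_cost = eager_exchange(3000)
lazy_init, lazy_demand = lazy_exchange(3000, lookups_per_task=8)
print(eager_cost)              # 9000000 transfers, all during MPI_Init
print(lazy_init, lazy_demand)  # 3000 at init, 24000 deferred to first use
```

At 3000 tasks the eager model moves roughly nine million key transfers before MPI_Init returns, which is consistent with startup time dominating a "null" MPI program, while the lazy model shifts that cost to the first real communication.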

