Thanks for the response. I'm using 2.2.1. We'll try comparing some "real work" jobs. Mostly, we weren't sure if 60 seconds was a "reasonable" amount of overhead.
Cheers,
Andy

On Feb 10, 2011, at 1:04 PM, Jette, Moe wrote:

> Andy,
>
> SLURM's implementation of PMI uses MPI_Init to collect and redistribute
> all of the key-value pairs to all of the tasks, which involves a fair bit
> of data movement. Other PMI implementations may not distribute this
> information as part of MPI_Init, but only as the information is needed.
> That results in a faster startup, but you would face delays when starting
> to move data.
>
> There were some improvements to the scalability of this logic made by
> Hongjia Cao at NUDT for the Tianhe-1A computer (the fastest computer in
> the world), and they are found in SLURM v2.2. If you are not running
> v2.2, upgrading should help with scalability.
>
> Moe
>
> ________________________________________
> From: [email protected] [[email protected]] On
> Behalf Of Andrew Roosen [[email protected]]
> Sent: Monday, February 07, 2011 10:19 AM
> To: [email protected]
> Subject: [slurm-dev] slow MVAPICH2 startup with SLURM PMI
>
> Hi,
> We have a cluster with 64 compute nodes, each with 4x 12-core processors
> connected via GigE and Mellanox ConnectX-2 InfiniBand. So not small, but
> not huge, either.
>
> If I run a "null" MVAPICH2 program (just MPI_Init and MPI_Finalize)
> linked against SLURM's libpmi:
>
>     time srun -p all -n 3000 ./mpinothing
>
> it takes about a minute or so to finish (45-90 s, depending on the
> particular run).
>
> I can run the same code not linked against SLURM's PMI:
>
>     time salloc -p all -n 3000 mpiexec.hydra -bootstrap slurm ./mpinothing
>
> and it completes fairly consistently in about 17 seconds.
>
> Is this to be expected?
>
> I've tried tweaking the PMI environment variables without any significant
> change. "scontrol show config" attached.
>
> Cheers,
> Andy Roosen
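For anyone reading this thread in the archive: the eager all-to-all exchange Moe describes can be sketched as a toy model. This is purely illustrative (the function name and data are invented here, not SLURM's actual internals): each task "puts" one key-value pair, and the fence at MPI_Init ships the whole database to every task, so the data moved grows roughly quadratically with task count. A lazy implementation would instead fetch pairs only when first needed.

```python
# Toy model of an eager PMI key-value exchange (illustrative only;
# not SLURM's real implementation). Each of n_tasks publishes one
# key-value pair (e.g. its endpoint address); the fence then
# redistributes the full database to every task.

def pmi_exchange(n_tasks):
    # Phase 1: each task contributes its own key-value pair.
    kvs = {"rank-%d" % r: "addr-of-rank-%d" % r for r in range(n_tasks)}
    # Phase 2: the fence gives every task a copy of the whole database.
    per_task_view = [dict(kvs) for _ in range(n_tasks)]
    # Total entries moved: n_tasks copies of n_tasks pairs.
    transferred = n_tasks * len(kvs)
    return per_task_view, transferred

views, transferred = pmi_exchange(4)
assert all(v == views[0] for v in views)  # every task sees every pair
print(transferred)  # 16 entries for 4 tasks; 9,000,000 for 3000 tasks
```

At n = 3000 tasks that is nine million entries moved up front, which is consistent with startup cost dominating a "null" MPI program.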
