On Fri, Feb 11, 2011 at 9:27 AM, Andrew Roosen <[email protected]> wrote:
> Thanks for the response.
>
> I'm using 2.2.1. We'll try comparing some "real work" jobs. Mostly, we
> weren't sure if 60 seconds was a "reasonable" amount of overhead.
>
Hello from the MVAPICH2 team. We have done some recent work on improving
the startup time for MVAPICH2. For the time being, the fastest launcher
for MVAPICH2 is mpirun_rsh (distributed with MVAPICH2). The
soon-to-be-released MVAPICH2-1.6-rc3 will feature these changes. We are
also actively working on optimizing the launch time with other PMI
implementations, and those updates will be available in a later release.
Thanks.

> Cheers,
> Andy
>
> On Feb 10, 2011, at 1:04 PM, Jette, Moe wrote:
>
>> Andy,
>>
>> SLURM's implementation of PMI uses MPI_Init to collect and redistribute
>> all of the key-value pairs to all of the tasks, which involves a fair
>> bit of data movement. Other PMI implementations may not distribute this
>> information as part of MPI_Init, but only as the information is needed,
>> which results in faster startup, but you then face delays when starting
>> to move data.
>>
>> There were some improvements in the scalability of this logic made by
>> Hongjia Cao at NUDT for the Tianhe-1A computer (the fastest computer in
>> the world), which are found in SLURM v2.2. If you are not running v2.2,
>> upgrading should help with scalability.
>>
>> Moe
>>
>> ________________________________________
>> From: [email protected] [[email protected]] On
>> Behalf Of Andrew Roosen [[email protected]]
>> Sent: Monday, February 07, 2011 10:19 AM
>> To: [email protected]
>> Subject: [slurm-dev] slow MVAPICH2 startup with SLURM PMI
>>
>> Hi,
>> We have a cluster with 64 compute nodes, each with 4x12-core processors,
>> connected via GigE and Mellanox ConnectX-2 InfiniBand. So not small,
>> but not huge, either.
>>
>> If I run a "null" MVAPICH2 program (just MPI_Init and MPI_Finalize)
>> linked against SLURM's libpmi:
>>
>>   time srun -p all -n 3000 ./mpinothing
>>
>> it takes about a minute or so to finish (45-90 s, depending on the
>> particular run).
>>
>> I can run the same code not linked against SLURM's PMI:
>>
>>   time salloc -p all -n 3000 mpiexec.hydra -bootstrap slurm ./mpinothing
>>
>> and it completes pretty consistently in about 17 seconds.
>>
>> Is this to be expected?
>>
>> I've tried tweaking the PMI environment variables without any
>> significant change. "scontrol show config" attached.
>>
>> Cheers,
>> Andy Roosen
>>

--
Sayantan Sur
Research Scientist
Department of Computer Science
http://www.cse.ohio-state.edu/~surs
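[Editor's note] Moe's point about SLURM's PMI redistributing all key-value
pairs to all tasks during MPI_Init explains why startup time grows so quickly
with job size: if every one of N tasks publishes a few entries and every task
receives every entry, the total data delivered scales as O(N^2). A rough
back-of-the-envelope model (the per-task key count and entry size here are
illustrative assumptions, not values measured on this cluster):

```python
# Rough cost model for an allgather-style PMI key-value exchange in MPI_Init.
# Assumptions (hypothetical, for illustration only): each task publishes
# KEYS_PER_TASK entries of about ENTRY_BYTES bytes each, and every task
# ends up holding every other task's entries.

KEYS_PER_TASK = 4   # assumed: e.g. LID/QPN/host info published per task
ENTRY_BYTES = 64    # assumed average size of one key+value entry

def total_exchange_bytes(ntasks):
    """Total bytes delivered if all entries go to all tasks (O(N^2))."""
    entries = ntasks * KEYS_PER_TASK       # entries published in total
    return entries * ENTRY_BYTES * ntasks  # every task receives every entry

for n in (64, 512, 3000):
    mb = total_exchange_bytes(n) / 1e6
    print(f"{n:5d} tasks -> ~{mb:9.1f} MB moved during MPI_Init")
```

Under these assumptions, going from 64 to 3000 tasks inflates the exchange
from about 1 MB to over 2 GB. This is consistent with Moe's remark that PMI
implementations which fetch a peer's entries only when a connection to that
peer is first needed start faster, at the cost of delays when communication
actually begins.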
