On Fri, Feb 11, 2011 at 9:27 AM, Andrew Roosen <[email protected]> wrote:
> Thanks for the response.
>
> I'm using 2.2.1. We'll try comparing some "real work" jobs. Mostly, we 
> weren't sure if 60 seconds was a "reasonable" amount of overhead.
>

Hello from the MVAPICH2 team. We have done some recent work on
improving startup time for MVAPICH2. For the time being, the
fastest launcher for MVAPICH2 is mpirun_rsh (distributed with
MVAPICH2). The soon-to-be-released MVAPICH2-1.6-rc3 will include these
changes.

We are also actively working on optimizing the launch time with other
PMI implementations, and these updates will be available in a later
release.
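
As a back-of-the-envelope illustration of why eagerly exchanging all
PMI keys inside MPI_Init gets expensive at scale, here is a toy cost
model (this is illustrative only, not actual SLURM or MVAPICH2 code;
the function names and message counts are made up for the sketch):

```python
# Toy model of PMI key-exchange message counts. Purely illustrative;
# real implementations batch and tree-aggregate these messages.

def eager_exchange(n_tasks):
    """Every task's key is redistributed to every task during MPI_Init."""
    # n puts to the PMI server, then n keys fanned out to each of n tasks
    return n_tasks + n_tasks * n_tasks

def lazy_exchange(n_tasks, peers_contacted):
    """Keys are fetched on demand, when a task first talks to a peer."""
    # n puts, plus one get per (task, peer) pair actually used
    return n_tasks + n_tasks * peers_contacted

if __name__ == "__main__":
    n = 3000  # job size from the original post
    print("eager:", eager_exchange(n))        # grows as n^2, ~9 million here
    print("lazy, 8 peers:", lazy_exchange(n, 8))  # grows as n * peers
```

The quadratic growth of the eager scheme is why startup time balloons
with job size, while the lazy scheme trades that for per-connection
delays later, as Moe describes below.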

Thanks.

> Cheers,
> Andy
>
> On Feb 10, 2011, at 1:04 PM, Jette, Moe wrote:
>
>> Andy,
>>
>> SLURM's implementation of PMI uses MPI_Init to collect and redistribute 
>> all of the key-value pairs to all
>> of the tasks, which involves a fair bit of data movement. Other PMI 
>> implementations may not distribute
>> this information as part of MPI_Init, but only as the information is needed, 
>> which would result in
>> faster startup, but you would face delays when starting to move data.
>>
>> There were some improvements to the scalability of this logic made by 
>> Hongjia Cao at NUDT for
>> the Tianhe-1A computer (the fastest computer in the world), and these are in SLURM 
>> v2.2. If you are not
>> running v2.2, upgrading should help with scalability.
>>
>> Moe
>>
>> ________________________________________
>> From: [email protected] [[email protected]] On 
>> Behalf Of Andrew Roosen [[email protected]]
>> Sent: Monday, February 07, 2011 10:19 AM
>> To: [email protected]
>> Subject: [slurm-dev] slow MVAPICH2 startup with SLURM PMI
>>
>> Hi,
>> We have a cluster with 64 compute nodes, each with four 12-core processors, 
>> connected via GigE and Mellanox ConnectX-2 InfiniBand. So not small, but not 
>> huge, either.
>>
>> If I run a "null" mvapich2 program (just MPI_Init and MPI_Finalize) linked 
>> against SLURM's libpmi:
>>        time srun -p all -n 3000 ./mpinothing
>> it takes about a minute or so to finish (depending on the particular run; 
>> 45-90s).
>>
>> I can run the same code not linked against SLURM's PMI:
>>        time salloc -p all -n 3000 mpiexec.hydra -bootstrap slurm ./mpinothing
>> and it completes pretty consistently in about 17 seconds.
>>
>> Is this to be expected?
>>
>> I've tried tweaking the PMI environment variables without any significant 
>> change. Output of "scontrol show config" is attached.
>>
>> Cheers,
>> Andy Roosen
>>
>
>
>



-- 
Sayantan Sur

Research Scientist
Department of Computer Science
http://www.cse.ohio-state.edu/~surs
