Re: [OMPI users] OpenMPI slowdown in latency bound application

Nathan Hjelm via users Wed, 28 Aug 2019 15:56:24 -0700

Is this overall runtime or solve time? The former is essentially meaningless
because it includes all the startup time (launch, connection setup, etc.),
especially since we are talking about seconds here.
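
One way to separate the two is to time each phase explicitly. Below is a
minimal Python sketch, purely illustrative: `phase` and `solve_fraction` are
made-up helper names, not part of Open MPI or of the application. In the real
code you would more likely bracket the solver loop with MPI_Wtime() calls,
ideally after an MPI_Barrier() so that all ranks start the clock together.

```python
import time
from contextlib import contextmanager


@contextmanager
def phase(timings, name):
    """Accumulate the wall-clock time spent inside the block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed


def solve_fraction(timings):
    """Return the share of the total measured time spent in the 'solve' phase."""
    total = sum(timings.values())
    return timings.get("solve", 0.0) / total if total > 0 else 0.0


if __name__ == "__main__":
    timings = {}
    with phase(timings, "startup"):
        time.sleep(0.05)   # stand-in for launch and connection setup
    with phase(timings, "solve"):
        time.sleep(0.10)   # stand-in for the actual solver loop
    print(timings, solve_fraction(timings))
```

Comparing the "solve" entries of the two MPI builds, rather than the totals,
keeps launch overhead out of the comparison.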


-Nathan

> On Aug 28, 2019, at 9:10 AM, Cooper Burns via users 
> <users@lists.open-mpi.org> wrote:
> 
> Peter,
> 
> It looks like:
> Node0:
> rank0, rank1, rank2, etc..
> Node1:
> rank12, rank13, etc
> etc
> 
> So the mapping looks good to me.
> 
> Thanks,
> Cooper
> Cooper Burns
> Senior Research Engineer
>     
> 
> (608) 230-1551
> convergecfd.com
> 
> 
> 
>> On Wed, Aug 28, 2019 at 10:50 AM Peter Kjellström <c...@nsc.liu.se> wrote:
>> On Wed, 28 Aug 2019 09:45:15 -0500
>> Cooper Burns <cooper.bu...@convergecfd.com> wrote:
>> 
>> > Peter,
>> > 
>> > Thanks for your input!
>> > I tried some things:
>> > 
>> > *1) The app was placed/pinned differently by the two MPIs. Often this
>> > would probably not cause such a big difference.*
>> > I agree this is unlikely to be the cause. I tried various
>> > configurations of map-by, bind-to, etc., and none of them had any
>> > measurable impact at all, which points to this not being the cause
>> > (as you suspected).
>> 
>> OK, there's still one thing to rule out: which rank was placed on which
>> node.
>> 
>> For OpenMPI you can pass "-report-bindings" and verify that the first N
>> ranks are placed on the first node (for N cores or ranks per node).
>> 
>> node0: r0 r4 r8 ...
>> node1: r1 ...
>> node2: r2 ...
>> node3: r3 ...
>> 
>> vs
>> 
>> node0: r0 r1 r2 r3 ...
>> 
>> > *2) Bad luck wrt collective performance. Different MPIs have
>> > different weak spots across the parameter space of
>> > numranks,transfersize,mpi-collective.*
>> > This is possible... But the magnitude of the runtime difference
>> > seems too large to me... Are there any options we can give to OMPI
>> > to cause it to use different collective algorithms so that we can
>> > test this theory?
>> 
>> It can certainly cause the observed difference. I've seen very large
>> differences...
>> 
>> To get collective tunables from OpenMPI do something like:
>> 
>>  ompi_info --param coll all --level 5
>> 
>> But it will really help to know or suspect what collectives the
>> application depends on.
>> 
>> For example, if you suspected alltoall to be a factor you could sweep
>> all valid alltoall algorithms by setting:
>> 
>>  -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_alltoall_algorithm X
>> 
>> Where X is 0..6 in my case (ompi_info returned: 0 ignore, 1 basic
>> linear, 2 bruck, 3 recursive doubling, 4 ring, 5 neighbor exchange,
>> 6 two proc only).
>> 
>> /Peter
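
The alltoall sweep Peter describes can be scripted. Here is a rough Python
sketch that only builds the mpirun command lines, one per algorithm ID. The
function name `build_sweep`, the application `./solver case.in` and the rank
count of 48 are placeholders I made up. As far as I know, a forced algorithm
is only honored when coll_tuned_use_dynamic_rules is enabled, so the sketch
sets both parameters. The valid ID range depends on the Open MPI version, so
check the ompi_info output first.

```python
import shlex


def build_sweep(app_argv, nranks, algorithms=range(0, 7)):
    """Return one mpirun argv list per forced alltoall algorithm ID."""
    commands = []
    for alg in algorithms:
        commands.append([
            "mpirun", "-np", str(nranks),
            # Enable dynamic rules so the forced algorithm below is used.
            "--mca", "coll_tuned_use_dynamic_rules", "1",
            "--mca", "coll_tuned_alltoall_algorithm", str(alg),
            *app_argv,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical application and rank count; substitute your own.
    for argv in build_sweep(["./solver", "case.in"], nranks=48):
        print(shlex.join(argv))
```

Each printed line can be run and timed in turn. The same pattern should apply
to other collectives by swapping in the corresponding coll_tuned_*_algorithm
parameter reported by ompi_info.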

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
