Is this overall runtime or solve time? The former is essentially meaningless,
as it includes all the startup time (launch, connections, etc.), especially
since we are talking about seconds here.
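
For what it's worth, a quick way to gauge how much of the wall time is pure
startup is to time a no-op payload under both MPIs (just a sketch; the rank
count below is only an example, match it to your actual job size):

 time mpirun -np 48 /bin/true   # -np 48 is a placeholder

If that alone accounts for a noticeable fraction of the seconds being
compared, only the solve-time numbers are worth looking at.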

-Nathan

> On Aug 28, 2019, at 9:10 AM, Cooper Burns via users 
> <users@lists.open-mpi.org> wrote:
> 
> Peter,
> 
> It looks like:
> Node0:
> rank0, rank1, rank2, etc..
> Node1:
> rank12, rank13, etc
> etc
> 
> So the mapping looks good to me.
> 
> Thanks,
> Cooper
> Cooper Burns
> Senior Research Engineer
> 
> (608) 230-1551
> convergecfd.com
> 
>> On Wed, Aug 28, 2019 at 10:50 AM Peter Kjellström <c...@nsc.liu.se> wrote:
>> On Wed, 28 Aug 2019 09:45:15 -0500
>> Cooper Burns <cooper.bu...@convergecfd.com> wrote:
>> 
>> > Peter,
>> > 
>> > Thanks for your input!
>> > I tried some things:
>> > 
>> > *1) The app was placed/pinned differently by the two MPIs. Often this
>> > would probably not cause such a big difference.*
>> > I agree this is unlikely to be the cause; however, I tried various
>> > configurations of map-by, bind-to, etc., and none of them had any
>> > measurable impact at all, which points to this not being the cause
>> > (as you suspected).
>> 
>> OK, there's still one thing to rule out: which rank was placed on which
>> node.
>> 
>> For OpenMPI you can pass "-report-bindings" and verify that the first N
>> ranks are placed on the first node (for N cores or ranks per node).
>> 
>> node0: r0 r4 r8 ...
>> node1: r1 ...
>> node2: r2 ...
>> node3: r3 ...
>> 
>> vs
>> 
>> node0: r0 r1 r2 r3 ...
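>> 
>> With Open MPI, something like this should print the actual placement
>> (a sketch only; the rank count and app name are placeholders):
>> 
>>  mpirun -np 48 --report-bindings --map-by core ./your_app  # placeholders: -np 48, ./your_app
>> 
>> With --map-by core you would expect the second layout above (node0 gets
>> r0..r11 for 12 ranks per node); --map-by node would give the round-robin
>> layout instead.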
>> 
>> > *2) Bad luck wrt collective performance. Different MPIs have
>> > different weak spots across the parameter space of
>> > numranks, transfersize, mpi-collective.* This is possible... But the
>> > magnitude of the runtime difference seems too large to me... Are
>> > there any options we can give to OMPI to cause it to use different
>> > collective algorithms so that we can test this theory?
>> 
>> It can certainly cause the observed difference. I've seen very large
>> differences...
>> 
>> To get collective tunables from OpenMPI do something like:
>> 
>>  ompi_info --param coll all --level 5
>> 
>> But it will really help to know or suspect which collectives the
>> application depends on.
>> 
>> For example, if you suspected alltoall to be a factor you could sweep
>> all valid alltoall algorithms by setting:
>> 
>>  -mca coll_tuned_alltoall_algorithm X
>> 
>> Where X is 0..6 in my case (ompi_info returned: 0 ignore, 1 basic
>> linear, 2 bruck, 3 recursive doubling, 4 ring, 5 neighbor exchange,
>> 6 two proc only).
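>> 
>> A rough sweep could look like this (a sketch only; osu_alltoall is just
>> an example payload and -np 48 an example size, use whatever reproduces
>> your timings; if I remember right, the tuned component also needs
>> dynamic rules enabled for the forced algorithm to take effect):
>> 
>>  # placeholders: -np 48 and ./osu_alltoall
>>  for alg in 0 1 2 3 4 5 6; do
>>    mpirun -np 48 --mca coll_tuned_use_dynamic_rules 1 \
>>           --mca coll_tuned_alltoall_algorithm $alg ./osu_alltoall
>>  done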
>> 
>> /Peter