Peter,

It looks like:
Node0:
rank0, rank1, rank2, ... rank11
Node1:
rank12, rank13, ...
and so on.

So the mapping looks good to me.
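
For reference, a minimal way to check this is something like the
following (the rank count and the binary name are just placeholders):

 mpirun -np 24 --map-by core --bind-to core --report-bindings ./app

With that, --report-bindings shows ranks 0-11 bound on the first node
before rank 12 starts on the second, which matches the mapping above.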

Thanks,
Cooper
Cooper Burns
Senior Research Engineer
(608) 230-1551
convergecfd.com


On Wed, Aug 28, 2019 at 10:50 AM Peter Kjellström <c...@nsc.liu.se> wrote:

> On Wed, 28 Aug 2019 09:45:15 -0500
> Cooper Burns <cooper.bu...@convergecfd.com> wrote:
>
> > Peter,
> >
> > Thanks for your input!
> > I tried some things:
> >
> > *1) The app was placed/pinned differently by the two MPIs. Often this
> > would probably not cause such a big difference.*
> > I agree this is unlikely to be the cause; however, I tried various
> > configurations of map-by, bind-to, etc., and none of them had any
> > measurable impact at all, which points to this not being the cause
> > (as you suspected).
>
> OK, there's still one thing to rule out: which rank was placed on which
> node.
>
> For OpenMPI you can pass "-report-bindings" and verify that the first N
> ranks are placed on the first node (for N cores or ranks per node).
>
> node0: r0 r4 r8 ...
> node1: r1 ...
> node2: r2 ...
> node3: r3 ...
>
> vs
>
> node0: r0 r1 r2 r3 ...
>
> > *2) Bad luck wrt collective performance. Different MPIs have
> > different weak spots across the parameter space of numranks,
> > transfersize, mpi-collective.* This is possible... But the
> > magnitude of the runtime difference seems too large to me... Are
> > there any options we can give to OMPI to cause it to use different
> > collective algorithms so that we can test this theory?
>
> It can certainly cause the observed difference. I've seen very large
> differences...
>
> To get collective tunables from OpenMPI do something like:
>
>  ompi_info --param coll all --level 5
>
> But it will really help to know or suspect which collectives the
> application depends on.
>
> For example, if you suspected alltoall to be a factor you could sweep
> all valid alltoall algorithms by setting:
>
>  -mca coll_tuned_alltoall_algorithm X
>
> Where X is 0..6 in my case (ompi_info returned: 0 ignore, 1 basic
> linear, 2 bruck, 3 recursive doubling, 4 ring, 5 neighbor exchange,
> 6 two proc only).
>
> /Peter
>
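
For completeness, a brute-force sweep over those alltoall algorithms
could look roughly like this (the rank count and the benchmark binary
are placeholders; note that the forced algorithm generally only takes
effect when coll_tuned_use_dynamic_rules is set to 1):

 for alg in 0 1 2 3 4 5 6; do
   mpirun -np 24 \
     --mca coll_tuned_use_dynamic_rules 1 \
     --mca coll_tuned_alltoall_algorithm $alg \
     ./alltoall_benchmark
 done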
