Peter,

It looks like:

Node0: rank0, rank1, rank2, ...
Node1: rank12, rank13, ...
So the mapping looks good to me.

Thanks,
Cooper

Cooper Burns
Senior Research Engineer
(608) 230-1551
convergecfd.com

On Wed, Aug 28, 2019 at 10:50 AM Peter Kjellström <c...@nsc.liu.se> wrote:

> On Wed, 28 Aug 2019 09:45:15 -0500
> Cooper Burns <cooper.bu...@convergecfd.com> wrote:
>
> > Peter,
> >
> > Thanks for your input!
> > I tried some things:
> >
> > *1) The app was placed/pinned differently by the two MPIs. Often this
> > would probably not cause such a big difference.*
> > I agree this is unlikely the cause. However, I tried various
> > configurations of map-by, bind-to, etc., and none of them had any
> > measurable impact at all, which points to this not being the cause
> > (as you suspected).
>
> OK, there's still one thing to rule out: which rank was placed on which
> node.
>
> For OpenMPI you can pass "-report-bindings" and verify that the first N
> ranks are placed on the first node (for N cores or ranks per node):
>
> node0: r0 r4 r8 ...
> node1: r1 ...
> node2: r2 ...
> node3: r3 ...
>
> vs
>
> node0: r0 r1 r2 r3 ...
>
> > *2) Bad luck wrt collective performance. Different MPIs have
> > different weak spots across the parameter space of
> > numranks, transfersize, mpi-collective.*
> > This is possible... but the magnitude of the runtime difference
> > seems too large to me... Are there any options we can give to OMPI
> > to cause it to use different collective algorithms so that we can
> > test this theory?
>
> It can certainly cause the observed difference. I've seen very large
> differences...
>
> To get collective tunables from OpenMPI do something like:
>
> ompi_info --param coll all --level 5
>
> But it will really help to know or suspect which collectives the
> application depends on.
>
> For example, if you suspected alltoall to be a factor, you could sweep
> all valid alltoall algorithms by setting:
>
> -mca coll_tuned_alltoall_algorithm X
>
> where X is 0..6 in my case (ompi_info returned: 0 ignore, 1 basic
> linear, 2 bruck, 3 recursive doubling, 4 ring, 5 neighbor exchange,
> 6 two proc only).
>
> /Peter
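[For reference, a concrete way to run the "-report-bindings" check Peter
describes above. This is a minimal sketch: the binary name ./app and the
48-rank / 12-ranks-per-node layout are assumptions, not from the thread.]

    # each rank reports its binding on stderr at launch
    mpirun -np 48 -report-bindings ./app 2>&1 | grep "bound to"

With 12 ranks per node, block placement should show ranks 0-11 on the
first node; round-robin placement would scatter consecutive ranks across
nodes, matching the two layouts in Peter's example.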
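[And a sketch of the alltoall algorithm sweep Peter suggests, as a bash
loop. The rank count and ./app are placeholders; note that, as far as I
know, coll_tuned_use_dynamic_rules must be enabled for the forced
algorithm to take effect.]

    # hypothetical sweep over the 7 alltoall algorithms (0..6)
    for alg in 0 1 2 3 4 5 6; do
        echo "=== alltoall algorithm $alg ==="
        mpirun -np 48 \
               -mca coll_tuned_use_dynamic_rules 1 \
               -mca coll_tuned_alltoall_algorithm $alg \
               ./app
    done

Comparing wall-clock times across the runs should show whether the
collective algorithm choice accounts for the performance gap between
the two MPIs.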