Thanks - "--map-by numa:span" did exactly what I wanted!
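
For anyone finding this thread later, here is a minimal Python sketch of the rank layout I was after (ranks 0-3 on host 0 across its NUMA nodes, ranks 4-7 on host 1, and so on). It is only a toy model of the desired placement described in the quoted message below, not of Open MPI's actual mapper; the function name and its parameters are made up for illustration.

```python
def expected_placement(num_hosts, numa_per_host):
    """Return {rank: (host_index, numa_index)} for the desired layout.

    Ranks fill every NUMA node of host 0 first, then host 1, and so on.
    Hypothetical helper for illustration only; it does not query Open MPI.
    """
    placement = {}
    for rank in range(num_hosts * numa_per_host):
        # Integer division picks the host, the remainder picks the NUMA node.
        host, numa = divmod(rank, numa_per_host)
        placement[rank] = (host, numa)
    return placement


if __name__ == "__main__":
    # The case from the thread: 10 hosts with 4 NUMA nodes each, 40 ranks.
    layout = expected_placement(num_hosts=10, numa_per_host=4)
    for rank in (0, 3, 4, 39):
        print(rank, layout[rank])
```

Comparing a table like this against the bindings mpirun actually reports is one way to check that the mapping still holds after changing the number of machines or NUMA nodes.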

On Wed, May 15, 2019 at 10:34 PM Ralph Castain via users <
users@lists.open-mpi.org> wrote:

>
>
> > On May 15, 2019, at 7:18 PM, Adam Sylvester via users <users@lists.open-mpi.org> wrote:
> >
> > Up to this point, I've been running a single MPI rank per physical host
> > (using multithreading within my application to use all available cores). I
> > use this command:
> > mpirun -N 1 --bind-to none --hostfile hosts.txt
> > Where hosts.txt has an IP address on each line
> >
> > I've started running on machines with significant NUMA effects... on a
> > single one of these machines, I've started running a separate rank per NUMA
> > node.  On a machine with 64 CPUs and 4 NUMA nodes, I do this:
> > mpirun -N 1 --bind-to numa
> > I've convinced myself by watching the processors that are active in
> > 'top' that this is behaving like I want it to.
> >
> > I now want to combine these two - running on, say, 10 physical hosts
> > with 4 NUMA nodes each, for a total of 40 ranks.  But the order of the ranks
> > is important (for efficiency, due to how the application divides up work
> > across ranks).  So, I want ranks 0-3 to be on host 0 across its NUMA nodes,
> > then ranks 4-7 on host 1 across its NUMA nodes, etc.
> >
> > Some guesses:
> > mpirun -n 40 --map-by numa --rank-by numa --hostfile hosts.txt
>    ^^^^^^^^^^^^^^^^^^^^^^
> This is the one you want. If you want it “load balanced” (i.e., you want
> to round-robin across all the numas before adding a second proc to one of
> them), then change the map-by option to be “--map-by numa:span” so it
> treats all the numa regions as if they were on one gigantic node and
> round-robins across them. Then you won’t need any “slots” argument
> regardless of how many procs total you execute (even if you want to put
> some extras on the first numa nodes). Note that the above cmd line will
> default to “--bind-to numa” to match the mapping policy unless you tell it
> otherwise.
>
>
> > or
> > mpirun --map-by ppr:4:node --rank-by numa --hostfile hosts.txt
> > Where hosts.txt still has a single IP address per line (and doesn't need
> > a 'slots=4')
> >
> > I'd like to make sure I get the syntax right in general and not just
> > empirically try guesses until one looks like it works, only to find that
> > it doesn't behave the way I thought when I change the number of machines
> > or run on machines with a different number of NUMA nodes.
> >
> > Thanks.
> > -Adam
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
>
>
