Thanks - "--map-by numa:span" did exactly what I wanted!

On Wed, May 15, 2019 at 10:34 PM Ralph Castain via users <users@lists.open-mpi.org> wrote:
>
> > On May 15, 2019, at 7:18 PM, Adam Sylvester via users
> > <users@lists.open-mpi.org> wrote:
> >
> > Up to this point, I've been running a single MPI rank per physical
> > host (using multithreading within my application to use all
> > available cores). I use this command:
> > mpirun -N 1 --bind-to none --hostfile hosts.txt
> > Where hosts.txt has an IP address on each line
> >
> > I've started running on machines with significant NUMA effects... on
> > a single one of these machines, I've started running a separate rank
> > per NUMA node. On a machine with 64 CPUs and 4 NUMA nodes, I do this:
> > mpirun -N 1 --bind-to numa
> > I've convinced myself by watching the processors that are active on
> > 'top' that this is behaving like I want it to.
> >
> > I now want to combine these two - running on, say, 10 physical hosts
> > with 4 NUMA nodes - a total of 40 ranks. But, the order of the ranks
> > is important (for efficiency, due to how the application divides up
> > work across ranks). So, I want ranks 0-3 to be on host 0 across its
> > NUMA nodes, then ranks 4-7 on host 1 across its NUMA nodes, etc.
> >
> > Some guesses:
> > mpirun -n 40 --map-by numa --rank-by numa --hostfile hosts.txt
> ^^^^^^^^^^^^^^^^^^^^^^
> This is the one you want. If you want it “load balanced” (i.e., you
> want to round-robin across all the numas before adding a second proc
> to one of them), then change the map-by option to be
> “--map-by numa:span” so it treats all the numa regions as if they were
> on one gigantic node and round-robins across them. Then you won’t need
> any “slots” argument regardless of how many procs total you execute
> (even if you want to put some extras on the first numa nodes). Note
> that the above cmd line will default to “--bind-to numa” to match the
> mapping policy unless you tell it otherwise.
>
> >
> > or
> > mpirun --map-by ppr:4:node --rank-by numa --hostfile hosts.txt
> > Where hosts.txt still has a single IP address per line (and doesn't
> > need a 'slots=4')
> >
> > I'd like to make sure I get the syntax right in general and not just
> > empirically try guesses until one looks like it works... and find
> > inevitably it doesn't work like I thought when I change the # of
> > machines or run on machines with a different # of NUMA nodes.
> >
> > Thanks.
> > -Adam
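
For reference, the rank layout asked for in this thread (ranks 0-3 on host 0 across its NUMA nodes, ranks 4-7 on host 1, and so on) can be written down as a small Python sketch. It is only a toy model of the intended placement, useful as a table to check a real run against; it is not Open MPI's mapper. The function name `expected_layout` and the host names are made up for illustration, and the wrap-around for extra processes is an assumption based on the "round-robin across all the numas" description.

```python
def expected_layout(hosts, numa_per_host, nprocs):
    """Return (rank, host, numa_index) tuples for a host-major layout.

    Toy model only: assumes every host has the same number of NUMA
    regions and that ranks are numbered in placement order.
    """
    # Flatten every NUMA region on every host into one ordered list,
    # host by host.
    regions = [(host, numa) for host in hosts for numa in range(numa_per_host)]
    # Walk that list round-robin, so a region only receives a second
    # process once every region already has one.
    return [(rank, *regions[rank % len(regions)]) for rank in range(nprocs)]


if __name__ == "__main__":
    hosts = [f"host{i}" for i in range(10)]  # hypothetical host names
    for rank, host, numa in expected_layout(hosts, 4, 40):
        print(f"rank {rank:2d} -> {host} numa {numa}")
```

Open MPI's `--report-bindings` option prints where each rank was actually bound, which can be compared by hand against this table when the number of hosts or NUMA nodes changes.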
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users