Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code
OMP_NUM_THREADS=1  mpiexec -n 1 gnu_openmpi_a/one_c_prof.exe            : 113 iterations
OMP_NUM_THREADS=6  mpiexec -n 1 --map-by node:PE=6                      : 639 iterations
OMP_NUM_THREADS=6  mpiexec -n 2 --map-by node:PE=6                      : 639 iterations
OMP_NUM_THREADS=12 mpiexec -n 1 --map-by node:PE=12                     : 1000 iterations
OMP_NUM_THREADS=12 mpiexec -n 2 --use-hwthread-cpus --map-by node:PE=12 : 646 iterations

That's looking better, with limited gain for 1 process on 2 chips. Thanks.

I am testing Allinea's profiler, and our goal is to point out bad practice, so I need to run all sorts of pathological cases. Now to see what our software thinks.

Thanks for your help
John

On 8 December 2014 at 15:57, Ralph Castain wrote:
> Thanks for sending that lstopo output - it helped clarify things for me. I
> think I now understand the issue. Mostly a problem of my being rather dense
> when reading your earlier note.
>
> Try adding --map-by node:PE=N to your cmd line. I think the problem is that
> we default to --map-by numa if you just give cpus-per-proc and no mapping
> directive, as we know that having threads span multiple NUMA regions is
> bad for performance.
>
>> On Dec 5, 2014, at 9:07 AM, John Bray wrote:
>> [quoted text trimmed; John's full message appears later in this archive]
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/25919.php
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/25927.php
Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code
Thanks for sending that lstopo output - it helped clarify things for me. I think I now understand the issue. Mostly a problem of my being rather dense when reading your earlier note.

Try adding --map-by node:PE=N to your cmd line. I think the problem is that we default to --map-by numa if you just give cpus-per-proc and no mapping directive, as we know that having threads span multiple NUMA regions is bad for performance.

> On Dec 5, 2014, at 9:07 AM, John Bray wrote:
> [quoted text trimmed; John's full message appears later in this archive]
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/25919.php
Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code
lstopo is pretty!

John
Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code
We may be getting hung up on terminology, but my guess is that the problem is one of accurately understanding how many cores you have vs HTs. Can you run lstopo and see what it thinks is there? If you haven't installed that, you can just run "mpirun -mca ess_base_verbose 10 -n 1 hostname" to get the info.

> On Dec 5, 2014, at 9:07 AM, John Bray wrote:
> [quoted text trimmed; John's full message appears later in this archive]
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/25919.php
Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code
Hi Ralph

I have a motherboard with 2 X6580 chips, each with 6 cores and 2-way hyperthreading, so /proc/cpuinfo reports 24 cores.

Doing a pure compute OpenMP loop where I'd expect the number of iterations in 10s to rise with the number of threads, with gnu and mpich:

OMP_NUM_THREADS=1  -n 1 : 112 iterations
OMP_NUM_THREADS=2  -n 1 : 224 iterations
OMP_NUM_THREADS=6  -n 1 : 644 iterations
OMP_NUM_THREADS=12 -n 1 : 1287 iterations
OMP_NUM_THREADS=22 -n 1 : 1182 iterations
OMP_NUM_THREADS=24 -n 1 : 454 iterations

which shows that mpich is spreading across the cores, but hyperthreading is not useful, and using the whole node is counterproductive.

With gnu and openmpi 1.8.3:

OMP_NUM_THREADS=1 mpiexec -n 1 : 112
OMP_NUM_THREADS=2 mpiexec -n 1 : 113

which suggests you aren't allowing the threads to spread across cores.

Adding --cpus-per-proc I gain access to the resources on one chip:

OMP_NUM_THREADS=1 mpiexec --cpus-per-proc 1 -n 1 : 112
OMP_NUM_THREADS=2 mpiexec --cpus-per-proc 2 -n 1 : 224
OMP_NUM_THREADS=6 mpiexec --cpus-per-proc 2 -n 1 : 644

then

OMP_NUM_THREADS=12 mpiexec --cpus-per-proc 12 -n 1

A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:

  #cpus-per-proc: 12
  number of cpus: 6
  map-by: BYNUMA

So you aren't happy using both chips for one process.

OMP_NUM_THREADS=1  mpiexec -n 1 --cpus-per-proc 1  --use-hwthread-cpus : 112
OMP_NUM_THREADS=2  mpiexec -n 1 --cpus-per-proc 2  --use-hwthread-cpus : 112
OMP_NUM_THREADS=4  mpiexec -n 1 --cpus-per-proc 4  --use-hwthread-cpus : 224
OMP_NUM_THREADS=6  mpiexec -n 1 --cpus-per-proc 6  --use-hwthread-cpus : 324
OMP_NUM_THREADS=6  mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 631
OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 647

OMP_NUM_THREADS=24 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus

A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:

  #cpus-per-proc: 24
  number of cpus: 12
  map-by: BYNUMA

OMP_NUM_THREADS=1  mpiexec -n 1 --cpus-per-proc 2  --use-hwthread-cpus : 112
OMP_NUM_THREADS=2  mpiexec -n 1 --cpus-per-proc 4  --use-hwthread-cpus : 224
OMP_NUM_THREADS=6  mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 644

OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 24 --use-hwthread-cpus : 644

A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:

  #cpus-per-proc: 24
  number of cpus: 12
  map-by: BYNUMA

So it seems that --use-hwthread-cpus means that --cpus-per-proc changes from physical cores to hyperthreaded cores, but I can't get both chips working on the problem in the way mpich can.

John
Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code
I'm trying to grok the problem, so bear with me a bit. It sounds like you have a machine with 12 physical cores, each with two hyperthreads, and you have HT turned on - correct?

If that is true, then the problem is that you are attempting to bind-to core (of which you have 12), but asking for 2 cpus/proc. Since you haven't told us to use HTs as cpus, we are using "cores" as cpus - so this cmd is actually asking us to bind each process to 2 cores, resulting in an overload.

So you have two options:

* remove the cpus-per-proc directive. Since you are binding to core, we will automatically bind each process to both HTs in the core, which is the result you want

* add the --use-hwthread-cpus flag and change your binding request to "hwthread". This will treat each HT as a separate cpu, and we will bind each process to 2 HTs, effectively binding them to the core.

The revised manpage that hopefully explains this better is in the upcoming 1.8.4 release. I'm also working on a page for the web site to better explain the new map/rank/bind system.

HTH
Ralph

> On Dec 5, 2014, at 12:55 AM, John Bray wrote:
>
> To run a hybrid MPI/OpenMP code on a hyperthreaded machine with 24 virtual
> cores, I've been using -n 12 --cpus-per-proc 2 so I can use OMP_NUM_THREADS=2.
>
> I now see that --cpus-per-proc is deprecated in favour of --map-by, but I've
> been struggling to find a conversion, as the --map-by documentation is not
> very clear.
>
> What should I use to bind 2 virtual cores to each process?
>
> After I use -n 12 --cpus-per-proc 2 I get
>
>   A request was made to bind to that would result in binding more
>   processes than cpus on a resource:
>
>      Bind to: CORE
>      Node: mic1
>      #processes: 2
>      #cpus: 1
>
> and it suggests I need an override option.
>
> But this doesn't seem to match my request for 2 cores per process - almost
> the reverse, having 2 processes per core. I don't think I'm overloading my
> virtual cores anyway.
>
> John
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/25917.php