Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-08 Thread John Bray
OMP_NUM_THREADS=1 mpiexec -n 1 gnu_openmpi_a/one_c_prof.exe : 113 iterations
OMP_NUM_THREADS=6 mpiexec -n 1 --map-by node:PE=6 : 639 iterations
OMP_NUM_THREADS=6 mpiexec -n 2 --map-by node:PE=6 : 639 iterations
OMP_NUM_THREADS=12 mpiexec -n 1 --map-by node:PE=12 : 1000 iterations
OMP_NUM_THREADS=12 mpiexec -n 2 --use-hwthread-cpus --map-by node:PE=12 : 646 iterations

That's looking better, with limited gain for 1 process across 2 chips. Thanks.
I am testing Allinea's profiler, and our goal is to point out bad practice,
so I need to run all sorts of pathological cases. Now to see what our
software makes of them.
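
A quick check of the parallel efficiency those counts imply (shell integer arithmetic; iteration counts taken from the runs above, 113 iterations for 1 thread):

```shell
# Efficiency implied by the reported iteration counts.
BASE=113                             # iterations with 1 thread
EFF6=$((639 * 100 / (BASE * 6)))     # 6-thread efficiency, percent
EFF12=$((1000 * 100 / (BASE * 12)))  # 12-thread efficiency, percent
echo "6 threads:  ${EFF6}% efficiency"
echo "12 threads: ${EFF12}% efficiency"
```

So the 12-thread run is already losing ground relative to a single chip, consistent with the limited gain John sees from the second socket.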

Thanks for your help

John

On 8 December 2014 at 15:57, Ralph Castain  wrote:

> Thanks for sending that lstopo output - it helped clarify things for me. I
> think I now understand the issue. Mostly a problem of my being rather dense
> when reading your earlier note.
>
> Try adding --map-by node:PE=N to your cmd line. I think the problem is that
> we default to --map-by numa if you just give cpus-per-proc and no mapping
> directive, as we know that having threads that span multiple NUMA regions is
> bad for performance.


Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-08 Thread Ralph Castain
Thanks for sending that lstopo output - it helped clarify things for me. I think I
now understand the issue. Mostly a problem of my being rather dense when
reading your earlier note.

Try adding --map-by node:PE=N to your cmd line. I think the problem is that we
default to --map-by numa if you just give cpus-per-proc and no mapping directive,
as we know that having threads that span multiple NUMA regions is bad for
performance.
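
Spelled out, the conversion Ralph suggests looks like this (a sketch; the rank count, PE count, and executable name are placeholders, not from the thread):

```shell
NP=12
PE=2

# Deprecated form:
OLD="mpiexec -n $NP --cpus-per-proc $PE ./hybrid.exe"

# --map-by equivalent: map by node, binding PE processing elements per rank
NEW="mpiexec -n $NP --map-by node:PE=$PE ./hybrid.exe"

# Match the OpenMP thread count to the binding width
export OMP_NUM_THREADS=$PE

echo "$NEW"
```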





Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-08 Thread John Bray
lstopo is pretty!

John


Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-05 Thread Ralph Castain
We may be getting hung up on terminology, but my guess is that the problem is
one of accurately understanding how many cores you have vs. hyperthreads. Can
you run lstopo and see what it thinks is there?

If you haven't installed that, you can just run "mpirun -mca ess_base_verbose
10 -n 1 hostname" to get the info





Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-05 Thread John Bray
Hi Ralph

I have a motherboard with 2 X6580 chips, each with 6 cores and 2-way
hyperthreading, so /proc/cpuinfo reports 24 cores

Doing a pure-compute OpenMP loop, where I'd expect the number of iterations
completed in 10 s to rise with the number of threads.
With gnu and mpich:
OMP_NUM_THREADS=1 -n 1 : 112 iterations
OMP_NUM_THREADS=2 -n 1 : 224 iterations
OMP_NUM_THREADS=6 -n 1 : 644 iterations
OMP_NUM_THREADS=12 -n 1 : 1287 iterations
OMP_NUM_THREADS=22 -n 1 : 1182 iterations
OMP_NUM_THREADS=24 -n 1 : 454 iterations

which shows that mpich is spreading across the cores, but hyperthreading is
not useful, and using the whole node is counterproductive

with gnu and openmpi 1.8.3
OMP_NUM_THREADS=1 mpiexec -n 1 : 112
OMP_NUM_THREADS=2 mpiexec -n 1 : 113
which suggests you aren't allowing the threads to spread across cores

adding --cpus-per-proc I gain access to the resources on one chip

OMP_NUM_THREADS=1 mpiexec --cpus-per-proc 1 -n 1 : 112
OMP_NUM_THREADS=2 mpiexec --cpus-per-proc 2 -n 1 : 224
OMP_NUM_THREADS=6 mpiexec --cpus-per-proc 2 -n 1 : 644
then
OMP_NUM_THREADS=12 mpiexec --cpus-per-proc 12 -n 1

A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:

  #cpus-per-proc:  12
  number of cpus:  6
  map-by:  BYNUMA

So you aren't happy using both chips for one process

OMP_NUM_THREADS=1 mpiexec -n 1 --cpus-per-proc 1 --use-hwthread-cpus : 112
OMP_NUM_THREADS=2 mpiexec -n 1 --cpus-per-proc 2 --use-hwthread-cpus : 112
OMP_NUM_THREADS=4 mpiexec -n 1 --cpus-per-proc 4 --use-hwthread-cpus : 224
OMP_NUM_THREADS=6 mpiexec -n 1 --cpus-per-proc 6 --use-hwthread-cpus : 324
OMP_NUM_THREADS=6 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 631
OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 647

OMP_NUM_THREADS=24 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus

A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:

  #cpus-per-proc:  24
  number of cpus:  12
  map-by:  BYNUMA

OMP_NUM_THREADS=1 mpiexec -n 1 --cpus-per-proc 2 --use-hwthread-cpus : 112
OMP_NUM_THREADS=2 mpiexec -n 1 --cpus-per-proc 4 --use-hwthread-cpus : 224
OMP_NUM_THREADS=6 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus :: 644

OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 24 --use-hwthread-cpus ::
644

A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:

  #cpus-per-proc:  24
  number of cpus:  12
  map-by:  BYNUMA

So it seems that --use-hwthread-cpus means that --cpus-per-proc changes
from physical cores to hyperthreaded cores, but I can't get both chips
working on the problem in the way mpich can

John
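
The arithmetic behind the BYNUMA errors above can be sketched directly (assuming the topology John describes: 2 NUMA nodes, each with 6 cores / 12 hardware threads):

```shell
CORES_PER_NUMA=6
HWTS_PER_NUMA=$((CORES_PER_NUMA * 2))
PE=12   # --cpus-per-proc 12

# Default: cpus are counted as cores, so PE=12 cannot fit one numa node
if [ "$PE" -gt "$CORES_PER_NUMA" ]; then
    echo "PE=$PE > $CORES_PER_NUMA cores per numa node: BYNUMA error"
fi

# With --use-hwthread-cpus: cpus are hardware threads, so PE=12 just fits
if [ "$PE" -le "$HWTS_PER_NUMA" ]; then
    echo "PE=$PE <= $HWTS_PER_NUMA hwthreads per numa node: mapping succeeds"
fi
```

This matches what John observes: --use-hwthread-cpus halves the effective width of each requested cpu, but the default numa-level mapping still refuses to span both sockets.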


Re: [OMPI users] Converting --cpus-per-proc to --map-by for a hybrid code

2014-12-05 Thread Ralph Castain
I’m trying to grok the problem, so bear with me a bit. It sounds like you have 
a machine with 12 physical cores, each with two hyperthreads, and you have HT 
turned on - correct?

If that is true, then the problem is that you are attempting to bind-to core 
(of which you have 12), but asking for 2 cpus/proc. Since you haven’t told us 
to use HTs as cpus, we are using “cores” as cpus - so this cmd is actually 
asking us to bind each process to 2 cores, resulting in an overload.

So you have two options:

* remove the cpus-per-proc directive. Since you are binding to core, we will
automatically bind each process to both HTs in the core, which is the result
you want

* add the --use-hwthread-cpus flag and change your binding request to
"hwthread". This will treat each HT as a separate cpu, and we will bind each
process to 2 HTs, effectively binding them to the core.
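
The overload Ralph describes comes down to simple counting (assumed machine, per his description: 12 cores, 2 hardware threads each):

```shell
CORES=12
HWTS=$((CORES * 2))
NP=12            # ranks
CPUS_PER_PROC=2

NEEDED=$((NP * CPUS_PER_PROC))   # 24 "cpus" requested either way

# Counting cpus as cores (default): 24 needed, 12 available -> overload error
if [ "$NEEDED" -gt "$CORES" ]; then
    echo "as cores: need $NEEDED, have $CORES -> overload"
fi

# Counting cpus as hwthreads (--use-hwthread-cpus): 24 needed, 24 available
if [ "$NEEDED" -le "$HWTS" ]; then
    echo "as hwthreads: need $NEEDED, have $HWTS -> fits"
fi
```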

The revised manpage that hopefully helps explain this better is in the upcoming 
1.8.4 release. I’m also working on a page for the web site to better explain 
the new map/rank/bind system.

HTH
Ralph

> On Dec 5, 2014, at 12:55 AM, John Bray  wrote:
> 
> To run a hybrid MPI/OpenMP code on a hyperthreaded machine with 24 virtual 
> cores, I've been using -n 12 --cpus-per-proc 2 so I can use OMP_NUM_THREADS=2
> 
> I now see that --cpus-per-proc is deprecated in favour of --map-by, but I've 
> been struggling to find a conversion as the --map-by documentation is not 
> very clear.
> 
> What should I use to bind 2 virtual cores to each process?
> 
> After I use -n 12 --cpus-per-proc 2 I get
> 
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>Bind to: CORE
>Node:mic1
>#processes:  2
>#cpus:   1
> 
> and suggests I need an override option
> 
> But this doesn't seem to match my request for 2 cores per process; it's
> almost the reverse, having 2 processes per core. I don't think I'm
> overloading my virtual cores anyway
> 
> John
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/12/25917.php