Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-25 Thread Brock Palen
Yes 

ompi_info --all 

Works,

ompi_info -param all all

[brockp@flux-login1 34241]$ ompi_info --param all all
Error getting SCIF driver version 
 MCA btl: parameter "btl_tcp_if_include" (current value: "",
  data source: default, level: 1 user/basic, type:
  string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to use for MPI communication
  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
  with btl_tcp_if_exclude.
 MCA btl: parameter "btl_tcp_if_exclude" (current value:
  "127.0.0.1/8,sppp", data source: default, level: 1
  user/basic, type: string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to NOT use for MPI
  communication -- all devices not matching these
  specifications will be used (e.g.,
  "eth0,192.168.0.0/16").  If set to a non-default
  value, it is mutually exclusive with
  btl_tcp_if_include.
[brockp@flux-login1 34241]$ 


ompi_info --param all all --level 9 
(gives me what I expect).
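
For anyone following along: the full level-9 dump is long, so restricting it
to one framework/component keeps it manageable, e.g.

    ompi_info --param btl tcp --level 9

(shows every TCP BTL parameter, not just the level-1 ones above).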

Thanks,

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 24, 2014, at 10:22 AM, Jeff Squyres (jsquyres)  
wrote:

> Brock --
> 
> Can you run with "ompi_info --all"?
> 
> With "--param all all", ompi_info in v1.8.x is defaulting to only showing 
> level 1 MCA params.  It's showing you all possible components and variables, 
> but only level 1.
> 
> Or you could also use "--level 9" to show all 9 levels.  Here's the relevant 
> section from the README:
> 
> -
> The following options may be helpful:
> 
> --all   Show a *lot* of information about your Open MPI
>installation. 
> --parsable  Display all the information in an easily
>grep/cut/awk/sed-able format.
> --param <framework> <component>
>A <framework> of "all" and a <component> of "all" will
>show all parameters to all components.  Otherwise, the
>parameters of all the components in a specific framework,
>or just the parameters of a specific component can be
>displayed by using an appropriate <framework> and/or
><component> name.
> --level <level>
>By default, ompi_info only shows "Level 1" MCA parameters
>-- parameters that can affect whether MPI processes can
>run successfully or not (e.g., determining which network
>interfaces to use).  The --level option will display all
>MCA parameters from level 1 to <level> (the max <level>
>value is 9).  Use "ompi_info --param <framework>
><component> --level 9" to see *all* MCA parameters for a
>given component.  See "The Modular Component Architecture
>(MCA)" section, below, for a fuller explanation.
> 
> 
> 
> 
> 
> On Jun 24, 2014, at 5:19 AM, Ralph Castain  wrote:
> 
>> That's odd - it shouldn't truncate the output. I'll take a look later today 
>> - we're all gathered for a developer's conference this week, so I'll be able 
>> to poke at this with Nathan.
>> 
>> 
>> 
>> On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen  wrote:
>> Perfection, flexible, extensible, so nice.
>> 
>> BTW this doesn't happen in older versions,
>> 
>> [brockp@flux-login2 34241]$ ompi_info --param all all
>> Error getting SCIF driver version
>> MCA btl: parameter "btl_tcp_if_include" (current value: "",
>>  data source: default, level: 1 user/basic, type:
>>  string)
>>  Comma-delimited list of devices and/or CIDR
>>  notation of networks to use for MPI communication
>>  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
>>  with btl_tcp_if_exclude.
>> MCA btl: parameter "btl_tcp_if_exclude" (current value:
>>  "127.0.0.1/8,sppp", data source: default, level: 1
>>  user/basic, type: string)
>>  Comma-delimited list of devices and/or CIDR
>>  notation of networks to NOT use for MPI
>>  communication -- all devices not matching these
>>  specifications will be used (e.g.,
>>  "eth0,192.168.0.0/16").  If set to a non-default
>>  value, it is mutually exclusive with
>>  btl_tcp_if_include.
>> 
>> 
>> This is normally much longer.  And yes we don't have the PHI stuff installed 
>> on all nodes, strange that 'all all' is now very short,  ompi_info -a  still 
>> works though.
>> 
>> 
>> 
>> 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-24 Thread Jeff Squyres (jsquyres)
Brock --

Can you run with "ompi_info --all"?

With "--param all all", ompi_info in v1.8.x is defaulting to only showing level 
1 MCA params.  It's showing you all possible components and variables, but only 
level 1.

Or you could also use "--level 9" to show all 9 levels.  Here's the relevant 
section from the README:

-
The following options may be helpful:

--all   Show a *lot* of information about your Open MPI
installation. 
--parsable  Display all the information in an easily
grep/cut/awk/sed-able format.
--param <framework> <component>
A <framework> of "all" and a <component> of "all" will
show all parameters to all components.  Otherwise, the
parameters of all the components in a specific framework,
or just the parameters of a specific component can be
displayed by using an appropriate <framework> and/or
<component> name.
--level <level>
By default, ompi_info only shows "Level 1" MCA parameters
-- parameters that can affect whether MPI processes can
run successfully or not (e.g., determining which network
interfaces to use).  The --level option will display all
MCA parameters from level 1 to <level> (the max <level>
value is 9).  Use "ompi_info --param <framework>
<component> --level 9" to see *all* MCA parameters for a
given component.  See "The Modular Component Architecture
(MCA)" section, below, for a fuller explanation.





On Jun 24, 2014, at 5:19 AM, Ralph Castain  wrote:

> That's odd - it shouldn't truncate the output. I'll take a look later today - 
> we're all gathered for a developer's conference this week, so I'll be able to 
> poke at this with Nathan.
> 
> 
> 
> On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen  wrote:
> Perfection, flexible, extensible, so nice.
> 
> BTW this doesn't happen in older versions,
> 
> [brockp@flux-login2 34241]$ ompi_info --param all all
> Error getting SCIF driver version
>  MCA btl: parameter "btl_tcp_if_include" (current value: "",
>   data source: default, level: 1 user/basic, type:
>   string)
>   Comma-delimited list of devices and/or CIDR
>   notation of networks to use for MPI communication
>   (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
>   with btl_tcp_if_exclude.
>  MCA btl: parameter "btl_tcp_if_exclude" (current value:
>   "127.0.0.1/8,sppp", data source: default, level: 1
>   user/basic, type: string)
>   Comma-delimited list of devices and/or CIDR
>   notation of networks to NOT use for MPI
>   communication -- all devices not matching these
>   specifications will be used (e.g.,
>   "eth0,192.168.0.0/16").  If set to a non-default
>   value, it is mutually exclusive with
>   btl_tcp_if_include.
> 
> 
> This is normally much longer.  And yes we don't have the PHI stuff installed 
> on all nodes, strange that 'all all' is now very short,  ompi_info -a  still 
> works though.
> 
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Jun 20, 2014, at 1:48 PM, Ralph Castain  wrote:
> 
> > Put "orte_hetero_nodes=1" in your default MCA param file - uses can 
> > override by setting that param to 0
> >
> >
> > On Jun 20, 2014, at 10:30 AM, Brock Palen  wrote:
> >
> >> Perfection!  That appears to do it for our standard case.
> >>
> >> Now I know how to set MCA options by env var or config file.  How can I 
> >> make this the default, that then a user can override?
> >>
> >> Brock Palen
> >> www.umich.edu/~brockp
> >> CAEN Advanced Computing
> >> XSEDE Campus Champion
> >> bro...@umich.edu
> >> (734)936-1985
> >>
> >>
> >>
> >> On Jun 20, 2014, at 1:21 PM, Ralph Castain  wrote:
> >>
> >>> I think I begin to grok at least part of the problem. If you are 
> >>> assigning different cpus on each node, then you'll need to tell us that 
> >>> by setting --hetero-nodes otherwise we won't have any way to report that 
> >>> back to mpirun for its binding calculation.
> >>>
> >>> Otherwise, we expect that the cpuset of the first node we launch a daemon 
> >>> onto (or where mpirun is executing, if we are only launching local to 
> >>> mpirun) accurately represents the cpuset on every node in the allocation.
> >>>
> >>> We still might well have a bug in our binding computation - but the above 
> >>> will definitely impact what you said the user did.
> >>>
> >>> On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:
> >>>
>  Extra data point if I do:
> 
>  

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-24 Thread Ralph Castain
That's odd - it shouldn't truncate the output. I'll take a look later today
- we're all gathered for a developer's conference this week, so I'll be
able to poke at this with Nathan.



On Mon, Jun 23, 2014 at 3:15 PM, Brock Palen  wrote:

> Perfection, flexible, extensible, so nice.
>
> BTW this doesn't happen in older versions,
>
> [brockp@flux-login2 34241]$ ompi_info --param all all
> Error getting SCIF driver version
>  MCA btl: parameter "btl_tcp_if_include" (current value:
> "",
>   data source: default, level: 1 user/basic, type:
>   string)
>   Comma-delimited list of devices and/or CIDR
>   notation of networks to use for MPI communication
>   (e.g., "eth0,192.168.0.0/16").  Mutually
> exclusive
>   with btl_tcp_if_exclude.
>  MCA btl: parameter "btl_tcp_if_exclude" (current value:
>   "127.0.0.1/8,sppp", data source: default,
> level: 1
>   user/basic, type: string)
>   Comma-delimited list of devices and/or CIDR
>   notation of networks to NOT use for MPI
>   communication -- all devices not matching these
>   specifications will be used (e.g.,
>   "eth0,192.168.0.0/16").  If set to a non-default
>   value, it is mutually exclusive with
>   btl_tcp_if_include.
>
>
> This is normally much longer.  And yes we don't have the PHI stuff
> installed on all nodes, strange that 'all all' is now very short,
>  ompi_info -a  still works though.
>
>
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Jun 20, 2014, at 1:48 PM, Ralph Castain  wrote:
>
> > Put "orte_hetero_nodes=1" in your default MCA param file - uses can
> override by setting that param to 0
> >
> >
> > On Jun 20, 2014, at 10:30 AM, Brock Palen  wrote:
> >
> >> Perfection!  That appears to do it for our standard case.
> >>
> >> Now I know how to set MCA options by env var or config file.  How can I
> make this the default, that then a user can override?
> >>
> >> Brock Palen
> >> www.umich.edu/~brockp
> >> CAEN Advanced Computing
> >> XSEDE Campus Champion
> >> bro...@umich.edu
> >> (734)936-1985
> >>
> >>
> >>
> >> On Jun 20, 2014, at 1:21 PM, Ralph Castain  wrote:
> >>
> >>> I think I begin to grok at least part of the problem. If you are
> assigning different cpus on each node, then you'll need to tell us that by
> setting --hetero-nodes otherwise we won't have any way to report that back
> to mpirun for its binding calculation.
> >>>
> >>> Otherwise, we expect that the cpuset of the first node we launch a
> daemon onto (or where mpirun is executing, if we are only launching local
> to mpirun) accurately represents the cpuset on every node in the allocation.
> >>>
> >>> We still might well have a bug in our binding computation - but the
> above will definitely impact what you said the user did.
> >>>
> >>> On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:
> >>>
>  Extra data point if I do:
> 
>  [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core
> hostname
> 
> --
>  A request was made to bind to that would result in binding more
>  processes than cpus on a resource:
> 
>  Bind to: CORE
>  Node:nyx5513
>  #processes:  2
>  #cpus:  1
> 
>  You can override this protection by adding the "overload-allowed"
>  option to your binding directive.
> 
> --
> 
>  [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
>  13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90,
> 12.38
>  13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90,
> 12.38
>  [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind
> --get
>  0x0010
>  0x1000
>  [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
>  nyx5513
>  nyx5513
> 
>  Interesting, if I force bind to core, MPI barfs saying there is only
> 1 cpu available, PBS says it gave it two, and if I force (this is all
> inside an interactive job) just on that node hwloc-bind --get I get what I
> expect,
> 
>  Is there a way to get a map of what MPI thinks it has on each host?
> 
>  Brock Palen
>  www.umich.edu/~brockp
>  CAEN Advanced Computing
>  XSEDE Campus Champion
>  bro...@umich.edu
>  (734)936-1985
> 
> 
> 
>  On Jun 20, 2014, at 12:38 PM, Brock Palen  

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-24 Thread Ralph Castain
Let's say that the downside is an unknown at this time. The only real
impact of setting that param is that each daemon now reports its topology
at startup. Without the param, only the daemon on the first node does so.
The concern expressed when we first added that report was that the volume
of data being sent on a very large system might impact launch time.
However, the amount of data from each node isn't very much, so we don't
know if there really would be a downside, or how significant it might be.

Sadly, we haven't had access to machines of any real size to test this, so we
don't have real numbers for the decision. Absent that data, we took the
conservative approach of setting the default so as to preserve the
pre-existing behavior.

So everyone out there: please consider this an appeal for data. If you are
interested and willing, just send me (or the list - your option) any data
you are willing to share regarding launch time with and without the
--hetero-nodes option. A simple "time mpirun --map-by ppr:1:node /bin/true"
(or equivalent) run at various numbers of nodes would suffice.
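
Something along these lines would do (just a sketch; adjust to however you
grab allocations of different sizes on your system):

    # inside an allocation, one proc per node, compare the two launch modes
    echo "default: topology taken from the first node only"
    time mpirun --map-by ppr:1:node /bin/true

    echo "hetero-nodes: every daemon reports its own topology"
    time mpirun --hetero-nodes --map-by ppr:1:node /bin/true

Repeating that at a few different allocation sizes is exactly the data we're
after.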


On Mon, Jun 23, 2014 at 3:17 PM, Maxime Boissonneault <
maxime.boissonnea...@calculquebec.ca> wrote:

> Hi,
> I've been following this thread because it may be relevant to our setup.
>
> Is there a drawback of having orte_hetero_nodes=1 as default MCA parameter
> ? Is there a reason why the most generic case is not assumed ?
>
> Maxime Boissonneault
>
> On 2014-06-20 13:48, Ralph Castain wrote:
>
>> Put "orte_hetero_nodes=1" in your default MCA param file - uses can
>> override by setting that param to 0
>>
>>
>> On Jun 20, 2014, at 10:30 AM, Brock Palen  wrote:
>>
>>  Perfection!  That appears to do it for our standard case.
>>>
>>> Now I know how to set MCA options by env var or config file.  How can I
>>> make this the default, that then a user can override?
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>>
>>>
>>>
>>> On Jun 20, 2014, at 1:21 PM, Ralph Castain  wrote:
>>>
>>>  I think I begin to grok at least part of the problem. If you are
 assigning different cpus on each node, then you'll need to tell us that by
 setting --hetero-nodes otherwise we won't have any way to report that back
 to mpirun for its binding calculation.

 Otherwise, we expect that the cpuset of the first node we launch a
 daemon onto (or where mpirun is executing, if we are only launching local
 to mpirun) accurately represents the cpuset on every node in the 
 allocation.

 We still might well have a bug in our binding computation - but the
 above will definitely impact what you said the user did.

 On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:

  Extra data point if I do:
>
> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core
> hostname
> 
> --
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
>
>   Bind to: CORE
>   Node:nyx5513
>   #processes:  2
>   #cpus:  1
>
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> 
> --
>
> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90,
> 12.38
> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90,
> 12.38
> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind
> --get
> 0x0010
> 0x1000
> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
> nyx5513
> nyx5513
>
> Interesting, if I force bind to core, MPI barfs saying there is only 1
> cpu available, PBS says it gave it two, and if I force (this is all inside
> an interactive job) just on that node hwloc-bind --get I get what I 
> expect,
>
> Is there a way to get a map of what MPI thinks it has on each host?
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
>
>
>
> On Jun 20, 2014, at 12:38 PM, Brock Palen  wrote:
>
>  I was able to produce it in my test.
>>
>> orted affinity set by cpuset:
>> [root@nyx5874 ~]# hwloc-bind --get --pid 103645
>> 0xc002
>>
>> This mask (1, 14,15) which is across sockets, matches the cpu set
>> setup by the batch system.
>> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.
>> nyx.engin.umich.edu/cpus
>> 1,14-15
>>
>> The ranks though were then all set to the same core:
>>
>> 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-23 Thread Maxime Boissonneault

Hi,
I've been following this thread because it may be relevant to our setup.

Is there a drawback of having orte_hetero_nodes=1 as default MCA 
parameter ? Is there a reason why the most generic case is not assumed ?


Maxime Boissonneault

On 2014-06-20 13:48, Ralph Castain wrote:

Put "orte_hetero_nodes=1" in your default MCA param file - uses can override by 
setting that param to 0


On Jun 20, 2014, at 10:30 AM, Brock Palen  wrote:


Perfection!  That appears to do it for our standard case.

Now I know how to set MCA options by env var or config file.  How can I make 
this the default, that then a user can override?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 1:21 PM, Ralph Castain  wrote:


I think I begin to grok at least part of the problem. If you are assigning 
different cpus on each node, then you'll need to tell us that by setting 
--hetero-nodes otherwise we won't have any way to report that back to mpirun 
for its binding calculation.

Otherwise, we expect that the cpuset of the first node we launch a daemon onto 
(or where mpirun is executing, if we are only launching local to mpirun) 
accurately represents the cpuset on every node in the allocation.

We still might well have a bug in our binding computation - but the above will 
definitely impact what you said the user did.

On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:


Extra data point if I do:

[brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

  Bind to: CORE
  Node:nyx5513
  #processes:  2
  #cpus:  1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--

[brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
[brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
0x0010
0x1000
[brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
nyx5513
nyx5513

Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
available, PBS says it gave it two, and if I force (this is all inside an 
interactive job) just on that node hwloc-bind --get I get what I expect,

Is there a way to get a map of what MPI thinks it has on each host?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 12:38 PM, Brock Palen  wrote:


I was able to produce it in my test.

orted affinity set by cpuset:
[root@nyx5874 ~]# hwloc-bind --get --pid 103645
0xc002

This mask (1, 14,15) which is across sockets, matches the cpu set setup by the 
batch system.
[root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus
1,14-15

The ranks though were then all set to the same core:

[root@nyx5874 ~]# hwloc-bind --get --pid 103871
0x8000
[root@nyx5874 ~]# hwloc-bind --get --pid 103872
0x8000
[root@nyx5874 ~]# hwloc-bind --get --pid 103873
0x8000

Which is core 15:

report-bindings gave me:
You can see how a few nodes were bound to all the same core, the last one in 
each case.  I only gave you the results for the host nyx5874.

[nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
available processors)
[nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
available processors)
[nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
available processors)
[nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
available processors)
[nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
available processors)
[nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 0]]: 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-23 Thread Brock Palen
Perfection, flexible, extensible, so nice.

BTW this doesn't happen in older versions,

[brockp@flux-login2 34241]$ ompi_info --param all all
Error getting SCIF driver version 
 MCA btl: parameter "btl_tcp_if_include" (current value: "",
  data source: default, level: 1 user/basic, type:
  string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to use for MPI communication
  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
  with btl_tcp_if_exclude.
 MCA btl: parameter "btl_tcp_if_exclude" (current value:
  "127.0.0.1/8,sppp", data source: default, level: 1
  user/basic, type: string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to NOT use for MPI
  communication -- all devices not matching these
  specifications will be used (e.g.,
  "eth0,192.168.0.0/16").  If set to a non-default
  value, it is mutually exclusive with
  btl_tcp_if_include.


This is normally much longer.  And yes we don't have the PHI stuff installed on 
all nodes, strange that 'all all' is now very short,  ompi_info -a  still works 
though.
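
(A quick way to quantify how much is being dropped, just a sketch using line
counts:

    ompi_info -a | wc -l
    ompi_info --param all all | wc -l
    ompi_info --param all all --level 9 | wc -l

I'd expect the first and third to be roughly comparable and the second to be
tiny, matching the truncated output above.)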



Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 1:48 PM, Ralph Castain  wrote:

> Put "orte_hetero_nodes=1" in your default MCA param file - uses can override 
> by setting that param to 0
> 
> 
> On Jun 20, 2014, at 10:30 AM, Brock Palen  wrote:
> 
>> Perfection!  That appears to do it for our standard case.
>> 
>> Now I know how to set MCA options by env var or config file.  How can I make 
>> this the default, that then a user can override?
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 20, 2014, at 1:21 PM, Ralph Castain  wrote:
>> 
>>> I think I begin to grok at least part of the problem. If you are assigning 
>>> different cpus on each node, then you'll need to tell us that by setting 
>>> --hetero-nodes otherwise we won't have any way to report that back to 
>>> mpirun for its binding calculation.
>>> 
>>> Otherwise, we expect that the cpuset of the first node we launch a daemon 
>>> onto (or where mpirun is executing, if we are only launching local to 
>>> mpirun) accurately represents the cpuset on every node in the allocation.
>>> 
>>> We still might well have a bug in our binding computation - but the above 
>>> will definitely impact what you said the user did.
>>> 
>>> On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:
>>> 
 Extra data point if I do:
 
 [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
 --
 A request was made to bind to that would result in binding more
 processes than cpus on a resource:
 
 Bind to: CORE
 Node:nyx5513
 #processes:  2
 #cpus:  1
 
 You can override this protection by adding the "overload-allowed"
 option to your binding directive.
 --
 
 [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
 [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
 0x0010
 0x1000
 [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
 nyx5513
 nyx5513
 
 Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
 available, PBS says it gave it two, and if I force (this is all inside an 
 interactive job) just on that node hwloc-bind --get I get what I expect,
 
 Is there a way to get a map of what MPI thinks it has on each host?
 
 Brock Palen
 www.umich.edu/~brockp
 CAEN Advanced Computing
 XSEDE Campus Champion
 bro...@umich.edu
 (734)936-1985
 
 
 
 On Jun 20, 2014, at 12:38 PM, Brock Palen  wrote:
 
> I was able to produce it in my test.
> 
> orted affinity set by cpuset:
> [root@nyx5874 ~]# hwloc-bind --get --pid 103645
> 0xc002
> 
> This mask (1, 14,15) which is across sockets, matches the cpu set setup 
> by the batch system. 
> [root@nyx5874 ~]# cat 
> /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
> 1,14-15
> 
> The ranks though were then all 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Ralph Castain
Put "orte_hetero_nodes=1" in your default MCA param file - uses can override by 
setting that param to 0
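
Concretely, something like this (a sketch; the conf file path assumes a
default install prefix, so adjust for your site):

    # site-wide default in <prefix>/etc/openmpi-mca-params.conf
    orte_hetero_nodes = 1

A user can then override it for a single job, e.g.

    export OMPI_MCA_orte_hetero_nodes=0
    mpirun -np 16 ./a.out

or directly on the mpirun command line with "-mca orte_hetero_nodes 0".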


On Jun 20, 2014, at 10:30 AM, Brock Palen  wrote:

> Perfection!  That appears to do it for our standard case.
> 
> Now I know how to set MCA options by env var or config file.  How can I make 
> this the default, that then a user can override?
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Jun 20, 2014, at 1:21 PM, Ralph Castain  wrote:
> 
>> I think I begin to grok at least part of the problem. If you are assigning 
>> different cpus on each node, then you'll need to tell us that by setting 
>> --hetero-nodes otherwise we won't have any way to report that back to mpirun 
>> for its binding calculation.
>> 
>> Otherwise, we expect that the cpuset of the first node we launch a daemon 
>> onto (or where mpirun is executing, if we are only launching local to 
>> mpirun) accurately represents the cpuset on every node in the allocation.
>> 
>> We still might well have a bug in our binding computation - but the above 
>> will definitely impact what you said the user did.
>> 
>> On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:
>> 
>>> Extra data point if I do:
>>> 
>>> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
>>> --
>>> A request was made to bind to that would result in binding more
>>> processes than cpus on a resource:
>>> 
>>>  Bind to: CORE
>>>  Node:nyx5513
>>>  #processes:  2
>>>  #cpus:  1
>>> 
>>> You can override this protection by adding the "overload-allowed"
>>> option to your binding directive.
>>> --
>>> 
>>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
>>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
>>> 0x0010
>>> 0x1000
>>> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
>>> nyx5513
>>> nyx5513
>>> 
>>> Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
>>> available, PBS says it gave it two, and if I force (this is all inside an 
>>> interactive job) just on that node hwloc-bind --get I get what I expect,
>>> 
>>> Is there a way to get a map of what MPI thinks it has on each host?
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> On Jun 20, 2014, at 12:38 PM, Brock Palen  wrote:
>>> 
 I was able to produce it in my test.
 
 orted affinity set by cpuset:
 [root@nyx5874 ~]# hwloc-bind --get --pid 103645
 0xc002
 
 This mask (1, 14,15) which is across sockets, matches the cpu set setup by 
 the batch system. 
 [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
 1,14-15
 
 The ranks though were then all set to the same core:
 
 [root@nyx5874 ~]# hwloc-bind --get --pid 103871
 0x8000
 [root@nyx5874 ~]# hwloc-bind --get --pid 103872
 0x8000
 [root@nyx5874 ~]# hwloc-bind --get --pid 103873
 0x8000
 
 Which is core 15:
 
 report-bindings gave me:
 You can see how a few nodes were bound to all the same core, the last one 
 in each case.  I only gave you the results for the host nyx5874.
 
 [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
 available processors)
 [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
 available processors)
 [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
 available processors)
 [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
 available processors)
 [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 
 0]]: [./././././././.][./././././././B]
 [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 
 0]]: [./././././././.][./././././././B]
 [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 
 0]]: [./././././././.][./././././././B]
 [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
 available processors)
 [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 
 0]]: [./././././././.][./././././././B]
 [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 
 0]]: [./././././././.][./././././././B]
 [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 
 0]]: [./././././././.][./././././././B]
 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Perfection!  That appears to do it for our standard case.

Now I know how to set MCA options by env var or config file.  How can I make 
this the default, that then a user can override?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 1:21 PM, Ralph Castain  wrote:

> I think I begin to grok at least part of the problem. If you are assigning 
> different cpus on each node, then you'll need to tell us that by setting 
> --hetero-nodes otherwise we won't have any way to report that back to mpirun 
> for its binding calculation.
> 
> Otherwise, we expect that the cpuset of the first node we launch a daemon 
> onto (or where mpirun is executing, if we are only launching local to mpirun) 
> accurately represents the cpuset on every node in the allocation.
> 
> We still might well have a bug in our binding computation - but the above 
> will definitely impact what you said the user did.
> 
> On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:
> 
>> Extra data point if I do:
>> 
>> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
>> --
>> A request was made to bind to that would result in binding more
>> processes than cpus on a resource:
>> 
>>   Bind to: CORE
>>   Node:nyx5513
>>   #processes:  2
>>   #cpus:  1
>> 
>> You can override this protection by adding the "overload-allowed"
>> option to your binding directive.
>> --
>> 
>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
>> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
>> 0x0010
>> 0x1000
>> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
>> nyx5513
>> nyx5513
>> 
>> Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
>> available, PBS says it gave it two, and if I force (this is all inside an 
>> interactive job) just on that node hwloc-bind --get I get what I expect,
>> 
>> Is there a way to get a map of what MPI thinks it has on each host?
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 20, 2014, at 12:38 PM, Brock Palen  wrote:
>> 
>>> I was able to produce it in my test.
>>> 
>>> orted affinity set by cpuset:
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103645
>>> 0xc002
>>> 
>>> This mask (1, 14,15) which is across sockets, matches the cpu set setup by 
>>> the batch system. 
>>> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
>>> 1,14-15
>>> 
>>> The ranks though were then all set to the same core:
>>> 
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103871
>>> 0x8000
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103872
>>> 0x8000
>>> [root@nyx5874 ~]# hwloc-bind --get --pid 103873
>>> 0x8000
>>> 
>>> Which is core 15:
>>> 
>>> report-bindings gave me:
>>> You can see how a few nodes were bound to all the same core, the last one 
>>> in each case.  I only gave you the results for the host nyx5874.
>>> 
>>> [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
>>> available processors)
>>> [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
>>> available processors)
>>> [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
>>> available processors)
>>> [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
>>> available processors)
>>> [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
>>> available processors)
>>> [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 
>>> 0]]: [./././././././.][./././././././B]
>>> [nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Ralph Castain
I think I begin to grok at least part of the problem. If you are assigning 
different cpus on each node, then you'll need to tell us that by setting 
--hetero-nodes otherwise we won't have any way to report that back to mpirun 
for its binding calculation.

Otherwise, we expect that the cpuset of the first node we launch a daemon onto 
(or where mpirun is executing, if we are only launching local to mpirun) 
accurately represents the cpuset on every node in the allocation.

We still might well have a bug in our binding computation - but the above will 
definitely impact what you said the user did.
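
(On the command line that would look something like the following, with the
proc count and binary as placeholders:

    mpirun --hetero-nodes --report-bindings -np 48 ./a.out

or equivalently via the orte_hetero_nodes MCA param.)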

On Jun 20, 2014, at 10:06 AM, Brock Palen  wrote:

> Extra data point if I do:
> 
> [brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
> --
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>   Bind to: CORE
>   Node:nyx5513
>   #processes:  2
>   #cpus:  1
> 
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> --
> 
> [brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
> 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
> [brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
> 0x0010
> 0x1000
> [brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
> nyx5513
> nyx5513
> 
> Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
> available, PBS says it gave it two, and if I force (this is all inside an 
> interactive job) just on that node hwloc-bind --get I get what I expect,
> 
> Is there a way to get a map of what MPI thinks it has on each host?
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Jun 20, 2014, at 12:38 PM, Brock Palen  wrote:
> 
>> I was able to produce it in my test.
>> 
>> orted affinity set by cpuset:
>> [root@nyx5874 ~]# hwloc-bind --get --pid 103645
>> 0xc002
>> 
>> This mask (1, 14,15) which is across sockets, matches the cpu set setup by 
>> the batch system. 
>> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
>> 1,14-15
>> 
>> The ranks though were then all set to the same core:
>> 
>> [root@nyx5874 ~]# hwloc-bind --get --pid 103871
>> 0x8000
>> [root@nyx5874 ~]# hwloc-bind --get --pid 103872
>> 0x8000
>> [root@nyx5874 ~]# hwloc-bind --get --pid 103873
>> 0x8000
>> 
>> Which is core 15:
>> 
>> report-bindings gave me:
>> You can see how a few nodes were bound to all the same core, the last one in 
>> each case.  I only gave you the results for the host nyx5874.
>> 
>> [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
>> available processors)
>> [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
>> available processors)
>> [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
>> available processors)
>> [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
>> available processors)
>> [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
>> available processors)
>> [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5798.engin.umich.edu:53026] MCW rank 60 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5798.engin.umich.edu:53026] MCW rank 56 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5798.engin.umich.edu:53026] MCW rank 57 bound to socket 1[core 15[hwt 
>> 0]]: [./././././././.][./././././././B]
>> [nyx5798.engin.umich.edu:53026] MCW rank 58 bound to socket 1[core 15[hwt 
>> 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Extra data point if I do:

[brockp@nyx5508 34241]$ mpirun --report-bindings --bind-to core hostname
--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:nyx5513
   #processes:  2
   #cpus:  1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--

[brockp@nyx5508 34241]$ mpirun -H nyx5513 uptime
 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
 13:01:37 up 31 days, 23:06,  0 users,  load average: 10.13, 10.90, 12.38
[brockp@nyx5508 34241]$ mpirun -H nyx5513 --bind-to core hwloc-bind --get
0x0010
0x1000
[brockp@nyx5508 34241]$ cat $PBS_NODEFILE | grep nyx5513
nyx5513
nyx5513

Interesting, if I force bind to core, MPI barfs saying there is only 1 cpu 
available, PBS says it gave it two, and if I force (this is all inside an 
interactive job) just on that node hwloc-bind --get I get what I expect,

Is there a way to get a map of what MPI thinks it has on each host?
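
(The closest I've found myself so far is asking mpirun for its own view, e.g.

    mpirun --display-allocation --display-map --report-bindings hostname

where --display-allocation shows the nodes/slots mpirun thinks it was given
and --display-map shows where each rank was placed; I haven't verified how
that interacts with the cpuset case, so treat it as a sketch.)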

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 12:38 PM, Brock Palen  wrote:

> I was able to produce it in my test.
> 
> orted affinity set by cpuset:
> [root@nyx5874 ~]# hwloc-bind --get --pid 103645
> 0xc002
> 
> This mask (1, 14,15) which is across sockets, matches the cpu set setup by 
> the batch system. 
> [root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
> 1,14-15
> 
> The ranks though were then all set to the same core:
> 
> [root@nyx5874 ~]# hwloc-bind --get --pid 103871
> 0x8000
> [root@nyx5874 ~]# hwloc-bind --get --pid 103872
> 0x8000
> [root@nyx5874 ~]# hwloc-bind --get --pid 103873
> 0x8000
> 
> Which is core 15:
> 
> report-bindings gave me:
> You can see how a few nodes were bound to all the same core, the last one in 
> each case.  I only gave you the results for the host nyx5874.
> 
> [nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
> available processors)
> [nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
> available processors)
> [nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
> available processors)
> [nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
> available processors)
> [nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
> available processors)
> [nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 60 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 56 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 57 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5798.engin.umich.edu:53026] MCW rank 58 bound to socket 1[core 15[hwt 
> 0]]: [./././././././.][./././././././B]
> [nyx5545.engin.umich.edu:88170] MCW rank 2 is not bound (or bound to all 
> available processors)
> [nyx5613.engin.umich.edu:25229] MCW rank 31 is not bound (or bound to all 
> available processors)
> [nyx5880.engin.umich.edu:01406] MCW rank 10 is not bound (or bound to all 
> available processors)
> [nyx5770.engin.umich.edu:86538] MCW rank 6 is not bound (or bound to all 
> available processors)
> [nyx5613.engin.umich.edu:25228] MCW rank 30 is not bound (or bound to all 
> available processors)
> [nyx5577.engin.umich.edu:65949] MCW rank 4 is not bound (or bound to all 
> available processors)
> [nyx5607.engin.umich.edu:30379] MCW rank 14 is not bound (or bound to all 
> available processors)
> [nyx5544.engin.umich.edu:72960] MCW rank 47 is not bound (or bound to 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
I was able to produce it in my test.

orted affinity set by cpuset:
[root@nyx5874 ~]# hwloc-bind --get --pid 103645
0xc002

This mask (1, 14,15) which is across sockets, matches the cpu set setup by the 
batch system. 
[root@nyx5874 ~]# cat /dev/cpuset/torque/12719806.nyx.engin.umich.edu/cpus 
1,14-15

The ranks though were then all set to the same core:

[root@nyx5874 ~]# hwloc-bind --get --pid 103871
0x8000
[root@nyx5874 ~]# hwloc-bind --get --pid 103872
0x8000
[root@nyx5874 ~]# hwloc-bind --get --pid 103873
0x8000

Which is core 15:
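
Decoding those masks with hwloc-calc confirms it (expected output in the
comments; not captured from the node):

    hwloc-calc --intersect PU 0xc002    # -> 1,14,15   (orted's cpuset)
    hwloc-calc --intersect PU 0x8000    # -> 15        (each rank's binding)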

report-bindings gave me:
You can see how a few nodes were bound to all the same core, the last one in 
each case.  I only gave you the results for the host nyx5874.

[nyx5526.engin.umich.edu:23726] MCW rank 0 is not bound (or bound to all 
available processors)
[nyx5878.engin.umich.edu:103925] MCW rank 8 is not bound (or bound to all 
available processors)
[nyx5533.engin.umich.edu:123988] MCW rank 1 is not bound (or bound to all 
available processors)
[nyx5879.engin.umich.edu:102808] MCW rank 9 is not bound (or bound to all 
available processors)
[nyx5874.engin.umich.edu:103645] MCW rank 41 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 42 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5874.engin.umich.edu:103645] MCW rank 43 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5888.engin.umich.edu:117400] MCW rank 11 is not bound (or bound to all 
available processors)
[nyx5786.engin.umich.edu:30004] MCW rank 19 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5786.engin.umich.edu:30004] MCW rank 18 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 24 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 25 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5594.engin.umich.edu:33884] MCW rank 26 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 59 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 60 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 56 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 57 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5798.engin.umich.edu:53026] MCW rank 58 bound to socket 1[core 15[hwt 0]]: 
[./././././././.][./././././././B]
[nyx5545.engin.umich.edu:88170] MCW rank 2 is not bound (or bound to all 
available processors)
[nyx5613.engin.umich.edu:25229] MCW rank 31 is not bound (or bound to all 
available processors)
[nyx5880.engin.umich.edu:01406] MCW rank 10 is not bound (or bound to all 
available processors)
[nyx5770.engin.umich.edu:86538] MCW rank 6 is not bound (or bound to all 
available processors)
[nyx5613.engin.umich.edu:25228] MCW rank 30 is not bound (or bound to all 
available processors)
[nyx5577.engin.umich.edu:65949] MCW rank 4 is not bound (or bound to all 
available processors)
[nyx5607.engin.umich.edu:30379] MCW rank 14 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72960] MCW rank 47 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72959] MCW rank 46 is not bound (or bound to all 
available processors)
[nyx5848.engin.umich.edu:04332] MCW rank 33 is not bound (or bound to all 
available processors)
[nyx5848.engin.umich.edu:04333] MCW rank 34 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72958] MCW rank 45 is not bound (or bound to all 
available processors)
[nyx5858.engin.umich.edu:12165] MCW rank 35 is not bound (or bound to all 
available processors)
[nyx5607.engin.umich.edu:30380] MCW rank 15 is not bound (or bound to all 
available processors)
[nyx5544.engin.umich.edu:72957] MCW rank 44 is not bound (or bound to all 
available processors)
[nyx5858.engin.umich.edu:12167] MCW rank 37 is not bound (or bound to all 
available processors)
[nyx5870.engin.umich.edu:33811] MCW rank 7 is not bound (or bound to all 
available processors)
[nyx5582.engin.umich.edu:81994] MCW rank 5 is not bound (or bound to all 
available processors)
[nyx5848.engin.umich.edu:04331] MCW rank 32 is not bound (or bound to all 
available processors)
[nyx5557.engin.umich.edu:46654] MCW rank 50 is not bound (or bound to all 
available processors)
[nyx5858.engin.umich.edu:12166] MCW rank 36 is not bound (or bound to all 
available processors)
[nyx5799.engin.umich.edu:67802] MCW rank 22 is not bound (or bound to all 
available processors)
[nyx5799.engin.umich.edu:67803] MCW rank 23 is not bound (or bound to all 
available processors)
[nyx5556.engin.umich.edu:50889] MCW rank 3 is not bound (or 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
Got it,

I have the input from the user and am testing it out.

It probably has less to do with torque and more with cpusets,

I'm working on producing it myself also.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 20, 2014, at 12:18 PM, Ralph Castain  wrote:

> Thanks - I'm just trying to reproduce one problem case so I can look at it. 
> Given that I don't have access to a Torque machine, I need to "fake" it.
> 
> 
> On Jun 20, 2014, at 9:15 AM, Brock Palen  wrote:
> 
>> In this case they are on a single socket, but as you can see they could be 
>> either/or depending on the job.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 19, 2014, at 2:44 PM, Ralph Castain  wrote:
>> 
>>> Sorry, I should have been clearer - I was asking if cores 8-11 are all on 
>>> one socket, or span multiple sockets
>>> 
>>> 
>>> On Jun 19, 2014, at 11:36 AM, Brock Palen  wrote:
>>> 
 Ralph,
 
 It was a large job spread across.  Our system allows users to ask for 
 'procs' which are laid out in any format. 
 
 The list:
 
> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]
 
 Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.
 
 They could be spread across any number of sockets configuration.  We start 
 very lax "user requests X procs" and then the user can request more strict 
 requirements from there.  We support mostly serial users, and users can 
 colocate on nodes.
 
 That is good to know, I think we would want to turn our default to 'bind 
 to core' except for our few users who use hybrid mode.
 
 Our CPU set tells you what cores the job is assigned.  So in the problem 
 case provided, the cpuset/cgroup shows only cores 8-11 are available to 
 this job on this node.
 
 Brock Palen
 www.umich.edu/~brockp
 CAEN Advanced Computing
 XSEDE Campus Champion
 bro...@umich.edu
 (734)936-1985
 
 
 
 On Jun 18, 2014, at 11:10 PM, Ralph Castain  wrote:
 
> The default binding option depends on the number of procs - it is bind-to 
> core for np=2, and bind-to socket for np > 2. You never said, but should 
> I assume you ran 4 ranks? If so, then we should be trying to bind-to 
> socket.
> 
> I'm not sure what your cpuset is telling us - are you binding us to a 
> socket? Are some cpus in one socket, and some in another?
> 
> It could be that the cpuset + bind-to socket is resulting in some odd 
> behavior, but I'd need a little more info to narrow it down.
> 
> 
> On Jun 18, 2014, at 7:48 PM, Brock Palen  wrote:
> 
>> I have started using 1.8.1 for some codes (meep in this case) and it 
>> sometimes works fine, but in a few cases I am seeing ranks being given 
>> overlapping CPU assignments, not always though.
>> 
>> Example job, default binding options (so by-core right?):
>> 
>> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, 
>> and use TM to spawn.
>> 
>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>> [nyx5409:11][nyx5411:11][nyx5412:3]
>> 
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
>> 0x0800
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
>> 0x0800
>> 
>> [root@nyx5398 ~]# cat 
>> /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
>> 8-11
>> 
>> So torque claims the CPU set setup for the job has 4 cores, but as you 
>> can see the ranks were given identical binding. 
>> 
>> I checked the pids they were part of the correct CPU set, I also 
>> checked, orted:
>> 
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
>> 0x0f00
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
>> ignored unrecognized argument 16064
>> 
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
>> 8,9,10,11
>> 
>> Which is exactly what I would expect.
>> 
>> So ummm, i'm lost why this might happen?  What else should I check?  
>> Like I said not all jobs show this behavior.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Ralph Castain
Thanks - I'm just trying to reproduce one problem case so I can look at it. 
Given that I don't have access to a Torque machine, I need to "fake" it.
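
(One rough way to fake it without Torque, assuming hwloc is handy: run mpirun
itself under an hwloc binding so the locally-launched procs inherit a
restricted mask, e.g.

    hwloc-bind core:8-11 mpirun -np 4 --report-bindings ./a.out

The core range, proc count, and binary are placeholders, and this only
approximates a real cpuset, which also restricts the reported topology.)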


On Jun 20, 2014, at 9:15 AM, Brock Palen  wrote:

> In this case they are on a single socket, but as you can see they could be 
> either/or depending on the job.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Jun 19, 2014, at 2:44 PM, Ralph Castain  wrote:
> 
>> Sorry, I should have been clearer - I was asking if cores 8-11 are all on 
>> one socket, or span multiple sockets
>> 
>> 
>> On Jun 19, 2014, at 11:36 AM, Brock Palen  wrote:
>> 
>>> Ralph,
>>> 
>>> It was a large job spread across.  Our system allows users to ask for 
>>> 'procs' which are laid out in any format. 
>>> 
>>> The list:
>>> 
 [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
 [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
 [nyx5409:11][nyx5411:11][nyx5412:3]
>>> 
>>> Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.
>>> 
>>> They could be spread across any number of sockets configuration.  We start 
>>> very lax "user requests X procs" and then the user can request more strict 
>>> requirements from there.  We support mostly serial users, and users can 
>>> colocate on nodes.
>>> 
>>> That is good to know, I think we would want to turn our default to 'bind to 
>>> core' except for our few users who use hybrid mode.
>>> 
>>> Our CPU set tells you what cores the job is assigned.  So in the problem 
>>> case provided, the cpuset/cgroup shows only cores 8-11 are available to 
>>> this job on this node.
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> On Jun 18, 2014, at 11:10 PM, Ralph Castain  wrote:
>>> 
 The default binding option depends on the number of procs - it is bind-to 
 core for np=2, and bind-to socket for np > 2. You never said, but should I 
 assume you ran 4 ranks? If so, then we should be trying to bind-to socket.
 
 I'm not sure what your cpuset is telling us - are you binding us to a 
 socket? Are some cpus in one socket, and some in another?
 
 It could be that the cpuset + bind-to socket is resulting in some odd 
 behavior, but I'd need a little more info to narrow it down.
 
 
 On Jun 18, 2014, at 7:48 PM, Brock Palen  wrote:
 
> I have started using 1.8.1 for some codes (meep in this case) and it 
> sometimes works fine, but in a few cases I am seeing ranks being given 
> overlapping CPU assignments, not always though.
> 
> Example job, default binding options (so by-core right?):
> 
> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, 
> and use TM to spawn.
> 
> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]
> 
> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
> 0x0200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
> 0x0800
> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
> 0x0200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
> 0x0800
> 
> [root@nyx5398 ~]# cat 
> /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
> 8-11
> 
> So torque claims the CPU set set up for the job has 4 cores, but as you
> can see the ranks were given identical bindings.
> 
> I checked the PIDs; they were part of the correct CPU set. I also checked
> orted:
> 
> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
> 0x0f00
> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
> ignored unrecognized argument 16064
> 
> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
> 8,9,10,11
> 
> Which is exactly what I would expect.
> 
> So, umm, I'm lost as to why this might happen. What else should I check? Like
> I said, not all jobs show this behavior.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24672.php
 
 ___
 users mailing list
 us...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
 Link to this post: 
 http://www.open-mpi.org/community/lists/users/2014/06/24673.php
>>> 
>>> ___
>>> users mailing list
>>> 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-20 Thread Brock Palen
In this case they are all on a single socket, but as you can see they could be
either/or depending on the job.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 19, 2014, at 2:44 PM, Ralph Castain  wrote:

> Sorry, I should have been clearer - I was asking if cores 8-11 are all on one 
> socket, or span multiple sockets
> 
> 
> On Jun 19, 2014, at 11:36 AM, Brock Palen  wrote:
> 
>> Ralph,
>> 
>> It was a large job spread across multiple nodes.  Our system allows users to
>> ask for 'procs', which are laid out in any format.
>> 
>> The list:
>> 
>>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>>> [nyx5409:11][nyx5411:11][nyx5412:3]
>> 
>> Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.
>> 
>> They could be spread across any number of sockets, in any configuration.  We
>> start very lax ("the user requests X procs") and the user can then add
>> stricter requirements from there.  We support mostly serial users, and users
>> can colocate on nodes.
>> 
>> That is good to know.  I think we would want to change our default to 'bind
>> to core', except for our few users who run in hybrid mode.
>> 
>> Our CPU set tells you what cores the job is assigned.  So in the problem 
>> case provided, the cpuset/cgroup shows only cores 8-11 are available to this 
>> job on this node.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> On Jun 18, 2014, at 11:10 PM, Ralph Castain  wrote:
>> 
>>> The default binding option depends on the number of procs - it is bind-to 
>>> core for np=2, and bind-to socket for np > 2. You never said, but should I 
>>> assume you ran 4 ranks? If so, then we should be trying to bind-to socket.
>>> 
>>> I'm not sure what your cpuset is telling us - are you binding us to a 
>>> socket? Are some cpus in one socket, and some in another?
>>> 
>>> It could be that the cpuset + bind-to socket is resulting in some odd 
>>> behavior, but I'd need a little more info to narrow it down.
>>> 
>>> 
>>> On Jun 18, 2014, at 7:48 PM, Brock Palen  wrote:
>>> 
 I have started using 1.8.1 for some codes (meep in this case) and it 
 sometimes works fine, but in a few cases I am seeing ranks being given 
 overlapping CPU assignments, not always though.
 
 Example job, default binding options (so by-core right?):
 
 Assigned nodes, the one in question is nyx5398, we use torque CPU sets, 
 and use TM to spawn.
 
 [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
 [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
 [nyx5409:11][nyx5411:11][nyx5412:3]
 
 [root@nyx5398 ~]# hwloc-bind --get --pid 16065
 0x0200
 [root@nyx5398 ~]# hwloc-bind --get --pid 16066
 0x0800
 [root@nyx5398 ~]# hwloc-bind --get --pid 16067
 0x0200
 [root@nyx5398 ~]# hwloc-bind --get --pid 16068
 0x0800
 
 [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
 8-11
 
 So torque claims the CPU set set up for the job has 4 cores, but as you can
 see the ranks were given identical bindings.
 
 I checked the PIDs; they were part of the correct CPU set. I also checked
 orted:
 
 [root@nyx5398 ~]# hwloc-bind --get --pid 16064
 0x0f00
 [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
 ignored unrecognized argument 16064
 
 [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
 8,9,10,11
 
 Which is exactly what I would expect.
 
 So, umm, I'm lost as to why this might happen. What else should I check? Like
 I said, not all jobs show this behavior.
 
 Brock Palen
 www.umich.edu/~brockp
 CAEN Advanced Computing
 XSEDE Campus Champion
 bro...@umich.edu
 (734)936-1985
 
 
 
 ___
 users mailing list
 us...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
 Link to this post: 
 http://www.open-mpi.org/community/lists/users/2014/06/24672.php
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/06/24673.php
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/06/24675.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> 

Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-19 Thread Ralph Castain
Sorry, I should have been clearer - I was asking if cores 8-11 are all on one 
socket, or span multiple sockets
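
A quick way to answer that on the node itself, using the hwloc 1.x type names
("socket" is called "package" in hwloc 2.x):

  hwloc-calc --intersect socket pu:8-11

A single index in the output means PUs 8-11 all sit on one socket; more than
one means they span sockets.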


On Jun 19, 2014, at 11:36 AM, Brock Palen  wrote:

> Ralph,
> 
> It was a large job spread across multiple nodes.  Our system allows users to
> ask for 'procs', which are laid out in any format.
> 
> The list:
> 
>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>> [nyx5409:11][nyx5411:11][nyx5412:3]
> 
> Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.
> 
> They could be spread across any number of sockets, in any configuration.  We
> start very lax ("the user requests X procs") and the user can then add
> stricter requirements from there.  We support mostly serial users, and users
> can colocate on nodes.
> 
> That is good to know.  I think we would want to change our default to 'bind
> to core', except for our few users who run in hybrid mode.
> 
> Our CPU set tells you what cores the job is assigned.  So in the problem case 
> provided, the cpuset/cgroup shows only cores 8-11 are available to this job 
> on this node.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> On Jun 18, 2014, at 11:10 PM, Ralph Castain  wrote:
> 
>> The default binding option depends on the number of procs - it is bind-to 
>> core for np=2, and bind-to socket for np > 2. You never said, but should I 
>> assume you ran 4 ranks? If so, then we should be trying to bind-to socket.
>> 
>> I'm not sure what your cpuset is telling us - are you binding us to a 
>> socket? Are some cpus in one socket, and some in another?
>> 
>> It could be that the cpuset + bind-to socket is resulting in some odd 
>> behavior, but I'd need a little more info to narrow it down.
>> 
>> 
>> On Jun 18, 2014, at 7:48 PM, Brock Palen  wrote:
>> 
>>> I have started using 1.8.1 for some codes (meep in this case) and it 
>>> sometimes works fine, but in a few cases I am seeing ranks being given 
>>> overlapping CPU assignments, not always though.
>>> 
>>> Example job, default binding options (so by-core right?):
>>> 
>>> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and 
>>> use TM to spawn.
>>> 
>>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>>> [nyx5409:11][nyx5411:11][nyx5412:3]
>>> 
>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
>>> 0x0200
>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
>>> 0x0800
>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
>>> 0x0200
>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
>>> 0x0800
>>> 
>>> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
>>> 8-11
>>> 
>>> So torque claims the CPU set set up for the job has 4 cores, but as you can
>>> see the ranks were given identical bindings.
>>> 
>>> I checked the PIDs; they were part of the correct CPU set. I also checked
>>> orted:
>>> 
>>> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
>>> 0x0f00
>>> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
>>> ignored unrecognized argument 16064
>>> 
>>> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
>>> 8,9,10,11
>>> 
>>> Which is exactly what I would expect.
>>> 
>>> So, umm, I'm lost as to why this might happen. What else should I check?
>>> Like I said, not all jobs show this behavior.
>>> 
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> CAEN Advanced Computing
>>> XSEDE Campus Champion
>>> bro...@umich.edu
>>> (734)936-1985
>>> 
>>> 
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/users/2014/06/24672.php
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/06/24673.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24675.php



Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-19 Thread Brock Palen
Ralph,

It was a large job spread across multiple nodes.  Our system allows users to ask
for 'procs', which are laid out in any format.

The list:

> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]

Shows that nyx5406 had 2 cores,  nyx5427 also 2,  nyx5411 had 11.

They could be spread across any number of sockets, in any configuration.  We
start very lax ("the user requests X procs") and the user can then add stricter
requirements from there.  We support mostly serial users, and users can colocate
on nodes.

That is good to know.  I think we would want to change our default to 'bind to
core', except for our few users who run in hybrid mode.
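
A sketch of what that could look like with 1.8.x, either per job or as a
site-wide default (MCA parameter name as of 1.8.x; the config file path depends
on the install prefix):

  mpirun --bind-to core --report-bindings -np 4 ./a.out

  # in <prefix>/etc/openmpi-mca-params.conf, to make it the default:
  hwloc_base_binding_policy = core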

Our CPU set tells you what cores the job is assigned.  So in the problem case 
provided, the cpuset/cgroup shows only cores 8-11 are available to this job on 
this node.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985



On Jun 18, 2014, at 11:10 PM, Ralph Castain  wrote:

> The default binding option depends on the number of procs - it is bind-to 
> core for np=2, and bind-to socket for np > 2. You never said, but should I 
> assume you ran 4 ranks? If so, then we should be trying to bind-to socket.
> 
> I'm not sure what your cpuset is telling us - are you binding us to a socket? 
> Are some cpus in one socket, and some in another?
> 
> It could be that the cpuset + bind-to socket is resulting in some odd 
> behavior, but I'd need a little more info to narrow it down.
> 
> 
> On Jun 18, 2014, at 7:48 PM, Brock Palen  wrote:
> 
>> I have started using 1.8.1 for some codes (meep in this case) and it 
>> sometimes works fine, but in a few cases I am seeing ranks being given 
>> overlapping CPU assignments, not always though.
>> 
>> Example job, default binding options (so by-core right?):
>> 
>> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and 
>> use TM to spawn.
>> 
>> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
>> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
>> [nyx5409:11][nyx5411:11][nyx5412:3]
>> 
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
>> 0x0800
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
>> 0x0200
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
>> 0x0800
>> 
>> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
>> 8-11
>> 
>> So torque claims the CPU set set up for the job has 4 cores, but as you can
>> see the ranks were given identical bindings.
>> 
>> I checked the PIDs; they were part of the correct CPU set. I also checked
>> orted:
>> 
>> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
>> 0x0f00
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
>> ignored unrecognized argument 16064
>> 
>> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
>> 8,9,10,11
>> 
>> Which is exactly what I would expect.
>> 
>> So, umm, I'm lost as to why this might happen. What else should I check?
>> Like I said, not all jobs show this behavior.
>> 
>> Brock Palen
>> www.umich.edu/~brockp
>> CAEN Advanced Computing
>> XSEDE Campus Champion
>> bro...@umich.edu
>> (734)936-1985
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/06/24672.php
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24673.php





Re: [OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-19 Thread Ralph Castain
The default binding option depends on the number of procs - it is bind-to core 
for np=2, and bind-to socket for np > 2. You never said, but should I assume 
you ran 4 ranks? If so, then we should be trying to bind-to socket.

I'm not sure what your cpuset is telling us - are you binding us to a socket? 
Are some cpus in one socket, and some in another?

It could be that the cpuset + bind-to socket is resulting in some odd behavior, 
but I'd need a little more info to narrow it down.
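
For instance, the --report-bindings output from a failing run, the job's cpuset
contents, and the node topology would usually be enough to reproduce the layout
("<jobid>" is a placeholder for the Torque job id):

  mpirun -np 4 --report-bindings ./a.out
  cat /dev/cpuset/torque/<jobid>/cpus
  lstopo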


On Jun 18, 2014, at 7:48 PM, Brock Palen  wrote:

> I have started using 1.8.1 for some codes (meep in this case) and it 
> sometimes works fine, but in a few cases I am seeing ranks being given 
> overlapping CPU assignments, not always though.
> 
> Example job, default binding options (so by-core right?):
> 
> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and 
> use TM to spawn.
> 
> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]
> 
> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
> 0x0200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
> 0x0800
> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
> 0x0200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
> 0x0800
> 
> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
> 8-11
> 
> So torque claims the CPU set set up for the job has 4 cores, but as you can
> see the ranks were given identical bindings.
> 
> I checked the PIDs; they were part of the correct CPU set. I also checked
> orted:
> 
> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
> 0x0f00
> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
> ignored unrecognized argument 16064
> 
> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
> 8,9,10,11
> 
> Which is exactly what I would expect.
> 
> So, umm, I'm lost as to why this might happen. What else should I check? Like
> I said, not all jobs show this behavior.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24672.php



[OMPI users] affinity issues under cpuset torque 1.8.1

2014-06-18 Thread Brock Palen
I have started using 1.8.1 for some codes (meep in this case) and it sometimes 
works fine, but in a few cases I am seeing ranks being given overlapping CPU 
assignments, not always though.

Example job, default binding options (so by-core right?):

Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and use 
TM to spawn.

[nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
[nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
[nyx5409:11][nyx5411:11][nyx5412:3]

[root@nyx5398 ~]# hwloc-bind --get --pid 16065
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16066
0x0800
[root@nyx5398 ~]# hwloc-bind --get --pid 16067
0x0200
[root@nyx5398 ~]# hwloc-bind --get --pid 16068
0x0800
  
[root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
8-11

So torque claims the CPU set set up for the job has 4 cores, but as you can see
the ranks were given identical bindings.
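
Decoding those per-rank masks makes the overlap explicit: 0x0200 is bit 9 and
0x0800 is bit 11, so PIDs 16065/16067 are both pinned to PU 9 and PIDs
16066/16068 to PU 11, instead of being spread across PUs 8-11.  hwloc-calc
confirms the decoding:

  hwloc-calc --intersect PU 0x0200   # prints 9
  hwloc-calc --intersect PU 0x0800   # prints 11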

I checked the PIDs; they were part of the correct CPU set. I also checked orted:

[root@nyx5398 ~]# hwloc-bind --get --pid 16064
0x0f00
[root@nyx5398 ~]# hwloc-calc --intersect PU 16064
ignored unrecognized argument 16064

[root@nyx5398 ~]# hwloc-calc --intersect PU 0x0f00
8,9,10,11

Which is exactly what I would expect.
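
As a side note, hwloc-calc expects a location or bitmask rather than a PID,
which is why the bare "16064" argument was rejected; to go from a PID straight
to a PU list, the two commands can be chained:

  hwloc-calc --intersect PU $(hwloc-bind --get --pid 16064)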

So, umm, I'm lost as to why this might happen. What else should I check? Like I
said, not all jobs show this behavior.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985




