Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-25 Thread Ralph Castain
Added this info to the ticket, and added you to it as well.

Thanks again
Ralph


On Dec 25, 2013, at 3:42 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi Ralph,
> 
> Thank you for your reply. After that, I found a more reasonable fix,
> I guess. I moved the OBJ_CONSTRUCT for opal_tree_item_t out of the
> debug-only part of opal_tree_construct, as shown below:
> 
> static void opal_tree_construct(opal_tree_t *tree)
> {
>    OBJ_CONSTRUCT( &(tree->opal_tree_sentinel), opal_tree_item_t ); /* tmishima */
> #if OPAL_ENABLE_DEBUG
>    /* These refcounts should never be used in assertions because they
>       should never be removed from this list, added to another list,
>       etc.  So set them to sentinel values. */
> 
>    tree->opal_tree_sentinel.opal_tree_item_refcount  = 1;
>    tree->opal_tree_sentinel.opal_tree_item_belong_to = tree;
> #endif
>    tree->opal_tree_sentinel.opal_tree_container = tree;
>    tree->opal_tree_sentinel.opal_tree_parent = &tree->opal_tree_sentinel;
>    tree->opal_tree_sentinel.opal_tree_num_ancestors = -1;
> 
>    tree->opal_tree_sentinel.opal_tree_next_sibling = &tree->opal_tree_sentinel;
>    tree->opal_tree_sentinel.opal_tree_prev_sibling = &tree->opal_tree_sentinel;
> 
>    tree->opal_tree_sentinel.opal_tree_first_child = &tree->opal_tree_sentinel;
>    tree->opal_tree_sentinel.opal_tree_last_child = &tree->opal_tree_sentinel;
> 
>    tree->opal_tree_num_items = 0;
>    tree->comp = NULL;
>    tree->serialize = NULL;
>    tree->deserialize = NULL;
>    tree->get_key = NULL;
> }
> 
> In addition, I checked how lama handles the hierarchy inversion.
> It did not work on node04, which has the inversion, and worked on
> node09, which has the normal order. Please forward this information
> to the lama developers.
> 
> Regards,
> Tetsuya Mishima
> 
> qsub: job 8380.manage.cluster completed
> [mishima@manage openmpi-1.7.4rc2r30069]$ qsub -I -l nodes=4:ppn=8
> qsub: waiting for job 8381.manage.cluster to start
> qsub: job 8381.manage.cluster ready
> 
> [mishima@node09 ~]$ cd ~/Desktop/openmpi-1.7/demos/
> [mishima@node09 demos]$ mpirun -np 2 -report-bindings -mca rmaps lama -mca
> rmaps_lama_bind 1N -mca rmaps_lama_map Ncsbnh
> myprog
> [node09.cluster:20144] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> cket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
> [node09.cluster:20144] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket
> 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], so
> cket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
> Hello world from process 1 of 2
> Hello world from process 0 of 2
> [mishima@node09 demos]$
> 
> 
> qsub: job 8383.manage.cluster completed
> [mishima@manage openmpi-1.7.4rc2r30069]$ qsub -I -l nodes=1:ppn=32
> qsub: waiting for job 8384.manage.cluster to start
> qsub: job 8384.manage.cluster ready
> 
> [mishima@node04 ~]$ cd ~/Desktop/openmpi-1.7/demos/
> [mishima@node04 demos]$ mpirun -np 2 -report-bindings -mca rmaps lama -mca
> rmaps_lama_bind 1N -mca rmaps_lama_map Ncsbnh
> myprog
> --
> RMaps LAMA detected that there are not enough resources to map the
> remainder of the job. Check the command line options, and the number of
> nodes allocated to this job.
> Application Context : 0
> # of Processes Successfully Mapped: 0
> # of Processes Requested  : 2
> Mapping  : Ncsbnh
> Binding  : 1N
> MPPR : [Not Provided]
> Ordering : s
> --
> [node04.cluster:20298] [[21003,0],0] ORTE_ERROR_LOG: Error in file
> rmaps_lama_module.c at line 309
> 
> [node04.cluster:20298] [[21003,0],0] ORTE_ERROR_LOG: Error in file
> base/rmaps_base_map_job.c at line 217
> 
>> Deeply appreciate all your help! Your fix looks reasonable to me and is
>> the kind of difference we frequently see between compilers and
>> environments, which is why initializing variables is so
>> important. This one apparently slipped by the lama developers.
>> 
>> I'll apply to trunk and cmr it across to 1.7.4.
>> 
>> Thanks again
>> Ralph
>> 
>> On Dec 25, 2013, at 3:39 AM, tmish...@jcity.maeda.co.jp wrote:
>> 
>>> 
>>> 
>>> Hi Ralph,
>>> 
>>> I ran valgrind and found uninitialised-value errors. All of them
>>> occurred in opal_tree_add_child, as shown at the bottom. As a quick
>>> fix, I put one line into "opal_tree.c", although it's not elegant:
>>> 
>>> void opal_tree_init(opal_tree_t *tree, opal_tree_comp_fn_t comp,
>>>   opal_tree_item_serialize_fn_t serialize,
>>>   opal_tree_item_deserialize_fn_t deserialize,
>>>   opal_tree_get_key_fn_t get_key)
>>> {
>>>   tree->comp = comp;
>>>   tree->serialize = serialize;
>>>   tree->deserialize = deserialize;
>>>   tree->get_key = get_key;
>>>   opal_tree_get_root(tree)->opal_tree_num_children = 0 ; /* added by
>>> tmishima */
>>> }
>>> 
>>> Then, these errors all disappeared and openmpi with lama worked fine.
>>> As I told you before, I built 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-25 Thread tmishima


Hi Ralph,

Thank you for your reply. After that, I found a more reasonable fix,
I guess. I moved the OBJ_CONSTRUCT for opal_tree_item_t out of the
debug-only part of opal_tree_construct, as shown below:

static void opal_tree_construct(opal_tree_t *tree)
{
    OBJ_CONSTRUCT( &(tree->opal_tree_sentinel), opal_tree_item_t ); /* tmishima */
#if OPAL_ENABLE_DEBUG
    /* These refcounts should never be used in assertions because they
       should never be removed from this list, added to another list,
       etc.  So set them to sentinel values. */

    tree->opal_tree_sentinel.opal_tree_item_refcount  = 1;
    tree->opal_tree_sentinel.opal_tree_item_belong_to = tree;
#endif
    tree->opal_tree_sentinel.opal_tree_container = tree;
    tree->opal_tree_sentinel.opal_tree_parent = &tree->opal_tree_sentinel;
    tree->opal_tree_sentinel.opal_tree_num_ancestors = -1;

    tree->opal_tree_sentinel.opal_tree_next_sibling = &tree->opal_tree_sentinel;
    tree->opal_tree_sentinel.opal_tree_prev_sibling = &tree->opal_tree_sentinel;

    tree->opal_tree_sentinel.opal_tree_first_child = &tree->opal_tree_sentinel;
    tree->opal_tree_sentinel.opal_tree_last_child = &tree->opal_tree_sentinel;

    tree->opal_tree_num_items = 0;
    tree->comp = NULL;
    tree->serialize = NULL;
    tree->deserialize = NULL;
    tree->get_key = NULL;
}
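
The effect of this change is that the sentinel item's constructor now runs in every build, not only when OPAL_ENABLE_DEBUG is set, so the fields the hand-written assignments above never touch (the earlier quick fix pointed at opal_tree_num_children) are defined before opal_tree_add_child reads them. For illustration only, here is a minimal, self-contained sketch of that pattern using hypothetical types and names, not the actual OPAL class system:

#include <stdio.h>
#include <string.h>

typedef struct item {
    struct item *parent, *first_child, *next_sibling;
    int num_children;
} item_t;

typedef struct tree {
    item_t sentinel;                  /* embedded root sentinel, like opal_tree_sentinel */
    int num_items;
} tree_t;

static void item_construct(item_t *it)    /* what the item constructor would do */
{
    memset(it, 0, sizeof(*it));
}

static void tree_construct(tree_t *t)
{
    item_construct(&t->sentinel);         /* run unconditionally: this is the fix */
#ifdef ENABLE_DEBUG
    /* before the fix, the constructor call above lived inside this block,
       so non-debug builds left the sentinel's fields undefined */
#endif
    t->sentinel.parent = &t->sentinel;
    t->num_items = 0;
}

static void add_child(item_t *parent, item_t *child)
{
    if (0 == parent->num_children) {      /* well-defined only after construction */
        parent->first_child = child;
    }
    child->parent = parent;
    parent->num_children++;
}

int main(void)
{
    tree_t t;
    item_t leaf;

    item_construct(&leaf);
    tree_construct(&t);
    add_child(&t.sentinel, &leaf);
    printf("children of root: %d\n", t.sentinel.num_children);
    return 0;
}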

In addition, I checked how lama handles the hierarchy inversion.
It did not work on node04, which has the inversion, and worked on
node09, which has the normal order. Please forward this information
to the lama developers.

Regards,
Tetsuya Mishima

qsub: job 8380.manage.cluster completed
[mishima@manage openmpi-1.7.4rc2r30069]$ qsub -I -l nodes=4:ppn=8
qsub: waiting for job 8381.manage.cluster to start
qsub: job 8381.manage.cluster ready

[mishima@node09 ~]$ cd ~/Desktop/openmpi-1.7/demos/
[mishima@node09 demos]$ mpirun -np 2 -report-bindings -mca rmaps lama -mca
rmaps_lama_bind 1N -mca rmaps_lama_map Ncsbnh
 myprog
[node09.cluster:20144] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
cket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
[node09.cluster:20144] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket
1[core 5[hwt 0]], socket 1[core 6[hwt 0]], so
cket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
Hello world from process 1 of 2
Hello world from process 0 of 2
[mishima@node09 demos]$


qsub: job 8383.manage.cluster completed
[mishima@manage openmpi-1.7.4rc2r30069]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8384.manage.cluster to start
qsub: job 8384.manage.cluster ready

[mishima@node04 ~]$ cd ~/Desktop/openmpi-1.7/demos/
[mishima@node04 demos]$ mpirun -np 2 -report-bindings -mca rmaps lama -mca
rmaps_lama_bind 1N -mca rmaps_lama_map Ncsbnh
 myprog
--
RMaps LAMA detected that there are not enough resources to map the
remainder of the job. Check the command line options, and the number of
nodes allocated to this job.
 Application Context : 0
 # of Processes Successfully Mapped: 0
 # of Processes Requested  : 2
 Mapping  : Ncsbnh
 Binding  : 1N
 MPPR : [Not Provided]
 Ordering : s
--
[node04.cluster:20298] [[21003,0],0] ORTE_ERROR_LOG: Error in file
rmaps_lama_module.c at line 309

[node04.cluster:20298] [[21003,0],0] ORTE_ERROR_LOG: Error in file
base/rmaps_base_map_job.c at line 217

> Deeply appreciate all your help! Your fix looks reasonable to me and is
the kind of difference we frequently see between compilers and
environments, which is why initializing variables is so
> important. This one apparently slipped by the lama developers.
>
> I'll apply to trunk and cmr it across to 1.7.4.
>
> Thanks again
> Ralph
>
> On Dec 25, 2013, at 3:39 AM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Hi Ralph,
> >
> > I ran valgrind and found uninitialised-value errors. All of them
> > occurred in opal_tree_add_child, as shown at the bottom. As a quick
> > fix, I put one line into "opal_tree.c", although it's not elegant:
> >
> > void opal_tree_init(opal_tree_t *tree, opal_tree_comp_fn_t comp,
> >opal_tree_item_serialize_fn_t serialize,
> >opal_tree_item_deserialize_fn_t deserialize,
> >opal_tree_get_key_fn_t get_key)
> > {
> >tree->comp = comp;
> >tree->serialize = serialize;
> >tree->deserialize = deserialize;
> >tree->get_key = get_key;
> >opal_tree_get_root(tree)->opal_tree_num_children = 0 ; /* added by
> > tmishima */
> > }
> >
> > Then, these errors all disappeared and openmpi with lama worked fine.
> > As I told you before, I built openmpi with PGI 13.10. As far as I
> > checked, no error was detected by valgrind with openmpi built by
> > GNU compiler. Therefore, it might depend on compiler...
> > Anyway, I would like to ask you (or openmpi team) to continue
> > further investigation.
> >
> > Regards,
> > Tetsuya Mishima
> >
> > valgrind -v 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-25 Thread Ralph Castain
Deeply appreciate all your help! Your fix looks reasonable to me and is the kind 
of difference we frequently see between compilers and environments, which is 
why initializing variables is so important. This one apparently slipped by the 
lama developers.

I'll apply to trunk and cmr it across to 1.7.4.

Thanks again
Ralph

On Dec 25, 2013, at 3:39 AM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi Ralph,
> 
> I ran valgrind and found uninitialised-value errors. All of them
> occurred in opal_tree_add_child, as shown at the bottom. As a quick
> fix, I put one line into "opal_tree.c", although it's not elegant:
> 
> void opal_tree_init(opal_tree_t *tree, opal_tree_comp_fn_t comp,
>opal_tree_item_serialize_fn_t serialize,
>opal_tree_item_deserialize_fn_t deserialize,
>opal_tree_get_key_fn_t get_key)
> {
>tree->comp = comp;
>tree->serialize = serialize;
>tree->deserialize = deserialize;
>tree->get_key = get_key;
>opal_tree_get_root(tree)->opal_tree_num_children = 0 ; /* added by
> tmishima */
> }
> 
> Then, these errors all disappeared and openmpi with lama worked fine.
> As I told you before, I built openmpi with PGI 13.10. As far as I
> checked, no error was detected by valgrind with openmpi built by
> GNU compiler. Therefore, it might depend on compiler...
> Anyway, I would like to ask you (or openmpi team) to continue
> further investigation.
> 
> Regards,
> Tetsuya Mishima
> 
> valgrind -v --error-limit=no --leak-check=yes --show-reachable=no mpirun
> -np 1 -mca rmaps lama -report-bindings -mca rmaps_base_verbose 100
> --display-map ~/Desktop/openmpi-1.7/demos/myprog 2>&1 | tee valgrind.log
> 
> 
> ==27313== Conditional jump or move depends on uninitialised value(s)
> ==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
> ==27313==by 0x81E3314: rmaps_lama_convert_hwloc_subtree
> (rmaps_lama_max_tree.c:320)
> ==27313==by 0x81E321D: rmaps_lama_convert_hwloc_tree_to_opal_tree
> (rmaps_lama_max_tree.c:267)
> ==27313==by 0x81E2EE8: rmaps_lama_build_max_tree
> (rmaps_lama_max_tree.c:154)
> ==27313==by 0x81E0E58: orte_rmaps_lama_map_core
> (rmaps_lama_module.c:664)
> ==27313==by 0x81E02D7: orte_rmaps_lama_map (rmaps_lama_module.c:303)
> ==27313==by 0x4C6468B: orte_rmaps_base_map_job
> (rmaps_base_map_job.c:204)
> ==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
> ==27313==by 0x4F090D8: event_process_active (event.c:1434)
> ==27313==by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
> ==27313==by 0x4079A6: orterun (orterun.c:1049)
> ==27313==by 0x40694A: main (main.c:13)
> .
> ==27313== Conditional jump or move depends on uninitialised value(s)
> ==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
> ==27313==by 0x4EC5D0E: deserialize_add_tree_item (opal_tree.c:496)
> ==27313==by 0x4EC5578: opal_tree_deserialize (opal_tree.c:524)
> ==27313==by 0x4EC5609: opal_tree_dup (opal_tree.c:544)
> ==27313==by 0x81E2FF6: rmaps_lama_build_max_tree
> (rmaps_lama_max_tree.c:202)
> ==27313==by 0x81E0E58: orte_rmaps_lama_map_core
> (rmaps_lama_module.c:664)
> ==27313==by 0x81E02D7: orte_rmaps_lama_map (rmaps_lama_module.c:303)
> ==27313==by 0x4C6468B: orte_rmaps_base_map_job
> (rmaps_base_map_job.c:204)
> ==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
> ==27313==by 0x4F090D8: event_process_active (event.c:1434)
> ==27313==by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
> ==27313==by 0x4079A6: orterun (orterun.c:1049)
> 
> ==27313== Conditional jump or move depends on uninitialised value(s)
> ==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
> ==27313==by 0x4EC5D0E: deserialize_add_tree_item (opal_tree.c:496)
> ==27313==by 0x4EC5578: opal_tree_deserialize (opal_tree.c:524)
> ==27313==by 0x4EC5609: opal_tree_dup (opal_tree.c:544)
> ==27313==by 0x81E2FF6: ???
> ==27313==by 0x81E0E58: ???
> ==27313==by 0x81E02D7: ???
> ==27313==by 0x4C6468B: orte_rmaps_base_map_job
> (rmaps_base_map_job.c:204)
> ==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
> ==27313==by 0x4F090D8: event_process_active (event.c:1434)
> ==27313==by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
> ==27313==by 0x4079A6: orterun (orterun.c:1049)
> .
> ==27313== Conditional jump or move depends on uninitialised value(s)
> ==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
> ==27313==by 0x81E3314: ???
> ==27313==by 0x81E321D: ???
> ==27313==by 0x81E2EE8: ???
> ==27313==by 0x81E0E58: ???
> ==27313==by 0x81E02D7: ???
> ==27313==by 0x4C6468B: orte_rmaps_base_map_job
> (rmaps_base_map_job.c:204)
> ==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
> ==27313==by 0x4F090D8: event_process_active (event.c:1434)
> ==27313==by 0x4F050FF: 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-25 Thread tmishima


Hi Ralph,

I ran valgrind and found uninitialised-value errors. All of them
occurred in opal_tree_add_child, as shown at the bottom. As a quick
fix, I put one line into "opal_tree.c", although it's not elegant:

void opal_tree_init(opal_tree_t *tree, opal_tree_comp_fn_t comp,
                    opal_tree_item_serialize_fn_t serialize,
                    opal_tree_item_deserialize_fn_t deserialize,
                    opal_tree_get_key_fn_t get_key)
{
    tree->comp = comp;
    tree->serialize = serialize;
    tree->deserialize = deserialize;
    tree->get_key = get_key;
    opal_tree_get_root(tree)->opal_tree_num_children = 0; /* added by tmishima */
}

Then these errors all disappeared, and Open MPI with lama worked fine.
As I told you before, I built Open MPI with PGI 13.10. As far as I
checked, no error was detected by valgrind with Open MPI built by the
GNU compiler, so it might depend on the compiler...
Anyway, I would like to ask you (or the Open MPI team) to continue
the investigation further.

Regards,
Tetsuya Mishima
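
For reference, the valgrind log that follows reports "Conditional jump or move depends on uninitialised value(s)" inside opal_tree_add_child. The tiny stand-alone program below is hypothetical and unrelated to the Open MPI sources; it just triggers the same class of report, i.e. a branch whose outcome depends on memory nobody ever wrote, which is what the quick fix above removes for the root sentinel:

#include <stdio.h>
#include <stdlib.h>

struct node {
    int num_children;              /* never initialised below */
};

int main(void)
{
    struct node *n = malloc(sizeof(*n));

    if (n->num_children == 0)      /* valgrind: conditional jump depends on
                                      uninitialised value(s) */
        printf("looks empty\n");

    free(n);
    return 0;
}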

valgrind -v --error-limit=no --leak-check=yes --show-reachable=no mpirun
-np 1 -mca rmaps lama -report-bindings -mca rmaps_base_verbose 100
--display-map ~/Desktop/openmpi-1.7/demos/myprog 2>&1 | tee valgrind.log


==27313== Conditional jump or move depends on uninitialised value(s)
==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
==27313==by 0x81E3314: rmaps_lama_convert_hwloc_subtree
(rmaps_lama_max_tree.c:320)
==27313==by 0x81E321D: rmaps_lama_convert_hwloc_tree_to_opal_tree
(rmaps_lama_max_tree.c:267)
==27313==by 0x81E2EE8: rmaps_lama_build_max_tree
(rmaps_lama_max_tree.c:154)
==27313==by 0x81E0E58: orte_rmaps_lama_map_core
(rmaps_lama_module.c:664)
==27313==by 0x81E02D7: orte_rmaps_lama_map (rmaps_lama_module.c:303)
==27313==by 0x4C6468B: orte_rmaps_base_map_job
(rmaps_base_map_job.c:204)
==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
==27313==by 0x4F090D8: event_process_active (event.c:1434)
==27313==by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
==27313==by 0x4079A6: orterun (orterun.c:1049)
==27313==by 0x40694A: main (main.c:13)
.
==27313== Conditional jump or move depends on uninitialised value(s)
==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
==27313==by 0x4EC5D0E: deserialize_add_tree_item (opal_tree.c:496)
==27313==by 0x4EC5578: opal_tree_deserialize (opal_tree.c:524)
==27313==by 0x4EC5609: opal_tree_dup (opal_tree.c:544)
==27313==by 0x81E2FF6: rmaps_lama_build_max_tree
(rmaps_lama_max_tree.c:202)
==27313==by 0x81E0E58: orte_rmaps_lama_map_core
(rmaps_lama_module.c:664)
==27313==by 0x81E02D7: orte_rmaps_lama_map (rmaps_lama_module.c:303)
==27313==by 0x4C6468B: orte_rmaps_base_map_job
(rmaps_base_map_job.c:204)
==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
==27313==by 0x4F090D8: event_process_active (event.c:1434)
==27313==by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
==27313==by 0x4079A6: orterun (orterun.c:1049)

==27313== Conditional jump or move depends on uninitialised value(s)
==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
==27313==by 0x4EC5D0E: deserialize_add_tree_item (opal_tree.c:496)
==27313==by 0x4EC5578: opal_tree_deserialize (opal_tree.c:524)
==27313==by 0x4EC5609: opal_tree_dup (opal_tree.c:544)
==27313==by 0x81E2FF6: ???
==27313==by 0x81E0E58: ???
==27313==by 0x81E02D7: ???
==27313==by 0x4C6468B: orte_rmaps_base_map_job
(rmaps_base_map_job.c:204)
==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
==27313==by 0x4F090D8: event_process_active (event.c:1434)
==27313==by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
==27313==by 0x4079A6: orterun (orterun.c:1049)
.
==27313== Conditional jump or move depends on uninitialised value(s)
==27313==at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
==27313==by 0x81E3314: ???
==27313==by 0x81E321D: ???
==27313==by 0x81E2EE8: ???
==27313==by 0x81E0E58: ???
==27313==by 0x81E02D7: ???
==27313==by 0x4C6468B: orte_rmaps_base_map_job
(rmaps_base_map_job.c:204)
==27313==by 0x4F094CC: event_process_active_single_queue (event.c:1366)
==27313==by 0x4F090D8: event_process_active (event.c:1434)
==27313==by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
==27313==by 0x4079A6: orterun (orterun.c:1049)
==27313==by 0x40694A: main (main.c:13)



> Hi Ralph,
>
> Here is the output when I add "-mca rmaps_base_verbose 10 --display-map",
> and where it stopped (from gdb), which shows it stopped inside a lama
> function.
>
> I usually use PGI 13.10, so I tried switching to the GNU compiler.
> Then it works, so this problem depends on the compiler.
>
> That's all I could find today.
>
> Regards,
> Tetsuya Mishima
>
> [mishima@manage ~]$ gdb
> GNU gdb (GDB) CentOS 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-23 Thread tmishima


Hi Ralph,

Here is the output when I add "-mca rmaps_base_verbose 10 --display-map",
and where it stopped (from gdb), which shows it stopped inside a lama function.

I usually use PGI 13.10, so I tried switching to the GNU compiler.
Then it works, so this problem depends on the compiler.

That's all I could find today.

Regards,
Tetsuya Mishima

[mishima@manage ~]$ gdb
GNU gdb (GDB) CentOS (7.0.1-42.el5.centos.1)

(gdb) attach 14666

0x2b4c5c33 in rmaps_lama_prune_max_tree ()
at ./rmaps_lama_max_tree.c:814

[mishima@manage demos]$ mpirun -np 2 -mca rmaps lama -report-bindings -mca
rmaps_base_verbose 10 --display-map myprog
[manage.cluster:21503] mca: base: components_register: registering rmaps
components
[manage.cluster:21503] mca: base: components_register: found loaded
component lama
[manage.cluster:21503] mca:rmaps:lama: Priority   0
[manage.cluster:21503] mca:rmaps:lama: Map   : NULL
[manage.cluster:21503] mca:rmaps:lama: Bind  : NULL
[manage.cluster:21503] mca:rmaps:lama: MPPR  : NULL
[manage.cluster:21503] mca:rmaps:lama: Order : NULL
[manage.cluster:21503] mca: base: components_register: component lama
register function successful
[manage.cluster:21503] mca: base: components_open: opening rmaps components
[manage.cluster:21503] mca: base: components_open: found loaded component
lama
[manage.cluster:21503] mca:rmaps:select: checking available component lama
[manage.cluster:21503] mca:rmaps:select: Querying component [lama]
[manage.cluster:21503] [[23940,0],0]: Final mapper priorities
[manage.cluster:21503]  Mapper: lama Priority: 0
[manage.cluster:21503] mca:rmaps: mapping job [23940,1]
[manage.cluster:21503] mca:rmaps: creating new map for job [23940,1]
[manage.cluster:21503] mca:rmaps: nprocs 2
[manage.cluster:21503] mca:rmaps:lama: Mapping job [23940,1]
[manage.cluster:21503] mca:rmaps:lama: Revised Parameters -
[manage.cluster:21503] mca:rmaps:lama: Map   : csbnh
[manage.cluster:21503] mca:rmaps:lama: Bind  : 1c
[manage.cluster:21503] mca:rmaps:lama: MPPR  : (null)
[manage.cluster:21503] mca:rmaps:lama: Order : s
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Binding  : [1c]
[manage.cluster:21503] mca:rmaps:lama: - Binding  :1 x   Core
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : [csbnh]
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (0)   Core (7
vs 0)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (1) Socket (3
vs 1)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (2)  Board (1
vs 3)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (3)Machine (0
vs 7)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (4) Hw. Thread (8
vs 8)
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - MPPR : [(null)]
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Ordering : [s]
[manage.cluster:21503] mca:rmaps:lama: - Ordering : Sequential
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] AVAILABLE NODES FOR MAPPING:
[manage.cluster:21503] node: manage daemon: 0
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Building the Max Tree...
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Converting Remote Tree: manage

[mishima@manage demos]$ ompi_info | grep "C compiler family"
  C compiler family name: GNU
[mishima@manage demos]$ mpirun -np 2 -mca rmaps lama myprog
Hello world from process 0 of 2
Hello world from process 1 of 2



> On Dec 21, 2013, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Ralph, thanks. I'll try it on Tuesday.
> >
> > Let me confirm one thing. I don't put "-with-libevent" when I build
> > openmpi.
> > Is there any possibility to build with external libevent automatically?
>
> No - only happens if you direct it
>
>
> >
> > Tetsuya Mishima
> >
> >
> >> Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to
> > your cmd line and let's see if it finishes the mapping.
> >>
> >> Unless you specifically built with an external libevent (which I
doubt),
> > there is no conflict. The connection issue is unlikely to be a factor
here
> > as it works when not using the lama mapper.
> >>
> >>
> >> On Dec 21, 2013, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>>
> >>>
> >>> Thank you, Ralph.
> >>>
> >>> Then, this problem should depend on our environment.
> >>> But, at least, inversion problem is not the cause because
> >>> node05 has normal hier order.
> >>>
> >>> I can not connect to our cluster now. Tuesday, going
> >>> back to my office, I'll send you further report.
> >>>
> >>> Before that, please let 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-22 Thread Ralph Castain

On Dec 21, 2013, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Ralph, thanks. I'll try it on Tuesday.
> 
> Let me confirm one thing. I don't put "-with-libevent" when I build
> openmpi.
> Is there any possibility to build with external libevent automatically?

No - only happens if you direct it


> 
> Tetsuya Mishima
> 
> 
>> Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to
> your cmd line and let's see if it finishes the mapping.
>> 
>> Unless you specifically built with an external libevent (which I doubt),
> there is no conflict. The connection issue is unlikely to be a factor here
> as it works when not using the lama mapper.
>> 
>> 
>> On Dec 21, 2013, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:
>> 
>>> 
>>> 
>>> Thank you, Ralph.
>>> 
>>> Then, this problem should depend on our environment.
>>> But, at least, inversion problem is not the cause because
>>> node05 has normal hier order.
>>> 
>>> I can not connect to our cluster now. Tuesday, going
>>> back to my office, I'll send you further report.
>>> 
>>> Before that, please let me know your configuration. I will
> >>> follow your configuration as much as possible. Our configuration
>>> is very simple, only -with-tm -with-ibverbs -disable-ipv6.
>>> (on CentOS 5.8)
>>> 
> >>> The 1.7 series is still a little bit unstable on our cluster.
>>> 
>>> Similar freezing(hang up) was observed with 1.7.3. At that
>>> time, lama worked well but putting "-rank-by something" caused
>>> same freezing (curiously, rank-by works with 1.7.4rc1).
>>> I checked where it stopped using gdb, then I found that it
>>> stopped to wait for event in a function of libevent(I can not
>>> recall the name).
>>> 
>>> Is this related to your "connection issue in the OOB
>>> subsystem"? Or libevent version conflict? I guess these two
>>> problems are related each other. They stopped at very early
>>> stage before reaching mapping function because no message
>>> appeared before freezing, which is my random guess.
>>> 
>>> Could you give me any hint or comment?
>>> 
>>> Regards,
>>> Tetsuya Mishima
>>> 
>>> 
 It seems to be working fine for me:
 
 [rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca
>>> rmaps_lama_bind 1c -mca rmaps lama hostname
 bend001
 [bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
>>> [../BB/../../../..][../../../../../..]
 [bend001:17005] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>>> [BB/../../../../..][../../../../../..]
 bend001
 [rhc@bend001 tcp]$
 
 (I also checked the internals using "-mca rmaps_base_verbose 10") so
> it
>>> could be your hier inversion causing problems again. Or it could be
> that
>>> you are hitting a connection issue we are seeing in
 some scenarios in the OOB subsystem - though if you are able to run
> using
>>> a non-lama mapper, that would seem unlikely.
 
 
 On Dec 20, 2013, at 8:09 PM, tmish...@jcity.maeda.co.jp wrote:
 
 
 
 Hi Ralph,
 
 Thank you very much. I tried many things such as:
 
 mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
 rmaps_lama_bind 1c myprog
 
 But every try failed. At least they were accepted by openmpi-1.7.3 as
> far
 as I remember.
 Anyway, please check it when you have a time, because using lama comes
>>> from
 my curiosity.
 
 Regards,
 Tetsuya Mishima
 
 
 I'll try to take a look at it - my expectation is that lama might get
 stuck because you didn't tell it a pattern to map, and I doubt that
> code
 path has seen much testing.
 
 
 On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
 
 
 
 Hi Ralph, I'm glad to hear that, thanks.
 
 By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA
 nodes.
 
 Then, even with this simple command line, it froze without any
 message:
 
 mpirun -np 2 -host node05 -mca rmaps lama myprog
 
 Could you check what happened?
 
 Is it better to open a new thread or continue this thread?
 
 Regards,
 Tetsuya Mishima
 
 
 I'll make it work so that NUMA can be either above or below socket
 
 On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
 
 
 
 Hi Brice,
 
 Thank you for your comment. I understand what you mean.
 
 My opinion was made just considering easy way to adjust the code for
 inversion of hierarchy in object tree.
 
 Tetsuya Mishima
 
 
 I don't think there's any such difference.
 Also, all these NUMA architectures are reported the same by hwloc,
 and
 therefore used the same in Open MPI.
 
 And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
 (and
 most recent AMD and Intel platforms).
 
 Brice
 
 
 
 Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
 
 Hi Ralph,
 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-21 Thread tmishima


Ralph, thanks. I'll try it on Tuesday.

Let me confirm one thing: I don't pass "-with-libevent" when I build
openmpi.
Is there any possibility that it builds with an external libevent automatically?

Tetsuya Mishima


> Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to
your cmd line and let's see if it finishes the mapping.
>
> Unless you specifically built with an external libevent (which I doubt),
there is no conflict. The connection issue is unlikely to be a factor here
as it works when not using the lama mapper.
>
>
> On Dec 21, 2013, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Thank you, Ralph.
> >
> > Then, this problem should depend on our environment.
> > But, at least, inversion problem is not the cause because
> > node05 has normal hier order.
> >
> > I can not connect to our cluster now. Tuesday, going
> > back to my office, I'll send you further report.
> >
> > Before that, please let me know your configuration. I will
> > follow your configuration as much as possible. Our configuration
> > is very simple, only -with-tm -with-ibverbs -disable-ipv6.
> > (on CentOS 5.8)
> >
> > The 1.7 series is still a little bit unstable on our cluster.
> >
> > Similar freezing(hang up) was observed with 1.7.3. At that
> > time, lama worked well but putting "-rank-by something" caused
> > same freezing (curiously, rank-by works with 1.7.4rc1).
> > I checked where it stopped using gdb, then I found that it
> > stopped to wait for event in a function of libevent(I can not
> > recall the name).
> >
> > Is this related to your "connection issue in the OOB
> > subsystem"? Or libevent version conflict? I guess these two
> > problems are related each other. They stopped at very early
> > stage before reaching mapping function because no message
> > appeared before freezing, which is my random guess.
> >
> > Could you give me any hint or comment?
> >
> > Regards,
> > Tetsuya Mishima
> >
> >
> >> It seems to be working fine for me:
> >>
> >> [rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca
> > rmaps_lama_bind 1c -mca rmaps lama hostname
> >> bend001
> >> [bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
> > [../BB/../../../..][../../../../../..]
> >> [bend001:17005] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
> > [BB/../../../../..][../../../../../..]
> >> bend001
> >> [rhc@bend001 tcp]$
> >>
> >> (I also checked the internals using "-mca rmaps_base_verbose 10") so
it
> > could be your hier inversion causing problems again. Or it could be
that
> > you are hitting a connection issue we are seeing in
> >> some scenarios in the OOB subsystem - though if you are able to run
using
> > a non-lama mapper, that would seem unlikely.
> >>
> >>
> >> On Dec 20, 2013, at 8:09 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>
> >>
> >> Hi Ralph,
> >>
> >> Thank you very much. I tried many things such as:
> >>
> >> mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
> >> rmaps_lama_bind 1c myprog
> >>
> >> But every try failed. At least they were accepted by openmpi-1.7.3 as
far
> >> as I remember.
> >> Anyway, please check it when you have a time, because using lama comes
> > from
> >> my curiosity.
> >>
> >> Regards,
> >> Tetsuya Mishima
> >>
> >>
> >> I'll try to take a look at it - my expectation is that lama might get
> >> stuck because you didn't tell it a pattern to map, and I doubt that
code
> >> path has seen much testing.
> >>
> >>
> >> On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>
> >>
> >> Hi Ralph, I'm glad to hear that, thanks.
> >>
> >> By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA
> >> nodes.
> >>
> >> Then, even with this simple command line, it froze without any
> >> message:
> >>
> >> mpirun -np 2 -host node05 -mca rmaps lama myprog
> >>
> >> Could you check what happened?
> >>
> >> Is it better to open a new thread or continue this thread?
> >>
> >> Regards,
> >> Tetsuya Mishima
> >>
> >>
> >> I'll make it work so that NUMA can be either above or below socket
> >>
> >> On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>
> >>
> >> Hi Brice,
> >>
> >> Thank you for your comment. I understand what you mean.
> >>
> >> My opinion was made just considering easy way to adjust the code for
> >> inversion of hierarchy in object tree.
> >>
> >> Tetsuya Mishima
> >>
> >>
> >> I don't think there's any such difference.
> >> Also, all these NUMA architectures are reported the same by hwloc,
> >> and
> >> therefore used the same in Open MPI.
> >>
> >> And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
> >> (and
> >> most recent AMD and Intel platforms).
> >>
> >> Brice
> >>
> >>
> >>
> >> Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
> >>
> >> Hi Ralph,
> >>
> >> The numa-node in AMD Magny-Cours/Interlagos is so-called cc(cache
> >> coherent)NUMA,
> >> which seems to be a little bit different from the traditional numa
> >> defined
> >> in openmpi.
> >>
> >> I 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-21 Thread Ralph Castain
Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to your cmd 
line and let's see if it finishes the mapping.

Unless you specifically built with an external libevent (which I doubt), there 
is no conflict. The connection issue is unlikely to be a factor here as it 
works when not using the lama mapper.


On Dec 21, 2013, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Thank you, Ralph.
> 
> Then, this problem should depend on our environment.
> But, at least, inversion problem is not the cause because
> node05 has normal hier order.
> 
> I can not connect to our cluster now. Tuesday, going
> back to my office, I'll send you further report.
> 
> Before that, please let me know your configuration. I will
> follow your configuration as much as possible. Our configuration
> is very simple, only -with-tm -with-ibverbs -disable-ipv6.
> (on CentOS 5.8)
> 
> The 1.7 series is still a little bit unstable on our cluster.
> 
> Similar freezing(hang up) was observed with 1.7.3. At that
> time, lama worked well but putting "-rank-by something" caused
> same freezing (curiously, rank-by works with 1.7.4rc1).
> I checked where it stopped using gdb, then I found that it
> stopped to wait for event in a function of libevent(I can not
> recall the name).
> 
> Is this related to your "connection issue in the OOB
> subsystem"? Or libevent version conflict? I guess these two
> problems are related each other. They stopped at very early
> stage before reaching mapping function because no message
> appeared before freezing, which is my random guess.
> 
> Could you give me any hint or comment?
> 
> Regards,
> Tetsuya Mishima
> 
> 
>> It seems to be working fine for me:
>> 
>> [rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca
> rmaps_lama_bind 1c -mca rmaps lama hostname
>> bend001
>> [bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
> [../BB/../../../..][../../../../../..]
>> [bend001:17005] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
> [BB/../../../../..][../../../../../..]
>> bend001
>> [rhc@bend001 tcp]$
>> 
>> (I also checked the internals using "-mca rmaps_base_verbose 10") so it
> could be your hier inversion causing problems again. Or it could be that
> you are hitting a connection issue we are seeing in
>> some scenarios in the OOB subsystem - though if you are able to run using
> a non-lama mapper, that would seem unlikely.
>> 
>> 
>> On Dec 20, 2013, at 8:09 PM, tmish...@jcity.maeda.co.jp wrote:
>> 
>> 
>> 
>> Hi Ralph,
>> 
>> Thank you very much. I tried many things such as:
>> 
>> mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
>> rmaps_lama_bind 1c myprog
>> 
>> But every try failed. At least they were accepted by openmpi-1.7.3 as far
>> as I remember.
>> Anyway, please check it when you have a time, because using lama comes
> from
>> my curiosity.
>> 
>> Regards,
>> Tetsuya Mishima
>> 
>> 
>> I'll try to take a look at it - my expectation is that lama might get
>> stuck because you didn't tell it a pattern to map, and I doubt that code
>> path has seen much testing.
>> 
>> 
>> On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
>> 
>> 
>> 
>> Hi Ralph, I'm glad to hear that, thanks.
>> 
>> By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA
>> nodes.
>> 
>> Then, even with this simple command line, it froze without any
>> message:
>> 
>> mpirun -np 2 -host node05 -mca rmaps lama myprog
>> 
>> Could you check what happened?
>> 
>> Is it better to open a new thread or continue this thread?
>> 
>> Regards,
>> Tetsuya Mishima
>> 
>> 
>> I'll make it work so that NUMA can be either above or below socket
>> 
>> On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
>> 
>> 
>> 
>> Hi Brice,
>> 
>> Thank you for your comment. I understand what you mean.
>> 
>> My opinion was made just considering easy way to adjust the code for
>> inversion of hierarchy in object tree.
>> 
>> Tetsuya Mishima
>> 
>> 
>> I don't think there's any such difference.
>> Also, all these NUMA architectures are reported the same by hwloc,
>> and
>> therefore used the same in Open MPI.
>> 
>> And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
>> (and
>> most recent AMD and Intel platforms).
>> 
>> Brice
>> 
>> 
>> 
>> Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
>> 
>> Hi Ralph,
>> 
>> The numa-node in AMD Magny-Cours/Interlagos is so-called cc(cache
>> coherent)NUMA,
>> which seems to be a little bit different from the traditional numa
>> defined
>> in openmpi.
>> 
>> I notice that ccNUMA object is almost same as L3cache object.
>> So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want
>> to
>> do.
>> Therefore, "do not touch it" is one of the solution, I think ...
>> 
>> Anyway, mixing up these two types of numa is the problem.
>> 
>> Regards,
>> Tetsuya Mishima
>> 
>> I can wait it'll be fixed in 1.7.5 or later, because putting
>> "-bind-to
>> numa"
>> and "-map-by numa" at the same time works as a 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-21 Thread tmishima


Thank you, Ralph.

Then, this problem must depend on our environment.
But at least the inversion problem is not the cause, because
node05 has the normal hierarchy order.

I cannot connect to our cluster now. On Tuesday, when I'm back
in my office, I'll send you a further report.

Before that, please let me know your configuration. I will
follow your configuration as much as possible. Our configuration
is very simple, only -with-tm -with-ibverbs -disable-ipv6
(on CentOS 5.8).

The 1.7 series is still a little bit unstable on our cluster.

Similar freezing (a hang-up) was observed with 1.7.3. At that
time, lama worked well, but adding "-rank-by something" caused the
same freezing (curiously, rank-by works with 1.7.4rc1).
I checked where it stopped using gdb and found that it was
waiting for an event in a libevent function (I cannot
recall the name).

Is this related to your "connection issue in the OOB
subsystem"? Or a libevent version conflict? I guess these two
problems are related to each other. They stop at a very early
stage, before reaching the mapping function, because no message
appears before the freeze; that is just my rough guess.

Could you give me any hint or comment?

Regards,
Tetsuya Mishima


> It seems to be working fine for me:
>
> [rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca
rmaps_lama_bind 1c -mca rmaps lama hostname
> bend001
> [bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
[../BB/../../../..][../../../../../..]
> [bend001:17005] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
[BB/../../../../..][../../../../../..]
> bend001
> [rhc@bend001 tcp]$
>
> (I also checked the internals using "-mca rmaps_base_verbose 10") so it
could be your hier inversion causing problems again. Or it could be that
you are hitting a connection issue we are seeing in
> some scenarios in the OOB subsystem - though if you are able to run using
a non-lama mapper, that would seem unlikely.
>
>
> On Dec 20, 2013, at 8:09 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
>
> Hi Ralph,
>
> Thank you very much. I tried many things such as:
>
> mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
> rmaps_lama_bind 1c myprog
>
> But every try failed. At least they were accepted by openmpi-1.7.3 as far
> as I remember.
> Anyway, please check it when you have a time, because using lama comes
from
> my curiosity.
>
> Regards,
> Tetsuya Mishima
>
>
> I'll try to take a look at it - my expectation is that lama might get
> stuck because you didn't tell it a pattern to map, and I doubt that code
> path has seen much testing.
>
>
> On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
>
> Hi Ralph, I'm glad to hear that, thanks.
>
> By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA
> nodes.
>
> Then, even with this simple command line, it froze without any
> message:
>
> mpirun -np 2 -host node05 -mca rmaps lama myprog
>
> Could you check what happened?
>
> Is it better to open a new thread or continue this thread?
>
> Regards,
> Tetsuya Mishima
>
>
> I'll make it work so that NUMA can be either above or below socket
>
> On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
>
>
>
> Hi Brice,
>
> Thank you for your comment. I understand what you mean.
>
> My opinion was made just considering easy way to adjust the code for
> inversion of hierarchy in object tree.
>
> Tetsuya Mishima
>
>
> I don't think there's any such difference.
> Also, all these NUMA architectures are reported the same by hwloc,
> and
> therefore used the same in Open MPI.
>
> And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
> (and
> most recent AMD and Intel platforms).
>
> Brice
>
>
>
> Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
>
> Hi Ralph,
>
> The numa-node in AMD Magny-Cours/Interlagos is so-called cc(cache
> coherent)NUMA,
> which seems to be a little bit different from the traditional numa
> defined
> in openmpi.
>
> I notice that ccNUMA object is almost same as L3cache object.
> So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want
> to
> do.
> Therefore, "do not touch it" is one of the solution, I think ...
>
> Anyway, mixing up these two types of numa is the problem.
>
> Regards,
> Tetsuya Mishima
>
> I can wait it'll be fixed in 1.7.5 or later, because putting
> "-bind-to
> numa"
> and "-map-by numa" at the same time works as a workaround.
>
> Thanks,
> Tetsuya Mishima
>
> Yeah, it will impact everything that uses hwloc topology maps, I
> fear.
>
> One side note: you'll need to add --hetero-nodes to your cmd
> line.
> If
> we
> don't see that, we assume that all the node topologies are
> identical
> -
> which clearly isn't true here.
> I'll try to resolve the hier inversion over the holiday - won't
> be
> for
> 1.7.4, but hopefully for 1.7.5
> Thanks
> Ralph
>
> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> I think it's normal for AMD opteron having 8/16 cores such as
> Magny-Cours or Interlagos. Because it usually has 2 numa 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-21 Thread Ralph Castain
It seems to be working fine for me:

[rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca 
rmaps_lama_bind 1c -mca rmaps lama hostname
bend001
[bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: 
[../BB/../../../..][../../../../../..]
[bend001:17005] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: 
[BB/../../../../..][../../../../../..]
bend001
[rhc@bend001 tcp]$ 

(I also checked the internals using "-mca rmaps_base_verbose 10") so it could 
be your hier inversion causing problems again. Or it could be that you are 
hitting a connection issue we are seeing in some scenarios in the OOB subsystem 
- though if you are able to run using a non-lama mapper, that would seem 
unlikely.


On Dec 20, 2013, at 8:09 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi Ralph,
> 
> Thank you very much. I tried many things such as:
> 
> mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
> rmaps_lama_bind 1c myprog
> 
> But every try failed. At least they were accepted by openmpi-1.7.3 as far
> as I remember.
> Anyway, please check it when you have a time, because using lama comes from
> my curiosity.
> 
> Regards,
> Tetsuya Mishima
> 
> 
>> I'll try to take a look at it - my expectation is that lama might get
> stuck because you didn't tell it a pattern to map, and I doubt that code
> path has seen much testing.
>> 
>> 
>> On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
>> 
>>> 
>>> 
>>> Hi Ralph, I'm glad to hear that, thanks.
>>> 
>>> By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA
>>> nodes.
>>> 
>>> Then, even with this simple command line, it froze without any
>>> message:
>>> 
>>> mpirun -np 2 -host node05 -mca rmaps lama myprog
>>> 
>>> Could you check what happened?
>>> 
>>> Is it better to open a new thread or continue this thread?
>>> 
>>> Regards,
>>> Tetsuya Mishima
>>> 
>>> 
 I'll make it work so that NUMA can be either above or below socket
 
 On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
 
> 
> 
> Hi Brice,
> 
> Thank you for your comment. I understand what you mean.
> 
> My opinion was made just considering easy way to adjust the code for
> inversion of hierarchy in object tree.
> 
> Tetsuya Mishima
> 
> 
>> I don't think there's any such difference.
>> Also, all these NUMA architectures are reported the same by hwloc,
> and
>> therefore used the same in Open MPI.
>> 
>> And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
>>> (and
>> most recent AMD and Intel platforms).
>> 
>> Brice
>> 
>> 
>> 
>> Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
>>> 
>>> Hi Ralph,
>>> 
>>> The numa-node in AMD Magny-Cours/Interlagos is so-called cc(cache
>>> coherent)NUMA,
>>> which seems to be a little bit different from the traditional numa
> defined
>>> in openmpi.
>>> 
>>> I notice that ccNUMA object is almost same as L3cache object.
>>> So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want
>>> to
> do.
>>> Therefore, "do not touch it" is one of the solution, I think ...
>>> 
>>> Anyway, mixing up these two types of numa is the problem.
>>> 
>>> Regards,
>>> Tetsuya Mishima
>>> 
 I can wait it'll be fixed in 1.7.5 or later, because putting
>>> "-bind-to
 numa"
 and "-map-by numa" at the same time works as a workaround.
 
 Thanks,
 Tetsuya Mishima
 
> Yeah, it will impact everything that uses hwloc topology maps, I
> fear.
> 
> One side note: you'll need to add --hetero-nodes to your cmd
> line.
>>> If
>>> we
 don't see that, we assume that all the node topologies are
> identical
>>> -
 which clearly isn't true here.
> I'll try to resolve the hier inversion over the holiday - won't
> be
> for
 1.7.4, but hopefully for 1.7.5
> Thanks
> Ralph
> 
> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
> 
>> 
>> I think it's normal for AMD opteron having 8/16 cores such as
>> Magny-Cours or Interlagos. Because it usually has 2 numa nodes
>> in a cpu(socket), numa-node can not include a socket. This type
>> of hierarchy would be natural.
>> 
>> (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
>> 
>> By the way, I think this inversion should affect rmaps_lama
>>> mapping.
>> 
>> Tetsuya Mishima
>> 
>>> Ick - yeah, that would be a problem. I haven't seen that type
> of
>> hierarchical inversion before - is node03 a different type of
>>> chip?
>>> Might take awhile for me to adjust the code to handle hier
>> inversion... :-(
>>> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-20 Thread tmishima


Hi Ralph,

Thank you very much. I tried many things such as:

mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
rmaps_lama_bind 1c myprog

But every attempt failed. At least they were accepted by openmpi-1.7.3, as far
as I remember.
Anyway, please check it when you have time; I'm trying lama just out of
curiosity.

Regards,
Tetsuya Mishima


> I'll try to take a look at it - my expectation is that lama might get
stuck because you didn't tell it a pattern to map, and I doubt that code
path has seen much testing.
>
>
> On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Hi Ralph, I'm glad to hear that, thanks.
> >
> > By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA
> > nodes.
> >
> > Then, even with this simple command line, it froze without any
> > message:
> >
> >  mpirun -np 2 -host node05 -mca rmaps lama myprog
> >
> > Could you check what happened?
> >
> > Is it better to open a new thread or continue this thread?
> >
> > Regards,
> > Tetsuya Mishima
> >
> >
> >> I'll make it work so that NUMA can be either above or below socket
> >>
> >> On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>>
> >>>
> >>> Hi Brice,
> >>>
> >>> Thank you for your comment. I understand what you mean.
> >>>
> >>> My opinion was made just considering easy way to adjust the code for
> >>> inversion of hierarchy in object tree.
> >>>
> >>> Tetsuya Mishima
> >>>
> >>>
>  I don't think there's any such difference.
>  Also, all these NUMA architectures are reported the same by hwloc,
and
>  therefore used the same in Open MPI.
> 
>  And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
> > (and
>  most recent AMD and Intel platforms).
> 
>  Brice
> 
> 
> 
>  Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
> >
> > Hi Ralph,
> >
> > The numa-node in AMD Magny-Cours/Interlagos is so-called cc(cache
> > coherent)NUMA,
> > which seems to be a little bit different from the traditional numa
> >>> defined
> > in openmpi.
> >
> > I notice that ccNUMA object is almost same as L3cache object.
> > So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want
> > to
> >>> do.
> > Therefore, "do not touch it" is one of the solution, I think ...
> >
> > Anyway, mixing up these two types of numa is the problem.
> >
> > Regards,
> > Tetsuya Mishima
> >
> >> I can wait it'll be fixed in 1.7.5 or later, because putting
> > "-bind-to
> >> numa"
> >> and "-map-by numa" at the same time works as a workaround.
> >>
> >> Thanks,
> >> Tetsuya Mishima
> >>
> >>> Yeah, it will impact everything that uses hwloc topology maps, I
> >>> fear.
> >>>
> >>> One side note: you'll need to add --hetero-nodes to your cmd
line.
> > If
> > we
> >> don't see that, we assume that all the node topologies are
identical
> > -
> >> which clearly isn't true here.
> >>> I'll try to resolve the hier inversion over the holiday - won't
be
> >>> for
> >> 1.7.4, but hopefully for 1.7.5
> >>> Thanks
> >>> Ralph
> >>>
> >>> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>
> 
>  I think it's normal for AMD opteron having 8/16 cores such as
>  magny cours or interlagos. Because it usually has 2 numa nodes
>  in a cpu(socket), numa-node can not include a socket. This type
>  of hierarchy would be natural.
> 
>  (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
> 
>  By the way, I think this inversion should affect rmaps_lama
> > mapping.
> 
>  Tetsuya Mishima
> 
> > Ick - yeah, that would be a problem. I haven't seen that type
of
>  hierarchical inversion before - is node03 a different type of
> > chip?
> > Might take awhile for me to adjust the code to handle hier
>  inversion... :-(
> > On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> >>
> >> Hi Ralph,
> >>
> >> I found the reason. I attached the main part of output with 32
> >> core node(node03) and 8 core node(node05) at the bottom.
> >>
> >> From this information, socket of node03 includes numa-node.
> >> On the other hand, numa-node of node05 includes socket.
> >> The direction of object tree is opposite.
> >>
> >> Since "-map-by socket" may be assumed as default,
> >> for node05, "-bind-to numa and -map-by socket" means
> >> upward search. For node03, this should be downward.
> >>
> >> I guess that openmpi-1.7.4rc1 will always assume numa-node
> >> includes socket. Is it right? Then, upward search is assumed
> >> in orte_rmaps_base_compute_bindings even for node03 when I
> >> put "-bind-to numa and 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our Magny-Cours-based 32-core node

2013-12-20 Thread Ralph Castain
I'll try to take a look at it - my expectation is that lama might get stuck 
because you didn't tell it a pattern to map, and I doubt that code path has 
seen much testing.


On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi Ralph, I'm glad to hear that, thanks.
> 
> By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA nodes.
> 
> Then, even with this simple command line, it froze without any message:
> 
>  mpirun -np 2 -host node05 -mca rmaps lama myprog
> 
> Could you check what happened?
> 
> Is it better to open a new thread or continue this thread?
> 
> Regards,
> Tetsuya Mishima
> 
> 
>> I'll make it work so that NUMA can be either above or below socket
>> 
>> On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
>> 
>>> 
>>> 
>>> Hi Brice,
>>> 
>>> Thank you for your comment. I understand what you mean.
>>> 
>>> My opinion was made just considering easy way to adjust the code for
>>> inversion of hierarchy in object tree.
>>> 
>>> Tetsuya Mishima
>>> 
>>> 
 I don't think there's any such difference.
 Also, all these NUMA architectures are reported the same by hwloc, and
 therefore used the same in Open MPI.
 
 And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
> (and
 most recent AMD and Intel platforms).
 
 Brice
 
 
 
 Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
> 
> Hi Ralph,
> 
> The numa-node in AMD Magny-Cours/Interlagos is so-called cc(cache
> coherent)NUMA,
> which seems to be a little bit different from the traditional numa
>>> defined
> in openmpi.
> 
> I notice that ccNUMA object is almost same as L3cache object.
> So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want
> to
>>> do.
> Therefore, "do not touch it" is one of the solution, I think ...
> 
> Anyway, mixing up these two types of numa is the problem.
> 
> Regards,
> Tetsuya Mishima
> 
>> I can wait it'll be fixed in 1.7.5 or later, because putting
> "-bind-to
>> numa"
>> and "-map-by numa" at the same time works as a workaround.
>> 
>> Thanks,
>> Tetsuya Mishima
>> 
>>> Yeah, it will impact everything that uses hwloc topology maps, I
>>> fear.
>>> 
>>> One side note: you'll need to add --hetero-nodes to your cmd line.
> If
> we
>> don't see that, we assume that all the node topologies are identical
> -
>> which clearly isn't true here.
>>> I'll try to resolve the hier inversion over the holiday - won't be
>>> for
>> 1.7.4, but hopefully for 1.7.5
>>> Thanks
>>> Ralph
>>> 
>>> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
>>> 
 
 I think it's normal for AMD opteron having 8/16 cores such as
 Magny-Cours or Interlagos. Because it usually has 2 numa nodes
 in a cpu(socket), numa-node can not include a socket. This type
 of hierarchy would be natural.
 
 (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
 
 By the way, I think this inversion should affect rmaps_lama
> mapping.
 
 Tetsuya Mishima
 
> Ick - yeah, that would be a problem. I haven't seen that type of
 hierarchical inversion before - is node03 a different type of
> chip?
> Might take awhile for me to adjust the code to handle hier
 inversion... :-(
> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
> 
>> 
>> Hi Ralph,
>> 
>> I found the reason. I attached the main part of output with 32
>> core node(node03) and 8 core node(node05) at the bottom.
>> 
>> From this information, socket of node03 includes numa-node.
>> On the other hand, numa-node of node05 includes socket.
>> The direction of object tree is opposite.
>> 
>> Since "-map-by socket" may be assumed as default,
>> for node05, "-bind-to numa and -map-by socket" means
>> upward search. For node03, this should be downward.
>> 
>> I guess that openmpi-1.7.4rc1 will always assume numa-node
>> includes socket. Is it right? Then, upward search is assumed
>> in orte_rmaps_base_compute_bindings even for node03 when I
>> put "-bind-to numa and -map-by socket" option.
>> 
>> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
>> [node03.cluster:15508] mca:rmaps: compute bindings for job
> [38286,1]
 with
>> policy NUMA
>> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1]
> with
>> bindings NUMA
>> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode
> type
>> Machine
>> 
>> That's the reason of this trouble. Therefore, adding "-map-by
>>> core"
 works.
>> (mapping 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima


Hi Ralph, I'm glad to hear that, thanks.

By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA nodes.

Then, even with this simple command line, it froze without any message:

  mpirun -np 2 -host node05 -mca rmaps lama myprog

Could you check what happened?

Is it better to open a new thread or continue this one?

Regards,
Tetsuya Mishima


> I'll make it work so that NUMA can be either above or below socket
>
> On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Hi Brice,
> >
> > Thank you for your comment. I understand what you mean.
> >
> > My opinion was made just considering easy way to adjust the code for
> > inversion of hierarchy in object tree.
> >
> > Tetsuya Mishima
> >
> >
> >> I don't think there's any such difference.
> >> Also, all these NUMA architectures are reported the same by hwloc, and
> >> therefore used the same in Open MPI.
> >>
> >> And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours
(and
> >> most recent AMD and Intel platforms).
> >>
> >> Brice
> >>
> >>
> >>
> >> Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
> >>>
> >>> Hi Ralph,
> >>>
> >>> The numa-node in AMD Mangy-Cours/Interlagos is so called cc(cache
> >>> coherent)NUMA,
> >>> which seems to be a little bit different from the traditional numa
> > defined
> >>> in openmpi.
> >>>
> >>> I notice that ccNUMA object is almost same as L3cache object.
> >>> So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want
to
> > do.
> >>> Therefore, "do not touch it" is one of the solution, I think ...
> >>>
> >>> Anyway, mixing up these two types of numa is the problem.
> >>>
> >>> Regards,
> >>> Tetsuya Mishima
> >>>
>  I can wait it'll be fixed in 1.7.5 or later, because putting
"-bind-to
>  numa"
>  and "-map-by numa" at the same time works as a workaround.
> 
>  Thanks,
>  Tetsuya Mishima
> 
> > Yeah, it will impact everything that uses hwloc topology maps, I
> > fear.
> >
> > One side note: you'll need to add --hetero-nodes to your cmd line.
If
> >>> we
>  don't see that, we assume that all the node topologies are identical
-
>  which clearly isn't true here.
> > I'll try to resolve the hier inversion over the holiday - won't be
> > for
>  1.7.4, but hopefully for 1.7.5
> > Thanks
> > Ralph
> >
> > On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> >>
> >> I think it's normal for AMD opteron having 8/16 cores such as
> >> magny cours or interlagos. Because it usually has 2 numa nodes
> >> in a cpu(socket), numa-node can not include a socket. This type
> >> of hierarchy would be natural.
> >>
> >> (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
> >>
> >> By the way, I think this inversion should affect rmaps_lama
mapping.
> >>
> >> Tetsuya Mishima
> >>
> >>> Ick - yeah, that would be a problem. I haven't seen that type of
> >> hierarchical inversion before - is node03 a different type of
chip?
> >>> Might take awhile for me to adjust the code to handle hier
> >> inversion... :-(
> >>> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>
> 
>  Hi Ralph,
> 
>  I found the reason. I attached the main part of output with 32
>  core node(node03) and 8 core node(node05) at the bottom.
> 
>  From this information, socket of node03 includes numa-node.
>  On the other hand, numa-node of node05 includes socket.
>  The direction of object tree is opposite.
> 
>  Since "-map-by socket" may be assumed as default,
>  for node05, "-bind-to numa and -map-by socket" means
>  upward search. For node03, this should be downward.
> 
>  I guess that openmpi-1.7.4rc1 will always assume numa-node
>  includes socket. Is it right? Then, upward search is assumed
>  in orte_rmaps_base_compute_bindings even for node03 when I
>  put "-bind-to numa and -map-by socket" option.
> 
>  [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
>  [node03.cluster:15508] mca:rmaps: compute bindings for job
> >>> [38286,1]
> >> with
>  policy NUMA
>  [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1]
> >>> with
>  bindings NUMA
>  [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode
> >>> type
>  Machine
> 
>  That's the reason of this trouble. Therefore, adding "-map-by
> > core"
> >> works.
>  (mapping pattern seems to be strange ...)
> 
>  [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
>  -report-bindings myprog
>  [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> >>> type
> >> Cache
>  [node03.cluster:15885] [[38679,0],0] bind:upward target 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread Ralph Castain
I'll make it work so that NUMA can be either above or below socket
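
As an illustration only (not the Open MPI code, just a minimal, untested sketch
against the hwloc 1.x C API, where NUMA nodes are ordinary objects in the tree),
the relative order of NUMANode and Socket can be detected by comparing their
depths:

/* Sketch, not Open MPI code: smaller depth means closer to the Machine root,
 * i.e. the containing object. Assumes hwloc 1.x, where HWLOC_OBJ_NODE (the
 * NUMA node) has a regular depth in the object tree. */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int numa_depth, sock_depth;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    numa_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_NODE);   /* NUMA node */
    sock_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_SOCKET);

    if (numa_depth < 0 || sock_depth < 0) {
        printf("NUMA node or socket level not reported on this machine\n");
    } else if (numa_depth < sock_depth) {
        printf("NUMANode is above Socket (as on node05)\n");
    } else if (numa_depth > sock_depth) {
        printf("NUMANode is below Socket (as on node03 / Magny-Cours)\n");
    } else {
        printf("NUMANode and Socket are at the same depth\n");
    }

    hwloc_topology_destroy(topo);
    return 0;
}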

On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi Brice,
> 
> Thank you for your comment. I understand what you mean.
> 
> My opinion was made just considering easy way to adjust the code for
> inversion of hierarchy in object tree.
> 
> Tetsuya Mishima
> 
> 
>> I don't think there's any such difference.
>> Also, all these NUMA architectures are reported the same by hwloc, and
>> therefore used the same in Open MPI.
>> 
>> And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours (and
>> most recent AMD and Intel platforms).
>> 
>> Brice
>> 
>> 
>> 
>> Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
>>> 
>>> Hi Ralph,
>>> 
>>> The numa-node in AMD Mangy-Cours/Interlagos is so called cc(cache
>>> coherent)NUMA,
>>> which seems to be a little bit different from the traditional numa
> defined
>>> in openmpi.
>>> 
>>> I notice that ccNUMA object is almost same as L3cache object.
>>> So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want to
> do.
>>> Therefore, "do not touch it" is one of the solution, I think ...
>>> 
>>> Anyway, mixing up these two types of numa is the problem.
>>> 
>>> Regards,
>>> Tetsuya Mishima
>>> 
 I can wait it'll be fixed in 1.7.5 or later, because putting "-bind-to
 numa"
 and "-map-by numa" at the same time works as a workaround.
 
 Thanks,
 Tetsuya Mishima
 
> Yeah, it will impact everything that uses hwloc topology maps, I
> fear.
> 
> One side note: you'll need to add --hetero-nodes to your cmd line. If
>>> we
 don't see that, we assume that all the node topologies are identical -
 which clearly isn't true here.
> I'll try to resolve the hier inversion over the holiday - won't be
> for
 1.7.4, but hopefully for 1.7.5
> Thanks
> Ralph
> 
> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
> 
>> 
>> I think it's normal for AMD opteron having 8/16 cores such as
>> magny cours or interlagos. Because it usually has 2 numa nodes
>> in a cpu(socket), numa-node can not include a socket. This type
>> of hierarchy would be natural.
>> 
>> (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
>> 
>> By the way, I think this inversion should affect rmaps_lama mapping.
>> 
>> Tetsuya Mishima
>> 
>>> Ick - yeah, that would be a problem. I haven't seen that type of
>> hierarchical inversion before - is node03 a different type of chip?
>>> Might take awhile for me to adjust the code to handle hier
>> inversion... :-(
>>> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
>>> 
 
 Hi Ralph,
 
 I found the reason. I attached the main part of output with 32
 core node(node03) and 8 core node(node05) at the bottom.
 
 From this information, socket of node03 includes numa-node.
 On the other hand, numa-node of node05 includes socket.
 The direction of object tree is opposite.
 
 Since "-map-by socket" may be assumed as default,
 for node05, "-bind-to numa and -map-by socket" means
 upward search. For node03, this should be downward.
 
 I guess that openmpi-1.7.4rc1 will always assume numa-node
 includes socket. Is it right? Then, upward search is assumed
 in orte_rmaps_base_compute_bindings even for node03 when I
 put "-bind-to numa and -map-by socket" option.
 
 [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
 [node03.cluster:15508] mca:rmaps: compute bindings for job
>>> [38286,1]
>> with
 policy NUMA
 [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1]
>>> with
 bindings NUMA
 [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode
>>> type
 Machine
 
 That's the reason of this trouble. Therefore, adding "-map-by
> core"
>> works.
 (mapping pattern seems to be strange ...)
 
 [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
 -report-bindings myprog
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
>>> type
>> Cache
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
>>> type
>> Cache
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
>>> type
>> Cache
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
>>> type
 NUMANode
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
>>> type
>> Cache
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
>>> type
>> Cache
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
>>> type
>> Cache
 [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima


Hi Brice,

Thank you for your comment. I understand what you mean.

My opinion was just based on what seemed an easy way to adjust the code for the
inversion of the hierarchy in the object tree.

Tetsuya Mishima


> I don't think there's any such difference.
> Also, all these NUMA architectures are reported the same by hwloc, and
> therefore used the same in Open MPI.
>
> And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours (and
> most recent AMD and Intel platforms).
>
> Brice
>
>
>
> Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
> >
> > Hi Ralph,
> >
> > The numa-node in AMD Mangy-Cours/Interlagos is so called cc(cache
> > coherent)NUMA,
> > which seems to be a little bit different from the traditional numa
defined
> > in openmpi.
> >
> > I notice that ccNUMA object is almost same as L3cache object.
> > So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want to
do.
> > Therefore, "do not touch it" is one of the solution, I think ...
> >
> > Anyway, mixing up these two types of numa is the problem.
> >
> > Regards,
> > Tetsuya Mishima
> >
> >> I can wait it'll be fixed in 1.7.5 or later, because putting "-bind-to
> >> numa"
> >> and "-map-by numa" at the same time works as a workaround.
> >>
> >> Thanks,
> >> Tetsuya Mishima
> >>
> >>> Yeah, it will impact everything that uses hwloc topology maps, I
fear.
> >>>
> >>> One side note: you'll need to add --hetero-nodes to your cmd line. If
> > we
> >> don't see that, we assume that all the node topologies are identical -
> >> which clearly isn't true here.
> >>> I'll try to resolve the hier inversion over the holiday - won't be
for
> >> 1.7.4, but hopefully for 1.7.5
> >>> Thanks
> >>> Ralph
> >>>
> >>> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>
> 
>  I think it's normal for AMD opteron having 8/16 cores such as
>  magny cours or interlagos. Because it usually has 2 numa nodes
>  in a cpu(socket), numa-node can not include a socket. This type
>  of hierarchy would be natural.
> 
>  (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
> 
>  By the way, I think this inversion should affect rmaps_lama mapping.
> 
>  Tetsuya Mishima
> 
> > Ick - yeah, that would be a problem. I haven't seen that type of
>  hierarchical inversion before - is node03 a different type of chip?
> > Might take awhile for me to adjust the code to handle hier
>  inversion... :-(
> > On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> >>
> >> Hi Ralph,
> >>
> >> I found the reason. I attached the main part of output with 32
> >> core node(node03) and 8 core node(node05) at the bottom.
> >>
> >> From this information, socket of node03 includes numa-node.
> >> On the other hand, numa-node of node05 includes socket.
> >> The direction of object tree is opposite.
> >>
> >> Since "-map-by socket" may be assumed as default,
> >> for node05, "-bind-to numa and -map-by socket" means
> >> upward search. For node03, this should be downward.
> >>
> >> I guess that openmpi-1.7.4rc1 will always assume numa-node
> >> includes socket. Is it right? Then, upward search is assumed
> >> in orte_rmaps_base_compute_bindings even for node03 when I
> >> put "-bind-to numa and -map-by socket" option.
> >>
> >> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
> >> [node03.cluster:15508] mca:rmaps: compute bindings for job
> > [38286,1]
>  with
> >> policy NUMA
> >> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1]
> > with
> >> bindings NUMA
> >> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode
> > type
> >> Machine
> >>
> >> That's the reason of this trouble. Therefore, adding "-map-by
core"
>  works.
> >> (mapping pattern seems to be strange ...)
> >>
> >> [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
> >> -report-bindings myprog
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
>  Cache
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
>  Cache
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
>  Cache
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
> >> NUMANode
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
>  Cache
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
>  Cache
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
>  Cache
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
> >> NUMANode
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
>  Cache
> >> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> > type
> 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread Brice Goglin
I don't think there's any such difference.
Also, all these NUMA architectures are reported the same way by hwloc, and are
therefore handled the same way in Open MPI.

And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours (and
most recent AMD and Intel platforms).
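
Purely as an illustration (a minimal, untested sketch against the hwloc 1.x
C API, not something hwloc or Open MPI do for you), that identity can be
checked by comparing each NUMA node's cpuset with the cpuset of the cache
directly underneath it:

/* Sketch only: on Magny-Cours lstopo shows "NUMANode ... + L3 ...", i.e. the
 * L3 cache is the single child of the NUMA node and covers the same PUs. */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int i, n;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NODE);
    for (i = 0; i < n; i++) {
        hwloc_obj_t numa  = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NODE, i);
        hwloc_obj_t child = numa->first_child;
        if (child != NULL && child->type == HWLOC_OBJ_CACHE
            && child->attr->cache.depth == 3
            && hwloc_bitmap_isequal(numa->cpuset, child->cpuset)) {
            printf("NUMANode L#%d is topologically identical to its L3\n", i);
        } else {
            printf("NUMANode L#%d is not a simple NUMA+L3 pair\n", i);
        }
    }

    hwloc_topology_destroy(topo);
    return 0;
}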

Brice



Le 20/12/2013 11:33, tmish...@jcity.maeda.co.jp a écrit :
>
> Hi Ralph,
>
> The numa-node in AMD Mangy-Cours/Interlagos is so called cc(cache
> coherent)NUMA,
> which seems to be a little bit different from the traditional numa defined
> in openmpi.
>
> I notice that ccNUMA object is almost same as L3cache object.
> So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want to do.
> Therefore, "do not touch it" is one of the solution, I think ...
>
> Anyway, mixing up these two types of numa is the problem.
>
> Regards,
> Tetsuya Mishima
>
>> I can wait it'll be fixed in 1.7.5 or later, because putting "-bind-to
>> numa"
>> and "-map-by numa" at the same time works as a workaround.
>>
>> Thanks,
>> Tetsuya Mishima
>>
>>> Yeah, it will impact everything that uses hwloc topology maps, I fear.
>>>
>>> One side note: you'll need to add --hetero-nodes to your cmd line. If
> we
>> don't see that, we assume that all the node topologies are identical -
>> which clearly isn't true here.
>>> I'll try to resolve the hier inversion over the holiday - won't be for
>> 1.7.4, but hopefully for 1.7.5
>>> Thanks
>>> Ralph
>>>
>>> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
>>>

 I think it's normal for AMD opteron having 8/16 cores such as
 magny cours or interlagos. Because it usually has 2 numa nodes
 in a cpu(socket), numa-node can not include a socket. This type
 of hierarchy would be natural.

 (node03 is Dell PowerEdge R815 and maybe quite common, I guess)

 By the way, I think this inversion should affect rmaps_lama mapping.

 Tetsuya Mishima

> Ick - yeah, that would be a problem. I haven't seen that type of
 hierarchical inversion before - is node03 a different type of chip?
> Might take awhile for me to adjust the code to handle hier
 inversion... :-(
> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
>
>>
>> Hi Ralph,
>>
>> I found the reason. I attached the main part of output with 32
>> core node(node03) and 8 core node(node05) at the bottom.
>>
>> From this information, socket of node03 includes numa-node.
>> On the other hand, numa-node of node05 includes socket.
>> The direction of object tree is opposite.
>>
>> Since "-map-by socket" may be assumed as default,
>> for node05, "-bind-to numa and -map-by socket" means
>> upward search. For node03, this should be downward.
>>
>> I guess that openmpi-1.7.4rc1 will always assume numa-node
>> includes socket. Is it right? Then, upward search is assumed
>> in orte_rmaps_base_compute_bindings even for node03 when I
>> put "-bind-to numa and -map-by socket" option.
>>
>> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
>> [node03.cluster:15508] mca:rmaps: compute bindings for job
> [38286,1]
 with
>> policy NUMA
>> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1]
> with
>> bindings NUMA
>> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode
> type
>> Machine
>>
>> That's the reason of this trouble. Therefore, adding "-map-by core"
 works.
>> (mapping pattern seems to be strange ...)
>>
>> [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
>> -report-bindings myprog
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
>> NUMANode
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
>> NUMANode
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
>> NUMANode
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 Cache
>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
> type
 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima


Hi Ralph,

The NUMA node in AMD Magny-Cours/Interlagos is so-called ccNUMA (cache-coherent
NUMA), which seems to be a little different from the traditional NUMA node
defined in Open MPI.

I notice that the ccNUMA object is almost the same as the L3 cache object.
So "-bind-to l3cache" or "-map-by l3cache" is valid for what I want to do.
Therefore, "do not touch it" is one of the solutions, I think ...

Anyway, mixing up these two types of numa is the problem.

Regards,
Tetsuya Mishima

> I can wait it'll be fixed in 1.7.5 or later, because putting "-bind-to
> numa"
> and "-map-by numa" at the same time works as a workaround.
>
> Thanks,
> Tetsuya Mishima
>
> > Yeah, it will impact everything that uses hwloc topology maps, I fear.
> >
> > One side note: you'll need to add --hetero-nodes to your cmd line. If
we
> don't see that, we assume that all the node topologies are identical -
> which clearly isn't true here.
> >
> > I'll try to resolve the hier inversion over the holiday - won't be for
> 1.7.4, but hopefully for 1.7.5
> >
> > Thanks
> > Ralph
> >
> > On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> > >
> > >
> > > I think it's normal for AMD opteron having 8/16 cores such as
> > > magny cours or interlagos. Because it usually has 2 numa nodes
> > > in a cpu(socket), numa-node can not include a socket. This type
> > > of hierarchy would be natural.
> > >
> > > (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
> > >
> > > By the way, I think this inversion should affect rmaps_lama mapping.
> > >
> > > Tetsuya Mishima
> > >
> > >> Ick - yeah, that would be a problem. I haven't seen that type of
> > > hierarchical inversion before - is node03 a different type of chip?
> > >>
> > >> Might take awhile for me to adjust the code to handle hier
> > > inversion... :-(
> > >>
> > >> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
> > >>
> > >>>
> > >>>
> > >>> Hi Ralph,
> > >>>
> > >>> I found the reason. I attached the main part of output with 32
> > >>> core node(node03) and 8 core node(node05) at the bottom.
> > >>>
> > >>> From this information, socket of node03 includes numa-node.
> > >>> On the other hand, numa-node of node05 includes socket.
> > >>> The direction of object tree is opposite.
> > >>>
> > >>> Since "-map-by socket" may be assumed as default,
> > >>> for node05, "-bind-to numa and -map-by socket" means
> > >>> upward search. For node03, this should be downward.
> > >>>
> > >>> I guess that openmpi-1.7.4rc1 will always assume numa-node
> > >>> includes socket. Is it right? Then, upward search is assumed
> > >>> in orte_rmaps_base_compute_bindings even for node03 when I
> > >>> put "-bind-to numa and -map-by socket" option.
> > >>>
> > >>> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
> > >>> [node03.cluster:15508] mca:rmaps: compute bindings for job
[38286,1]
> > > with
> > >>> policy NUMA
> > >>> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1]
with
> > >>> bindings NUMA
> > >>> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode
type
> > >>> Machine
> > >>>
> > >>> That's the reason of this trouble. Therefore, adding "-map-by core"
> > > works.
> > >>> (mapping pattern seems to be strange ...)
> > >>>
> > >>> [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
> > >>> -report-bindings myprog
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > >>> NUMANode
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > >>> NUMANode
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > >>> NUMANode
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > >>> NUMANode
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode
type
> > > Cache
> > >>> 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima


I can wait until it's fixed in 1.7.5 or later, because putting "-bind-to numa"
and "-map-by numa" on the command line at the same time works as a workaround.
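
For example, something like this (illustrative only, reusing the myprog example
from earlier in this thread):

  mpirun -np 8 -map-by numa -bind-to numa -report-bindings myprog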

Thanks,
Tetsuya Mishima

> Yeah, it will impact everything that uses hwloc topology maps, I fear.
>
> One side note: you'll need to add --hetero-nodes to your cmd line. If we
don't see that, we assume that all the node topologies are identical -
which clearly isn't true here.
>
> I'll try to resolve the hier inversion over the holiday - won't be for
1.7.4, but hopefully for 1.7.5
>
> Thanks
> Ralph
>
> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > I think it's normal for AMD opteron having 8/16 cores such as
> > magny cours or interlagos. Because it usually has 2 numa nodes
> > in a cpu(socket), numa-node can not include a socket. This type
> > of hierarchy would be natural.
> >
> > (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
> >
> > By the way, I think this inversion should affect rmaps_lama mapping.
> >
> > Tetsuya Mishima
> >
> >> Ick - yeah, that would be a problem. I haven't seen that type of
> > hierarchical inversion before - is node03 a different type of chip?
> >>
> >> Might take awhile for me to adjust the code to handle hier
> > inversion... :-(
> >>
> >> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>>
> >>>
> >>> Hi Ralph,
> >>>
> >>> I found the reason. I attached the main part of output with 32
> >>> core node(node03) and 8 core node(node05) at the bottom.
> >>>
> >>> From this information, socket of node03 includes numa-node.
> >>> On the other hand, numa-node of node05 includes socket.
> >>> The direction of object tree is opposite.
> >>>
> >>> Since "-map-by socket" may be assumed as default,
> >>> for node05, "-bind-to numa and -map-by socket" means
> >>> upward search. For node03, this should be downward.
> >>>
> >>> I guess that openmpi-1.7.4rc1 will always assume numa-node
> >>> includes socket. Is it right? Then, upward search is assumed
> >>> in orte_rmaps_base_compute_bindings even for node03 when I
> >>> put "-bind-to numa and -map-by socket" option.
> >>>
> >>> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
> >>> [node03.cluster:15508] mca:rmaps: compute bindings for job [38286,1]
> > with
> >>> policy NUMA
> >>> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1] with
> >>> bindings NUMA
> >>> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode type
> >>> Machine
> >>>
> >>> That's the reason of this trouble. Therefore, adding "-map-by core"
> > works.
> >>> (mapping pattern seems to be strange ...)
> >>>
> >>> [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
> >>> -report-bindings myprog
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> >>> NUMANode
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> >>> NUMANode
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> >>> NUMANode
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> >>> NUMANode
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> >>> NUMANode
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> >>> NUMANode
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> >>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > Cache
> 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread Ralph Castain
Yeah, it will impact everything that uses hwloc topology maps, I fear.

One side note: you'll need to add --hetero-nodes to your cmd line. If we don't 
see that, we assume that all the node topologies are identical - which clearly 
isn't true here.
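
For example (illustrative only, reusing the single-node command from earlier in
this thread), the invocation would then look something like:

  mpirun -np 8 --hetero-nodes -bind-to numa -report-bindings myprog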

I'll try to resolve the hier inversion over the holiday - won't be for 1.7.4, 
but hopefully for 1.7.5

Thanks
Ralph

On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> I think it's normal for AMD opteron having 8/16 cores such as
> magny cours or interlagos. Because it usually has 2 numa nodes
> in a cpu(socket), numa-node can not include a socket. This type
> of hierarchy would be natural.
> 
> (node03 is Dell PowerEdge R815 and maybe quite common, I guess)
> 
> By the way, I think this inversion should affect rmaps_lama mapping.
> 
> Tetsuya Mishima
> 
>> Ick - yeah, that would be a problem. I haven't seen that type of
> hierarchical inversion before - is node03 a different type of chip?
>> 
>> Might take awhile for me to adjust the code to handle hier
> inversion... :-(
>> 
>> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
>> 
>>> 
>>> 
>>> Hi Ralph,
>>> 
>>> I found the reason. I attached the main part of output with 32
>>> core node(node03) and 8 core node(node05) at the bottom.
>>> 
>>> From this information, socket of node03 includes numa-node.
>>> On the other hand, numa-node of node05 includes socket.
>>> The direction of object tree is opposite.
>>> 
>>> Since "-map-by socket" may be assumed as default,
>>> for node05, "-bind-to numa and -map-by socket" means
>>> upward search. For node03, this should be downward.
>>> 
>>> I guess that openmpi-1.7.4rc1 will always assume numa-node
>>> includes socket. Is it right? Then, upward search is assumed
>>> in orte_rmaps_base_compute_bindings even for node03 when I
>>> put "-bind-to numa and -map-by socket" option.
>>> 
>>> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
>>> [node03.cluster:15508] mca:rmaps: compute bindings for job [38286,1]
> with
>>> policy NUMA
>>> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1] with
>>> bindings NUMA
>>> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode type
>>> Machine
>>> 
>>> That's the reason of this trouble. Therefore, adding "-map-by core"
> works.
>>> (mapping pattern seems to be strange ...)
>>> 
>>> [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
>>> -report-bindings myprog
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
>>> NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
>>> NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
>>> NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
>>> NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
>>> NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
>>> NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
>>> NUMANode
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> Cache
>>> [node03.cluster:15885] [[38679,0],0] 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima


I think it's normal for AMD Opterons with 8/16 cores, such as
Magny-Cours or Interlagos. Because they usually have 2 NUMA nodes
per CPU (socket), a NUMA node cannot include a socket. This type
of hierarchy would be natural.

(node03 is a Dell PowerEdge R815 and probably quite common, I guess)
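
As a small illustration (an untested sketch against the hwloc 1.x helper API,
not part of any patch), the per-socket NUMA node count can be confirmed
programmatically:

/* Sketch only: count the NUMA nodes inside each socket; on Magny-Cours or
 * Interlagos this should report 2 NUMA nodes per socket. */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int i, nsock;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    nsock = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET);
    for (i = 0; i < nsock; i++) {
        hwloc_obj_t sock = hwloc_get_obj_by_type(topo, HWLOC_OBJ_SOCKET, i);
        int nnuma = hwloc_get_nbobjs_inside_cpuset_by_type(topo, sock->cpuset,
                                                           HWLOC_OBJ_NODE);
        printf("Socket L#%d contains %d NUMA node(s)\n", i, nnuma);
    }

    hwloc_topology_destroy(topo);
    return 0;
}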

By the way, I think this inversion should affect rmaps_lama mapping.

Tetsuya Mishima

> Ick - yeah, that would be a problem. I haven't seen that type of
hierarchical inversion before - is node03 a different type of chip?
>
> Might take awhile for me to adjust the code to handle hier
inversion... :-(
>
> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Hi Ralph,
> >
> > I found the reason. I attached the main part of output with 32
> > core node(node03) and 8 core node(node05) at the bottom.
> >
> > From this information, socket of node03 includes numa-node.
> > On the other hand, numa-node of node05 includes socket.
> > The direction of object tree is opposite.
> >
> > Since "-map-by socket" may be assumed as default,
> > for node05, "-bind-to numa and -map-by socket" means
> > upward search. For node03, this should be downward.
> >
> > I guess that openmpi-1.7.4rc1 will always assume numa-node
> > includes socket. Is it right? Then, upward search is assumed
> > in orte_rmaps_base_compute_bindings even for node03 when I
> > put "-bind-to numa and -map-by socket" option.
> >
> > [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
> > [node03.cluster:15508] mca:rmaps: compute bindings for job [38286,1]
with
> > policy NUMA
> > [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1] with
> > bindings NUMA
> > [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode type
> > Machine
> >
> > That's the reason of this trouble. Therefore, adding "-map-by core"
works.
> > (mapping pattern seems to be strange ...)
> >
> > [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
> > -report-bindings myprog
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
Cache
> > [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> > NUMANode
> > [node03.cluster:15885] MCW rank 2 bound to socket 0[core 0[hwt 0]],
socket
> > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> > cket 0[core 3[hwt 0]]:
> > [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
> > [node03.cluster:15885] MCW rank 3 bound to socket 0[core 0[hwt 0]],
socket
> > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> > cket 0[core 3[hwt 0]]:
> > 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread Ralph Castain
Ick - yeah, that would be a problem. I haven't seen that type of hierarchical 
inversion before - is node03 a different type of chip?

Might take a while for me to adjust the code to handle the hier inversion... :-(

On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi Ralph,
> 
> I found the reason. I attached the main part of output with 32
> core node(node03) and 8 core node(node05) at the bottom.
> 
> From this information, socket of node03 includes numa-node.
> On the other hand, numa-node of node05 includes socket.
> The direction of object tree is opposite.
> 
> Since "-map-by socket" may be assumed as default,
> for node05, "-bind-to numa and -map-by socket" means
> upward search. For node03, this should be downward.
> 
> I guess that openmpi-1.7.4rc1 will always assume numa-node
> includes socket. Is it right? Then, upward search is assumed
> in orte_rmaps_base_compute_bindings even for node03 when I
> put "-bind-to numa and -map-by socket" option.
> 
> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
> [node03.cluster:15508] mca:rmaps: compute bindings for job [38286,1] with
> policy NUMA
> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1] with
> bindings NUMA
> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode type
> Machine
> 
> That's the reason of this trouble. Therefore, adding "-map-by core" works.
> (mapping pattern seems to be strange ...)
> 
> [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
> -report-bindings myprog
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
> NUMANode
> [node03.cluster:15885] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> cket 0[core 3[hwt 0]]:
> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
> [node03.cluster:15885] MCW rank 3 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> cket 0[core 3[hwt 0]]:
> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
> [node03.cluster:15885] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket
> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
> cket 0[core 7[hwt 0]]:
> [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
> [node03.cluster:15885] MCW rank 5 bound to socket 0[core 4[hwt 0]], socket
> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
> cket 0[core 7[hwt 0]]:
> [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
> [node03.cluster:15885] MCW rank 6 bound 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima


Hi Ralph,

I found the reason. I attached the main part of the output for the 32-core
node (node03) and the 8-core node (node05) at the bottom.

From this information, the socket on node03 includes the NUMA node.
On the other hand, the NUMA node on node05 includes the socket.
The direction of the object tree is opposite.
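
As a quick, illustrative check (assuming the hwloc utilities are installed on
each node), the containment order can be read off the indentation of the
lstopo output:

  lstopo-no-graphics | grep -E "Socket|NUMANode"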

Since "-map-by socket" is assumed as the default,
for node05 "-bind-to numa" with "-map-by socket" means
an upward search. For node03, it should be a downward search.

I guess that openmpi-1.7.4rc1 always assumes the NUMA node
includes the socket. Is that right? Then an upward search is assumed
in orte_rmaps_base_compute_bindings even for node03 when I
put the "-bind-to numa" and "-map-by socket" options.

[node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
[node03.cluster:15508] mca:rmaps: compute bindings for job [38286,1] with
policy NUMA
[node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1] with
bindings NUMA
[node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode type
Machine

That's the reason for this trouble. Therefore, adding "-map-by core" works.
(although the resulting mapping pattern seems strange ...)

[mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
-report-bindings myprog
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
[node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type
NUMANode
[node03.cluster:15885] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
cket 0[core 3[hwt 0]]:
[B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 3 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
cket 0[core 3[hwt 0]]:
[B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 4 bound to socket 0[core 4[hwt 0]], socket
0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
cket 0[core 7[hwt 0]]:
[././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 5 bound to socket 0[core 4[hwt 0]], socket
0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
cket 0[core 7[hwt 0]]:
[././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 6 bound to socket 0[core 4[hwt 0]], socket
0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
cket 0[core 7[hwt 0]]:
[././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 7 bound to socket 0[core 4[hwt 0]], socket
0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
cket 0[core 7[hwt 0]]:
[././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:15885] MCW rank 0 bound to socket 0[core 0[hwt 0]], 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread tmishima


Hi, here is the output with "-mca rmaps_base_verbose 10
-mca ess_base_verbose 5". Please see the attached file.

(See attached file: output.txt)

Regards,
Tetsuya Mishima

> Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to
your cmd line and let's see what it thinks it found.
>
>
> On Dec 18, 2013, at 6:55 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Hi, I report one more problem with openmpi-1.7.4rc1,
> > which is more serious.
> >
> > For our 32 core nodes(AMD magny cours based) which has
> > 8 numa-nodes, "-bind-to numa" does not work. Without
> > this option, it works. For your infomation, at the
> > bottom of this mail, I added the lstopo information
> > of the node.
> >
> > Regards,
> > Tetsuya Mishima
> >
> > [mishima@manage ~]$ qsub -I -l nodes=1:ppn=32
> > qsub: waiting for job 8352.manage.cluster to start
> > qsub: job 8352.manage.cluster ready
> >
> > [mishima@node03 demos]$ mpirun -np 8 -report-bindings -bind-to numa
myprog
> > [node03.cluster:15316] [[37582,0],0] bind:upward target NUMANode type
> > Machine
> >
--
> > A request was made to bind to NUMA, but an appropriate target could not
> > be found on node node03.
> >
--
> > [mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/
> > [mishima@node03 demos]$ mpirun -np 8 -report-bindings myprog
> > [node03.cluster:15282] MCW rank 2 bound to socket 1[core 8[hwt 0]]:
> > [./././././././.][B/././././././.][./././././././.][
> > ./././././././.]
> > [node03.cluster:15282] MCW rank 3 bound to socket 1[core 9[hwt 0]]:
> > [./././././././.][./B/./././././.][./././././././.][
> > ./././././././.]
> > [node03.cluster:15282] MCW rank 4 bound to socket 2[core 16[hwt 0]]:
> > [./././././././.][./././././././.][B/././././././.]
> > [./././././././.]
> > [node03.cluster:15282] MCW rank 5 bound to socket 2[core 17[hwt 0]]:
> > [./././././././.][./././././././.][./B/./././././.]
> > [./././././././.]
> > [node03.cluster:15282] MCW rank 6 bound to socket 3[core 24[hwt 0]]:
> > [./././././././.][./././././././.][./././././././.]
> > [B/././././././.]
> > [node03.cluster:15282] MCW rank 7 bound to socket 3[core 25[hwt 0]]:
> > [./././././././.][./././././././.][./././././././.]
> > [./B/./././././.]
> > [node03.cluster:15282] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> > [B/././././././.][./././././././.][./././././././.][
> > ./././././././.]
> > [node03.cluster:15282] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> > [./B/./././././.][./././././././.][./././././././.][
> > ./././././././.]
> > Hello world from process 2 of 8
> > Hello world from process 5 of 8
> > Hello world from process 4 of 8
> > Hello world from process 3 of 8
> > Hello world from process 1 of 8
> > Hello world from process 7 of 8
> > Hello world from process 6 of 8
> > Hello world from process 0 of 8
> > [mishima@node03 demos]$ ~/opt/hwloc/bin/lstopo-no-graphics
> > Machine (126GB)
> >  Socket L#0 (32GB)
> >NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
> >  L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU
L#0
> > (P#0)
> >  L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU
L#1
> > (P#1)
> >  L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU
L#2
> > (P#2)
> >  L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU
L#3
> > (P#3)
> >NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
> >  L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU
L#4
> > (P#4)
> >  L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU
L#5
> > (P#5)
> >  L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU
L#6
> > (P#6)
> >  L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU
L#7
> > (P#7)
> >  Socket L#1 (32GB)
> >NUMANode L#2 (P#6 16GB) + L3 L#2 (5118KB)
> >  L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU
L#8
> > (P#8)
> >  L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU
L#9
> > (P#9)
> >  L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 +
PU
> > L#10 (P#10)
> >  L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 +
PU
> > L#11 (P#11)
> >NUMANode L#3 (P#7 16GB) + L3 L#3 (5118KB)
> >  L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 +
PU
> > L#12 (P#12)
> >  L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 +
PU
> > L#13 (P#13)
> >  L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 +
PU
> > L#14 (P#14)
> >  L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 +
PU
> > L#15 (P#15)
> >  Socket L#2 (32GB)
> >NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
> >  L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 +
PU
> > L#16 (P#16)
> >  L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 +
PU
> > L#17 (P#17)
> >  L2 L#18 (512KB) 

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread Ralph Castain
Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to your 
cmd line and let's see what it thinks it found.


On Dec 18, 2013, at 6:55 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi, I report one more problem with openmpi-1.7.4rc1,
> which is more serious.
> 
> For our 32 core nodes(AMD magny cours based) which has
> 8 numa-nodes, "-bind-to numa" does not work. Without
> this option, it works. For your infomation, at the
> bottom of this mail, I added the lstopo information
> of the node.
> 
> Regards,
> Tetsuya Mishima
> 
> [mishima@manage ~]$ qsub -I -l nodes=1:ppn=32
> qsub: waiting for job 8352.manage.cluster to start
> qsub: job 8352.manage.cluster ready
> 
> [mishima@node03 demos]$ mpirun -np 8 -report-bindings -bind-to numa myprog
> [node03.cluster:15316] [[37582,0],0] bind:upward target NUMANode type
> Machine
> --
> A request was made to bind to NUMA, but an appropriate target could not
> be found on node node03.
> --
> [mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/
> [mishima@node03 demos]$ mpirun -np 8 -report-bindings myprog
> [node03.cluster:15282] MCW rank 2 bound to socket 1[core 8[hwt 0]]:
> [./././././././.][B/././././././.][./././././././.][
> ./././././././.]
> [node03.cluster:15282] MCW rank 3 bound to socket 1[core 9[hwt 0]]:
> [./././././././.][./B/./././././.][./././././././.][
> ./././././././.]
> [node03.cluster:15282] MCW rank 4 bound to socket 2[core 16[hwt 0]]:
> [./././././././.][./././././././.][B/././././././.]
> [./././././././.]
> [node03.cluster:15282] MCW rank 5 bound to socket 2[core 17[hwt 0]]:
> [./././././././.][./././././././.][./B/./././././.]
> [./././././././.]
> [node03.cluster:15282] MCW rank 6 bound to socket 3[core 24[hwt 0]]:
> [./././././././.][./././././././.][./././././././.]
> [B/././././././.]
> [node03.cluster:15282] MCW rank 7 bound to socket 3[core 25[hwt 0]]:
> [./././././././.][./././././././.][./././././././.]
> [./B/./././././.]
> [node03.cluster:15282] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> [B/././././././.][./././././././.][./././././././.][
> ./././././././.]
> [node03.cluster:15282] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> [./B/./././././.][./././././././.][./././././././.][
> ./././././././.]
> Hello world from process 2 of 8
> Hello world from process 5 of 8
> Hello world from process 4 of 8
> Hello world from process 3 of 8
> Hello world from process 1 of 8
> Hello world from process 7 of 8
> Hello world from process 6 of 8
> Hello world from process 0 of 8
> [mishima@node03 demos]$ ~/opt/hwloc/bin/lstopo-no-graphics
> Machine (126GB)
>  Socket L#0 (32GB)
>NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
>  L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0
> (P#0)
>  L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1
> (P#1)
>  L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2
> (P#2)
>  L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3
> (P#3)
>NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
>  L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4
> (P#4)
>  L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5
> (P#5)
>  L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6
> (P#6)
>  L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7
> (P#7)
>  Socket L#1 (32GB)
>NUMANode L#2 (P#6 16GB) + L3 L#2 (5118KB)
>  L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8
> (P#8)
>  L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9
> (P#9)
>  L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU
> L#10 (P#10)
>  L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU
> L#11 (P#11)
>NUMANode L#3 (P#7 16GB) + L3 L#3 (5118KB)
>  L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU
> L#12 (P#12)
>  L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU
> L#13 (P#13)
>  L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU
> L#14 (P#14)
>  L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU
> L#15 (P#15)
>  Socket L#2 (32GB)
>NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
>  L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU
> L#16 (P#16)
>  L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU
> L#17 (P#17)
>  L2 L#18 (512KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU
> L#18 (P#18)
>  L2 L#19 (512KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU
> L#19 (P#19)
>NUMANode L#5 (P#5 16GB) + L3 L#5 (5118KB)
>  L2 L#20 (512KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU
> L#20 (P#20)
>  L2 L#21 (512KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU
> L#21 (P#21)
>  

[OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread tmishima


Hi, I am reporting one more problem with openmpi-1.7.4rc1,
which is more serious.

On our 32-core nodes (AMD Magny-Cours based), which have
8 NUMA nodes, "-bind-to numa" does not work. Without
this option, it works. For your information, I have added
the lstopo output of the node at the bottom of this mail.

Regards,
Tetsuya Mishima

[mishima@manage ~]$ qsub -I -l nodes=1:ppn=32
qsub: waiting for job 8352.manage.cluster to start
qsub: job 8352.manage.cluster ready

[mishima@node03 demos]$ mpirun -np 8 -report-bindings -bind-to numa myprog
[node03.cluster:15316] [[37582,0],0] bind:upward target NUMANode type
Machine
--
A request was made to bind to NUMA, but an appropriate target could not
be found on node node03.
--
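
As a side check of what hwloc itself reports on such a node, the following is a
minimal sketch against the hwloc 1.x C API (the file name check_numa.c and the
compile line are only illustrative, not part of this thread): it prints how many
Socket and NUMANode objects hwloc sees and which of the two sits deeper in the
topology tree.

/* check_numa.c - minimal sketch (hwloc 1.x API); build with something like:
 *   gcc check_numa.c -o check_numa -lhwloc
 * Prints the Socket/NUMANode counts and depths reported by hwloc. */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int numa_depth, sock_depth, nnuma, nsock;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Depths grow downward from the Machine object at depth 0.
       (Negative return values mean unknown/multiple levels.) */
    numa_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_NODE);   /* NUMANode */
    sock_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_SOCKET); /* Socket   */
    nnuma = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NODE);
    nsock = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET);

    printf("sockets: %d (depth %d), NUMA nodes: %d (depth %d)\n",
           nsock, sock_depth, nnuma, numa_depth);

    if (numa_depth > sock_depth)
        printf("NUMANode sits below Socket (each socket contains NUMA nodes)\n");
    else
        printf("NUMANode sits at or above Socket\n");

    hwloc_topology_destroy(topo);
    return 0;
}

On this machine the lstopo output further below shows each socket containing two
NUMA nodes, so NUMANode should report the larger (deeper) depth.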
[mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/
[mishima@node03 demos]$ mpirun -np 8 -report-bindings myprog
[node03.cluster:15282] MCW rank 2 bound to socket 1[core 8[hwt 0]]:
[./././././././.][B/././././././.][./././././././.][
./././././././.]
[node03.cluster:15282] MCW rank 3 bound to socket 1[core 9[hwt 0]]:
[./././././././.][./B/./././././.][./././././././.][
./././././././.]
[node03.cluster:15282] MCW rank 4 bound to socket 2[core 16[hwt 0]]:
[./././././././.][./././././././.][B/././././././.]
[./././././././.]
[node03.cluster:15282] MCW rank 5 bound to socket 2[core 17[hwt 0]]:
[./././././././.][./././././././.][./B/./././././.]
[./././././././.]
[node03.cluster:15282] MCW rank 6 bound to socket 3[core 24[hwt 0]]:
[./././././././.][./././././././.][./././././././.]
[B/././././././.]
[node03.cluster:15282] MCW rank 7 bound to socket 3[core 25[hwt 0]]:
[./././././././.][./././././././.][./././././././.]
[./B/./././././.]
[node03.cluster:15282] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././././.][./././././././.][./././././././.][
./././././././.]
[node03.cluster:15282] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././././.][./././././././.][./././././././.][
./././././././.]
Hello world from process 2 of 8
Hello world from process 5 of 8
Hello world from process 4 of 8
Hello world from process 3 of 8
Hello world from process 1 of 8
Hello world from process 7 of 8
Hello world from process 6 of 8
Hello world from process 0 of 8
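
For reference, the myprog used in these runs is just a trivial MPI hello-world; its
source is not included in this thread, but an assumed-equivalent minimal sketch in C
would be:

/* hello.c - minimal sketch of a program producing output like the
 * "Hello world from process N of M" lines above (assumed equivalent
 * of myprog, whose actual source is not shown in this thread). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello world from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

It can be built with something like "mpicc hello.c -o myprog" and run under the
mpirun command lines shown above.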
[mishima@node03 demos]$ ~/opt/hwloc/bin/lstopo-no-graphics
Machine (126GB)
  Socket L#0 (32GB)
NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
  L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0
(P#0)
  L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1
(P#1)
  L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2
(P#2)
  L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3
(P#3)
NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
  L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4
(P#4)
  L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5
(P#5)
  L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6
(P#6)
  L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7
(P#7)
  Socket L#1 (32GB)
NUMANode L#2 (P#6 16GB) + L3 L#2 (5118KB)
  L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8
(P#8)
  L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9
(P#9)
  L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU
L#10 (P#10)
  L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU
L#11 (P#11)
NUMANode L#3 (P#7 16GB) + L3 L#3 (5118KB)
  L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU
L#12 (P#12)
  L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU
L#13 (P#13)
  L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU
L#14 (P#14)
  L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU
L#15 (P#15)
  Socket L#2 (32GB)
NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
  L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU
L#16 (P#16)
  L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU
L#17 (P#17)
  L2 L#18 (512KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU
L#18 (P#18)
  L2 L#19 (512KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU
L#19 (P#19)
NUMANode L#5 (P#5 16GB) + L3 L#5 (5118KB)
  L2 L#20 (512KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU
L#20 (P#20)
  L2 L#21 (512KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU
L#21 (P#21)
  L2 L#22 (512KB) + L1d L#22 (64KB) + L1i L#22 (64KB) + Core L#22 + PU
L#22 (P#22)
  L2 L#23 (512KB) + L1d L#23 (64KB) + L1i L#23 (64KB) + Core L#23 + PU
L#23 (P#23)
  Socket L#3 (32GB)
NUMANode L#6 (P#2 16GB) + L3 L#6 (5118KB)
  L2 L#24 (512KB) + L1d L#24 (64KB) + L1i L#24 (64KB) + Core L#24 + PU
L#24 (P#24)
  L2 L#25 (512KB) + L1d L#25 (64KB) + L1i L#25 (64KB) + Core