Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-25 Thread Ralph Castain
Added this info to the ticket, and added you to it as well. Thanks again, Ralph. On Dec 25, 2013, at 3:42 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, > > Thank you for your reply. After that, I found a more reasonable fix, > I guess. I moved OBJ_CONSTRUCT for opal_tree_item_t out of

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-25 Thread tmishima
Hi Ralph, Thank you for your reply. After that, I found a more reasonable fix, I guess. I moved OBJ_CONSTRUCT for opal_tree_item_t out of the debug part in opal_tree_construct as shown below: static void opal_tree_construct(opal_tree_t *tree) { OBJ_CONSTRUCT( &(tree->opal_tree_sentinel),

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-25 Thread Ralph Castain
Deeply appreciate all your help! Your fix looks reasonable to me and is the kind of difference we frequently see between compilers and environments, which is why initializing variables is so important. This one apparently slipped by the lama developers. I'll apply to trunk and cmr it across to

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-25 Thread tmishima
Hi Ralph, I ran valgrind and found uninitialised value errors. All of them occurred in opal_tree_add_child as shown at the bottom. As a quick fix, I put one line in "opal_tree.c", although it's not elegant: void opal_tree_init(opal_tree_t *tree, opal_tree_comp_fn_t comp,

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-23 Thread tmishima
Hi Ralph, Here is the output when I put "-mca rmaps_base_verbose 10 --display-map" and where it stopped (by gdb), which shows it stopped in a function of lama. I usually use PGI 13.10, so I tried switching to the GNU compiler. Then, it works. Therefore, this problem depends on the compiler. That's

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-22 Thread Ralph Castain
On Dec 21, 2013, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote: > > > Ralph, thanks. I'll try it on Tuesday. > > Let me confirm one thing. I don't put "-with-libevent" when I build > openmpi. > Is there any possibility to build with external libevent automatically? No - only happens if you

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-21 Thread tmishima
Ralph, thanks. I'll try it on Tuesday. Let me confirm one thing. I don't put "-with-libevent" when I build openmpi. Is there any possibility to build with external libevent automatically? Tetsuya Mishima > Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to your cmd line

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-21 Thread Ralph Castain
Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to your cmd line and let's see if it finishes the mapping. Unless you specifically built with an external libevent (which I doubt), there is no conflict. The connection issue is unlikely to be a factor here as it works when not

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-21 Thread tmishima
Thank you, Ralph. Then, this problem should depend on our environment. But, at least, the inversion problem is not the cause because node05 has the normal hier order. I cannot connect to our cluster now. On Tuesday, when I'm back in my office, I'll send you a further report. Before that, please let me know

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-21 Thread Ralph Castain
It seems to be working fine for me: [rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca rmaps_lama_bind 1c -mca rmaps lama hostname bend001 [bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../..][../../../../../..] [bend001:17005] MCW rank 0 bound to

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Ralph, Thank you very much. I tried many things such as: mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca rmaps_lama_bind 1c myprog But every try failed. At least they were accepted by openmpi-1.7.3 as far as I remember. Anyway, please check it when you have time, because

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread Ralph Castain
I'll try to take a look at it - my expectation is that lama might get stuck because you didn't tell it a pattern to map, and I doubt that code path has seen much testing. On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, I'm glad to hear that, thanks. > > By

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Ralph, I'm glad to hear that, thanks. By the way, yesterday I tried to check how lama in 1.7.4rc treats numa nodes. Then, even with this simple command line, it froze without any message: mpirun -np 2 -host node05 -mca rmaps lama myprog Could you check what happened? Is it better to

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread Ralph Castain
I'll make it work so that NUMA can be either above or below socket. On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Brice, > > Thank you for your comment. I understand what you mean. > > My opinion was made just considering an easy way to adjust the code for > inversion

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Brice, Thank you for your comment. I understand what you mean. My opinion was made just considering an easy way to adjust the code for the inversion of the hierarchy in the object tree. Tetsuya Mishima > I don't think there's any such difference. > Also, all these NUMA architectures are reported the

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread Brice Goglin
I don't think there's any such difference. Also, all these NUMA architectures are reported the same by hwloc, and therefore used the same in Open MPI. And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours (and most recent AMD and Intel platforms). Brice On 20/12/2013 11:33,

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-20 Thread tmishima
Hi Ralph, The numa-node in AMD Magny-Cours/Interlagos is so-called ccNUMA (cache-coherent NUMA), which seems to be a little bit different from the traditional numa defined in openmpi. I notice that a ccNUMA object is almost the same as an L3cache object. So "-bind-to l3cache" or "-map-by l3cache" is valid

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
I can wait until it's fixed in 1.7.5 or later, because putting "-bind-to numa" and "-map-by numa" at the same time works as a workaround. Thanks, Tetsuya Mishima > Yeah, it will impact everything that uses hwloc topology maps, I fear. > > One side note: you'll need to add --hetero-nodes to your

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread Ralph Castain
Yeah, it will impact everything that uses hwloc topology maps, I fear. One side note: you'll need to add --hetero-nodes to your cmd line. If we don't see that, we assume that all the node topologies are identical - which clearly isn't true here. I'll try to resolve the hier inversion over the

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
I think it's normal for an AMD Opteron with 8/16 cores such as Magny-Cours or Interlagos. Because such a CPU (socket) usually has 2 numa nodes, a numa-node cannot include a socket. This type of hierarchy would be natural. (node03 is a Dell PowerEdge R815 and maybe quite common, I guess) By the

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread Ralph Castain
Ick - yeah, that would be a problem. I haven't seen that type of hierarchical inversion before - is node03 a different type of chip? Might take a while for me to adjust the code to handle hier inversion... :-( On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, > >

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
Hi Ralph, I found the reason. I attached the main part of the output for the 32-core node (node03) and the 8-core node (node05) at the bottom. From this information, a socket of node03 includes numa-nodes. On the other hand, a numa-node of node05 includes the socket. The direction of the object tree is opposite.

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread tmishima
Hi, here is the output with "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5". Please see the attached file. (See attached file: output.txt) Regards, Tetsuya Mishima > Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to your cmd line and let's see what it thinks it

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread Ralph Castain
Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to your cmd line and let's see what it thinks it found. On Dec 18, 2013, at 6:55 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi, I report one more problem with openmpi-1.7.4rc1, > which is more serious. > > For our 32

[OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-18 Thread tmishima
Hi, I report one more problem with openmpi-1.7.4rc1, which is more serious. For our 32-core nodes (AMD Magny-Cours based), which have 8 numa-nodes, "-bind-to numa" does not work. Without this option, it works. For your information, at the bottom of this mail, I added the lstopo information of the