Added this info to the ticket, and added you to it as well. Thanks again
Ralph
On Dec 25, 2013, at 3:42 PM, tmish...@jcity.maeda.co.jp wrote:

>
>
> Hi Ralph,
>
> Thank you for your reply. After that, I found a more reasonable fix,
> I guess. I moved the OBJ_CONSTRUCT for opal_tree_item_t out of the
> debug-only part of opal_tree_construct, as shown below:
>
> static void opal_tree_construct(opal_tree_t *tree)
> {
>     OBJ_CONSTRUCT( &(tree->opal_tree_sentinel), opal_tree_item_t ); /* tmishima */
> #if OPAL_ENABLE_DEBUG
>     /* These refcounts should never be used in assertions because they
>        should never be removed from this list, added to another list,
>        etc. So set them to sentinel values. */
>
>     tree->opal_tree_sentinel.opal_tree_item_refcount = 1;
>     tree->opal_tree_sentinel.opal_tree_item_belong_to = tree;
> #endif
>     tree->opal_tree_sentinel.opal_tree_container = tree;
>     tree->opal_tree_sentinel.opal_tree_parent = &tree->opal_tree_sentinel;
>     tree->opal_tree_sentinel.opal_tree_num_ancestors = -1;
>
>     tree->opal_tree_sentinel.opal_tree_next_sibling = &tree->opal_tree_sentinel;
>     tree->opal_tree_sentinel.opal_tree_prev_sibling = &tree->opal_tree_sentinel;
>
>     tree->opal_tree_sentinel.opal_tree_first_child = &tree->opal_tree_sentinel;
>     tree->opal_tree_sentinel.opal_tree_last_child = &tree->opal_tree_sentinel;
>
>     tree->opal_tree_num_items = 0;
>     tree->comp = NULL;
>     tree->serialize = NULL;
>     tree->deserialize = NULL;
>     tree->get_key = NULL;
> }
>
> In addition, I checked how lama handles the hierarchy inversion.
> It did not work on node04, which has the inversion, and worked on
> node09, which has the normal one. Please forward this information
> to the lama developers.
>
> Regards,
> Tetsuya Mishima
>
> qsub: job 8380.manage.cluster completed
> [mishima@manage openmpi-1.7.4rc2r30069]$ qsub -I -l nodes=4:ppn=8
> qsub: waiting for job 8381.manage.cluster to start
> qsub: job 8381.manage.cluster ready
>
> [mishima@node09 ~]$ cd ~/Desktop/openmpi-1.7/demos/
> [mishima@node09 demos]$ mpirun -np 2 -report-bindings -mca rmaps lama -mca
> rmaps_lama_bind 1N -mca rmaps_lama_map Ncsbnh myprog
> [node09.cluster:20144] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]:
> [B/B/B/B][./././.]
> [node09.cluster:20144] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket
> 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]:
> [./././.][B/B/B/B]
> Hello world from process 1 of 2
> Hello world from process 0 of 2
> [mishima@node09 demos]$
>
>
> qsub: job 8383.manage.cluster completed
> [mishima@manage openmpi-1.7.4rc2r30069]$ qsub -I -l nodes=1:ppn=32
> qsub: waiting for job 8384.manage.cluster to start
> qsub: job 8384.manage.cluster ready
>
> [mishima@node04 ~]$ cd ~/Desktop/openmpi-1.7/demos/
> [mishima@node04 demos]$ mpirun -np 2 -report-bindings -mca rmaps lama -mca
> rmaps_lama_bind 1N -mca rmaps_lama_map Ncsbnh myprog
> --------------------------------------------------------------------------
> RMaps LAMA detected that there are not enough resources to map the
> remainder of the job. Check the command line options, and the number of
> nodes allocated to this job.
>   Application Context               : 0
>   # of Processes Successfully Mapped: 0
>   # of Processes Requested          : 2
>   Mapping  : Ncsbnh
>   Binding  : 1N
>   MPPR     : [Not Provided]
>   Ordering : s
> --------------------------------------------------------------------------
> [node04.cluster:20298] [[21003,0],0] ORTE_ERROR_LOG: Error in file
> rmaps_lama_module.c at line 309
>
> [node04.cluster:20298] [[21003,0],0] ORTE_ERROR_LOG: Error in file
> base/rmaps_base_map_job.c at line 217
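The bug fixed above is the classic debug-only-initialization trap: the sentinel item's constructor only ran when OPAL_ENABLE_DEBUG was set, so optimized builds read indeterminate memory. A minimal standalone sketch of that failure mode (hypothetical names, not Open MPI code) which valgrind flags the same way when built without -DDEBUG_BUILD:

    /* init_trap.c -- illustrative sketch only, not Open MPI source.
     * Build "debug":   gcc -DDEBUG_BUILD init_trap.c && valgrind ./a.out  (clean)
     * Build "release": gcc init_trap.c && valgrind ./a.out
     *   -> "Conditional jump or move depends on uninitialised value(s)"
     *      at the if() in add_child(), just like opal_tree_add_child here. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int num_children;   /* read by add_child() */
        int refcount;       /* debug-only bookkeeping */
    } tree_item_t;

    static tree_item_t *make_item(void)
    {
        tree_item_t *item = malloc(sizeof(*item));
    #ifdef DEBUG_BUILD
        /* Without this block, both fields stay uninitialized in
           non-debug builds -- the bug pattern fixed above. */
        item->num_children = 0;
        item->refcount = 1;
    #endif
        return item;
    }

    static void add_child(tree_item_t *parent)
    {
        if (parent->num_children == 0)  /* reads indeterminate memory in release builds */
            printf("adding first child\n");
        parent->num_children++;
    }

    int main(void)
    {
        tree_item_t *root = make_item();
        add_child(root);
        free(root);
        return 0;
    }

Moving the initialization out of the conditional block, exactly as the patch above does for OBJ_CONSTRUCT, makes both build flavors behave identically.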
>
>> Deeply appreciate all your help! Your fix looks reasonable to me, and this
>> is the kind of difference we frequently see between compilers and
>> environments, which is why initializing variables is so important.
>> This one apparently slipped by the lama developers.
>>
>> I'll apply it to trunk and cmr it across to 1.7.4.
>>
>> Thanks again
>> Ralph
>>
>> On Dec 25, 2013, at 3:39 AM, tmish...@jcity.maeda.co.jp wrote:
>>
>>>
>>>
>>> Hi Ralph,
>>>
>>> I ran valgrind and found uninitialised-value errors. All of them
>>> occurred in opal_tree_add_child, as shown at the bottom. As a quick
>>> fix, I put one line in "opal_tree.c", although it's not elegant:
>>>
>>> void opal_tree_init(opal_tree_t *tree, opal_tree_comp_fn_t comp,
>>>                     opal_tree_item_serialize_fn_t serialize,
>>>                     opal_tree_item_deserialize_fn_t deserialize,
>>>                     opal_tree_get_key_fn_t get_key)
>>> {
>>>     tree->comp = comp;
>>>     tree->serialize = serialize;
>>>     tree->deserialize = deserialize;
>>>     tree->get_key = get_key;
>>>     opal_tree_get_root(tree)->opal_tree_num_children = 0; /* added by tmishima */
>>> }
>>>
>>> Then these errors all disappeared, and Open MPI with lama worked fine.
>>> As I told you before, I built Open MPI with PGI 13.10. As far as I
>>> checked, no error was detected by valgrind with Open MPI built by the
>>> GNU compiler. Therefore, it might depend on the compiler...
>>> Anyway, I would like to ask you (or the Open MPI team) to investigate
>>> further.
>>>
>>> Regards,
>>> Tetsuya Mishima
>>>
>>> valgrind -v --error-limit=no --leak-check=yes --show-reachable=no mpirun
>>> -np 1 -mca rmaps lama -report-bindings -mca rmaps_base_verbose 100
>>> --display-map ~/Desktop/openmpi-1.7/demos/myprog 2>&1 | tee valgrind.log
>>>
>>> ....
>>> ==27313== Conditional jump or move depends on uninitialised value(s)
>>> ==27313==    at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
>>> ==27313==    by 0x81E3314: rmaps_lama_convert_hwloc_subtree (rmaps_lama_max_tree.c:320)
>>> ==27313==    by 0x81E321D: rmaps_lama_convert_hwloc_tree_to_opal_tree (rmaps_lama_max_tree.c:267)
>>> ==27313==    by 0x81E2EE8: rmaps_lama_build_max_tree (rmaps_lama_max_tree.c:154)
>>> ==27313==    by 0x81E0E58: orte_rmaps_lama_map_core (rmaps_lama_module.c:664)
>>> ==27313==    by 0x81E02D7: orte_rmaps_lama_map (rmaps_lama_module.c:303)
>>> ==27313==    by 0x4C6468B: orte_rmaps_base_map_job (rmaps_base_map_job.c:204)
>>> ==27313==    by 0x4F094CC: event_process_active_single_queue (event.c:1366)
>>> ==27313==    by 0x4F090D8: event_process_active (event.c:1434)
>>> ==27313==    by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
>>> ==27313==    by 0x4079A6: orterun (orterun.c:1049)
>>> ==27313==    by 0x40694A: main (main.c:13)
>>> .....
>>> ==27313== Conditional jump or move depends on uninitialised value(s)
>>> ==27313==    at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
>>> ==27313==    by 0x4EC5D0E: deserialize_add_tree_item (opal_tree.c:496)
>>> ==27313==    by 0x4EC5578: opal_tree_deserialize (opal_tree.c:524)
>>> ==27313==    by 0x4EC5609: opal_tree_dup (opal_tree.c:544)
>>> ==27313==    by 0x81E2FF6: rmaps_lama_build_max_tree (rmaps_lama_max_tree.c:202)
>>> ==27313==    by 0x81E0E58: orte_rmaps_lama_map_core (rmaps_lama_module.c:664)
>>> ==27313==    by 0x81E02D7: orte_rmaps_lama_map (rmaps_lama_module.c:303)
>>> ==27313==    by 0x4C6468B: orte_rmaps_base_map_job (rmaps_base_map_job.c:204)
>>> ==27313==    by 0x4F094CC: event_process_active_single_queue (event.c:1366)
>>> ==27313==    by 0x4F090D8: event_process_active (event.c:1434)
>>> ==27313==    by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
>>> ==27313==    by 0x4079A6: orterun (orterun.c:1049)
>>> ....
>>> ==27313== Conditional jump or move depends on uninitialised value(s)
>>> ==27313==    at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
>>> ==27313==    by 0x4EC5D0E: deserialize_add_tree_item (opal_tree.c:496)
>>> ==27313==    by 0x4EC5578: opal_tree_deserialize (opal_tree.c:524)
>>> ==27313==    by 0x4EC5609: opal_tree_dup (opal_tree.c:544)
>>> ==27313==    by 0x81E2FF6: ???
>>> ==27313==    by 0x81E0E58: ???
>>> ==27313==    by 0x81E02D7: ???
>>> ==27313==    by 0x4C6468B: orte_rmaps_base_map_job (rmaps_base_map_job.c:204)
>>> ==27313==    by 0x4F094CC: event_process_active_single_queue (event.c:1366)
>>> ==27313==    by 0x4F090D8: event_process_active (event.c:1434)
>>> ==27313==    by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
>>> ==27313==    by 0x4079A6: orterun (orterun.c:1049)
>>> .....
>>> ==27313== Conditional jump or move depends on uninitialised value(s)
>>> ==27313==    at 0x4EC52A4: opal_tree_add_child (opal_tree.c:191)
>>> ==27313==    by 0x81E3314: ???
>>> ==27313==    by 0x81E321D: ???
>>> ==27313==    by 0x81E2EE8: ???
>>> ==27313==    by 0x81E0E58: ???
>>> ==27313==    by 0x81E02D7: ???
>>> ==27313==    by 0x4C6468B: orte_rmaps_base_map_job (rmaps_base_map_job.c:204)
>>> ==27313==    by 0x4F094CC: event_process_active_single_queue (event.c:1366)
>>> ==27313==    by 0x4F090D8: event_process_active (event.c:1434)
>>> ==27313==    by 0x4F050FF: opal_libevent2021_event_base_loop (event.c:1645)
>>> ==27313==    by 0x4079A6: orterun (orterun.c:1049)
>>> ==27313==    by 0x40694A: main (main.c:13)
>>>
>>>
>>>
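The quick fix above and the final fix at the top of the thread meet at the same field: opal_tree_get_root(tree) returns the tree's embedded sentinel item, so hand-zeroing opal_tree_num_children at init time is equivalent to letting the item constructor run unconditionally in opal_tree_construct. A generic sketch of that constructor-chaining idea, with plain functions standing in for the OPAL class machinery (illustrative names only, not the OPAL API):

    /* ctor_sketch.c -- illustrative only; mimics the OBJ_CONSTRUCT idea. */
    #include <stdio.h>

    typedef struct tree_item {
        struct tree_item *first_child;
        int num_children;
    } tree_item_t;

    typedef struct {
        tree_item_t sentinel;   /* embedded root sentinel */
        int num_items;
    } tree_t;

    /* item "constructor": every field gets a defined value */
    static void tree_item_construct(tree_item_t *item)
    {
        item->first_child = NULL;
        item->num_children = 0;
    }

    /* tree "constructor": constructing the embedded sentinel here,
     * unconditionally, is the moral equivalent of moving OBJ_CONSTRUCT
     * out of the OPAL_ENABLE_DEBUG block. */
    static void tree_construct(tree_t *tree)
    {
        tree_item_construct(&tree->sentinel);
        tree->num_items = 0;
    }

    /* morally what opal_tree_get_root() does */
    static tree_item_t *tree_get_root(tree_t *tree)
    {
        return &tree->sentinel;
    }

    int main(void)
    {
        tree_t tree;
        tree_construct(&tree);
        /* the "quick fix" variant would instead have been:
         *   tree_get_root(&tree)->num_children = 0;   at init time */
        printf("root children: %d\n", tree_get_root(&tree)->num_children);
        return 0;
    }

Of the two, constructing the embedded sentinel is the more robust choice: every field the item type ever grows gets initialized in one place, instead of being patched field-by-field at each init site.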
>>>> Hi Ralph,
>>>>
>>>> Here is the output when I put "-mca rmaps_base_verbose 10 --display-map"
>>>> on the command line, and where it stopped (from gdb), which shows it
>>>> stopped in a lama function.
>>>>
>>>> I usually use PGI 13.10, so I tried changing to the GNU compiler.
>>>> Then it works. Therefore, this problem depends on the compiler.
>>>>
>>>> That's all I could find today.
>>>>
>>>> Regards,
>>>> Tetsuya Mishima
>>>>
>>>> [mishima@manage ~]$ gdb
>>>> GNU gdb (GDB) CentOS (7.0.1-42.el5.centos.1)
>>>> ....
>>>> (gdb) attach 14666
>>>> ....
>>>> 0x00002aaaab4c5c33 in rmaps_lama_prune_max_tree ()
>>>>     at ./rmaps_lama_max_tree.c:814
>>>>
>>>> [mishima@manage demos]$ mpirun -np 2 -mca rmaps lama -report-bindings -mca
>>>> rmaps_base_verbose 10 --display-map myprog
>>>> [manage.cluster:21503] mca: base: components_register: registering rmaps
>>>> components
>>>> [manage.cluster:21503] mca: base: components_register: found loaded
>>>> component lama
>>>> [manage.cluster:21503] mca:rmaps:lama: Priority  0
>>>> [manage.cluster:21503] mca:rmaps:lama: Map   : NULL
>>>> [manage.cluster:21503] mca:rmaps:lama: Bind  : NULL
>>>> [manage.cluster:21503] mca:rmaps:lama: MPPR  : NULL
>>>> [manage.cluster:21503] mca:rmaps:lama: Order : NULL
>>>> [manage.cluster:21503] mca: base: components_register: component lama
>>>> register function successful
>>>> [manage.cluster:21503] mca: base: components_open: opening rmaps
>>>> components
>>>> [manage.cluster:21503] mca: base: components_open: found loaded
>>>> component lama
>>>> [manage.cluster:21503] mca:rmaps:select: checking available component
>>>> lama
>>>> [manage.cluster:21503] mca:rmaps:select: Querying component [lama]
>>>> [manage.cluster:21503] [[23940,0],0]: Final mapper priorities
>>>> [manage.cluster:21503]  Mapper: lama Priority: 0
>>>> [manage.cluster:21503] mca:rmaps: mapping job [23940,1]
>>>> [manage.cluster:21503] mca:rmaps: creating new map for job [23940,1]
>>>> [manage.cluster:21503] mca:rmaps: nprocs 2
>>>> [manage.cluster:21503] mca:rmaps:lama: Mapping job [23940,1]
>>>> [manage.cluster:21503] mca:rmaps:lama: Revised Parameters -----
>>>> [manage.cluster:21503] mca:rmaps:lama: Map   : csbnh
>>>> [manage.cluster:21503] mca:rmaps:lama: Bind  : 1c
>>>> [manage.cluster:21503] mca:rmaps:lama: MPPR  : (null)
>>>> [manage.cluster:21503] mca:rmaps:lama: Order : s
>>>> [manage.cluster:21503] mca:rmaps:lama: ---------------------------------
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Binding  : [1c]
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Binding  : 1 x Core
>>>> [manage.cluster:21503] mca:rmaps:lama: ---------------------------------
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Mapping  : [csbnh]
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Mapping  : (0) Core (7 vs 0)
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Mapping  : (1) Socket (3 vs 1)
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Mapping  : (2) Board (1 vs 3)
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Mapping  : (3) Machine (0 vs 7)
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Mapping  : (4) Hw. Thread (8 vs 8)
>>>> [manage.cluster:21503] mca:rmaps:lama: ---------------------------------
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- MPPR     : [(null)]
>>>> [manage.cluster:21503] mca:rmaps:lama: ---------------------------------
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Ordering : [s]
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Ordering : Sequential
>>>> [manage.cluster:21503] mca:rmaps:lama: ---------------------------------
>>>> [manage.cluster:21503] AVAILABLE NODES FOR MAPPING:
>>>> [manage.cluster:21503]     node: manage daemon: 0
>>>> [manage.cluster:21503] mca:rmaps:lama: ---------------------------------
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Building the Max Tree...
>>>> [manage.cluster:21503] mca:rmaps:lama: ---------------------------------
>>>> [manage.cluster:21503] mca:rmaps:lama: ----- Converting Remote Tree: manage
>>>>
>>>> [mishima@manage demos]$ ompi_info | grep "C compiler family"
>>>>      C compiler family name: GNU
>>>> [mishima@manage demos]$ mpirun -np 2 -mca rmaps lama myprog
>>>> Hello world from process 0 of 2
>>>> Hello world from process 1 of 2
>>>>
>>>>
>>>>
>>>>> On Dec 21, 2013, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Ralph, thanks. I'll try it on Tuesday.
>>>>>>
>>>>>> Let me confirm one thing. I don't put "-with-libevent" when I build
>>>>>> Open MPI. Is there any possibility that it builds with an external
>>>>>> libevent automatically?
>>>>>
>>>>> No - only happens if you direct it
>>>>>
>>>>>
>>>>>>
>>>>>> Tetsuya Mishima
>>>>>>
>>>>>>
>>>>>>> Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to
>>>>>>> your cmd line and let's see if it finishes the mapping.
>>>>>>>
>>>>>>> Unless you specifically built with an external libevent (which I
>>>>>>> doubt), there is no conflict. The connection issue is unlikely to be
>>>>>>> a factor here, as it works when not using the lama mapper.
>>>>>>>
>>>>>>>
>>>>>>> On Dec 21, 2013, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you, Ralph.
>>>>>>>>
>>>>>>>> Then this problem should depend on our environment.
>>>>>>>> But, at least, the inversion problem is not the cause, because
>>>>>>>> node05 has the normal hierarchy order.
>>>>>>>>
>>>>>>>> I cannot connect to our cluster now. On Tuesday, back in my
>>>>>>>> office, I'll send you a further report.
>>>>>>>>
>>>>>>>> Before that, please let me know your configuration. I will
>>>>>>>> follow your configuration as much as possible. Ours is very
>>>>>>>> simple: only -with-tm -with-ibverbs -disable-ipv6
>>>>>>>> (on CentOS 5.8).
>>>>>>>>
>>>>>>>> The 1.7 series is still a little bit unstable on our cluster.
>>>>>>>>
>>>>>>>> Similar freezing (hang-up) was observed with 1.7.3. At that
>>>>>>>> time lama worked well, but putting "-rank-by something" caused
>>>>>>>> the same freeze (curiously, rank-by works with 1.7.4rc1).
>>>>>>>> I checked where it stopped using gdb, and found that it was
>>>>>>>> waiting for an event in a libevent function (I cannot recall
>>>>>>>> the name).
>>>>>>>>
>>>>>>>> Is this related to your "connection issue in the OOB
>>>>>>>> subsystem"? Or a libevent version conflict? I guess these two
>>>>>>>> problems are related to each other. They stopped at a very
>>>>>>>> early stage, before reaching the mapping function, because no
>>>>>>>> message appeared before the freeze - that is my random guess.
>>>>>>>>
>>>>>>>> Could you give me any hint or comment?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tetsuya Mishima
>>>>>>>>
>>>>>>>>
>>>>>>>>> It seems to be working fine for me:
>>>>>>>>>
>>>>>>>>> [rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca
>>>>>>>>> rmaps_lama_bind 1c -mca rmaps lama hostname
>>>>>>>>> bend001
>>>>>>>>> [bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
>>>>>>>>> [../BB/../../../..][../../../../../..]
>>>>>>>>> [bend001:17005] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>>>>>>>>> [BB/../../../../..][../../../../../..]
>>>>>>>>> bend001
>>>>>>>>> [rhc@bend001 tcp]$
>>>>>>>>>
>>>>>>>>> (I also checked the internals using "-mca rmaps_base_verbose 10"),
>>>>>>>>> so it could be your hier inversion causing problems again.
>>>>>>>>> Or it could be that you are hitting a connection issue we are
>>>>>>>>> seeing in some scenarios in the OOB subsystem - though if you are
>>>>>>>>> able to run using a non-lama mapper, that would seem unlikely.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Dec 20, 2013, at 8:09 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Ralph,
>>>>>>>>>
>>>>>>>>> Thank you very much. I tried many things, such as:
>>>>>>>>>
>>>>>>>>> mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
>>>>>>>>> rmaps_lama_bind 1c myprog
>>>>>>>>>
>>>>>>>>> But every try failed. At least they were accepted by openmpi-1.7.3,
>>>>>>>>> as far as I remember. Anyway, please check it when you have time;
>>>>>>>>> using lama just comes from my curiosity.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'll try to take a look at it - my expectation is that lama might
>>>>>>>>> get stuck because you didn't tell it a pattern to map, and I doubt
>>>>>>>>> that code path has seen much testing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Ralph, I'm glad to hear that, thanks.
>>>>>>>>>
>>>>>>>>> By the way, yesterday I tried to check how lama in 1.7.4rc treats
>>>>>>>>> the numa-node.
>>>>>>>>>
>>>>>>>>> Then, even with this simple command line, it froze without any
>>>>>>>>> message:
>>>>>>>>>
>>>>>>>>> mpirun -np 2 -host node05 -mca rmaps lama myprog
>>>>>>>>>
>>>>>>>>> Could you check what happened?
>>>>>>>>>
>>>>>>>>> Is it better to open a new thread or continue this one?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'll make it work so that NUMA can be either above or below socket
>>>>>>>>>
>>>>>>>>> On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Brice,
>>>>>>>>>
>>>>>>>>> Thank you for your comment. I understand what you mean.
>>>>>>>>>
>>>>>>>>> My opinion was made just by considering an easy way to adjust the
>>>>>>>>> code for the inversion of hierarchy in the object tree.
>>>>>>>>>
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't think there's any such difference.
>>>>>>>>> Also, all these NUMA architectures are reported the same by hwloc,
>>>>>>>>> and therefore used the same in Open MPI.
>>>>>>>>>
>>>>>>>>> And yes, L3 and NUMA are topologically identical on AMD Magny-Cours
>>>>>>>>> (and most recent AMD and Intel platforms).
>>>>>>>>>
>>>>>>>>> Brice
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 20/12/2013 11:33, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>> Hi Ralph,
>>>>>>>>>
>>>>>>>>> The numa-node in AMD Magny-Cours/Interlagos is so-called cc (cache
>>>>>>>>> coherent) NUMA, which seems to be a little bit different from the
>>>>>>>>> traditional numa defined in Open MPI.
>>>>>>>>>
>>>>>>>>> I notice that the ccNUMA object is almost the same as the L3cache
>>>>>>>>> object. So "-bind-to l3cache" or "-map-by l3cache" is valid for
>>>>>>>>> what I want to do. Therefore, "do not touch it" is one of the
>>>>>>>>> solutions, I think ...
>>>>>>>>>
>>>>>>>>> Anyway, mixing up these two types of numa is the problem.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tetsuya Mishima
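Whether a given node is "inverted" in this sense can be checked directly against hwloc, independent of Open MPI. A minimal sketch, assuming the hwloc 1.x API that was current for this thread (HWLOC_OBJ_NODE/HWLOC_OBJ_SOCKET; hwloc 2.x renames these to HWLOC_OBJ_NUMANODE/HWLOC_OBJ_PACKAGE and moves NUMA nodes out of the main tree):

    /* hier_check.c -- illustrative sketch, hwloc 1.x API assumed.
     * Compile: gcc hier_check.c -lhwloc
     * Reports whether NUMA nodes sit above or below sockets in the
     * topology tree -- the "hierarchy inversion" discussed here. */
    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* larger depth = farther from the Machine root */
        int numa_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_NODE);
        int sock_depth = hwloc_get_type_depth(topo, HWLOC_OBJ_SOCKET);

        if (numa_depth < 0 || sock_depth < 0)
            printf("NUMA node or socket not reported on this machine\n");
        else if (numa_depth > sock_depth)
            printf("sockets contain NUMA nodes (node03-style)\n");
        else
            printf("NUMA nodes contain sockets (node05-style)\n");

        hwloc_topology_destroy(topo);
        return 0;
    }

On node03 this would report sockets containing NUMA nodes; on node05, the opposite - the two orderings the mapper has to handle.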
>>>>>>>>>
>>>>>>>>> I can wait until it's fixed in 1.7.5 or later, because putting
>>>>>>>>> "-bind-to numa" and "-map-by numa" at the same time works as a
>>>>>>>>> workaround.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>> Yeah, it will impact everything that uses hwloc topology maps,
>>>>>>>>> I fear.
>>>>>>>>>
>>>>>>>>> One side note: you'll need to add --hetero-nodes to your cmd line.
>>>>>>>>> If we don't see that, we assume that all the node topologies are
>>>>>>>>> identical - which clearly isn't true here.
>>>>>>>>> I'll try to resolve the hier inversion over the holiday - won't be
>>>>>>>>> for 1.7.4, but hopefully for 1.7.5
>>>>>>>>> Thanks
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>> On Dec 18, 2013, at 9:44 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>> I think it's normal for AMD Opterons having 8/16 cores, such as
>>>>>>>>> Magny-Cours or Interlagos. Because they usually have 2 numa-nodes
>>>>>>>>> in a CPU (socket), a numa-node cannot include a socket. This type
>>>>>>>>> of hierarchy would be natural.
>>>>>>>>>
>>>>>>>>> (node03 is a Dell PowerEdge R815 and maybe quite common, I guess)
>>>>>>>>>
>>>>>>>>> By the way, I think this inversion should affect rmaps_lama
>>>>>>>>> mapping.
>>>>>>>>>
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>> Ick - yeah, that would be a problem. I haven't seen that type of
>>>>>>>>> hierarchical inversion before - is node03 a different type of
>>>>>>>>> chip? Might take awhile for me to adjust the code to handle hier
>>>>>>>>> inversion... :-(
>>>>>>>>>
>>>>>>>>> On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>> Hi Ralph,
>>>>>>>>>
>>>>>>>>> I found the reason. I attached the main part of the output for the
>>>>>>>>> 32-core node (node03) and the 8-core node (node05) at the bottom.
>>>>>>>>>
>>>>>>>>> From this information, a socket of node03 includes numa-nodes.
>>>>>>>>> On the other hand, a numa-node of node05 includes a socket.
>>>>>>>>> The direction of the object tree is opposite.
>>>>>>>>>
>>>>>>>>> Since "-map-by socket" may be assumed as the default, for node05
>>>>>>>>> "-bind-to numa and -map-by socket" means an upward search. For
>>>>>>>>> node03, this should be downward.
>>>>>>>>>
>>>>>>>>> I guess that openmpi-1.7.4rc1 will always assume that the
>>>>>>>>> numa-node includes the socket. Is that right? Then an upward
>>>>>>>>> search is assumed in orte_rmaps_base_compute_bindings even for
>>>>>>>>> node03 when I put the "-bind-to numa and -map-by socket" option.
>>>>>>>>>
>>>>>>>>> [node03.cluster:15508] [[38286,0],0] rmaps:base:compute_usage
>>>>>>>>> [node03.cluster:15508] mca:rmaps: compute bindings for job
>>>>>>>>> [38286,1] with policy NUMA
>>>>>>>>> [node03.cluster:15508] mca:rmaps: bind upwards for job [38286,1]
>>>>>>>>> with bindings NUMA
>>>>>>>>> [node03.cluster:15508] [[38286,0],0] bind:upward target NUMANode
>>>>>>>>> type Machine
>>>>>>>>>
>>>>>>>>> That's the reason for this trouble. Therefore, adding "-map-by
>>>>>>>>> core" works (though the mapping pattern seems strange ...).
>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -bind-to numa -map-by core
>>>>>>>>> -report-bindings myprog
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type Cache
>>>>>>>>> [node03.cluster:15885] [[38679,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node03.cluster:15885] MCW rank 2 bound to socket 0[core 0[hwt 0]],
>>>>>>>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]:
>>>>>>>>> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15885] MCW rank 3 bound to socket 0[core 0[hwt 0]],
>>>>>>>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]:
>>>>>>>>> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15885] MCW rank 4 bound to socket 0[core 4[hwt 0]],
>>>>>>>>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]:
>>>>>>>>> [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15885] MCW rank 5 bound to socket 0[core 4[hwt 0]],
>>>>>>>>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]:
>>>>>>>>> [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15885] MCW rank 6 bound to socket 0[core 4[hwt 0]],
>>>>>>>>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]:
>>>>>>>>> [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15885] MCW rank 7 bound to socket 0[core 4[hwt 0]],
>>>>>>>>> socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]:
>>>>>>>>> [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15885] MCW rank 0 bound to socket 0[core 0[hwt 0]],
>>>>>>>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]:
>>>>>>>>> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15885] MCW rank 1 bound to socket 0[core 0[hwt 0]],
>>>>>>>>> socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]:
>>>>>>>>> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>>>>>>>> Hello world from process 6 of 8
>>>>>>>>> Hello world from process 5 of 8
>>>>>>>>> Hello world from process 0 of 8
>>>>>>>>> Hello world from process 7 of 8
>>>>>>>>> Hello world from process 3 of 8
>>>>>>>>> Hello world from process 4 of 8
>>>>>>>>> Hello world from process 2 of 8
>>>>>>>>> Hello world from process 1 of 8
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>> [node03.cluster:15508] Type: Machine Number of child objects: 4
>>>>>>>>>   Name=NULL
>>>>>>>>>   total=132358820KB
>>>>>>>>>   Backend=Linux
>>>>>>>>>   OSName=Linux
>>>>>>>>>   OSRelease=2.6.18-308.16.1.el5
>>>>>>>>>   OSVersion="#1 SMP Tue Oct 2 22:01:43 EDT 2012"
>>>>>>>>>   Architecture=x86_64
>>>>>>>>>   Cpuset:  0xffffffff
>>>>>>>>>   Online:  0xffffffff
>>>>>>>>>   Allowed: 0xffffffff
>>>>>>>>>   Bind CPU proc:   TRUE
>>>>>>>>>   Bind CPU thread: TRUE
>>>>>>>>>   Bind MEM proc:   FALSE
>>>>>>>>>   Bind MEM thread: TRUE
>>>>>>>>>   Type: Socket Number of child objects: 2
>>>>>>>>>     Name=NULL
>>>>>>>>>     total=33071780KB
>>>>>>>>>     CPUModel="AMD Opteron(tm) Processor 6136"
>>>>>>>>>     Cpuset:  0x000000ff
>>>>>>>>>     Online:  0x000000ff
>>>>>>>>>     Allowed: 0x000000ff
>>>>>>>>>     Type: NUMANode Number of child objects: 1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [node05.cluster:21750] Type: Machine Number of child objects: 2
>>>>>>>>>   Name=NULL
>>>>>>>>>   total=33080072KB
>>>>>>>>>   Backend=Linux
>>>>>>>>>   OSName=Linux
>>>>>>>>>   OSRelease=2.6.18-308.16.1.el5
>>>>>>>>>   OSVersion="#1 SMP Tue Oct 2 22:01:43 EDT 2012"
>>>>>>>>>   Architecture=x86_64
>>>>>>>>>   Cpuset:  0x000000ff
>>>>>>>>>   Online:  0x000000ff
>>>>>>>>>   Allowed: 0x000000ff
>>>>>>>>>   Bind CPU proc:   TRUE
>>>>>>>>>   Bind CPU thread: TRUE
>>>>>>>>>   Bind MEM proc:   FALSE
>>>>>>>>>   Bind MEM thread: TRUE
>>>>>>>>>   Type: NUMANode Number of child objects: 1
>>>>>>>>>     Name=NULL
>>>>>>>>>     local=16532232KB
>>>>>>>>>     total=16532232KB
>>>>>>>>>     Cpuset:  0x0000000f
>>>>>>>>>     Online:  0x0000000f
>>>>>>>>>     Allowed: 0x0000000f
>>>>>>>>>     Type: Socket Number of child objects: 1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose
>>>>>>>>> 5" to your cmd line and let's see what it thinks it found.
>>>>>>>>>
>>>>>>>>> On Dec 18, 2013, at 6:55 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>> Hi, I report one more problem with openmpi-1.7.4rc1, which is more
>>>>>>>>> serious.
>>>>>>>>>
>>>>>>>>> For our 32-core nodes (AMD Magny-Cours based), which have 8
>>>>>>>>> numa-nodes, "-bind-to numa" does not work. Without this option, it
>>>>>>>>> works. For your information, I added the lstopo output for the
>>>>>>>>> node at the bottom of this mail.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>> [mishima@manage ~]$ qsub -I -l nodes=1:ppn=32
>>>>>>>>> qsub: waiting for job 8352.manage.cluster to start
>>>>>>>>> qsub: job 8352.manage.cluster ready
>>>>>>>>>
>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -report-bindings -bind-to numa
>>>>>>>>> myprog
>>>>>>>>> [node03.cluster:15316] [[37582,0],0] bind:upward target NUMANode
>>>>>>>>> type Machine
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> A request was made to bind to NUMA, but an appropriate target
>>>>>>>>> could not be found on node node03.
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> [mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/
>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -report-bindings myprog
>>>>>>>>> [node03.cluster:15282] MCW rank 2 bound to socket 1[core 8[hwt 0]]:
>>>>>>>>> [./././././././.][B/././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15282] MCW rank 3 bound to socket 1[core 9[hwt 0]]:
>>>>>>>>> [./././././././.][./B/./././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15282] MCW rank 4 bound to socket 2[core 16[hwt 0]]:
>>>>>>>>> [./././././././.][./././././././.][B/././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15282] MCW rank 5 bound to socket 2[core 17[hwt 0]]:
>>>>>>>>> [./././././././.][./././././././.][./B/./././././.][./././././././.]
>>>>>>>>> [node03.cluster:15282] MCW rank 6 bound to socket 3[core 24[hwt 0]]:
>>>>>>>>> [./././././././.][./././././././.][./././././././.][B/././././././.]
>>>>>>>>> [node03.cluster:15282] MCW rank 7 bound to socket 3[core 25[hwt 0]]:
>>>>>>>>> [./././././././.][./././././././.][./././././././.][./B/./././././.]
>>>>>>>>> [node03.cluster:15282] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
>>>>>>>>> [B/././././././.][./././././././.][./././././././.][./././././././.]
>>>>>>>>> [node03.cluster:15282] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
>>>>>>>>> [./B/./././././.][./././././././.][./././././././.][./././././././.]
>>>>>>>>> Hello world from process 2 of 8
>>>>>>>>> Hello world from process 5 of 8
>>>>>>>>> Hello world from process 4 of 8
>>>>>>>>> Hello world from process 3 of 8
>>>>>>>>> Hello world from process 1 of 8
>>>>>>>>> Hello world from process 7 of 8
>>>>>>>>> Hello world from process 6 of 8
>>>>>>>>> Hello world from process 0 of 8
>>>>>>>>>
>>>>>>>>> [mishima@node03 demos]$ ~/opt/hwloc/bin/lstopo-no-graphics
>>>>>>>>> Machine (126GB)
>>>>>>>>>   Socket L#0 (32GB)
>>>>>>>>>     NUMANode L#0 (P#0 16GB) + L3 L#0 (5118KB)
>>>>>>>>>       L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
>>>>>>>>>       L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
>>>>>>>>>       L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
>>>>>>>>>       L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
>>>>>>>>>     NUMANode L#1 (P#1 16GB) + L3 L#1 (5118KB)
>>>>>>>>>       L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
>>>>>>>>>       L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
>>>>>>>>>       L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
>>>>>>>>>       L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
>>>>>>>>>   Socket L#1 (32GB)
>>>>>>>>>     NUMANode L#2 (P#6 16GB) + L3 L#2 (5118KB)
>>>>>>>>>       L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
>>>>>>>>>       L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
>>>>>>>>>       L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
>>>>>>>>>       L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
>>>>>>>>>     NUMANode L#3 (P#7 16GB) + L3 L#3 (5118KB)
>>>>>>>>>       L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
>>>>>>>>>       L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
>>>>>>>>>       L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
>>>>>>>>>       L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
>>>>>>>>>   Socket L#2 (32GB)
>>>>>>>>>     NUMANode L#4 (P#4 16GB) + L3 L#4 (5118KB)
>>>>>>>>>       L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
>>>>>>>>>       L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
>>>>>>>>>       L2 L#18 (512KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
>>>>>>>>>       L2 L#19 (512KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
>>>>>>>>>     NUMANode L#5 (P#5 16GB) + L3 L#5 (5118KB)
>>>>>>>>>       L2 L#20 (512KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU L#20 (P#20)
>>>>>>>>>       L2 L#21 (512KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU L#21 (P#21)
>>>>>>>>>       L2 L#22 (512KB) + L1d L#22 (64KB) + L1i L#22 (64KB) + Core L#22 + PU L#22 (P#22)
>>>>>>>>>       L2 L#23 (512KB) + L1d L#23 (64KB) + L1i L#23 (64KB) + Core L#23 + PU L#23 (P#23)
>>>>>>>>>   Socket L#3 (32GB)
>>>>>>>>>     NUMANode L#6 (P#2 16GB) + L3 L#6 (5118KB)
>>>>>>>>>       L2 L#24 (512KB) + L1d L#24 (64KB) + L1i L#24 (64KB) + Core L#24 + PU L#24 (P#24)
>>>>>>>>>       L2 L#25 (512KB) + L1d L#25 (64KB) + L1i L#25 (64KB) + Core L#25 + PU L#25 (P#25)
>>>>>>>>>       L2 L#26 (512KB) + L1d L#26 (64KB) + L1i L#26 (64KB) + Core L#26 + PU L#26 (P#26)
>>>>>>>>>       L2 L#27 (512KB) + L1d L#27 (64KB) + L1i L#27 (64KB) + Core L#27 + PU L#27 (P#27)
>>>>>>>>>     NUMANode L#7 (P#3 16GB) + L3 L#7 (5118KB)
>>>>>>>>>       L2 L#28 (512KB) + L1d L#28 (64KB) + L1i L#28 (64KB) + Core L#28 + PU L#28 (P#28)
>>>>>>>>>       L2 L#29 (512KB) + L1d L#29 (64KB) + L1i L#29 (64KB) + Core L#29 + PU L#29 (P#29)
>>>>>>>>>       L2 L#30 (512KB) + L1d L#30 (64KB) + L1i L#30 (64KB) + Core L#30 + PU L#30 (P#30)
>>>>>>>>>       L2 L#31 (512KB) + L1d L#31 (64KB) + L1i L#31 (64KB) + Core L#31 + PU L#31 (P#31)
>>>>>>>>>   HostBridge L#0
>>>>>>>>>     PCIBridge
>>>>>>>>>       PCI 14e4:1639
>>>>>>>>>         Net L#0 "eth0"
>>>>>>>>>       PCI 14e4:1639
>>>>>>>>>         Net L#1 "eth1"
>>>>>>>>>     PCIBridge
>>>>>>>>>       PCI 14e4:1639
>>>>>>>>>         Net L#2 "eth2"
>>>>>>>>>       PCI 14e4:1639
>>>>>>>>>         Net L#3 "eth3"
>>>>>>>>>     PCIBridge
>>>>>>>>>       PCIBridge
>>>>>>>>>         PCIBridge
>>>>>>>>>           PCI 1000:0072
>>>>>>>>>             Block L#4 "sdb"
>>>>>>>>>             Block L#5 "sda"
>>>>>>>>>     PCI 1002:4390
>>>>>>>>>       Block L#6 "sr0"
>>>>>>>>>     PCIBridge
>>>>>>>>>       PCI 102b:0532
>>>>>>>>>   HostBridge L#7
>>>>>>>>>     PCIBridge
>>>>>>>>>       PCI 15b3:6274
>>>>>>>>>         Net L#7 "ib0"
>>>>>>>>>         OpenFabrics L#8 "mthca0"
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users