Added this info to the ticket, and added you to it as well.
Thanks again
Ralph
On Dec 25, 2013, at 3:42 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Hi Ralph,
>
> Thank you for your reply. After that, I found a more reasonable fix,
> I guess. I moved OBJ_CONSTRUCT for opal_tree_item_t out of de
Hi Ralph,
Thank you for your reply. After that, I found a more reasonable fix,
I guess. I moved OBJ_CONSTRUCT for opal_tree_item_t out of the debug
part in opal_tree_construct as shown below:
static void opal_tree_construct(opal_tree_t *tree)
{
OBJ_CONSTRUCT( &(tree->opal_tree_sentinel), opal_tre
Deeply appreciate all your help! Your fix looks reasonable to me and is the kind
of difference we frequently see between compilers and environments, which is
why initializing variables is so important. This one apparently slipped by the
lama developers.
I'll apply to trunk and cmr it across to 1
Hi Ralph,
I ran valgrind and found uninitialised value errors. All of them
occurred in opal_tree_add_child, as shown at the bottom. As a quick
fix, I put one line in "opal_tree.c", although it's not elegant:
void opal_tree_init(opal_tree_t *tree, opal_tree_comp_fn_t comp,
op
Hi Ralph,
Here is the output when I put "-mca rmaps_base_verbose 10 --display-map"
and where it stopped (by gdb), which shows it stopped in a function of lama.
I usually use PGI 13.10, so I tried switching to the GNU compiler.
Then it works. Therefore, this problem depends on the compiler.
That's a
On Dec 21, 2013, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Ralph, thanks. I'll try it on Tuesday.
>
> Let me confirm one thing. I don't put "-with-libevent" when I build
> openmpi.
> Is there any possibility to build with external libevent automatically?
No - only happens if you dir
Ralph, thanks. I'll try it on Tuesday.
Let me confirm one thing. I don't put "-with-libevent" when I build
openmpi.
Is there any possibility to build with external libevent automatically?
Tetsuya Mishima
> Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to
your cmd line and
Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to your cmd
line and let's see if it finishes the mapping.
Unless you specifically built with an external libevent (which I doubt), there
is no conflict. The connection issue is unlikely to be a factor here as it
works when not
Thank you, Ralph.
Then this problem must depend on our environment.
But at least the inversion problem is not the cause, because
node05 has the normal hier order.
I cannot connect to our cluster now. On Tuesday, when I'm
back at my office, I'll send you a further report.
Before that, please let me know y
It seems to be working fine for me:
[rhc@bend001 tcp]$ mpirun -np 2 -host bend001 -report-bindings -mca
rmaps_lama_bind 1c -mca rmaps lama hostname
bend001
[bend001:17005] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]:
[../BB/../../../..][../../../../../..]
[bend001:17005] MCW rank 0 bound to so
Hi Ralph,
Thank you very much. I tried many things such as:
mpirun -np 2 -host node05 -report-bindings -mca rmaps lama -mca
rmaps_lama_bind 1c myprog
But every attempt failed. At least they were accepted by openmpi-1.7.3, as far
as I remember.
Anyway, please check it when you have time, because u
I'll try to take a look at it - my expectation is that lama might get stuck
because you didn't tell it a pattern to map, and I doubt that code path has
seen much testing.
On Dec 20, 2013, at 5:52 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Hi Ralph, I'm glad to hear that, thanks.
>
> By the
Hi Ralph, I'm glad to hear that, thanks.
By the way, yesterday I tried to check how lama in 1.7.4rc treats NUMA nodes.
Then, even with this simple command line, it froze without any message:
mpirun -np 2 -host node05 -mca rmaps lama myprog
Could you check what happened?
Is it better to ope
I'll make it work so that NUMA can be either above or below socket
On Dec 20, 2013, at 2:57 AM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Hi Brice,
>
> Thank you for your comment. I understand what you mean.
>
> My opinion was just based on considering an easy way to adjust the code for
> inversion of
Hi Brice,
Thank you for your comment. I understand what you mean.
My opinion was just based on considering an easy way to adjust the code for
the inversion of hierarchy in the object tree.
Tetsuya Mishima
> I don't think there's any such difference.
> Also, all these NUMA architectures are reported the sam
I don't think there's any such difference.
Also, all these NUMA architectures are reported the same by hwloc, and
therefore used the same in Open MPI.
And yes, L3 and NUMA are topologically-identical on AMD Magny-Cours (and
most recent AMD and Intel platforms).
Brice
Le 20/12/2013 11:33, tmish
Hi Ralph,
The numa-node in AMD Magny-Cours/Interlagos is so-called cc (cache
coherent) NUMA, which seems to be a little bit different from the
traditional numa defined in openmpi.
I notice that the ccNUMA object is almost the same as the L3cache object.
So "-bind-to l3cache" or "-map-by l3cache" is valid for
I can wait until it is fixed in 1.7.5 or later, because putting
"-bind-to numa" and "-map-by numa" at the same time works as a workaround.
Thanks,
Tetsuya Mishima
> Yeah, it will impact everything that uses hwloc topology maps, I fear.
>
> One side note: you'll need to add --hetero-nodes to your c
Yeah, it will impact everything that uses hwloc topology maps, I fear.
One side note: you'll need to add --hetero-nodes to your cmd line. If we don't
see that, we assume that all the node topologies are identical - which clearly
isn't true here.
I'll try to resolve the hier inversion over the h
I think it's normal for AMD Opterons having 8/16 cores, such as
Magny-Cours or Interlagos. Because such a chip usually has 2 numa nodes
in a cpu (socket), a numa-node cannot include a socket. This type
of hierarchy would be natural.
(node03 is a Dell PowerEdge R815 and maybe quite common, I guess)
By the way,
Ick - yeah, that would be a problem. I haven't seen that type of hierarchical
inversion before - is node03 a different type of chip?
Might take a while for me to adjust the code to handle hier inversion... :-(
On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Hi Ralph,
>
>
Hi Ralph,
I found the reason. I attached the main part of the output for the 32
core node (node03) and the 8 core node (node05) at the bottom.
From this information, the socket of node03 includes the numa-node.
On the other hand, the numa-node of node05 includes the socket.
The direction of the object tree is opposite.
Since
Hi, here is the output with "-mca rmaps_base_verbose 10
-mca ess_base_verbose 5". Please see the attached file.
(See attached file: output.txt)
Regards,
Tetsuya Mishima
> Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to
your cmd line and let's see what it thinks it foun
Hmm...try adding "-mca rmaps_base_verbose 10 -mca ess_base_verbose 5" to your
cmd line and let's see what it thinks it found.
On Dec 18, 2013, at 6:55 PM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Hi, I report one more problem with openmpi-1.7.4rc1,
> which is more serious.
>
> For our 32 core
Hi, I report one more problem with openmpi-1.7.4rc1,
which is more serious.
For our 32 core nodes (AMD Magny-Cours based), which have
8 numa-nodes, "-bind-to numa" does not work. Without
this option, it works. For your information, at the
bottom of this mail, I added the lstopo information
of the no