[hwloc-devel] Create success (hwloc r1.2.1rc1r3540)

2011-07-04 Thread MPI Team
Creating nightly hwloc snapshot SVN tarball was a success.

Snapshot:   hwloc 1.2.1rc1r3540
Start time: Mon Jul  4 21:03:32 EDT 2011
End time:   Mon Jul  4 21:05:48 EDT 2011

Your friendly daemon,
Cyrador


[hwloc-devel] Create success (hwloc r1.3a1r3537)

2011-07-04 Thread MPI Team
Creating nightly hwloc snapshot SVN tarball was a success.

Snapshot:   hwloc 1.3a1r3537
Start time: Mon Jul  4 21:01:02 EDT 2011
End time:   Mon Jul  4 21:03:31 EDT 2011

Your friendly daemon,
Cyrador


Re: [OMPI devel] TIPC BTL Segmentation fault

2011-07-04 Thread Xin He

Hi, here is the result:

ehhexxn@oak:~/git/test$ mpirun -n 2 -mca btl tipc,self valgrind 
./hello_c > 11.out

==30850== Memcheck, a memory error detector
==30850== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==30850== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for 
copyright info

==30850== Command: ./hello_c
==30850==
==30849== Memcheck, a memory error detector
==30849== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==30849== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for 
copyright info

==30849== Command: ./hello_c
==30849==
==30849== Jump to the invalid address stated on the next line
==30849==at 0xDEAFBEEDDEAFBEED: ???
==30849==by 0x50151F1: opal_list_construct (opal_list.c:88)
==30849==by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30849==by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30849==by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30849==by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30849==by 0xA8A12FA: opal_obj_new_debug (opal_object.h:252)
==30849==by 0xA8A2A5F: mca_pml_ob1_add_comm (pml_ob1.c:182)
==30849==by 0x4E95F50: ompi_mpi_init (ompi_mpi_init.c:770)
==30849==by 0x4EC6C32: PMPI_Init (pinit.c:84)
==30849==by 0x400935: main (in /home/ehhexxn/git/test/hello_c)
==30849==  Address 0xdeafbeeddeafbeed is not stack'd, malloc'd or 
(recently) free'd

==30849==
[oak:30849] *** Process received signal ***
[oak:30849] Signal: Segmentation fault (11)
[oak:30849] Signal code: Invalid permissions (2)
[oak:30849] Failing at address: 0xdeafbeeddeafbeed
==30849== Invalid read of size 1
==30849==at 0xA011FDB: ??? (in /lib/libgcc_s.so.1)
==30849==by 0xA012B0B: _Unwind_Backtrace (in /lib/libgcc_s.so.1)
==30849==by 0x60BE69D: backtrace (backtrace.c:91)
==30849==by 0x4FAB055: opal_backtrace_buffer (backtrace_execinfo.c:54)
==30849==by 0x5026DF3: show_stackframe (stacktrace.c:348)
==30849==by 0x5DB1B3F: ??? (in /lib/libpthread-2.12.1.so)
==30849==by 0xDEAFBEEDDEAFBEEC: ???
==30849==by 0x50151F1: opal_list_construct (opal_list.c:88)
==30849==by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30849==by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30849==by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30849==by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30849==  Address 0xdeafbeeddeafbeed is not stack'd, malloc'd or 
(recently) free'd

==30849==
==30849==
==30849== Process terminating with default action of signal 11 
(SIGSEGV): dumping core

==30849==  General Protection Fault
==30849==at 0xA011FDB: ??? (in /lib/libgcc_s.so.1)
==30849==by 0xA012B0B: _Unwind_Backtrace (in /lib/libgcc_s.so.1)
==30849==by 0x60BE69D: backtrace (backtrace.c:91)
==30849==by 0x4FAB055: opal_backtrace_buffer (backtrace_execinfo.c:54)
==30849==by 0x5026DF3: show_stackframe (stacktrace.c:348)
==30849==by 0x5DB1B3F: ??? (in /lib/libpthread-2.12.1.so)
==30849==by 0xDEAFBEEDDEAFBEEC: ???
==30849==by 0x50151F1: opal_list_construct (opal_list.c:88)
==30849==by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30849==by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30849==by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30849==by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30850== Jump to the invalid address stated on the next line
==30850==at 0xDEAFBEEDDEAFBEED: ???
==30850==by 0x50151F1: opal_list_construct (opal_list.c:88)
==30850==by 0xA8A49F1: opal_obj_run_constructors (opal_object.h:427)
==30850==by 0xA8A4E59: mca_pml_ob1_comm_construct (pml_ob1_comm.c:56)
==30850==by 0xA8A1385: opal_obj_run_constructors (opal_object.h:427)
==30850==by 0xA8A149F: opal_obj_new (opal_object.h:477)
==30850==by 0xA8A12FA: opal_obj_new_debug (opal_object.h:252)
==30850==by 0xA8A2A5F: mca_pml_ob1_add_comm (pml_ob1.c:182)
==30850==by 0x4E95F50: ompi_mpi_init (ompi_mpi_init.c:770)
==30850==by 0x4EC6C32: PMPI_Init (pinit.c:84)
==30850==by 0x400935: main (in /home/ehhexxn/git/test/hello_c)
==30850==  Address 0xdeafbeeddeafbeed is not stack'd, malloc'd or 
(recently) free'd

==30850==
[oak:30850] *** Process received signal ***
[oak:30850] Signal: Segmentation fault (11)
[oak:30850] Signal code: Invalid permissions (2)
[oak:30850] Failing at address: 0xdeafbeeddeafbeed
==30849==
==30849== HEAP SUMMARY:
==30849== in use at exit: 2,338,964 bytes in 3,213 blocks
==30849==   total heap usage: 5,205 allocs, 1,992 frees, 12,942,078 
bytes allocated

==30849==
==30850== Invalid read of size 1
==30850==at 0xA011FDB: ??? (in /lib/libgcc_s.so.1)
==30850==by 0xA012B0B: _Unwind_Backtrace (in /lib/libgcc_s.so.1)
==30850==by 0x60BE69D: backtrace (backtrace.c:91)
==30850==by 0x4FAB055: opal_backtrace_buffer (backtrace_execinfo.c:54)
==30850==by 0x5026DF3: show_stackframe 

Re: [OMPI devel] TIPC BTL Segmentation fault

2011-07-04 Thread Jeff Squyres
Keep in mind, too, that opal_object is the "base" object -- to put it in C++ 
terms, it's the abstract class that all other classes are built from.  So it's 
rare that we would create an opal_object by itself.  opal_objects are usually created as 
part of some other, higher-level object.
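
For anyone newer to the code base, here is a minimal sketch of the OPAL object
idiom.  The class name below is made up for illustration; the macros are the
real ones from opal/class/opal_object.h:

#include "opal/class/opal_list.h"

/* A "class" is a struct whose first member is its parent class. */
struct my_proc_t {
    opal_list_item_t super;          /* parent; must be the first member */
    int rank;
};
typedef struct my_proc_t my_proc_t;

static void my_proc_construct(my_proc_t *p) { p->rank = -1; }
static void my_proc_destruct(my_proc_t *p)  { /* free per-object state */ }

/* Register the constructor/destructor pair for this class. */
OBJ_CLASS_INSTANCE(my_proc_t, opal_list_item_t,
                   my_proc_construct, my_proc_destruct);

/* OBJ_NEW(my_proc_t) allocates the object and runs the whole constructor
 * chain (base class first, most-derived last), and OBJ_RELEASE() runs the
 * destructors in reverse.  So a crash "while constructing an opal_object"
 * is usually really a crash in one of the derived-class constructors or in
 * memory they touch. */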

What's the full call stack of where Valgrind is showing the error?

Make sure you have the most recent valgrind (www.valgrind.org); the versions 
that ship in various distros may be somewhat old.  Newer valgrind versions show 
lots of things that older versions don't.  A new valgrind *might* be able to 
show some prior memory fault that is causing the issue...?
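
Two generic things that sometimes help in this kind of situation (the paths
below are placeholders for your install): run valgrind with origin tracking
and Open MPI's shipped suppression file, e.g.

  mpirun -n 2 -mca btl tipc,self valgrind --track-origins=yes \
      --suppressions=<prefix>/share/openmpi/openmpi-valgrind.supp ./hello_c

and, in a debug build (OPAL_ENABLE_DEBUG is what adds these fields), print the
creation info carried by the object from gdb, e.g. something like

  (gdb) print ((opal_object_t *) <object address from the backtrace>)->cls_init_file_name
  (gdb) print ((opal_object_t *) <object address from the backtrace>)->cls_init_lineno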


On Jul 4, 2011, at 7:45 AM, Xin He wrote:

> Hi,
> 
> I ran the program with valgrind, and it showed almost the same error. It 
> appeared that the segmentation fault happened during
> the initialization of an opal_object.  That's why it puzzled me.
> 
> /Xin
> 
> 
> On 07/04/2011 01:40 PM, Jeff Squyres wrote:
>> Ah -- so this is in the template code.  I suspect this code might have bit 
>> rotted a bit.  :-\
>> 
>> If you run this through valgrind, does anything obvious show up?  I ask 
>> because this kind of error is typically a symptom of the real error.  I.e., 
>> the real error was some kind of memory corruption that occurred earlier, and 
>> this is the memory access that exposes that prior memory corruption.
>> 
>> 
>> On Jul 4, 2011, at 5:08 AM, Xin He wrote:
>> 
>>> Yes, it is an opal_object.
>>> 
>>> And this error seems to be caused by this code:
>>> 
>>>  void mca_btl_template_proc_construct(mca_btl_template_proc_t* 
>>> template_proc){
>>> ...
>>> /* add to list of all proc instance */
>>> OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
>>> opal_list_append(&mca_btl_template_component.template_procs,
>>> &template_proc->super);
>>> OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
>>> }
>>> 
>>> /Xin
>>> 
>>> On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
 Do you know which object it is that is being constructed?  When you compile 
 with debugging enabled, there are strings in the object struct that identify 
 the file and line where the obj was created.
 
 Sent from my phone. No type good.
 
 On Jun 29, 2011, at 8:48 AM, "Xin He" wrote:
 
 
> Hi,
> 
> As I advanced in my implementation of the TIPC BTL, I added the component and 
> tried to run the hello_c program to test it.
> 
> Then I got this segmentation fault. It seemed to happen after the call 
> "mca_btl_tipc_add_procs".
> 
> The error message displayed:
> 
> [oak:23192] *** Process received signal ***
> [oak:23192] Signal: Segmentation fault (11)
> [oak:23192] Signal code:  (128)
> [oak:23192] Failing at address: (nil)
> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
> [oak:23192] [11] hello_i(main+0x22) [0x400936]
> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
> [oak:23192] [13] hello_i() [0x400859]
> [oak:23192] *** End of error message ***
> 
> I used gdb to check the stack:
> (gdb) bt
> #0  0x77afac10 in opal_obj_run_constructors (object=0x6ca980)
>at ../opal/class/opal_object.h:427
> #1  0x77afb1f2 in opal_list_construct (list=0x6ca958) at 
> class/opal_list.c:88
> #2  0x72d479f2 in opal_obj_run_constructors (object=0x6ca958)
>at ../../../../opal/class/opal_object.h:427
> #3  0x72d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>at pml_ob1_comm.c:55
> #4  0x72d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>at ../../../../opal/class/opal_object.h:427
> #5  0x72d444a0 in opal_obj_new (cls=0x72f6c040)
>at ../../../../opal/class/opal_object.h:477
> #6  0x72d442fb in opal_obj_new_debug (type=0x72f6c040,
>file=0x72d62840 "pml_ob1.c", line=182)
>at ../../../../opal/class/opal_object.h:252
> #7  0x72d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at 
> pml_ob1.c:182
> #8  0x7797bf51 in ompi_mpi_init (argc=1, 

Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-07-04 Thread Jeff Squyres
On Jul 3, 2011, at 8:40 PM, Kawashima wrote:

>> Does your LLP send path obey MPI matching ordering?  E.g. if some prior isend 
>> is already queued, could the LLP send overtake it?
> 
> Yes, an LLP send may overtake a queued isend.
> But we use the correct PML send_sequence, so the LLP message is queued as an
> unexpected message on the receiver side, and I think it's not a problem.

Good!  I just wanted to ask because I couldn't quite tell from your prior 
description.
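
For readers following the thread, the key point is that matching is gated on
the per-peer sequence number, not on physical arrival order.  A minimal
illustrative sketch of that idea -- this is not the ob1 code, and all names
below are made up:

#include <stdint.h>
#include <stddef.h>

#define WINDOW 64                 /* max frags that may arrive early */

typedef struct { uint16_t seq; /* ... payload ... */ } frag_t;

typedef struct {
    uint16_t expected_seq;        /* next sequence allowed to match */
    frag_t  *early[WINDOW];       /* frags that overtook their predecessors */
} peer_t;

/* Called once per arriving fragment, in physical arrival order. */
static void handle_arrival(peer_t *peer, frag_t *frag)
{
    if (frag->seq != peer->expected_seq) {
        /* Overtook an earlier send (e.g. it used a faster path):
         * park it until its predecessors have arrived. */
        peer->early[frag->seq % WINDOW] = frag;
        return;
    }
    while (frag != NULL) {
        /* In order: match against posted receives, or append to the
         * unexpected-message queue (not shown). */
        /* match_or_queue_unexpected(peer, frag); */
        peer->expected_seq++;
        frag = peer->early[peer->expected_seq % WINDOW];
        peer->early[peer->expected_seq % WINDOW] = NULL;
    }
}

An overtaking fragment is simply parked until its predecessors arrive, so
MPI's ordering guarantee holds even when a low-latency path delivers it first.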

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] TIPC BTL Segmentation fault

2011-07-04 Thread Jeff Squyres
Ah -- so this is in the template code.  I suspect this code might have bit 
rotted a bit.  :-\

If you run this through valgrind, does anything obvious show up?  I ask because 
this kind of error is typically a symptom of the real error.  I.e., the real 
error was some kind of memory corruption that occurred earlier, and this is the 
memory access that exposes that prior memory corruption.


On Jul 4, 2011, at 5:08 AM, Xin He wrote:

> Yes, it is an opal_object.
> 
> And this error seems to be caused by this code:
> 
>  void mca_btl_template_proc_construct(mca_btl_template_proc_t* template_proc){
> ...
> /* add to list of all proc instance */
> OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
> opal_list_append(&mca_btl_template_component.template_procs, 
> &template_proc->super);
> OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
> }
> 
> /Xin
> 
> On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:
>> Do you know which object it is that is being constructed?  When you compile 
>> with debugging enabled, there are strings in the object struct that identify the 
>> file and line where the obj was created. 
>> 
>> Sent from my phone. No type good. 
>> 
>> On Jun 29, 2011, at 8:48 AM, "Xin He" wrote:
>> 
>> 
>>> Hi,
>>> 
>>> As I advanced in my implementation of the TIPC BTL, I added the component and 
>>> tried to run the hello_c program to test it.
>>> 
>>> Then I got this segmentation fault. It seemed to happen after the call 
>>> "mca_btl_tipc_add_procs".
>>> 
>>> The error message displayed:
>>> 
>>> [oak:23192] *** Process received signal ***
>>> [oak:23192] Signal: Segmentation fault (11)
>>> [oak:23192] Signal code:  (128)
>>> [oak:23192] Failing at address: (nil)
>>> [oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
>>> [oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
>>> [oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
>>> [oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
>>> [oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
>>> [oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
>>> [oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
>>> [oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
>>> [oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
>>> [oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
>>> [oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
>>> [oak:23192] [11] hello_i(main+0x22) [0x400936]
>>> [oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
>>> [oak:23192] [13] hello_i() [0x400859]
>>> [oak:23192] *** End of error message ***
>>> 
>>> I used gdb to check the stack:
>>> (gdb) bt
>>> #0  0x77afac10 in opal_obj_run_constructors (object=0x6ca980)
>>>at ../opal/class/opal_object.h:427
>>> #1  0x77afb1f2 in opal_list_construct (list=0x6ca958) at 
>>> class/opal_list.c:88
>>> #2  0x72d479f2 in opal_obj_run_constructors (object=0x6ca958)
>>>at ../../../../opal/class/opal_object.h:427
>>> #3  0x72d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
>>>at pml_ob1_comm.c:55
>>> #4  0x72d44386 in opal_obj_run_constructors (object=0x6ca8c0)
>>>at ../../../../opal/class/opal_object.h:427
>>> #5  0x72d444a0 in opal_obj_new (cls=0x72f6c040)
>>>at ../../../../opal/class/opal_object.h:477
>>> #6  0x72d442fb in opal_obj_new_debug (type=0x72f6c040,
>>>file=0x72d62840 "pml_ob1.c", line=182)
>>>at ../../../../opal/class/opal_object.h:252
>>> #7  0x72d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at 
>>> pml_ob1.c:182
>>> #8  0x7797bf51 in ompi_mpi_init (argc=1, argv=0x7fffdf58, 
>>> requested=0,
>>>provided=0x7fffde28) at runtime/ompi_mpi_init.c:770
>>> #9  0x779acc33 in PMPI_Init (argc=0x7fffde5c, 
>>> argv=0x7fffde50)
>>>at pinit.c:84
>>> #10 0x00400936 in main (argc=1, argv=0x7fffdf58) at hello_c.c:17
>>> 
>>> It seems the error happened when an object is constructed. Any idea why 
>>> this is happening?
>>> 
>>> Thanks.
>>> 
>>> Best regards,
>>> Xin
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> 
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> ___
>> devel mailing list
>> 
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-devel] hwloc trunk nightly 1.3a1r3511 fails to build on CentOS 5.6 & RHEL 5.6

2011-07-04 Thread Brice Goglin
All this should be fixed now, and the configure output is now clear (it
doesn't change its mind about pci_init/cleanup or pci_lookup_name
without any obvious reason anymore).
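
(Background on why a plain re-check was not enough: AC_CHECK_LIB caches its
result under ac_cv_lib_<lib>_<func> -- here ac_cv_lib_pci_pci_lookup_name --
so a second AC_CHECK_LIB for the same library/function pair silently reuses
the first "no" unless the check uses a different macro or its own cache
variable.)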

FC7:
checking for pci/pci.h... yes
checking for pci_init in -lpci... no
checking for pci_init in -lpci with -lz... yes
checking for pci_lookup_name in -lpci... no
checking for inet_ntoa in -lresolv... yes
checking for pci_lookup_name in -lpci with -lresolv... yes

RHEL5.6:
checking for pci/pci.h... yes
checking for pci_init in -lpci... yes
checking for pci_lookup_name in -lpci... no
checking for inet_ntoa in -lresolv... yes
checking for pci_lookup_name in -lpci with -lresolv... yes

RHEL5.3:
checking for pci/pci.h... yes
checking for pci_init in -lpci... yes
checking for pci_lookup_name in -lpci... yes

Christopher, it should work starting with trunk r3535.

Brice




On 30/06/2011 07:50, Brice Goglin wrote:
> On 29/06/2011 13:18, Brice Goglin wrote:
>> I don't think we finally fixed this.
>>
>> IIRC, we need either a way to bypass the cache, or always add -lresolv
>> even if it's useless (or find another way to detect whether -lresolv is needed).
> Redefining our own HWLOC_AC_CHECK_LIB_NO_CACHE looks possible.
>
> Otherwise, we could use something different from AC_CHECK_LIB for the
> second check (AC_SEARCH_LIBS uses a different cache name).
> Or even use AC_LINK_IFELSE/AC_TRY_LINK which never cache anything.
>
> Brice
>
> ___
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel



Re: [OMPI devel] TIPC BTL Segmentation fault

2011-07-04 Thread Xin He

Yes, it is an opal_object.

And this error seems to be caused by this code:

 void mca_btl_template_proc_construct(mca_btl_template_proc_t* 
template_proc){
...
/* add to list of all proc instance */
OPAL_THREAD_LOCK(&mca_btl_template_component.template_lock);
opal_list_append(&mca_btl_template_component.template_procs, 
&template_proc->super);
OPAL_THREAD_UNLOCK(&mca_btl_template_component.template_lock);
}

/Xin
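
As an aside on the component side of this pattern: other BTLs construct the
list and lock touched above when the component is opened.  A minimal sketch,
assuming the standard OPAL class macros and the field types from the template
component header:

    OBJ_CONSTRUCT(&mca_btl_template_component.template_procs, opal_list_t);
    OBJ_CONSTRUCT(&mca_btl_template_component.template_lock, opal_mutex_t);

If those constructions were missing or ran too late, the opal_list_append()
above would be walking uninitialized memory.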

On 07/02/2011 10:49 PM, Jeff Squyres (jsquyres) wrote:

Do you know which object it is that is being constructed?  When you compile with 
debugging enabled, there are strings in the object struct that identify the file 
and line where the obj was created.

Sent from my phone. No type good.

On Jun 29, 2011, at 8:48 AM, "Xin He"  wrote:


Hi,

As I advanced in my implementation of the TIPC BTL, I added the component and tried 
to run the hello_c program to test it.

Then I got this segmentation fault. It seemed to happen after the call 
"mca_btl_tipc_add_procs".

The error message displayed:

[oak:23192] *** Process received signal ***
[oak:23192] Signal: Segmentation fault (11)
[oak:23192] Signal code:  (128)
[oak:23192] Failing at address: (nil)
[oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
[oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
[oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
[oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
[oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
[oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
[oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
[oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
[oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
[oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
[oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
[oak:23192] [11] hello_i(main+0x22) [0x400936]
[oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
[oak:23192] [13] hello_i() [0x400859]
[oak:23192] *** End of error message ***

I used gdb to check the stack:
(gdb) bt
#0  0x77afac10 in opal_obj_run_constructors (object=0x6ca980)
at ../opal/class/opal_object.h:427
#1  0x77afb1f2 in opal_list_construct (list=0x6ca958) at 
class/opal_list.c:88
#2  0x72d479f2 in opal_obj_run_constructors (object=0x6ca958)
at ../../../../opal/class/opal_object.h:427
#3  0x72d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0)
at pml_ob1_comm.c:55
#4  0x72d44386 in opal_obj_run_constructors (object=0x6ca8c0)
at ../../../../opal/class/opal_object.h:427
#5  0x72d444a0 in opal_obj_new (cls=0x72f6c040)
at ../../../../opal/class/opal_object.h:477
#6  0x72d442fb in opal_obj_new_debug (type=0x72f6c040,
file=0x72d62840 "pml_ob1.c", line=182)
at ../../../../opal/class/opal_object.h:252
#7  0x72d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at pml_ob1.c:182
#8  0x7797bf51 in ompi_mpi_init (argc=1, argv=0x7fffdf58, 
requested=0,
provided=0x7fffde28) at runtime/ompi_mpi_init.c:770
#9  0x779acc33 in PMPI_Init (argc=0x7fffde5c, argv=0x7fffde50)
at pinit.c:84
#10 0x00400936 in main (argc=1, argv=0x7fffdf58) at hello_c.c:17

It seems the error happened when an object is constructed. Any idea why this is 
happening?

Thanks.

Best regards,
Xin


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [hwloc-devel] hwloc_distances as utility?

2011-07-04 Thread Brice Goglin
On 03/07/2011 23:55, Jiri Hladky wrote:
> Hi all,
>
> I have come across the tests/hwloc_distances test and I believe that it
> would be great to convert it into a utility,
> "hwloc-report-instances", published under the utils/ directory. Please let
> me know what you think about it.
>
> It would take the same input as hwloc-info (read the topology from
> different formats instead of discovering the topology on the local
> machine) and support both logical and physical indexes (the -l and -p switches).

By the way, lstopo shows distance information, but it does not change it
depending on -l/-p. We may want to fix this.

> I have used the STREAM memory bandwidth benchmark
> (http://www.cs.virginia.edu/stream/) in the past to produce
> similar output to tests/hwloc_distances. It was interesting to see
> that numactl and the kernel scheduler are both using the number of hops
> instead of memory bandwidth.

Actually, Linux only uses the number of hops on one specific MIPS
architecture (SGI IP27 Origin 200/2000). In other cases, it uses the
cpu-to-memory latency (usually reported by ACPI or so).

> On some systems the number of hops does not represent memory bandwidth. I
> have reported this in BZ 655041
>
> https://bugzilla.redhat.com/show_bug.cgi?id=655041

This bug is private unfortunately.

> In any case I believe that hwloc-report-instances would be a useful
> utility. Please let me know your opinion.

Agreed.

There are still several things to improve regarding distances.
Everything should be in https://svn.open-mpi.org/trac/hwloc/ticket/43
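
To make the proposal concrete, here is a minimal sketch of what the core of
such a hwloc-report-instances tool might look like.  It assumes the hwloc 1.x
distances helpers (hwloc_get_whole_distance_matrix_by_type() and struct
hwloc_distances_s) and prints the NUMA-node latency matrix with both logical
and physical indexes; error handling and the -l/-p/input-format options are
left out:

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Whole-machine latency matrix between NUMA nodes, if the OS/BIOS
     * provided one. */
    const struct hwloc_distances_s *d =
        hwloc_get_whole_distance_matrix_by_type(topo, HWLOC_OBJ_NODE);
    if (NULL == d) {
        fprintf(stderr, "no distance matrix available\n");
        hwloc_topology_destroy(topo);
        return 1;
    }

    for (unsigned i = 0; i < d->nbobjs; i++) {
        hwloc_obj_t obj = hwloc_get_obj_by_depth(topo, d->relative_depth, i);
        printf("node L#%u (P#%u):", obj->logical_index, obj->os_index);
        for (unsigned j = 0; j < d->nbobjs; j++)
            /* latency[] is normalized; multiply by latency_base to get
             * back the OS-provided values. */
            printf(" %.2f", d->latency[i * d->nbobjs + j] * d->latency_base);
        printf("\n");
    }

    hwloc_topology_destroy(topo);
    return 0;
}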

Brice