On Mar 3, 2006, at 9:07 AM, Jose Pedro Garcia Mahedero wrote:
cluster master machine:
eth0, mpihosts_out --> for outside use (gets its own IP via DHCP)
eth1, mpihosts_cluster --> for cluster use (serves IPs to the
cluster nodes)
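(For context, the two hostfiles presumably look something like the
sketch below. The hostnames are hypothetical; the actual files were
not posted.)

    # mpihosts_cluster -- private names/addresses served over eth1
    node01
    node02

    # mpihosts_out -- externally visible names reached over eth0
    node01.example.com
    node02.example.com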
------- TESTS 1, 2: openmpi-1.0.2a9 -------
1. cd openmpi-1.0.1
2. make clean
3. cd openmpi-1.0.2a9
4. ./configure
5. make all install
No parameters set in /usr/local/etc/openmpi-mca-params.conf.
mpirun -np 2 --hostfile mpihosts_cluster ping_pong
mpirun -np 2 --hostfile mpihosts_out ping_pong
Both commands give the same result:
Signal:11 info.si_errno:0 (Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x6
*** End of error message ***
[0] func:/usr/local/lib/libopal.so.0 [0x40101cb2]
[1] func:[0xffffe440]
[2] func:/usr/local/lib/openmpi/mca_btl_tcp.so [0x404541d6]
[3] func:/usr/local/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x149) [0x404502f9]
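(An aside on that empty MCA params file: with nothing set, the TCP BTL
will consider all interfaces on the dual-NIC master. If one wanted to
restrict it to the cluster-facing interface, the file could contain
something like the example below. The btl_tcp_if_include parameter is
a real Open MPI MCA parameter; the interface value is just an
assumption based on the setup described above.)

    # /usr/local/etc/openmpi-mca-params.conf
    # example only: limit the TCP BTL to the cluster-facing NIC
    btl_tcp_if_include = eth1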
Yoinks -- whatever we do, we should not be seg faulting. :-( It is
apparently dying in the mca_btl_tcp_add_procs() function, which is
where we create the MPI network mappings from one TCP peer to
another.
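Since add_procs runs during startup, even a trivial point-to-point
program should exercise the same code path. The actual ping_pong
source was not posted; a minimal sketch of such a test, using only
standard MPI calls, might look like:

    /* ping_pong.c -- minimal sketch of a two-rank ping-pong test */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, buf = 0;
        MPI_Status status;

        /* In Open MPI, the BTL add_procs calls happen during init */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (0 == rank) {
            /* ping: send to rank 1, wait for the reply */
            MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            printf("ping-pong complete\n");
        } else if (1 == rank) {
            /* pong: receive from rank 0, echo it back */
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Compile with "mpicc ping_pong.c -o ping_pong" and run it with the same
mpirun command lines shown above.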
I am unable to reproduce this error (I tried it on a cluster with a
setup similar to yours). Can you recompile the TCP BTL with
debugging symbols so that we can get a little more information?
Do the following:
cd top_of_your_open_mpi_source_tree
cd ompi/mca/btl/tcp
make CFLAGS=-g clean all install
Then run the test again (you shouldn't need to recompile your
application; this just recompiles and re-installs the TCP BTL
plugin). The stack trace for the mca_btl_tcp frames should now
include line numbers and tell us exactly where it is dying. If you
get a corefile, can you load it up in gdb and send the output of
"bt full"?
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/