Hi,

A customer is running our parallel application on an SGI Altix
machine. They compiled OMPI 1.2.8 themselves. The Altix uses
InfiniBand (IB) interfaces, and they recently upgraded to OFED 1.3
(part of SGI Propack 6). They are getting a bus error in
ompi_free_list_grow:

[r1i0n0:01321] *** Process received signal ***
[r1i0n0:01321] Signal: Bus error (7)
[r1i0n0:01321] Signal code:  (2)
[r1i0n0:01321] Failing at address: 0x2b04ba07c4a0
[r1i0n0:01321] [ 0] /lib64/libpthread.so.0 [0x2b04b00cfc00]
[r1i0n0:01321] [ 1] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/libmpi.so.0(ompi_free_list_grow+0x14a) [0x2b04af7dc058]
[r1i0n0:01321] [ 2] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_btl_sm.so(mca_btl_sm_alloc+0x321) [0x2b04b38c8e35]
[r1i0n0:01321] [ 3] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_copy+0x26d) [0x2b04b3378f91]
[r1i0n0:01321] [ 4] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x546) [0x2b04b3370c7e]
[r1i0n0:01321] [ 5] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/libmpi.so.0(MPI_Send+0x28) [0x2b04af814098]

Here is some more information about the machine:

SGI Altix ICE 8200 cluster; each node has two quad-core Xeons and 16 GB of memory
SUSE Linux Enterprise Server 10 Service Pack 2
GNU C Library stable release version 2.4 (20080421)
gcc (GCC) 4.1.2 20070115 (SUSE Linux)
SGI Propack 6 (just upgraded from Propack 5 SP3, which moved from OFED 1.2 to 1.3)

The output from ompi_info is attached.

I would appreciate any help debugging this.

Thanks,
Allen

-- 
Allen Barnett
E-Mail: al...@transpireinc.com
Skype:  allenbarnett
Ph:     518-887-2930

Attachment: ompi_info.txt.bz2
Description: application/bzip
