Hi:

A customer is running our parallel application on an SGI Altix machine. They compiled OMPI 1.2.8 themselves. The Altix uses IB interfaces and they recently upgraded to OFED 1.3 (in SGI Propack 6). They are receiving a bus error in ompi_free_list_grow:

[r1i0n0:01321] *** Process received signal ***
[r1i0n0:01321] Signal: Bus error (7)
[r1i0n0:01321] Signal code: (2)
[r1i0n0:01321] Failing at address: 0x2b04ba07c4a0
[r1i0n0:01321] [ 0] /lib64/libpthread.so.0 [0x2b04b00cfc00]
[r1i0n0:01321] [ 1] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/libmpi.so.0(ompi_free_list_grow+0x14a) [0x2b04af7dc058]
[r1i0n0:01321] [ 2] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_btl_sm.so(mca_btl_sm_alloc+0x321) [0x2b04b38c8e35]
[r1i0n0:01321] [ 3] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_copy+0x26d) [0x2b04b3378f91]
[r1i0n0:01321] [ 4] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x546) [0x2b04b3370c7e]
[r1i0n0:01321] [ 5] /usr/local/attila/severian-0.3.2-beta/lib/x86_64-Linux/libmpi.so.0(MPI_Send+0x28) [0x2b04af814098]

Here is some more information about the machine:

SGI Altix ICE 8200 cluster; each node has two quad core Xeons with 16GB
SUSE Linux Enterprise Server 10 Service Pack 2
GNU C Library stable release version 2.4 (20080421)
gcc (GCC) 4.1.2 20070115 (SUSE Linux)
SGI Propack 6 (just upgraded from Propack 5 SP3: changed from OFED 1.2 to 1.3)

The output from ompi_info is attached. I would appreciate any help debugging this.

Thanks,
Allen

--
Allen Barnett
E-Mail: al...@transpireinc.com
Skype: allenbarnett
Ph: 518-887-2930

Attachment: ompi_info.txt.bz2 (application/bzip)