I've dug a little deeper and think the problem has something to do with the 
10 MB /tmp filesystem.

[bloscel@k1n11 ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
compute_x86_64         32G  1.1G   31G   4% /
tmpfs                  32G     0   32G   0% /dev/shm
tmpfs                  10M   80K   10M   1% /tmp
tmpfs                  10M     0   10M   0% /var/tmp
/dev/lb                53T  109G   53T   1% /gpfs/lb
/dev/sb               3.3T   38G  3.3T   2% /gpfs/sb

[bloscel@k1n11 ~]$ mktemp
/tmp/tmp.L8owhNH1AN

[bloscel@k1n11 ~]$ ompi_info -a | grep /dev/shm
               MCA shmem: parameter "shmem_mmap_backing_file_base_dir" (current 
value: </dev/shm>, data source: default value)

[bloscel@k1n11 ~]$ ompi_info -a | grep orte_tmpdir_base
                MCA orte: parameter "orte_tmpdir_base" (current value: <none>, 
data source: default value)
[bloscel@k1n11 ~]$
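
If the small /tmp does turn out to be the cause, one thing to try is pointing orte_tmpdir_base at a larger filesystem. The following is only a sketch under that assumption: the helper names avail_kb and pick_tmpdir, the 1 GiB threshold, and the process count are all made up for illustration, and I have not confirmed that the session directory is where the shared-memory file ends up on this build.

```shell
#!/bin/sh
# Print the available space, in KiB, of the filesystem holding directory $1.
avail_kb() {
    # -P forces the POSIX one-line-per-filesystem layout; column 4 is "Available".
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# Print the first directory among "$2 ..." with at least $1 KiB free.
# Returns non-zero if no candidate qualifies.
pick_tmpdir() {
    min_kb=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        kb=$(avail_kb "$dir")
        if [ -n "$kb" ] && [ "$kb" -ge "$min_kb" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Hypothetical usage (threshold and process count are assumptions):
# base=$(pick_tmpdir 1048576 /tmp /dev/shm) &&
#     mpirun --mca orte_tmpdir_base "$base" -np 16 ./test_openmpi
```

On the node shown above, /tmp has only about 10 MB available, so a 1 GiB threshold would skip it and fall through to /dev/shm.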

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Wednesday, June 05, 2013 11:14 AM
To: Open MPI Users (us...@open-mpi.org)
Subject: EXTERNAL: [OMPI users] How to diagnose bus error with 1.6.4

I am running into a bus error that does not happen with MVAPICH, and I am 
guessing it has something to do with shared-memory communication.  Has anyone 
had a similar experience, or does anyone have any insight into what this could be?

Thanks

[k1n08:12688] mca: base: components_open: Looking for shmem components
[k1n08:12688] mca: base: components_open: opening shmem components
[k1n08:12688] mca: base: components_open: found loaded component mmap
[k1n08:12688] mca: base: components_open: component mmap register function 
successful
[k1n08:12688] mca: base: components_open: component mmap open function 
successful
[k1n08:12688] mca: base: components_open: found loaded component posix
[k1n08:12688] mca: base: components_open: component posix has no register 
function
[k1n08:12688] mca: base: components_open: component posix open function 
successful
[k1n08:12688] mca: base: components_open: found loaded component sysv
[k1n08:12688] mca: base: components_open: component sysv has no register 
function
[k1n08:12688] mca: base: components_open: component sysv open function 
successful
[k1n08:12688] shmem: base: runtime_query: Auto-selecting shmem components
[k1n08:12688] shmem: base: runtime_query: (shmem) Querying component (run-time) 
[mmap]
[k1n08:12688] shmem: base: runtime_query: (shmem) Query of component [mmap] set 
priority to 50
[k1n08:12688] shmem: base: runtime_query: (shmem) Querying component (run-time) 
[posix]
[k1n08:12688] shmem: base: runtime_query: (shmem) Skipping component [posix]. 
Run-time Query failed to return a module
[k1n08:12688] shmem: base: runtime_query: (shmem) Querying component (run-time) 
[sysv]
[k1n08:12688] shmem: base: runtime_query: (shmem) Skipping component [sysv]. 
Run-time Query failed to return a module
[k1n08:12688] shmem: base: runtime_query: (shmem) Selected component [mmap]
[k1n08:12688] mca: base: close: unloading component posix
[k1n08:12688] mca: base: close: unloading component sysv
[k1n08:12688] *** Process received signal ***
[k1n08:12688] Signal: Bus error (7)
[k1n08:12688] Signal code: Non-existant physical address (2)
[k1n08:12688] Failing at address: 0x2ac1e088e030
[k1n08:12688] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2ac1de7c0500]
[k1n08:12688] [ 1] 
/applocal/cfd/test/bin/test_openmpi(__intel_ssse3_rep_memcpy+0xcdb) [0x1495cab]
[k1n08:12688] [ 2] 
/applocal/cfd/test/bin/test_openmpi(opal_convertor_pack+0x101) [0x125c111]
[k1n08:12688] [ 3] 
/applocal/cfd/test/bin/test_openmpi(mca_btl_sm_prepare_src+0xc5) [0x13aab25]
[k1n08:12688] [ 4] 
/applocal/cfd/test/bin/test_openmpi(mca_pml_ob1_send_request_start_rndv+0x67) 
[0x12fa9a7]
[k1n08:12688] [ 5] /applocal/cfd/test/bin/test_openmpi(mca_pml_ob1_isend+0x3ab) 
[0x12ef02b]
[k1n08:12688] [ 6] 
/applocal/cfd/test/bin/test_openmpi(ompi_coll_tuned_sendrecv_actual+0x94) 
[0x12d67f4]
[k1n08:12688] [ 7] 
/applocal/cfd/test/bin/test_openmpi(ompi_coll_tuned_bcast_intra_split_bintree+0x94d)
 [0x12d45fd]
[k1n08:12688] [ 8] 
/applocal/cfd/test/bin/test_openmpi(ompi_coll_tuned_bcast_intra_dec_fixed+0x143)
 [0x12d5dd3]
[k1n08:12688] [ 9] 
/applocal/cfd/test/bin/test_openmpi(mca_coll_sync_bcast+0x66) [0x12d6aa6]
[k1n08:12688] [10] /applocal/cfd/test/bin/test_openmpi(MPI_Bcast+0x5a) 
[0x11f95da]
[k1n08:12688] [11] /applocal/cfd/test/bin/test_openmpi(mpi_bcast_f+0x6e) 
[0x11dca5e]
[k1n08:12688] [12] 
/applocal/cfd/test/bin/test_openmpi(wpf_calc_mod_mp_wpf_calc_+0x10f0) [0x541be0]
[k1n08:12688] [13] 
/applocal/cfd/test/bin/test_openmpi(special_init_mod_mp_special_init_geom_+0x3f4)
 [0x683254]
[k1n08:12688] [14] 
/applocal/cfd/test/bin/test_openmpi(setup_mod_mp_setup_domains_+0x56b) 
[0x53effb]
[k1n08:12688] [15] /applocal/cfd/test/bin/test_openmpi(MAIN__+0x1ab7) [0x5e8be7]
[k1n08:12688] [16] /applocal/cfd/test/bin/test_openmpi(main+0x3c) [0x4ff82c]
[k1n08:12688] [17] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ac1de9eccdd]
[k1n08:12688] [18] /applocal/cfd/test/bin/test_openmpi() [0x4ff729]
[k1n08:12688] *** End of error message ***
