Hi,

I'm sorry once more for the late reply, but our mail server was down for the last two days (hardware error).
> Did you configure this --enable-debug?

Yes, I used the following command.

../openmpi-1.8.2rc3/configure --prefix=/usr/local/openmpi-1.8.2_64_gcc \
  --libdir=/usr/local/openmpi-1.8.2_64_gcc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0/include \
  JAVA_HOME=/usr/local/jdk1.8.0 \
  LDFLAGS="-m64 -L/usr/local/gcc-4.9.0/lib/amd64" \
  CC="gcc" CXX="g++" FC="gfortran" \
  CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  CPPFLAGS="" CXXCPPFLAGS="" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-threads=posix \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-std=c11 -m64" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc

> If so, you should get a line number in the backtrace

I got them for gdb (see below), but not for "dbx".

Kind regards

Siegmar


> > On Aug 5, 2014, at 2:59 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >
> > Hi,
> >
> > I'm sorry to answer so late, but last week I didn't have Internet
> > access. In the meantime I've installed openmpi-1.8.2rc3 and I get
> > the same error.
> >
> >> This looks like the typical type of alignment error that we used
> >> to see when testing regularly on SPARC. :-\
> >>
> >> It looks like the error was happening in mca_db_hash.so. Could
> >> you get a stack trace / file+line number where it was failing
> >> in mca_db_hash? (i.e., the actual bad code will likely be under
> >> opal/mca/db/hash somewhere)
> >
> > Unfortunately I don't get a file+line number from a file in
> > opal/mca/db/hash.
> >
> >
> >
> > tyr small_prog 102 ompi_info | grep MPI:
> >                 Open MPI: 1.8.2rc3
> > tyr small_prog 103 which mpicc
> > /usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
> > tyr small_prog 104 mpicc init_finalize.c
> > tyr small_prog 106 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> > For information about new features see `help changes'
> > To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
> > Reading mpiexec
> > Reading ld.so.1
> > Reading libopen-rte.so.7.0.4
> > Reading libopen-pal.so.6.2.0
> > Reading libsendfile.so.1
> > Reading libpicl.so.1
> > Reading libkstat.so.1
> > Reading liblgrp.so.1
> > Reading libsocket.so.1
> > Reading libnsl.so.1
> > Reading libgcc_s.so.1
> > Reading librt.so.1
> > Reading libm.so.2
> > Reading libpthread.so.1
> > Reading libc.so.1
> > Reading libdoor.so.1
> > Reading libaio.so.1
> > Reading libmd.so.1
> > (dbx) check -all
> > access checking - ON
> > memuse checking - ON
> > (dbx) run -np 1 a.out
> > Running: mpiexec -np 1 a.out
> > (process id 27833)
> > Reading rtcapihook.so
> > Reading libdl.so.1
> > Reading rtcaudit.so
> > Reading libmapmalloc.so.1
> > Reading libgen.so.1
> > Reading libc_psr.so.1
> > Reading rtcboot.so
> > Reading librtc.so
> > Reading libmd_psr.so.1
> > RTC: Enabling Error Checking...
> > RTC: Running program...
> > Write to unallocated (wua) on thread 1:
> > Attempting to write 1 byte at address 0xffffffff79f04000
> > t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> > 0xffffffff55174da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> > (dbx) where
> > current thread: t@1
> > =>[1] _readdir(0xffffffff79f00300, 0x2e6800, 0x4, 0x2d, 0x4, 0xffffffff79f00300), at 0xffffffff55174da0
> >   [2] list_files_by_dir(0x100138fd8, 0xffffffff7fffd1f0, 0xffffffff7fffd1e8, 0xffffffff7fffd210, 0x0, 0xffffffff702a0010), at 0xffffffff63174594
> >   [3] foreachfile_callback(0x100138fd8, 0xffffffff7fffd458, 0x0, 0x2e, 0x0, 0xffffffff702a0010), at 0xffffffff6317461c
> >   [4] foreach_dirinpath(0x1001d8a28, 0x0, 0xffffffff631745e0, 0xffffffff7fffd458, 0x0, 0xffffffff702a0010), at 0xffffffff63171684
> >   [5] lt_dlforeachfile(0x1001d8a28, 0xffffffff6319656c, 0x0, 0x53, 0x2f, 0xf), at 0xffffffff63174748
> >   [6] find_dyn_components(0x0, 0xffffffff6323b570, 0x0, 0x1, 0xffffffff7fffd6a0, 0xffffffff702a0010), at 0xffffffff63195e38
> >   [7] mca_base_component_find(0x0, 0xffffffff6323b570, 0xffffffff6335e1b0, 0x0, 0xffffffff7fffd6a0, 0x1), at 0xffffffff631954d8
> >   [8] mca_base_framework_components_register(0xffffffff6335e1c0, 0x0, 0x3e, 0x0, 0x3b, 0x100800), at 0xffffffff631b1638
> >   [9] mca_base_framework_register(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd8d0, 0x0, 0xffffffff702a0010), at 0xffffffff631b24d4
> >   [10] mca_base_framework_open(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd990, 0x0, 0xffffffff702a0010), at 0xffffffff631b25d0
> >   [11] opal_init(0xffffffff7fffdd70, 0xffffffff7fffdd78, 0x100117c60, 0xffffffff7fffde58, 0x400, 0x100117c60), at 0xffffffff63153694
> >   [12] orterun(0x4, 0xffffffff7fffde58, 0x2, 0xffffffff7fffdda0, 0x0, 0xffffffff702a0010), at 0x100005078
> >   [13] main(0x4, 0xffffffff7fffde58, 0xffffffff7fffde80, 0x100117c60, 0x100000000, 0xffffffff6a700200), at 0x100003d68
> > (dbx)
> >
> >
> > I get the following output with gdb.
> >
> > tyr small_prog 107 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> > GNU gdb (GDB) 7.6.1
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "sparc-sun-solaris2.10".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>...
> > Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/bin/orterun...done.
> > (gdb) run -np 1 a.out
> > Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 a.out
> > [Thread debugging using libthread_db enabled]
> > [New Thread 1 (LWP 1)]
> > [New LWP 2]
> > [tyr:27867] *** Process received signal ***
> > [tyr:27867] Signal: Bus Error (10)
> > [tyr:27867] Signal code: Invalid address alignment (1)
> > [tyr:27867] Failing at address: ffffffff7fffd224
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfa0
> > /lib/sparcv9/libc.so.1:0xd8b98
> > /lib/sparcv9/libc.so.1:0xcc70c
> > /lib/sparcv9/libc.so.1:0xcc918
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> > /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> > /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:main+0x20
> > /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:_start+0x7c
> > [tyr:27867] *** End of error message ***
> > --------------------------------------------------------------------------
> > mpiexec noticed that process rank 0 with PID 27867 on node tyr exited on signal 10 (Bus Error).
> > --------------------------------------------------------------------------
> > [LWP 2 exited]
> > [New Thread 2]
> > [Switching to Thread 1 (LWP 1)]
> > sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> > (gdb) bt
> > #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> > #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> > #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> > #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> > #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> > #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> > #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> > #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> > #8  0xffffffff7ec7746c in vm_close () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> > #9  0xffffffff7ec74a4c in lt_dlclose () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> > #10 0xffffffff7ec99b70 in ri_destructor (obj=0x1001ead30)
> >     at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:391
> > #11 0xffffffff7ec98488 in opal_obj_run_destructors (object=0x1001ead30)
> >     at ../../../../openmpi-1.8.2rc3/opal/class/opal_object.h:446
> > #12 0xffffffff7ec993ec in mca_base_component_repository_release (component=0xffffffff7b023cf0 <mca_oob_tcp_component>)
> >     at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:244
> > #13 0xffffffff7ec9b734 in mca_base_component_unload (component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1)
> >     at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:47
> > #14 0xffffffff7ec9b7c8 in mca_base_component_close (component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1)
> >     at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:60
> > #15 0xffffffff7ec9b89c in mca_base_components_close (output_id=-1, components=0xffffffff7f12b430 <orte_oob_base_framework+80>, skip=0x0)
> >     at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:86
> > #16 0xffffffff7ec9b804 in mca_base_framework_components_close (framework=0xffffffff7f12b3e0 <orte_oob_base_framework>, skip=0x0)
> >     at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:66
> > #17 0xffffffff7efae1e4 in orte_oob_base_close ()
> >     at ../../../../openmpi-1.8.2rc3/orte/mca/oob/base/oob_base_frame.c:94
> > #18 0xffffffff7ecb28ac in mca_base_framework_close (framework=0xffffffff7f12b3e0 <orte_oob_base_framework>)
> >     at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_framework.c:187
> > #19 0xffffffff7bf078c0 in rte_finalize ()
> >     at ../../../../../openmpi-1.8.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:858
> > #20 0xffffffff7ef30a44 in orte_finalize ()
> >     at ../../openmpi-1.8.2rc3/orte/runtime/orte_finalize.c:65
> > #21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0e8)
> >     at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/orterun.c:1096
> > #22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0e8)
> >     at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/main.c:13
> > (gdb)
> >
> >
> > Is the above information helpful to track down the error? Do you need
> > anything else? Thank you very much for any help in advance.
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >
> >> On Jul 25, 2014, at 2:08 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris 10
> >>> Sparc, and I receive a bus error when I run a small program.
> >>>
> >>> tyr hello_1 105 mpiexec -np 2 a.out
> >>> [tyr:29164] *** Process received signal ***
> >>> [tyr:29164] Signal: Bus Error (10)
> >>> [tyr:29164] Signal code: Invalid address alignment (1)
> >>> [tyr:29164] Failing at address: ffffffff7fffd1c4
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
> >>> /lib/sparcv9/libc.so.1:0xd8b98
> >>> /lib/sparcv9/libc.so.1:0xcc70c
> >>> /lib/sparcv9/libc.so.1:0xcc918
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> >>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> >>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
> >>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
> >>> [tyr:29164] *** End of error message ***
> >>> ...
> >>>
> >>>
> >>> I get the following output if I run the program in "dbx".
> >>>
> >>> ...
> >>> RTC: Enabling Error Checking...
> >>> RTC: Running program...
> >>> Write to unallocated (wua) on thread 1:
> >>> Attempting to write 1 byte at address 0xffffffff79f04000
> >>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> >>> 0xffffffff55174da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> >>> (dbx)
> >>>
> >>>
> >>> Hopefully the above output helps to fix the error. Can I provide
> >>> anything else? Thank you very much for any help in advance.
> >>>
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/07/24869.php
> >>
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> >>
> >>
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24909.php