No :-( I need to do some extra work to stop declaring orte_process_name_t and ompi_process_name_t variables. #249 will make things much easier. One option is to use opal_process_name_t everywhere, or to typedef the orte and ompi types to the opal one. Another option (lightweight but error-prone, imho) is to change only the variable declarations. Any thoughts ?
Ralph Castain <r...@open-mpi.org> wrote: >Will PR#249 solve it? If so, we should just go with it as I suspect that is >the long-term solution. > >> On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet >> <gilles.gouaillar...@gmail.com> wrote: >> >> It looks like we faced a similar issue : >> opal_process_name_t is 64-bit aligned whereas orte_process_name_t is 32-bit >> aligned. If you run on an alignment-sensitive CPU such as SPARC and you >> are not lucky (so to speak) you can run into this issue. >> I will make a patch for this shortly >> >> Ralph Castain <r...@open-mpi.org> wrote: >>> Afraid this must be something about the Sparc - just ran on a Solaris 11 >>> x86 box and everything works fine. >>> >>> >>>> On Oct 26, 2014, at 8:22 AM, Siegmar Gross >>>> <siegmar.gr...@informatik.hs-fulda.de> wrote: >>>> >>>> Hi Gilles, >>>> >>>> I wanted to explore which function is called when I call MPI_Init >>>> in a C program, because this function should be called from a Java >>>> program as well. Unfortunately C programs break with a Bus Error >>>> once more for openmpi-dev-124-g91e9686 on Solaris. I assume that's >>>> the reason why I get no useful backtrace for my Java program. >>>> >>>> tyr small_prog 117 mpicc -o init_finalize init_finalize.c >>>> tyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec >>>> ... 
>>>> (gdb) run -np 1 init_finalize >>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 >>>> init_finalize >>>> [Thread debugging using libthread_db enabled] >>>> [New Thread 1 (LWP 1)] >>>> [New LWP 2 ] >>>> [tyr:19240] *** Process received signal *** >>>> [tyr:19240] Signal: Bus Error (10) >>>> [tyr:19240] Signal code: Invalid address alignment (1) >>>> [tyr:19240] Failing at address: ffffffff7bd1c10c >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_backtrace_print+0x2c >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:0xdcc04 >>>> /lib/sparcv9/libc.so.1:0xd8b98 >>>> /lib/sparcv9/libc.so.1:0xcc70c >>>> /lib/sparcv9/libc.so.1:0xcc918 >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-pal.so.0.0.0:opal_proc_set_name+0x1c >>>> [ Signal 10 (BUS)] >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_pmix_native.so:0x103e8 >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/openmpi/mca_ess_pmi.so:0x33dc >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libopen-rte.so.0.0.0:orte_init+0x67c >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:ompi_mpi_init+0x374 >>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi.so.0.0.0:PMPI_Init+0x2a8 >>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:main+0x20 >>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/init_finalize:_start+0x7c >>>> [tyr:19240] *** End of error message *** >>>> -------------------------------------------------------------------------- >>>> mpiexec noticed that process rank 0 with PID 0 on node tyr exited on >>>> signal 10 (Bus Error). 
>>>> -------------------------------------------------------------------------- >>>> [LWP 2 exited] >>>> [New Thread 2 ] >>>> [Switching to Thread 1 (LWP 1)] >>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to >>>> satisfy query >>>> (gdb) bt >>>> #0 0xffffffff7f6173d0 in rtld_db_dlactivity () from >>>> /usr/lib/sparcv9/ld.so.1 >>>> #1 0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1 >>>> #2 0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1 >>>> #3 0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1 >>>> #4 0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1 >>>> #5 0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1 >>>> #6 0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1 >>>> #7 0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1 >>>> #8 0xffffffff7ec87f60 in vm_close (loader_data=0x0, >>>> module=0xffffffff7c901fe0) >>>> at ../../../openmpi-dev-124-g91e9686/opal/libltdl/loaders/dlopen.c:212 >>>> #9 0xffffffff7ec85534 in lt_dlclose (handle=0x100189b50) >>>> at ../../../openmpi-dev-124-g91e9686/opal/libltdl/ltdl.c:1982 >>>> #10 0xffffffff7ecaabd4 in ri_destructor (obj=0x1001893a0) >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:382 >>>> #11 0xffffffff7eca9504 in opal_obj_run_destructors (object=0x1001893a0) >>>> at ../../../../openmpi-dev-124-g91e9686/opal/class/opal_object.h:446 >>>> #12 0xffffffff7ecaa474 in mca_base_component_repository_release ( >>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>) >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_component_repository.c:240 >>>> #13 0xffffffff7ecac774 in mca_base_component_unload ( >>>> component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1) >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:47 >>>> #14 0xffffffff7ecac808 in mca_base_component_close ( >>>> 
component=0xffffffff7b1236f0 <mca_oob_tcp_component>, output_id=-1) >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:60 >>>> #15 0xffffffff7ecac8dc in mca_base_components_close (output_id=-1, >>>> components=0xffffffff7f14ba58 <orte_oob_base_framework+80>, skip=0x0) >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:86 >>>> #16 0xffffffff7ecac844 in mca_base_framework_components_close ( >>>> framework=0xffffffff7f14ba08 <orte_oob_base_framework>, skip=0x0) >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_components_close.c:66 >>>> #17 0xffffffff7efcaf58 in orte_oob_base_close () >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/orte/mca/oob/base/oob_base_frame.c:112 >>>> #18 0xffffffff7ecc136c in mca_base_framework_close ( >>>> framework=0xffffffff7f14ba08 <orte_oob_base_framework>) >>>> at >>>> ../../../../openmpi-dev-124-g91e9686/opal/mca/base/mca_base_framework.c:187 >>>> #19 0xffffffff7be07858 in rte_finalize () >>>> at >>>> ../../../../../openmpi-dev-124-g91e9686/orte/mca/ess/hnp/ess_hnp_module.c:857 >>>> #20 0xffffffff7ef338a4 in orte_finalize () >>>> at ../../openmpi-dev-124-g91e9686/orte/runtime/orte_finalize.c:66 >>>> #21 0x000000010000723c in orterun (argc=4, argv=0xffffffff7fffe0b8) >>>> at ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/orterun.c:1103 >>>> #22 0x0000000100003e80 in main (argc=4, argv=0xffffffff7fffe0b8) >>>> at ../../../../openmpi-dev-124-g91e9686/orte/tools/orterun/main.c:13 >>>> (gdb) >>>> >>>> Kind regards >>>> >>>> Siegmar >>>> >>>> >>>> >>>>> thank you very much for the quick tutorial. Unfortunately I still >>>>> can't get a backtrace. 
>>>>> >>>>>> You might need to configure with --enable-debug and add -g -O0 >>>>>> to your CFLAGS and LDFLAGS >>>>>> >>>>>> Then once you attach with gdb, you have to find the thread that is >>>>>> polling : >>>>>> thread 1 >>>>>> bt >>>>>> thread 2 >>>>>> bt >>>>>> and so on until you find the right thread >>>>>> If _dbg is a local variable, you need to select the right frame >>>>>> before you can change the value : >>>>>> get the frame number from bt (generally 1 under linux) >>>>>> f <frame number> >>>>>> set _dbg=0 >>>>>> >>>>>> I hope this helps >>>>> >>>>> "--enable-debug" is one of my default options. Now I used the >>>>> following command to configure Open MPI. I always start the >>>>> build process in an empty directory and I always remove >>>>> /usr/local/openmpi-1.9.0_64_gcc before I install a new version. >>>>> >>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 112 head config.log \ >>>>> | grep openmpi >>>>> $ ../openmpi-dev-124-g91e9686/configure >>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc >>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64 >>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin >>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include >>>>> JAVA_HOME=/usr/local/jdk1.8.0 >>>>> LDFLAGS=-m64 -g -O0 CC=gcc CXX=g++ FC=gfortran >>>>> CFLAGS=-m64 -D_REENTRANT -g -O0 >>>>> CXXFLAGS=-m64 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp >>>>> CPPFLAGS=-D_REENTRANT CXXCPPFLAGS= >>>>> --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java >>>>> --enable-heterogeneous --enable-mpi-thread-multiple >>>>> --with-threads=posix --with-hwloc=internal --without-verbs >>>>> --with-wrapper-cflags=-std=c11 -m64 --enable-debug >>>>> tyr openmpi-dev-124-g91e9686-SunOS.sparc.64_gcc 113 >>>>> >>>>> >>>>> "gdb" doesn't allow any backtrace for any thread. >>>>> >>>>> tyr java 124 /usr/local/gdb-7.6.1_64_gcc/bin/gdb >>>>> GNU gdb (GDB) 7.6.1 >>>>> ... 
>>>>> (gdb) attach 18876 >>>>> Attaching to process 18876 >>>>> [New process 18876] >>>>> Retry #1: >>>>> Retry #2: >>>>> Retry #3: >>>>> Retry #4: >>>>> 0x7eadcb04 in ?? () >>>>> (gdb) info threads >>>>> [New LWP 12] >>>>> [New LWP 11] >>>>> [New LWP 10] >>>>> [New LWP 9] >>>>> [New LWP 8] >>>>> [New LWP 7] >>>>> [New LWP 6] >>>>> [New LWP 5] >>>>> [New LWP 4] >>>>> [New LWP 3] >>>>> [New LWP 2] >>>>> Id Target Id Frame >>>>> 12 LWP 2 0x7eadc6b0 in ?? () >>>>> 11 LWP 3 0x7eadcbb8 in ?? () >>>>> 10 LWP 4 0x7eadcbb8 in ?? () >>>>> 9 LWP 5 0x7eadcbb8 in ?? () >>>>> 8 LWP 6 0x7eadcbb8 in ?? () >>>>> 7 LWP 7 0x7eadcbb8 in ?? () >>>>> 6 LWP 8 0x7ead8b0c in ?? () >>>>> 5 LWP 9 0x7eadcbb8 in ?? () >>>>> 4 LWP 10 0x7eadcbb8 in ?? () >>>>> 3 LWP 11 0x7eadcbb8 in ?? () >>>>> 2 LWP 12 0x7eadcbb8 in ?? () >>>>> * 1 LWP 1 0x7eadcb04 in ?? () >>>>> (gdb) thread 1 >>>>> [Switching to thread 1 (LWP 1)] >>>>> #0 0x7eadcb04 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcb04 in ?? () >>>>> #1 0x7eaca12c in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 2 >>>>> [Switching to thread 2 (LWP 12)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac2638 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 3 >>>>> [Switching to thread 3 (LWP 11)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac25a8 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 4 >>>>> [Switching to thread 4 (LWP 10)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac2638 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 5 >>>>> [Switching to thread 5 (LWP 9)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac2638 in ?? 
() >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 6 >>>>> [Switching to thread 6 (LWP 8)] >>>>> #0 0x7ead8b0c in ?? () >>>>> (gdb) bt >>>>> #0 0x7ead8b0c in ?? () >>>>> #1 0x7eacbcb0 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 7 >>>>> [Switching to thread 7 (LWP 7)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac25a8 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 8 >>>>> [Switching to thread 8 (LWP 6)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac25a8 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 9 >>>>> [Switching to thread 9 (LWP 5)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac2638 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 10 >>>>> [Switching to thread 10 (LWP 4)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac25a8 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 11 >>>>> [Switching to thread 11 (LWP 3)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) bt >>>>> #0 0x7eadcbb8 in ?? () >>>>> #1 0x7eac25a8 in ?? () >>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>>>> (gdb) thread 12 >>>>> [Switching to thread 12 (LWP 2)] >>>>> #0 0x7eadc6b0 in ?? () >>>>> (gdb) >>>>> >>>>> >>>>> >>>>> I also tried to set _dbg in all available frames without success. >>>>> >>>>> (gdb) f 1 >>>>> #1 0x7eacb46c in ?? () >>>>> (gdb) set _dbg=0 >>>>> No symbol table is loaded. Use the "file" command. 
>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so >>>>> Reading symbols from >>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done. >>>>> (gdb) f 1 >>>>> #1 0x7eacb46c in ?? () >>>>> (gdb) set _dbg=0 >>>>> No symbol "_dbg" in current context. >>>>> (gdb) f 2 >>>>> #0 0x00000000 in ?? () >>>>> (gdb) set _dbg=0 >>>>> No symbol "_dbg" in current context. >>>>> (gdb) >>>>> ... >>>>> >>>>> >>>>> With "list" I get source code from mpi_CartComm.c and not from mpi_MPI.c. >>>>> If I switch threads, "list" continues in the old file. >>>>> >>>>> (gdb) thread 1 >>>>> [Switching to thread 1 (LWP 1)] >>>>> #0 0x7eadcb04 in ?? () >>>>> (gdb) list 36 >>>>> 31 distributed under the License is distributed on an "AS IS" >>>>> BASIS, >>>>> 32 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express >>>>> or implied. >>>>> 33 See the License for the specific language governing >>>>> permissions and >>>>> 34 limitations under the License. >>>>> 35 */ >>>>> 36 /* >>>>> 37 * File : mpi_CartComm.c >>>>> 38 * Headerfile : mpi_CartComm.h >>>>> 39 * Author : Sung-Hoon Ko, Xinying Li >>>>> 40 * Created : Thu Apr 9 12:22:15 1998 >>>>> (gdb) thread 2 >>>>> [Switching to thread 2 (LWP 12)] >>>>> #0 0x7eadcbb8 in ?? () >>>>> (gdb) list >>>>> 41 * Revision : $Revision: 1.6 $ >>>>> 42 * Updated : $Date: 2003/01/16 16:39:34 $ >>>>> 43 * Copyright: Northeast Parallel Architectures Center >>>>> 44 * at Syracuse University 1998 >>>>> 45 */ >>>>> 46 #include "ompi_config.h" >>>>> 47 >>>>> 48 #include <stdlib.h> >>>>> 49 #ifdef HAVE_TARGETCONDITIONALS_H >>>>> 50 #include <TargetConditionals.h> >>>>> (gdb) >>>>> >>>>> >>>>> Do you have any idea what's going wrong, or whether I must use a different >>>>> symbol table? 
>>>>> >>>>> >>>>> Kind regards >>>>> >>>>> Siegmar >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> Gilles >>>>>> >>>>>> >>>>>> Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote: >>>>>>> Hi Gilles, >>>>>>> >>>>>>> I changed _dbg to a static variable, so that it is visible in the >>>>>>> library, but unfortunately still not in the symbol table. >>>>>>> >>>>>>> >>>>>>> tyr java 419 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so | >>>>>>> grep -i _dbg >>>>>>> [271] | 1249644| 4|OBJT |LOCL |0 |18 |_dbg.14258 >>>>>>> tyr java 420 /usr/local/gdb-7.6.1_64_gcc/bin/gdb >>>>>>> GNU gdb (GDB) 7.6.1 >>>>>>> ... >>>>>>> (gdb) attach 13019 >>>>>>> Attaching to process 13019 >>>>>>> [New process 13019] >>>>>>> Retry #1: >>>>>>> Retry #2: >>>>>>> Retry #3: >>>>>>> Retry #4: >>>>>>> 0x7eadcb04 in ?? () >>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so >>>>>>> Reading symbols from >>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done. >>>>>>> (gdb) set var _dbg.14258=0 >>>>>>> No symbol "_dbg" in current context. >>>>>>> (gdb) >>>>>>> >>>>>>> >>>>>>> Kind regards >>>>>>> >>>>>>> Siegmar >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> unfortunately I didn't get anything useful. It's probably my fault, >>>>>>>> because I'm still not very familiar with gdb or any other debugger. >>>>>>>> I did the following things. >>>>>>>> >>>>>>>> >>>>>>>> 1st window: >>>>>>>> ----------- >>>>>>>> >>>>>>>> tyr java 174 setenv OMPI_ATTACH 1 >>>>>>>> tyr java 175 mpijavac InitFinalizeMain.java >>>>>>>> warning: [path] bad path element >>>>>>>> "/usr/local/openmpi-1.9.0_64_gcc/lib64/shmem.jar": >>>>>>>> no such file or directory >>>>>>>> 1 warning >>>>>>>> tyr java 176 mpiexec -np 1 java InitFinalizeMain >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2nd window: >>>>>>>> ----------- >>>>>>>> >>>>>>>> tyr java 379 ps -aef | grep java >>>>>>>> noaccess 1345 1 0 May 22 ? 
113:23 /usr/java/bin/java >>>>>>>> -server -Xmx128m >>>>> -XX:+UseParallelGC >>>>>>> -XX:ParallelGCThreads=4 >>>>>>>> fd1026 3661 10753 0 14:09:12 pts/14 0:00 mpiexec -np 1 java >>>>>>>> InitFinalizeMain >>>>>>>> fd1026 3677 13371 0 14:16:55 pts/2 0:00 grep java >>>>>>>> fd1026 3663 3661 0 14:09:12 pts/14 0:01 java -cp >>>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/java:/usr/local/jun >>>>>>>> tyr java 380 /usr/local/gdb-7.6.1_64_gcc/bin/gdb >>>>>>>> GNU gdb (GDB) 7.6.1 >>>>>>>> ... >>>>>>>> (gdb) attach 3663 >>>>>>>> Attaching to process 3663 >>>>>>>> [New process 3663] >>>>>>>> Retry #1: >>>>>>>> Retry #2: >>>>>>>> Retry #3: >>>>>>>> Retry #4: >>>>>>>> 0x7eadcb04 in ?? () >>>>>>>> (gdb) symbol-file /usr/local/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so >>>>>>>> Reading symbols from >>>>> /export2/prog/SunOS_sparc/openmpi-1.9.0_64_gcc/lib64/libmpi_java.so.0.0.0...done. >>>>>>>> (gdb) set var _dbg=0 >>>>>>>> No symbol "_dbg" in current context. >>>>>>>> (gdb) set var JNI_OnLoad::_dbg=0 >>>>>>>> No symbol "_dbg" in specified context. >>>>>>>> (gdb) set JNI_OnLoad::_dbg=0 >>>>>>>> No symbol "_dbg" in specified context. >>>>>>>> (gdb) info threads >>>>>>>> [New LWP 12] >>>>>>>> [New LWP 11] >>>>>>>> [New LWP 10] >>>>>>>> [New LWP 9] >>>>>>>> [New LWP 8] >>>>>>>> [New LWP 7] >>>>>>>> [New LWP 6] >>>>>>>> [New LWP 5] >>>>>>>> [New LWP 4] >>>>>>>> [New LWP 3] >>>>>>>> [New LWP 2] >>>>>>>> Id Target Id Frame >>>>>>>> 12 LWP 2 0x7eadc6b0 in ?? () >>>>>>>> 11 LWP 3 0x7eadcbb8 in ?? () >>>>>>>> 10 LWP 4 0x7eadcbb8 in ?? () >>>>>>>> 9 LWP 5 0x7eadcbb8 in ?? () >>>>>>>> 8 LWP 6 0x7eadcbb8 in ?? () >>>>>>>> 7 LWP 7 0x7eadcbb8 in ?? () >>>>>>>> 6 LWP 8 0x7ead8b0c in ?? () >>>>>>>> 5 LWP 9 0x7eadcbb8 in ?? () >>>>>>>> 4 LWP 10 0x7eadcbb8 in ?? () >>>>>>>> 3 LWP 11 0x7eadcbb8 in ?? () >>>>>>>> 2 LWP 12 0x7eadcbb8 in ?? () >>>>>>>> * 1 LWP 1 0x7eadcb04 in ?? () >>>>>>>> (gdb) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> It seems that "_dbg" is unknown and unavailable. 
>>>>>>>> >>>>>>>> tyr java 399 grep _dbg >>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/* >>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c: >>>>>>>> volatile >>>>> int _dbg = 1; >>>>>>>> /export2/src/openmpi-1.9/openmpi-dev-124-g91e9686/ompi/mpi/java/c/mpi_MPI.c: >>>>>>>> while >>>>> (_dbg) poll(NULL, 0, 1); >>>>>>>> tyr java 400 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i >>>>>>>> _dbg >>>>>>>> tyr java 401 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i >>>>>>>> JNI_OnLoad >>>>>>>> [1057] | 139688| 444|FUNC |GLOB |0 >>>>>>>> |11 |JNI_OnLoad >>>>>>>> tyr java 402 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> How can I set _dbg to zero to continue mpiexec? I also tried to >>>>>>>> set a breakpoint for function JNI_OnLoad, but it seems, that the >>>>>>>> function isn't called before SIGSEGV. >>>>>>>> >>>>>>>> >>>>>>>> tyr java 177 unsetenv OMPI_ATTACH >>>>>>>> tyr java 178 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec >>>>>>>> GNU gdb (GDB) 7.6.1 >>>>>>>> ... >>>>>>>> (gdb) b mpi_MPI.c:JNI_OnLoad >>>>>>>> No source file named mpi_MPI.c. >>>>>>>> Make breakpoint pending on future shared library load? (y or [n]) y >>>>>>>> >>>>>>>> Breakpoint 1 (mpi_MPI.c:JNI_OnLoad) pending. >>>>>>>> (gdb) run -np 1 java InitFinalizeMain >>>>>>>> Starting program: /usr/local/openmpi-1.9.0_64_gcc/bin/mpiexec -np 1 >>>>>>>> java InitFinalizeMain >>>>>>>> [Thread debugging using libthread_db enabled] >>>>>>>> [New Thread 1 (LWP 1)] >>>>>>>> [New LWP 2 ] >>>>>>>> # >>>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>>> # >>>>>>>> # SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3518, tid=2 >>>>>>>> ... 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> tyr java 381 cat InitFinalizeMain.java >>>>>>>> import mpi.*; >>>>>>>> >>>>>>>> public class InitFinalizeMain >>>>>>>> { >>>>>>>> public static void main (String args[]) throws MPIException >>>>>>>> { >>>>>>>> MPI.Init (args); >>>>>>>> System.out.print ("Hello!\n"); >>>>>>>> MPI.Finalize (); >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> SIGSEGV happens in MPI.Init(args), because I can print a message >>>>>>>> before I call the method. >>>>>>>> >>>>>>>> tyr java 192 unsetenv OMPI_ATTACH >>>>>>>> tyr java 193 mpijavac InitFinalizeMain.java >>>>>>>> tyr java 194 mpiexec -np 1 java InitFinalizeMain >>>>>>>> Before MPI.Init() >>>>>>>> # >>>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>>> # >>>>>>>> # SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=3697, tid=2 >>>>>>>> ... >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Any ideas, how I can continue? I couldn't find a C function for >>>>>>>> MPI.Init() in a C file. Do you know, which function is called first, >>>>>>>> so that I can set a breakpoint? By the way, I get the same error >>>>>>>> for Solaris 10 x86_64. >>>>>>>> >>>>>>>> tyr java 388 ssh sunpc1 >>>>>>>> ... >>>>>>>> sunpc1 java 106 mpijavac InitFinalizeMain.java >>>>>>>> sunpc1 java 107 uname -a >>>>>>>> SunOS sunpc1 5.10 Generic_147441-21 i86pc i386 i86pc Solaris >>>>>>>> sunpc1 java 108 isainfo -k >>>>>>>> amd64 >>>>>>>> sunpc1 java 109 mpiexec -np 1 java InitFinalizeMain >>>>>>>> # >>>>>>>> # A fatal error has been detected by the Java Runtime Environment: >>>>>>>> # >>>>>>>> # SIGSEGV (0xb) at pc=0xfffffd7fff1d77f0, pid=20256, tid=2 >>>>>>>> >>>>>>>> >>>>>>>> Thank you very much for any help in advance. >>>>>>>> >>>>>>>> Kind regards >>>>>>>> >>>>>>>> Siegmar >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> thank you very much for your help. >>>>>>>>> >>>>>>>>>> how did you configure openmpi ? which java version did you use ? 
>>>>>>>>>> >>>>>>>>>> i just found a regression and you currently have to explicitly add >>>>>>>>>> CFLAGS=-D_REENTRANT CPPFLAGS=-D_REENTRANT >>>>>>>>>> to your configure command line >>>>>>>>> >>>>>>>>> I added "-D_REENTRANT" to my command. >>>>>>>>> >>>>>>>>> ../openmpi-dev-124-g91e9686/configure >>>>>>>>> --prefix=/usr/local/openmpi-1.9.0_64_gcc \ >>>>>>>>> --libdir=/usr/local/openmpi-1.9.0_64_gcc/lib64 \ >>>>>>>>> --with-jdk-bindir=/usr/local/jdk1.8.0/bin \ >>>>>>>>> --with-jdk-headers=/usr/local/jdk1.8.0/include \ >>>>>>>>> JAVA_HOME=/usr/local/jdk1.8.0 \ >>>>>>>>> LDFLAGS="-m64" CC="gcc" CXX="g++" FC="gfortran" \ >>>>>>>>> CFLAGS="-m64 -D_REENTRANT" CXXFLAGS="-m64" FCFLAGS="-m64" \ >>>>>>>>> CPP="cpp" CXXCPP="cpp" \ >>>>>>>>> CPPFLAGS="-D_REENTRANT" CXXCPPFLAGS="" \ >>>>>>>>> --enable-mpi-cxx \ >>>>>>>>> --enable-cxx-exceptions \ >>>>>>>>> --enable-mpi-java \ >>>>>>>>> --enable-heterogeneous \ >>>>>>>>> --enable-mpi-thread-multiple \ >>>>>>>>> --with-threads=posix \ >>>>>>>>> --with-hwloc=internal \ >>>>>>>>> --without-verbs \ >>>>>>>>> --with-wrapper-cflags="-std=c11 -m64" \ >>>>>>>>> --enable-debug \ >>>>>>>>> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc >>>>>>>>> >>>>>>>>> I use Java 8. >>>>>>>>> >>>>>>>>> tyr openmpi-1.9 112 java -version >>>>>>>>> java version "1.8.0" >>>>>>>>> Java(TM) SE Runtime Environment (build 1.8.0-b132) >>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode) >>>>>>>>> tyr openmpi-1.9 113 >>>>>>>>> >>>>>>>>> Unfortunately I still get a SIGSEGV with openmpi-dev-124-g91e9686. >>>>>>>>> I have applied your patch and will try to debug my small Java >>>>>>>>> program tomorrow or next week and then let you know the result. 
>>>>>>>> _______________________________________________ >>>>>>>> users mailing list >>>>>>>> us...@open-mpi.org >>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>>>> Link to this post: >>>>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25581.php