Re: [OMPI users] SIGSEGV with Java, openmpi-1.8.2, and Sun C and gcc-4.9.0

2014-09-03 Siegmar Gross
Hi Ralph,

> I believe this was fixed in the trunk and is now scheduled to come
> across to 1.8.3

Today I installed openmpi-1.9a1r32664 and the problem still exists.
Is the backtrace helpful or do you need something else?

tyr java 111 ompi_info | grep MPI:
Open MPI: 1.9a1r32664
tyr java 112 mpijavac InitFinalizeMain.java 
warning: [path] bad path element "/usr/local/openmpi-1.9_64_cc/lib64/shmem.jar": no such file or directory
1 warning
tyr java 113 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.9_64_cc/bin/mpiexec
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
...
Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun...done.
(gdb) run -np 1 java InitFinalizeMain 
Starting program: /usr/local/openmpi-1.9_64_cc/bin/mpiexec -np 1 java InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP2]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7ea3c7f0, pid=3584, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C  [libc.so.1+0x3c7f0]  strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid3584.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--
mpiexec noticed that process rank 0 with PID 0 on node tyr exited on signal 6 (Abort).
--
[LWP2 exited]
[New Thread 2]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) bt
#0  0x7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1  0x7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2  0x7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3  0x7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4  0x7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5  0x7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6  0x7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7  0x7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8  0x7e4e6d88 in vm_close ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#9  0x7e4e4074 in lt_dlclose ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#10 0x7e53a1cc in ri_destructor (obj=0x0)
    at ../../../../openmpi-1.9a1r32664/opal/mca/base/mca_base_component_repository.c:382
#11 0x7e5379a8 in opal_obj_run_destructors (object=0x0)
    at ../../../../openmpi-1.9a1r32664/opal/class/opal_object.h:446
#12 0x7e539a3c in mca_base_component_repository_release (component=0xf000)
    at ../../../../openmpi-1.9a1r32664/opal/mca/base/mca_base_component_repository.c:240
#13 0x7e5400a0 in mca_base_component_unload (component=0x0, output_id=-2145509376)
    at ../../../../openmpi-1.9a1r32664/opal/mca/base/mca_base_components_close.c:47
#14 0x7e540144 in mca_base_component_close (component=0xff7b30ff, output_id=255)
    at ../../../../openmpi-1.9a1r32664/opal/mca/base/mca_base_components_close.c:60
#15 0x7e540254 in mca_base_components_close (output_id=767, components=0x0, skip=0xff7f73cdf800)
    at ../../../../openmpi-1.9a1r32664/opal/mca/base/mca_base_components_close.c:86
#16 0x7e540194 in mca_base_framework_components_close (framework=0xff, skip=0xff7c801c4000)
    at ../../../../openmpi-1.9a1r32664/opal/mca/base/mca_base_components_close.c:68
#17 0x7ee49a58 in orte_oob_base_close ()
    at ../../../../openmpi-1.9a1r32664/orte/mca/oob/base/oob_base_frame.c:98
#18 0x7e56bcfc in mca_base_framework_close (framework=0xff7e4e3f3cff)
    at ../../../../openmpi-1.9a1r32664/opal/mca/base/mca_base_framework.c:187
#19 0x7bb13f00 in rte_finalize ()
    at ../../../../../openmpi-1.9a1r32664/orte/mca/ess/hnp/ess_hnp_module.c:857
#20 
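The `[path]` warning that mpijavac printed in the 1.9 session is javac's `-Xlint:path` check (enabled by `-Xlint:all`); it reports a classpath element that does not exist on disk, here `shmem.jar`. The 1.8.2 run crashes the same way without this warning, so it is probably unrelated to the SIGSEGV. A minimal standalone sketch for listing such missing entries follows; `ClasspathCheck` and `missingEntries` are hypothetical names, not part of Open MPI, and the sketch only tests whether each entry exists, which is a simplification of javac's lint.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {
    // Hypothetical helper: returns every non-empty element of a classpath string
    // that does not exist on disk (the "no such file or directory" case).
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        // File.pathSeparator is ":" on Solaris/Linux and ";" on Windows;
        // neither is a regex metacharacter, so split() treats it literally.
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty() && !new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Check the classpath given as the first argument, else this JVM's own classpath.
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String entry : missingEntries(cp)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

For example, `java ClasspathCheck /usr/local/openmpi-1.9_64_cc/lib64/shmem.jar` should print that path with a `missing:` prefix if the jar was not installed.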

Re: [OMPI users] SIGSEGV with Java, openmpi-1.8.2, and Sun C and gcc-4.9.0

2014-09-02 Ralph Castain
I believe this was fixed in the trunk and is now scheduled to come across to 1.8.3.

On Sep 2, 2014, at 4:21 AM, Siegmar Gross wrote:

> Hi,
> 
> yesterday I installed openmpi-1.8.2 on my machines (Solaris 10 Sparc
> (tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1 x86_64
> (linpc0)) with Sun C 5.12. A small Java program works on Linux,
> but breaks with a segmentation fault on Solaris 10.
> 
> 
> tyr java 172 where mpijavac
> mpijavac is aliased to \mpijavac -deprecation -Xlint:all
> /usr/local/openmpi-1.8.2_64_cc/bin/mpijavac
> tyr java 173 mpijavac InitFinalizeMain.java 
> tyr java 174 mpiexec -np 1 java InitFinalizeMain
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7ea3c7f0, pid=28334, tid=2
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode 
> solaris-sparc compressed oops)
> # Problematic frame:
> # C  [libc.so.1+0x3c7f0]  strlen+0x50
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" 
> before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid28334.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> --
> mpiexec noticed that process rank 0 with PID 28334 on node tyr exited on 
> signal 6 (Abort).
> --
> tyr java 175 
> 
> 
> 
> gbd shows the following backtrace for SunC 5.12.
> 
> tyr java 175 /usr/local/gdb-7.6.1_64_gcc/bin/gdb 
> /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec 
> GNU gdb (GDB) 7.6.1
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.10".
> For bug reporting instructions, please see:
> ...
> Reading symbols from 
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_cc/bin/orterun...done.
> (gdb) run -np 1 java InitFinalizeMain 
> Starting program: /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec -np 1 java 
> InitFinalizeMain
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP2]
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7ea3c7f0, pid=28353, tid=2
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode 
> solaris-sparc compressed oops)
> # Problematic frame:
> # C  [libc.so.1+0x3c7f0]  strlen+0x50
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" 
> before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid28353.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> --
> mpiexec noticed that process rank 0 with PID 28353 on node tyr exited on 
> signal 6 (Abort).
> --
> [LWP2 exited]
> [New Thread 2]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
> satisfy query
> (gdb) bt
> #0  0x7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> #1  0x7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> #2  0x7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> #3  0x7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> #4  0x7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> #5  0x7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> #6  0x7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> #7  0x7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> #8  0x7e8cb348 in vm_close ()
>   from /usr/local/openmpi-1.8.2_64_cc/lib64/libopen-pal.so.6
> #9  0x7e8c8634 in lt_dlclose ()
>   from /usr/local/openmpi-1.8.2_64_cc/lib64/libopen-pal.so.6
> 

[OMPI users] SIGSEGV with Java, openmpi-1.8.2, and Sun C and gcc-4.9.0

2014-09-02 Siegmar Gross
Hi,

Yesterday I installed openmpi-1.8.2 on my machines (Solaris 10 Sparc
(tyr), Solaris 10 x86_64 (sunpc0), and openSUSE Linux 12.1 x86_64
(linpc0)) with Sun C 5.12. A small Java program works on Linux
but breaks with a segmentation fault on Solaris 10.


tyr java 172 where mpijavac
mpijavac is aliased to \mpijavac -deprecation -Xlint:all
/usr/local/openmpi-1.8.2_64_cc/bin/mpijavac
tyr java 173 mpijavac InitFinalizeMain.java 
tyr java 174 mpiexec -np 1 java InitFinalizeMain
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7ea3c7f0, pid=28334, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C  [libc.so.1+0x3c7f0]  strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid28334.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--
mpiexec noticed that process rank 0 with PID 28334 on node tyr exited on signal 6 (Abort).
--
tyr java 175 



gdb shows the following backtrace for Sun C 5.12:

tyr java 175 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
...
Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.8.2_64_cc/bin/orterun...done.
(gdb) run -np 1 java InitFinalizeMain 
Starting program: /usr/local/openmpi-1.8.2_64_cc/bin/mpiexec -np 1 java InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP2]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7ea3c7f0, pid=28353, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc compressed oops)
# Problematic frame:
# C  [libc.so.1+0x3c7f0]  strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid28353.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--
mpiexec noticed that process rank 0 with PID 28353 on node tyr exited on signal 6 (Abort).
--
[LWP2 exited]
[New Thread 2]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb) bt
#0  0x7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1  0x7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2  0x7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3  0x7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4  0x7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5  0x7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6  0x7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7  0x7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8  0x7e8cb348 in vm_close ()
   from /usr/local/openmpi-1.8.2_64_cc/lib64/libopen-pal.so.6
#9  0x7e8c8634 in lt_dlclose ()
   from /usr/local/openmpi-1.8.2_64_cc/lib64/libopen-pal.so.6
#10 0x7e91edcc in ri_destructor (obj=0xff)
    at ../../../../openmpi-1.8.2/opal/mca/base/mca_base_component_repository.c:391
#11 0x7e91c5a0 in opal_obj_run_destructors (object=0xff7c701d00ff)
    at ../../../../openmpi-1.8.2/opal/class/opal_object.h:446
#12 0x7e91e61c in mca_base_component_repository_release (component=0x10ff)
    at