Hi Jeff,

> We committed some fixes -- see if you can get farther with
> tonight's nightly tarball.

My small Java and C programs still break with different errors on
Solaris 10 Sparc, Solaris 10 x86_64, and Linux (Sun C 5.12). I have
put everything into this one email because I had already sent the
individual error messages for an earlier version. It seems that
nothing has changed for my programs or my environment.


Java program:
=============
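
For reference, the test program does nothing more than initialize and
finalize MPI; a minimal sketch along these lines (the actual
InitFinalizeMain.java may differ slightly):

import mpi.MPI;
import mpi.MPIException;

public class InitFinalizeMain
{
  public static void main (String[] args) throws MPIException
  {
    /* minimal reproducer sketch; the real file may contain more */
    MPI.Init (args);
    MPI.Finalize ();
  }
}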

tyr java 105 ompi_info | grep -e MPI: -e "C compiler:"
                Open MPI: 1.9a1r32716
              C compiler: cc
tyr java 106 mpijavac InitFinalizeMain.java
warning: [path] bad path element 
"/usr/local/openmpi-1.9_64_cc/lib64/shmem.jar": no such file or directory
1 warning
tyr java 107 mpiexec -np 1 java InitFinalizeMain
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=21678, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc 
compressed oops)
# Problematic frame:
# C  [libc.so.1+0x3c7f0]  strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" 
before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid21678.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node tyr exited on signal 6 
(Abort).
--------------------------------------------------------------------------
tyr java 108 




tyr java 108 /usr/local/gdb-7.6.1_64_gcc/bin/gdb 
/usr/local/openmpi-1.9_64_cc/bin/mpiexec
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from 
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun...done.
(gdb) run -np 1 java InitFinalizeMain 
Starting program: /usr/local/openmpi-1.9_64_cc/bin/mpiexec -np 1 java 
InitFinalizeMain
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xffffffff7ea3c7f0, pid=21696, tid=2
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode solaris-sparc 
compressed oops)
# Problematic frame:
# C  [libc.so.1+0x3c7f0]  strlen+0x50
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" 
before starting Java again
#
# An error report file with more information is saved as:
# /home/fd1026/work/skripte/master/parallel/prog/mpi/java/hs_err_pid21696.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node tyr exited on signal 6 
(Abort).
--------------------------------------------------------------------------
[LWP    2         exited]
[New Thread 2        ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy 
query
(gdb) bt
#0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8  0xffffffff7e4e6f88 in vm_close ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#9  0xffffffff7e4e4274 in lt_dlclose ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#10 0xffffffff7e53a574 in ri_destructor (obj=0x0)
    at 
../../../../openmpi-1.9a1r32716/opal/mca/base/mca_base_component_repository.c:382
#11 0xffffffff7e537d50 in opal_obj_run_destructors (object=0x0)
    at ../../../../openmpi-1.9a1r32716/opal/class/opal_object.h:446
#12 0xffffffff7e539de4 in mca_base_component_repository_release 
(component=0xf000)
    at 
../../../../openmpi-1.9a1r32716/opal/mca/base/mca_base_component_repository.c:240
#13 0xffffffff7e540448 in mca_base_component_unload (component=0x0, 
output_id=-2145509376)
    at 
../../../../openmpi-1.9a1r32716/opal/mca/base/mca_base_components_close.c:47
#14 0xffffffff7e5404ec in mca_base_component_close 
(component=0xffffff7b000030ff, 
    output_id=255)
    at 
../../../../openmpi-1.9a1r32716/opal/mca/base/mca_base_components_close.c:60
#15 0xffffffff7e5405fc in mca_base_components_close (output_id=767, 
components=0x0, 
    skip=0xffffff7f73cdf800)
    at 
../../../../openmpi-1.9a1r32716/opal/mca/base/mca_base_components_close.c:86
#16 0xffffffff7e54053c in mca_base_framework_components_close (framework=0xff, 
    skip=0xffffff7c801c4000)
    at 
../../../../openmpi-1.9a1r32716/opal/mca/base/mca_base_components_close.c:68
#17 0xffffffff7ee48d68 in orte_oob_base_close ()
    at ../../../../openmpi-1.9a1r32716/orte/mca/oob/base/oob_base_frame.c:98
#18 0xffffffff7e56c23c in mca_base_framework_close 
(framework=0xffffff7e4e413cff)
    at ../../../../openmpi-1.9a1r32716/opal/mca/base/mca_base_framework.c:187
#19 0xffffffff7bb13f00 in rte_finalize ()
    at ../../../../../openmpi-1.9a1r32716/orte/mca/ess/hnp/ess_hnp_module.c:857
#20 0xffffffff7ec3adf0 in orte_finalize ()
    at ../../openmpi-1.9a1r32716/orte/runtime/orte_finalize.c:66
#21 0x000000010000e264 in orterun (argc=4607, argv=0x0)
    at ../../../../openmpi-1.9a1r32716/orte/tools/orterun/orterun.c:1099
#22 0x00000001000046d4 in main (argc=255, argv=0xffffff7f0af87800)
    at ../../../../openmpi-1.9a1r32716/orte/tools/orterun/main.c:13
(gdb) 





C program:
==========
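
The C program is equally small; essentially the following (a sketch,
the real init_finalize.c may differ slightly):

#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  /* minimal reproducer sketch; prints "Hello!" between init and finalize */
  MPI_Init (&argc, &argv);
  printf ("Hello!\n");
  MPI_Finalize ();
  return 0;
}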

tyr small_prog 116 mpiexec -np 1 init_finalize
Hello!
tyr small_prog 117 mpiexec -np 2 init_finalize
select: Interrupted system call
[tyr.informatik.hs-fulda.de:22150] [[61600,0],0]->[[61600,1],0] 
pmix_server_msg_send_bytes: write failed: Broken 
pipe (32) [sd = 19]
[tyr.informatik.hs-fulda.de:22150] [[61600,0],0]-[[61600,1],0] 
pmix_server_peer_send_handler: unable to send 
message ON SOCKET 19
[tyr.informatik.hs-fulda.de:22152] [[61600,1],0] usock_peer_recv_handler: 
unable to recv message
[tyr.informatik.hs-fulda.de:22150] [[61600,0],0]->[[61600,1],0] 
pmix_server_msg_send_bytes: write failed: Broken 
pipe (32) [sd = 19]
[tyr.informatik.hs-fulda.de:22150] [[61600,0],0]-[[61600,1],0] 
pmix_server_peer_send_handler: unable to send 
message ON SOCKET 19
[tyr.informatik.hs-fulda.de:22154] [[61600,1],1] usock_peer_recv_handler: 
unable to recv message
^Ctyr small_prog 118 /usr/local/gdb-7.6.1_64_gcc/bin/gdb 
/usr/local/openmpi-1.9_64_cc/bin/mpiexec 
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from 
/export2/prog/SunOS_sparc/openmpi-1.9_64_cc/bin/orterun...done.
(gdb) run -np 2 init_finalize   
Starting program: /usr/local/openmpi-1.9_64_cc/bin/mpiexec -np 2 init_finalize
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
select: Interrupted system call

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 1 (LWP 1)]
0xffffffff7d3dcdf8 in _write () from /lib/sparcv9/libc.so.1
(gdb) bt
#0  0xffffffff7d3dcdf8 in _write () from /lib/sparcv9/libc.so.1
#1  0xffffffff7d3ca23c in write () from /lib/sparcv9/libc.so.1
#2  0xffffffff7ed5c400 in send_bytes (peer=0xffffff7f73cdf8ff)
    at ../../openmpi-1.9a1r32716/orte/orted/pmix/pmix_server_sendrecv.c:83
#3  0xffffffff7ed5cb18 in pmix_server_send_handler (sd=479838208, flags=256, 
    cbdata=0x400020000001300)
    at ../../openmpi-1.9a1r32716/orte/orted/pmix/pmix_server_sendrecv.c:188
#4  0xffffffff7e612b1c in event_persist_closure ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#5  0xffffffff7e612ca8 in event_process_active_single_queue ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#6  0xffffffff7e613048 in event_process_active ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#7  0xffffffff7e613a6c in opal_libevent2021_event_base_loop ()
   from /usr/local/openmpi-1.9_64_cc/lib64/libopen-pal.so.0
#8  0x000000010000e18c in orterun (argc=658510079, argv=0x10000001800)
    at ../../../../openmpi-1.9a1r32716/orte/tools/orterun/orterun.c:1081
#9  0x00000001000046d4 in main (argc=1792, argv=0xffffff7ed5c5a400)
    at ../../../../openmpi-1.9a1r32716/orte/tools/orterun/main.c:13
(gdb) 




linpc1 small_prog 101 mpiexec -np 1 init_finalize
Hello!
linpc1 small_prog 102 mpiexec -np 2 init_finalize
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_proc_complete_init failed
  --> Returned "(null)" (-27) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[linpc1:13886] 1 more process has sent help message help-mpi-runtime.txt / 
mpi_init:startup:internal-failure
[linpc1:13886] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
help / error messages
linpc1 small_prog 103 



Can I provide anything else?


Kind regards

Siegmar
