Hi,

I have tried to run the ping-pong-mpi-tcp project. It crashes with a similar
segmentation fault in StringBuilder.append(String). I have also opened an issue
on GitHub (https://github.com/open-mpi/ompi/issues/10223), but I thought I'd
also try this list. The log indicates that the app runs for a short while
before it crashes. The crash happens during a simple String operation like the
following:

var newString = "concatenated" + "string";
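
For anyone who wants to try this outside the project, here is a stand-alone
sketch of the same kind of code path. It calls StringBuilder.append in a loop
so that the JIT compiler is likely to compile it (the problematic frame in the
report below is marked c2). The class and method names are hypothetical and are
not taken from ping-pong-mpi-tcp. The sketch deliberately contains no MPI
calls, so it only serves as a baseline to compare against a run under mpirun.

```java
public class StringAppendSketch {

    // Hypothetical stand-in for a toString() method that builds its result
    // with StringBuilder.append; this is NOT the real NetworkCommand code.
    static String describe(String type, int id) {
        StringBuilder sb = new StringBuilder();
        sb.append("Command{type='").append(type)
          .append("', id=").append(id)
          .append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        int total = 0;
        // Repeat the call many times so the append path becomes hot enough
        // for the JIT compiler to pick it up.
        for (int i = 0; i < 200_000; i++) {
            total += describe("ping", i).length();
        }
        System.out.println("total length: " + total);
    }
}
```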

The error report and the log of the application can be found below.

Any help with this problem would be very much appreciated!

Thanks in advance
Janek

The JVM error report can be found here:
https://github.com/open-mpi/ompi/files/8424478/hs_err_pid31983.log
It begins as follows:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00001512949256d4 (sent by kill), pid=31983, tid=31990
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# J 448 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@17.0.2 (8 bytes) @ 0x00001512949256d4 [0x00001512949256a0+0x0000000000000034]
#
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to 
/net/ils/laudan/2-mpi-test/core.31983)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -Djava.library.path=/homes2/ils/laudan/ompi-java17/lib MPIMain 2 
false

Host: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 12 cores, 125G, AlmaLinux 
release 8.5 (Arctic Sphynx)
Time: Tue Apr  5 16:50:36 2022 CEST elapsed time: 3.694052 seconds (0d 0h 0m 3s)

---------------  T H R E A D  ---------------

Current thread (0x00001512a4024710):  JavaThread "main" [_thread_in_Java, 
id=31990, stack(0x00001512aa9c6000,0x00001512aaac7000)]

Stack: [0x00001512aa9c6000,0x00001512aaac7000],  sp=0x00001512aaac54c0,  free 
space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 448 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@17.0.2 (8 bytes) @ 0x00001512949256d4 [0x00001512949256a0+0x0000000000000034]
j  protocol.commands.NetworkCommand.toString()Ljava/lang/String;+76
j  protocol.commands.ping.Ping_NC.toString()Ljava/lang/String;+13
j  testframework.TestFramework.addAdditionalPayload(Lprotocol/commands/NetworkCommand;)Lprotocol/commands/NetworkCommand;+7
j  network.messenger.MessageSender.send(Lprotocol/commands/NetworkCommand;)V+1
j  role.Role.sendMessage(Lprotocol/commands/NetworkCommand;)V+5
j  role.Node.pingAll()V+59
j  testframework.TestFramework.loopPing(Ljava/lang/String;Ltestframework/TestPhase;Lrole/Node;I)Ltestframework/result/OverallLatencyResult;+57
j  testframework.TestFramework._doPingTests(Lrole/Node;I)V+17
j  testframework.TestFramework.doPingTests(Lrole/Node;I)Ltestframework/TestFramework;+25
j  MPIMain.main([Ljava/lang/String;)V+153
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7f4335]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x315
V  [libjvm.so+0x88cfed]  jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x38d
V  [libjvm.so+0x88fe6e]  jni_CallStaticVoidMethod+0x16e
C  [libjli.so+0x46fe]  JavaMain+0xcfe
C  [libjli.so+0x7ee9]  ThreadJavaMain+0x9


siginfo: si_signo: 11 (SIGSEGV), si_code: -6 (SI_TKILL), si_pid: 31983 (current 
process), si_uid: 35556

Register to memory mapping:

RAX={method} {0x0000000800424790} 'append' 
'(Ljava/lang/String;)Ljava/lang/StringBuilder;' in 'java/lang/StringBuilder'
RBX={method} {0x0000000800424790} 'append' 
'(Ljava/lang/String;)Ljava/lang/StringBuilder;' in 'java/lang/StringBuilder'
RCX=0x00000000819598d0 is an oop: java.lang.StringBuilder
{0x00000000819598d0} - klass: 'java/lang/StringBuilder'
 - ---- fields (total size 3 words):
 - 'count' 'I' @12  164 (a4)
 - 'coder' 'B' @16  0
 - 'value' '[B' @20  [B{0x0000000081959ca0} (1032b394)
RDX=0x0 is NULL
RSP=0x00001512aaac54c0 is pointing into the stack for thread: 0x00001512a4024710
RBP=0x00001512aaac5598 is pointing into the stack for thread: 0x00001512a4024710
RSI=0x00000000819598d0 is an oop: java.lang.StringBuilder
{0x00000000819598d0} - klass: 'java/lang/StringBuilder'
 - ---- fields (total size 3 words):
 - 'count' 'I' @12  164 (a4)
 - 'coder' 'B' @16  0
 - 'value' '[B' @20  [B{0x0000000081959ca0} (1032b394)
RDI=0x00001512866b78a8 is pointing into metadata
R8 =0x00001512a40d1d10 points into unknown readable memory: 0x0000151286400950 
| 50 09 40 86 12 15 00 00
R9 =0x0000000000000002 is an unknown value
R10=0x00000000819598d0 is an oop: java.lang.StringBuilder
{0x00000000819598d0} - klass: 'java/lang/StringBuilder'
 - ---- fields (total size 3 words):
 - 'count' 'I' @12  164 (a4)
 - 'coder' 'B' @16  0
 - 'value' '[B' @20  [B{0x0000000081959ca0} (1032b394)
R11=0x00001512949256c0 is at entry_point+32 in (nmethod*)0x0000151294925510
R12=0x0 is NULL
R13=0x00001512aaac5540 is pointing into the stack for thread: 0x00001512a4024710
R14=0x00001512aaac55a8 is pointing into the stack for thread: 0x00001512a4024710
R15=0x00001512a4024710 is a thread


---------------------------- Application log ----------------------------

2022-04-05 16:50:33 [main] MPIMain.main()
INFO: Args received: [2, false]
2022-04-05 16:50:33 [main] MPIMain.main()
INFO: Args received: [2, false]
[INFO] 16:50:34:294 config.GlobalConfig.initMPI(): Thread support level: 0
[INFO] 16:50:34:294 config.GlobalConfig.initMPI(): Thread support level: 0
[INFO] 16:50:34:298 config.GlobalConfig.init(): Init [MPI_CONNECTION, 
isSingleJVM:false]
[INFO] 16:50:34:298 config.GlobalConfig.init(): Init [MPI_CONNECTION, 
isSingleJVM:false]
[INFO] 16:50:35:494 config.GlobalConfig.registerRole(): Registering role: 
Role{roleId='p0g2', myAddress=MPIAddress{rank=0, groupId=2}, isLeader=false}
[INFO] 16:50:35:503 config.GlobalConfig.registerRole(): Registering role: 
Role{roleId='p1g2', myAddress=MPIAddress{rank=1, groupId=2}, isLeader=false}
[INFO] 16:50:35:540 config.GlobalConfig.registerAddress(): Address 
[MPIAddress{rank=0, groupId=2}] registered on role [Role{roleId='p0g2', 
myAddress=MPIAddress{rank=0, groupId=2}, isLeader=true}]
[INFO] 16:50:35:540 config.GlobalConfig.registerAddress(): Address 
[MPIAddress{rank=1, groupId=2}] registered on role [Role{roleId='p0g2', 
myAddress=MPIAddress{rank=0, groupId=2}, isLeader=true}]
[INFO] 16:50:35:541 role.Node.<init>(): Node created: Role{roleId='p0g2', 
myAddress=MPIAddress{rank=0, groupId=2}, isLeader=true}
[INFO] 16:50:35:541 config.GlobalConfig.registerAddress(): Address 
[MPIAddress{rank=1, groupId=2}] registered on role [Role{roleId='p1g2', 
myAddress=MPIAddress{rank=1, groupId=2}, isLeader=true}]
[INFO] 16:50:35:541 config.GlobalConfig.registerAddress(): Address 
[MPIAddress{rank=0, groupId=2}] registered on role [Role{roleId='p1g2', 
myAddress=MPIAddress{rank=1, groupId=2}, isLeader=false}]
[INFO] 16:50:35:542 role.Node.<init>(): Node created: Role{roleId='p1g2', 
myAddress=MPIAddress{rank=1, groupId=2}, isLeader=false}
[node500:31983:0:31990] Caught signal 11 (Segmentation fault: address not 
mapped to object at address 0x14)
[INFO] 16:50:36:566 testframework.TestFramework._doPingTests(): Starting 
ping-pong tests...
==== backtrace (tid:  31990) ====
 0  /usr/lib64/libucs.so.0(ucs_handle_error+0x2a4) [0x1512416f32a4]
 1  /usr/lib64/libucs.so.0(+0x2347c) [0x1512416f347c]
 2  /usr/lib64/libucs.so.0(+0x2364a) [0x1512416f364a]
 3  [0x1512949256d4]
=================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00001512949256d4 (sent by kill), pid=31983, tid=31990
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, 
compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# J 448 c2 
java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; 
java.base@17.0.2 (8 bytes) @ 0x00001512949256d4 
[0x00001512949256a0+0x0000000000000034]
#
# Core dump will be written. Default location: Core dumps may be processed with 
"/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to 
/net/ils/laudan/2-mpi-test/core.31983)
#
# An error report file with more information is saved as:
# /net/ils/laudan/2-mpi-test/hs_err_pid31983.log
Compiled method (c2)    3781  448       4       java.lang.StringBuilder::append 
(8 bytes)
 total in heap  [0x0000151294925510,0x0000151294925fb0] = 2720
 relocation     [0x0000151294925670,0x00001512949256a0] = 48
 main code      [0x00001512949256a0,0x0000151294925be0] = 1344
 stub code      [0x0000151294925be0,0x0000151294925bf8] = 24
 metadata       [0x0000151294925bf8,0x0000151294925c50] = 88
 scopes data    [0x0000151294925c50,0x0000151294925e88] = 568
 scopes pcs     [0x0000151294925e88,0x0000151294925f68] = 224
 dependencies   [0x0000151294925f68,0x0000151294925f70] = 8
 handler table  [0x0000151294925f70,0x0000151294925f88] = 24
 nul chk table  [0x0000151294925f88,0x0000151294925fb0] = 40
[node500:31983] *** Process received signal ***
[node500:31983] Signal: Aborted (6)
[node500:31983] Signal code:  (-6)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
[node500:31983] [ 0] /usr/lib64/libpthread.so.0(+0x12c20)[0x1512aa492c20]
[node500:31983] [ 1] /usr/lib64/libc.so.6(gsignal+0x10f)[0x1512a9eee37f]
[node500:31983] [ 2] /usr/lib64/libc.so.6(abort+0x127)[0x1512a9ed8db5]
[node500:31983] [ 3] 
/net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0x246cc9)[0x1512a8e90cc9]
[node500:31983] [ 4] 
/net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0xe0e70c)[0x1512a9a5870c]
[node500:31983] [ 5] 
/net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0xe0f12b)[0x1512a9a5912b]
[node500:31983] [ 6] 
/net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0xe0f15e)[0x1512a9a5915e]
[node500:31983] [ 7] 
/net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x198)[0x1512a9906148]
[node500:31983] [ 8] /usr/lib64/libpthread.so.0(+0x12c20)[0x1512aa492c20]
[node500:31983] [ 9] [0x1512949256d4]
[node500:31983] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node node500 exited on signal 
6 (Aborted).
--------------------------------------------------------------------------



------ Original Message ------
From: "Benson Muite" <benson_mu...@emailplus.org>
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Laudan, Janek" <lau...@tu-berlin.de>
Sent: 18.03.2022 06:48:54
Subject: Re: [OMPI users] [EXTERNAL] Java Segentation Fault

Hi Janek,

If you compile your program and produce a class file, does it run with the
following command?
mpirun -np 1 java matsim-p

Try compiling Open MPI from source as indicated at
https://www-lb.open-mpi.org/faq/?category=java

Java tends to require more memory, so if using a batch system be sure to
request enough.

It might also be interesting to try:
https://github.com/mboysan/ping-pong-mpi-tcp

Benson

On 3/17/22 7:03 PM, Laudan, Janek via users wrote:
Hi Howard,

thanks for your reply. I am using version 4.1.2, and I didn't compile
with the mpijavac wrapper. I was hoping that I could keep some form
of our Maven build infrastructure and then deploy the resulting jar. The
project setup is here:
https://github.com/Janekdererste/matsim-p/blob/master/pom.xml

All the best,
Janek

------ Original Message ------
From: "Pritchard Jr., Howard" <howa...@lanl.gov>
To: "Laudan, Janek" <lau...@tu-berlin.de>;
"Open MPI Users" <users@lists.open-mpi.org>
Sent: 17.03.2022 16:59:04
Subject: Re: [EXTERNAL] [OMPI users] Java Segentation Fault

Hi Janek,

A few questions.

First, which version of Open MPI are you using?

Did you compile your code with the Open MPI mpijavac wrapper?

Howard

*From: *users <users-boun...@lists.open-mpi.org> on behalf of "Laudan, Janek
via users" <users@lists.open-mpi.org>
*Reply-To: *"Laudan, Janek" <lau...@tu-berlin.de>, Open MPI Users
<users@lists.open-mpi.org>
*Date: *Thursday, March 17, 2022 at 9:52 AM
*To: *"users@lists.open-mpi.org" <users@lists.open-mpi.org>
*Cc: *"Laudan, Janek" <lau...@tu-berlin.de>
*Subject: *[EXTERNAL] [OMPI users] Java Segentation Fault

Hi,



I am trying to extend an existing Java project to run with
Open MPI. I have managed to set up Open MPI and my
project on my local machine and to conduct some test runs.

However, when I tried to set things up on our cluster, I ran into some
problems. I was able to run some trivial examples such as "HelloWorld"
and "Ring", which I found in the ompi GitHub repo. Unfortunately,
when I try to run our app wrapped between MPI.Init(args) and
MPI.Finalize(), I get the following segmentation fault:

$ mpirun -np 1 java -cp matsim-p-1.0-SNAPSHOT.jar
org.matsim.parallel.RunMinimalMPIExample
Java-Version: 11.0.2
before getTestScenario
before load config
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This
will impact performance.
[cluster-i:1272 :0:1274] Caught signal 11 (Segmentation fault: address
not mapped to object at address 0xc)
==== backtrace (tid:   1274) ====
=================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000014a85752fdf4, pid=1272, tid=1274
#
# JRE version: Java(TM) SE Runtime Environment (11.0.2+9) (build
11.0.2+9-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.2+9-LTS, mixed
mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 612 c2
java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
java.base@11.0.2 (8 bytes) @ 0x000014a85752fdf4
[0x000014a85752fdc0+0x0000000000000034]
#
# No core dump will be written. Core dumps have been disabled. To
enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /net/ils/laudan/mpi-test/matsim-p/hs_err_pid1272.log
Compiled method (c2)    1052  612       4
java.lang.StringBuilder::append (8 bytes)
 total in heap  [0x000014a85752fc10,0x000014a8575306a8] = 2712
 relocation     [0x000014a85752fd88,0x000014a85752fdb8] = 48
 main code      [0x000014a85752fdc0,0x000014a857530360] = 1440
 stub code      [0x000014a857530360,0x000014a857530378] = 24
 metadata       [0x000014a857530378,0x000014a8575303c0] = 72
 scopes data    [0x000014a8575303c0,0x000014a857530578] = 440
 scopes pcs     [0x000014a857530578,0x000014a857530658] = 224
 dependencies   [0x000014a857530658,0x000014a857530660] = 8
 handler table  [0x000014a857530660,0x000014a857530678] = 24
 nul chk table  [0x000014a857530678,0x000014a8575306a8] = 48
Compiled method (c1)    1053  263       3
java.lang.StringBuilder::<init> (7 bytes)
 total in heap  [0x000014a850102790,0x000014a850102b30] = 928
 relocation     [0x000014a850102908,0x000014a850102940] = 56
 main code      [0x000014a850102940,0x000014a850102a20] = 224
 stub code      [0x000014a850102a20,0x000014a850102ac8] = 168
 metadata       [0x000014a850102ac8,0x000014a850102ad0] = 8
 scopes data    [0x000014a850102ad0,0x000014a850102ae8] = 24
 scopes pcs     [0x000014a850102ae8,0x000014a850102b28] = 64
 dependencies   [0x000014a850102b28,0x000014a850102b30] = 8
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is
disabled
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
[cluster-i:01272] *** Process received signal ***
[cluster-i:01272] Signal: Aborted (6)
[cluster-i:01272] Signal code:  (-6)
[cluster-i:01272] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x14a86dcbb387]
[cluster-i:01272] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x14a86dcbca78]
[cluster-i:01272] [ 3]
/afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xc00be9)[0x14a86d3f8be9]
[cluster-i:01272] [ 4]
/afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29619)[0x14a86d621619]
[cluster-i:01272] [ 5]
/afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29e9b)[0x14a86d621e9b]
[cluster-i:01272] [ 6]
/afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29ece)[0x14a86d621ece]
[cluster-i:01272] [ 7]
/afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x1c0)[0x14a86d403a00]
[cluster-i:01272] [ 8]
/afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xbff5e8)[0x14a86d3f75e8]
[cluster-i:01272] [ 9] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [10] [0x14a85752fdf4]
[cluster-i:01272] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cluster-i exited
on signal 6 (Aborted).
--------------------------------------------------------------------------

I am running Open MPI 4.1.2 with Java 11. The project I am trying to
set up is here: https://github.com/Janekdererste/matsim-p

I hope somebody can advise on what to try next. Thanks and all the best

Janek

