Hi,

I have tried to run the ping-pong-mpi-tcp project. This program crashes with a similar segmentation fault at StringBuilder.append(String). I have also opened an issue on GitHub (https://github.com/open-mpi/ompi/issues/10223), but I thought I'd also try this list. The log indicates that the app runs for a little while before it crashes. The crash also happens during a simple String operation like the following:

var newString = "concatenated" + "string";

The error report and the log of the application can be found below. Any help with this problem would be very much appreciated!

Thanks in advance
Janek

The error report of the jvm can be found here: https://github.com/open-mpi/ompi/files/8424478/hs_err_pid31983.log. The first bits look like the following:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00001512949256d4 (sent by kill), pid=31983, tid=31990
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# J 448 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@17.0.2 (8 bytes) @ 0x00001512949256d4 [0x00001512949256a0+0x0000000000000034]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /net/ils/laudan/2-mpi-test/core.31983)
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#

--------------- S U M M A R Y ------------

Command Line: -Djava.library.path=/homes2/ils/laudan/ompi-java17/lib MPIMain 2 false

Host: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 12 cores, 125G, AlmaLinux release 8.5 (Arctic Sphynx)
Time: Tue Apr 5 16:50:36 2022 CEST elapsed time: 3.694052 seconds (0d 0h 0m 3s)

--------------- T H R E A D ---------------

Current thread (0x00001512a4024710): JavaThread "main" [_thread_in_Java, id=31990, stack(0x00001512aa9c6000,0x00001512aaac7000)]

Stack: [0x00001512aa9c6000,0x00001512aaac7000], sp=0x00001512aaac54c0, free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 448 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@17.0.2 (8 bytes) @ 0x00001512949256d4 [0x00001512949256a0+0x0000000000000034]
j protocol.commands.NetworkCommand.toString()Ljava/lang/String;+76
j protocol.commands.ping.Ping_NC.toString()Ljava/lang/String;+13
j testframework.TestFramework.addAdditionalPayload(Lprotocol/commands/NetworkCommand;)Lprotocol/commands/NetworkCommand;+7
j network.messenger.MessageSender.send(Lprotocol/commands/NetworkCommand;)V+1
j role.Role.sendMessage(Lprotocol/commands/NetworkCommand;)V+5
j role.Node.pingAll()V+59
j testframework.TestFramework.loopPing(Ljava/lang/String;Ltestframework/TestPhase;Lrole/Node;I)Ltestframework/result/OverallLatencyResult;+57
j testframework.TestFramework._doPingTests(Lrole/Node;I)V+17
j testframework.TestFramework.doPingTests(Lrole/Node;I)Ltestframework/TestFramework;+25
j MPIMain.main([Ljava/lang/String;)V+153
v ~StubRoutines::call_stub
V [libjvm.so+0x7f4335] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x315
V [libjvm.so+0x88cfed] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .constprop.1]+0x38d
V [libjvm.so+0x88fe6e] jni_CallStaticVoidMethod+0x16e
C [libjli.so+0x46fe] JavaMain+0xcfe
C [libjli.so+0x7ee9] ThreadJavaMain+0x9

siginfo: si_signo: 11 (SIGSEGV), si_code: -6 (SI_TKILL), si_pid: 31983 (current process), si_uid: 35556

Register to memory mapping:

RAX={method} {0x0000000800424790} 'append' '(Ljava/lang/String;)Ljava/lang/StringBuilder;' in 'java/lang/StringBuilder'
RBX={method} {0x0000000800424790} 'append' '(Ljava/lang/String;)Ljava/lang/StringBuilder;' in 'java/lang/StringBuilder'
RCX=0x00000000819598d0 is an oop: java.lang.StringBuilder
{0x00000000819598d0} - klass: 'java/lang/StringBuilder'
 - ---- fields (total size 3 words):
 - 'count' 'I' @12 164 (a4)
 - 'coder' 'B' @16 0
 - 'value' '[B' @20 [B{0x0000000081959ca0} (1032b394)
RDX=0x0 is NULL
RSP=0x00001512aaac54c0 is pointing into the stack for thread: 0x00001512a4024710
RBP=0x00001512aaac5598 is pointing into the stack for thread: 0x00001512a4024710
RSI=0x00000000819598d0 is an oop: java.lang.StringBuilder
{0x00000000819598d0} - klass: 'java/lang/StringBuilder'
 - ---- fields (total size 3 words):
 - 'count' 'I' @12 164 (a4)
 - 'coder' 'B' @16 0
 - 'value' '[B' @20 [B{0x0000000081959ca0} (1032b394)
RDI=0x00001512866b78a8 is pointing into metadata
R8 =0x00001512a40d1d10 points into unknown readable memory: 0x0000151286400950 | 50 09 40 86 12 15 00 00
R9 =0x0000000000000002 is an unknown value
R10=0x00000000819598d0 is an oop: java.lang.StringBuilder
{0x00000000819598d0} - klass: 'java/lang/StringBuilder'
 - ---- fields (total size 3 words):
 - 'count' 'I' @12 164 (a4)
 - 'coder' 'B' @16 0
 - 'value' '[B' @20 [B{0x0000000081959ca0} (1032b394)
R11=0x00001512949256c0 is at entry_point+32 in (nmethod*)0x0000151294925510
R12=0x0 is NULL
R13=0x00001512aaac5540 is pointing into the stack for thread: 0x00001512a4024710
R14=0x00001512aaac55a8 is pointing into the stack for thread: 0x00001512a4024710
R15=0x00001512a4024710 is a thread

---------------------------------------------------------------------- Application log ----------------------------------

2022-04-05 16:50:33 [main] MPIMain.main() INFO: Args received: [2, false]
2022-04-05 16:50:33 [main] MPIMain.main() INFO: Args received: [2, false]
[INFO] 16:50:34:294 config.GlobalConfig.initMPI(): Thread support level: 0
[INFO] 16:50:34:294 config.GlobalConfig.initMPI(): Thread support level: 0
[INFO] 16:50:34:298 config.GlobalConfig.init(): Init [MPI_CONNECTION, isSingleJVM:false]
[INFO] 16:50:34:298 config.GlobalConfig.init(): Init [MPI_CONNECTION, isSingleJVM:false]
[INFO] 16:50:35:494 config.GlobalConfig.registerRole(): Registering role: Role{roleId='p0g2', myAddress=MPIAddress{rank=0, groupId=2}, isLeader=false}
[INFO] 16:50:35:503 config.GlobalConfig.registerRole(): Registering role: Role{roleId='p1g2', myAddress=MPIAddress{rank=1, groupId=2}, isLeader=false}
[INFO] 16:50:35:540 config.GlobalConfig.registerAddress(): Address [MPIAddress{rank=0, groupId=2}] registered on role [Role{roleId='p0g2', myAddress=MPIAddress{rank=0, groupId=2}, isLeader=true}]
[INFO] 16:50:35:540 config.GlobalConfig.registerAddress(): Address [MPIAddress{rank=1, groupId=2}] registered on role [Role{roleId='p0g2', myAddress=MPIAddress{rank=0, groupId=2}, isLeader=true}]
[INFO] 16:50:35:541 role.Node.<init>(): Node created: Role{roleId='p0g2', myAddress=MPIAddress{rank=0, groupId=2}, isLeader=true}
[INFO] 16:50:35:541 config.GlobalConfig.registerAddress(): Address [MPIAddress{rank=1, groupId=2}] registered on role [Role{roleId='p1g2', myAddress=MPIAddress{rank=1, groupId=2}, isLeader=true}]
[INFO] 16:50:35:541 config.GlobalConfig.registerAddress(): Address [MPIAddress{rank=0, groupId=2}] registered on role [Role{roleId='p1g2', myAddress=MPIAddress{rank=1, groupId=2}, isLeader=false}]
[INFO] 16:50:35:542 role.Node.<init>(): Node created: Role{roleId='p1g2', myAddress=MPIAddress{rank=1, groupId=2}, isLeader=false}
[node500:31983:0:31990] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x14)
[INFO] 16:50:36:566 testframework.TestFramework._doPingTests(): Starting ping-pong tests...
==== backtrace (tid: 31990) ====
 0 /usr/lib64/libucs.so.0(ucs_handle_error+0x2a4) [0x1512416f32a4]
 1 /usr/lib64/libucs.so.0(+0x2347c) [0x1512416f347c]
 2 /usr/lib64/libucs.so.0(+0x2364a) [0x1512416f364a]
 3 [0x1512949256d4]
=================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00001512949256d4 (sent by kill), pid=31983, tid=31990
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# J 448 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@17.0.2 (8 bytes) @ 0x00001512949256d4 [0x00001512949256a0+0x0000000000000034]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /net/ils/laudan/2-mpi-test/core.31983)
#
# An error report file with more information is saved as:
# /net/ils/laudan/2-mpi-test/hs_err_pid31983.log
Compiled method (c2) 3781 448 4 java.lang.StringBuilder::append (8 bytes)
 total in heap [0x0000151294925510,0x0000151294925fb0] = 2720
 relocation [0x0000151294925670,0x00001512949256a0] = 48
 main code [0x00001512949256a0,0x0000151294925be0] = 1344
 stub code [0x0000151294925be0,0x0000151294925bf8] = 24
 metadata [0x0000151294925bf8,0x0000151294925c50] = 88
 scopes data [0x0000151294925c50,0x0000151294925e88] = 568
 scopes pcs [0x0000151294925e88,0x0000151294925f68] = 224
 dependencies [0x0000151294925f68,0x0000151294925f70] = 8
 handler table [0x0000151294925f70,0x0000151294925f88] = 24
 nul chk table [0x0000151294925f88,0x0000151294925fb0] = 40
[node500:31983] *** Process received signal ***
[node500:31983] Signal: Aborted (6)
[node500:31983] Signal code: (-6)
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
[node500:31983] [ 0] /usr/lib64/libpthread.so.0(+0x12c20)[0x1512aa492c20]
[node500:31983] [ 1] /usr/lib64/libc.so.6(gsignal+0x10f)[0x1512a9eee37f]
[node500:31983] [ 2] /usr/lib64/libc.so.6(abort+0x127)[0x1512a9ed8db5]
[node500:31983] [ 3] /net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0x246cc9)[0x1512a8e90cc9]
[node500:31983] [ 4] /net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0xe0e70c)[0x1512a9a5870c]
[node500:31983] [ 5] /net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0xe0f12b)[0x1512a9a5912b]
[node500:31983] [ 6] /net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(+0xe0f15e)[0x1512a9a5915e]
[node500:31983] [ 7] /net/homes/ils/laudan/jdk-17.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x198)[0x1512a9906148]
[node500:31983] [ 8] /usr/lib64/libpthread.so.0(+0x12c20)[0x1512aa492c20]
[node500:31983] [ 9] [0x1512949256d4]
[node500:31983] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node node500 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

------ Original message ------
From: "Benson Muite" <benson_mu...@emailplus.org>
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Laudan, Janek" <lau...@tu-berlin.de>
Sent: 18.03.2022 06:48:54
Subject: Re: [OMPI users] [EXTERNAL] Java Segentation Fault

Hi Janek,

If you compile your program and produce a class file, does it run using

mpirun -np 1 java matsim-p

?

Try to compile Open MPI from source as indicated at https://www-lb.open-mpi.org/faq/?category=java

Java tends to require more memory, so if you are using a batch system, be sure to request enough.

It might also be interesting to try: https://github.com/mboysan/ping-pong-mpi-tcp

Benson

On 3/17/22 7:03 PM, Laudan, Janek via users wrote:

Hi Howard,

thanks for your reply. I am using version 4.1.2 and I didn't compile with the mpijavac wrapper. I was hoping that I could keep some form of our Maven build infrastructure and then deploy the resulting jar. The project setup is here: https://github.com/Janekdererste/matsim-p/blob/master/pom.xml

All the best,
Janek

------ Original message ------
From: "Pritchard Jr., Howard" <howa...@lanl.gov>
To: "Laudan, Janek" <lau...@tu-berlin.de>; "Open MPI Users" <users@lists.open-mpi.org>
Sent: 17.03.2022 16:59:04
Subject: Re: [EXTERNAL] [OMPI users] Java Segentation Fault

Hi Janek,

A few questions. First, which version of Open MPI are you using? Did you compile your code with the Open MPI mpijavac wrapper?

Howard

From: users <users-boun...@lists.open-mpi.org> on behalf of "Laudan, Janek via users" <users@lists.open-mpi.org>
Reply-To: "Laudan, Janek" <lau...@tu-berlin.de>, Open MPI Users <users@lists.open-mpi.org>
Date: Thursday, March 17, 2022 at 9:52 AM
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Laudan, Janek" <lau...@tu-berlin.de>
Subject: [EXTERNAL] [OMPI users] Java Segentation Fault

Hi,

I am trying to extend an existing Java project so that it runs with Open MPI. I have managed to set up Open MPI and my project on my local machine and to conduct some test runs. However, when I tried to set things up on our cluster, I ran into some problems. I was able to run some trivial examples such as "HelloWorld" and "Ring", which I found in the ompi GitHub repo. Unfortunately, when I try to run our app wrapped between MPI.Init(args) and MPI.Finalize(), I get the following segmentation fault:

$ mpirun -np 1 java -cp matsim-p-1.0-SNAPSHOT.jar org.matsim.parallel.RunMinimalMPIExample
Java-Version: 11.0.2
before getTestScenario
before load config
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
[cluster-i:1272 :0:1274] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xc)
==== backtrace (tid: 1274) ====
=================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000014a85752fdf4, pid=1272, tid=1274
#
# JRE version: Java(TM) SE Runtime Environment (11.0.2+9) (build 11.0.2+9-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.2+9-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 612 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@11.0.2 (8 bytes) @ 0x000014a85752fdf4 [0x000014a85752fdc0+0x0000000000000034]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /net/ils/laudan/mpi-test/matsim-p/hs_err_pid1272.log
Compiled method (c2) 1052 612 4 java.lang.StringBuilder::append (8 bytes)
 total in heap [0x000014a85752fc10,0x000014a8575306a8] = 2712
 relocation [0x000014a85752fd88,0x000014a85752fdb8] = 48
 main code [0x000014a85752fdc0,0x000014a857530360] = 1440
 stub code [0x000014a857530360,0x000014a857530378] = 24
 metadata [0x000014a857530378,0x000014a8575303c0] = 72
 scopes data [0x000014a8575303c0,0x000014a857530578] = 440
 scopes pcs [0x000014a857530578,0x000014a857530658] = 224
 dependencies [0x000014a857530658,0x000014a857530660] = 8
 handler table [0x000014a857530660,0x000014a857530678] = 24
 nul chk table [0x000014a857530678,0x000014a8575306a8] = 48
Compiled method (c1) 1053 263 3 java.lang.StringBuilder::<init> (7 bytes)
 total in heap [0x000014a850102790,0x000014a850102b30] = 928
 relocation [0x000014a850102908,0x000014a850102940] = 56
 main code [0x000014a850102940,0x000014a850102a20] = 224
 stub code [0x000014a850102a20,0x000014a850102ac8] = 168
 metadata [0x000014a850102ac8,0x000014a850102ad0] = 8
 scopes data [0x000014a850102ad0,0x000014a850102ae8] = 24
 scopes pcs [0x000014a850102ae8,0x000014a850102b28] = 64
 dependencies [0x000014a850102b28,0x000014a850102b30] = 8
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
[cluster-i:01272] *** Process received signal ***
[cluster-i:01272] Signal: Aborted (6)
[cluster-i:01272] Signal code: (-6)
[cluster-i:01272] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x14a86dcbb387]
[cluster-i:01272] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x14a86dcbca78]
[cluster-i:01272] [ 3] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xc00be9)[0x14a86d3f8be9]
[cluster-i:01272] [ 4] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29619)[0x14a86d621619]
[cluster-i:01272] [ 5] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29e9b)[0x14a86d621e9b]
[cluster-i:01272] [ 6] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29ece)[0x14a86d621ece]
[cluster-i:01272] [ 7] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x1c0)[0x14a86d403a00]
[cluster-i:01272] [ 8] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xbff5e8)[0x14a86d3f75e8]
[cluster-i:01272] [ 9] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [10] [0x14a85752fdf4]
[cluster-i:01272] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cluster-i exited on signal 6 (Aborted).
--------------------------------------------------------------------------

I am running ompi 4.1.2 with java-11. The project which I am trying to set up is here: https://github.com/Janekdererste/matsim-p

I hope somebody can advise on what to try next.

Thanks and all the best
Janek
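
A side note on comparing the two crash reports in this thread: each "Problematic frame" line prints the faulting pc followed by [base+offset], where base is the start of the compiled method's main code. The sketch below parses such a line and checks that base + offset equals the pc, so the Java 17 and Java 11 crashes can be compared by offset. The class name FrameOffsetCheck and the helper offsetInMethod are made up for illustration; the two input strings are copied from the reports above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrameOffsetCheck {
    // Matches "@ 0x<pc> [0x<base>+0x<offset>]" as printed at the end of a
    // HotSpot "Problematic frame" line.
    private static final Pattern FRAME = Pattern.compile(
            "@ 0x([0-9a-fA-F]+) \\[0x([0-9a-fA-F]+)\\+0x([0-9a-fA-F]+)\\]");

    /**
     * Returns the offset of the faulting pc into the compiled method,
     * or -1 if the line does not parse or base + offset does not equal pc.
     */
    public static long offsetInMethod(String frameLine) {
        Matcher m = FRAME.matcher(frameLine);
        if (!m.find()) {
            return -1;
        }
        long pc = Long.parseUnsignedLong(m.group(1), 16);
        long base = Long.parseUnsignedLong(m.group(2), 16);
        long offset = Long.parseUnsignedLong(m.group(3), 16);
        return (base + offset == pc) ? offset : -1;
    }

    public static void main(String[] args) {
        // Frame lines copied from the two hs_err reports in this thread.
        String java17 = "J 448 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; "
                + "java.base@17.0.2 (8 bytes) @ 0x00001512949256d4 [0x00001512949256a0+0x0000000000000034]";
        String java11 = "J 612 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; "
                + "java.base@11.0.2 (8 bytes) @ 0x000014a85752fdf4 [0x000014a85752fdc0+0x0000000000000034]";

        System.out.printf("Java 17 offset: 0x%x%n", offsetInMethod(java17));
        System.out.printf("Java 11 offset: 0x%x%n", offsetInMethod(java11));
    }
}
```

Both lines yield offset 0x34. The generated code can differ between JVM versions, so an equal offset is only a hint that the two crashes happen at a comparable point in the compiled StringBuilder.append, not proof of it.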