Hi,

I am trying to extend an existing Java project so that it runs with Open MPI. I have successfully set up Open MPI and my project on my local machine and conducted some test runs there.
However, when I tried to set things up on our cluster, I ran into problems. I was able to run some trivial examples such as "HelloWorld" and "Ring", which I found in the ompi GitHub repo. Unfortunately, when I try to run our app wrapped between MPI.Init(args) and MPI.Finalize(), I get the following segmentation fault:

$ mpirun -np 1 java -cp matsim-p-1.0-SNAPSHOT.jar org.matsim.parallel.RunMinimalMPIExample
Java-Version: 11.0.2
before getTestScenario
before load config
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
[cluster-i:1272 :0:1274] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xc)
==== backtrace (tid: 1274) ====
=================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000014a85752fdf4, pid=1272, tid=1274
#
# JRE version: Java(TM) SE Runtime Environment (11.0.2+9) (build 11.0.2+9-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.2+9-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 612 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@11.0.2 (8 bytes) @ 0x000014a85752fdf4 [0x000014a85752fdc0+0x0000000000000034]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /net/ils/laudan/mpi-test/matsim-p/hs_err_pid1272.log
Compiled method (c2) 1052 612 4 java.lang.StringBuilder::append (8 bytes)
 total in heap [0x000014a85752fc10,0x000014a8575306a8] = 2712
 relocation [0x000014a85752fd88,0x000014a85752fdb8] = 48
 main code [0x000014a85752fdc0,0x000014a857530360] = 1440
 stub code [0x000014a857530360,0x000014a857530378] = 24
 metadata [0x000014a857530378,0x000014a8575303c0] = 72
 scopes data [0x000014a8575303c0,0x000014a857530578] = 440
 scopes pcs [0x000014a857530578,0x000014a857530658] = 224
 dependencies [0x000014a857530658,0x000014a857530660] = 8
 handler table [0x000014a857530660,0x000014a857530678] = 24
 nul chk table [0x000014a857530678,0x000014a8575306a8] = 48
Compiled method (c1) 1053 263 3 java.lang.StringBuilder::<init> (7 bytes)
 total in heap [0x000014a850102790,0x000014a850102b30] = 928
 relocation [0x000014a850102908,0x000014a850102940] = 56
 main code [0x000014a850102940,0x000014a850102a20] = 224
 stub code [0x000014a850102a20,0x000014a850102ac8] = 168
 metadata [0x000014a850102ac8,0x000014a850102ad0] = 8
 scopes data [0x000014a850102ad0,0x000014a850102ae8] = 24
 scopes pcs [0x000014a850102ae8,0x000014a850102b28] = 64
 dependencies [0x000014a850102b28,0x000014a850102b30] = 8
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
[cluster-i:01272] *** Process received signal ***
[cluster-i:01272] Signal: Aborted (6)
[cluster-i:01272] Signal code: (-6)
[cluster-i:01272] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x14a86dcbb387]
[cluster-i:01272] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x14a86dcbca78]
[cluster-i:01272] [ 3] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xc00be9)[0x14a86d3f8be9]
[cluster-i:01272] [ 4] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29619)[0x14a86d621619]
[cluster-i:01272] [ 5] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29e9b)[0x14a86d621e9b]
[cluster-i:01272] [ 6] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29ece)[0x14a86d621ece]
[cluster-i:01272] [ 7] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x1c0)[0x14a86d403a00]
[cluster-i:01272] [ 8] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xbff5e8)[0x14a86d3f75e8]
[cluster-i:01272] [ 9] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [10] [0x14a85752fdf4]
[cluster-i:01272] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cluster-i exited on signal 6 (Aborted).
--------------------------------------------------------------------------

I am running Open MPI 4.1.2 with Java 11. The project I am trying to set up is here: https://github.com/Janekdererste/matsim-p

I hope somebody can advise on what to try next.

Thanks and all the best,
Janek