I am trying to extend an existing Java project so that it runs with
Open MPI. I have successfully set up Open MPI and my project on my
local machine and conducted some test runs there.
However, when I tried to set things up on our cluster, I ran into
problems. I was able to run trivial examples such as "HelloWorld"
and "Ring", which I found in the ompi GitHub repo. Unfortunately,
when I try to run our app wrapped between MPI.Init(args) and
MPI.Finalize(), I get the following segmentation fault:
$ mpirun -np 1 java -cp matsim-p-1.0-SNAPSHOT.jar org.matsim.parallel.RunMinimalMPIExample
Java-Version: 11.0.2
before getTestScenario
before load config
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
[cluster-i:1272 :0:1274] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xc)
==== backtrace (tid: 1274) ====
=================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000014a85752fdf4, pid=1272, tid=1274
#
# JRE version: Java(TM) SE Runtime Environment (11.0.2+9) (build 11.0.2+9-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.2+9-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 612 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@11.0.2 (8 bytes) @ 0x000014a85752fdf4 [0x000014a85752fdc0+0x0000000000000034]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /net/ils/laudan/mpi-test/matsim-p/hs_err_pid1272.log
Compiled method (c2) 1052 612 4 java.lang.StringBuilder::append (8 bytes)
total in heap [0x000014a85752fc10,0x000014a8575306a8] = 2712
relocation [0x000014a85752fd88,0x000014a85752fdb8] = 48
main code [0x000014a85752fdc0,0x000014a857530360] = 1440
stub code [0x000014a857530360,0x000014a857530378] = 24
metadata [0x000014a857530378,0x000014a8575303c0] = 72
scopes data [0x000014a8575303c0,0x000014a857530578] = 440
scopes pcs [0x000014a857530578,0x000014a857530658] = 224
dependencies [0x000014a857530658,0x000014a857530660] = 8
handler table [0x000014a857530660,0x000014a857530678] = 24
nul chk table [0x000014a857530678,0x000014a8575306a8] = 48
Compiled method (c1) 1053 263 3 java.lang.StringBuilder::<init> (7 bytes)
total in heap [0x000014a850102790,0x000014a850102b30] = 928
relocation [0x000014a850102908,0x000014a850102940] = 56
main code [0x000014a850102940,0x000014a850102a20] = 224
stub code [0x000014a850102a20,0x000014a850102ac8] = 168
metadata [0x000014a850102ac8,0x000014a850102ad0] = 8
scopes data [0x000014a850102ad0,0x000014a850102ae8] = 24
scopes pcs [0x000014a850102ae8,0x000014a850102b28] = 64
dependencies [0x000014a850102b28,0x000014a850102b30] = 8
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
[cluster-i:01272] *** Process received signal ***
[cluster-i:01272] Signal: Aborted (6)
[cluster-i:01272] Signal code: (-6)
[cluster-i:01272] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x14a86dcbb387]
[cluster-i:01272] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x14a86dcbca78]
[cluster-i:01272] [ 3] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xc00be9)[0x14a86d3f8be9]
[cluster-i:01272] [ 4] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29619)[0x14a86d621619]
[cluster-i:01272] [ 5] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29e9b)[0x14a86d621e9b]
[cluster-i:01272] [ 6] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29ece)[0x14a86d621ece]
[cluster-i:01272] [ 7] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x1c0)[0x14a86d403a00]
[cluster-i:01272] [ 8] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xbff5e8)[0x14a86d3f75e8]
[cluster-i:01272] [ 9] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [10] [0x14a85752fdf4]
[cluster-i:01272] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cluster-i exited
on signal 6 (Aborted).
--------------------------------------------------------------------------
I am running Open MPI 4.1.2 with Java 11. The project I am trying to
set up is here: https://github.com/Janekdererste/matsim-p
I hope somebody can advise on what to try next. Thanks and all the best
Janek