Hi Janek,

If you compile your program and produce a class file, does it run using mpirun -np 1 java matsim-p

Try to compile OpenMPI from source as indicated at https://www-lb.open-mpi.org/faq/?category=java

Java tends to require more memory, so if using a batch system be sure to request enough.
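
To see how much heap the JVM actually gets inside a batch job, you could print the limits reported by Runtime before starting the real application. A minimal sketch follows. The class name MemoryReport and the helper toMiB are made up for illustration and are not part of matsim-p or Open MPI.

```java
// Hypothetical helper class, not part of matsim-p: prints the JVM memory limits.
class MemoryReport {

    // Converts a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the upper bound the heap may grow to (set by -Xmx or a default).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory() is what the JVM has currently reserved for the heap.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Running it the same way as the application, for example mpirun -np 1 java MemoryReport, shows what the JVM sees on the compute node. If the maximum is lower than expected, the -Xmx option of java can raise it, provided the batch job requested that much memory.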

It might also be worth trying:
https://github.com/mboysan/ping-pong-mpi-tcp

Benson

On 3/17/22 7:03 PM, Laudan, Janek via users wrote:
Hi Howard,

thanks for your reply. I am using version 4.1.2 and I didn't compile with the mpijavac wrapper. I was hoping that I could keep some form of our Maven build infrastructure and then deploy the resulting jar. The project setup is here: https://github.com/Janekdererste/matsim-p/blob/master/pom.xml

All the best,
Janek

------ Original Message ------
From: "Pritchard Jr., Howard" <howa...@lanl.gov>
To: "Laudan, Janek" <lau...@tu-berlin.de>; "Open MPI Users" <users@lists.open-mpi.org>
Sent: 17.03.2022 16:59:04
Subject: Re: [EXTERNAL] [OMPI users] Java Segentation Fault

Hi Janek,

A few questions.

First, which version of Open MPI are you using?

Did you compile your code with the Open MPI mpijavac wrapper?

Howard

*From: *users <users-boun...@lists.open-mpi.org> on behalf of "Laudan, Janek via users" <users@lists.open-mpi.org>
*Reply-To: *"Laudan, Janek" <lau...@tu-berlin.de>, Open MPI Users <users@lists.open-mpi.org>
*Date: *Thursday, March 17, 2022 at 9:52 AM
*To: *"users@lists.open-mpi.org" <users@lists.open-mpi.org>
*Cc: *"Laudan, Janek" <lau...@tu-berlin.de>
*Subject: *[EXTERNAL] [OMPI users] Java Segentation Fault

Hi,



I am trying to extend an existing Java project so that it runs with Open MPI. I have successfully set up Open MPI and my project on my local machine and conducted some test runs.

However, when I tried to set things up on our cluster, I ran into some problems. I was able to run some trivial examples such as "HelloWorld" and "Ring", which I found in the ompi GitHub repo. Unfortunately, when I try to run our app wrapped between MPI.Init(args) and MPI.Finalize(), I get the following segmentation fault:

$ mpirun -np 1 java -cp matsim-p-1.0-SNAPSHOT.jar org.matsim.parallel.RunMinimalMPIExample
Java-Version: 11.0.2
before getTestScenario
before load config
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
[cluster-i:1272 :0:1274] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xc)
==== backtrace (tid:   1274) ====
=================================
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000014a85752fdf4, pid=1272, tid=1274
#
# JRE version: Java(TM) SE Runtime Environment (11.0.2+9) (build 11.0.2+9-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.2+9-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# J 612 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@11.0.2 (8 bytes) @ 0x000014a85752fdf4 [0x000014a85752fdc0+0x0000000000000034]
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /net/ils/laudan/mpi-test/matsim-p/hs_err_pid1272.log
Compiled method (c2)    1052  612       4 java.lang.StringBuilder::append (8 bytes)
 total in heap  [0x000014a85752fc10,0x000014a8575306a8] = 2712
 relocation     [0x000014a85752fd88,0x000014a85752fdb8] = 48
 main code      [0x000014a85752fdc0,0x000014a857530360] = 1440
 stub code      [0x000014a857530360,0x000014a857530378] = 24
 metadata       [0x000014a857530378,0x000014a8575303c0] = 72
 scopes data    [0x000014a8575303c0,0x000014a857530578] = 440
 scopes pcs     [0x000014a857530578,0x000014a857530658] = 224
 dependencies   [0x000014a857530658,0x000014a857530660] = 8
 handler table  [0x000014a857530660,0x000014a857530678] = 24
 nul chk table  [0x000014a857530678,0x000014a8575306a8] = 48
Compiled method (c1)    1053  263       3 java.lang.StringBuilder::<init> (7 bytes)
 total in heap  [0x000014a850102790,0x000014a850102b30] = 928
 relocation     [0x000014a850102908,0x000014a850102940] = 56
 main code      [0x000014a850102940,0x000014a850102a20] = 224
 stub code      [0x000014a850102a20,0x000014a850102ac8] = 168
 metadata       [0x000014a850102ac8,0x000014a850102ad0] = 8
 scopes data    [0x000014a850102ad0,0x000014a850102ae8] = 24
 scopes pcs     [0x000014a850102ae8,0x000014a850102b28] = 64
 dependencies   [0x000014a850102b28,0x000014a850102b30] = 8
Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
[cluster-i:01272] *** Process received signal ***
[cluster-i:01272] Signal: Aborted (6)
[cluster-i:01272] Signal code:  (-6)
[cluster-i:01272] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x14a86dcbb387]
[cluster-i:01272] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x14a86dcbca78]
[cluster-i:01272] [ 3] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xc00be9)[0x14a86d3f8be9]
[cluster-i:01272] [ 4] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29619)[0x14a86d621619]
[cluster-i:01272] [ 5] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29e9b)[0x14a86d621e9b]
[cluster-i:01272] [ 6] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29ece)[0x14a86d621ece]
[cluster-i:01272] [ 7] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x1c0)[0x14a86d403a00]
[cluster-i:01272] [ 8] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xbff5e8)[0x14a86d3f75e8]
[cluster-i:01272] [ 9] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
[cluster-i:01272] [10] [0x14a85752fdf4]
[cluster-i:01272] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node cluster-i exited on signal 6 (Aborted).
--------------------------------------------------------------------------

I am running ompi 4.1.2 with Java 11. The project which I am trying to set up is here: https://github.com/Janekdererste/matsim-p

I hope somebody can advise on what to try next. Thanks and all the best

Janek

