Hello Siegmar and Gilles,

I made a reply where Gilles suggested, but figured I'd leave a note here in
case the other one was missed.


-Nathan


--
Nathaniel Graham
HPC-DES
Los Alamos National Laboratory
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet 
<gilles.gouaillar...@gmail.com>
Sent: Monday, August 29, 2016 6:16 AM
To: Open MPI Users
Subject: Re: [OMPI users] problem with exceptions in Java interface

Hi Siegmar,

I will review PR 1698 and wait for some more feedback from the developers;
they might have different views than mine.
Even assuming PR 1698 does what you expect, it does not catch all user errors.
For example, if you MPI_Send a buffer that is too short, the exception might be
thrown at any time.
In the worst case, it will occur in the progress thread and outside of any MPI
call, which means it cannot be "converted" into an MPIException.
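
To illustrate, here is a minimal Java sketch of that situation (the class name
and ranks are made up; the point is only that the declared count exceeds the
buffer length):

import mpi.*;

public class ShortBufferSend
{
    public static void main(String[] args) throws MPIException
    {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.getRank();
        int[] buf = new int[1];   // the buffer really holds only 1 element
        if (rank == 0) {
            // count of 2 exceeds buf.length; the resulting error may
            // surface at any time, possibly in the progress thread,
            // where it cannot be converted into an MPIException
            MPI.COMM_WORLD.send(buf, 2, MPI.INT, 1, 0);
        } else if (rank == 1) {
            int[] rbuf = new int[2];
            MPI.COMM_WORLD.recv(rbuf, 2, MPI.INT, 0, 0);
        }
        MPI.Finalize();
    }
}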

FWIW, we have a way to check buffers, but it requires that
1. Open MPI is configure'd with --enable-memchecker, and
2. the MPI tasks are run under valgrind.
IIRC, valgrind will issue an error message if the buffer is invalid, and the
app will crash afterwards
(i.e. the MPI subroutine will not return with an error code the end user can
"trap").

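For reference, that workflow looks something like this (the install path and
application name are only illustrative):

./configure --enable-memchecker --with-valgrind=/usr/local/valgrind
make install
mpiexec -np 2 valgrind ./my_mpi_app   # valgrind flags the invalid buffer, then the app crashes
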
Such checks might be easier to make in Java, and the resulting errors might
easily be made "trappable", but as far as I am concerned,
1. this has a runtime overhead, and
2. this is a new development (a sketch of such a check follows below).
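
Conceptually, a Java-side check could look like the sketch below. To be clear,
this is not existing Open MPI code, just an illustration: checkSendBuffer is a
hypothetical helper, and it assumes MPIException can be constructed from a
message string.

import java.lang.reflect.Array;
import mpi.MPIException;

final class BufferChecks
{
    // hypothetical guard the bindings could run before calling native code;
    // it only covers plain arrays, ignoring derived datatypes and direct
    // buffers, which is part of why this would be a new development
    static void checkSendBuffer(Object buf, int count) throws MPIException
    {
        if (buf != null && buf.getClass().isArray() && Array.getLength(buf) < count)
            throw new MPIException("buffer too small: length "
                                   + Array.getLength(buf) + ", count " + count);
    }
}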

Let's follow up at https://github.com/open-mpi/ompi/issues/1698 from now on.

Cheers,

Gilles

On Monday, August 29, 2016, Siegmar Gross
<siegmar.gr...@informatik.hs-fulda.de> wrote:
Hi Gilles,

isn't it possible to pass all exceptions from the Java interface
to the calling method? I can live with the current handling of
exceptions as well, although some exceptions can be handled
within my program while others will break my program even if I
want to handle the exceptions myself. I understood PR 1698 to
mean that all exceptions can be processed in the user program if
the user chooses MPI.ERRORS_RETURN (otherwise this change request
wouldn't have been necessary). Nevertheless, if you decide that
things stay as they are, I'm happy with your decision as well.
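
At the moment I can only protect my program by also catching the unchecked
Java exceptions, along the lines of this minimal sketch (the class name is of
course made up):

import mpi.*;

public class CatchAllExceptions
{
    public static void main(String[] args) throws MPIException
    {
        MPI.Init(args);
        MPI.COMM_WORLD.setErrhandler(MPI.ERRORS_RETURN);
        int[] buf = new int[1];
        try {
            MPI.COMM_WORLD.bcast(buf, 2, MPI.INT, 0);   // count exceeds buf.length
        } catch (MPIException ex) {
            // what I expect to be able to catch with MPI.ERRORS_RETURN
            System.err.println("MPI error: " + ex.getMessage());
        } catch (RuntimeException ex) {
            // what actually arrives today, e.g. ArrayIndexOutOfBoundsException
            System.err.println("Java error: " + ex);
        }
        MPI.Finalize();
    }
}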


Kind regards

Siegmar


On 29.08.2016 at 10:30, Gilles Gouaillardet wrote:
Siegmar and all,


I am puzzled by this error.

On one hand, it is caused by an invalid buffer

(e.g. the buffer size is 1, but the user claims the size is 2),

so I am fine with the current behavior (i.e.
java.lang.ArrayIndexOutOfBoundsException is thrown).

/* if this were a C program, it would very likely SIGSEGV; Open MPI does
not catch this kind of error when checking params */


On the other hand, Open MPI could be enhanced to check the buffer size and
throw an MPIException in this case.


As far as I am concerned, this is a feature request and not a bug.


Thoughts, anyone?


Cheers,


Gilles

On 8/29/2016 3:48 PM, Siegmar Gross wrote:
Hi,

I have installed v1.10.3-31-g35ba6a1, openmpi-v2.0.0-233-gb5f0a4f,
and openmpi-dev-4691-g277c319 on my "SUSE Linux Enterprise Server
12 (x86_64)" with Sun C 5.14 beta and gcc-6.1.0. In May I had
reported a problem with Java exceptions (issue 1698), which was
solved in June (PR 1803).

https://github.com/open-mpi/ompi/issues/1698
https://github.com/open-mpi/ompi/pull/1803

Unfortunately the problem still exists, or exists once more,
in all three branches.
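
The test program is essentially the following (a minimal reconstruction that
matches the output below, not necessarily the original source line for line):

import mpi.*;

public class Exception_2_Main
{
    public static void main(String[] args) throws MPIException
    {
        MPI.Init(args);
        MPI.COMM_WORLD.setErrhandler(MPI.ERRORS_RETURN);
        System.out.println("Set error handler for MPI.COMM_WORLD to MPI.ERRORS_RETURN.");
        System.out.println("Call \"bcast\" with index out-of bounds.");
        int[] buf = new int[1];
        try {
            // count of 2 exceeds buf.length; with MPI.ERRORS_RETURN this
            // should surface as a trappable MPIException ...
            MPI.COMM_WORLD.bcast(buf, 2, MPI.INT, 0);
        } catch (MPIException ex) {
            System.err.println("Caught MPIException: " + ex.getMessage());
        }
        // ... but instead an unchecked ArrayIndexOutOfBoundsException escapes
        // the catch clause and aborts the job, as the logs below show
        MPI.Finalize();
    }
}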


loki fd1026 112 ompi_info | grep -e "Open MPI repo revision" -e "C compiler
absolute"
  Open MPI repo revision: dev-4691-g277c319
     C compiler absolute: /opt/solstudio12.5b/bin/cc
loki fd1026 112 mpijavac Exception_2_Main.java
warning: [path] bad path element
"/usr/local/openmpi-master_64_cc/lib64/shmem.jar": no such file or directory
1 warning
loki fd1026 113 mpiexec -np 1 java Exception_2_Main
Set error handler for MPI.COMM_WORLD to MPI.ERRORS_RETURN.
Call "bcast" with index out-of bounds.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
        at mpi.Comm.bcast(Native Method)
        at mpi.Comm.bcast(Comm.java:1252)
        at Exception_2_Main.main(Exception_2_Main.java:22)
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[58548,1],0]
  Exit code:    1
--------------------------------------------------------------------------
loki fd1026 114 exit



loki fd1026 116 ompi_info | grep -e "Open MPI repo revision" -e "C compiler
absolute"
  Open MPI repo revision: v2.0.0-233-gb5f0a4f
     C compiler absolute: /opt/solstudio12.5b/bin/cc
loki fd1026 117 mpijavac Exception_2_Main.java
warning: [path] bad path element
"/usr/local/openmpi-2.0.1_64_cc/lib64/shmem.jar": no such file or directory
1 warning
loki fd1026 118 mpiexec -np 1 java Exception_2_Main
Set error handler for MPI.COMM_WORLD to MPI.ERRORS_RETURN.
Call "bcast" with index out-of bounds.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
        at mpi.Comm.bcast(Native Method)
        at mpi.Comm.bcast(Comm.java:1252)
        at Exception_2_Main.main(Exception_2_Main.java:22)
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[58485,1],0]
  Exit code:    1
--------------------------------------------------------------------------
loki fd1026 119 exit



loki fd1026 107 ompi_info | grep -e "Open MPI repo revision" -e "C compiler
absolute"
  Open MPI repo revision: v1.10.3-31-g35ba6a1
     C compiler absolute: /opt/solstudio12.5b/bin/cc
loki fd1026 107 mpijavac Exception_2_Main.java
loki fd1026 108 mpiexec -np 1 java Exception_2_Main
Set error handler for MPI.COMM_WORLD to MPI.ERRORS_RETURN.
Call "bcast" with index out-of bounds.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
        at mpi.Comm.bcast(Native Method)
        at mpi.Comm.bcast(Comm.java:1231)
        at Exception_2_Main.main(Exception_2_Main.java:22)
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing
the job to be terminated. The first process to do so was:

  Process name: [[34400,1],0]
  Exit code:    1
--------------------------------------------------------------------------
loki fd1026 109 exit




I would be grateful, if somebody can fix the problem. Thank you
very much for any help in advance.


Kind regards

Siegmar
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
