The app is not calling MPI_ABORT directly. I dug a little deeper into it
but didn't find anything interesting. It just doesn't find the subdirectory
it needs for output (the internal error variable is 0) and simply crashes
when returning from the subroutine. It was just me not setting things up
correctly.
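For what it's worth, a minimal sketch of the kind of guard that catches this failure mode before it turns into a crash; this assumes a Fortran app, and the subroutine name, unit number, and results/ path are all hypothetical:

    ! Guard the output open with iostat and abort with a small,
    ! explicit integer error code if the subdirectory is missing.
    subroutine write_output(rank)
      use mpi
      implicit none
      integer, intent(in) :: rank
      integer :: ios, ierr
      character(len=64) :: fname

      write(fname, '(a,i0,a)') 'results/rank', rank, '.dat'
      open(unit=10, file=fname, status='replace', iostat=ios)
      if (ios /= 0) then
        print *, 'cannot open ', trim(fname), ' (iostat=', ios, ')'
        call MPI_Abort(MPI_COMM_WORLD, 2, ierr)  ! integer errorcode, not a string
      end if
      write(10,*) 'rank ', rank
      close(10)
    end subroutine write_output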
Is your app calling MPI_Abort directly? There's a 2nd argument to MPI_ABORT
(the error code) that should be passed through to the output message. If
it's not, we should investigate that.
Or is your app aborting in some other, indirect method? If so, perhaps somehow
that 2nd argument is getting dropped somewhere.
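As a point of comparison, a minimal sketch of a direct call (the rank test and the value 42 are arbitrary); run under mpirun, it should produce a banner like "MPI_ABORT was invoked on rank 0 ... with errorcode 42":

    program abort_demo
      use mpi
      implicit none
      integer :: rank, ierr
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      ! The 2nd argument (42) is the error code the launcher reports.
      if (rank == 0) call MPI_Abort(MPI_COMM_WORLD, 42, ierr)
      call MPI_Finalize(ierr)
    end program abort_demo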
Hi Jeff. Sorry for the delay. It took a while, but I was finally able to
track down the point where the app breaks down. The problem seems to
originate in an output subroutine, not because any MPI communication is
malfunctioning. My guess is that MPI_Abort needs to produce some error
message. Why it ends up with such a large error code, I don't know.
It's somewhat hard to say without more information.
What is your app doing when it calls abort?
On Jan 29, 2021, at 8:49 PM, Arturo Fernandez via users
<users@lists.open-mpi.org> wrote:
Hello,
My system is running CentOS8 & OpenMPI v4.1.0. Most stuff is working fine
but one app is aborting with:
MPI_ABORT was invoked on rank 7 in communicator MPI_COMM_WORLD
with errorcode 1734831948.
The other 23 MPI ranks also abort. I'm a bit confused by the high error
code. Does it mean anything?
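A suspiciously large error code can sometimes be decoded by reading its bytes as ASCII: 1734831948 is 0x67676F4C, which on a little-endian machine reads as the characters "Logg". That hints, though it doesn't prove, that four bytes of a character string were passed where MPI_Abort expected an integer. A small sketch of the decoding:

    program decode_errcode
      implicit none
      integer :: code, i
      code = 1734831948
      ! Print each byte of the code as an ASCII character,
      ! low byte first (little-endian order); prints "Logg".
      do i = 0, 3
        write(*, '(a)', advance='no') achar(ibits(code, 8*i, 8))
      end do
      print *
    end program decode_errcode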