[OMPI users] Process termination problem

Daniel Spångberg Thu, 16 Aug 2007 12:04:19 -0400

Dear Open-MPI user list members,

I currently have a user with an application in which one of the MPI processes dies, but the Open MPI system does not kill the rest of the application.

Since the mpirun man page states the following, I would expect it to take care of killing the application if a process exits without calling MPI_Finalize:


   Process Termination / Signal Handling

During the run of an MPI application, if any rank dies abnormally (either exiting before invoking MPI_FINALIZE, or dying as the result of a signal), mpirun will print out an error message and kill the rest of the MPI application.

The following test program demonstrates the behaviour (the program hangs until it is killed by the user or the batch system):


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

#define RANK_DEATH 1

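int main(int argc, char **argv)
{
  int rank;
  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);

  sleep(10);
  /* One rank exits with a nonzero status without calling MPI_Finalize. */
  if (rank==RANK_DEATH)
    exit(1);
  /* The remaining ranks carry on and finalize normally. */
  sleep(10);
  MPI_Finalize();
  return 0;
}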

I have tested this on Open MPI 1.2.1 as well as the latest stable release, 1.2.3. I am on Linux x86_64.

Is this a bug, or are there some flags I can use to force mpirun (or orted, or...) to kill the whole MPI program when this happens?

If one of the application processes dies from a signal (I have tested SEGV and FPE) rather than just exiting, the whole application is indeed killed.


Best regards
Daniel Spångberg
