Ted,
fwiw, the 'master' branch has the behavior you expect.

meanwhile, you can simply edit your 'dum.sh' script and replace

/home/buildadina/src/aborttest02/aborttest02.exe

with

exec /home/buildadina/src/aborttest02/aborttest02.exe

Cheers,

Gilles

On 6/15/2017 3:01 AM, Ted Sussman wrote:
Hello,

My question concerns MPI_ABORT, indirect execution of executables by mpirun, and Open MPI 2.1.1. When mpirun runs executables directly, MPI_ABORT works as expected, but when mpirun runs executables indirectly, MPI_ABORT does not work as expected. If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT works as expected in all cases.

The examples given below have been simplified as far as possible to show the issues.

--- Example 1

Consider an MPI job run in the following way:

mpirun ... -app addmpw1

where the appfile addmpw1 lists two executables:

-n 1 -host gulftown ... aborttest02.exe
-n 1 -host gulftown ... aborttest02.exe

The two executables are executed on the local node gulftown. aborttest02 calls MPI_ABORT for rank 0, then sleeps. The above MPI job runs as expected: both processes immediately abort when rank 0 calls MPI_ABORT.

--- Example 2

Now change the above example as follows:

mpirun ... -app addmpw2

where the appfile addmpw2 lists shell scripts:

-n 1 -host gulftown ... dum.sh
-n 1 -host gulftown ... dum.sh

dum.sh invokes aborttest02.exe, so aborttest02.exe is executed indirectly by mpirun. In this case, when rank 0 calls MPI_ABORT, the MPI job only aborts process 0. Process 1 continues to run. This behavior is unexpected.

---

I have attached all files to this e-mail. Since there are absolute pathnames in the files, to reproduce my findings you will need to update the pathnames in the appfiles and shell scripts. To run example 1:

sh run1.sh

and to run example 2:

sh run2.sh

---

I have tested these examples with Open MPI 1.4.3 and 2.0.3. In Open MPI 1.4.3, both examples work as expected. Open MPI 2.0.3 has the same behavior as Open MPI 2.1.1.

---

I would prefer that Open MPI 2.1.1 abort both processes, even when the executables are invoked indirectly by mpirun. If there is an MCA setting that is needed to make Open MPI 2.1.1 abort both processes, please let me know.
Sincerely,

Theodore Sussman

The following section of this message contains a file attachment prepared for transmission using the Internet MIME message format. If you are using Pegasus Mail, or any other MIME-compliant system, you should be able to save it or view it from within your mailer. If you cannot, please ask your system administrator for assistance.

---- File information -----------
  File: config.log.bz2
  Date: 14 Jun 2017, 13:35
  Size: 146548 bytes.
  Type: Binary

---- File information -----------
  File: ompi_info.bz2
  Date: 14 Jun 2017, 13:35
  Size: 24088 bytes.
  Type: Binary

---- File information -----------
  File: aborttest02.tgz
  Date: 14 Jun 2017, 13:52
  Size: 4285 bytes.
  Type: Binary

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
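[Editor's note: the effect of the `exec` edit suggested at the top of this thread can be demonstrated with plain shell, no Open MPI involved. The sketch below is an illustration, not Open MPI code: without `exec`, a wrapper script such as dum.sh is a separate process and the launched program is its child, so the PID the launcher tracks (the shell's) is not the program's; with `exec`, the shell replaces itself in place, so the tracked process *is* the program, and the kill triggered by MPI_ABORT reaches it directly.]

```shell
#!/bin/sh
# Minimal demonstration of what `exec` changes for a wrapper script.

# Without exec: the launched program (here, an inner sh standing in for
# aborttest02.exe) is a *child* of the wrapper shell, so the two PIDs differ.
set -- $(sh -c 'echo $$; sh -c "echo \$\$"')
if [ "$1" -ne "$2" ]; then
    echo "without exec: wrapper PID $1 != program PID $2"
fi

# With exec: the wrapper shell replaces itself in place; the PID is
# preserved, so the process the launcher tracks is the program itself.
set -- $(sh -c 'echo $$; exec sh -c "echo \$\$"')
if [ "$1" -eq "$2" ]; then
    echo "with exec: wrapper PID $1 == program PID $2"
fi
```

This is consistent with the symptom in example 2: the runtime signals the processes it launched (the dum.sh shells), and without `exec` the aborttest02.exe children are not those processes.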