Ted,

fwiw, the 'master' branch has the behavior you expect.


meanwhile, you can simply edit your 'dum.sh' script and replace

/home/buildadina/src/aborttest02/aborttest02.exe

with

exec /home/buildadina/src/aborttest02/aborttest02.exe
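
The reason 'exec' helps: it replaces the wrapper shell with the target program instead of forking a child, so the MPI executable keeps the shell's PID and mpirun's signals reach it directly. A minimal stand-alone demo of that behavior (not taken from the original scripts):

```shell
# Demo: 'exec' replaces the shell process rather than forking a child.
# The outer shell prints its PID, then 'exec's a new shell that prints
# its own PID -- the two PIDs match because no new process was created.
out=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
first=$(echo "$out" | head -n 1)
second=$(echo "$out" | tail -n 1)
echo "wrapper PID: $first, exec'd PID: $second"
test "$first" = "$second" && echo "same process: exec did not fork"
```

Without 'exec', dum.sh stays alive as an intermediate process between mpirun and aborttest02.exe, which is why the abort does not propagate in the indirect case.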


Cheers,


Gilles


On 6/15/2017 3:01 AM, Ted Sussman wrote:
Hello,

My question concerns MPI_ABORT, indirect execution of executables by mpirun and Open MPI 2.1.1.  When mpirun runs executables directly, MPI_ABORT works as expected, but when mpirun runs executables indirectly, MPI_ABORT does not work as expected.

If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT works as expected in all cases.

The examples given below have been simplified as far as possible to show the issues.

---

Example 1

Consider an MPI job run in the following way:

mpirun ... -app addmpw1

where the appfile addmpw1 lists two executables:

-n 1 -host gulftown ... aborttest02.exe
-n 1 -host gulftown ... aborttest02.exe

The two executables are executed on the local node gulftown.  aborttest02 calls MPI_ABORT for rank 0, then sleeps.

The above MPI job runs as expected.  Both processes immediately abort when rank 0 calls MPI_ABORT.

---

Example 2

Now change the above example as follows:

mpirun ... -app addmpw2

where the appfile addmpw2 lists shell scripts:

-n 1 -host gulftown ... dum.sh
-n 1 -host gulftown ... dum.sh

dum.sh invokes aborttest02.exe.  So aborttest02.exe is executed indirectly by mpirun.

In this case, the MPI job aborts only process 0 when rank 0 calls MPI_ABORT.  Process 1 continues to run.  This behavior is unexpected.

---

I have attached all files to this E-mail.  Since there are absolute pathnames in the files, to reproduce my findings you will need to update the pathnames in the appfiles and shell scripts.  To run example 1,

sh run1.sh

and to run example 2,

sh run2.sh

---

I have tested these examples with Open MPI 1.4.3 and 2.0.3.  In Open MPI 1.4.3, both examples work as expected.  Open MPI 2.0.3 has the same behavior as Open MPI 2.1.1.

---

I would prefer that Open MPI 2.1.1 abort both processes, even when the executables are invoked indirectly by mpirun.  If there is an MCA setting that is needed to make Open MPI 2.1.1 abort both processes, please let me know.


Sincerely,

Theodore Sussman


The following section of this message contains a file attachment
prepared for transmission using the Internet MIME message format.
If you are using Pegasus Mail, or any other MIME-compliant system,
you should be able to save it or view it from within your mailer.
If you cannot, please ask your system administrator for assistance.

    ---- File information -----------
      File:  config.log.bz2
      Date:  14 Jun 2017, 13:35
      Size:  146548 bytes.
      Type:  Binary


    ---- File information -----------
      File:  ompi_info.bz2
      Date:  14 Jun 2017, 13:35
      Size:  24088 bytes.
      Type:  Binary


    ---- File information -----------
      File:  aborttest02.tgz
      Date:  14 Jun 2017, 13:52
      Size:  4285 bytes.
      Type:  Binary


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users