Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-27 Thread Ted Sussman
Hello Ralph, Thanks for your quick reply and bug fix. I have obtained the update and tried it in my simple example, and also in the original program from which the simple example was extracted. The update works as expected :) Sincerely, Ted Sussman On 27 Jun 2017 at 12:13,

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-19 Thread Ted Sussman
I don't do any setting of process groups. dum.sh just invokes the executable: //aborttest10.exe On 19 Jun 2017 at 10:30, r...@open-mpi.org wrote: > When you fork that process off, do you set its process group? Or is it in the > same process group as the shell script? > > > On Jun 19,

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-19 Thread r...@open-mpi.org
When you fork that process off, do you set its process group? Or is it in the same process group as the shell script? > On Jun 19, 2017, at 10:19 AM, Ted Sussman wrote: > > If I replace the sleep with an infinite loop, I get the same behavior. One > "aborttest" process

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-19 Thread Ted Sussman
If I replace the sleep with an infinite loop, I get the same behavior. One "aborttest" process remains after all the signals are sent. On 19 Jun 2017 at 10:10, r...@open-mpi.org wrote: > > That is typical behavior when you throw something into "sleep" - not much we > can do about it, I >

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-19 Thread Ted Sussman
Hello, I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug. I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before calling MPI_ABORT, so that I can check the pids using ps. This is what happens (see run2.sh.out). Open MPI invokes

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-17 Thread gilles
Ted, I do not observe the same behavior you describe with Open MPI 2.1.1: # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh abort.sh 31361 launching abort abort.sh 31362 launching abort I am rank 0 with pid 31363 I am rank 1 with pid 31364

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread gilles
Ted, if you run mpirun --mca odls_base_verbose 10 ... you will see which processes get killed and how. Best regards, Gilles - Original Message - > Hello Jeff, > > Thanks for your comments. > > I am not seeing behavior #4, on the two computers that I have tested on, using Open MPI >

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread Ted Sussman
Hello Jeff, Thanks for your comments. I am not seeing behavior #4, on the two computers that I have tested on, using Open MPI 2.1.1. I wonder if you can duplicate my results with the files that I have uploaded. Regarding what is the "correct" behavior, I am willing to modify my application

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread Jeff Squyres (jsquyres)
Ted -- Sorry for jumping in late. Here's my $0.02... In the runtime, we can do 4 things: 1. Kill just the process that we forked. 2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality). 3. Union

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-16 Thread Ted Sussman
Hello Gilles and Ralph, Thank you for your advice so far. I appreciate the time that you have spent to educate me about the details of Open MPI. But I think that there is something fundamental that I don't understand. Consider Example 2 run with Open MPI 2.1.1. mpirun --> shell for process

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread Gilles Gouaillardet
Ted, note that the shell receives a SIGTERM followed by a SIGKILL (if needed ?) from Open MPI, so if you cannot exec the MPI binary, you have the option to trap SIGTERM in your shell script and then manually propagate it (or a SIGKILL) to the MPI app. Cheers, Gilles On Fri, Jun 16, 2017 at
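
The trap-and-propagate approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual dum.sh: run_wrapped, pidfile, and cleanfile are hypothetical names, and "sleep 30" stands in for the MPI executable so the script runs without Open MPI. The lower half of the script plays the role of the runtime by sending SIGTERM to the wrapper.

```shell
#!/bin/sh
# Hypothetical wrapper logic: start the program in the background,
# forward SIGTERM to it, then run the cleanup step.
run_wrapped() {
    "$@" &
    child=$!
    echo "$child" > "$pidfile"
    # On SIGTERM, pass the signal on to the child instead of dying silently.
    trap 'kill -TERM "$child" 2>/dev/null' TERM
    wait "$child"               # returns early if the trap fires
    wait "$child" 2>/dev/null   # reap the child after a forwarded signal
    echo "cleanup done" > "$cleanfile"
}

pidfile=$(mktemp)
cleanfile=$(mktemp)

# "sleep 30" is a stand-in for the real MPI executable.
( run_wrapped sleep 30 ) &
wrapper=$!

sleep 1                  # give the wrapper time to install its trap
kill -TERM "$wrapper"    # what the runtime would send to the shell
wait "$wrapper"

child_pid=$(cat "$pidfile")
if kill -0 "$child_pid" 2>/dev/null; then
    child_state="still running"
else
    child_state="terminated"
fi
cleanup_state=$(cat "$cleanfile")
echo "child: $child_state, cleanup: $cleanup_state"
rm -f "$pidfile" "$cleanfile"
```

Because the wrapper stays alive until the child has been reaped, any cleanup placed after the two wait calls still runs. A SIGKILL cannot be trapped, so the cleanup has to finish before the runtime escalates.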

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
You have to understand that we have no way of knowing who is making MPI calls - all we see is the proc that we started, and we know someone of that rank is running (but we have no way of knowing which of the procs you sub-spawned it is). So the behavior you are seeking only occurred in some

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread Ted Sussman
Hello Ralph, I am just an Open MPI end user, so I will need to wait for the next official release. mpirun --> shell for process 0 --> executable for process 0 --> MPI calls --> shell for process 1 --> executable for process 1 --> MPI calls ... I guess

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
Yeah, things jittered a little there as we debated the “right” behavior. Generally, when we see that happening it means that a param is required, but somehow we never reached that point. See if https://github.com/open-mpi/ompi/pull/3704 helps - if

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread Ted Sussman
Thank you for your comments. Our application relies upon "dum.sh" to clean up after the process exits, whether the process exits normally or abnormally because of MPI_ABORT. If the process group is killed by MPI_ABORT, this cleanup will not be performed. If exec is

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread r...@open-mpi.org
Here is how the system is working: Master: each process is put into its own process group upon launch. When we issue a “kill”, however, we only issue it to the individual process (instead of the process group that is headed by that child process). This is probably a bug as I don’t believe that

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-15 Thread Ted Sussman
Hello Gilles, Thank you for your quick answer. I confirm that if exec is used, both processes immediately abort. Now suppose that the line echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK is added to the end of dum.sh. If Example 2 is run with Open MPI 1.4.3, the output

Re: [OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1

2017-06-14 Thread Gilles Gouaillardet
Ted, fwiw, the 'master' branch has the behavior you expect. Meanwhile, you can simply edit your 'dum.sh' script and replace /home/buildadina/src/aborttest02/aborttest02.exe with exec /home/buildadina/src/aborttest02/aborttest02.exe Cheers, Gilles On 6/15/2017 3:01 AM, Ted Sussman
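
For readers weighing the exec suggestion, the sketch below shows what exec does at the shell level, independent of Open MPI. It is an illustration only: "true" and the inner "sh -c" are stand-ins for the MPI executable, and the variable names are hypothetical.

```shell
#!/bin/sh
# "true" stands in for the MPI executable in both cases.

# Without exec: the shell survives the program and runs the next line.
plain=$(sh -c 'true; echo "after program"')

# With exec: the program replaces the shell, so the next line never runs.
replaced=$(sh -c 'exec true; echo "after program"')

# exec also keeps the pid: the shell and the program it becomes
# report the same process id.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
shell_pid=$(echo "$pids" | sed -n 1p)
program_pid=$(echo "$pids" | sed -n 2p)

echo "without exec: '$plain'"
echo "with exec:    '$replaced'"
echo "shell pid $shell_pid, program pid $program_pid"
```

With exec, the process that mpirun started is the program itself, so a signal sent to that pid reaches it directly. The cost is that nothing placed after the exec line in the script will execute, which matters if the script is expected to do work after the program exits.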