Re: [OMPI users] Signal propagation in 2.0.1

2016-12-02 Thread r...@open-mpi.org
Fix is on the way: https://github.com/open-mpi/ompi/pull/2498 


Thanks
Ralph

> On Dec 1, 2016, at 10:49 AM, r...@open-mpi.org wrote:
> 
> Yeah, that’s a bug - we’ll have to address it
> 
> Thanks
> Ralph
> 
>> On Nov 28, 2016, at 9:29 AM, Noel Rycroft wrote:
>> 
>> I'm seeing different behaviour between Open MPI 1.8.4 and 2.0.1 with regard
>> to signal propagation.
>> 
>> With version 1.8.4 mpirun seems to propagate SIGTERM to the tasks it starts 
>> which enables the tasks to handle SIGTERM.
>> 
>> In version 2.0.1 mpirun does not seem to propagate SIGTERM and instead I 
>> suspect it's sending SIGKILL immediately. Because the child tasks are not 
>> given a chance to handle SIGTERM they end up orphaning their child processes.
>> 
>> I have a pretty simple reproducer which consists of:
>> - A simple MPI application that sleeps for a number of seconds.
>> - A simple bash script which launches mpirun.
>> - A second bash script which launches the 'child' MPI application
>>   (the 'sleep' binary).
>> Both scripts launch their children in the background and 'wait' on
>> completion. They both install signal handlers for SIGTERM.
>> 
>> When SIGTERM is sent to the top level script it is explicitly propagated to 
>> 'mpirun' via the signal handler. 
>> 
>> In Open MPI 1.8.4 SIGTERM is propagated to the child MPI tasks which in turn 
>> explicitly propagate the signal to the child binary processes.
>> 
>> In Open MPI 2.0.1 I see no evidence that SIGTERM is propagated to the child 
>> MPI tasks. Instead those tasks are killed and their children (the 
>> application binaries) are orphaned.
>> 
>> Is the difference in behaviour between the two versions expected?
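
For illustration, the wrapper pattern described in the report (start a child in the background, install a SIGTERM handler that explicitly forwards the signal, then wait) can be sketched roughly as below. The reproducer scripts themselves are bash and are not included in the thread, so this is a Python stand-in under stated assumptions, not the poster's code: the helper name run_and_forward and the example command line are hypothetical.

```python
import signal
import subprocess


def run_and_forward(cmd, signals=(signal.SIGTERM,)):
    """Run cmd as a background child, forward `signals` to it, and wait.

    Returns the child's exit status (negative if it was killed by a signal).
    Hypothetical helper for illustration; not part of Open MPI.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Explicitly pass the signal on so the child gets a chance to
        # clean up its own children instead of orphaning them.
        child.send_signal(signum)

    # Install the handlers, remembering the old ones so they can be restored.
    previous = {sig: signal.signal(sig, forward) for sig in signals}
    try:
        # wait() resumes after the handler has run, so it returns only
        # once the child has actually exited.
        return child.wait()
    finally:
        for sig, old in previous.items():
            signal.signal(sig, old if old is not None else signal.SIG_DFL)


# Example (assumed command line):
#   run_and_forward(["mpirun", "-np", "2", "./task.sh"])
```

A chain of such wrappers only shuts down cleanly if every link passes SIGTERM on. If one link sends SIGKILL instead, the processes below it never run their handlers, which matches the orphaned children reported for 2.0.1.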
>> ___
>> users mailing list
>> users@lists.open-mpi.org 
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-01 Thread r...@open-mpi.org
Yeah, that’s a bug - we’ll have to address it

Thanks
Ralph

> On Nov 28, 2016, at 9:29 AM, Noel Rycroft wrote:
> 
> I'm seeing different behaviour between Open MPI 1.8.4 and 2.0.1 with regard
> to signal propagation.
> 
> With version 1.8.4 mpirun seems to propagate SIGTERM to the tasks it starts 
> which enables the tasks to handle SIGTERM.
> 
> In version 2.0.1 mpirun does not seem to propagate SIGTERM and instead I 
> suspect it's sending SIGKILL immediately. Because the child tasks are not 
> given a chance to handle SIGTERM they end up orphaning their child processes.
> 
> I have a pretty simple reproducer which consists of:
> - A simple MPI application that sleeps for a number of seconds.
> - A simple bash script which launches mpirun.
> - A second bash script which launches the 'child' MPI application
>   (the 'sleep' binary).
> Both scripts launch their children in the background and 'wait' on
> completion. They both install signal handlers for SIGTERM.
> 
> When SIGTERM is sent to the top level script it is explicitly propagated to 
> 'mpirun' via the signal handler. 
> 
> In Open MPI 1.8.4 SIGTERM is propagated to the child MPI tasks which in turn 
> explicitly propagate the signal to the child binary processes.
> 
> In Open MPI 2.0.1 I see no evidence that SIGTERM is propagated to the child 
> MPI tasks. Instead those tasks are killed and their children (the application 
> binaries) are orphaned.
> 
> Is the difference in behaviour between the two versions expected?