[jira] [Commented] (YARN-6401) terminating signal should be able to specify per application to support graceful-stop

Jason Lowe (JIRA) Wed, 29 Mar 2017 11:29:02 -0700

    [ https://issues.apache.org/jira/browse/YARN-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947656#comment-15947656 ]


Jason Lowe commented on YARN-6401:
----------------------------------

Ah, sorry.  I was thinking it was ignoring SIGTERM and thus not cleaning up 
because it would get killed by the subsequent SIGKILL.  Instead it sounds like 
it _is_ responding to SIGTERM but not cleaning up.  Isn't that a bit odd?  The 
whole point of SIGTERM is to request a shutdown of the process rather than 
forcing one.

I'm not an httpd expert, so I started digging into the docs to try to 
understand why it wouldn't do something sane with TERM but does with a 
non-standard signal like WINCH.  Turns out it does handle TERM, but it's 
aggressive such that in-progress requests may be interrupted/canceled.  WINCH 
only advises things to exit, which sounds like active requests could continue 
to be processed but the listen port is no longer monitored so no new requests 
will be processed.

What worries me here is that we can still end up with a disorderly shutdown 
even if YARN sent WINCH instead of TERM.  The default delay between the TERM and 
KILL signals is relatively short, which is why the processing httpd does for 
TERM seems more appropriate here.  If a request could take hundreds of 
milliseconds to process then the KILL is going to arrive too soon after the 
WINCH signal unless the delay between the two signals is widened.  However, that 
delay is not a per-app setting, and making it a per-app setting would cause a 
DoS problem.  Containers are often killed because YARN needs the container to 
leave in a timely manner (e.g.: container running beyond limits, preemption, 
etc.).
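
To make the timing concern concrete, here is a rough Python sketch of a signal-then-KILL stop sequence with a fixed delay.  It is not YARN's implementation: the function name {{stop_with_escalation}} and the 0.25 second default are assumptions chosen for illustration, and {{proc}} is assumed to behave like a {{subprocess.Popen}} object.

```python
import signal
import subprocess


def stop_with_escalation(proc, first_signal=signal.SIGTERM, delay_s=0.25):
    """Send first_signal, wait up to delay_s, then SIGKILL if still running.

    Hypothetical helper: `proc` only needs send_signal(), wait(timeout=...)
    and kill(), which subprocess.Popen provides.
    Returns True if the process exited on its own within the delay,
    False if it had to be killed.
    """
    proc.send_signal(first_signal)
    try:
        # Give the process a fixed window to finish in-flight work.
        proc.wait(timeout=delay_s)
        return True
    except subprocess.TimeoutExpired:
        # The window elapsed: force termination, then reap the process.
        proc.kill()
        proc.wait()
        return False
```

With a fixed delay like this, a process that needs longer than {{delay_s}} to drain its outstanding work is killed regardless of whether the first signal was TERM or WINCH (on a POSIX system, {{signal.SIGWINCH}} could be passed as {{first_signal}}).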

So I still think this is something better handled by the application framework 
(in this case Slider) rather than YARN.  MapReduce has a similar example.  
MapReduce jobs can be killed via YARN, but it's harsh and things are often lost 
when this occurs.  That's why the {{mapred job -kill}} command first tries to 
kill the job by contacting the AM and requesting it to do an orderly shutdown 
outside of YARN, and only falls back on YARN to terminate the containers if the 
job is unresponsive to the kill request.  I think the same thing applies here.  
If we really want an orderly shutdown of httpd so we won't kill outstanding 
requests (even if they can take a while) then Slider (or some layer on top of 
Slider) should support sending the WINCH signals to the containers for the app 
and then the app can terminate when all containers have completed their 
shutdown.  Then the application can implement an arbitrary, 
application-specific shutdown sequence and timing.  If YARN needs to do the 
killing directly then we cannot wait an arbitrary amount of time for the app to 
clean up and shut down gracefully.
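
The "ask the application first, fall back to YARN" pattern described above could be sketched roughly as follows in Python.  This is only an illustration under assumptions: {{stop_application}} and its three callbacks are hypothetical names, not Slider, MapReduce, or YARN APIs.

```python
import time


def stop_application(request_shutdown, all_containers_done, force_kill,
                     timeout_s, poll_s=0.1):
    """Try an application-level orderly shutdown, else force a kill.

    All three callbacks are hypothetical stand-ins:
      request_shutdown()    -> asks the app (e.g. via its AM) to shut down;
                               returns True if the request was accepted
      all_containers_done() -> True once every container has exited
      force_kill()          -> has the platform terminate the containers
    Returns "graceful" or "forced".
    """
    try:
        accepted = request_shutdown()
    except Exception:
        # An unreachable or unresponsive app counts as a refused request.
        accepted = False

    if accepted:
        # The app chooses its own shutdown sequence; we only bound the wait.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if all_containers_done():
                return "graceful"
            time.sleep(poll_s)

    # Request refused or wait timed out: fall back to the harsh path.
    force_kill()
    return "forced"
```

The point of the design is that the timeout lives in the framework that knows the application, while the platform-level kill stays fast and uniform.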

I think YARN will still need some support to send the WINCH signal in either 
case.  Currently containers can be sent signals after YARN-1897, but it's only 
a restricted subset that can be translated cross-platform.  That would need to 
be extended to support more arbitrary signals like WINCH.

> terminating signal should be able to specify per application to support 
> graceful-stop
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-6401
>                 URL: https://issues.apache.org/jira/browse/YARN-6401
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: kyungwan nam
>
> When stopping a container, YARN first sends SIGTERM to the process.
> After a while, it sends SIGKILL if the process is still alive.
> This sequence is always the same for any application.
> But for a graceful stop, a different signal sometimes needs to be sent instead of 
> SIGTERM.
> For instance, if Apache httpd is running on Slider, SIGWINCH should be sent 
> to stop it gracefully.
> The way to stop gracefully depends on the application.
> It would be good if we could define the terminating signal per application.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
