[ https://issues.apache.org/jira/browse/YARN-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947656#comment-15947656 ]
Jason Lowe commented on YARN-6401:
--
Ah, sorry. I was thinking it was ignoring SIGTERM and thus not cleaning up
because it would get killed by the subsequent SIGKILL. Instead it sounds like
it _is_ responding to SIGTERM but not cleaning up. Isn't that a bit odd? The
whole point of SIGTERM is to request a shutdown of the process rather than
forcing one.
I'm not an httpd expert, so I started digging into the docs to try to
understand why it wouldn't do something sane with TERM but does with a
non-standard signal like WINCH. It turns out it does handle TERM, but
aggressively: in-progress requests may be interrupted or canceled. WINCH only
advises the child processes to exit after their current request, so active
requests can continue to be processed while the listen port is no longer
monitored and no new requests are accepted.
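To make the TERM/WINCH difference concrete, here is a toy model of the behavior
described above. It is not httpd's code and no real signals are delivered: the
class `ToyHttpd` and its methods `accept`, `on_signal`, and `finish_in_flight`
are hypothetical names, and the "signal" is just a number passed to a method.
TERM cuts off in-progress requests, while WINCH only stops new ones from being
accepted.

```python
import signal
from dataclasses import dataclass, field


@dataclass
class ToyHttpd:
    """Toy model of httpd's parent reacting to stop (TERM) vs graceful-stop (WINCH).

    Hypothetical illustration only; no real processes or signals are involved.
    """
    accepting: bool = True
    in_flight: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)
    aborted: list[str] = field(default_factory=list)

    def accept(self, request: str) -> bool:
        """Start a request only if the listen port is still being monitored."""
        if not self.accepting:
            return False
        self.in_flight.append(request)
        return True

    def on_signal(self, signum: int) -> None:
        if signum == signal.SIGTERM:
            # "stop": shut down now; in-progress requests are cut off.
            self.accepting = False
            self.aborted.extend(self.in_flight)
            self.in_flight.clear()
        elif signum == signal.SIGWINCH:
            # "graceful-stop": stop accepting, let current requests finish.
            self.accepting = False

    def finish_in_flight(self) -> None:
        """Let every request still in progress run to completion."""
        self.completed.extend(self.in_flight)
        self.in_flight.clear()


graceful = ToyHttpd()
graceful.accept("GET /slow")
graceful.on_signal(signal.SIGWINCH)
graceful.accept("GET /new")  # rejected: no longer listening
graceful.finish_in_flight()
print(graceful.completed, graceful.aborted)  # ['GET /slow'] []

abrupt = ToyHttpd()
abrupt.accept("GET /slow")
abrupt.on_signal(signal.SIGTERM)
abrupt.finish_in_flight()
print(abrupt.completed, abrupt.aborted)  # [] ['GET /slow']
```

The graceful path only works if the process is given enough time to reach
`finish_in_flight`, which is the crux of the concern below.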
What worries me here is that we can still end up with a disorderly shutdown
even if YARN sent WINCH instead of TERM. The default delay between the TERM and
KILL signals is relatively short, which is why httpd's aggressive TERM handling
seems more appropriate here. If a request could take hundreds of milliseconds
to process, then the KILL is going to arrive too soon after the WINCH signal
unless the delay between the two signals is widened. However, that delay is not
a per-app setting, and making it one would create a DoS problem: an app could
configure a long delay and keep holding its container's resources long after
YARN asked it to leave. Containers are often killed precisely because YARN
needs them to leave in a timely manner (e.g., a container running beyond its
limits, preemption, etc.).
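A minimal sketch of the TERM-then-KILL escalation, showing how a short delay
cuts off a slow graceful shutdown. It assumes a POSIX system; the child script
and `stop_with_escalation` are hypothetical stand-ins, not NodeManager code.
The child handles SIGTERM by spending `cleanup_s` seconds "finishing requests",
and the parent waits at most `kill_delay_s` before sending SIGKILL. As far as I
know, the NodeManager's delay is `yarn.nodemanager.sleep-delay-before-sigkill.ms`
(250 ms by default), but treat that as an assumption to check against your
Hadoop version.

```python
import signal
import subprocess
import sys

# Hypothetical container process: on SIGTERM it "drains in-flight requests"
# for cleanup_s seconds, then exits cleanly with status 0.
CHILD_SRC = """
import signal, sys, time
cleanup_s = float(sys.argv[1])

def on_term(signum, frame):
    time.sleep(cleanup_s)  # simulate finishing in-progress requests
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)  # tell the parent the handler is installed
while True:
    time.sleep(1)
"""


def stop_with_escalation(cleanup_s: float, kill_delay_s: float) -> int:
    """Send SIGTERM, wait up to kill_delay_s, then SIGKILL if still alive.

    Returns the child's exit status (a negative signal number if it was killed).
    """
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC, str(cleanup_s)],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        proc.stdout.readline()  # block until the SIGTERM handler exists
        proc.send_signal(signal.SIGTERM)
        try:
            return proc.wait(timeout=kill_delay_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            return proc.wait()
```

For example, `stop_with_escalation(cleanup_s=1.5, kill_delay_s=0.25)` returns
`-signal.SIGKILL` because the cleanup is cut off, while
`stop_with_escalation(cleanup_s=0.1, kill_delay_s=3.0)` returns 0. Widening the
delay is exactly the knob that cannot safely be handed to each app.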
So I still think this is something better handled by the application framework
(in this case Slider) than by YARN. MapReduce has a similar example. MapReduce
jobs can be killed via YARN, but it's harsh and things are often lost when this
occurs. That's why the {{mapred job -kill}} command first tries to kill the job
by contacting the AM and requesting an orderly shutdown outside of YARN, and
only falls back on YARN to terminate the containers if the job is unresponsive
to the kill request. I think the same thing applies here.
If we really want an orderly shutdown of httpd that won't kill outstanding
requests (even if they can take a while), then Slider (or some layer on top of
Slider) should support sending WINCH to the app's containers, and the app can
then terminate once all containers have completed their shutdown. That way the
application can implement an arbitrary, application-specific shutdown sequence
and timing. If YARN needs to do the killing directly, then we cannot wait an
arbitrary amount of time for the app to clean up and shut down gracefully.
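The "ask the framework first, force-kill via YARN only as a fallback" pattern
can be sketched as below. This is a hypothetical outline, not Slider's or
MapReduce's actual code: `send_graceful` stands for whatever mechanism delivers
WINCH (or an AM-level kill request), `all_exited` for the framework's view of
container state, and `force_kill` for YARN's TERM/KILL path. The point is that
`grace_s` is chosen by the application rather than fixed by YARN.

```python
import time
from typing import Callable, Iterable


def stop_application(
    containers: Iterable[str],
    send_graceful: Callable[[str], None],
    all_exited: Callable[[], bool],
    force_kill: Callable[[], None],
    grace_s: float,
    poll_s: float = 0.05,
) -> str:
    """Framework-level stop: graceful signal first, YARN kill only as a fallback.

    All callables are hypothetical hooks supplied by the framework.
    """
    for container in containers:
        send_graceful(container)  # e.g. deliver WINCH to each httpd container
    deadline = time.monotonic() + grace_s  # app-chosen, not YARN's fixed delay
    while True:
        if all_exited():
            return "orderly"
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_s)
    force_kill()  # app is unresponsive: fall back to YARN's TERM/KILL
    return "forced"


# Fake cluster for illustration: containers exit as soon as they are signalled.
exited = set()
result = stop_application(
    ["c1", "c2"],
    send_graceful=exited.add,
    all_exited=lambda: exited == {"c1", "c2"},
    force_kill=lambda: None,
    grace_s=1.0,
)
print(result)  # orderly
```

Because the fallback only fires after the application's own grace period, YARN
never has to wait an arbitrary amount of time on its own kill path.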
I think YARN will still need some support for sending the WINCH signal in
either case. Since YARN-1897, containers can be sent signals, but only a
restricted subset that can be translated cross-platform. That would need to be
extended to support more arbitrary signals like WINCH.
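A sketch of why WINCH can't be requested today and what an extension might look
like. The command names mirror YARN's `SignalContainerCommand` from YARN-1897;
the POSIX mapping (thread dump to QUIT, graceful to TERM, forceful to KILL) is
my understanding of the Linux NodeManager's behavior, not something stated in
this thread. `to_posix_signal` and its `allow_arbitrary` flag are hypothetical
and do not correspond to any existing YARN API.

```python
import signal
from enum import Enum


class SignalContainerCommand(Enum):
    """Mirrors the YARN-1897 command names; the POSIX mapping is an assumption."""
    OUTPUT_THREAD_DUMP = signal.SIGQUIT
    GRACEFUL_SHUTDOWN = signal.SIGTERM
    FORCEFUL_SHUTDOWN = signal.SIGKILL


def to_posix_signal(name: str, allow_arbitrary: bool = False) -> signal.Signals:
    """Translate a container signal request into a POSIX signal.

    allow_arbitrary=True is a hypothetical extension for signals like WINCH.
    """
    if name in SignalContainerCommand.__members__:
        return signal.Signals(SignalContainerCommand[name].value)
    if allow_arbitrary:
        try:
            return signal.Signals["SIG" + name]
        except KeyError:
            raise ValueError(f"unknown signal: {name}") from None
    raise ValueError(f"{name} is not in the cross-platform subset")
```

Here `to_posix_signal("WINCH")` raises `ValueError`, while
`to_posix_signal("WINCH", allow_arbitrary=True)` yields `signal.SIGWINCH`. A real
extension would also have to decide what such a request means on platforms
without WINCH.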
> terminating signal should be able to specify per application to support
> graceful-stop
> -
>
> Key: YARN-6401
> URL: https://issues.apache.org/jira/browse/YARN-6401
> Project: Hadoop YARN
> Issue Type: Improvement
>Reporter: kyungwan nam
>
> When a container is stopped, SIGTERM is first sent to the process.
> After a while, SIGKILL is sent if the process is still alive.
> This sequence is the same for every application.
> But to stop gracefully, an application sometimes needs a different signal
> instead of SIGTERM.
> For instance, if Apache httpd is running on Slider, it needs SIGWINCH to stop
> gracefully.
> The way to stop gracefully depends on the application.
> It would be good if we could define the terminating signal per application.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)