[
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786572#comment-13786572
]
Andrey Klochkov commented on YARN-445:
--------------------------------------
Steve, the current implementation will send the signal to the java process
started by bin/hbase, because it sends the signal to all processes in the job
object, i.e. all processes of the main container process. This can be replaced
with sending the signal to all processes in the process group instead, and I
think the behavior will be the same.
BTW, I don't know how to do the opposite on Windows, i.e. how to avoid sending
the signal to all processes of the container (so the behavior on Linux is
different, as "bin/hbase" will receive the signal there). I think this is fine
as long as the difference is documented. In the case of hbase, the shell script
can install a custom handler for SIGTERM and do whatever is needed in that case
(e.g. send SIGTERM to the java process it started).
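
As a rough illustration of the forwarding idea above, here is a minimal sketch
in Python rather than shell. It assumes a POSIX system, and the function name
run_with_sigterm_forwarding is hypothetical, not something from the patch or
from bin/hbase. The wrapper starts a child, and when the wrapper itself receives
SIGTERM it relays the signal to the child instead of exiting on its own.

```python
import signal
import subprocess
import sys


def run_with_sigterm_forwarding(cmd):
    """Start cmd as a child process and relay SIGTERM sent to this wrapper.

    Hypothetical helper for illustration only; returns the child's exit code.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Relay the signal only while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    # Install the handler, remembering the previous one so it can be restored.
    previous = signal.signal(signal.SIGTERM, forward)
    try:
        return child.wait()
    finally:
        signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    # Usage: python wrapper.py <command> [args...]
    sys.exit(run_with_sigterm_forwarding(sys.argv[1:]))
```

A shell wrapper would do the same with a trap on TERM that kills the child pid;
the point is only that the wrapper, not the container runtime, decides what the
java process sees.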
There is one caveat in ctrl+break handling when a batch file starts a java
process:
1. the batch file starts the java process
2. user sends ctrl+break to all processes in the group (or job object). The java
process prints a thread dump. The batch file doesn't react yet.
3. the java process completes successfully
4. the batch file will not exit; it will print "Terminate batch job? (Y/N)" as
it received the ctrl+break signal earlier.
The only way I see to overcome this problem with batch file processes is to
identify them somehow (by executable name?) when walking through the processes
in the job object, and not send them the signal. Sending ctrl+break to batch
file processes doesn't make sense anyway, as in newer Windows versions there's
no way to disable or customize ctrl+break handling in batch files.
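
A minimal sketch of the "identify by executable name" idea, in Python for
brevity. It assumes that batch files show up in the job object as cmd.exe
processes and that the caller has already collected (pid, image path) pairs for
the processes in the job object; the names BATCH_INTERPRETERS and signal_targets
are hypothetical and not part of the patch.

```python
import ntpath

# Assumption: batch files are executed by cmd.exe, so that image name is
# used to recognize them. Extend the set if other interpreters matter.
BATCH_INTERPRETERS = {"cmd.exe"}


def signal_targets(processes, skip=BATCH_INTERPRETERS):
    """Return the pids that should receive ctrl+break.

    processes: iterable of (pid, image_path) pairs from the job object walk.
    Processes whose executable name is in skip are left out.
    """
    return [
        pid
        for pid, image in processes
        # ntpath handles Windows-style paths regardless of the host OS.
        if ntpath.basename(image).lower() not in skip
    ]
```

Matching by name is a heuristic: it would also skip a cmd.exe that was started
for some reason other than running a batch file.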
> Ability to signal containers
> ----------------------------
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Jason Lowe
> Attachments: YARN-445--n2.patch, YARN-445.patch
>
>
> It would be nice if an ApplicationMaster could send signals to containers
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an
> interface for sending SIGQUIT to a container. For that specific feature we
> could implement it as an additional field in the StopContainerRequest.
> However that would not address other potential features like the ability for
> an AM to trigger jstacks on arbitrary tasks *without* killing them. The
> latter feature would be a very useful debugging tool for users who do not
> have shell access to the nodes.
--
This message was sent by Atlassian JIRA
(v6.1#6144)