[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786572#comment-13786572 ]

Andrey Klochkov commented on YARN-445:
--------------------------------------

Steve, the current implementation will send the signal to the java process started
by bin/hbase, as it sends it to all processes in the job object, i.e. the main
container process and all of its children. It could be replaced with sending the
signal to all processes in the process group instead, and I think the behavior
would be the same.
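For comparison, the POSIX side of "signal every process in the group" can be
sketched in Python, assuming a Linux/POSIX host. The container is launched in a
new session, so the wrapper and everything it forks share one process group, and
`os.killpg` then reaches all of them. `start_container` and `signal_container`
are hypothetical helper names, not NodeManager APIs, and a process group is only
a rough analogue of a job object (a child can leave the group by calling setsid).

```python
import os
import signal
import subprocess


def start_container(cmd, **popen_kwargs):
    """Launch cmd as the leader of a new session and process group.

    Children it forks inherit the group unless they call setsid() themselves,
    so the group is a rough POSIX stand-in for a Windows job object.
    """
    return subprocess.Popen(cmd, start_new_session=True, **popen_kwargs)


def signal_container(proc, sig=signal.SIGTERM):
    """Send sig to every process in the container's group, not just proc."""
    os.killpg(os.getpgid(proc.pid), sig)


if __name__ == "__main__":
    # "sh" plays bin/hbase and "sleep" plays the java process it starts.
    # The wrapper ignores SIGTERM only after forking (so the child keeps the
    # default action), announces it is ready, then prints the child's status.
    wrapper = start_container(
        ["sh", "-c", 'sleep 30 & trap "" TERM; echo ready; wait $!; echo $?'],
        stdout=subprocess.PIPE,
        text=True,
    )
    wrapper.stdout.readline()  # wait until the trap is in place
    signal_container(wrapper)
    out, _ = wrapper.communicate(timeout=10)
    # A status of 128 + 15 = 143 means the child was killed by SIGTERM.
    print("child exit status:", out.strip())
```

The wrapper shell deliberately ignores SIGTERM here so it can report what
happened to its child; it shows that a group-wide signal reaches the process
started by the wrapper, not only the wrapper itself.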

BTW, I don't know how to do the opposite on Windows, i.e. how to avoid sending
the signal to all processes of the container. So the behavior on Linux is
different, since there "bin/hbase" itself receives the signal. I think this is
fine as long as the difference is documented. In the case of hbase, the shell
script can install a custom handler (a trap) for SIGTERM and do whatever is
needed there (e.g. send SIGTERM to the java process it started).
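The SIGTERM hook described above would normally be a shell `trap` in the
wrapper script. To keep the examples in one language, here is the same
forwarding pattern sketched in Python, with the Python process playing the
wrapper. `launch_with_sigterm_forwarding` is a hypothetical name; the sketch
assumes a POSIX host and a call from the main thread, which CPython requires
for `signal.signal`.

```python
import signal
import subprocess


def launch_with_sigterm_forwarding(cmd):
    """Start cmd as a child of this (wrapper) process and relay SIGTERM to it.

    Without the handler, SIGTERM would simply terminate the wrapper and leave
    the child running. Must be called from the main thread, because CPython
    only allows signal.signal() there. Returns the child and the previous
    SIGTERM handler so the caller can restore it.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # The wrapper was signalled: pass the same signal on to the workload.
        if child.poll() is None:
            child.send_signal(signum)

    previous = signal.signal(signal.SIGTERM, forward)
    return child, previous
```

Returning the previous handler lets the caller restore normal SIGTERM behavior
once the child has exited.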

There is one caveat in ctrl+break handling when a batch file starts a java
process:
1. the batch file starts the java process
2. the user sends ctrl+break to all processes in the group (or job object). The
java process prints a thread dump; the batch file doesn't react yet.
3. the java process completes successfully
4. the batch file does not exit; instead it prints "Terminate batch job (Y/N)?",
because it received the ctrl+break signal earlier.

The only way I see to overcome this problem with batch file processes is to
identify them somehow (by executable name?) while walking through the processes
in the job object, and skip them when sending the signal. Sending ctrl+break to
batch file processes doesn't make sense anyway, since on newer Windows versions
there is no way to disable or customize ctrl+break handling in batch files.
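A minimal sketch of that filtering step, assuming the caller has already
enumerated the job's processes as (pid, image path) pairs. On Windows such a
list could come from QueryInformationJobObject with JobObjectBasicProcessIdList
plus a per-process image-name lookup, which is not shown. Treating `cmd.exe` as
"a batch file process" is the executable-name heuristic suggested above, not a
verified rule: it would also skip a `cmd.exe` that is not running a batch file.
`ctrl_break_targets` and `BATCH_FILE_HOSTS` are hypothetical names.

```python
import ntpath

# Assumption: batch files are hosted by cmd.exe, so that image name is used
# as a heuristic marker for "batch file process".
BATCH_FILE_HOSTS = frozenset({"cmd.exe"})


def ctrl_break_targets(job_processes):
    """Given (pid, image_path) pairs for every process in a job object,
    return the PIDs that should receive ctrl+break, skipping batch hosts."""
    targets = []
    for pid, image_path in job_processes:
        # ntpath handles Windows-style paths even when run on another OS;
        # the comparison is case-insensitive, like Windows file names.
        image = ntpath.basename(image_path).lower()
        if image not in BATCH_FILE_HOSTS:
            targets.append(pid)
    return targets
```

For example, a job containing `C:\Windows\System32\cmd.exe` (pid 100) and
`C:\java\bin\java.exe` (pid 101) would yield only `[101]` as a target.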

> Ability to signal containers
> ----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>         Attachments: YARN-445--n2.patch, YARN-445.patch
>
>
> It would be nice if an ApplicationMaster could send signals to containers 
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature 
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
> interface for sending SIGQUIT to a container.  For that specific feature we 
> could implement it as an additional field in the StopContainerRequest.  
> However that would not address other potential features like the ability for 
> an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
> latter feature would be a very useful debugging tool for users who do not 
> have shell access to the nodes.
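The use case in the description (a thread dump on demand, without killing the
task) relies on the target handling SIGQUIT and continuing to run, as the JVM
does when it prints a thread dump. A Python analogue, assuming a POSIX host:
`faulthandler.register` makes the worker dump every thread's stack on SIGQUIT
and keep going. `dump_stacks_without_killing` is a hypothetical helper, and the
half-second pause is an arbitrary grace period.

```python
import signal
import subprocess
import sys
import time

# Worker program: faulthandler dumps every thread's stack to stderr on
# SIGQUIT (standing in for the JVM's thread dump) and the process keeps running.
WORKER = (
    "import faulthandler, signal, time\n"
    "faulthandler.register(signal.SIGQUIT, all_threads=True)\n"
    "print('ready', flush=True)\n"
    "time.sleep(30)\n"
)


def dump_stacks_without_killing(grace=0.5):
    """Start a worker, send it SIGQUIT, and report whether it survived.

    Returns (still_running, stderr_text). The worker is killed afterwards
    so the sketch cleans up after itself.
    """
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    worker.stdout.readline()  # wait until the SIGQUIT handler is registered
    worker.send_signal(signal.SIGQUIT)
    time.sleep(grace)  # give the dump a moment to be written
    still_running = worker.poll() is None
    worker.kill()
    _, err = worker.communicate(timeout=10)
    return still_running, err
```

Calling `alive, dump = dump_stacks_without_killing()` should report the worker
as still alive and return its stack dump as text.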



--
This message was sent by Atlassian JIRA
(v6.1#6144)
