[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632028#comment-13632028
 ] 

Chris Nauroth commented on YARN-445:
------------------------------------

Unfortunately, I don't believe the Unix signal concept maps cleanly to Windows.  
Some of the signal-related functions are defined on Windows, but with behavior 
quite different from the Unix equivalent.

http://msdn.microsoft.com/en-us/library/xdkz3x12(v=vs.71).aspx

For example, there are differences in exit codes seen by the signalled process, 
and some signal handling scenarios cause the process to start a new thread to 
handle it instead of interrupting an existing thread.

Another alternative on Windows is console control handlers:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx

I have seen projects that attempt to define a higher-level interface of 
"externally triggered command", using method names like gracefulShutdown, kill, 
and outputDebugInfo.  On Unix, the implementation can map these to 
signal/kill.  On Windows, the implementation can map these to 
SetConsoleCtrlHandler/GenerateConsoleCtrlEvent.  The problem is that this is a 
least-common-denominator approach that may not cover all possible use cases.
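
As a rough sketch of what such a higher-level interface might look like (every 
name below is hypothetical and not an existing YARN API; the Unix mapping only 
builds the kill command line rather than running it, and a Windows 
implementation would need native calls that are not shown):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: one method per externally triggered command.
interface ExternalCommands {
    List<String> gracefulShutdown(long pid);
    List<String> kill(long pid);
    List<String> outputDebugInfo(long pid);
}

// Unix mapping: each command becomes a kill(1) invocation with a signal name.
final class UnixExternalCommands implements ExternalCommands {
    public List<String> gracefulShutdown(long pid) { return signal("TERM", pid); }
    public List<String> kill(long pid)             { return signal("KILL", pid); }
    // SIGQUIT asks a HotSpot JVM to print a thread dump without exiting.
    public List<String> outputDebugInfo(long pid)  { return signal("QUIT", pid); }

    // Builds the argv for "kill -<SIGNAL> <pid>"; the caller decides how to run it.
    private static List<String> signal(String name, long pid) {
        return Arrays.asList("kill", "-" + name, Long.toString(pid));
    }
}
```

The three methods are the whole contract, which is exactly why anything a 
platform can do beyond them stays out of reach.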

Considering all of that, I can think of 3 different approaches to this feature:

# Give up on creating a general-purpose signaling mechanism and just 
stay focused on triggering JVM features.  (This is identical to Jason's #1.)
# Use the Windows APIs I mentioned above to implement least-common-denominator 
signaling support.
# Add YARN API support for ContainerLaunchContext to accept a mapping of 
externally-triggered command names to code, e.g. 
{{ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")}}.  
Then, during execution, the AM could send a message to the NM saying 
"gracefulShutdown container_X".  When the NM receives the message, it could 
look up "gracefulShutdown" in the map of external commands and trigger the 
kill.  For highly custom message handling scenarios (Windows console control 
events/named pipes/whatever else), the AM could ship a binary as a localized 
resource that contains the implementation, and the external command can be 
mapped to call that binary.
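
To make the third option a little more concrete, here is a minimal sketch of the 
lookup the NM might perform.  ExternalCommandTable and resolve are hypothetical 
names standing in for the proposed ContainerLaunchContext extension, and 
substituting $CONTAINER_PID as plain text is an assumption; a real 
implementation might instead export it as an environment variable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed ContainerLaunchContext extension.
final class ExternalCommandTable {
    private final Map<String, String> commands = new HashMap<>();

    // Mirrors the proposed ctx.setExternalCommand(name, code).
    void setExternalCommand(String name, String code) {
        commands.put(name, code);
    }

    // What the NM might do on receiving "<name> container_X":
    // look up the command and fill in the container's PID.
    String resolve(String name, long containerPid) {
        String code = commands.get(name);
        if (code == null) {
            throw new IllegalArgumentException("Unknown external command: " + name);
        }
        // String.replace is a literal replacement, so '$' needs no escaping.
        return code.replace("$CONTAINER_PID", Long.toString(containerPid));
    }
}
```

Unknown command names are rejected rather than ignored, so a typo in the AM's 
message surfaces as an error instead of a silent no-op.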

Each of these approaches gets progressively more general-purpose, but also 
progressively more complex.  The last one in particular gives maximum 
flexibility, but makes the API challenging for AM writers.

A side note on the last option: another variant is to add one more level of 
indirection in the API to support different container launch configuration per 
platform.  This would make it easier to support heterogeneous clusters (mix of 
Unix and Windows nodes).  This would let the AM say things like "use kill on 
Unix, but use something else on Windows" but without needing to know if 
specific nodes are running Unix or Windows.
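
The extra level of indirection could be sketched roughly as follows.  Platform 
and PlatformCommand are hypothetical names, not part of any existing YARN API; 
the point is only that the AM registers one variant per platform and the node 
picks its own.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical platform tag; a real API might need finer distinctions.
enum Platform { UNIX, WINDOWS }

// One external command with a separate implementation per platform.
final class PlatformCommand {
    private final Map<Platform, String> variants = new EnumMap<>(Platform.class);

    // Registered by the AM, which does not need to know where containers land.
    PlatformCommand on(Platform platform, String code) {
        variants.put(platform, code);
        return this;
    }

    // Called on the node, which knows its own platform.
    String forPlatform(Platform local) {
        String code = variants.get(local);
        if (code == null) {
            throw new IllegalStateException("No variant registered for " + local);
        }
        return code;
    }
}
```

An AM might then register {{kill -TERM $CONTAINER_PID}} for UNIX and a shipped 
helper binary for WINDOWS under the same command name.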

                
> Ability to signal containers
> ----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.0.5-beta
>            Reporter: Jason Lowe
>
> It would be nice if an ApplicationMaster could send signals to containers 
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature 
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
> interface for sending SIGQUIT to a container.  For that specific feature we 
> could implement it as an additional field in the StopContainerRequest.  
> However that would not address other potential features like the ability for 
> an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
> latter feature would be a very useful debugging tool for users who do not 
> have shell access to the nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
