[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632028#comment-13632028 ]
Chris Nauroth commented on YARN-445:
------------------------------------

Unfortunately, I don't believe the Unix signal concept maps cleanly to Windows. Some of the signal-related functions are defined on Windows, but with behavior quite different from the Unix equivalent.

http://msdn.microsoft.com/en-us/library/xdkz3x12(v=vs.71).aspx

For example, there are differences in the exit codes seen by the signalled process, and some signal handling scenarios cause the process to start a new thread to handle the signal instead of interrupting an existing thread.

Another alternative on Windows is console control handlers:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686016(v=vs.85).aspx

I have seen projects that attempt to define a higher-level interface of "externally triggered command", using method names like gracefulShutdown, kill, and outputDebugInfo. On Unix, the implementation can map these to signal/kill. On Windows, the implementation can map these to SetConsoleCtrlHandler/GenerateConsoleCtrlEvent. The problem is that this is a least common denominator approach that may not cover all possible use cases.

Considering all of that, I can think of 3 different approaches to this feature:

# Sacrifice trying to create a general-purpose signaling mechanism and just stay focused on triggering JVM features. (This is identical to Jason's #1.)
# Use the Windows APIs I mentioned above to implement least-common-denominator signaling support.
# Add YARN API support for ContainerLaunchContext to accept a mapping of externally-triggered command names to code (e.g. {{ctx.setExternalCommand("gracefulShutdown", "kill -TERM $CONTAINER_PID")}}). Then, during execution, the AM could send a message to the NM saying "gracefulShutdown container_X". When the NM receives the message, it could look up "gracefulShutdown" in the map of external commands and trigger the kill.
For highly custom message handling scenarios (Windows console control events/named pipes/whatever else), the AM could ship a binary as a localized resource that contains the implementation, and the external command can be mapped to call that binary.

Each of these approaches gets progressively more general-purpose, but also progressively more complex. The last one in particular gives maximum flexibility, but makes the API challenging for AM writers.

A side note on the last option: another variant is to add one more level of indirection in the API to support a different container launch configuration per platform. This would make it easier to support heterogeneous clusters (a mix of Unix and Windows nodes). It would let the AM say things like "use kill on Unix, but use something else on Windows" without needing to know whether specific nodes are running Unix or Windows.

> Ability to signal containers
> ----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.0.5-beta
>            Reporter: Jason Lowe
>
> It would be nice if an ApplicationMaster could send signals to containers
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an
> interface for sending SIGQUIT to a container. For that specific feature we
> could implement it as an additional field in the StopContainerRequest.
> However that would not address other potential features like the ability for
> an AM to trigger jstacks on arbitrary tasks *without* killing them. The
> latter feature would be a very useful debugging tool for users who do not
> have shell access to the nodes.
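
To make the third approach from the comment concrete (a map from externally triggered command names to code, plus the per-platform variant from the side note), here is a minimal sketch in Java. Everything in it is an assumption for illustration: {{ExternalCommandSketch}}, {{ExternalCommands}}, {{Platform}}, and {{resolve}} are hypothetical names, {{setExternalCommand}} is only the method proposed in the comment and is not an existing YARN API, and {{send-ctrl-event.exe}} is a made-up placeholder for an AM-shipped binary. The sketch only looks up and expands the command line; it does not execute anything.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the proposed command map; none of these types exist in YARN. */
final class ExternalCommandSketch {

    /** Platforms a node might run (assumption, used for the per-platform variant). */
    enum Platform { UNIX, WINDOWS }

    /** Stand-in for the part of a container launch context that would hold the map. */
    static final class ExternalCommands {
        private final Map<String, Map<Platform, String>> commands = new HashMap<>();

        /** Registers the code to run for a command name on one platform. */
        void setExternalCommand(String name, Platform platform, String code) {
            commands
                .computeIfAbsent(name, k -> new EnumMap<Platform, String>(Platform.class))
                .put(platform, code);
        }

        /**
         * What an NM might do on receiving "<name> container_X": look up the
         * command for its own platform and substitute the container's PID.
         * Returns empty if the name is unknown or unmapped for this platform.
         */
        Optional<String> resolve(String name, Platform platform, long containerPid) {
            Map<Platform, String> perPlatform = commands.get(name);
            if (perPlatform == null) {
                return Optional.empty();
            }
            String template = perPlatform.get(platform);
            if (template == null) {
                return Optional.empty();
            }
            // String.replace is a literal (non-regex) substitution.
            return Optional.of(template.replace("$CONTAINER_PID", Long.toString(containerPid)));
        }
    }

    public static void main(String[] args) {
        ExternalCommands ctx = new ExternalCommands();
        ctx.setExternalCommand("gracefulShutdown", Platform.UNIX, "kill -TERM $CONTAINER_PID");
        // Placeholder for a binary shipped as a localized resource (hypothetical name).
        ctx.setExternalCommand("gracefulShutdown", Platform.WINDOWS, "send-ctrl-event.exe $CONTAINER_PID");

        // prints "kill -TERM 4242"
        System.out.println(ctx.resolve("gracefulShutdown", Platform.UNIX, 4242).orElse("<unmapped>"));
    }
}
```

With this shape the AM registers one entry per platform up front and never needs to know which operating system a given node runs; the NM picks the entry that matches its own platform. An unmapped name or platform yields an empty result, which mirrors the concern in the comment that the flexibility comes at the cost of more work for AM writers.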