[
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrey Klochkov updated YARN-445:
---------------------------------
Attachment: YARN-445.patch
Attaching a patch that provides the simplest implementation:
- winutils is extended with an additional routine that uses console control
handlers to emulate ctrl+break on the container. For Java containers this
roughly corresponds to the QUIT signal on Linux.
- ContainerManagerProtocol is extended with a signalContainers() method that
accepts a signal number to send. Currently the implementation accepts only
the QUIT signal (i.e. value 3) and rejects any other request.
- TestContainerManager is extended accordingly and executed successfully under
Windows, OSX and Linux.
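As a rough illustration of the accept-QUIT-only behaviour described above,
here is a minimal, self-contained sketch. The class name, method signature and
return value are hypothetical and are not taken from the patch; the real
signalContainers() lives in ContainerManagerProtocol and its request/response
types may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the NM-side check; not the code from YARN-445.patch.
final class ContainerSignalSketch {
    // QUIT is signal number 3, as stated in the comment above.
    static final int SIGNAL_QUIT = 3;

    // Returns the ids that would be signalled; rejects anything but QUIT.
    static List<String> signalContainers(List<String> containerIds, int signal) {
        if (signal != SIGNAL_QUIT) {
            throw new IllegalArgumentException(
                "Unsupported signal " + signal + "; only QUIT (3) is accepted");
        }
        List<String> signalled = new ArrayList<>();
        for (String id : containerIds) {
            // Actual delivery is platform-specific (a QUIT signal on Linux,
            // the winutils ctrl+break emulation on Windows) and is omitted here.
            signalled.add(id);
        }
        return signalled;
    }
}
```

Validating the signal number before touching any container keeps a rejected
request free of side effects.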
This simple implementation makes it possible to troubleshoot containers
without killing them, as the initial description of the feature requests. If
needed, we can open an additional Jira to extend this further by allowing an
arbitrary map of commands to be provided in the submission context and then
invoked through the NM API.
> Ability to signal containers
> ----------------------------
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.1.0-beta
> Reporter: Jason Lowe
> Attachments: YARN-445.patch
>
>
> It would be nice if an ApplicationMaster could send signals to containers
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an
> interface for sending SIGQUIT to a container. For that specific feature we
> could implement it as an additional field in the StopContainerRequest.
> However that would not address other potential features like the ability for
> an AM to trigger jstacks on arbitrary tasks *without* killing them. The
> latter feature would be a very useful debugging tool for users who do not
> have shell access to the nodes.
--
This message was sent by Atlassian JIRA
(v6.1#6144)