[
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902278#comment-13902278
]
Ming Ma commented on YARN-445:
------------------------------
[Gera Shegalov|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=jira.shegalov]
and I discussed the idea of providing this signal functionality at the YARN layer
without involving the AM. I have a basic prototype working and would like to get
feedback from others.
The benefit of this approach is that other YARN applications, such as Spark,
don't need to write any code to use this feature. If we later decide to
extend the interface to support jmap by allowing users to run an arbitrary
script against the container, all YARN Java applications
will get that for free. Here is how it works.
1. A client can ask the RM to signal a specific container, as long as the
request passes authorization.
{code:title=SignalContainerRequest.java|borderStyle=solid}
public interface SignalContainerRequest {
  /**
   * Get the <code>ContainerId</code> of the container to signal.
   * @return <code>ContainerId</code> of the container to signal.
   */
  @Public
  @Stable
  public abstract ContainerId getContainerId();

  @Private
  @Stable
  public abstract void setContainerId(ContainerId containerId);

  /**
   * Get the signal number to deliver to the container.
   * @return the signal number, for example 3 to capture a jstack.
   */
  @Public
  @Stable
  public abstract int getSignal();

  @Private
  @Stable
  public abstract void setSignal(int signal);
}
{code}
{code:title=ClientRMProtocol.java|borderStyle=solid}
  /**
   * Signal a running container.
   *
   * @param request the container to signal.
   * @return an empty response.
   * @throws YarnRemoteException
   */
  public SignalContainerResponse signalContainer(
      SignalContainerRequest request)
      throws YarnRemoteException;
{code}
2. The RM will pass the container ID to the corresponding NM in the next
heartbeat response. The HeartbeatResponse interface is modified to carry this
information.
3. The AM isn't involved.
4. From the customer's point of view, on the CLI, customers run "bin/yarn
application -signal $containerid 3" to capture a jstack. On the web UI, customers
can click links on the container web page as well as the MR job page.
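As a rough sketch of steps 1 through 3 (this is not the prototype code; the class and method names below are made up for illustration, and container and node IDs are plain strings rather than the real YARN records), the RM side could queue each authorized request per node and hand the queue over when that node next heartbeats:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical RM-side bookkeeping; an illustration only, not the YARN-445 patch. */
class PendingSignalQueue {
    /** One queued request: which container to signal and with which signal number. */
    static final class Signal {
        final String containerId;
        final int signal;

        Signal(String containerId, int signal) {
            this.containerId = containerId;
            this.signal = signal;
        }
    }

    // nodeId -> signals waiting for that node's next heartbeat
    private final Map<String, List<Signal>> pendingByNode = new HashMap<>();

    /** Step 1: a request that already passed authorization is queued for the container's node. */
    synchronized void enqueue(String nodeId, String containerId, int signal) {
        pendingByNode.computeIfAbsent(nodeId, k -> new ArrayList<>())
                     .add(new Signal(containerId, signal));
    }

    /** Step 2: called while building the heartbeat response; returns and clears the node's queue. */
    synchronized List<Signal> drainForHeartbeat(String nodeId) {
        List<Signal> drained = pendingByNode.remove(nodeId);
        return drained == null ? new ArrayList<>() : drained;
    }

    public static void main(String[] args) {
        PendingSignalQueue queue = new PendingSignalQueue();
        queue.enqueue("node-1", "container_01", 3);
        for (Signal s : queue.drainForHeartbeat("node-1")) {
            System.out.println("signal " + s.signal + " -> " + s.containerId);
        }
    }
}
```

Draining on heartbeat means each request is delivered to the NM at most once, and the AM never appears in the path.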
Of course, this is orthogonal to general signal support across different OS
platforms.
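To show where the platform question comes in, here is a minimal sketch of how the NM side might turn the signal number into a command on a Unix-like node. The helper name and the Windows check are assumptions made for illustration; nothing like this is confirmed to be in the prototype.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; the names are illustrative and not part of YARN. */
class SignalCommand {
    /** Builds the argv that would deliver the given signal to pid on a Unix-like node. */
    static List<String> forUnix(int pid, int signal) {
        if (pid <= 0 || signal <= 0) {
            throw new IllegalArgumentException("pid and signal must be positive");
        }
        // For example "kill -3 1234" sends SIGQUIT, which makes a JVM print a thread dump.
        return Arrays.asList("kill", "-" + signal, Integer.toString(pid));
    }

    /** Picks a command by OS name; numeric signals have no direct Windows counterpart here. */
    static List<String> forOs(String osName, int pid, int signal) {
        if (osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
            throw new UnsupportedOperationException("numeric signals are not supported on " + osName);
        }
        return forUnix(pid, signal);
    }

    public static void main(String[] args) {
        System.out.println(forOs(System.getProperty("os.name"), 1234, 3));
    }
}
```

The numeric interface in SignalContainerRequest maps directly onto the Unix branch; anything cross-platform would need a separate abstraction, which is the orthogonal work mentioned above.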
> Ability to signal containers
> ----------------------------
>
> Key: YARN-445
> URL: https://issues.apache.org/jira/browse/YARN-445
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Jason Lowe
> Assignee: Andrey Klochkov
> Attachments: YARN-445--n2.patch, YARN-445--n3.patch,
> YARN-445--n4.patch, YARN-445.patch
>
>
> It would be nice if an ApplicationMaster could send signals to containers
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an
> interface for sending SIGQUIT to a container. For that specific feature we
> could implement it as an additional field in the StopContainerRequest.
> However that would not address other potential features like the ability for
> an AM to trigger jstacks on arbitrary tasks *without* killing them. The
> latter feature would be a very useful debugging tool for users who do not
> have shell access to the nodes.