Ming Ma updated YARN-1897:
    Attachment: YARN-1897-6.patch

Thanks [~djp]! Yes, the approach taken in YARN-4131 is simpler because it 
leverages the existing protocol to accomplish the kill container scenario. But 
changing the NM-RM protocol will allow us to support other useful scenarios 
besides kill container and thread dump:

* "Pause container" test case.
* Send a compound command such as "kill %pid%; sleep 50; kill -9 %pid%".
* Run some JVM command to capture perf data.
* Allow a container to map a custom signal such as SIGUSR2 to any action it 
wants to run in the container process.
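
To make the compound command scenario concrete, here is a minimal Java sketch of how a "%pid%" placeholder could be expanded and the command split into ordered steps. The class and method names ({{CompoundSignalCommand}}, {{expand}}, {{steps}}) are hypothetical and are not taken from the patch; only the "%pid%" template comes from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the YARN-1897 patch.
public final class CompoundSignalCommand {
    // Placeholder used in the example command from this comment.
    private static final String PID_PLACEHOLDER = "%pid%";

    private CompoundSignalCommand() {}

    /** Replaces every %pid% occurrence with the container's process id. */
    public static String expand(String template, long pid) {
        return template.replace(PID_PLACEHOLDER, Long.toString(pid));
    }

    /** Splits the expanded command on ';' into trimmed, non-empty steps. */
    public static List<String> steps(String template, long pid) {
        List<String> result = new ArrayList<>();
        for (String part : expand(template, pid).split(";")) {
            String step = part.trim();
            if (!step.isEmpty()) {
                result.add(step);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "kill %pid%; sleep 50; kill -9 %pid%";
        // Prints one step per line for an assumed pid of 1234.
        for (String step : steps(template, 1234L)) {
            System.out.println(step);
        }
    }
}
```

The sketch only prepares the steps; how NM would actually execute them is left open here.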

I would like to clarify the scenarios described in YARN-4131 to see whether 
signal container can cover them.

* Kill container via preemption. This means RM knows about the kill before NM 
does, unlike the signal container flow, which kills the container without RM's 
knowledge first. Killing a container without RM's knowledge seems to match the 
container crash test case better, while killing a container via preemption can 
simulate preemption. But does the difference matter here as long as the 
container is killed?
* Container expiration. Is that only for a container that has been 
allocated/acquired but is not yet in the running state? It seems to be used by 
RM to time out container allocation/acquisition. It will trigger 
{{RMContainerEventType.EXPIRE}} and won't have any impact on a running 
container.

Here is the updated patch, which fixes some of the unit test failures. I still 
don't know why the mapred test fails even though it passes on my machine.

I look forward to more comments from you.

> CLI and core support for signal container functionality
> -------------------------------------------------------
>                 Key: YARN-1897
>                 URL: https://issues.apache.org/jira/browse/YARN-1897
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897.1.patch
>
> We need to define SignalContainerRequest and SignalContainerResponse first as 
> they are needed by other sub tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way for the application to specify a 
> "reason" for diagnosis. SignalContainerResponse might be empty.
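
As a rough illustration of the description above, the following Java sketch shows one way a request could carry an OS-independent command plus a free-form reason. Everything here is an assumption made for illustration: the class name {{SignalRequestSketch}}, the enum constants, and the Unix mapping are hypothetical and may differ from the types in the patch.

```java
import java.util.Objects;

// Hypothetical sketch; not the SignalContainerRequest defined by the patch.
public final class SignalRequestSketch {
    /** OS-independent commands; the constant names are illustrative only. */
    public enum Command { THREAD_DUMP, GRACEFUL_STOP, FORCEFUL_STOP }

    private final String containerId;
    private final Command command;
    private final String reason;

    public SignalRequestSketch(String containerId, Command command, String reason) {
        this.containerId = Objects.requireNonNull(containerId, "containerId");
        this.command = Objects.requireNonNull(command, "command");
        // The reason is optional diagnostic text, so null becomes empty.
        this.reason = (reason == null) ? "" : reason;
    }

    public String getContainerId() { return containerId; }
    public Command getCommand() { return command; }
    public String getReason() { return reason; }

    /** One possible translation on a Unix-like node (an assumption). */
    public static String toUnixSignal(Command command) {
        switch (command) {
            case THREAD_DUMP:   return "SIGQUIT"; // a HotSpot JVM prints a thread dump on SIGQUIT
            case GRACEFUL_STOP: return "SIGTERM";
            case FORCEFUL_STOP: return "SIGKILL";
            default: throw new IllegalArgumentException("Unknown command: " + command);
        }
    }

    public static void main(String[] args) {
        SignalRequestSketch request = new SignalRequestSketch(
            "container_example_01", Command.THREAD_DUMP, "task appears hung");
        System.out.println(request.getContainerId() + " -> "
            + toUnixSignal(request.getCommand()) + " (" + request.getReason() + ")");
    }
}
```

Keeping the command abstract and translating it only on the node is what would let the same request work on non-Unix platforms.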

This message was sent by Atlassian JIRA