[ 
https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308259#comment-15308259
 ] 

Varun Saxena edited comment on YARN-5156 at 5/31/16 6:13 PM:
-------------------------------------------------------------

Looked at the code. NMTimelinePublisher publishes the YARN_CONTAINER_FINISHED 
event when it receives an ApplicationContainerFinishedEvent, and that event is 
posted from ContainerImpl.

The issue seems to be that we clone the container status and post the 
ApplicationContainerFinishedEvent before the transition has completed, i.e. 
before the container state has been set to DONE. As a result, the container 
state is reported as RUNNING. ContainerImpl#sendFinishedEvents, which posts the 
ApplicationContainerFinishedEvent, is called from all the transitions that lead 
to the DONE state. So in NMTimelinePublisher#publishContainerFinishedEvent we 
can simply set STATE_EVENT_INFO to DONE.
Alternatively, since a container finished event always leads to the DONE state, 
there is no need to send STATE_EVENT_INFO at all. Thoughts?

{code:title=ContainerImpl.java|borderStyle=solid}
  @SuppressWarnings("unchecked")
  private void sendFinishedEvents() {
    // Inform the application
    @SuppressWarnings("rawtypes")
    EventHandler eventHandler = dispatcher.getEventHandler();

    ContainerStatus containerStatus = cloneAndGetContainerStatus();
    eventHandler.handle(new ApplicationContainerFinishedEvent(containerStatus));

    // Remove the container from the resource-monitor
    eventHandler.handle(new ContainerStopMonitoringEvent(containerId));
    // Tell the logService too
    eventHandler.handle(new LogHandlerContainerFinishedEvent(
      containerId, exitCode));
  }
{code}
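
To make the ordering problem concrete, here is a minimal standalone sketch. It is not the NodeManager code: FinishedEventOrderingSketch, SketchContainer and finishedEventInfo are hypothetical names, and the only assumption carried over is that the hook that sends the finished event runs before the state machine stores the new state. The info keys are the ones from the output in the issue description. The second method sketches the first option above (the publisher sets the state to DONE itself).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone illustration only; none of these types are real YARN classes.
class FinishedEventOrderingSketch {
  enum State { RUNNING, DONE }

  /** Hypothetical stand-in for a container driven by a state machine. */
  static class SketchContainer {
    State state = State.RUNNING;
    State stateSeenByEvent;

    /** Mimics a transition whose hook runs before the new state is stored. */
    void finish() {
      // The hook snapshots the status here, while the old state is still set.
      stateSeenByEvent = state;
      // Only after the hook returns does the state change.
      state = State.DONE;
    }
  }

  /** Option 1: the publisher sets the state to DONE instead of using the snapshot. */
  static Map<String, Object> finishedEventInfo(int exitStatus, String diagnostics) {
    Map<String, Object> info = new LinkedHashMap<>();
    info.put("YARN_CONTAINER_EXIT_STATUS", exitStatus);
    info.put("YARN_CONTAINER_STATE", State.DONE.toString());
    info.put("YARN_CONTAINER_DIAGNOSTICS_INFO", diagnostics);
    return info;
  }

  public static void main(String[] args) {
    SketchContainer container = new SketchContainer();
    container.finish();
    System.out.println("state seen by finished event: " + container.stateSeenByEvent);
    System.out.println("state after transition: " + container.state);
    System.out.println("info with option 1: " + finishedEventInfo(0, ""));
  }
}
```

With the second option, the YARN_CONTAINER_STATE entry would simply be left out of the map.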

Naga, will you be handling this?



> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -------------------------------------------------------------------------
>
>                 Key: YARN-5156
>                 URL: https://issues.apache.org/jira/browse/YARN-5156
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Is 
> this deliberate, or is it a bug? 
> {code}
> {
>   metrics: [ ],
>   events: [
>     {
>       id: "YARN_CONTAINER_FINISHED",
>       timestamp: 1464213765890,
>       info: {
>         YARN_CONTAINER_EXIT_STATUS: 0,
>         YARN_CONTAINER_STATE: "RUNNING",
>         YARN_CONTAINER_DIAGNOSTICS_INFO: ""
>       }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
>       timestamp: 1464213761133,
>       info: { }
>     },
>     {
>       id: "YARN_CONTAINER_CREATED",
>       timestamp: 1464213761132,
>       info: { }
>     },
>     {
>       id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
>       timestamp: 1464213761132,
>       info: { }
>     }
>   ],
>   id: "container_e15_1464213707405_0001_01_000018",
>   type: "YARN_CONTAINER",
>   createdtime: 1464213761132,
>   info: {
>     YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
>     YARN_CONTAINER_ALLOCATED_VCORE: 1,
>     YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
>     UID: "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_000018",
>     YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
>     YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
>     SYSTEM_INFO_PARENT_ENTITY: {
>       type: "YARN_APPLICATION_ATTEMPT",
>       id: "appattempt_1464213707405_0001_000001"
>     },
>     YARN_CONTAINER_ALLOCATED_PORT: 64694
>   },
>   configs: { },
>   isrelatedto: { },
>   relatesto: { }
> }
> {code}



