[
https://issues.apache.org/jira/browse/YARN-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843863#comment-16843863
]
Prabhu Joseph edited comment on YARN-8224 at 5/20/19 11:01 AM:
---------------------------------------------------------------
[~snemeth] I missed the chance to debug this when the issue happened. As per my
understanding, {{RMAppImpl}} adds the {{NodeId}} to the
{{logAggregationStatus}} map in {{AppRunningOnNodeTransition}} when a
container is Acquired. But if the container is Killed after being Acquired,
the NM does not appear to have any information about this container.
{{RMAppImpl#getLogAggregationReportsForApp}} will then set TIME_OUT, since the
node never sends a {{LogAggregationReport}} for this container.
I think this issue can be triggered by killing an acquired container or by
stopping the NM after a container is acquired.
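To illustrate the sequence described above, here is a minimal, self-contained sketch. It is not the actual {{RMAppImpl}} code: the class name, method names, status values and timeout handling are all hypothetical and only model the behaviour as I understand it (a node is tracked as soon as a container is acquired on it, and a tracked node that never reports is shown as TIME_OUT once the timeout passes).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in the comment.
// None of these names are taken from the Hadoop code base.
class LogAggregationTimeoutSketch {
    enum Status { NOT_START, SUCCEEDED, TIME_OUT }

    // Node id -> last known log aggregation status for this application.
    private final Map<String, Status> logAggregationStatus = new LinkedHashMap<>();
    private final long timeoutMillis;
    private long appFinishedAtMillis = -1; // -1 means the app is still running

    LogAggregationTimeoutSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // The node is tracked as soon as a container is acquired on it,
    // even if that container is killed before the NM ever launches it.
    void onContainerAcquired(String nodeId) {
        logAggregationStatus.putIfAbsent(nodeId, Status.NOT_START);
    }

    // A node that actually ran the container reports its status.
    void onNodeReport(String nodeId, Status status) {
        logAggregationStatus.put(nodeId, status);
    }

    void onAppFinished(long nowMillis) {
        appFinishedAtMillis = nowMillis;
    }

    // Nodes that never reported are shown as TIME_OUT once the timeout has
    // elapsed after the application finished.
    Map<String, Status> getReports(long nowMillis) {
        boolean timedOut = appFinishedAtMillis >= 0
                && nowMillis - appFinishedAtMillis > timeoutMillis;
        Map<String, Status> reports = new LinkedHashMap<>();
        for (Map.Entry<String, Status> e : logAggregationStatus.entrySet()) {
            Status s = e.getValue();
            if (timedOut && s == Status.NOT_START) {
                s = Status.TIME_OUT;
            }
            reports.put(e.getKey(), s);
        }
        return reports;
    }
}
```

In this model, a node whose only container was killed right after being acquired never calls {{onNodeReport}}, so it ends up as TIME_OUT even though there were never any logs to aggregate on it.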
> LogAggregation status TIME_OUT for absent container misleading
> --------------------------------------------------------------
>
> Key: YARN-8224
> URL: https://issues.apache.org/jira/browse/YARN-8224
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
>
> When a container was never launched on the NM and is therefore absent, the
> RM still tries to get its Log Aggregation Status and reports the status as
> TIME_OUT in the RM UI.
> {code}
> 2018-04-26 12:47:38,403 WARN containermanager.ContainerManagerImpl
> (ContainerManagerImpl.java:handle(1070)) - Event EventType: KILL_CONTAINER
> sent to absent container container_e361_1524687599273_2110_01_000770
> 2018-04-26 12:49:31,743 WARN containermanager.ContainerManagerImpl
> (ContainerManagerImpl.java:handle(1086)) - Event EventType:
> FINISH_APPLICATION sent to absent application application_1524687599273_2110
> {code}