[
https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199559#comment-14199559
]
Gordon Wang commented on YARN-2808:
-----------------------------------
Setting yarn.resourcemanager.max-completed-applications to 0 is not a solution,
in my view.
The RM can cache generic info about completed apps so that users can easily
check job status. Users can also find the job in the RM's web UI and then be
redirected to the history server/timeline server for more detailed app info.
I think YARN-1794 is similar, but not the same.
{quote}
And was intending to fix by "modifing YarnClientImpl.getContainers(). If the
size of containers is 0. We can get it from timeline."
{quote}
I am afraid this fix only covers the YARN-2808 case. More generally, since the
RM does not cache completed container info, an app that is still running can
have both running containers and completed containers, so both counts can be
larger than 0. One possible fix is therefore to pull container info from both
the RM and the timeline server, regardless of the state of the app attempt.
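To make the idea concrete, here is a minimal sketch of the merge step only. It
is not the actual YarnClientImpl code: ContainerMerge, ContainerInfo and merge()
are hypothetical names, and letting the RM's report win when both sources know
the same container is my assumption, not something decided in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContainerMerge {

    /** Hypothetical stand-in for a container report; not YARN's ContainerReport. */
    public static final class ContainerInfo {
        public final String id;
        public final String state;

        public ContainerInfo(String id, String state) {
            this.id = id;
            this.state = state;
        }
    }

    /**
     * Combines the containers known to the RM with those recorded by the
     * timeline server, keyed by container id so each container appears once.
     * Assumption: when both sources report the same id, the RM entry is kept.
     */
    public static List<ContainerInfo> merge(List<ContainerInfo> fromRm,
                                            List<ContainerInfo> fromTimeline) {
        Map<String, ContainerInfo> byId = new LinkedHashMap<>();
        for (ContainerInfo c : fromTimeline) {
            byId.put(c.id, c);
        }
        // Overwrite timeline entries with the RM's view of the same container.
        for (ContainerInfo c : fromRm) {
            byId.put(c.id, c);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        // Made-up sample data: one container still running, two already completed.
        List<ContainerInfo> fromRm = Arrays.asList(
                new ContainerInfo("container_0001_01_000001", "RUNNING"));
        List<ContainerInfo> fromTimeline = Arrays.asList(
                new ContainerInfo("container_0001_01_000002", "COMPLETE"),
                new ContainerInfo("container_0001_01_000003", "COMPLETE"));

        List<ContainerInfo> merged = merge(fromRm, fromTimeline);
        System.out.println("Total number of containers :" + merged.size());
        for (ContainerInfo c : merged) {
            System.out.println(c.id + " " + c.state);
        }
    }
}
```

In the real client the two input lists would presumably come from the RM and
the history/timeline service for the same app attempt; the sketch only shows
that querying both and de-duplicating by container id avoids depending on the
attempt's state.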
> yarn client tool can not list app_attempt's container info correctly
> --------------------------------------------------------------------
>
> Key: YARN-2808
> URL: https://issues.apache.org/jira/browse/YARN-2808
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Gordon Wang
>
> When the timeline server is enabled, the yarn client cannot list the
> container info for an application attempt correctly.
> Here are the steps to reproduce.
> # Enable the YARN timeline server.
> # Submit an MR job.
> # After the job has finished, use the yarn client to list the container info
> of the app attempt.
> Because the RM has cached the application's attempt info, the output shows
> no containers:
> {noformat}
> [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_000001
> 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> Total number of containers :0
> Container-Id Start Time Finish Time State Host LOG-URL
> {noformat}
> But if the RM is restarted, the client fetches the container info from the
> timeline server correctly.
> {noformat}
> [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_000001
> 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> Total number of containers :4
> Container-Id Start Time Finish Time State Host LOG-URL
> container_1415168250217_0001_01_000001 1415168318376 1415168349896 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000001/container_1415168250217_0001_01_000001/hadoop
> container_1415168250217_0001_01_000002 1415168326399 1415168334858 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000002/container_1415168250217_0001_01_000002/hadoop
> container_1415168250217_0001_01_000003 1415168326400 1415168335277 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000003/container_1415168250217_0001_01_000003/hadoop
> container_1415168250217_0001_01_000004 1415168335825 1415168343873 COMPLETE localhost.localdomain:47024 http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000004/container_1415168250217_0001_01_000004/hadoop
> {noformat}