[ 
https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199559#comment-14199559
 ] 

Gordon Wang commented on YARN-2808:
-----------------------------------

Setting yarn.resourcemanager.max-completed-applications to 0 is not a solution, 
in my view.
The RM should be able to cache generic info about completed apps, so that it is 
easy for users to check the job status. Moreover, users can find the job info in 
the RM's web UI and then be redirected to the history-server/timeline-server 
for more detailed app info.

I think YARN-1794 is similar, but not the same.
{quote}
And was intending to fix by "modifing YarnClientImpl.getContainers(). If the 
size of containers is 0. We can get it from timeline."
{quote}
I am afraid this fix only covers the YARN-2808 case. More generally, since the 
RM does not cache completed container info, an app that is still running can 
have both running containers and completed containers, so both counts can be 
larger than 0. In that case the RM returns a non-empty list, the size-0 check 
does not trigger, and the completed containers are still missing. So, one 
possible fix is to pull the container info from both the RM and the 
timeline-server, regardless of the state of the app attempt.
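
A minimal sketch of that merge, only to illustrate the idea. The names 
ContainerListMerger and ContainerInfo are hypothetical stand-ins and are not 
YARN classes, and the sketch assumes that when both sources report the same 
container ID, the RM's record is the fresher one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the merge; not part of YarnClientImpl. */
final class ContainerListMerger {

    /** Hypothetical stand-in for a container report (ID and state only). */
    static final class ContainerInfo {
        final String id;
        final String state;

        ContainerInfo(String id, String state) {
            this.id = id;
            this.state = state;
        }

        @Override
        public String toString() {
            return id + " " + state;
        }
    }

    private ContainerListMerger() {}

    /**
     * Combines the two lists, keyed by container ID. Timeline entries go in
     * first; an RM entry with the same ID then replaces the timeline entry
     * (assumption: the RM is more up to date for containers it still tracks).
     */
    static List<ContainerInfo> merge(List<ContainerInfo> fromRm,
                                     List<ContainerInfo> fromTimeline) {
        Map<String, ContainerInfo> byId = new LinkedHashMap<>();
        for (ContainerInfo c : fromTimeline) {
            byId.put(c.id, c);
        }
        for (ContainerInfo c : fromRm) {
            byId.put(c.id, c);
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        // Made-up scenario: a running attempt with one finished container.
        List<ContainerInfo> fromRm = Arrays.asList(
                new ContainerInfo("container_A", "RUNNING"),
                new ContainerInfo("container_C", "RUNNING"));
        List<ContainerInfo> fromTimeline = Arrays.asList(
                new ContainerInfo("container_A", "RUNNING"),
                new ContainerInfo("container_B", "COMPLETE"));

        for (ContainerInfo c : merge(fromRm, fromTimeline)) {
            System.out.println(c);
        }
    }
}
```

With this approach the client does not need to look at the attempt state or at 
the size of the RM's list; de-duplicating by container ID is what keeps a 
container reported by both sources from being listed twice.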

> yarn client tool can not list app_attempt's container info correctly
> --------------------------------------------------------------------
>
>                 Key: YARN-2808
>                 URL: https://issues.apache.org/jira/browse/YARN-2808
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>            Reporter: Gordon Wang
>
> When the timeline server is enabled, the yarn client cannot list the 
> container info for an application attempt correctly.
> Here are the steps to reproduce:
> # Enable the YARN timeline server.
> # Submit an MR job.
> # After the job has finished, use the yarn client to list the container info 
> of the app attempt.
> Then, since the RM has cached the application's attempt info, the output shows 
> no containers:
> {noformat}
> [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list 
> appattempt_1415168250217_0001_000001
> 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: 
> http://0.0.0.0:8188/ws/v1/timeline/
> 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History 
> server at /0.0.0.0:10200
> Total number of containers :0
>                   Container-Id                  Start Time             Finish Time                   State                    Host                                LOG-URL
> {noformat}
> But if the RM is restarted, the client can fetch the container info from the 
> timeline server correctly.
> {noformat}
> [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list 
> appattempt_1415168250217_0001_000001
> 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: 
> http://0.0.0.0:8188/ws/v1/timeline/
> 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at 
> /0.0.0.0:8032
> 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History 
> server at /0.0.0.0:10200
> Total number of containers :4
>                   Container-Id                  Start Time             Finish Time                   State                    Host                                LOG-URL
> container_1415168250217_0001_01_000001               1415168318376           1415168349896                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000001/container_1415168250217_0001_01_000001/hadoop
> container_1415168250217_0001_01_000002               1415168326399           1415168334858                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000002/container_1415168250217_0001_01_000002/hadoop
> container_1415168250217_0001_01_000003               1415168326400           1415168335277                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000003/container_1415168250217_0001_01_000003/hadoop
> container_1415168250217_0001_01_000004               1415168335825           1415168343873                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000004/container_1415168250217_0001_01_000004/hadoop
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
