[ https://issues.apache.org/jira/browse/YARN-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299101#comment-14299101 ]

Zhijie Shen commented on YARN-2808:
-----------------------------------

I think the patch should work, though it doesn't guarantee that all the
containers will be returned for a running attempt. There is a race condition:
a container may have finished and its info may have been pushed to the timeline
server, but that info may not yet be persisted. Anyway, it will be a good
improvement in terms of user experience.

Some minor comments:

1. Is it possible to improve the performance? An application can be big enough
to have hundreds of containers, and the nested loop below scans the history
server's list again for every container from the RM. Maybe run through them
once and put the IDs in a HashSet for the check?
{code}
// tmp is a container report obtained from the RM
for (int i = 0; i < containersFromHistoryServer.size(); i++) {
  if (containersFromHistoryServer.get(i).getContainerId()
      .equals(tmp.getContainerId())) {
    // Remove the container reported by the AHS, as the container
    // from the RM has the latest information
    containersFromHistoryServer.remove(i);
    break;
  }
}
{code}
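A minimal sketch of the single-pass approach suggested in point 1, in plain Java so it runs standalone. `Report` is a hypothetical stand-in for YARN's `ContainerReport` (only an ID and a state), and `merge` is a hypothetical helper, not the patch's code. The RM entries are kept as-is, and a history-server entry is added only if its ID is not already in a `HashSet` built from the RM list, so each list is walked once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainerListMerge {

  // Hypothetical stand-in for ContainerReport: only the fields this sketch needs.
  static final class Report {
    final String containerId;
    final String state;

    Report(String containerId, String state) {
      this.containerId = containerId;
      this.state = state;
    }
  }

  // Hypothetical helper: combine RM and history-server results.
  // RM entries win because they carry the latest information.
  static List<Report> merge(List<Report> fromRm, List<Report> fromHistory) {
    // One pass over the RM list to collect its container IDs.
    Set<String> rmIds = new HashSet<>();
    for (Report r : fromRm) {
      rmIds.add(r.containerId);
    }
    List<Report> merged = new ArrayList<>(fromRm);
    // One pass over the history list; HashSet.contains is O(1) expected,
    // so there is no nested scan and no ArrayList.remove(i) shifting.
    for (Report h : fromHistory) {
      if (!rmIds.contains(h.containerId)) {
        merged.add(h);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Report> rm = new ArrayList<>();
    rm.add(new Report("container_1415168250217_0001_01_000001", "RUNNING"));
    List<Report> history = new ArrayList<>();
    history.add(new Report("container_1415168250217_0001_01_000001", "COMPLETE"));
    history.add(new Report("container_1415168250217_0001_01_000002", "COMPLETE"));
    for (Report r : merge(rm, history)) {
      System.out.println(r.containerId + " " + r.state);
    }
  }
}
```

Running `main` prints the RM copy of `container_1415168250217_0001_01_000001` (RUNNING) followed by the history-only `container_1415168250217_0001_01_000002` (COMPLETE).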

2. In the test, can we add a case where a running container is in the RM and is
also in the timeline server (because part of its information has already been
written there), and verify that the container info cached in the RM is used
instead of the partial info from the timeline server?
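A minimal sketch of the scenario in point 2, written as a plain Java `main` rather than a JUnit test so it runs standalone. `Report` and `merge` are hypothetical, and a `null` state stands in for "not yet written to the timeline server". This merge uses a `LinkedHashMap` keyed by container ID, inserting timeline entries first and letting RM entries overwrite them; it is one way to make the RM copy win, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RmWinsOverPartialTimelineInfo {

  // Hypothetical stand-in for ContainerReport.
  static final class Report {
    final String containerId;
    final String state; // null: not yet written to the timeline server (sketch only)

    Report(String containerId, String state) {
      this.containerId = containerId;
      this.state = state;
    }
  }

  // Hypothetical merge: timeline entries first, then RM entries with the
  // same container ID replace them (the key keeps its original position).
  static List<Report> merge(List<Report> fromRm, List<Report> fromTimeline) {
    Map<String, Report> byId = new LinkedHashMap<>();
    for (Report t : fromTimeline) {
      byId.put(t.containerId, t);
    }
    for (Report r : fromRm) {
      byId.put(r.containerId, r);
    }
    return new ArrayList<>(byId.values());
  }

  public static void main(String[] args) {
    String id = "container_1415168250217_0001_01_000002";
    // A running container: the RM has it, and the timeline server has a partial record.
    List<Report> rm = new ArrayList<>();
    rm.add(new Report(id, "RUNNING"));
    List<Report> timeline = new ArrayList<>();
    timeline.add(new Report(id, null));

    List<Report> merged = merge(rm, timeline);
    if (merged.size() != 1 || !"RUNNING".equals(merged.get(0).state)) {
      throw new AssertionError("expected the RM copy of " + id);
    }
    System.out.println("RM copy used: " + merged.get(0).state);
  }
}
```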

> yarn client tool can not list app_attempt's container info correctly
> --------------------------------------------------------------------
>
>                 Key: YARN-2808
>                 URL: https://issues.apache.org/jira/browse/YARN-2808
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.6.0
>            Reporter: Gordon Wang
>            Assignee: Naganarasimha G R
>         Attachments: YARN-2808.20150126-1.patch, YARN-2808.20150130-1.patch
>
>
> When the timeline server is enabled, the yarn client cannot list the container
> info for an application attempt correctly.
> Here are the steps to reproduce:
> # enable the YARN timeline server
> # submit an MR job
> # after the job has finished, use the yarn client to list the container info of the app attempt.
> Then, since the RM has cached the application attempt's info, the output shows:
> {noformat}
> [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_000001
> 14/11/05 01:19:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/11/05 01:19:15 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/11/05 01:19:16 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/11/05 01:19:16 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> Total number of containers :0
>                   Container-Id                  Start Time             Finish Time                   State                    Host                                LOG-URL
> {noformat}
> But if the RM is restarted, the client can fetch the container info from the
> timeline server correctly:
> {noformat}
> [hadoop@localhost hadoop-3.0.0-SNAPSHOT]$ ./bin/yarn container -list appattempt_1415168250217_0001_000001
> 14/11/05 01:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/11/05 01:21:06 INFO impl.TimelineClientImpl: Timeline service address: http://0.0.0.0:8188/ws/v1/timeline/
> 14/11/05 01:21:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/11/05 01:21:06 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> Total number of containers :4
>                   Container-Id                  Start Time             Finish Time                   State                    Host                                LOG-URL
> container_1415168250217_0001_01_000001               1415168318376           1415168349896                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000001/container_1415168250217_0001_01_000001/hadoop
> container_1415168250217_0001_01_000002               1415168326399           1415168334858                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000002/container_1415168250217_0001_01_000002/hadoop
> container_1415168250217_0001_01_000003               1415168326400           1415168335277                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000003/container_1415168250217_0001_01_000003/hadoop
> container_1415168250217_0001_01_000004               1415168335825           1415168343873                COMPLETE    localhost.localdomain:47024     http://0.0.0.0:8188/applicationhistory/logs/localhost.localdomain:47024/container_1415168250217_0001_01_000004/container_1415168250217_0001_01_000004/hadoop
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
