[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551338#comment-16551338
 ] 

Jason Lowe commented on YARN-8330:
----------------------------------

To me this all depends on how the ATS data is intended to be consumed.  One 
use case for the container data would be to track every single allocation an 
RM ever made for an application, whether that allocation was used or not.  
Calculating an application footprint (à la YARN-415) requires this kind of 
information; otherwise applications that abuse the allocation protocol and ask 
for lots of containers that are never launched would appear much smaller than 
they really are.  Getting this kind of information requires that we record 
containers at the ALLOCATED state at a minimum.
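
As a rough illustration of why the ALLOCATED record matters for a footprint 
calculation, here is a minimal sketch that sums memory-MB-seconds per 
application.  The record layout, field names, and numbers are hypothetical 
and are not the ATS schema or the YARN-415 implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ContainerRecord:
    """Hypothetical per-container record; not the actual ATS schema."""
    container_id: str
    memory_mb: int
    allocated_at: float           # seconds; when the RM allocated the container
    launched_at: Optional[float]  # None if the container was never launched
    released_at: float            # seconds; when the RM got the resources back


def memory_mb_seconds(records: Iterable[ContainerRecord],
                      include_unlaunched: bool) -> float:
    """Sum memory-MB-seconds held by an application, counted from allocation.

    With include_unlaunched=False, containers that never launched are
    skipped, which models a store that only records containers once they run.
    """
    total = 0.0
    for r in records:
        if r.launched_at is None and not include_unlaunched:
            continue
        total += r.memory_mb * (r.released_at - r.allocated_at)
    return total


# Made-up data: one container that ran, one that was held but never launched.
records = [
    ContainerRecord("c1", 1024, allocated_at=0.0, launched_at=2.0, released_at=100.0),
    ContainerRecord("c2", 1024, allocated_at=0.0, launched_at=None, released_at=60.0),
]

full = memory_mb_seconds(records, include_unlaunched=True)
launched_only = memory_mb_seconds(records, include_unlaunched=False)
```

In this made-up data the never-launched container accounts for more than a 
third of the total, and it disappears entirely if only launched containers 
are recorded.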

That leaves the inherent race where a container can be allocated just as the 
AM tries to remove the allocation request, and the container gets discarded.  
Even after the change to record at ALLOCATED, the {{yarn container -list}} 
command still needs the ability to distinguish containers that were never 
launched from containers that were, so that it lists only containers that 
actually ran the app's code.
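
A minimal sketch of that filtering, assuming the recorded data carries some 
indication of whether a container was ever launched.  The `launched` flag 
and the `ListedContainer` type are hypothetical; only the container IDs and 
states are taken from the report quoted below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ListedContainer:
    """Hypothetical view of a recorded container; field names are assumptions."""
    container_id: str
    state: Optional[str]  # None models the "null" state shown in the report
    launched: bool        # assumed flag: did the container ever start?


def containers_that_ran(containers: List[ListedContainer]) -> List[ListedContainer]:
    """Keep only containers that were actually launched."""
    return [c for c in containers if c.launched]


prefix = "container_e06_1525463491331_0006_01_"
recorded = [
    ListedContainer(prefix + "000001", "RUNNING", launched=True),
    ListedContainer(prefix + "000002", "RUNNING", launched=True),
    ListedContainer(prefix + "000003", "RUNNING", launched=True),
    ListedContainer(prefix + "000004", None, launched=False),
    ListedContainer(prefix + "000005", "RUNNING", launched=True),
]

ran = containers_that_ran(recorded)
ran_ids = [c.container_id[-6:] for c in ran]
```

All five records stay in the store for footprint purposes, while the listing 
shows only the four that ran.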


> An extra container got launched by RM for yarn-service
> ------------------------------------------------------
>
>                 Key: YARN-8330
>                 URL: https://issues.apache.org/jira/browse/YARN-8330
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Yesha Vora
>            Assignee: Suma Shivaprasad
>            Priority: Critical
>         Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
>
> Steps:
> launch HBase tarball app
> list containers for HBase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list appattempt_1525463491331_0006_000001
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of containers :5
> Container-Id    Start Time    Finish Time    State    Host    Node Http Address    LOG-URL
> container_e06_1525463491331_0006_01_000002    Fri May 04 22:34:26 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000002/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000003    Fri May 04 22:34:26 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000003/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000001    Fri May 04 22:34:15 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000001/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000005    Fri May 04 22:34:56 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000005/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000004    Fri May 04 22:34:56 +0000 2018    N/A    null    xxx:25454    http://xxx:8042    http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_000004/container_e06_1525463491331_0006_01_000004/hrt_qa
> {code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, the 
> RM lists 5 containers. 
> container_e06_1525463491331_0006_01_000004 is in the null state.
> The YARN service used containers 02, 03, and 05 for its components. There are 
> no logs in the NM or AM related to container 04. Only one line is printed in 
> the RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(489)) - container_e06_1525463491331_0006_01_000004 Container Transitioned from NEW to RESERVED
> {code}
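
The single RM log line quoted above shows container 04 going from NEW to 
RESERVED.  A minimal sketch for pulling such transitions out of RM log lines 
and flagging containers with no transition into ALLOCATED follows.  The 
regex is inferred only from the one line in this report, and the helper 
names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Pattern inferred from the single RMContainerImpl line in the report.
TRANSITION = re.compile(
    r"(?P<cid>container_\S+) Container Transitioned from "
    r"(?P<old>\w+) to (?P<new>\w+)"
)


def parse_transitions(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (container_id, old_state, new_state) for each matching line."""
    found = []
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            found.append((m.group("cid"), m.group("old"), m.group("new")))
    return found


def never_allocated(transitions: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Containers with no transition into ALLOCATED among the given lines."""
    reached: Dict[str, bool] = {}
    for cid, _old, new in transitions:
        reached[cid] = reached.get(cid, False) or new == "ALLOCATED"
    return [cid for cid, allocated in reached.items() if not allocated]


log_lines = [
    "2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl "
    "(RMContainerImpl.java:handle(489)) - "
    "container_e06_1525463491331_0006_01_000004 Container Transitioned "
    "from NEW to RESERVED",
]

transitions = parse_transitions(log_lines)
suspects = never_allocated(transitions)
```

The result only reflects the lines passed in, so a container whose later 
transitions were logged elsewhere would still be flagged.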


