[ 
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553180#comment-16553180
 ] 

Eric Yang commented on YARN-8330:
---------------------------------

If the information is collected for end users to understand their application 
usage, then collecting only the RUNNING state might be sufficient.  End users 
should not be penalized for deficiencies in the YARN framework.  If the 
information is collected for system administrators to understand cluster 
health and isolate which node is potentially causing containers to fail, then 
reporting ALLOCATED/ACQUIRED is preferable.  Timeline server is optimized for 
end-user application reporting, so the extra data seems unnecessary at this 
time.  However, the more information is collected, the easier it is to avoid 
writing similar code twice.  Report filtering can be done at the Timeline 
server to fulfill both use cases.  It could be a problem to handicap data 
collection toward one use case only.
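
To illustrate the filtering idea, here is a minimal sketch, not Timeline 
server code.  It assumes every container state transition is collected once 
and that the report is filtered per audience at read time.  The names 
ContainerEvent, VISIBLE_STATES and filter_events, the audience labels, and the 
container and node identifiers are all hypothetical; only the state names 
come from YARN.

```python
from dataclasses import dataclass

# Hypothetical audience labels; not part of any YARN or Timeline server API.
END_USER = "end_user"
ADMIN = "admin"

# States shown to each audience, following the comment above: end users only
# need RUNNING, while administrators also want ALLOCATED and ACQUIRED.
VISIBLE_STATES = {
    END_USER: {"RUNNING"},
    ADMIN: {"ALLOCATED", "ACQUIRED", "RUNNING"},
}


@dataclass(frozen=True)
class ContainerEvent:
    """One collected container state transition (illustrative shape only)."""
    container_id: str
    state: str
    node: str


def filter_events(events, audience):
    """Return only the events whose state is visible to the given audience."""
    allowed = VISIBLE_STATES[audience]
    return [e for e in events if e.state in allowed]


# Made-up sample data: everything is collected once, then filtered per reader.
events = [
    ContainerEvent("container_A", "ALLOCATED", "node-1"),
    ContainerEvent("container_A", "ACQUIRED", "node-1"),
    ContainerEvent("container_A", "RUNNING", "node-1"),
    ContainerEvent("container_B", "ALLOCATED", "node-2"),
]

user_view = filter_events(events, END_USER)
admin_view = filter_events(events, ADMIN)
```

With this shape, the end-user view contains only the RUNNING record for 
container_A, while the administrator view also shows that container_B was 
allocated on node-2 but never reached RUNNING.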

> An extra container got launched by RM for yarn-service
> ------------------------------------------------------
>
>                 Key: YARN-8330
>                 URL: https://issues.apache.org/jira/browse/YARN-8330
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Yesha Vora
>            Assignee: Suma Shivaprasad
>            Priority: Critical
>         Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch
>
>
> Steps:
> launch Hbase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list 
> appattempt_1525463491331_0006_000001
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of 
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of 
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of 
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History 
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Total number of containers :5
> Container-Id            Start Time             Finish Time                   
> State                    Host       Node Http Address                         
>        LOG-URL
> container_e06_1525463491331_0006_01_000002    Fri May 04 22:34:26 +0000 2018  
>                  N/A                 RUNNING    xxx:25454  http://xxx:8042    
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000002/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000003
>     Fri May 04 22:34:26 +0000 2018                   N/A                 
> RUNNING    xxx:25454  http://xxx:8042    
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000003/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000001
>     Fri May 04 22:34:15 +0000 2018                   N/A                 
> RUNNING    xxx:25454  http://xxx:8042    
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000001/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000005
>     Fri May 04 22:34:56 +0000 2018                   N/A                 
> RUNNING    xxx:25454  http://xxx:8042    
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000005/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 - 
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000004
>     Fri May 04 22:34:56 +0000 2018                   N/A                    
> null    xxx:25454  http://xxx:8042    
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_000004/container_e06_1525463491331_0006_01_000004/hrt_qa{code}
> Total expected containers = 4 ( 3 components container + 1 am). Instead, RM 
> is listing 5 containers. 
> container_e06_1525463491331_0006_01_000004 is in null state.
> Yarn service utilized container 02, 03, 05 for component. There is no log 
> available in NM & AM related to container 04. Only one line in RM log is 
> printed
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl 
> (RMContainerImpl.java:handle(489)) - 
> container_e06_1525463491331_0006_01_000004 Container Transitioned from NEW to 
> RESERVED{code}



