[
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553180#comment-16553180
]
Eric Yang commented on YARN-8330:
---------------------------------
If the information is collected for end users to understand their
application usage, then collecting the RUNNING state might be sufficient. End
users should not be penalized for a YARN framework deficiency. If the
information is collected for system administrators to understand cluster
health and isolate which node is potentially causing containers to fail, then
reporting ALLOCATED/ACQUIRED is preferable. Timeline server is optimized for
end-user application reporting, so the extra data seems unnecessary at this
time. However, the more information is collected, the easier it is to avoid
writing similar code twice. The report filtering can be done at Timeline
server to fulfill both use cases. It could be a problem to handicap the data
collection toward one use case only.
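
As a rough illustration of filtering at read time rather than at collection
time, here is a minimal sketch in Python. It is not Timeline server code: the
ContainerEvent record, the filter_report function, the audience names, and the
mapping from audience to states are all hypothetical, chosen only to show how
one full data set could serve both the end-user and the administrator view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerEvent:
    """Hypothetical record; not the Timeline server's actual data model."""
    container_id: str
    state: str  # e.g. "ALLOCATED", "ACQUIRED", "RUNNING"


# Assumed mapping of audience to the states it wants to see.
STATES_BY_AUDIENCE = {
    "end_user": {"RUNNING"},
    "admin": {"ALLOCATED", "ACQUIRED", "RUNNING"},
}


def filter_report(events, audience):
    """Return the events whose state is relevant to the given audience."""
    try:
        wanted = STATES_BY_AUDIENCE[audience]
    except KeyError:
        raise ValueError(f"unknown audience: {audience!r}") from None
    return [e for e in events if e.state in wanted]


# Everything is collected once; each view is derived by filtering.
collected = [
    ContainerEvent("c1", "RUNNING"),
    ContainerEvent("c2", "ALLOCATED"),
    ContainerEvent("c3", "ACQUIRED"),
    ContainerEvent("c4", "RUNNING"),
]

end_user_view = filter_report(collected, "end_user")
admin_view = filter_report(collected, "admin")
```

With this shape, collecting ALLOCATED/ACQUIRED does not change what an end
user sees, while the administrator view keeps the extra states.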
> An extra container got launched by RM for yarn-service
> ------------------------------------------------------
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Reporter: Yesha Vora
> Assignee: Suma Shivaprasad
> Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch
>
>
> Steps:
> Launch the HBase tarball app.
> List containers for the HBase tarball app.
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list
> appattempt_1525463491331_0006_000001
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over
> to rm2
> Total number of containers :5
> Container-Id Start Time Finish Time
> State Host Node Http Address
> LOG-URL
> container_e06_1525463491331_0006_01_000002 Fri May 04 22:34:26 +0000 2018
> N/A RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000002/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000003
> Fri May 04 22:34:26 +0000 2018 N/A
> RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000003/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000001
> Fri May 04 22:34:15 +0000 2018 N/A
> RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000001/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000005
> Fri May 04 22:34:56 +0000 2018 N/A
> RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000005/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000004
> Fri May 04 22:34:56 +0000 2018 N/A
> null xxx:25454 http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_000004/container_e06_1525463491331_0006_01_000004/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM
> is listing 5 containers.
> container_e06_1525463491331_0006_01_000004 is in null state.
> YARN service used containers 02, 03, and 05 for the components. There is no
> log available in NM or AM related to container 04. Only one line is printed
> in the RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl
> (RMContainerImpl.java:handle(489)) -
> container_e06_1525463491331_0006_01_000004 Container Transitioned from NEW to
> RESERVED{code}