[
https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551118#comment-16551118
]
Rohith Sharma K S commented on YARN-8330:
-----------------------------------------
Thanks [~suma.shivaprasad] for working on this patch.
As we discussed offline, publishing container events from the constructor or in
the ALLOCATED/ACQUIRED state doesn't solve the issue completely; it only reduces
the probability of occurrence. The AM can release a container without ever
launching it, and that scenario also ends up listing additional containers.
ATSv2 publishes container info from the NM, which ensures that only containers
that were actually launched are recorded. I think we can take a similar approach
in the RM and publish only when the container state is RUNNING. OTOH, this will
not capture how many containers were actually created/allocated for the
application.
I think, to be more precise, we should follow the ATSv2 approach, i.e. publish
container information when the container state is RUNNING.
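A minimal sketch of the "publish only on RUNNING" idea, for illustration only.
The State, Publisher and Container types below are hypothetical stand-ins and
are not the actual RMContainerImpl code or the patch; the real state machine
has more states and transitions than shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types are real YARN classes. */
final class PublishOnRunningSketch {

    /** Simplified, hypothetical subset of container states. */
    enum State { NEW, RESERVED, ALLOCATED, ACQUIRED, RUNNING, RELEASED }

    /** Hypothetical stand-in for whatever publishes container info from the RM. */
    static final class Publisher {
        final List<String> published = new ArrayList<>();

        void containerCreated(String containerId) {
            published.add(containerId);
        }
    }

    /** Hypothetical container that publishes itself once, on reaching RUNNING. */
    static final class Container {
        private final String id;
        private final Publisher publisher;
        private State state = State.NEW;
        private boolean alreadyPublished;

        Container(String id, Publisher publisher) {
            // Deliberately no publish here: a container that is only reserved,
            // or released before launch, must never show up in the listing.
            this.id = id;
            this.publisher = publisher;
        }

        void transition(State next) {
            state = next;
            // Publish exactly once, and only when the container is running.
            if (next == State.RUNNING && !alreadyPublished) {
                publisher.containerCreated(id);
                alreadyPublished = true;
            }
        }

        State state() {
            return state;
        }
    }

    public static void main(String[] args) {
        Publisher publisher = new Publisher();

        // Mirrors container 04 in the report: reserved, never launched.
        Container reservedOnly = new Container("container_04", publisher);
        reservedOnly.transition(State.RESERVED);
        reservedOnly.transition(State.RELEASED);

        // A container that the AM actually launches.
        Container launched = new Container("container_02", publisher);
        launched.transition(State.ALLOCATED);
        launched.transition(State.ACQUIRED);
        launched.transition(State.RUNNING);

        // Only the launched container is listed.
        System.out.println(publisher.published);
    }
}
```

With this shape, a container that goes from NEW to RESERVED and is then released
is never published. The trade-off is the one noted above: allocated but never
launched containers are not recorded at all.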
cc:/ [~jlowe]
> An extra container got launched by RM for yarn-service
> ------------------------------------------------------
>
> Key: YARN-8330
> URL: https://issues.apache.org/jira/browse/YARN-8330
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn-native-services
> Reporter: Yesha Vora
> Assignee: Suma Shivaprasad
> Priority: Critical
> Attachments: YARN-8330.1.patch, YARN-8330.2.patch
>
>
> Steps:
> launch HBase tarball app
> list containers for hbase tarball app
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list
> appattempt_1525463491331_0006_000001
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of
> YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of
> YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of
> YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History
> server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over
> to rm2
> Total number of containers :5
> Container-Id Start Time Finish Time
> State Host Node Http Address
> LOG-URL
> container_e06_1525463491331_0006_01_000002 Fri May 04 22:34:26 +0000 2018
> N/A RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000002/hrt_qa
> 2018-05-04 22:36:11,216|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000003
> Fri May 04 22:34:26 +0000 2018 N/A
> RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000003/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000001
> Fri May 04 22:34:15 +0000 2018 N/A
> RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000001/hrt_qa
> 2018-05-04 22:36:11,217|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000005
> Fri May 04 22:34:56 +0000 2018 N/A
> RUNNING xxx:25454 http://xxx:8042
> http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000005/hrt_qa
> 2018-05-04 22:36:11,218|INFO|MainThread|machine.py:167 -
> run()||GUID=0169fa41-d1c5-4b43-85bf-c3e9f2682398|container_e06_1525463491331_0006_01_000004
> Fri May 04 22:34:56 +0000 2018 N/A
> null xxx:25454 http://xxx:8042
> http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_000004/container_e06_1525463491331_0006_01_000004/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, RM
> is listing 5 containers.
> container_e06_1525463491331_0006_01_000004 is in null state.
> The YARN service used containers 02, 03 and 05 for its components. There is no
> log in the NM or AM related to container 04. Only one line is printed in the
> RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO rmcontainer.RMContainerImpl
> (RMContainerImpl.java:handle(489)) -
> container_e06_1525463491331_0006_01_000004 Container Transitioned from NEW to
> RESERVED{code}