[jira] [Commented] (YARN-10343) Legacy RM UI should include labeled metrics for allocated, total, and reserved resources.

2020-07-10 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155811#comment-17155811
 ] 

Hadoop QA commented on YARN-10343:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 25m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
45s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 31s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 8 unchanged - 0 fixed = 9 total (was 8) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 10s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}183m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | 
hadoop.yarn.server.resourcemanager.placement.TestUserGroupMappingPlacementRule |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26268/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10343 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13007452/YARN-10343.000.patch |
| Optional Tests | dupname asflicense compi

[jira] [Updated] (YARN-10343) Legacy RM UI should include labeled metrics for allocated, total, and reserved resources.

2020-07-10 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10343:
--
Attachment: YARN-10343.000.patch

> Legacy RM UI should include labeled metrics for allocated, total, and 
> reserved resources.
> -
>
> Key: YARN-10343
> URL: https://issues.apache.org/jira/browse/YARN-10343
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.10.0, 3.2.1, 3.1.3
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: Screen Shot 2020-07-07 at 1.00.22 PM.png, Screen Shot 
> 2020-07-07 at 1.03.26 PM.png, YARN-10343.000.patch
>
>
> The current legacy RM UI only includes resource metrics for the default 
> partition. If a cluster has labeled nodes, their resources are not included 
> in the metrics for allocated, total, and reserved resources.






[jira] [Commented] (YARN-1108) Always use tokens for client-to-AM connections

2020-07-10 Thread Santosh Marella (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155653#comment-17155653
 ] 

Santosh Marella commented on YARN-1108:
---

Hey guys.. I noticed that client-to-AM tokens are issued only when security is 
enabled, unlike other tokens (e.g. AM-to-RM), and stumbled on this JIRA while 
trying to find out why. 

There doesn't seem to have been much activity on this, but I'm curious whether 
there was a reason why client-to-AM tokens are issued only when security is 
enabled.

> Always use tokens for client-to-AM connections
> --
>
> Key: YARN-1108
> URL: https://issues.apache.org/jira/browse/YARN-1108
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Darrell Lowe
>Priority: Major
>
> Most of the tokens in YARN are used regardless of whether security is 
> enabled.  The client-to-AM token is an exception, and it would be nice if 
> this were treated like the others.  This would consolidate code paths for 
> easier testing.
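
For context, the asymmetry described above boils down to a security-enabled 
guard on the client-to-AM token path. A minimal sketch of that pattern, not the 
actual RM code (the class and method below are illustrative assumptions that 
only model the guard):

{code:java}
import org.apache.hadoop.security.UserGroupInformation;

public class ClientToAMTokenGuardSketch {
  // Models today's behavior: AMRM and NM tokens are issued unconditionally,
  // while the client-to-AM token is gated on security being enabled.
  static boolean shouldIssueClientToAMToken() {
    return UserGroupInformation.isSecurityEnabled();
  }

  public static void main(String[] args) {
    // With security off this prints false; the JIRA proposes removing the
    // guard so the token is issued on the same code path either way.
    System.out.println("client-to-AM token issued: "
        + shouldIssueClientToAMToken());
  }
}
{code}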






[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-07-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155582#comment-17155582
 ] 

Eric Payne commented on YARN-10297:
---

I ran the test 50 times with and without the patch. It failed 3 times without 
the patch and succeeded every time with the patch. Not conclusive, but a 
positive sign.
In addition, I analyzed the fixes and they look good to me.

+1.

[~jhung] and [~maniraj...@gmail.com], if there are no objections, I will commit 
this on Monday.

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch, 
> YARN-10297.003.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}
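
The failure in the trace above is a duplicate metrics-source registration: 
DefaultMetricsSystem refuses to register two sources under the same name, 
which is what happens when PartitionQueueMetrics state leaks between test 
methods. A self-contained sketch of that mechanism (DummySource is a stand-in, 
not the real QueueMetrics class):

{code:java}
import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsException;
import org.apache.hadoop.metrics2.MetricsSource;
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

public class DuplicateSourceSketch {
  // Minimal no-op source standing in for PartitionQueueMetrics.
  static class DummySource implements MetricsSource {
    @Override
    public void getMetrics(MetricsCollector collector, boolean all) { }
  }

  public static void main(String[] args) {
    MetricsSystem ms = DefaultMetricsSystem.initialize("ResourceManager");
    ms.register("PartitionQueueMetrics,partition=", "first", new DummySource());
    try {
      // The second registration under the same name throws MetricsException:
      // "Metrics source PartitionQueueMetrics,partition= already exists!"
      ms.register("PartitionQueueMetrics,partition=", "second",
          new DummySource());
    } catch (MetricsException expected) {
      System.out.println(expected.getMessage());
    } finally {
      DefaultMetricsSystem.shutdown(); // what a test tearDown typically does
    }
  }
}
{code}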






[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-07-10 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155480#comment-17155480
 ] 

Jim Brennan commented on YARN-10297:


Thanks [~epayne]!  The unit test failure in TestFairSchedulerPreemption is 
covered by [YARN-10329]. I have added a comment to that Jira indicating that 
the TestFairScheduler test is fixed by this one.

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch, 
> YARN-10297.003.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}






[jira] [Commented] (YARN-10291) Yarn service commands don't work when https is enabled in RM

2020-07-10 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155344#comment-17155344
 ] 

Bilwa S T commented on YARN-10291:
--

Hi [~eyang]

It works with the Java cacerts option. Thank you. We can close this issue.

> Yarn service commands don't work when https is enabled in RM
> --
>
> Key: YARN-10291
> URL: https://issues.apache.org/jira/browse/YARN-10291
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10291.001.patch
>
>
> When we submit an application using the command "yarn app -launch 
> sleeper-service ../share/hadoop/yarn/yarn-service-examples/sleeper/sleeper.json", 
> it throws the exception below:
> {code:java}
> com.sun.jersey.api.client.ClientHandlerException: 
> javax.net.ssl.SSLHandshakeException: 
> sun.security.validator.ValidatorException: PKIX path building failed: 
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
> valid certification path to requested target
> {code}
> We should use WebServiceClient#createClient, as it takes care of setting the 
> SSL factory when https is enabled.
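
For reference, a minimal sketch of the proposed approach. It assumes the 
org.apache.hadoop.yarn.webapp.util.WebServiceClient utility in 
hadoop-yarn-common; treat the initialize/createClient/destroy lifecycle shown 
here as my reading of that utility rather than a verified recipe:

{code:java}
import com.sun.jersey.api.client.Client;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.webapp.util.WebServiceClient;

public class SecureRestClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Initialize the shared factory once; under an HTTPS policy this wires
    // an SSLFactory into the Jersey client, avoiding the PKIX path-building
    // failure shown above.
    WebServiceClient.initialize(conf);
    try {
      Client client = WebServiceClient.getWebServiceClient().createClient();
      // ... use the client against the RM REST API ...
    } finally {
      WebServiceClient.destroy();
    }
  }
}
{code}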






[jira] [Commented] (YARN-3001) RM dies because of divide by zero

2020-07-10 Thread xin qian (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155285#comment-17155285
 ] 

xin qian commented on YARN-3001:


Hi, Zhihai Xu, do you know how to reproduce this problem?

> RM dies because of divide by zero
> -
>
> Key: YARN-3001
> URL: https://issues.apache.org/jira/browse/YARN-3001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.5.1
>Reporter: hoelog
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-3001.barnch-2.7.patch
>
>
> RM dies because of a divide-by-zero exception.
> {code}
> 2014-12-31 21:27:05,022 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator.computeAvailableContainers(DefaultResourceCalculator.java:37)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1332)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1218)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:877)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:656)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:570)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:851)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:900)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:599)
> at java.lang.Thread.run(Thread.java:745)
> 2014-12-31 21:27:05,023 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
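
For context, the division that blows up lives in 
DefaultResourceCalculator#computeAvailableContainers: available memory divided 
by the memory of the request. A simplified sketch of that 2.5-era logic (the 
method body is paraphrased, so treat it as an approximation), suggesting that 
one way to hit the exception is a resource request with 0 memory:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

public class DivideByZeroSketch {
  // Simplified from DefaultResourceCalculator#computeAvailableContainers.
  static int computeAvailableContainers(Resource available, Resource required) {
    return available.getMemory() / required.getMemory();
  }

  public static void main(String[] args) {
    Resource available = Resource.newInstance(8192, 8);
    Resource required = Resource.newInstance(0, 1); // zero-memory request
    // Throws java.lang.ArithmeticException: / by zero, matching the trace above.
    System.out.println(computeAvailableContainers(available, required));
  }
}
{code}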






[jira] [Updated] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-10 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10341:
-
Description: 
If there are 10 workers running and some containers get killed, after a while 
we see that there are just 9 workers running. This is because the CONTAINER 
COMPLETED event is not processed on the AM side. 
 The issue is in the code below:
{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(status.getContainerId());
    if (instance == null) {
      LOG.warn("Container {} Completed. No component instance exists. "
          + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      return;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}
If a component instance doesn't exist for a container, the method returns 
instead of iterating over the remaining containers. This happens when 
restart_policy is "ON_FAILURE".

  was:
If there are 10 workers running and some containers get killed, after a while 
we see that there are just 9 workers running. This is because the CONTAINER 
COMPLETED event is not processed on the AM side. 
The issue is in the code below:

{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(status.getContainerId());
    if (instance == null) {
      LOG.warn("Container {} Completed. No component instance exists. "
          + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      return;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}

If a component instance doesn't exist for a container, the method returns 
instead of iterating over the remaining containers.



> Yarn Service Container Completed event doesn't get processed 
> -
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Critical
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10341.001.patch, YARN-10341.002.patch, 
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
> If there are 10 workers running and some containers get killed, after a 
> while we see that there are just 9 workers running. This is because the 
> CONTAINER COMPLETED event is not processed on the AM side. 
>  The issue is in the code below:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
>     ContainerId containerId = status.getContainerId();
>     ComponentInstance instance = liveInstances.get(status.getContainerId());
>     if (instance == null) {
>       LOG.warn("Container {} Completed. No component instance exists. "
>           + "exitStatus={}. diagnostics={} ",
>           containerId, status.getExitStatus(), status.getDiagnostics());
>       return;
>     }
>     ComponentEvent event =
>         new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
>             .setStatus(status).setInstance(instance)
>             .setContainerId(containerId);
>     dispatcher.getEventHandler().handle(event);
>   }
> }
> {code}
> If a component instance doesn't exist for a container, the method returns 
> instead of iterating over the remaining containers. This happens when 
> restart_policy is "ON_FAILURE".
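
For clarity, a sketch of the kind of fix the description implies: process the 
rest of the batch instead of returning early. This is my illustration of the 
described problem, not necessarily the committed patch, and it reuses the 
fields (LOG, liveInstances, dispatcher) from the method quoted above:

{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(containerId);
    if (instance == null) {
      LOG.warn("Container {} Completed. No component instance exists. "
          + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      continue; // was: return, which silently dropped the remaining statuses
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}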


