[jira] [Comment Edited] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938302#comment-16938302
 ] 

Bibin Chundatt edited comment on YARN-9730 at 9/26/19 5:58 AM:
---

[~jhung]

Thank you for working on this, and sorry for coming in really late.

{quote}
240   if (ResourceRequest.ANY.equals(req.getResourceName())) {
241 SchedulerUtils.enforcePartitionExclusivity(req,
242 getRmContext().getExclusiveEnforcedPartitions(),
243 asc.getNodeLabelExpression());
244   }
{quote}

Querying the Configuration on the AM allocation flow is going to be costly, which I 
observed while evaluating performance.
Could you optimize {{getRmContext().getExclusiveEnforcedPartitions()}}, since 
this is going to be invoked for every *request*?
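
Something along these lines is what I have in mind; a minimal sketch only, assuming the 
parsed set is cached once at RM start-up instead of being re-read from Configuration on 
the hot path (the class and key handling below are illustrative, the real change would 
live in {{RMContextImpl}}):

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;

/** Illustrative only: parse the partition list once and serve a cached, immutable set. */
public final class ExclusiveEnforcedPartitionsCache {

  private final Set<String> partitions;

  public ExclusiveEnforcedPartitionsCache(Configuration conf, String key) {
    // One-time Configuration read at RM start-up instead of per-request.
    String[] configured = conf.getTrimmedStrings(key);
    this.partitions = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(configured)));
  }

  public Set<String> get() {
    // Hot path: a plain field read, cheap enough to call for every resource request.
    return partitions;
  }
}
{code}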






was (Author: bibinchundatt):
[~jhung]

Thank you for working on this, and sorry for coming in really late.

{quote}
240   if (ResourceRequest.ANY.equals(req.getResourceName())) {
241 SchedulerUtils.enforcePartitionExclusivity(req,
242 getRmContext().getExclusiveEnforcedPartitions(),
243 asc.getNodeLabelExpression());
244   }
{quote}

Querying the Configuration on the AM allocation flow is going to be costly, which I 
observed while evaluating performance.
Could you optimize {getRmContext().getExclusiveEnforcedPartitions()}, since 
this is going to be invoked for every *request*?





> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, 
> YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.
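
For readers of this thread, a minimal sketch of override rules 1a/1b as described above, 
assuming a simple label-override helper; names and structure are illustrative, not the 
actual {{SchedulerUtils}} code:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch of override rules 1a/1b above; not the actual patch. */
public final class ForcedExclusiveSketch {

  static String overrideLabel(String requestLabel, String appSubmissionLabel,
      Set<String> forcedExclusive) {
    if (forcedExclusive.contains(appSubmissionLabel)) {
      // 1a: app submitted to forced-exclusive partition P -> every request goes to P.
      return appSubmissionLabel;
    }
    if (forcedExclusive.contains(requestLabel)) {
      // 1b: request asks for forced-exclusive P but the app was submitted to Q -> override to Q.
      return appSubmissionLabel;
    }
    return requestLabel;
  }

  public static void main(String[] args) {
    Set<String> forced = new HashSet<>(Arrays.asList("P"));
    System.out.println(overrideLabel("P", "Q", forced)); // 1b: prints Q
    System.out.println(overrideLabel("", "P", forced));  // 1a: prints P
  }
}
{code}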



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938302#comment-16938302
 ] 

Bibin Chundatt commented on YARN-9730:
--

[~jhung]

Thank you for working on this, and sorry for coming in really late.

{quote}
240   if (ResourceRequest.ANY.equals(req.getResourceName())) {
241 SchedulerUtils.enforcePartitionExclusivity(req,
242 getRmContext().getExclusiveEnforcedPartitions(),
243 asc.getNodeLabelExpression());
244   }
{quote}

Querying the Configuration on the AM allocation flow is going to be costly, which I 
observed while evaluating performance.
Could you optimize {{getRmContext().getExclusiveEnforcedPartitions()}}, since 
this is going to be invoked for every *request*?





> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, 
> YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938291#comment-16938291
 ] 

Peter Bacsko commented on YARN-9841:


Just some really minor comments:

{noformat}
70  if (expectedParentQueue != null) {
71Assert.assertEquals(expectedParentQueue,
72ctx != null ? ctx.getParentQueue() : inputQueue);
73  }
{noformat}

Could you add an assertion message here?


{noformat}
187 Assert.assertEquals("a", ctx != null ? ctx.getQueue() : "default");
188 Assert.assertEquals("agroup",
189 ctx != null ? ctx.getParentQueue() : "default");
190   }
{noformat}

Three things here:
1. The assertion message "a" is very short; I think it should be something like 
"Expected queue".
2. Similarly, "Expected group" instead of "agroup".
3. Can {{ctx}} ever be null? I assume this test should produce the same 
behavior each time, provided the code under test doesn't change. So to me it 
seems more logical to make sure that {{ctx}} is not null, which practically 
means a new assertion. Btw, this applies to the piece of code above, too.
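
Something along these lines is what I mean; a hedged sketch only (JUnit 4), where the 
{{Ctx}} type is a stand-in for the real placement context used in the test:

{code:java}
import org.junit.Assert;
import org.junit.Test;

public class QueueMappingAssertionSketch {

  // Hypothetical stand-in for the real placement context type used in the test.
  static final class Ctx {
    String getQueue() { return "a"; }
    String getParentQueue() { return "agroup"; }
  }

  @Test
  public void assertWithMessagesAndNullCheck() {
    Ctx ctx = new Ctx();
    // Fail fast with a clear message instead of silently falling back to "default".
    Assert.assertNotNull("Mapping rule did not produce a placement context", ctx);
    Assert.assertEquals("Queue returned by the mapping rule is wrong",
        "a", ctx.getQueue());
    Assert.assertEquals("Parent queue returned by the mapping rule is wrong",
        "agroup", ctx.getParentQueue());
  }
}
{code}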

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9828) Add log line for app submission in RouterWebServices.

2019-09-25 Thread Sampada Dehankar (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sampada Dehankar reassigned YARN-9828:
--

Assignee: Sampada Dehankar  (was: Abhishek Modi)

> Add log line for app submission in RouterWebServices.
> -
>
> Key: YARN-9828
> URL: https://issues.apache.org/jira/browse/YARN-9828
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Sampada Dehankar
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9843) Test TestAMSimulator.testAMSimulator fails intermittently.

2019-09-25 Thread Sampada Dehankar (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sampada Dehankar reassigned YARN-9843:
--

Assignee: Sampada Dehankar  (was: Abhishek Modi)

> Test TestAMSimulator.testAMSimulator fails intermittently.
> --
>
> Key: YARN-9843
> URL: https://issues.apache.org/jira/browse/YARN-9843
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Abhishek Modi
>Assignee: Sampada Dehankar
>Priority: Major
>
> Stack trace for failure:
> java.lang.AssertionError: java.io.IOException: Unable to delete directory 
> /testptch/hadoop/hadoop-tools/hadoop-sls/target/test-dir/output4038286622450859971/metrics.
>  at org.junit.Assert.fail(Assert.java:88)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.deleteMetricOutputDir(TestAMSimulator.java:141)
>  at 
> org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.tearDown(TestAMSimulator.java:298)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>  at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>  at org.junit.runners.Suite.runChild(Suite.java:128)
>  at org.junit.runners.Suite.runChild(Suite.java:27)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>  at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>  at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>  at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>  at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938281#comment-16938281
 ] 

Peter Bacsko commented on YARN-9841:


[~maniraj...@gmail.com] just a thought. If we have this for {{%primary_group}}, 
can't we just handle {{%secondary_group}} as well? Then we don't have to create 
yet another JIRA for that.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938280#comment-16938280
 ] 

Peter Bacsko commented on YARN-6715:


[~snemeth] patches are ready to be committed to branch-3.1 and branch-3.2.

> Fix documentation about NodeHealthScriptRunner 
> ---
>
> Key: YARN-6715
> URL: https://issues.apache.org/jira/browse/YARN-6715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-6715-001.patch, YARN-6715-002.patch, 
> YARN-6715-003.patch, YARN-6715-branch-3.1.001.patch, 
> YARN-6715-branch-3.2.001.patch
>
>
> NodeHealthScriptRunner does *not* report a bad health if the script exits 
> with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
> void reportHealthStatus(HealthCheckerExitStatus status) {
>   long now = System.currentTimeMillis();
>   switch (status) {
>   case SUCCESS:
> setHealthStatus(true, "", now);
> break;
>   case TIMED_OUT:
> setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
> break;
>   case FAILED_WITH_EXCEPTION:
> setHealthStatus(false, exceptionStackTrace);
> break;
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
>   case FAILED:
> setHealthStatus(false, shexec.getOutput());
> break;
>   }
> }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but conflicts with 
> the upstream document, which says: 
> "If the script *exits with a non-zero exit code*, times out or results in an 
> exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might 
> also add an extra comment to {{reportHealthStatus()}} which explains that 
> {{FAILED_WITH_EXIT_CODE}} is not buggy.
> This case also lacks unit test coverage.
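
A hedged sketch of the kind of unit test the last point asks for, exercising only the 
{{FAILED_WITH_EXIT_CODE}} branch; the enum and reporter below are simplified stand-ins 
for the real {{NodeHealthScriptRunner}} internals, not the production code:

{code:java}
import org.junit.Assert;
import org.junit.Test;

public class ReportHealthStatusSketchTest {

  enum HealthCheckerExitStatus {
    SUCCESS, TIMED_OUT, FAILED_WITH_EXCEPTION, FAILED_WITH_EXIT_CODE, FAILED
  }

  // Minimal stand-in mirroring the switch shown in the description above.
  static final class HealthReporter {
    boolean healthy;
    String report = "";

    void reportHealthStatus(HealthCheckerExitStatus status) {
      switch (status) {
        case SUCCESS:
        case FAILED_WITH_EXIT_CODE:
          // Intentional: a non-zero exit code alone does NOT mark the node unhealthy.
          healthy = true;
          report = "";
          break;
        default:
          healthy = false;
          report = "health check failed";
      }
    }
  }

  @Test
  public void nonZeroExitCodeKeepsNodeHealthy() {
    HealthReporter reporter = new HealthReporter();
    reporter.reportHealthStatus(HealthCheckerExitStatus.FAILED_WITH_EXIT_CODE);
    Assert.assertTrue("Non-zero exit code should keep the node healthy by design",
        reporter.healthy);
  }
}
{code}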



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-09-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938279#comment-16938279
 ] 

Peter Bacsko commented on YARN-9552:


[~snemeth] you can now backport this patch to branch-3.1 and branch-3.2

> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> ---
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9552-001.patch, YARN-9552-002.patch, 
> YARN-9552-003.patch, YARN-9552-004.patch, YARN-9552-branch-3.1.001.patch, 
> YARN-9552-branch-3.1.002.patch, YARN-9552-branch-3.2.001.patch, 
> YARN-9552-branch-3.2.002.patch, YARN-9552-branch-3.2.003.patch
>
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR 
> EventDispatcher: Error in handling event type NODE_UPDATE to the Event 
> Dispatcher
> java.util.NoSuchElementException
> at 
> java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at 
> java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same as the one described in YARN-7382, but the root 
> cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. 
> This contains an {{AppSchedulingInfo}} which contains a set of 
> {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a 
> bit later on a separate thread during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO  [RM StateStore dispatcher] 
> recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for 
> app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted 
> application application_1557237478804_0001 from user: bacskop, in queue: 
> root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO  [RM Event dispatcher] 
> resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app 
> attempt : appattempt_1557237478804_0001_01
> 2019-05-07 15:58:02,732 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
> (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
> State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(207)) - *** In the constructor of 
> SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:(230)) - *** Contents of 
> appSchedulingInfo: []
> 2019-05-07 

[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938277#comment-16938277
 ] 

Peter Bacsko commented on YARN-9841:


Jenkins picked up the junit patch, re-uploading patch 001 again.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9841:
---
Attachment: YARN-9841.001.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9849) Leaf queues not inheriting parent queue status after adding status as “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml

2019-09-25 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9849:

Attachment: YARN-9849.001.patch

> Leaf queues not inheriting parent queue status after adding status as 
> “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml 
> --
>
> Key: YARN-9849
> URL: https://issues.apache.org/jira/browse/YARN-9849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sushanta Sen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9849.001.patch
>
>
> 【Precondition】:
> 1. Install the cluster
> 2. Configure several queues, say 2 parent [default, q1] & 4 leaf [q2, q3]
> 3. Cluster should be up and running
> 【Test step】:
> 1. By default, leaf queues inherit the parent status
> 2. Change the leaf queues' status to "RUNNING" explicitly
> 3. Run the refresh command; leaf queues' status is shown as "RUNNING" in CLI/UI
> 4. Thereafter, change the leaf queues' status to "STOPPED" and run the refresh command
> 5. Run the refresh command; leaf queues' status is shown as "STOPPING" in CLI/UI
> 6. Now comment out the leaf queues' status and run refresh queues
> 7. Observe
> 【Expect Output】:
> The leaf queues' status should be displayed as "RUNNING", inherited from the 
> parent queue.
> 【Actual Output】:
> Still displays the leaf queues' status as "STOPPED" rather than inheriting 
> it from the parent, which is RUNNING



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9849) Leaf queues not inheriting parent queue status after adding status as “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml

2019-09-25 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938243#comment-16938243
 ] 

Bilwa S T commented on YARN-9849:
-

In AbstractQueue#initializeQueueState, the queue state is not set for the below condition:

if (previousState != null && configuredState == null && parentState != null)
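
A minimal, self-contained sketch of how that branch could fall back to the parent state; 
the enum, method, and class names are illustrative stand-ins, not the actual patch:

{code:java}
/** Illustrative stand-alone sketch; enum and method names are hypothetical stand-ins. */
public final class QueueStateInheritanceSketch {

  enum QueueState { RUNNING, STOPPED }

  static QueueState resolveState(QueueState previousState,
                                 QueueState configuredState,
                                 QueueState parentState) {
    if (configuredState != null) {
      // An explicitly configured state always wins.
      return configuredState;
    }
    if (parentState != null) {
      // Configured state was removed (commented out): inherit from the parent
      // instead of sticking to the previous state.
      return parentState;
    }
    // No parent (root) and nothing configured: keep whatever we had, default RUNNING.
    return previousState != null ? previousState : QueueState.RUNNING;
  }

  public static void main(String[] args) {
    // The scenario from the report: previously STOPPED, config removed, parent RUNNING.
    System.out.println(resolveState(QueueState.STOPPED, null, QueueState.RUNNING)); // RUNNING
  }
}
{code}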

> Leaf queues not inheriting parent queue status after adding status as 
> “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml 
> --
>
> Key: YARN-9849
> URL: https://issues.apache.org/jira/browse/YARN-9849
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sushanta Sen
>Assignee: Bilwa S T
>Priority: Major
>
> 【Precondition】:
> 1. Install the cluster
> 2. Configure several queues, say 2 parent [default, q1] & 4 leaf [q2, q3]
> 3. Cluster should be up and running
> 【Test step】:
> 1. By default, leaf queues inherit the parent status
> 2. Change the leaf queues' status to "RUNNING" explicitly
> 3. Run the refresh command; leaf queues' status is shown as "RUNNING" in CLI/UI
> 4. Thereafter, change the leaf queues' status to "STOPPED" and run the refresh command
> 5. Run the refresh command; leaf queues' status is shown as "STOPPING" in CLI/UI
> 6. Now comment out the leaf queues' status and run refresh queues
> 7. Observe
> 【Expect Output】:
> The leaf queues' status should be displayed as "RUNNING", inherited from the 
> parent queue.
> 【Actual Output】:
> Still displays the leaf queues' status as "STOPPED" rather than inheriting 
> it from the parent, which is RUNNING



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9137) Get the IP and port of the docker container and display it on WEB UI2

2019-09-25 Thread Xun Liu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938169#comment-16938169
 ] 

Xun Liu commented on YARN-9137:
---

[~eyang], sorry, I am too busy and don't have time to complete the development 
of this issue. My apologies.

> Get the IP and port of the docker container and display it on WEB UI2
> -
>
> Key: YARN-9137
> URL: https://issues.apache.org/jira/browse/YARN-9137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xun Liu
>Priority: Major
>
> 1) When using a container network such as Calico, the IP of the container is 
> not the IP of the host but is allocated from a private network, and the 
> different containers can be connected directly.
>  Exposing the services in the container through a reverse proxy such as Nginx 
> makes it easy for users to view the IP and port on WEB UI2 and use the 
> services in the container, such as Tomcat, TensorBoard, and so on.
>  2) When not using a container network such as Calico, the container still has 
> its own container port.
> So we need to display the IP and port of the docker container on WEB UI2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9137) Get the IP and port of the docker container and display it on WEB UI2

2019-09-25 Thread Xun Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xun Liu reassigned YARN-9137:
-

Assignee: (was: Xun Liu)

> Get the IP and port of the docker container and display it on WEB UI2
> -
>
> Key: YARN-9137
> URL: https://issues.apache.org/jira/browse/YARN-9137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xun Liu
>Priority: Major
>
> 1) When using a container network such as Calico, the IP of the container is 
> not the IP of the host but is allocated from a private network, and the 
> different containers can be connected directly.
>  Exposing the services in the container through a reverse proxy such as Nginx 
> makes it easy for users to view the IP and port on WEB UI2 and use the 
> services in the container, such as Tomcat, TensorBoard, and so on.
>  2) When not using a container network such as Calico, the container still has 
> its own container port.
> So we need to display the IP and port of the docker container on WEB UI2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938161#comment-16938161
 ] 

Hudson commented on YARN-9730:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17389 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17389/])
Addendum to YARN-9730. Support forcing configured partitions to be (jhung: rev 
606e341c1a33393e6935d31ab05eae87742c865b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java


> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, 
> YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9730:

Attachment: YARN-9730.002.addendum

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, 
> YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Zhe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938150#comment-16938150
 ] 

Zhe Zhang commented on YARN-9730:
-

+1 on the addendum patch

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.patch, YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-25 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938136#comment-16938136
 ] 

Ahmed Hussein commented on YARN-9857:
-

[~ebadger], the test does not fail.
While I was investigating some intermittent failures in another test case, I 
noticed NullPointerExceptions in the log files coming from 
{{TestDelegationTokenRenewer}}. I thought I would get rid of the NPEs to make 
the JUnit output cleaner and easier to debug.

> TestDelegationTokenRenewer throws NPE but tests pass
> 
>
> Key: YARN-9857
> URL: https://issues.apache.org/jira/browse/YARN-9857
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9857.001.patch
>
>
> {{TestDelegationTokenRenewer}} throws some NPEs:
> {code:bash}
> 2019-09-25 12:51:23,446 WARN  [pool-19-thread-2] 
> security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to 
> add the application to the delegation token renewer.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics
> Exception in thread "pool-19-thread-2" java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
> 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,447 INFO  [main] impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> {code}
> the RMContext dispatcher is not set for the RMMock, which results in an NPE 
> when accessing the event handler of the dispatcher inside 
> {{DelegationTokenRenewer}}.
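
A hedged sketch of the kind of fix this points at: stub a dispatcher on the mocked 
context so the renewer has an event handler to post to. The interfaces below are 
simplified stand-ins, not the real RM classes:

{code:java}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class DispatcherMockSketch {

  // Simplified stand-ins for the real RM types involved in the NPE.
  interface EventHandler { void handle(Object event); }
  interface Dispatcher { EventHandler getEventHandler(); }
  interface RmContext { Dispatcher getDispatcher(); }

  static RmContext contextWithDispatcher() {
    EventHandler handler = event -> { /* swallow events in the test */ };
    Dispatcher dispatcher = mock(Dispatcher.class);
    when(dispatcher.getEventHandler()).thenReturn(handler);

    RmContext context = mock(RmContext.class);
    // Without this stubbing, getDispatcher() returns null and the renewer NPEs.
    when(context.getDispatcher()).thenReturn(dispatcher);
    return context;
  }
}
{code}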



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9615) Add dispatcher metrics to RM

2019-09-25 Thread Aihua Xu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938133#comment-16938133
 ] 

Aihua Xu commented on YARN-9615:


This seems to be an important feature, since we sometimes see a large event 
queue. The generic approach looks promising and can be adopted for other 
queues as well.

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9615.poc.patch, screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.
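
A minimal sketch of the general idea, counting and timing events per type by wrapping 
the handler call; all names below are illustrative, not a proposed patch:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Illustrative only: per-event-type counters and cumulative processing time. */
public final class DispatcherMetricsSketch {

  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

  /** Wraps the real handling of one event and records count + latency. */
  public void handleTimed(String eventType, Runnable realHandler) {
    long start = System.nanoTime();
    try {
      realHandler.run();
    } finally {
      counts.computeIfAbsent(eventType, k -> new LongAdder()).increment();
      totalNanos.computeIfAbsent(eventType, k -> new LongAdder())
          .add(System.nanoTime() - start);
    }
  }

  public long count(String eventType) {
    LongAdder c = counts.get(eventType);
    return c == null ? 0 : c.sum();
  }
}
{code}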



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938126#comment-16938126
 ] 

Hadoop QA commented on YARN-9730:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
4s{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m 
43s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9730 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981370/YARN-9730.001.addendum
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c0c1949a26c4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bdaaa3b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24838/testReport/ |
| Max. process+thread count | 841 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938096#comment-16938096
 ] 

Hadoop QA commented on YARN-9857:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m  
4s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9857 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981363/YARN-9857.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bd1a77faf712 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bdaaa3b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24837/testReport/ |
| Max. process+thread count | 827 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24837/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> TestDelegationTokenRenewer throws NPE but tests pass
> 

[jira] [Commented] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938065#comment-16938065
 ] 

Hadoop QA commented on YARN-9751:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 30s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 5 new + 81 unchanged - 3 fixed = 86 total (was 84) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 93m 
49s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9751 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981357/YARN-9751.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 703e659217ad 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bdaaa3b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24836/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24836/testReport/ |
| Max. process+thread count | 812 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-25 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938062#comment-16938062
 ] 

Eric Badger commented on YARN-9857:
---

[~ahussein], what branch does this fail on? I'm running on trunk and I don't 
see any failures.

> TestDelegationTokenRenewer throws NPE but tests pass
> 
>
> Key: YARN-9857
> URL: https://issues.apache.org/jira/browse/YARN-9857
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9857.001.patch
>
>
> {{TestDelegationTokenRenewer}} throws some NPEs:
> {code:bash}
> 2019-09-25 12:51:23,446 WARN  [pool-19-thread-2] 
> security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to 
> add the application to the delegation token renewer.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics
> Exception in thread "pool-19-thread-2" java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
> 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,447 INFO  [main] impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> {code}
> The RMContext dispatcher is not set for the RMMock, which results in an NPE 
> when accessing the event handler of the dispatcher inside 
> {{DelegationTokenRenewer}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938052#comment-16938052
 ] 

Jonathan Hung edited comment on YARN-9730 at 9/25/19 8:40 PM:
--

Thanks for reporting... I think this is because we grab this conf from the RMContext's 
configuration, which is not initialized in the test cases. YARN-8468 adds 
TestRMAppManager, which passes the test conf to RMAppManager, so this is fixed in 
later versions.

It's probably easiest to just add a null check so we don't have to fix all the 
test cases. Uploaded the 001 addendum for this.
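
For readers following the thread, a minimal sketch of what such a null check could look 
like, assuming the guard sits in {{RMContextImpl#getExclusiveEnforcedPartitions}} (where 
the reported NPE originates) and using a placeholder for the configuration key; the 
actual 001 addendum may differ:

{code:java}
// Sketch only, not the 001 addendum: test RMContexts may carry no configuration,
// so return an empty set instead of dereferencing a null conf.
public Set<String> getExclusiveEnforcedPartitions() {
  Configuration conf = getYarnConfiguration();
  if (conf == null) {
    return Collections.emptySet();
  }
  // EXCLUSIVE_ENFORCED_PARTITIONS stands in for the actual configuration key.
  return new HashSet<>(conf.getStringCollection(EXCLUSIVE_ENFORCED_PARTITIONS));
}
{code}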


was (Author: jhung):
Thanks for reporting...I think this is b/c we grab this conf from rmcontext's 
conf which is not initialized in the test cases. YARN-8468 adds 
TestRMAppManager which passes the test conf to RMAppManager so it's fixed in 
later versions.

Probably it's easiest to just add a null check so we don't have to fix all the 
test cases. I'll upload a patch for this.

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.patch, YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938052#comment-16938052
 ] 

Jonathan Hung commented on YARN-9730:
-

Thanks for reporting...I think this is b/c we grab this conf from rmcontext's 
conf which is not initialized in the test cases. YARN-8468 adds 
TestRMAppManager which passes the test conf to RMAppManager so it's fixed in 
later versions.

Probably it's easiest to just add a null check so we don't have to fix all the 
test cases. I'll upload a patch for this.

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.patch, YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9730:

Attachment: YARN-9730.001.addendum

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.patch, YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reopened YARN-9730:
-

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.patch, 
> YARN-9730.002.patch, YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-25 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938023#comment-16938023
 ] 

Jim Brennan commented on YARN-9857:
---

+1 This looks good to me. (non-binding)

[~ebadger] what do you think?

 

> TestDelegationTokenRenewer throws NPE but tests pass
> 
>
> Key: YARN-9857
> URL: https://issues.apache.org/jira/browse/YARN-9857
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9857.001.patch
>
>
> {{TestDelegationTokenRenewer}} throws some NPEs:
> {code:bash}
> 2019-09-25 12:51:23,446 WARN  [pool-19-thread-2] 
> security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to 
> add the application to the delegation token renewer.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics
> Exception in thread "pool-19-thread-2" java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
> 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans 
> (MBeans.java:unregister(138)) - Unregistering 
> Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 2019-09-25 12:51:23,447 INFO  [main] impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
> {code}
> The RMContext dispatcher is not set for the RMMock, which results in an NPE 
> when accessing the event handler of the dispatcher inside 
> {{DelegationTokenRenewer}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938009#comment-16938009
 ] 

Jim Brennan commented on YARN-9730:
---

[~jhung] I believe pulling this back to branch-2 has caused failures in 
TestAppManager (and others).  Example stack trace:
{noformat}
[ERROR] Tests run: 21, Failures: 0, Errors: 7, Skipped: 0, Time elapsed: 7.216 
s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestAppManager
[ERROR] 
testRMAppRetireZeroSetting(org.apache.hadoop.yarn.server.resourcemanager.TestAppManager)
  Time elapsed: 0.054 s  <<< ERROR!
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.RMContextImpl.getExclusiveEnforcedPartitions(RMContextImpl.java:590)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.(RMAppManager.java:115)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestAppManager$TestRMAppManager.(TestAppManager.java:192)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestAppManager.testRMAppRetireZeroSetting(TestAppManager.java:450)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{noformat}

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.patch, 
> YARN-9730.002.patch, YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label Q, any of its resource 
> requests whose labels are 

[jira] [Created] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass

2019-09-25 Thread Ahmed Hussein (Jira)
Ahmed Hussein created YARN-9857:
---

 Summary: TestDelegationTokenRenewer throws NPE but tests pass
 Key: YARN-9857
 URL: https://issues.apache.org/jira/browse/YARN-9857
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein


{{TestDelegationTokenRenewer}} throws some NPEs:


{code:bash}
2019-09-25 12:51:23,446 WARN  [pool-19-thread-2] 
security.DelegationTokenRenewer 
(DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to 
add the application to the delegation token renewer.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-09-25 12:51:23,446 DEBUG [main] util.MBeans (MBeans.java:unregister(138)) 
- Unregistering Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics
Exception in thread "pool-19-thread-2" java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918)
2019-09-25 12:51:23,447 DEBUG [main] util.MBeans (MBeans.java:unregister(138)) 
- Unregistering Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-09-25 12:51:23,447 INFO  [main] impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
{code}

The RMContext dispatcher is not set for the RMMock, which results in an NPE when 
accessing the event handler of the dispatcher inside {{DelegationTokenRenewer}}.
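
A minimal illustration of how the mock might be wired so the renewer's pool threads find 
an event handler; this is an editorial assumption for clarity, not necessarily what the 
eventual patch does:

{code:java}
// Sketch only: make the mocked RMContext used by the test return a dispatcher,
// so DelegationTokenRenewer can reach an event handler instead of hitting null.
Dispatcher dispatcher = new InlineDispatcher();   // lightweight test dispatcher
RMContext rmContext = mock(RMContext.class);
when(rmContext.getDispatcher()).thenReturn(dispatcher);
{code}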



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937980#comment-16937980
 ] 

Hadoop QA commented on YARN-6715:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}139m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
58s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
27s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
30s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
51s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
36s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
49s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
27s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}253m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:080e9d0f9b3 |
| JIRA Issue | YARN-6715 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981332/YARN-6715-branch-3.1.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fbfce8881310 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 

[jira] [Commented] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs

2019-09-25 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937965#comment-16937965
 ] 

Jonathan Hung commented on YARN-9751:
-

Attached the 003 patch, which changes the queue ordering policy configuration instead 
of the app ordering policy conf, as discussed with [~eepayne]. [~eepayne], could you 
help take a look? Thanks!
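
For context on the collision mentioned in the quoted description below, both policies 
are currently expressed with the same property suffix under the standard capacity 
scheduler prefix (shown here only as an illustration; the distinct keys introduced by 
the patch are not reproduced):

{noformat}
# One suffix serves two purposes today: for a parent queue it selects the
# queue ordering policy (e.g. priority-utilization), while for a leaf queue
# it selects the app ordering policy (e.g. fifo or fair).
yarn.scheduler.capacity.<queue-path>.ordering-policy
{noformat}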

> Separate queue and app ordering policy capacity scheduler configs
> -
>
> Key: YARN-9751
> URL: https://issues.apache.org/jira/browse/YARN-9751
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9751.001.patch, YARN-9751.002.patch, 
> YARN-9751.003.patch
>
>
> Right now it's not possible to specify distinct app and queue ordering 
> policies since they share the same {{ordering-policy}} suffix.
> There's already a TODO in CapacitySchedulerConfiguration for this. This Jira 
> intends to fix it.
> {noformat}
> // TODO (wangda): We need to better distinguish app ordering policy and queue
> // ordering policy's classname / configuration options, etc. And dedup code
> // if possible.{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs

2019-09-25 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9751:

Attachment: YARN-9751.003.patch

> Separate queue and app ordering policy capacity scheduler configs
> -
>
> Key: YARN-9751
> URL: https://issues.apache.org/jira/browse/YARN-9751
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9751.001.patch, YARN-9751.002.patch, 
> YARN-9751.003.patch
>
>
> Right now it's not possible to specify distinct app and queue ordering 
> policies since they share the same {{ordering-policy}} suffix.
> There's already a TODO in CapacitySchedulerConfiguration for this. This Jira 
> intends to fix it.
> {noformat}
> // TODO (wangda): We need to better distinguish app ordering policy and queue
> // ordering policy's classname / configuration options, etc. And dedup code
> // if possible.{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-09-25 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937964#comment-16937964
 ] 

Prabhu Joseph commented on YARN-9840:
-

[~maniraj...@gmail.com] Thanks for the patch. I have verified it and it looks 
fine. Could you fix the checkstyle issues? +1 (non-binding).

cc [~snemeth] [~sunilg] Can you review and commit this Jira when you get time? 
Thanks.

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-25 Thread Srinivas S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas S T reassigned YARN-9842:
--

Assignee: Srinivas S T  (was: Abhishek Modi)

> Port YARN-9608 DecommissioningNodesWatcher should get lists of running 
> applications on node from RMNode to branch-3.0/branch-2
> --
>
> Key: YARN-9842
> URL: https://issues.apache.org/jira/browse/YARN-9842
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Srinivas S T
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937922#comment-16937922
 ] 

Eric Badger commented on YARN-9855:
---

Thanks [~jhung]!

> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Fix For: 2.7.8, 2.8.6
>
> Attachments: YARN-9855-branch-2.7.001.patch, 
> YARN-9855-branch-2.7.001.patch, YARN-9855-branch-2.8.001.patch
>
>
> As per 
> [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
>  Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6382) Address race condition on TimelineWriter.flush() caused by buffer-sized flush

2019-09-25 Thread Yousef Abu-Salah (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousef Abu-Salah reassigned YARN-6382:
--

Assignee: Yousef Abu-Salah  (was: Haibo Chen)

> Address race condition on TimelineWriter.flush() caused by buffer-sized flush
> -
>
> Key: YARN-6382
> URL: https://issues.apache.org/jira/browse/YARN-6382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Yousef Abu-Salah
>Priority: Major
>
> YARN-6376 fixes the race condition between putEntities() and periodical 
> flush() by WriterFlushThread in TimelineCollectorManager, or between 
> putEntities() in different threads.
> However, BufferedMutator can have internal size-based flush as well. We need 
> to address the resulting race condition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937908#comment-16937908
 ] 

Jonathan Hung commented on YARN-9855:
-

Thanks Eric and Bibin, pushed to branch-2.8 and branch-2.7.

> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9855-branch-2.7.001.patch, 
> YARN-9855-branch-2.7.001.patch, YARN-9855-branch-2.8.001.patch
>
>
> As per 
> [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
>  Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9011) Race condition during decommissioning

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937879#comment-16937879
 ] 

Hadoop QA commented on YARN-9011:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 31s{color} | {color:orange} root: The patch generated 3 new + 31 unchanged - 
0 fixed = 34 total (was 31) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 48s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 90m  
1s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}205m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ha.TestZKFailoverController |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 |
| JIRA Issue | YARN-9011 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981320/YARN-9011-006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ab4189af6041 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | 

[jira] [Commented] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937878#comment-16937878
 ] 

Hadoop QA commented on YARN-6715:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
0s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
40s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
38s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 9s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
34s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
2s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}115m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:63396beab41 |
| JIRA Issue | YARN-6715 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981332/YARN-6715-branch-3.1.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0aebf7f41a19 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 

[jira] [Updated] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-6715:
---
Attachment: YARN-6715-branch-3.1.001.patch

> Fix documentation about NodeHealthScriptRunner 
> ---
>
> Key: YARN-6715
> URL: https://issues.apache.org/jira/browse/YARN-6715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-6715-001.patch, YARN-6715-002.patch, 
> YARN-6715-003.patch, YARN-6715-branch-3.1.001.patch, 
> YARN-6715-branch-3.2.001.patch
>
>
> NodeHealthScriptRunner does *not* report bad health if the script exits 
> with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
> void reportHealthStatus(HealthCheckerExitStatus status) {
>   long now = System.currentTimeMillis();
>   switch (status) {
>   case SUCCESS:
> setHealthStatus(true, "", now);
> break;
>   case TIMED_OUT:
> setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
> break;
>   case FAILED_WITH_EXCEPTION:
> setHealthStatus(false, exceptionStackTrace);
> break;
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
>   case FAILED:
> setHealthStatus(false, shexec.getOutput());
> break;
>   }
> }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but conflicts with 
> the upstream document, which says: 
> "If the script *exits with a non-zero exit code*, times out or results in an 
> exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might 
> also add an extra comment to {{reportHealthStatus()}} which explains that 
> {{FAILED_WITH_EXIT_CODE}} is not buggy.
> This case also lacks unit test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-6715:
---
Attachment: YARN-6715-branch-3.2.001.patch

> Fix documentation about NodeHealthScriptRunner 
> ---
>
> Key: YARN-6715
> URL: https://issues.apache.org/jira/browse/YARN-6715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-6715-001.patch, YARN-6715-002.patch, 
> YARN-6715-003.patch, YARN-6715-branch-3.2.001.patch
>
>
> NodeHealthScriptRunner does *not* report bad health if the script exits 
> with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
> void reportHealthStatus(HealthCheckerExitStatus status) {
>   long now = System.currentTimeMillis();
>   switch (status) {
>   case SUCCESS:
> setHealthStatus(true, "", now);
> break;
>   case TIMED_OUT:
> setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
> break;
>   case FAILED_WITH_EXCEPTION:
> setHealthStatus(false, exceptionStackTrace);
> break;
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
>   case FAILED:
> setHealthStatus(false, shexec.getOutput());
> break;
>   }
> }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but conflicts with 
> the upstream document, which says: 
> "If the script *exits with a non-zero exit code*, times out or results in an 
> exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might 
> also add an extra comment to {{reportHealthStatus()}} which explains that 
> {{FAILED_WITH_EXIT_CODE}} is not buggy.
> This case also lacks unit test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9607) Auto-configuring rollover-size of IFile format for non-appendable filesystems

2019-09-25 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937721#comment-16937721
 ] 

Adam Antal commented on YARN-9607:
--

HADOOP-15691 has been committed, so work on this can continue.

> Auto-configuring rollover-size of IFile format for non-appendable filesystems
> -
>
> Key: YARN-9607
> URL: https://issues.apache.org/jira/browse/YARN-9607
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9607.001.patch, YARN-9607.002.patch
>
>
> In YARN-9525, we made the IFile format compatible with remote folders that use 
> the s3a scheme. In rolling-fashioned log aggregation, IFile still fails with the 
> "append is not supported" error message, which is a known limitation of the 
> format by design. 
> There is a workaround though: by setting the rollover size in the configuration 
> of the IFile format, a new aggregated log file is created in each rolling cycle, 
> thus eliminating the append from the process. Setting this config globally 
> would cause performance problems in regular log aggregation, so I'm suggesting 
> enforcing this config to zero if the scheme of the URI is 
> s3a (or any other non-appendable filesystem).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9011) Race condition during decommissioning

2019-09-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937707#comment-16937707
 ] 

Peter Bacsko commented on YARN-9011:


ping [~bibinchundatt], [~adam.antal], [~tangzhankun] 

> Race condition during decommissioning
> -
>
> Key: YARN-9011
> URL: https://issues.apache.org/jira/browse/YARN-9011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9011-001.patch, YARN-9011-002.patch, 
> YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, 
> YARN-9011-006.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: 
> node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node 
> node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn 
> IP=172.26.22.115OPERATION=refreshNodes  TARGET=AdminService 
> RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve 
> original total capability: 
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from 
> {{ResourceTrackerService}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9011) Race condition during decommissioning

2019-09-25 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937696#comment-16937696
 ] 

Peter Bacsko commented on YARN-9011:


I uploaded patch v6, which contains a completely new approach (without new 
tests so far).

Here is the idea:
 1. If graceful decommissioning was initiated, {{HostsFileReader}} loads the new 
XMLs _lazily_, meaning that "current" is not changed.
 2. We process the included/excluded hosts. Excluded hosts are added to an 
internal set called {{gracefulDecommissionableNodes}} (which is cleared at each 
refresh).
 3. Once that is done, we call {{HostsFileReader.finishRefresh()}}, publishing the 
new settings.

By doing so, decommissionable hosts can be retrieved from {{NodesListManager}}, 
so there is no inconsistency.

There is a trick though: we still have to wait until {{RMNode}} leaves either the 
RUNNING or the UNHEALTHY state (see the changes in {{isNodeInDecommissioning()}}). 
Once this has happened, we no longer need the set of {{RMNode}} instances inside 
{{NodesListManager}} and can rely on {{RMNode.getState()}}.

Please weigh in on whether this proposal is acceptable or not.
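
For illustration, a minimal sketch of the flow described above. Only {{lazyRefresh()}}, {{finishRefresh()}} and {{gracefulDecommissionableNodes}} come from the comment; the class shapes, field types and locking are assumptions, not the actual YARN-9011-006 patch:

{code}
// Sketch of the flow above; names other than lazyRefresh()/finishRefresh() and
// gracefulDecommissionableNodes are assumptions, not the actual patch.
import java.util.HashSet;
import java.util.Set;

class HostsFileReaderSketch {
  static class HostDetails {
    private final Set<String> excluded = new HashSet<>();
    Set<String> getExcludedHosts() { return excluded; }
  }

  HostDetails lazyRefresh() {
    // Parse the new include/exclude XMLs without touching "current".
    return new HostDetails();
  }

  void finishRefresh(HostDetails details) {
    // Atomically publish the freshly parsed details as "current".
  }
}

class NodesListManagerSketch {
  private final HostsFileReaderSketch hostsReader = new HostsFileReaderSketch();
  private final Set<String> gracefulDecommissionableNodes = new HashSet<>();

  synchronized void refreshNodesGracefully() {
    // 1. Load the new files lazily; readers still see the old lists.
    HostsFileReaderSketch.HostDetails pending = hostsReader.lazyRefresh();

    // 2. Rebuild the decommissionable set; it is cleared at each refresh.
    gracefulDecommissionableNodes.clear();
    gracefulDecommissionableNodes.addAll(pending.getExcludedHosts());

    // 3. Only now publish the new settings to readers such as ResourceTrackerService.
    hostsReader.finishRefresh(pending);
  }

  synchronized boolean isGracefullyDecommissionable(String host) {
    return gracefulDecommissionableNodes.contains(host);
  }
}
{code}

The point of the sketch is that readers never observe a half-updated include/exclude list, because the new host details are published only after the decommissionable set has been rebuilt.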

> Race condition during decommissioning
> -
>
> Key: YARN-9011
> URL: https://issues.apache.org/jira/browse/YARN-9011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9011-001.patch, YARN-9011-002.patch, 
> YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, 
> YARN-9011-006.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: 
> node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node 
> node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn 
> IP=172.26.22.115OPERATION=refreshNodes  TARGET=AdminService 
> RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve 
> original total capability: 
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from 
> {{ResourceTrackerService}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9011) Race condition during decommissioning

2019-09-25 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9011:
---
Attachment: YARN-9011-006.patch

> Race condition during decommissioning
> -
>
> Key: YARN-9011
> URL: https://issues.apache.org/jira/browse/YARN-9011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9011-001.patch, YARN-9011-002.patch, 
> YARN-9011-003.patch, YARN-9011-004.patch, YARN-9011-005.patch, 
> YARN-9011-006.patch
>
>
> During internal testing, we found a nasty race condition which occurs during 
> decommissioning.
> Node manager, incorrect behaviour:
> {noformat}
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:00:17,634 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: Disallowed NodeManager nodeId: node-6.hostname.com:8041 
> hostname:node-6.hostname.com
> {noformat}
> Node manager, expected behaviour:
> {noformat}
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Received 
> SHUTDOWN signal from Resourcemanager as part of heartbeat, hence shutting 
> down.
> 2018-06-18 21:07:37,377 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
> ResourceManager: DECOMMISSIONING  node-6.hostname.com:8041 is ready to be 
> decommissioned
> {noformat}
> Note the two different messages from the RM ("Disallowed NodeManager" vs 
> "DECOMMISSIONING"). The problem is that {{ResourceTrackerService}} can see an 
> inconsistent state of nodes while they're being updated:
> {noformat}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: hostsReader 
> include:{172.26.12.198,node-7.hostname.com,node-2.hostname.com,node-5.hostname.com,172.26.8.205,node-8.hostname.com,172.26.23.76,172.26.22.223,node-6.hostname.com,172.26.9.218,node-4.hostname.com,node-3.hostname.com,172.26.13.167,node-9.hostname.com,172.26.21.221,172.26.10.219}
>  exclude:{node-6.hostname.com}
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: Gracefully 
> decommission node node-6.hostname.com:8041 with state RUNNING
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Disallowed NodeManager nodeId: node-6.hostname.com:8041 node: 
> node-6.hostname.com
> 2018-06-18 21:00:17,576 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Put Node 
> node-6.hostname.com:8041 in DECOMMISSIONING.
> 2018-06-18 21:00:17,575 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn 
> IP=172.26.22.115OPERATION=refreshNodes  TARGET=AdminService 
> RESULT=SUCCESS
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Preserve 
> original total capability: 
> 2018-06-18 21:00:17,577 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> node-6.hostname.com:8041 Node Transitioned from RUNNING to DECOMMISSIONING
> {noformat}
> When the decommissioning succeeds, there is no output logged from 
> {{ResourceTrackerService}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically

2019-09-25 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937661#comment-16937661
 ] 

Prabhu Joseph commented on YARN-9462:
-

[~adam.antal] Yes, I will test and confirm after the commits land.

> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> ---
>
> Key: YARN-9462
> URL: https://issues.apache.org/jira/browse/YARN-9462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 
> TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9700) Docs about how to migrate from FS to CS config

2019-09-25 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9700:
-
Summary: Docs about how to migrate from FS to CS config  (was: Docs about 
how to migrate from FS to CS)

> Docs about how to migrate from FS to CS config
> --
>
> Key: YARN-9700
> URL: https://issues.apache.org/jira/browse/YARN-9700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: docs
>Reporter: Wanqiang Ji
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9700) Docs about how to migrate from FS to CS

2019-09-25 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9700:
-
Summary: Docs about how to migrate from FS to CS  (was: Docs for how to 
migration from FS to CS)

> Docs about how to migrate from FS to CS
> ---
>
> Key: YARN-9700
> URL: https://issues.apache.org/jira/browse/YARN-9700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: docs
>Reporter: Wanqiang Ji
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-25 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937564#comment-16937564
 ] 

Szilard Nemeth commented on YARN-9808:
--

Thanks [~adam.antal] for filing the jira!

> Zero length files in container log output haven't got a header
> --
>
> Key: YARN-9808
> URL: https://issues.apache.org/jira/browse/YARN-9808
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9808.001.patch, YARN-9808.002.patch, 
> YARN-9808.003.patch
>
>
> Using the Yarn logs CLI for containers that have zero length files produces 
> output similar to this:
> {noformat}
> End of LogType:stderr
> ***
> End of LogType:prelaunch.err
> **
> Container: container_e25_1567431105510_0001_01_02 on host-1
> LogAggregationType: AGGREGATED
> ===
> LogType:container.log
> LogLastModifiedTime:Mon Sep 02 06:34:48 -0700 2019
> LogLength:5442
> LogContents:
> ...
> ...
> {noformat}
> Note that stderr and prelaunch.err are both zero length files. Though the 
> output is not misleading, the header is missing.
> I suggest to add the header for zero length files as well, primarily for the 
> following reasons:
> - for applications having multiple files with the same name you may want to 
> distinguish them by host - if many of those are of zero length, you can not 
> extract this information from here. Note that this is a common case for 
> stderr and prelaunch.err.
> - you may want to see the modification time (which corresponds to the 
> creation time of the zero length file)
> - would explicitly display the "LogLength:0" line, which would avoid any 
> confusion from end user side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9856) Remove log-aggregation related duplicate function

2019-09-25 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-9856:


Assignee: Szilard Nemeth

> Remove log-aggregation related duplicate function
> -
>
> Key: YARN-9856
> URL: https://issues.apache.org/jira/browse/YARN-9856
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Szilard Nemeth
>Priority: Trivial
>
> [~snemeth] has noticed a duplication in two of the log-aggregation related 
> functions.
> {quote}I noticed duplicated code in 
> org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, 
> duplicated in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs.
>  [...]
> {quote}
> We should remove the duplication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-25 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937558#comment-16937558
 ] 

Adam Antal commented on YARN-9808:
--

Thanks for the commit, [~snemeth]. I filed YARN-9856 to track the concerns.

> Zero length files in container log output haven't got a header
> --
>
> Key: YARN-9808
> URL: https://issues.apache.org/jira/browse/YARN-9808
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9808.001.patch, YARN-9808.002.patch, 
> YARN-9808.003.patch
>
>
> Using the Yarn logs CLI for containers that have zero length files produces 
> output similar to this:
> {noformat}
> End of LogType:stderr
> ***
> End of LogType:prelaunch.err
> **
> Container: container_e25_1567431105510_0001_01_02 on host-1
> LogAggregationType: AGGREGATED
> ===
> LogType:container.log
> LogLastModifiedTime:Mon Sep 02 06:34:48 -0700 2019
> LogLength:5442
> LogContents:
> ...
> ...
> {noformat}
> Note that stderr and prelaunch.err are both zero length files. Though the 
> output is not misleading, the header is missing.
> I suggest to add the header for zero length files as well, primarily for the 
> following reasons:
> - for applications having multiple files with the same name you may want to 
> distinguish them by host - if many of those are of zero length, you can not 
> extract this information from here. Note that this is a common case for 
> stderr and prelaunch.err.
> - you may want to see the modification time (which corresponds to the 
> creation time of the zero length file)
> - would explicitly display the "LogLength:0" line, which would avoid any 
> confusion from end user side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8148) Update decimal values for queue capacities shown on queue status CLI

2019-09-25 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8148:
-
Summary: Update decimal values for queue capacities shown on queue status 
CLI  (was: Update decimal values for queue capacities shown on queue status cli)

> Update decimal values for queue capacities shown on queue status CLI
> 
>
> Key: YARN-8148
> URL: https://issues.apache.org/jira/browse/YARN-8148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-8148-002.patch, YARN-8148.1.patch
>
>
> Capacities are shown with two decimal values in RM UI as part of YARN-6182. 
> The queue status cli are still showing one decimal value.
> {code}
> [root@bigdata3 yarn]# yarn queue -status default
> Queue Information : 
> Queue Name : default
>   State : RUNNING
>   Capacity : 69.9%
>   Current Capacity : .0%
>   Maximum Capacity : 70.0%
>   Default Node Label expression : 
>   Accessible Node Labels : *
>   Preemption : enabled
> {code}
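
For illustration, the CLI-side fix boils down to printing the capacities with two decimal places, e.g. via a {{%.2f}} format. A standalone sketch, not the actual QueueCLI code:

{code}
// Standalone illustration, not the actual QueueCLI code.
public class CapacityFormat {
  public static void main(String[] args) {
    float capacity = 69.9f;
    float currentCapacity = 0.0f;
    // "%.2f" prints two decimal places, matching the RM UI behaviour of YARN-6182.
    System.out.println(String.format("Capacity : %.2f%%", capacity));
    System.out.println(String.format("Current Capacity : %.2f%%", currentCapacity));
  }
}
{code}

With this format the "Current Capacity" line prints "0.00%" instead of the truncated ".0%" shown above.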



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9856) Remove log-aggregation related duplicate function

2019-09-25 Thread Adam Antal (Jira)
Adam Antal created YARN-9856:


 Summary: Remove log-aggregation related duplicate function
 Key: YARN-9856
 URL: https://issues.apache.org/jira/browse/YARN-9856
 Project: Hadoop YARN
  Issue Type: Task
  Components: log-aggregation, yarn
Affects Versions: 3.3.0
Reporter: Adam Antal


[~snemeth] has noticed a duplication in two of the log-aggregation related 
functions.
{quote}I noticed duplicated code in 
org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog, 
duplicated in 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs.
 [...]
{quote}
We should remove the duplication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically

2019-09-25 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937540#comment-16937540
 ] 

Adam Antal commented on YARN-9462:
--

YARN-9011 may be related to this. We have to check whether the flakiness persists 
after that commit is included in trunk.

> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> ---
>
> Key: YARN-9462
> URL: https://issues.apache.org/jira/browse/YARN-9462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 
> TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937535#comment-16937535
 ] 

Hudson commented on YARN-6715:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17381 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17381/])
YARN-6715. Fix documentation about NodeHealthScriptRunner. Contributed 
(snemeth: rev c72457787df33b44a853fceff0cfe180850c4960)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/NodeHealthScriptRunner.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestNodeHealthScriptRunner.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManager.md


> Fix documentation about NodeHealthScriptRunner 
> ---
>
> Key: YARN-6715
> URL: https://issues.apache.org/jira/browse/YARN-6715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-6715-001.patch, YARN-6715-002.patch, 
> YARN-6715-003.patch
>
>
> NodeHealthScriptRunner does *not* report a bad health if the script exits 
> with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
> void reportHealthStatus(HealthCheckerExitStatus status) {
>   long now = System.currentTimeMillis();
>   switch (status) {
>   case SUCCESS:
> setHealthStatus(true, "", now);
> break;
>   case TIMED_OUT:
> setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
> break;
>   case FAILED_WITH_EXCEPTION:
> setHealthStatus(false, exceptionStackTrace);
> break;
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
>   case FAILED:
> setHealthStatus(false, shexec.getOutput());
> break;
>   }
> }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but conflicts with 
> the upstream document, which says: 
> "If the script *exits with a non-zero exit code*, times out or results in an 
> exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might 
> also add an extra comment to {{reportHealthStatus()}} which explains that 
> {{FAILED_WITH_EXIT_CODE}} is not buggy.
> This case also lacks unit test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7291) Better input parsing for resource in allocation file

2019-09-25 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937534#comment-16937534
 ] 

Szilard Nemeth commented on YARN-7291:
--

Ping [~wilfreds], [~sunilg]

> Better input parsing for resource in allocation file
> 
>
> Key: YARN-7291
> URL: https://issues.apache.org/jira/browse/YARN-7291
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Zoltan Siegl
>Priority: Minor
>  Labels: newbie
> Fix For: 3.3.0
>
> Attachments: YARN-7291.001.patch, YARN-7291.002.patch, 
> YARN-7291.003.patch, YARN-7291.004.patch, YARN-7291.005.patch, 
> YARN-7291.005.patch
>
>
> When you set max/min share for queues in the fair scheduler allocation file,  
> "1024 mb, 2 4 vcores" is parsed the same as "1024 mb, 4 vcores" without any 
> issue; the same goes for "50% memory, 50% 100%cpu", which is parsed the same as "50% 
> memory, 100%cpu". That is confusing. We should fix it. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9492) TestRMEmbeddedElector.testCallbackSynchronization fails intermittent

2019-09-25 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937531#comment-16937531
 ] 

Szilard Nemeth commented on YARN-9492:
--

Hi [~Prabhu Joseph]! Do you plan to work on this in the foreseeable future, or can 
we take it over? 
Thanks!

> TestRMEmbeddedElector.testCallbackSynchronization fails intermittent
> 
>
> Key: YARN-9492
> URL: https://issues.apache.org/jira/browse/YARN-9492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> TestRMEmbeddedElector.testCallbackSynchronization fails intermittent
> {code}
> Error Message
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss
> Stacktrace
> org.apache.hadoop.service.ServiceStateException: 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss
>   at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector.testCallbackSynchronization(TestRMEmbeddedElector.java:156)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector.testCallbackSynchronization(TestRMEmbeddedElector.java:117)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:53)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1165)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1136)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.connectToZooKeeper(ActiveStandbyElector.java:699)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:853)
>   at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:336)

[jira] [Assigned] (YARN-8078) TestDistributedShell#testDSShellWithoutDomainV2 fails on trunk

2019-09-25 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned YARN-8078:


Assignee: Szilard Nemeth

> TestDistributedShell#testDSShellWithoutDomainV2 fails on trunk
> --
>
> Key: YARN-8078
> URL: https://issues.apache.org/jira/browse/YARN-8078
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Weiwei Yang
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: UT
>
> java.lang.AssertionError: Unexpected number of YARN_CONTAINER_FINISHED event 
> published. 
> Expected :1
> Actual :0
> at org.junit.Assert.failNotEquals(Assert.java:743)
>  at org.junit.Assert.assertEquals(Assert.java:118)
>  at org.junit.Assert.assertEquals(Assert.java:555)
>  at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityForTimelineV2(TestDistributedShell.java:692)
>  at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:584)
>  at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:450)
>  at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:309)
>  at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2(TestDistributedShell.java:305)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937519#comment-16937519
 ] 

Szilard Nemeth edited comment on YARN-6715 at 9/25/19 8:36 AM:
---

Hi [~pbacsko]!
+1 on the latest patch!
Committed this to trunk!
Do you think it's worthwhile to backport to branch-3.2 and branch-3.1 as well?

Thanks!


was (Author: snemeth):
Hi [~pbacsko]!
+1 on the latest patch!
Committing this to trunk!
Do you think it's worthwhile to backport to branch-3.2 and branch-3.1 as well?

Thanks!

> Fix documentation about NodeHealthScriptRunner 
> ---
>
> Key: YARN-6715
> URL: https://issues.apache.org/jira/browse/YARN-6715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-6715-001.patch, YARN-6715-002.patch, 
> YARN-6715-003.patch
>
>
> NodeHealthScriptRunner does *not* report a bad health if the script exits 
> with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
> void reportHealthStatus(HealthCheckerExitStatus status) {
>   long now = System.currentTimeMillis();
>   switch (status) {
>   case SUCCESS:
> setHealthStatus(true, "", now);
> break;
>   case TIMED_OUT:
> setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
> break;
>   case FAILED_WITH_EXCEPTION:
> setHealthStatus(false, exceptionStackTrace);
> break;
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
>   case FAILED:
> setHealthStatus(false, shexec.getOutput());
> break;
>   }
> }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but conflicts with 
> the upstream document, which says: 
> "If the script *exits with a non-zero exit code*, times out or results in an 
> exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might 
> also add an extra comment to {{reportHealthStatus()}} which explains that 
> {{FAILED_WITH_EXIT_CODE}} is not buggy.
> This case also lacks unit test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6715) Fix documentation about NodeHealthScriptRunner

2019-09-25 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937519#comment-16937519
 ] 

Szilard Nemeth commented on YARN-6715:
--

Hi [~pbacsko]!
+1 on the latest patch!
Committing this to trunk!
Do you think it's worthwhile to backport to branch-3.2 and branch-3.1 as well?

Thanks!

> Fix documentation about NodeHealthScriptRunner 
> ---
>
> Key: YARN-6715
> URL: https://issues.apache.org/jira/browse/YARN-6715
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-6715-001.patch, YARN-6715-002.patch, 
> YARN-6715-003.patch
>
>
> NodeHealthScriptRunner does *not* report a bad health if the script exits 
> with an exit code other than 0. Look at the {{FAILED_WITH_EXIT_CODE}} case:
> {noformat}
> void reportHealthStatus(HealthCheckerExitStatus status) {
>   long now = System.currentTimeMillis();
>   switch (status) {
>   case SUCCESS:
> setHealthStatus(true, "", now);
> break;
>   case TIMED_OUT:
> setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
> break;
>   case FAILED_WITH_EXCEPTION:
> setHealthStatus(false, exceptionStackTrace);
> break;
>   case FAILED_WITH_EXIT_CODE:
> setHealthStatus(true, "", now);
> break;
>   case FAILED:
> setHealthStatus(false, shexec.getOutput());
> break;
>   }
> }
> {noformat}
> Based on the discussion in YARN-5567, this is intentional, but conflicts with 
> the upstream document, which says: 
> "If the script *exits with a non-zero exit code*, times out or results in an 
> exception being thrown, the node is marked as unhealthy"
> This statement can be extremely misleading and must be corrected. We might 
> also add an extra comment to {{reportHealthStatus()}} which explains that 
> {{FAILED_WITH_EXIT_CODE}} is not buggy.
> This case also lacks unit test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-25 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937517#comment-16937517
 ] 

Hudson commented on YARN-9808:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17380 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17380/])
YARN-9808. Zero length files in container log output haven't got a (snemeth: 
rev bec0864394fbf30d7979bb7359dc0b5403731c0c)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestNMWebServices.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/filecontroller/ifile/TestLogAggregationIndexedFileController.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogToolUtils.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> Zero length files in container log output haven't got a header
> --
>
> Key: YARN-9808
> URL: https://issues.apache.org/jira/browse/YARN-9808
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9808.001.patch, YARN-9808.002.patch, 
> YARN-9808.003.patch
>
>
> Using the Yarn logs CLI for containers that have zero length files produces 
> output similar to this:
> {noformat}
> End of LogType:stderr
> ***
> End of LogType:prelaunch.err
> **
> Container: container_e25_1567431105510_0001_01_02 on host-1
> LogAggregationType: AGGREGATED
> ===
> LogType:container.log
> LogLastModifiedTime:Mon Sep 02 06:34:48 -0700 2019
> LogLength:5442
> LogContents:
> ...
> ...
> {noformat}
> Note that stderr and prelaunch.err are both zero length files. Though the 
> output is not misleading, the header is missing.
> I suggest to add the header for zero length files as well, primarily for the 
> following reasons:
> - for applications having multiple files with the same name you may want to 
> distinguish them by host - if many of those are of zero length, you can not 
> extract this information from here. Note that this is a common case for 
> stderr and prelaunch.err.
> - you may want to see the modification time (which corresponds to the 
> creation time of the zero length file)
> - would explicitly display the "LogLength:0" line, which would avoid any 
> confusion from end user side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9808) Zero length files in container log output haven't got a header

2019-09-25 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937515#comment-16937515
 ] 

Szilard Nemeth commented on YARN-9808:
--

Thanks [~adam.antal] for this patch! 
I really appreciate that you extracted the formatContainerLogHeader method into 
LogToolUtils and that you took the time to extend the test cases as [~shuzirra] 
suggested before.
+1 on the latest patch, committing this to trunk soon!

[~adam.antal]: 
I noticed code in 
org.apache.hadoop.yarn.logaggregation.LogToolUtils#outputContainerLog that is 
duplicated in 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat.LogReader#readContainerLogs.
This is not related to your patch, but could you please file a jira for that? 
Most likely we will have similar jiras coming up, as the code here is very 
bloated and dirty.
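
For illustration, a shared helper that both call sites could delegate to might look like the sketch below. The class and method names here are hypothetical, not existing LogToolUtils API:

{code}
// Hypothetical shared helper; the class and method names are not existing API.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class ContainerLogCopy {
  private ContainerLogCopy() {
  }

  /** Copy up to {@code remaining} bytes of a single container log to the output. */
  static void copyLogBytes(InputStream in, OutputStream out, long remaining)
      throws IOException {
    byte[] buf = new byte[64 * 1024];
    while (remaining > 0) {
      int toRead = (int) Math.min(buf.length, remaining);
      int read = in.read(buf, 0, toRead);
      if (read < 0) {
        break; // truncated log: stop instead of spinning forever
      }
      out.write(buf, 0, read);
      remaining -= read;
    }
  }
}
{code}

Both outputContainerLog and readContainerLogs could then keep only their own header handling and call this one copy loop.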

> Zero length files in container log output haven't got a header
> --
>
> Key: YARN-9808
> URL: https://issues.apache.org/jira/browse/YARN-9808
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: log-aggregation, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9808.001.patch, YARN-9808.002.patch, 
> YARN-9808.003.patch
>
>
> Using the Yarn logs CLI for containers that have zero length files produces 
> output similar to this:
> {noformat}
> End of LogType:stderr
> ***
> End of LogType:prelaunch.err
> **
> Container: container_e25_1567431105510_0001_01_02 on host-1
> LogAggregationType: AGGREGATED
> ===
> LogType:container.log
> LogLastModifiedTime:Mon Sep 02 06:34:48 -0700 2019
> LogLength:5442
> LogContents:
> ...
> ...
> {noformat}
> Note that stderr and prelaunch.err are both zero length files. Though the 
> output is not misleading, the header is missing.
> I suggest to add the header for zero length files as well, primarily for the 
> following reasons:
> - for applications having multiple files with the same name you may want to 
> distinguish them by host - if many of those are of zero length, you can not 
> extract this information from here. Note that this is a common case for 
> stderr and prelaunch.err.
> - you may want to see the modification time (which corresponds to the 
> creation time of the zero length file)
> - would explicitly display the "LogLength:0" line, which would avoid any 
> confusion from end user side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937514#comment-16937514
 ] 

Hadoop QA commented on YARN-9855:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 39m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.7 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
59s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} branch-2.7 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:b93746a0168 |
| JIRA Issue | YARN-9855 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981287/YARN-9855-branch-2.7.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 0e0e4e9ca822 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.7 / 6079107 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| Multi-JDK versions |  /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 
/usr/lib/jvm/java-8-openjdk-amd64:1.8.0_222 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24831/testReport/ |
| Max. process+thread count | 84 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24831/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>

[jira] [Commented] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937504#comment-16937504
 ] 

Hadoop QA commented on YARN-9855:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 31m 
56s{color} | {color:red} Docker failed to build yetus/hadoop:06eafeedf12. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9855 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12981287/YARN-9855-branch-2.7.001.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24832/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9855-branch-2.7.001.patch, 
> YARN-9855-branch-2.7.001.patch, YARN-9855-branch-2.8.001.patch
>
>
> As per 
> [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
>  Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Bibin A Chundatt (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9855:
---
Attachment: YARN-9855-branch-2.7.001.patch

> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9855-branch-2.7.001.patch, 
> YARN-9855-branch-2.7.001.patch, YARN-9855-branch-2.8.001.patch
>
>
> As per 
> [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
>  Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937482#comment-16937482
 ] 

Bibin A Chundatt commented on YARN-9855:


Uploaded the branch-2.7 patch again to trigger Jenkins.

> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9855-branch-2.7.001.patch, 
> YARN-9855-branch-2.7.001.patch, YARN-9855-branch-2.8.001.patch
>
>
> As per 
> [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
>  Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937479#comment-16937479
 ] 

Hadoop QA commented on YARN-9855:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} | {color:red} YARN-9855-branch-2.7 does not apply to branch-2.7. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9855 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24830/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9855-branch-2.7.001.patch, 
> YARN-9855-branch-2.8.001.patch
>
>
> As per 
> [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
>  Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9855) Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7

2019-09-25 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937474#comment-16937474
 ] 

Bibin A Chundatt commented on YARN-9855:


Thank you [~ebadger] for finding the issue and [~jhung] for handling it.

+1 LGTM.



> Fix ApplicationReportProto submitTime id in branch-2.8/branch-2.7
> -
>
> Key: YARN-9855
> URL: https://issues.apache.org/jira/browse/YARN-9855
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9855-branch-2.7.001.patch, 
> YARN-9855-branch-2.8.001.patch
>
>
> As per 
> [http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201909.mbox/%3cCAAaVJWUKTBXEYV_-yWs2PT8aqhjQXq=garav+yzjxq0nx36...@mail.gmail.com%3e].
>  Update this field to use the same id as in branch-2.9 and above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9851) Make execution type check compatible

2019-09-25 Thread Bibin A Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937117#comment-16937117
 ] 

Bibin A Chundatt edited comment on YARN-9851 at 9/25/19 7:10 AM:
-

[~cane]

Didn't YARN-9547 fix this issue? Could you check with that patch applied?


was (Author: bibinchundatt):
YARN-9547 didn't fix this issue ?

> Make execution type check compatible
> -
>
> Key: YARN-9851
> URL: https://issues.apache.org/jira/browse/YARN-9851
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9851-001.patch
>
>
> During an upgrade from 2.6 to 3.1, we encountered a problem:
> {code:java}
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568719110875_6460_08_01, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_11172_01_62, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_11172_01_63, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_11172_01_64, status: RUNNING, 
> execution type: null
> 2019-09-23,19:29:05,303 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Lost 
> container container_e35_1568886618758_30617_01_06, status: RUNNING, 
> execution type: null
> for (ContainerStatus remoteContainer : containerStatuses) {
>   if (remoteContainer.getState() == ContainerState.RUNNING
>   && remoteContainer.getExecutionType() == ExecutionType.GUARANTEED) {
> nodeContainers.add(remoteContainer.getContainerId());
>   } else {
> LOG.warn("Lost container " + remoteContainer.getContainerId()
> + ", status: " + remoteContainer.getState()
> + ", execution type: " + remoteContainer.getExecutionType());
>   }
> }
> {code}
> The cause is that we have NMs running version 2.6, which do not report an executionType in the container status.
> We should add a check here to make the upgrade process more transparent.
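
As a point of reference, one possible compatibility check is sketched below. This is a minimal sketch only, not the code from YARN-9851-001.patch: it treats a null execution type as GUARANTEED so that running containers reported by older NMs (e.g. 2.6), which predate execution types, are not flagged as lost. The class and method names are hypothetical.

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.ExecutionType;

/** Hypothetical helper, not the code from the attached patch. */
public final class ExecutionTypeCompat {

  /**
   * Collects the ids of containers that should be kept as live GUARANTEED
   * containers. A null execution type, as reported by older NMs (e.g. 2.6)
   * that predate execution types, is treated as GUARANTEED instead of
   * causing the container to be logged as lost.
   */
  public static List<ContainerId> liveGuaranteedContainers(
      List<ContainerStatus> containerStatuses) {
    List<ContainerId> nodeContainers = new ArrayList<>();
    for (ContainerStatus remoteContainer : containerStatuses) {
      ExecutionType execType = remoteContainer.getExecutionType();
      boolean guaranteed =
          execType == null || execType == ExecutionType.GUARANTEED;
      if (remoteContainer.getState() == ContainerState.RUNNING && guaranteed) {
        nodeContainers.add(remoteContainer.getContainerId());
      }
      // The caller would still log anything else as lost, as RMNodeImpl does.
    }
    return nodeContainers;
  }

  private ExecutionTypeCompat() {
  }
}
{code}

Whether the null case should instead be normalized where the NM status is translated into the RM-side record is a design choice for the patch itself.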



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org