[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-09 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926318#comment-16926318
 ] 

Akira Ajisaka commented on YARN-9812:
-

Thanks [~abmodi] for the fix and thanks [~elgoiri] for the review!

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0
>
> Attachments: YARN-9812.001.patch, YARN-9812.002.patch
>
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}
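
For reference, a minimal sketch of the kind of Javadoc fix involved, assuming 
the arrows are wrapped in {@literal} so javadoc no longer parses them as HTML 
(the actual change is in the attached patches):
{code:java}
/**
 * pending {@literal ->} requests which are NOT yet sent to RM.
 * scheduled {@literal ->} requests which are sent to RM but not yet assigned.
 * assigned {@literal ->} requests which are assigned to a container.
 * completed {@literal ->} request corresponding to which container has completed.
 */
{code}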



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9825) Changes for initializing placement rules with ResourceScheduler in branch-2

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9825:

Description: YARN-8016 and YARN-8948 add functionality to initialize 
placement rules with ResourceScheduler. We need this in branch-2, but it 
doesn't apply cleanly. Hence we just port the initialization logic.  (was: 
YARN-8016 and YARN-8948 add functionality to initialize placement rules with 
ResourceScheduler. We need this in branch-2.)

> Changes for initializing placement rules with ResourceScheduler in branch-2
> ---
>
> Key: YARN-9825
> URL: https://issues.apache.org/jira/browse/YARN-9825
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
>
> YARN-8016 and YARN-8948 add functionality to initialize placement rules with 
> ResourceScheduler. We need this in branch-2, but it doesn't apply cleanly. 
> Hence we just port the initialization logic.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9825) Changes for initializing placement rules with ResourceScheduler in branch-2

2019-09-09 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9825:
---

 Summary: Changes for initializing placement rules with 
ResourceScheduler in branch-2
 Key: YARN-9825
 URL: https://issues.apache.org/jira/browse/YARN-9825
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung
Assignee: Jonathan Hung


YARN-8016 and YARN-8948 add functionality to initialize placement rules with 
ResourceScheduler. We need this in branch-2.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8541) RM startup failure on recovery after user deletion

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926298#comment-16926298
 ] 

Jonathan Hung edited comment on YARN-8541 at 9/10/19 3:23 AM:
--

I've ported YARN-8016 to branch-3.1, which introduces TestPlacementManager. 
Hence I have also ported the TestPlacementManager changes from branch-3.2 to 
branch-3.1. Attached [^YARN-8541-branch-3.1.003.patch.addendum] for 
completeness.


was (Author: jhung):
I've ported the TestPlacementManager changes from branch-3.2 to branch-3.1. 
Attached [^YARN-8541-branch-3.1.003.patch.addendum] for completeness.

> RM startup failure on recovery after user deletion
> --
>
> Key: YARN-8541
> URL: https://issues.apache.org/jira/browse/YARN-8541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: yimeng
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8541-branch-3.1.003.patch, 
> YARN-8541-branch-3.1.003.patch.addendum, YARN-8541.001.patch, 
> YARN-8541.002.patch, YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found an RM startup failure on recovery with 
> the following test steps:
> 1. Create a user "user1" with permission to submit apps.
> 2. Use user1 to submit a job and wait for it to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue 
> root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.0, numApps=0, 
> numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue 
> mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized 
> CapacityScheduler with calculator=class 
> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, 
> minimumAllocation=<>, maximumAllocation=< vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | 
> CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not 
> found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS 
> Processing chain. Root 
> Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor].
>  | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement 
> handler will be used, all scheduling requests will be rejected. | 
> ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding 
> [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor]
>  tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the 
> winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application 
> application_1531624956005_0001 submitted by user super reason: No groups 
> found for user super
>  at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native 

[jira] [Commented] (YARN-8541) RM startup failure on recovery after user deletion

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926298#comment-16926298
 ] 

Jonathan Hung commented on YARN-8541:
-

I've ported the TestPlacementManager changes from branch-3.2 to branch-3.1. 
Attached [^YARN-8541-branch-3.1.003.patch.addendum] for completeness.

> RM startup failure on recovery after user deletion
> --
>
> Key: YARN-8541
> URL: https://issues.apache.org/jira/browse/YARN-8541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: yimeng
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8541-branch-3.1.003.patch, 
> YARN-8541-branch-3.1.003.patch.addendum, YARN-8541.001.patch, 
> YARN-8541.002.patch, YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found an RM startup failure on recovery with 
> the following test steps:
> 1. Create a user "user1" with permission to submit apps.
> 2. Use user1 to submit a job and wait for it to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue 
> root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.0, numApps=0, 
> numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue 
> mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized 
> CapacityScheduler with calculator=class 
> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, 
> minimumAllocation=<>, maximumAllocation=< vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | 
> CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not 
> found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS 
> Processing chain. Root 
> Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor].
>  | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement 
> handler will be used, all scheduling requests will be rejected. | 
> ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding 
> [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor]
>  tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the 
> winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application 
> application_1531624956005_0001 submitted by user super reason: No groups 
> found for user super
>  at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1241)
>  at 
> 

[jira] [Comment Edited] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926257#comment-16926257
 ] 

Jonathan Hung edited comment on YARN-8948 at 9/10/19 3:20 AM:
--

FYI I have pushed this to branch-3.2 and branch-3.1.


was (Author: jhung):
FYI I have pushed this to branch-3.2.

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch, YARN-8948.004.patch, YARN-8948.005.patch, 
> YARN-8948.006.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, binding 
> it to the CapacityScheduler.
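
A scheduler-agnostic variant might look like the following sketch (the 
parameter type here is an assumption for illustration, not the committed API):
{code:java}
// Hypothetical: initialize against the generic scheduler interface instead
// of CapacitySchedulerContext, so any YarnScheduler can drive placement rules.
public abstract boolean initialize(ResourceScheduler scheduler)
    throws IOException;
{code}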



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8541) RM startup failure on recovery after user deletion

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8541:

Attachment: YARN-8541-branch-3.1.003.patch.addendum

> RM startup failure on recovery after user deletion
> --
>
> Key: YARN-8541
> URL: https://issues.apache.org/jira/browse/YARN-8541
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: yimeng
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8541-branch-3.1.003.patch, 
> YARN-8541-branch-3.1.003.patch.addendum, YARN-8541.001.patch, 
> YARN-8541.002.patch, YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found an RM startup failure on recovery with 
> the following test steps:
> 1. Create a user "user1" with permission to submit apps.
> 2. Use user1 to submit a job and wait for it to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue 
> root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, 
> usedResources=usedCapacity=0.0, numApps=0, 
> numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue 
> mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized 
> CapacityScheduler with calculator=class 
> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, 
> minimumAllocation=<>, maximumAllocation=< vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | 
> CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not 
> found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS 
> Processing chain. Root 
> Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor].
>  | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement 
> handler will be used, all scheduling requests will be rejected. | 
> ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding 
> [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor]
>  tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the 
> winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application 
> application_1531624956005_0001 submitted by user super reason: No groups 
> found for user super
>  at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1241)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  ... 5 more
> Caused by: 

[jira] [Commented] (YARN-8361) Change App Name Placement Rule to use App Name instead of App Id for configuration

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926297#comment-16926297
 ] 

Jonathan Hung commented on YARN-8361:
-

I've ported this to branch-3.1.

> Change App Name Placement Rule to use App Name instead of App Id for 
> configuration
> --
>
> Key: YARN-8361
> URL: https://issues.apache.org/jira/browse/YARN-8361
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.4
>
> Attachments: YARN-8361.001.patch, YARN-8361.002.patch, 
> YARN-8361.003.patch
>
>
> In YARN-8016, we exposed a framework that lets users specify custom placement 
> rules through CS configuration, and also added a new placement rule that maps 
> specific apps to queues. However, the strategy implemented in YARN-8016 used 
> the application id, which makes this config hard to use. In this JIRA, we 
> change the mapping to use the application name. More specifically:
> 1. AppNamePlacementRule used the app id when specifying queue mapping 
> placement rules; it should use the app name instead.
> 2. Update the documentation as well.
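
As a purely hypothetical illustration of the direction (the property key and 
entry format below are assumptions, not the actual syntax; see the patches and 
updated documentation):
{code:java}
// Hypothetical: map applications named "spark-etl" to queue "etl" by app name
// rather than by an opaque application id. Key and value format are assumed.
Configuration conf = new Configuration();
conf.set("yarn.scheduler.queue-placement-rules.app-name", "spark-etl:etl");
{code}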



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8948:

Fix Version/s: 3.1.4

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch, YARN-8948.004.patch, YARN-8948.005.patch, 
> YARN-8948.006.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, binding 
> it to the CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8361) Change App Name Placement Rule to use App Name instead of App Id for configuration

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8361:

Fix Version/s: 3.1.4

> Change App Name Placement Rule to use App Name instead of App Id for 
> configuration
> --
>
> Key: YARN-8361
> URL: https://issues.apache.org/jira/browse/YARN-8361
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.4
>
> Attachments: YARN-8361.001.patch, YARN-8361.002.patch, 
> YARN-8361.003.patch
>
>
> In YARN-8016, we exposed a framework that lets users specify custom placement 
> rules through CS configuration, and also added a new placement rule that maps 
> specific apps to queues. However, the strategy implemented in YARN-8016 used 
> the application id, which makes this config hard to use. In this JIRA, we 
> change the mapping to use the application name. More specifically:
> 1. AppNamePlacementRule used the app id when specifying queue mapping 
> placement rules; it should use the app name instead.
> 2. Update the documentation as well.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8016) Refine PlacementRule interface and add an app-name queue mapping rule as an example

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926295#comment-16926295
 ] 

Jonathan Hung commented on YARN-8016:
-

I've ported this to branch-3.1.

> Refine PlacementRule interface and add an app-name queue mapping rule as an 
> example
> --
>
> Key: YARN-8016
> URL: https://issues.apache.org/jira/browse/YARN-8016
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.4
>
> Attachments: YARN-8016.001.patch, YARN-8016.002.patch, 
> YARN-8016.003.patch, YARN-8016.004.patch, YARN-8016.005.patch
>
>
> After YARN-3635/YARN-6689, PlacementRule became a common interface which can 
> be used by a scheduler and dynamically updated by the scheduler according to 
> configs. Some work remains:
> - There's no way to initialize a PlacementRule.
> - There's no example of a PlacementRule except the user-group mapping one.
> This JIRA refines the PlacementRule interface and adds another PlacementRule 
> example.
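
A rough sketch of the shape such a refined interface could take (names and 
signatures are assumed for illustration; the committed interface is in the 
attached patches):
{code:java}
// Hypothetical refined PlacementRule (sketch, not the committed API):
public abstract class PlacementRule {
  // Uniform initialization hook, addressing the first bullet above.
  public abstract boolean initialize(ResourceScheduler scheduler)
      throws IOException;

  // Resolve a target queue for an application at submission time.
  public abstract ApplicationPlacementContext getPlacementForApp(
      ApplicationSubmissionContext asc, String user) throws YarnException;
}
{code}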



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8016) Refine PlacementRule interface and add an app-name queue mapping rule as an example

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8016:

Fix Version/s: 3.1.4

> Refine PlacementRule interface and add an app-name queue mapping rule as an 
> example
> --
>
> Key: YARN-8016
> URL: https://issues.apache.org/jira/browse/YARN-8016
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.4
>
> Attachments: YARN-8016.001.patch, YARN-8016.002.patch, 
> YARN-8016.003.patch, YARN-8016.004.patch, YARN-8016.005.patch
>
>
> After YARN-3635/YARN-6689, PlacementRule became a common interface which can 
> be used by a scheduler and dynamically updated by the scheduler according to 
> configs. Some work remains:
> - There's no way to initialize a PlacementRule.
> - There's no example of a PlacementRule except the user-group mapping one.
> This JIRA refines the PlacementRule interface and adds another PlacementRule 
> example.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9762) Add submission context label to audit logs

2019-09-09 Thread Manoj Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926293#comment-16926293
 ] 

Manoj Kumar commented on YARN-9762:
---

Refactoring was done to address some of the checkstyle errors about the number 
of method arguments.

> Add submission context label to audit logs
> --
>
> Key: YARN-9762
> URL: https://issues.apache.org/jira/browse/YARN-9762
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Manoj Kumar
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9762.01.patch
>
>
> Currently we log the NODELABEL in container allocation/release audit logs; we 
> should also log the NODELABEL of the application submission context on app 
> submission.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8948:

Fix Version/s: 3.2.2

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch, YARN-8948.004.patch, YARN-8948.005.patch, 
> YARN-8948.006.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, binding 
> it to the CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8948) PlacementRule interface should be for all YarnSchedulers

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926257#comment-16926257
 ] 

Jonathan Hung commented on YARN-8948:
-

FYI I have pushed this to branch-3.2.

> PlacementRule interface should be for all YarnSchedulers
> 
>
> Key: YARN-8948
> URL: https://issues.apache.org/jira/browse/YARN-8948
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-8948.001.patch, YARN-8948.002.patch, 
> YARN-8948.003.patch, YARN-8948.004.patch, YARN-8948.005.patch, 
> YARN-8948.006.patch
>
>
> *Issue 1:*
> YARN-3635's intention was to add a PlacementRule interface common to all 
> YarnSchedulers.
> {code}
> public abstract boolean initialize(
>     CapacitySchedulerContext schedulerContext) throws IOException;
> {code}
> PlacementRule initialization is done using CapacitySchedulerContext, binding 
> it to the CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926206#comment-16926206
 ] 

Jonathan Hung commented on YARN-9824:
-

Test failure is related to YARN-9492

> Fall back to configured queue ordering policy class name
> 
>
> Key: YARN-9824
> URL: https://issues.apache.org/jira/browse/YARN-9824
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9824.001.patch
>
>
> Currently, the configured queue ordering policy is determined as follows:
> {noformat}
> if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
>   // Doesn't respect priority
>   qop = new PriorityUtilizationQueueOrderingPolicy(false);
> } else if (policyType.trim().equals(
> QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
>   qop = new PriorityUtilizationQueueOrderingPolicy(true);
> } else {
>   String message =
>   "Unable to construct queue ordering policy=" + policyType + " queue="
>   + queue;
>   throw new YarnRuntimeException(message);
> } {noformat}
> If we want to enable a policy which is neither QUEUE_UTILIZATION_ORDERING_POLICY 
> nor QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here 
> to add a keyword for that policy.
> It'd be easier if the admin could configure a class name here instead of 
> requiring a keyword.
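
A minimal sketch of the proposed fallback, assuming the configured value is 
treated as a fully qualified class name and instantiated via reflection 
(illustrative only, not the attached patch):
{code:java}
} else {
  // Hypothetical fallback: interpret policyType as a class name.
  try {
    qop = (QueueOrderingPolicy) Class.forName(policyType.trim())
        .getDeclaredConstructor().newInstance();
  } catch (Exception e) {
    throw new YarnRuntimeException("Unable to construct queue ordering policy="
        + policyType + " queue=" + queue, e);
  }
}
{code}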



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926204#comment-16926204
 ] 

Hadoop QA commented on YARN-9824:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 20s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9824 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979891/YARN-9824.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 83fd702c13a1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d69b811 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24780/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24780/testReport/ |
| Max. process+thread count | 808 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926172#comment-16926172
 ] 

Hadoop QA commented on YARN-9819:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 91m  
7s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 16 unchanged - 1 fixed = 16 total (was 17) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 41s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}247m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9819 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979876/YARN-9819.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ae51c492183a 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 469165e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24779/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24779/testReport/ |
| Max. process+thread count | 841 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (YARN-9815) ReservationACLsTestBase fails with NPE

2019-09-09 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926099#comment-16926099
 ] 

Jim Brennan commented on YARN-9815:
---

[~ahussein], I think a better solution would be to just add a null check for 
the acls in ReservationsACLsManager.checkAccess().
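
A minimal sketch of that suggestion (field and method names below are assumed 
from the stack trace, not the actual code):
{code:java}
// Hypothetical null check in ReservationsACLsManager#checkAccess (sketch):
Map<ReservationACL, AccessControlList> acls = reservationAcls.get(queueName);
if (acls == null) {
  // Queue (e.g. QueueC) has no reservation ACLs configured; avoid the NPE.
  return true;
}
return acls.get(reservationACL).isUserAllowed(callerUGI);
{code}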

 

> ReservationACLsTestBase fails with NPE
> --
>
> Key: YARN-9815
> URL: https://issues.apache.org/jira/browse/YARN-9815
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9805.001.patch, YARN-9815.001.patch
>
>
> Running ReservationACLsTestBase throws an NPE when running the FairScheduler. 
> Old revisions back in 2016 also throw the NPE.
> In the test case, QueueC does not have reserveACLs, so 
> ReservationsACLsManager throws an NPE when it tries to access the ACL on 
> line 82.
> I still could not find the first revision that caused this test case to 
> fail; I stopped at bbfaf3c2712c9ba82b0f8423bdeb314bf505a692, which was 
> working fine.
> I am on OS X with Java 1.8.0_201.
>  
> {code:java}
> [ERROR] 
> testApplicationACLs[1](org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase)
>   Time elapsed: 1.897 s  <<< ERROR![ERROR] 
> testApplicationACLs[1](org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase)
>   Time elapsed: 1.897 s  <<< 
> ERROR!java.lang.NullPointerException:java.lang.NullPointerException at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ReservationsACLsManager.checkAccess(ReservationsACLsManager.java:83)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkReservationACLs(ClientRMService.java:1527)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1290)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:511)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:645)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitReservation(ApplicationClientProtocolPBClientImpl.java:511)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase.submitReservation(ReservationACLsTestBase.java:447)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase.verifySubmitReservationSuccess(ReservationACLsTestBase.java:247)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase.testApplicationACLs(ReservationACLsTestBase.java:125)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
> 

[jira] [Updated] (YARN-9770) Create a queue ordering policy which picks child queues with equal probability

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9770:

Labels:   (was: release-blocker)

> Create a queue ordering policy which picks child queues with equal probability
> --
>
> Key: YARN-9770
> URL: https://issues.apache.org/jira/browse/YARN-9770
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9770.001.patch, YARN-9770.002.patch, 
> YARN-9770.003.patch, activeUsers_overlay.png
>
>
> Ran some simulations with the default queue_utilization_ordering_policy:
> An underutilized queue which receives an application with many (thousands of) 
> resource requests will hog scheduler allocations for a long time (on the 
> order of a minute). In the meantime, apps are being submitted to all other 
> queues, which increases activeUsers in those queues, which in turn drops the 
> user limit in those queues to small values if minimum-user-limit-percent is 
> configured to small values (e.g. 10%).
> To address this, we assign to queues with equal probability, so that no queue 
> goes without allocations for a long time.
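
A minimal sketch of the idea, assuming a QueueOrderingPolicy that shuffles the 
child queues (method shape assumed; the attached patches contain the actual 
policy):
{code:java}
// Hypothetical equal-probability ordering policy (sketch):
public Iterator<CSQueue> getAssignmentIterator() {
  List<CSQueue> shuffled = new ArrayList<>(queues);
  // Shuffling makes each child queue equally likely to be offered the
  // allocation first, so no queue is starved behind a request-heavy one.
  Collections.shuffle(shuffled);
  return shuffled.iterator();
}
{code}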



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9824:

Attachment: YARN-9824.001.patch

> Fall back to configured queue ordering policy class name
> 
>
> Key: YARN-9824
> URL: https://issues.apache.org/jira/browse/YARN-9824
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9824.001.patch
>
>
> Currently, the configured queue ordering policy is determined as follows:
> {noformat}
> if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
>   // Doesn't respect priority
>   qop = new PriorityUtilizationQueueOrderingPolicy(false);
> } else if (policyType.trim().equals(
> QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
>   qop = new PriorityUtilizationQueueOrderingPolicy(true);
> } else {
>   String message =
>   "Unable to construct queue ordering policy=" + policyType + " queue="
>   + queue;
>   throw new YarnRuntimeException(message);
> } {noformat}
> If we want to enable a policy which is neither QUEUE_UTILIZATION_ORDERING_POLICY 
> nor QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here 
> to add a keyword for that policy.
> It'd be easier if the admin could configure a class name here instead of 
> requiring a keyword.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9824:

Attachment: (was: YARN-9824.001.patch)

> Fall back to configured queue ordering policy class name
> 
>
> Key: YARN-9824
> URL: https://issues.apache.org/jira/browse/YARN-9824
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
>
> Currently, the configured queue ordering policy is determined as follows:
> {noformat}
> if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
>   // Doesn't respect priority
>   qop = new PriorityUtilizationQueueOrderingPolicy(false);
> } else if (policyType.trim().equals(
> QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
>   qop = new PriorityUtilizationQueueOrderingPolicy(true);
> } else {
>   String message =
>   "Unable to construct queue ordering policy=" + policyType + " queue="
>   + queue;
>   throw new YarnRuntimeException(message);
> } {noformat}
> If we want to enable a policy which is neither QUEUE_UTILIZATION_ORDERING_POLICY 
> nor QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here 
> to add a keyword for that policy.
> It'd be easier if the admin could configure a class name here instead of 
> requiring a keyword.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9824:

Attachment: YARN-9824.001.patch

> Fall back to configured queue ordering policy class name
> 
>
> Key: YARN-9824
> URL: https://issues.apache.org/jira/browse/YARN-9824
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9824.001.patch
>
>
> Currently, the configured queue ordering policy is determined as follows:
> {noformat}
> if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
>   // Doesn't respect priority
>   qop = new PriorityUtilizationQueueOrderingPolicy(false);
> } else if (policyType.trim().equals(
> QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
>   qop = new PriorityUtilizationQueueOrderingPolicy(true);
> } else {
>   String message =
>   "Unable to construct queue ordering policy=" + policyType + " queue="
>   + queue;
>   throw new YarnRuntimeException(message);
> } {noformat}
> If we want to enable a policy which is neither QUEUE_UTILIZATION_ORDERING_POLICY 
> nor QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here 
> to add a keyword for that policy.
> It'd be easier if the admin could configure a class name here instead of 
> requiring a keyword.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reassigned YARN-9824:
---

Assignee: Jonathan Hung

> Fall back to configured queue ordering policy class name
> 
>
> Key: YARN-9824
> URL: https://issues.apache.org/jira/browse/YARN-9824
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
>
> Currently, the configured queue ordering policy is determined as follows:
> {noformat}
> if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
>   // Doesn't respect priority
>   qop = new PriorityUtilizationQueueOrderingPolicy(false);
> } else if (policyType.trim().equals(
> QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
>   qop = new PriorityUtilizationQueueOrderingPolicy(true);
> } else {
>   String message =
>   "Unable to construct queue ordering policy=" + policyType + " queue="
>   + queue;
>   throw new YarnRuntimeException(message);
> } {noformat}
> If we want to enable a policy which is neither QUEUE_UTILIZATION_ORDERING_POLICY 
> nor QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here 
> to add a keyword for that policy.
> It'd be easier if the admin could configure a class name here instead of 
> requiring a keyword.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9822) TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926081#comment-16926081
 ] 

Hadoop QA commented on YARN-9822:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
55s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
37s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
39s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-yarn-server-timelineservice-documentstore in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 19s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9822 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979877/YARN-9822-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4112cb28551e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | 

[jira] [Created] (YARN-9824) Fall back to configured queue ordering policy class name

2019-09-09 Thread Jonathan Hung (Jira)
Jonathan Hung created YARN-9824:
---

 Summary: Fall back to configured queue ordering policy class name
 Key: YARN-9824
 URL: https://issues.apache.org/jira/browse/YARN-9824
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jonathan Hung


Currently this is how the configured queue ordering policy is determined:
{noformat}
if (policyType.trim().equals(QUEUE_UTILIZATION_ORDERING_POLICY)) {
  // Doesn't respect priority
  qop = new PriorityUtilizationQueueOrderingPolicy(false);
} else if (policyType.trim().equals(
QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY)) {
  qop = new PriorityUtilizationQueueOrderingPolicy(true);
} else {
  String message =
  "Unable to construct queue ordering policy=" + policyType + " queue="
  + queue;
  throw new YarnRuntimeException(message);
} {noformat}
If we want to enable a policy which is not QUEUE_UTILIZATION_ORDERING_POLICY or 
QUEUE_PRIORITY_UTILIZATION_ORDERING_POLICY, it requires a code change here to 
add a keyword for that policy.

It'd be easier if the admin could configure a class name here instead of 
requiring a keyword.






[jira] [Commented] (YARN-9762) Add submission context label to audit logs

2019-09-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925984#comment-16925984
 ] 

Jonathan Hung commented on YARN-9762:
-

Thanks for the patch [~mkumar1984]. I see a bunch of refactoring was done (i.e. 
using an RMAuditLog builder you created); I think we should defer that to a 
different patch. Can we just address adding the label to audit logs here and do 
the refactoring in another JIRA? 

> Add submission context label to audit logs
> --
>
> Key: YARN-9762
> URL: https://issues.apache.org/jira/browse/YARN-9762
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Manoj Kumar
>Priority: Major
>  Labels: release-blocker
> Attachments: YARN-9762.01.patch
>
>
> Currently we log NODELABEL in container allocation/release audit logs, we 
> should also log NODELABEL of application submission context on app submission.






[jira] [Updated] (YARN-9822) TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.

2019-09-09 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9822:

Attachment: YARN-9822-002.patch

> TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.
> -
>
> Key: YARN-9822
> URL: https://issues.apache.org/jira/browse/YARN-9822
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9822-001.patch, YARN-9822-002.patch
>
>
> TimelineCollectorWebService#putEntities is blocked when ATSv2 HBase is down. 
> YARN-9374 prevents the threads from getting blocked when it has already 
> identified that HBase is down before accessing HBase. TimelineCollector can 
> check whether the writer backend is up before locking the writer, as in the 
> sketch at the end of this message.
> {code}
>   synchronized (writer) {
>   response = writeTimelineEntities(entities, callerUgi);
>   flushBufferedTimelineEntities();
> }
> {code}
> {code}
> "qtp183259297-80" #80 daemon prio=5 os_prio=0 tid=0x7f5f567fd000 
> nid=0x5fbb waiting for monitor entry [0x7f5f236d4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:164)
>   - waiting to lock <0x0006c7c05770> (a 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService.putEntities(TimelineCollectorWebService.java:186)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1624)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> 
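A minimal sketch of the guard described in the issue (illustrative only; 
{{storageMonitor.isStorageUp()}} is a hypothetical check backed by the HBase 
health probe from YARN-9374):
{code:java}
// Skip the write entirely while the backend is known to be down, instead of
// queueing the handler thread on the writer monitor.
if (!storageMonitor.isStorageUp()) {
  throw new IOException("Timeline storage is down, rejecting write");
}
synchronized (writer) {
  response = writeTimelineEntities(entities, callerUgi);
  flushBufferedTimelineEntities();
}
{code}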

[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925922#comment-16925922
 ] 

Abhishek Modi commented on YARN-9819:
-

Thanks [~elgoiri] for review. 

Attached v3 patch with javadocs for all public functions. 

Private functions introduced in TestOpportunisticContainerAllocatorAMService 
are one-liners and quite self-explanatory. Please let me know if you think we 
need documentation there too.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be 
> overwritten by an NM heartbeat. The correct way would be to send it through 
> the NM heartbeat.






[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9819:

Attachment: YARN-9819.003.patch

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be 
> overwritten by an NM heartbeat. The correct way would be to send it through 
> the NM heartbeat.






[jira] [Commented] (YARN-9822) TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925901#comment-16925901
 ] 

Hadoop QA commented on YARN-9822:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  6s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
31s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
48s{color} | {color:green} hadoop-yarn-server-timelineservice-documentstore in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9822 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979865/YARN-9822-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0b015702443e 4.15.0-54-generic 

[jira] [Commented] (YARN-9728)  ResourceManager REST API can produce an illegal xml response

2019-09-09 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925899#comment-16925899
 ] 

Eric Yang commented on YARN-9728:
-

[~Prabhu Joseph] Thank you for the patch.

+1 for patch 007.  

[~tde] any comment before we bring this to closure?

>  ResourceManager REST API can produce an illegal xml response
> -
>
> Key: YARN-9728
> URL: https://issues.apache.org/jira/browse/YARN-9728
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, resourcemanager
>Affects Versions: 2.7.3
>Reporter: Thomas
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: IllegalResponseChrome.png, YARN-9728-001.patch, 
> YARN-9728-002.patch, YARN-9728-003.patch, YARN-9728-004.patch, 
> YARN-9728-005.patch, YARN-9728-006.patch, YARN-9728-007.patch
>
>
> When a Spark job throws an exception with a message containing a character 
> outside the range supported by XML 1.0, the application fails and the stack 
> trace is stored into the {{diagnostics}} field. So far, so good.
> But the issue occurs when we try to get the application information with the 
> ResourceManager REST API: the XML response will contain the illegal XML 1.0 
> character and will be invalid.
>  *+Examples of illegal characters in XML 1.0:+* 
>  * {{\u}}
>  * {{\u0001}}
>  * {{\u0002}}
>  * {{\u0003}}
>  * {{\u0004}}
> _For more information about supported characters :_
>  [https://www.w3.org/TR/xml/#charsets]
> *+Example of an illegal response from the Resource Manager API:+* 
> {code:xml}
> 
> 
>   application_1326821518301_0005
>   user1
>   job
>   a1
>   FINISHED
>   FAILED
>   100.0
>   History
>   
> http://host.domain.com:8088/proxy/application_1326821518301_0005/jobhistory/job/job_1326821518301_5_5
>   Exception in thread "main" java.lang.Exception: \u0001
>   at com..main(JobWithSpecialCharMain.java:6)
>   [...]
> 
> {code}
>  
> *+Example of a job to reproduce:+*
> {code:java}
> public class JobWithSpecialCharMain {
>  public static void main(String[] args) throws Exception {
>   throw new Exception("\u0001");
>  }
> }
> {code}
> {code:bash}
> javac -d . JobWithSpecialCharMain.java
> jar cvf repro.jar com/
> spark-submit --class com.JobWithSpecialCharMain --master yarn-cluster 
> repro.jar
> {code}
> !IllegalResponseChrome.png!
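One possible server-side mitigation, sketched here for illustration (a 
hypothetical helper, not the attached patch), is to strip characters that fall 
outside the XML 1.0 range before serializing the response:
{code:java}
// Remove code points not allowed by XML 1.0
// (https://www.w3.org/TR/xml/#charsets).
public static String stripInvalidXmlChars(String input) {
  StringBuilder sb = new StringBuilder(input.length());
  for (int i = 0; i < input.length(); ) {
    int cp = input.codePointAt(i);
    boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
    if (valid) {
      sb.appendCodePoint(cp);
    }
    i += Character.charCount(cp);
  }
  return sb.toString();
}
{code}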






[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925892#comment-16925892
 ] 

Íñigo Goiri commented on YARN-9819:
---

Let's add javadocs to the new methods (and maybe the one we touched).
I'd like to track the new {{OpportunisticContainersStatus}}.
The private methods in {{TestOpportunisticContainerAllocatorAMService}} could 
also benefit from documentation.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that can be 
> overwritten by an NM heartbeat. The correct way would be to send it through 
> the NM heartbeat.






[jira] [Updated] (YARN-9822) TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.

2019-09-09 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9822:

Attachment: YARN-9822-001.patch

> TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.
> -
>
> Key: YARN-9822
> URL: https://issues.apache.org/jira/browse/YARN-9822
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9822-001.patch
>
>
> TimelineCollectorWebService#putEntities is blocked when ATSv2 HBase is down. 
> YARN-9374 prevents the threads from getting blocked when it has already 
> identified that HBase is down before accessing HBase. TimelineCollector can 
> check whether the writer backend is up before locking the writer.
> {code}
>   synchronized (writer) {
>   response = writeTimelineEntities(entities, callerUgi);
>   flushBufferedTimelineEntities();
> }
> {code}
> {code}
> "qtp183259297-80" #80 daemon prio=5 os_prio=0 tid=0x7f5f567fd000 
> nid=0x5fbb waiting for monitor entry [0x7f5f236d4000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:164)
>   - waiting to lock <0x0006c7c05770> (a 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService.putEntities(TimelineCollectorWebService.java:186)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1624)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> 

[jira] [Commented] (YARN-9805) Fine-grained SchedulerNode synchronization

2019-09-09 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925732#comment-16925732
 ] 

Ahmed Hussein commented on YARN-9805:
-

[~Jim_Brennan] Can you please take a look at the changes?

> Fine-grained SchedulerNode synchronization
> --
>
> Key: YARN-9805
> URL: https://issues.apache.org/jira/browse/YARN-9805
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9805.001.patch, YARN-9805.002.patch, 
> YARN-9805.003.patch
>
>
> YARN SchedulerNode and RMNode use synchronized methods for reading and 
> updating the resources.
> Instead, use read-write reentrant locks to provide fine-grained locking and 
> to avoid blocking concurrent reads, as in the sketch below.
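A minimal sketch of the read-write locking pattern described above 
(illustrative only, with simplified field names; not the actual SchedulerNode 
code):
{code:java}
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

public Resource getUnallocatedResource() {
  lock.readLock().lock();   // concurrent readers do not block each other
  try {
    return unallocatedResource;
  } finally {
    lock.readLock().unlock();
  }
}

public void deductUnallocatedResource(Resource res) {
  lock.writeLock().lock();  // writers still get exclusive access
  try {
    Resources.subtractFrom(unallocatedResource, res);
  } finally {
    lock.writeLock().unlock();
  }
}
{code}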






[jira] [Commented] (YARN-9815) ReservationACLsTestBase fails with NPE

2019-09-09 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925729#comment-16925729
 ] 

Ahmed Hussein commented on YARN-9815:
-

[~Jim_Brennan] I checked the failing test case. It is not related to my 
changes, and I verified that the test case passes successfully.

> ReservationACLsTestBase fails with NPE
> --
>
> Key: YARN-9815
> URL: https://issues.apache.org/jira/browse/YARN-9815
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9805.001.patch, YARN-9815.001.patch
>
>
> Running ReservationACLsTestBase throws an NPE with the FairScheduler. Old 
> revisions back in 2016 also throw the NPE.
> In the test case, QueueC does not have reservation ACLs, so 
> ReservationsACLsManager throws an NPE when it tries to access the ACL on 
> line 82 (see the null-safe sketch after the stack trace below).
> I still could not find the first revision that caused this test case to 
> fail. I stopped at bbfaf3c2712c9ba82b0f8423bdeb314bf505a692, which was 
> working fine.
> I am on OS X with Java 1.8.0_201.
>  
> {code:java}
> [ERROR] 
> testApplicationACLs[1](org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase)
>   Time elapsed: 1.897 s  <<< ERROR![ERROR] 
> testApplicationACLs[1](org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase)
>   Time elapsed: 1.897 s  <<< 
> ERROR!java.lang.NullPointerException:java.lang.NullPointerException at 
> org.apache.hadoop.yarn.server.resourcemanager.security.ReservationsACLsManager.checkAccess(ReservationsACLsManager.java:83)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.checkReservationACLs(ClientRMService.java:1527)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1290)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:511)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:645)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1001) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:929) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2921)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitReservation(ApplicationClientProtocolPBClientImpl.java:511)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase.submitReservation(ReservationACLsTestBase.java:447)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase.verifySubmitReservationSuccess(ReservationACLsTestBase.java:247)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ReservationACLsTestBase.testApplicationACLs(ReservationACLsTestBase.java:125)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) 
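For illustration, one possible null-safe guard in 
ReservationsACLsManager#checkAccess (field and method names are approximate; 
this is a sketch, not the attached patch):
{code:java}
Map<ReservationACL, AccessControlList> queueAcls =
    reservationAcls.get(queueName);
if (queueAcls == null || !queueAcls.containsKey(requestedAccess)) {
  // The queue (e.g. QueueC) has no reservation ACLs configured;
  // fall back to a default instead of dereferencing null.
  return true;
}
return queueAcls.get(requestedAccess).isUserAllowed(callerUGI);
{code}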

[jira] [Commented] (YARN-9814) JobHistoryServer can't delete aggregated files, if remote app root directory is created by NodeManager

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925728#comment-16925728
 ] 

Hadoop QA commented on YARN-9814:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m  
4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
8s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
43s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
58s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9814 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979850/YARN-9814.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 7b90213529c3 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60af879 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Comment Edited] (YARN-9823) NodeManager cannot get right ResourceTrack address in Federation mode

2019-09-09 Thread chaoli (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925714#comment-16925714
 ] 

chaoli edited comment on YARN-9823 at 9/9/19 2:01 PM:
--

This is a bug, because the RM does not deliver the resource tracker address in 
the federation heartbeat service.

There is also no handling code for it in 
*FederationRMFailoverProxyProvider.updateRMAddress*:
{code:java}
private void updateRMAddress(SubClusterInfo subClusterInfo) {
  if (subClusterInfo != null) {
if (protocol == ApplicationClientProtocol.class) {
  conf.set(YarnConfiguration.RM_ADDRESS,
  subClusterInfo.getClientRMServiceAddress());
} else if (protocol == ApplicationMasterProtocol.class) {
  conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS,
  subClusterInfo.getAMRMServiceAddress());
} else if (protocol == ResourceManagerAdministrationProtocol.class) {
  conf.set(YarnConfiguration.RM_ADMIN_ADDRESS,
  subClusterInfo.getRMAdminServiceAddress());
}
  }
}
{code}
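For illustration, the kind of branch that appears to be missing (hypothetical: 
SubClusterInfo does not currently expose a resource tracker address, so the 
getter below is made up):
{code:java}
} else if (protocol == ResourceTracker.class) {
  conf.set(YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
      subClusterInfo.getResourceTrackerServiceAddress()); // hypothetical getter
}
{code}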


was (Author: lichaojacobs):
This is a bug, because the RM does not deliver the resource tracker address in 
the federation heartbeat service.

There is also no handling code for it in 
*FederationRMFailoverProxyProvider.updateRMAddress*

!image-2019-09-09-21-58-09-017.png!

> NodeManager cannot get right ResourceTrack address in Federation mode
> -
>
> Key: YARN-9823
> URL: https://issues.apache.org/jira/browse/YARN-9823
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, nodemanager
>Affects Versions: 2.9.2
> Environment: h2. Hadoop:
> Hadoop 2.9.2 (some line number may not be right because we have merged some 
> 3.0+ patch)
> Security with Kerberos
> configure from 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/Federation.html]
> h2. Java:
> Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
> Kerberos:
>  
>  
>Reporter: qiwei huang
>Priority: Major
>
> {{the NM will infinitely try to connect to the wrong RM's resource tracker port}}
> {quote}{{INFO [main:RetryInvocationHandler@411] - java.net.ConnectException: 
> Call From standby.rm.server/10.122.138.139 to standby.rm.server:8031 
> failed on connection exception: java.net.ConnectException: Connection 
> refused; For more details see: 
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ResourceTrackerPBClientImpl.registerNodeManager over dev1 after 19 failover 
> attempts. Trying to failover after sleeping for 40497ms.}}
> {quote}
>  
> {{After changing *yarn.client.failover-proxy-provider* to 
> *org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider*,
>  the NodeManager cannot find the right ResourceTracker address:}}
> {quote}{{getRMHAId:233, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfKeyForRMInstance:294, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfValueForRMInstance:302, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfValueForRMInstance:314, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getSocketAddr:3341, YarnConfiguration (org.apache.hadoop.yarn.conf)}}
> {{getRMAddress:77, ServerRMProxy (org.apache.hadoop.yarn.server.api)}}
> {{run:144, FederationRMFailoverProxyProvider$1 
> (org.apache.hadoop.yarn.server.federation.failover)}}
> {{doPrivileged:-1, AccessController (java.security)}}
> {{doAs:422, Subject (javax.security.auth)}}
> {{doAs:1893, UserGroupInformation (org.apache.hadoop.security)}}
> {{getProxyInternal:141, FederationRMFailoverProxyProvider 
> (org.apache.hadoop.yarn.server.federation.failover)}}
> {{performFailover:192, FederationRMFailoverProxyProvider 
> (org.apache.hadoop.yarn.server.federation.failover)}}
> {{failover:217, RetryInvocationHandler$ProxyDescriptor 
> (org.apache.hadoop.io.retry)}}
> {{processRetryInfo:149, RetryInvocationHandler$Call 
> (org.apache.hadoop.io.retry)}}
> {{processWaitTimeAndRetryInfo:142, RetryInvocationHandler$Call 
> (org.apache.hadoop.io.retry)}}
> {{invokeOnce:107, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
> {{invoke:359, RetryInvocationHandler (org.apache.hadoop.io.retry)}}
> {{registerNodeManager:-1, $Proxy85 (com.sun.proxy)}}
> {{registerWithRM:378, NodeStatusUpdaterImpl 
> (org.apache.hadoop.yarn.server.nodemanager)}}
> {{serviceStart:252, NodeStatusUpdaterImpl 
> (org.apache.hadoop.yarn.server.nodemanager)}}
> {{start:194, AbstractService (org.apache.hadoop.service)}}
> {{serviceStart:121, CompositeService (org.apache.hadoop.service)}}
> {{start:194, AbstractService (org.apache.hadoop.service)}}
> {{initAndStartNodeManager:864, NodeManager 
> (org.apache.hadoop.yarn.server.nodemanager)}}
> {{main:931, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
> {quote}
> {{the Provider will try to 

[jira] [Commented] (YARN-9823) NodeManager cannot get right ResourceTrack address in Federation mode

2019-09-09 Thread chaoli (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925714#comment-16925714
 ] 

chaoli commented on YARN-9823:
--

This is a bug, because the RM does not deliver the resource tracker address in 
the federation heartbeat service.

There is also no handling code for it in 
*FederationRMFailoverProxyProvider.updateRMAddress*

!image-2019-09-09-21-58-09-017.png!

> NodeManager cannot get right ResourceTrack address in Federation mode
> -
>
> Key: YARN-9823
> URL: https://issues.apache.org/jira/browse/YARN-9823
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, nodemanager
>Affects Versions: 2.9.2
> Environment: h2. Hadoop:
> Hadoop 2.9.2 (some line number may not be right because we have merged some 
> 3.0+ patch)
> Security with Kerberos
> configure from 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/Federation.html]
> h2. Java:
> Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
> Kerberos:
>  
>  
>Reporter: qiwei huang
>Priority: Major
>
> {{the NM will infinitely try to connect to the wrong RM's resource tracker port}}
> {quote}{{INFO [main:RetryInvocationHandler@411] - java.net.ConnectException: 
> Call From standby.rm.server/10.122.138.139 to standby.rm.server:8031 
> failed on connection exception: java.net.ConnectException: Connection 
> refused; For more details see: 
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ResourceTrackerPBClientImpl.registerNodeManager over dev1 after 19 failover 
> attempts. Trying to failover after sleeping for 40497ms.}}
> {quote}
>  
> {{After changing *yarn.client.failover-proxy-provider* to 
> *org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider*,
>  the NodeManager cannot find the right ResourceTracker address:}}
> {quote}{{getRMHAId:233, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfKeyForRMInstance:294, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfValueForRMInstance:302, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getConfValueForRMInstance:314, HAUtil (org.apache.hadoop.yarn.conf)}}
> {{getSocketAddr:3341, YarnConfiguration (org.apache.hadoop.yarn.conf)}}
> {{getRMAddress:77, ServerRMProxy (org.apache.hadoop.yarn.server.api)}}
> {{run:144, FederationRMFailoverProxyProvider$1 
> (org.apache.hadoop.yarn.server.federation.failover)}}
> {{doPrivileged:-1, AccessController (java.security)}}
> {{doAs:422, Subject (javax.security.auth)}}
> {{doAs:1893, UserGroupInformation (org.apache.hadoop.security)}}
> {{getProxyInternal:141, FederationRMFailoverProxyProvider 
> (org.apache.hadoop.yarn.server.federation.failover)}}
> {{performFailover:192, FederationRMFailoverProxyProvider 
> (org.apache.hadoop.yarn.server.federation.failover)}}
> {{failover:217, RetryInvocationHandler$ProxyDescriptor 
> (org.apache.hadoop.io.retry)}}
> {{processRetryInfo:149, RetryInvocationHandler$Call 
> (org.apache.hadoop.io.retry)}}
> {{processWaitTimeAndRetryInfo:142, RetryInvocationHandler$Call 
> (org.apache.hadoop.io.retry)}}
> {{invokeOnce:107, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
> {{invoke:359, RetryInvocationHandler (org.apache.hadoop.io.retry)}}
> {{registerNodeManager:-1, $Proxy85 (com.sun.proxy)}}
> {{registerWithRM:378, NodeStatusUpdaterImpl 
> (org.apache.hadoop.yarn.server.nodemanager)}}
> {{serviceStart:252, NodeStatusUpdaterImpl 
> (org.apache.hadoop.yarn.server.nodemanager)}}
> {{start:194, AbstractService (org.apache.hadoop.service)}}
> {{serviceStart:121, CompositeService (org.apache.hadoop.service)}}
> {{start:194, AbstractService (org.apache.hadoop.service)}}
> {{initAndStartNodeManager:864, NodeManager 
> (org.apache.hadoop.yarn.server.nodemanager)}}
> {{main:931, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
> {quote}
> {{the Provider will try to find the main RM address in }}*{{getRMHAId:233,}}* 
> {{but it cannot find the right address because it only returns the local 
> address:}}
> {quote}{{if (!s.isUnresolved() && NetUtils.isLocalAddress(s.getAddress())) {}}
> {{ currentRMId = rmId.trim();}}
> {{ found++;}}
> {{}}}
> {quote}
> {{If the NM and RM are on the same node, and this RM is in standby, the NM 
> will infinitely call RPC to the RM.}}






[jira] [Commented] (YARN-9814) JobHistoryServer can't delete aggregated files, if remote app root directory is created by NodeManager

2019-09-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925626#comment-16925626
 ] 

Adam Antal commented on YARN-9814:
--

Valid Jenkins error: forgot to add license header. 

Fixed in [^YARN-9814.003.patch].

> JobHistoryServer can't delete aggregated files, if remote app root directory 
> is created by NodeManager
> --
>
> Key: YARN-9814
> URL: https://issues.apache.org/jira/browse/YARN-9814
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, yarn
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9814.001.patch, YARN-9814.002.patch, 
> YARN-9814.003.patch
>
>
> If remote-app-log-dir is not created before starting Yarn processes, the 
> NodeManager creates it during the init of AppLogAggregator service. In a 
> custom system the primary group of the yarn user (which starts the NM/RM 
> daemons) is not hadoop, but set to a more restricted group (say yarn). If 
> NodeManager creates the folder it derives the group of the folder from the 
> primary group of the login user (which is yarn:yarn in this case), thus 
> setting the root log folder and all its subfolders to the yarn group, 
> ultimately making it inaccessible to other processes - e.g. the 
> JobHistoryServer's AggregatedLogDeletionService.
> I suggest making this group configurable. If the new configuration is not 
> set, we can still stick to the existing behaviour. 
> Creating the root app-log-dir each time during the setup of such a system is 
> a bit error-prone, and an end user can easily forget it. I think the best 
> place for this step is the LogAggregationService, which is already 
> responsible for creating the folder.
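A minimal sketch of the configurable-group idea (the property name and the 
code below are assumptions for illustration, not the attached patch):
{code:java}
// Hypothetical config key; the real patch may name it differently.
String group = conf.get("yarn.nodemanager.remote-app-log-dir.group");
remoteFS.mkdirs(remoteRootLogDir, new FsPermission((short) 01777));
if (group != null && !group.isEmpty()) {
  // A null owner keeps the current owner and overrides only the group.
  remoteFS.setOwner(remoteRootLogDir, null, group);
}
{code}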






[jira] [Updated] (YARN-9814) JobHistoryServer can't delete aggregated files, if remote app root directory is created by NodeManager

2019-09-09 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9814:
-
Attachment: YARN-9814.003.patch

> JobHistoryServer can't delete aggregated files, if remote app root directory 
> is created by NodeManager
> --
>
> Key: YARN-9814
> URL: https://issues.apache.org/jira/browse/YARN-9814
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, yarn
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9814.001.patch, YARN-9814.002.patch, 
> YARN-9814.003.patch
>
>
> If remote-app-log-dir is not created before starting Yarn processes, the 
> NodeManager creates it during the init of AppLogAggregator service. In a 
> custom system the primary group of the yarn user (which starts the NM/RM 
> daemons) is not hadoop, but set to a more restricted group (say yarn). If 
> NodeManager creates the folder it derives the group of the folder from the 
> primary group of the login user (which is yarn:yarn in this case), thus 
> setting the root log folder and all its subfolders to the yarn group, 
> ultimately making it inaccessible to other processes - e.g. the 
> JobHistoryServer's AggregatedLogDeletionService.
> I suggest making this group configurable. If the new configuration is not 
> set, we can still stick to the existing behaviour. 
> Creating the root app-log-dir each time during the setup of such a system is 
> a bit error-prone, and an end user can easily forget it. I think the best 
> place for this step is the LogAggregationService, which is already 
> responsible for creating the folder.






[jira] [Commented] (YARN-9814) JobHistoryServer can't delete aggregated files, if remote app root directory is created by NodeManager

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925570#comment-16925570
 ] 

Hadoop QA commented on YARN-9814:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
50s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
39s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9814 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979826/YARN-9814.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux e0dfb1660e93 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60af879 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted

2019-09-09 Thread Rohith Sharma K S (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-9820:

Fix Version/s: (was: 3.2.2)
   3.2.1

> RM logs InvalidStateTransitionException when app is submitted
> -
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Critical
> Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.4
>
> Attachments: YARN-9820-001.patch, YARN-9820-002.patch, 
> YARN-9820-003.patch
>
>
> It is observed that RM logs InvalidStateTransitionException. Not sure what 
> the impact is, but it's better to handle it. 
> {noformat}
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED 
> on event = LAUNCHED
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the 
> launch time for applicationId: application_1567926390667_0001, attemptId: 
> appattempt_1567926390667_0001_01launchTime: 1567926646327
> 2019-09-08 12:40:46,328 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating 
> info for app: application_1567926390667_0001
> 2019-09-08 12:40:46,332 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: 
> application_1567926390667_0001 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> APP_UPDATE_SAVED at ACCEPTED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted

2019-09-09 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925567#comment-16925567
 ] 

Rohith Sharma K S commented on YARN-9820:
-

I backported this to branch-3.2.1 and updated the fix version. 

> RM logs InvalidStateTransitionException when app is submitted
> -
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Critical
> Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.4
>
> Attachments: YARN-9820-001.patch, YARN-9820-002.patch, 
> YARN-9820-003.patch
>
>
> It is observed that RM logs InvalidStateTransitionException. Not sure what 
> the impact is, but it's better to handle it. 
> {noformat}
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED 
> on event = LAUNCHED
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the 
> launch time for applicationId: application_1567926390667_0001, attemptId: 
> appattempt_1567926390667_0001_01launchTime: 1567926646327
> 2019-09-08 12:40:46,328 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating 
> info for app: application_1567926390667_0001
> 2019-09-08 12:40:46,332 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: 
> application_1567926390667_0001 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> APP_UPDATE_SAVED at ACCEPTED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9823) NodeManager cannot get the right ResourceTracker address in Federation mode

2019-09-09 Thread qiwei huang (Jira)
qiwei huang created YARN-9823:
-

 Summary: NodeManager cannot get the right ResourceTracker address in 
Federation mode
 Key: YARN-9823
 URL: https://issues.apache.org/jira/browse/YARN-9823
 Project: Hadoop YARN
  Issue Type: Bug
  Components: federation, nodemanager
Affects Versions: 2.9.2
 Environment: h2. Hadoop:

Hadoop 2.9.2 (some line numbers may not be right because we have merged some 
3.0+ patches)

Security with Kerberos

Configured following 
[https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/Federation.html]
h2. Java:

Java(TM) SE Runtime Environment (build 1.8.0_77-b03)

Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)

Kerberos:

Reporter: qiwei huang


The NM will retry connecting to the wrong RM's ResourceTracker port indefinitely:
{quote}INFO [main:RetryInvocationHandler@411] - java.net.ConnectException: 
Call From standby.rm.server/10.122.138.139 to standby.rm.server:8031 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while 
invoking ResourceTrackerPBClientImpl.registerNodeManager over dev1 after 19 
failover attempts. Trying to failover after sleeping for 40497ms.
{quote}

After changing *yarn.client.failover-proxy-provider* to 
*org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider*,
the NodeManager cannot find the right ResourceTracker address:
{quote}{{getRMHAId:233, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfKeyForRMInstance:294, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfValueForRMInstance:302, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getConfValueForRMInstance:314, HAUtil (org.apache.hadoop.yarn.conf)}}
{{getSocketAddr:3341, YarnConfiguration (org.apache.hadoop.yarn.conf)}}
{{getRMAddress:77, ServerRMProxy (org.apache.hadoop.yarn.server.api)}}
{{run:144, FederationRMFailoverProxyProvider$1 
(org.apache.hadoop.yarn.server.federation.failover)}}
{{doPrivileged:-1, AccessController (java.security)}}
{{doAs:422, Subject (javax.security.auth)}}
{{doAs:1893, UserGroupInformation (org.apache.hadoop.security)}}
{{getProxyInternal:141, FederationRMFailoverProxyProvider 
(org.apache.hadoop.yarn.server.federation.failover)}}
{{performFailover:192, FederationRMFailoverProxyProvider 
(org.apache.hadoop.yarn.server.federation.failover)}}
{{failover:217, RetryInvocationHandler$ProxyDescriptor 
(org.apache.hadoop.io.retry)}}
{{processRetryInfo:149, RetryInvocationHandler$Call 
(org.apache.hadoop.io.retry)}}
{{processWaitTimeAndRetryInfo:142, RetryInvocationHandler$Call 
(org.apache.hadoop.io.retry)}}
{{invokeOnce:107, RetryInvocationHandler$Call (org.apache.hadoop.io.retry)}}
{{invoke:359, RetryInvocationHandler (org.apache.hadoop.io.retry)}}
{{registerNodeManager:-1, $Proxy85 (com.sun.proxy)}}
{{registerWithRM:378, NodeStatusUpdaterImpl 
(org.apache.hadoop.yarn.server.nodemanager)}}
{{serviceStart:252, NodeStatusUpdaterImpl 
(org.apache.hadoop.yarn.server.nodemanager)}}
{{start:194, AbstractService (org.apache.hadoop.service)}}
{{serviceStart:121, CompositeService (org.apache.hadoop.service)}}
{{start:194, AbstractService (org.apache.hadoop.service)}}
{{initAndStartNodeManager:864, NodeManager 
(org.apache.hadoop.yarn.server.nodemanager)}}
{{main:931, NodeManager (org.apache.hadoop.yarn.server.nodemanager)}}
{quote}
The provider tries to resolve the active RM's address in *getRMHAId:233*, 
but it cannot find the right address, because the lookup simply returns 
whichever RM address is local:
{code:java}
if (!s.isUnresolved() && NetUtils.isLocalAddress(s.getAddress())) {
  currentRMId = rmId.trim();
  found++;
}
{code}
If the NM and an RM run on the same node and that RM is in standby state, 
the NM will keep calling that RM over RPC indefinitely.
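
To make the failure mode concrete, here is a toy, self-contained 
reconstruction of the resolution logic quoted above. The hostnames and the 
string comparison standing in for NetUtils.isLocalAddress are illustrative 
assumptions, not the actual HAUtil code:
{code:java}
public class RmIdResolutionDemo {
  public static void main(String[] args) {
    String[] rmIds = {"rm1", "rm2"};
    String[] rmHosts = {"active.rm.server", "standby.rm.server"};
    String localHost = "standby.rm.server"; // the host the NM itself runs on

    // getRMHAId picks whichever rmId resolves to a local address; it never
    // checks which RM is currently active.
    String currentRMId = null;
    for (int i = 0; i < rmIds.length; i++) {
      boolean isLocal = rmHosts[i].equals(localHost); // stand-in for NetUtils.isLocalAddress
      if (isLocal) {
        currentRMId = rmIds[i];
      }
    }

    // Prints "rm2": the colocated standby RM wins, so registerNodeManager
    // keeps failing against standby.rm.server:8031 as in the log above.
    System.out.println("resolved RM id: " + currentRMId);
  }
}
{code}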



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925490#comment-16925490
 ] 

Abhishek Modi commented on YARN-9821:
-

Sure [~rohithsharma]. I am leaving this Jira unresolved; you can mark it 
as resolved after you backport it to the 3.2 branches. Thanks.

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>   at 
> 

[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Rohith Sharma K S (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925486#comment-16925486
 ] 

Rohith Sharma K S commented on YARN-9821:
-

Only for branch-3.2. Maybe I can back port it myself, since it is required 
for branch-3.2.1 as well.

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>   at 
> 

[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925484#comment-16925484
 ] 

Abhishek Modi commented on YARN-9816:
-

Thanks [~Prabhu Joseph]. The changes look good to me. I will commit shortly.

> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
> ---
>
> Key: YARN-9816
> URL: https://issues.apache.org/jira/browse/YARN-9816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.  
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r--   3 hdfs hadoop  0 2019-09-06 16:34 
> /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> at com.sun.proxy.$Proxy15.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1076)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1088)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>  {code}
> One of our users tried to distcp the hdfs://ats/active dir. The distcp job 
> created the 
> temp file 
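
For context, a minimal sketch of the kind of guard that avoids this failure: 
recurse only into directories, so a stray file directly under the active dir 
(such as the distcp temp file above) cannot drive the scan into unbounded 
recursion. This is an illustrative reconstruction, not the actual YARN-9816 
patch; scanActiveLogs here is a simplified stand-in for the real method:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ActiveLogScanSketch {
  // Simplified stand-in for EntityGroupFSTimelineStore#scanActiveLogs.
  static int scanActiveLogs(FileSystem fs, Path dir) throws IOException {
    int logsToScan = 0;
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    while (it.hasNext()) {
      FileStatus stat = it.next();
      if (stat.isDirectory()) {
        // Only directories are descended into.
        logsToScan += scanActiveLogs(fs, stat.getPath());
      } else {
        // A plain file (e.g. .distcp.tmp.*) is counted but never recursed on.
        logsToScan++;
      }
    }
    return logsToScan;
  }
}
{code}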

[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925483#comment-16925483
 ] 

Hudson commented on YARN-9821:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17260 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17260/])
YARN-9821. NM hangs at serviceStop when ATSV2 Backend Hbase is Down. (abmodi: 
rev 60af8793b45b4057101a22e4248d7ca022b52d79)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/HBaseTimelineWriterImpl.java
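
For context, one general pattern for keeping a blocking close() from hanging 
shutdown indefinitely is to run it on a helper thread and abandon it after a 
timeout. This is a hypothetical sketch of that pattern, not necessarily what 
the committed change to HBaseTimelineWriterImpl does:
{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class BoundedClose {
  // Run a potentially blocking close() on a separate thread and give up
  // after the given timeout so serviceStop cannot block forever.
  public static void closeWithTimeout(AutoCloseable resource,
      long timeout, TimeUnit unit) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<?> pendingClose = executor.submit(() -> {
        try {
          resource.close();
        } catch (Exception e) {
          System.err.println("close failed: " + e);
        }
      });
      pendingClose.get(timeout, unit);
    } catch (TimeoutException e) {
      System.err.println("close timed out; abandoning it");
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException e) {
      System.err.println("close threw: " + e.getCause());
    } finally {
      executor.shutdownNow(); // interrupts the close attempt if still running
    }
  }
}
{code}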


> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at 

[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925481#comment-16925481
 ] 

Abhishek Modi commented on YARN-9821:
-

Thanks [~Prabhu Joseph] for the patch and [~rohithsharma] for additional 
review. I have committed it to trunk.

[~rohithsharma] should we commit it to the 3.2 and 3.1 branches as well?

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> 

[jira] [Updated] (YARN-9814) JobHistoryServer can't delete aggregated files, if remote app root directory is created by NodeManager

2019-09-09 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9814:
-
Attachment: YARN-9814.002.patch

> JobHistoryServer can't delete aggregated files, if remote app root directory 
> is created by NodeManager
> --
>
> Key: YARN-9814
> URL: https://issues.apache.org/jira/browse/YARN-9814
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, yarn
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9814.001.patch, YARN-9814.002.patch
>
>
> If remote-app-log-dir is not created before starting Yarn processes, the 
> NodeManager creates it during the init of the AppLogAggregator service. In a 
> custom system the primary group of the yarn user (which starts the NM/RM 
> daemons) is not hadoop, but set to a more restricted group (say yarn). If the 
> NodeManager creates the folder, it derives the folder's group from the 
> primary group of the login user (which is yarn:yarn in this case), thus 
> setting the root log folder and all its subfolders to the yarn group, 
> ultimately making it inaccessible to other processes - e.g. the 
> JobHistoryServer's AggregatedLogDeletionService.
> I suggest making this group configurable. If this new configuration is not 
> set, we can still stick to the existing behaviour. 
> Creating the root app-log-dir each time during the setup of this system is a 
> bit error prone, and an end user can easily forget it. I think the best place 
> to put this step is the LogAggregationService, which was responsible for 
> creating the folder already.
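
For illustration, a minimal sketch of the proposed behaviour, assuming a new 
configuration key for the group. The key 
yarn.nodemanager.remote-app-log-dir.group and the helper below are 
hypothetical, not existing Hadoop properties or code:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RemoteRootLogDirSketch {
  // Create the remote app-log root with an explicitly configured group
  // instead of inheriting the daemon user's primary group.
  static void createRemoteRootLogDir(Configuration conf) throws IOException {
    Path root = new Path(
        conf.get("yarn.nodemanager.remote-app-log-dir", "/tmp/logs"));
    FileSystem fs = root.getFileSystem(conf);
    if (!fs.exists(root)) {
      fs.mkdirs(root, new FsPermission((short) 01777));
      // Hypothetical key; if unset, fall back to the existing behaviour.
      String group = conf.get("yarn.nodemanager.remote-app-log-dir.group");
      if (group != null) {
        // A null username keeps the current owner and changes only the group.
        fs.setOwner(root, null, group);
      }
    }
  }
}
{code}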



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9512) [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLC

2019-09-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925450#comment-16925450
 ] 

Adam Antal commented on YARN-9512:
--

Note that the related failure has been fixed in YARN-9813. We can hopefully 
use a similar approach here.

> [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: 
> class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader
> ---
>
> Key: YARN-9512
> URL: https://issues.apache.org/jira/browse/YARN-9512
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Siyao Meng
>Assignee: Szilard Nemeth
>Priority: Major
>
> Found in a Maven JDK 11 unit test run. Compiled on JDK 8:
> {code}
> [ERROR] 
> testCustomizedAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
>   Time elapsed: 0.019 s  <<< ERROR!java.lang.ClassCastException: class 
> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
> java.net.URLClassLoader are in module java.base of loader 'bootstrap')
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices$ServiceC.getMetaData(TestAuxServices.java:197)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:315)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testCustomizedAuxServiceClassPath(TestAuxServices.java:344)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9813) RM does not start on JDK11 when UIv2 is enabled

2019-09-09 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925449#comment-16925449
 ] 

Adam Antal commented on YARN-9813:
--

You were right [~eyang]. Thanks for the correction.

Thanks for the commit [~leftnoteasy].

> RM does not start on JDK11 when UIv2 is enabled
> ---
>
> Key: YARN-9813
> URL: https://issues.apache.org/jira/browse/YARN-9813
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, yarn
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Critical
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: YARN-9813.001.patch, YARN-9813.002.patch, 
> YARN-9813.003.patch
>
>
> When starting a ResourceManager on JDK11 with UIv2 enabled, RM startup fails 
> with the following message:
> {noformat}
> Error starting ResourceManager
> java.lang.ClassCastException: class 
> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
> java.net.URLClassLoader are in module java.base of loader 'bootstrap')
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:1190)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1333)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1531)
> {noformat}
> It is a known issue that the systemClassLoader is no longer a URLClassLoader 
> from JDK9 onwards (see the related UT failure: YARN-9512). 
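
For context, a minimal sketch of a JDK-version-independent alternative to 
casting the system class loader: derive the URL list from java.class.path 
instead. This is illustrative only and not necessarily how the committed 
patch does it:
{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderDemo {
  public static void main(String[] args) throws Exception {
    ClassLoader system = ClassLoader.getSystemClassLoader();
    // Fails on JDK 9+, where the system loader is no longer a URLClassLoader:
    // URL[] urls = ((URLClassLoader) system).getURLs();

    // Version-independent alternative: build the URLs from java.class.path.
    String[] entries =
        System.getProperty("java.class.path").split(File.pathSeparator);
    URL[] urls = new URL[entries.length];
    for (int i = 0; i < entries.length; i++) {
      urls[i] = new File(entries[i]).toURI().toURL();
    }
    try (URLClassLoader loader = new URLClassLoader(urls, system)) {
      System.out.println("classpath entries: " + urls.length);
    }
  }
}
{code}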



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted

2019-09-09 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925430#comment-16925430
 ] 

Prabhu Joseph commented on YARN-9820:
-

Thanks [~jhung].

> RM logs InvalidStateTransitionException when app is submitted
> -
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Critical
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9820-001.patch, YARN-9820-002.patch, 
> YARN-9820-003.patch
>
>
> It is observed that RM logs InvalidStateTransitionException. Not sure what 
> the impact is, but it's better to handle it. 
> {noformat}
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED 
> on event = LAUNCHED
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the 
> launch time for applicationId: application_1567926390667_0001, attemptId: 
> appattempt_1567926390667_0001_01launchTime: 1567926646327
> 2019-09-08 12:40:46,328 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating 
> info for app: application_1567926390667_0001
> 2019-09-08 12:40:46,332 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: 
> application_1567926390667_0001 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> APP_UPDATE_SAVED at ACCEPTED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted

2019-09-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925429#comment-16925429
 ] 

Hudson commented on YARN-9820:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17258 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17258/])
YARN-9820. RM logs InvalidStateTransitionException when app is (jhung: rev 
387c332b64e4b785e383162c9d6a3613aca4ac5c)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
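
For context, a self-contained toy of the state-machine pattern involved: 
registering an explicit (here no-op) transition for the offending event type 
is one way to keep the dispatcher from raising 
InvalidStateTransitionException. The enums and no-op arc below are 
illustrative assumptions, not the actual YARN-9820 change:
{code:java}
import org.apache.hadoop.yarn.state.InvalidStateTransitionException;
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class AppStateDemo {
  enum State { ACCEPTED, RUNNING }
  enum EventType { APP_UPDATE_SAVED, ATTEMPT_REGISTERED }

  static class Event {
    final EventType type;
    Event(EventType type) { this.type = type; }
  }

  // Without the ACCEPTED -> ACCEPTED arc below, an APP_UPDATE_SAVED event
  // arriving in ACCEPTED triggers InvalidStateTransitionException.
  private static final StateMachineFactory<AppStateDemo, State, EventType, Event>
      FACTORY =
          new StateMachineFactory<AppStateDemo, State, EventType, Event>(
              State.ACCEPTED)
          .addTransition(State.ACCEPTED, State.ACCEPTED,
              EventType.APP_UPDATE_SAVED)
          .installTopology();

  private final StateMachine<State, EventType, Event> stateMachine =
      FACTORY.make(this);

  void handle(Event event) {
    try {
      stateMachine.doTransition(event.type, event);
    } catch (InvalidStateTransitionException e) {
      System.err.println("can't handle " + event.type + " at "
          + stateMachine.getCurrentState());
    }
  }

  public static void main(String[] args) {
    new AppStateDemo().handle(new Event(EventType.APP_UPDATE_SAVED)); // no-op
  }
}
{code}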


> RM logs InvalidStateTransitionException when app is submitted
> -
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Critical
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9820-001.patch, YARN-9820-002.patch, 
> YARN-9820-003.patch
>
>
> It is observed that RM logs InvalidStateTransitionException. Not sure what 
> the impact is, but it's better to handle it. 
> {noformat}
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED 
> on event = LAUNCHED
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the 
> launch time for applicationId: application_1567926390667_0001, attemptId: 
> appattempt_1567926390667_0001_01launchTime: 1567926646327
> 2019-09-08 12:40:46,328 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating 
> info for app: application_1567926390667_0001
> 2019-09-08 12:40:46,332 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: 
> application_1567926390667_0001 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> APP_UPDATE_SAVED at ACCEPTED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted

2019-09-09 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925422#comment-16925422
 ] 

Prabhu Joseph commented on YARN-9820:
-

The test case failure is not related to this patch; it will be fixed by YARN-7721.

> RM logs InvalidStateTransitionException when app is submitted
> -
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Critical
> Attachments: YARN-9820-001.patch, YARN-9820-002.patch, 
> YARN-9820-003.patch
>
>
> It is observed that RM logs InvalidStateTransitionException. Not sure what 
> the impact is, but it's better to handle it. 
> {noformat}
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED 
> on event = LAUNCHED
> 2019-09-08 12:40:46,327 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the 
> launch time for applicationId: application_1567926390667_0001, attemptId: 
> appattempt_1567926390667_0001_01launchTime: 1567926646327
> 2019-09-08 12:40:46,328 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating 
> info for app: application_1567926390667_0001
> 2019-09-08 12:40:46,332 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: 
> application_1567926390667_0001 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> APP_UPDATE_SAVED at ACCEPTED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7721) TestContinuousScheduling fails sporadically with NPE

2019-09-09 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925421#comment-16925421
 ] 

Prabhu Joseph commented on YARN-7721:
-

[~snemeth] Can you review this Jira when you get time? I have observed this 
issue in YARN-9820. Thanks.

> TestContinuousScheduling fails sporadically with NPE
> 
>
> Key: YARN-7721
> URL: https://issues.apache.org/jira/browse/YARN-7721
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Jason Lowe
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7721.001.patch
>
>
> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime is 
> failing sporadically with an NPE in precommit builds, and I can usually 
> reproduce it locally after a few tries:
> {noformat}
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.085 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:383)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> [...]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted

2019-09-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925414#comment-16925414
 ] 

Hadoop QA commented on YARN-9820:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 40s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9820 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12979813/YARN-9820-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d5975784ac42 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3b9584d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24774/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24774/testReport/ |
| Max. process+thread count | 823 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: