[jira] [Commented] (YARN-10909) AbstractCSQueue: Check for methods added for test code but not annotated with VisibleForTesting

2021-09-08 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412303#comment-17412303
 ] 

Tao Yang commented on YARN-10909:
-

Hi, [~jackwangcs]. The VisibleForTesting annotation should be used only for 
methods that are called exclusively from test code.

For the PR, I can see that AbstractCSQueue#hasChildQueues and 
AbstractCSQueue#getLastSubmittedTimestamp are called in both production and 
test scope, so the VisibleForTesting annotation does not fit them. Please take 
a look.
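
For reference, a minimal sketch of the convention (assuming Guava's 
VisibleForTesting annotation; the class below is a simplified stand-in, not 
the actual AbstractCSQueue):
{code:java}
// Minimal sketch of the convention; ExampleQueue is a simplified stand-in.
import com.google.common.annotations.VisibleForTesting;

public class ExampleQueue {
  private float maxCapacity;

  // Called from production code as well -- no annotation.
  public float getMaxCapacity() {
    return maxCapacity;
  }

  // Called only from test code -- annotate it.
  @VisibleForTesting
  void setMaxCapacity(float maxCapacity) {
    this.maxCapacity = maxCapacity;
  }
}
{code}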

> AbstractCSQueue: Check for methods added for test code but not annotated with 
> VisibleForTesting
> ---
>
> Key: YARN-10909
> URL: https://issues.apache.org/jira/browse/YARN-10909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: jackwangcs
>Priority: Minor
>  Labels: newbie, pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For example, AbstractCSQueue#setMaxCapacity(float) is only used for testing, 
> but is not annotated. There may be other such methods in this class.






[jira] [Commented] (YARN-10903) Too many "Failed to accept allocation proposal" because of wrong Headroom check for DRF

2021-09-08 Thread Tao Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412281#comment-17412281
 ] 

Tao Yang commented on YARN-10903:
-

Thanks [~jackwangcs] for raising this issue, which may generate invalid 
proposals that slow down the normal scheduling process. Good catch!

The PR generally LGTM; just some minor checkstyle warnings need to be fixed, 
please take a look.

> Too many "Failed to accept allocation proposal" because of wrong Headroom 
> check for DRF
> ---
>
> Key: YARN-10903
> URL: https://issues.apache.org/jira/browse/YARN-10903
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: jackwangcs
>Assignee: jackwangcs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The headroom check in `ParentQueue.canAssign` and 
> `RegularContainerAllocator#checkHeadroom` does not consider the DRF cases.
> This causes a lot of "Failed to accept allocation proposal" events when a 
> queue is nearly fully used.
> In the log:
> Headroom: memory:256, vCores:729
> Request: memory:56320, vCores:5
> clusterResource: memory:673966080, vCores:110494
> If DRF is used, then
> {code:java}
> Resources.greaterThanOrEqual(rc, clusterResource, Resources.add(
> currentResourceLimits.getHeadroom(), resourceCouldBeUnReserved),
> required); {code}
> will be true, but in fact we cannot allocate resources to the request due to 
> the max limit (not enough memory).
> {code:java}
> 2021-07-21 23:49:39,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1626747977559_95859 
> headRoom=<memory:256, vCores:729> currentConsumption=0
> 2021-07-21 23:49:39,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator:
> Request={AllocationRequestId: -1, Priority: 1, Capability: <memory:56320, 
> vCores:5>, # Containers: 19, Location: *, Relax Locality: true, Execution 
> Type Request: null, Node Label Expression: prod-best-effort-node}
> ...
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Try to commit allocation proposal=New 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest:
>  ALLOCATED=[(Application=appattempt_1626747977559_95859_01; 
> Node=:8041; Resource=)]
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
>  userLimit is fetched. userLimit=, 
> userSpecificUserLimit=, 
> schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Headroom calculation for user x:  userLimit= 
> queueMaxAvailRes= consumed= 
> partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-07-21 23:49:39,013 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
>  {code}
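
To illustrate the mismatch with the numbers in the description above, here is 
a self-contained sketch (plain Java, not the actual Hadoop 
Resources/DominantResourceCalculator API):
{code:java}
// Why a DRF (dominant-share) comparison accepts a request that a
// per-resource check would reject; numbers are from the log above.
public class DrfHeadroomSketch {
  public static void main(String[] args) {
    long headroomMem = 256, headroomVcores = 729;        // Headroom
    long requestMem = 56320, requestVcores = 5;          // Request
    long clusterMem = 673966080, clusterVcores = 110494; // clusterResource

    // DRF compares dominant shares relative to the cluster.
    double headroomShare = Math.max((double) headroomMem / clusterMem,
        (double) headroomVcores / clusterVcores);        // vCores dominate
    double requestShare = Math.max((double) requestMem / clusterMem,
        (double) requestVcores / clusterVcores);         // memory dominates

    System.out.println("DRF check passes: " + (headroomShare >= requestShare));
    // A component-wise check catches the memory shortfall:
    boolean fits = headroomMem >= requestMem && headroomVcores >= requestVcores;
    System.out.println("Component-wise check passes: " + fits);
  }
}
{code}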






[jira] [Assigned] (YARN-10893) Add metrics for getClusterMetrics and getApplications APIs in FederationClientInterceptor

2021-09-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned YARN-10893:
--

Assignee: Akshat Bordia

> Add metrics for getClusterMetrics and getApplications APIs in 
> FederationClientInterceptor
> -
>
> Key: YARN-10893
> URL: https://issues.apache.org/jira/browse/YARN-10893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Akshat Bordia
>Assignee: Akshat Bordia
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently the getClusterMetrics and getApplications APIs in 
> FederationClientInterceptor do not record any metrics. We need to add metrics 
> for latency and for the successful and failed attempt counts.
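
A rough sketch of the recording pattern being asked for (class, field and 
method names here are illustrative, not the actual FederationClientInterceptor 
code):
{code:java}
// Illustrative only: record latency plus success/failure counts around a call.
public class ApiMetricsSketch {
  private long succeeded, failed, totalLatencyMs;

  public <T> T recordCall(java.util.concurrent.Callable<T> call)
      throws Exception {
    long start = System.currentTimeMillis();
    try {
      T result = call.call();
      succeeded++;           // successful attempt count
      return result;
    } catch (Exception e) {
      failed++;              // failed attempt count
      throw e;
    } finally {
      totalLatencyMs += System.currentTimeMillis() - start; // latency
    }
  }
}
{code}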






[jira] [Updated] (YARN-10910) AbstractCSQueue#setupQueueConfigs: Separate validation logic from initialization logic

2021-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10910:
--
Labels: pull-request-available  (was: )

> AbstractCSQueue#setupQueueConfigs: Separate validation logic from 
> initialization logic
> --
>
> Key: YARN-10910
> URL: https://issues.apache.org/jira/browse/YARN-10910
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> AbstractCSQueue#setupQueueConfigs contains both initialization and validation 
> logic. The task is to factor the validation logic out of this method into a 
> separate method.
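
The shape of the refactor might look like this (a hypothetical sketch; class 
and field names are simplified stand-ins, not the actual patch):
{code:java}
// Hypothetical sketch: initialization and validation as separate steps.
public class QueueConfigSketch {
  private float capacity;
  private float maxCapacity;

  public void setupQueueConfigs(float capacity, float maxCapacity) {
    // initialization: assign fields from configuration
    this.capacity = capacity;
    this.maxCapacity = maxCapacity;
    // validation, factored into its own method
    validateQueueConfigs();
  }

  private void validateQueueConfigs() {
    if (maxCapacity < capacity) {
      throw new IllegalArgumentException(
          "maxCapacity must be greater than or equal to capacity");
    }
  }
}
{code}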






[jira] [Commented] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412085#comment-17412085
 ] 

Gergely Pollák commented on YARN-10870:
---

The patch applies cleanly to 3.1, 3.2 and 3.3 (and yes, I know 3.1 is EOL, but 
I accidentally checked and uploaded it as well; we don't necessarily need to 
backport it there), so I've uploaded the unchanged patch for each branch.

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, 
> YARN-10870.branch-3.3.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
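
For reference, the filter described above is enabled in yarn-site.xml like so:
{code:xml}
<!-- yarn-site.xml: restrict app listings to the requesting user -->
<property>
  <name>yarn.webapp.filter-entity-list-by-user</name>
  <value>true</value>
</property>
{code}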






[jira] [Updated] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollák updated YARN-10870:
--
Attachment: YARN-10870.branch-3.3.002.patch

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch, 
> YARN-10870.branch-3.3.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Updated] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollák updated YARN-10870:
--
Attachment: (was: YARN-10870.branch-3.2.001.patch)

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Updated] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollák updated YARN-10870:
--
Attachment: YARN-10870.branch-3.2.002.patch
YARN-10870.branch-3.1.002.patch

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Updated] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollák updated YARN-10870:
--
Attachment: YARN-10870.branch-3.2.001.patch

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch, 
> YARN-10870.branch-3.2.001.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Reopened] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reopened YARN-10870:
---

Reopening the jira: as discussed, [~shuzirra] will upload backport patches for 
the 3.3 / 3.2 branches.

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Commented] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412039#comment-17412039
 ] 

Szilard Nemeth commented on YARN-10870:
---

Thanks [~shuzirra] for working on this.
Latest patch LGTM, committed to trunk.


> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Updated] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10870:
--
Fix Version/s: 3.4.0

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted 
> by another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners who are present neither in 
> the application ACL -> mapreduce.job.acl-view-job, nor in the Queue ACL as an 
> admin of the queue to which the job was submitted (see [1] where both the 
> filter setting introduced by YARN-8319 & the ACL checks are performed).
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add Kerberos principals for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from a Mac, as an example (this assumes you've already 
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676






[jira] [Commented] (YARN-9975) Support proxy ACL user for CapacityScheduler

2021-09-08 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412023#comment-17412023
 ] 

Szilard Nemeth commented on YARN-9975:
--

[~epayne],
Just committed HADOOP-17857.
Can you confirm that one covers this jira as well? If so, can we close out this 
one?
Thanks.

> Support proxy ACL user for CapacityScheduler
> 
>
> Key: YARN-9975
> URL: https://issues.apache.org/jira/browse/YARN-9975
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> As commented in YARN-9698, I am opening a new jira for the proxy user feature.
> The background is that we have a long-running SQL thriftserver shared by many 
> users:
> {quote}{{user -> sql proxy -> sql thriftserver}}{quote}
> But we do not have keytabs for all users on the 'sql proxy'. We just use a 
> super user like 'sql_prc' to submit the 'sql thriftserver' application. To 
> support this, we should change the scheduler to support a proxy user ACL.
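
For context, a sketch of the existing core-site.xml impersonation config for 
the 'sql_prc' super user from the description (hadoop.proxyuser.* is the 
current HDFS-side mechanism; the host value is hypothetical, and the 
scheduler-side ACL requested here does not exist yet):
{code:xml}
<!-- core-site.xml: let the sql_prc super user impersonate others.
     Shown for context only; values are illustrative. -->
<property>
  <name>hadoop.proxyuser.sql_prc.hosts</name>
  <value>sql-proxy-host</value>
</property>
<property>
  <name>hadoop.proxyuser.sql_prc.groups</name>
  <value>*</value>
</property>
{code}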






[jira] [Assigned] (YARN-9048) Add znode hierarchy in Federation ZK State Store

2021-09-08 Thread jackwangcs (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jackwangcs reassigned YARN-9048:


Assignee: (was: jackwangcs)

> Add znode hierarchy in Federation ZK State Store
> 
>
> Key: YARN-9048
> URL: https://issues.apache.org/jira/browse/YARN-9048
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Priority: Major
>
> Similar to YARN-2962, consider introducing a znode hierarchy in the ZK 
> federation store for applications.
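
A possible layout, illustrative only (the znode names are hypothetical), 
splitting the application id into an intermediate node as YARN-2962 does for 
the RM state store:
{code}
# Illustrative only; znode names are hypothetical.
# Flat layout (today):  /federation/applications/application_1626747977559_95859
# Hierarchical layout:  /federation/applications/5985/application_1626747977559_95859
#                       (intermediate znode derived from the id, as in YARN-2962)
{code}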






[jira] [Assigned] (YARN-10909) AbstractCSQueue: Check for methods added for test code but not annotated with VisibleForTesting

2021-09-08 Thread jackwangcs (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jackwangcs reassigned YARN-10909:
-

Assignee: jackwangcs

> AbstractCSQueue: Check for methods added for test code but not annotated with 
> VisibleForTesting
> ---
>
> Key: YARN-10909
> URL: https://issues.apache.org/jira/browse/YARN-10909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: jackwangcs
>Priority: Minor
>  Labels: newbie, pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For example, AbstractCSQueue#setMaxCapacity(float) is only used for testing, 
> but is not annotated. There may be other such methods in this class.






[jira] [Commented] (YARN-9975) Support proxy ACL user for CapacityScheduler

2021-09-08 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412002#comment-17412002
 ] 

Szilard Nemeth commented on YARN-9975:
--

Hey [~epayne],
Sorry for the late response.
Thanks for following up with this and the new HADOOP-17857 jira as well. 
Actually, I just came across this jira and saw that it didn't really belong to 
the correct umbrella, so I commented; but neither I nor my company had this 
use case in mind.
Anyway, I started to review HADOOP-17857; we can continue the discussion there 
and on YARN-1115 after it gets merged in.
Thanks.

> Support proxy ACL user for CapacityScheduler
> 
>
> Key: YARN-9975
> URL: https://issues.apache.org/jira/browse/YARN-9975
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> As commented in YARN-9698, I am opening a new jira for the proxy user feature.
> The background is that we have a long-running SQL thriftserver shared by many 
> users:
> {quote}{{user -> sql proxy -> sql thriftserver}}{quote}
> But we do not have keytabs for all users on the 'sql proxy'. We just use a 
> super user like 'sql_prc' to submit the 'sql thriftserver' application. To 
> support this, we should change the scheduler to support a proxy user ACL.






[jira] [Assigned] (YARN-9048) Add znode hierarchy in Federation ZK State Store

2021-09-08 Thread jackwangcs (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jackwangcs reassigned YARN-9048:


Assignee: jackwangcs

> Add znode hierarchy in Federation ZK State Store
> 
>
> Key: YARN-9048
> URL: https://issues.apache.org/jira/browse/YARN-9048
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: jackwangcs
>Priority: Major
>
> Similar to YARN-2962, consider introducing a znode hierarchy in the ZK 
> federation store for applications.






[jira] [Commented] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411994#comment-17411994
 ] 

Hadoop QA commented on YARN-10870:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 55s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 18m 
16s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
51s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1205/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 59s{color} | 

[jira] [Updated] (YARN-10901) Permission checking error on an existing directory in LogAggregationFileController#verifyAndCreateRemoteLogDir

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10901:
--
Fix Version/s: 3.4.0

> Permission checking error on an existing directory in 
> LogAggregationFileController#verifyAndCreateRemoteLogDir
> --
>
> Key: YARN-10901
> URL: https://issues.apache.org/jira/browse/YARN-10901
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.2, 3.3.1
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *LogAggregationFileController.verifyAndCreateRemoteLogDir* tries to check 
> whether the remote file system supports setting/modifying permissions on the 
> _yarn.nodemanager.remote-app-log-dir_:
>  
> {code:java}
>   //Check if FS has capability to set/modify permissions
>   try {
> remoteFS.setPermission(qualified, new 
> FsPermission(TLDIR_PERMISSIONS));
>   } catch (UnsupportedOperationException use) {
> LOG.info("Unable to set permissions for configured filesystem since"
> + " it does not support this", remoteFS.getScheme());
> fsSupportsChmod = false;
>   } catch (IOException e) {
> LOG.warn("Failed to check if FileSystem suppports permissions on "
> + "remoteLogDir [" + remoteRootLogDir + "]", e);
>   } {code}
> But it will fail if the _yarn.nodemanager.remote-app-log-dir_'s owner is not 
> the same as the NodeManager's user.
>  
> Example error
> {code:java}
> 2021-08-27 11:33:21,649 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController:
>  Failed to check if FileSystem suppports permissions on remoteLogDir 
> [/tmp/logs]2021-08-27 11:33:21,649 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController:
>  Failed to check if FileSystem suppports permissions on remoteLogDir 
> [/tmp/logs]org.apache.hadoop.security.AccessControlException: Permission 
> denied. user=yarn is not the owner of inode=/tmp/logs at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:464)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:407)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:417)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:297)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1931)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1876)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:64)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1976)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:858)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:548)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>  at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>  at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:1921) at 
> 
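
One way to avoid the spurious warning is to probe setPermission only when the 
NodeManager user actually owns the directory; a sketch against the public 
FileSystem and UserGroupInformation APIs (not the committed fix):
{code:java}
// Sketch: skip the setPermission capability probe when the remote log dir
// is owned by someone else, since an AccessControlException is expected then.
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class RemoteLogDirCheckSketch {
  static boolean ownsDir(FileSystem remoteFS, Path remoteRootLogDir)
      throws java.io.IOException {
    FileStatus status = remoteFS.getFileStatus(remoteRootLogDir);
    String currentUser =
        UserGroupInformation.getCurrentUser().getShortUserName();
    return currentUser.equals(status.getOwner());
  }
}
{code}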

[jira] [Resolved] (YARN-10901) Permission checking error on an existing directory in LogAggregationFileController#verifyAndCreateRemoteLogDir

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth resolved YARN-10901.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

> Permission checking error on an existing directory in 
> LogAggregationFileController#verifyAndCreateRemoteLogDir
> --
>
> Key: YARN-10901
> URL: https://issues.apache.org/jira/browse/YARN-10901
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.2, 3.3.1
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *LogAggregationFileController.verifyAndCreateRemoteLogDir* tries to check 
> whether the remote file system supports setting/modifying permissions on the 
> _yarn.nodemanager.remote-app-log-dir_:
>  
> {code:java}
>   //Check if FS has capability to set/modify permissions
>   try {
> remoteFS.setPermission(qualified, new 
> FsPermission(TLDIR_PERMISSIONS));
>   } catch (UnsupportedOperationException use) {
> LOG.info("Unable to set permissions for configured filesystem since"
> + " it does not support this", remoteFS.getScheme());
> fsSupportsChmod = false;
>   } catch (IOException e) {
> LOG.warn("Failed to check if FileSystem suppports permissions on "
> + "remoteLogDir [" + remoteRootLogDir + "]", e);
>   } {code}
> But it will fail if the _yarn.nodemanager.remote-app-log-dir_'s owner is not 
> the same as the NodeManager's user.
>  
> Example error
> {code:java}
> 2021-08-27 11:33:21,649 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController:
>  Failed to check if FileSystem suppports permissions on remoteLogDir 
> [/tmp/logs]2021-08-27 11:33:21,649 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController:
>  Failed to check if FileSystem suppports permissions on remoteLogDir 
> [/tmp/logs]org.apache.hadoop.security.AccessControlException: Permission 
> denied. user=yarn is not the owner of inode=/tmp/logs at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:464)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:407)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:417)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:297)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1931)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1876)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:64)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1976)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:858)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:548)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>  at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>  at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:1921) at 
> 

[jira] [Assigned] (YARN-10929) Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs

2021-09-08 Thread jackwangcs (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jackwangcs reassigned YARN-10929:
-

Assignee: jackwangcs

> Refrain from creating new Configuration object in 
> AbstractManagedParentQueue#initializeLeafQueueConfigs
> ---
>
> Key: YARN-10929
> URL: https://issues.apache.org/jira/browse/YARN-10929
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: jackwangcs
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> AbstractManagedParentQueue#initializeLeafQueueConfigs creates a new 
> CapacitySchedulerConfiguration with templated configs only. We should stop 
> doing this. 
> Also, there is a sorting of config keys in this method, but in the end the 
> configs are added to the Configuration object, which is essentially an 
> enhanced Map, so the sorted order is not preserved.
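
A self-contained illustration of the second point; java.util.Properties stands 
in here for Configuration's hash-based backing store:
{code:java}
// Sorting keys before inserting them into an unordered map has no effect:
// Properties (a Hashtable) iterates in arbitrary order regardless.
import java.util.Properties;
import java.util.TreeMap;

public class SortedInsertSketch {
  public static void main(String[] args) {
    TreeMap<String, String> sorted = new TreeMap<>();
    sorted.put("queue.b.capacity", "50");
    sorted.put("queue.a.capacity", "50");

    Properties props = new Properties(); // hash-based backing store
    props.putAll(sorted);                // the sorted order is lost here

    props.forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
{code}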






[jira] [Commented] (YARN-10903) Too many "Failed to accept allocation proposal" because of wrong Headroom check for DRF

2021-09-08 Thread jackwangcs (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411977#comment-17411977
 ] 

jackwangcs commented on YARN-10903:
---

Hi [~gandras] [~bteke] [~snemeth], could you help review this patch when you 
have time? Thanks!

> Too many "Failed to accept allocation proposal" because of wrong Headroom 
> check for DRF
> ---
>
> Key: YARN-10903
> URL: https://issues.apache.org/jira/browse/YARN-10903
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: jackwangcs
>Assignee: jackwangcs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The headroom check in `ParentQueue.canAssign` and 
> `RegularContainerAllocator#checkHeadroom` does not consider the DRF cases.
> This causes a lot of "Failed to accept allocation proposal" events when a 
> queue is nearly fully used.
> In the log:
> Headroom: memory:256, vCores:729
> Request: memory:56320, vCores:5
> clusterResource: memory:673966080, vCores:110494
> If DRF is used, then
> {code:java}
> Resources.greaterThanOrEqual(rc, clusterResource, Resources.add(
> currentResourceLimits.getHeadroom(), resourceCouldBeUnReserved),
> required); {code}
> will be true, but in fact we cannot allocate resources to the request due to 
> the max limit (not enough memory).
> {code:java}
> 2021-07-21 23:49:39,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1626747977559_95859 
> headRoom=<memory:256, vCores:729> currentConsumption=0
> 2021-07-21 23:49:39,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator:
>   Request={AllocationRequestId: -1, Priority: 1, Capability: <memory:56320, 
> vCores:5>, # Containers: 19, Location: *, Relax Locality: true, Execution 
> Type Request: null, Node Label Expression: prod-best-effort-node}
> ...
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Try to commit allocation proposal=New 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest:
>  ALLOCATED=[(Application=appattempt_1626747977559_95859_01; 
> Node=:8041; Resource=)]
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
>  userLimit is fetched. userLimit=, 
> userSpecificUserLimit=, 
> schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Headroom calculation for user x:  userLimit= 
> queueMaxAvailRes= consumed= 
> partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-07-21 23:49:39,013 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
>  {code}
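To make the failure mode concrete, here is a minimal sketch using the numbers quoted above; it reuses the YARN resource utilities named in the description (it simplifies by ignoring the unreservable part added to the headroom, and the per-dimension fitsIn check is only an illustration of the gap, not the actual patch):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfHeadroomSketch {
  public static void main(String[] args) {
    // Values taken from the log excerpt in the description.
    Resource cluster  = Resource.newInstance(673966080, 110494);
    Resource headroom = Resource.newInstance(256, 729);
    Resource required = Resource.newInstance(56320, 5);

    ResourceCalculator drf = new DominantResourceCalculator();
    // DRF compares dominant shares: headroom's dominant share is driven by
    // vCores (729/110494 ~= 6.6e-3), required's by memory
    // (56320/673966080 ~= 8.4e-5), so the headroom check passes...
    boolean headroomCheckPasses =
        Resources.greaterThanOrEqual(drf, cluster, headroom, required);
    // ...even though memory alone cannot fit: 256 < 56320. A per-dimension
    // check catches this.
    boolean actuallyFits = Resources.fitsIn(required, headroom);
    System.out.println(headroomCheckPasses + " " + actuallyFits); // true false
  }
}
{code}

The proposal generated under the passing dominant-share check is then rejected in the commit phase by the per-queue max limit, which is exactly the "Failed to accept allocation proposal" churn described above.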



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10902) Resources on application blacklisted node with reserved container can not allocate to other applications

2021-09-08 Thread jackwangcs (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jackwangcs reassigned YARN-10902:
-

Assignee: jackwangcs

> Resources on application blacklisted node with reserved container can not 
> allocate to other applications
> 
>
> Key: YARN-10902
> URL: https://issues.apache.org/jira/browse/YARN-10902
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: jackwangcs
>Assignee: jackwangcs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If a node has a reserved container of an application and the application adds 
> this node to its blacklist, resources on the node cannot be allocated to other 
> applications in the current allocation process.
>  In RegularContainerAllocator, if it finds the node is in the blacklist, it 
> will not allocate resources. Furthermore, because this node has a reserved 
> container, other queues or applications will not have the opportunity to 
> allocate on it.
> {code:java}
> ContainerAllocation tryAllocateOnNode(Resource clusterResource,
> FiCaSchedulerNode node, SchedulingMode schedulingMode,
> ResourceLimits resourceLimits, SchedulerRequestKey schedulerKey,
> RMContainer reservedContainer) {
>   ContainerAllocation result;
>   // Sanity checks before assigning to this node
>   result = checkIfNodeBlackListed(node, schedulerKey);
>   if (null != result) {
> return result;
>   }
>   // 
> }{code}
> In this case, the reserved container should be cancelled. 
>  
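A minimal sketch of the direction suggested above: drop the reservation when the blacklist check fires, so the node is freed for other applications. The placement and the unreserve call are assumptions for illustration, not the actual patch:

{code:java}
// Inside RegularContainerAllocator#tryAllocateOnNode (sketch only):
result = checkIfNodeBlackListed(node, schedulerKey);
if (null != result) {
  if (reservedContainer != null) {
    // Assumed fix direction: release the reservation so the node's
    // resources stop being pinned to an app that blacklisted it.
    application.unreserve(schedulerKey, node, reservedContainer);
  }
  return result;
}
{code}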



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10903) Too many "Failed to accept allocation proposal" because of wrong Headroom check for DRF

2021-09-08 Thread jackwangcs (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jackwangcs reassigned YARN-10903:
-

Assignee: jackwangcs

> Too many "Failed to accept allocation proposal" because of wrong Headroom 
> check for DRF
> ---
>
> Key: YARN-10903
> URL: https://issues.apache.org/jira/browse/YARN-10903
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: jackwangcs
>Assignee: jackwangcs
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The headroom checks in `ParentQueue.canAssign` and 
> `RegularContainerAllocator#checkHeadroom` do not consider the DRF case.
> This causes a lot of "Failed to accept allocation proposal" events when a 
> queue is nearly fully used. 
> In the log:
> Headroom: memory:256, vCores:729
> Request: memory:56320, vCores:5
> clusterResource: memory:673966080, vCores:110494
> If DRF is used, then 
> {code:java}
> Resources.greaterThanOrEqual(rc, clusterResource, Resources.add(
> currentResourceLimits.getHeadroom(), resourceCouldBeUnReserved),
> required); {code}
> will be true, but in fact we cannot allocate resources to the request due to 
> the max limit (not enough memory).
> {code:java}
> 2021-07-21 23:49:39,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1626747977559_95859 
> headRoom=<memory:256, vCores:729> currentConsumption=0
> 2021-07-21 23:49:39,012 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator:
>   Request={AllocationRequestId: -1, Priority: 1, Capability: <memory:56320, vCores:5>, # Containers: 19, Location: *, Relax Locality: true, Execution 
> Type Request: null, Node Label Expression: prod-best-effort-node}
> .
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Try to commit allocation proposal=New 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest:
>  ALLOCATED=[(Application=appattempt_1626747977559_95859_01; 
> Node=:8041; Resource=<memory:56320, vCores:5>)]
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager:
>  userLimit is fetched. userLimit=, 
> userSpecificUserLimit=, 
> schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Headroom calculation for user x:  userLimit= 
> queueMaxAvailRes= consumed= 
> partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
>  Used resource= exceeded maxResourceLimit of the 
> queue =
> 2021-07-21 23:49:39,013 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Failed to accept allocation proposal
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10919) Remove LeafQueue#scheduler field

2021-09-08 Thread jackwangcs (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411972#comment-17411972
 ] 

jackwangcs commented on YARN-10919:
---

Sure, thanks for your reminder, I will ask the owner next time.

> Remove LeafQueue#scheduler field 
> -
>
> Key: YARN-10919
> URL: https://issues.apache.org/jira/browse/YARN-10919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As it is the same object as AbstractCSQueue#csContext (from parent class).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10919) Remove LeafQueue#scheduler field

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10919:
--
Fix Version/s: 3.4.0

> Remove LeafQueue#scheduler field 
> -
>
> Key: YARN-10919
> URL: https://issues.apache.org/jira/browse/YARN-10919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As it is the same object as AbstractCSQueue#csContext (from parent class).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10646) TestCapacitySchedulerWeightMode test descriptor comments doesn't reflect the correct scenario

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10646:
--
Fix Version/s: 3.4.0

> TestCapacitySchedulerWeightMode test descriptor comments doesn't reflect the 
> correct scenario
> -
>
> Key: YARN-10646
> URL: https://issues.apache.org/jira/browse/YARN-10646
> Project: Hadoop YARN
>  Issue Type: Sub-task
> Environment: 
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There is a mixup in TestCapacitySchedulerWeightMode in the configuration 
> creator method comments and the test case descriptor comments. See the 
> following code:
> {code:java}
> /*
>  * Queue structure:
>  *                 root (*)
>  *              _____________
>  *             /             \
>  *   a x(=100%), y(50%)   b y(=50%), z(=100%)
>  *             |           ______|______
>  *             |          /             \
>  *   a1 ([x,y]: w=100)  b1 (no)   b2 ([y,z]: w=100)
>  *
>  * Parent uses weight, child uses percentage
>  */
>   public static Configuration getCSConfWithLabelsParentUsePctChildUseWeight(
>   Configuration config) {
> {code}
> While inside the method all the queues (including the second level ones) are 
> configured with capacity, only some labels are configured with weights:
> {code:java}
> conf.setLabeledQueueWeight(CapacitySchedulerConfiguration.ROOT, "x", 100);
> conf.setLabeledQueueWeight(CapacitySchedulerConfiguration.ROOT, "y", 100);
> conf.setLabeledQueueWeight(CapacitySchedulerConfiguration.ROOT, "z", 100);
> ...
> conf.setQueues(A, new String[] { "a1" });
> conf.setCapacityByLabel(A1, RMNodeLabelsManager.NO_LABEL, 100);
> conf.setMaximumCapacity(A1, 100);
> conf.setAccessibleNodeLabels(A1, toSet("x", "y"));
> conf.setDefaultNodeLabelExpression(A1, "x");
> conf.setCapacityByLabel(A1, "x", 100);
> conf.setCapacityByLabel(A1, "y", 100);
> conf.setQueues(B, new String[] { "b1", "b2" });
> conf.setCapacityByLabel(B1, RMNodeLabelsManager.NO_LABEL, 50);
> conf.setMaximumCapacity(B1, 50);
> conf.setAccessibleNodeLabels(B1, RMNodeLabelsManager.EMPTY_STRING_SET);
> conf.setCapacityByLabel(B2, RMNodeLabelsManager.NO_LABEL, 50);
> conf.setMaximumCapacity(B2, 50);
> conf.setAccessibleNodeLabels(B2, toSet("y", "z"));
> conf.setCapacityByLabel(B2, "y", 100);
> conf.setCapacityByLabel(B2, "z", 100);
> {code}
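For contrast, a child queue that really used weights for its labels (what the comment claims) would be configured roughly like this; a sketch reusing the setters from the snippet above, purely to illustrate the mismatch, not part of the Jira's fix:

{code:java}
// Sketch: a1 configured with label weights instead of percentage capacities.
conf.setLabeledQueueWeight(A1, "x", 100);
conf.setLabeledQueueWeight(A1, "y", 100);
// ...replacing the percentage-style calls shown above:
// conf.setCapacityByLabel(A1, "x", 100);
// conf.setCapacityByLabel(A1, "y", 100);
{code}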



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10693) Add documentation for YARN-10623 auto refresh queue conf in CS

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10693:
--
Fix Version/s: 3.4.0

> Add documentation for YARN-10623 auto refresh queue conf in CS
> --
>
> Key: YARN-10693
> URL: https://issues.apache.org/jira/browse/YARN-10693
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10693.001.patch, YARN-10693.002.patch, 
> YARN-10693.003.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10623) Capacity scheduler should support refresh queue automatically by a thread policy.

2021-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10623:
--
Labels: pull-request-available  (was: )

> Capacity scheduler should support refresh queue automatically by a thread 
> policy.
> -
>
> Key: YARN-10623
> URL: https://issues.apache.org/jira/browse/YARN-10623
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10623.001.patch, YARN-10623.002.patch, 
> YARN-10623.003.patch, YARN-10623.004.patch, YARN-10623.005.patch, 
> YARN-10623.006.patch, YARN-10623.007.patch, YARN-10623.008.patch, 
> YARN-10623.009.patch, YARN-10623.010.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The fair scheduler supports automatically refreshing queue-related conf via a 
> reload thread, but the capacity scheduler only supports refreshing 
> queue-related changes through refreshQueues. Our cluster needs this 
> capability for queue management.
> cc [~wangda] [~ztang] [~pbacsko] [~snemeth] [~gandras]  [~bteke] [~shuzirra]
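A minimal sketch of the idea: a polling thread that watches the capacity-scheduler file's modification time and triggers the existing refresh path, similar in spirit to FairScheduler's AllocationFileLoaderService. All names here are illustrative; the Jira itself wires this up as a thread policy inside the RM:

{code:java}
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: poll capacity-scheduler.xml and refresh queues
// when its modification time advances.
public class QueueConfPoller {
  public static void start(File confFile, Runnable refreshQueues) {
    ScheduledExecutorService poller =
        Executors.newSingleThreadScheduledExecutor();
    AtomicLong lastModified = new AtomicLong(confFile.lastModified());
    poller.scheduleWithFixedDelay(() -> {
      long now = confFile.lastModified();
      if (now > lastModified.getAndSet(now)) {
        refreshQueues.run(); // e.g. wired to the AdminService refresh path
      }
    }, 60, 60, TimeUnit.SECONDS);
  }
}
{code}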



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10693) Add documentation for YARN-10623 auto refresh queue conf in CS

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10693:
--
Summary: Add documentation for YARN-10623 auto refresh queue conf in CS  
(was: Add document for YARN-10623 auto refresh queue conf in CS)

> Add documentation for YARN-10623 auto refresh queue conf in CS
> --
>
> Key: YARN-10693
> URL: https://issues.apache.org/jira/browse/YARN-10693
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10693.001.patch, YARN-10693.002.patch, 
> YARN-10693.003.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2021-09-08 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke resolved YARN-10496.
--
Resolution: Fixed

> [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
> -
>
> Key: YARN-10496
> URL: https://issues.apache.org/jira/browse/YARN-10496
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Priority: Major
>
> CapacityScheduler today doesn’t support auto queue creation that is flexible 
> enough. The current constraints: 
>  * Only leaf queues can be auto-created.
>  * A parent can only have either static queues or dynamic ones. This causes 
> multiple constraints. For example:
>  ** It isn’t possible to have a VIP user like Alice with a static queue 
> root.user.alice with 50% capacity while the other user queues (under 
> root.user) are created dynamically and they share the remaining 50% of 
> resources.
>  ** This implies that there is no possibility to have both dynamically 
> created and static queues at the same time under root.
> In comparison, FairScheduler allows the following scenarios, which Capacity 
> Scheduler doesn’t:
>  * A new queue needs to be created under an existing parent, while the parent 
> already has static queues.
>  * Nested queue mapping policies, where two levels of queues may need to be 
> created: if an application belongs to user _alice_ (who has the primary_group 
> of _engineering_), the scheduler checks whether _root.engineering_ exists; if 
> it doesn’t, it’ll be created. Then the scheduler checks whether 
> _root.engineering.alice_ exists, and creates it if it doesn't.
> When we try to move users from FairScheduler to CapacityScheduler, we face 
> feature gaps which block users from migrating from FS to CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10522) Document for Flexible Auto Queue Creation in Capacity Scheduler

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10522:
--
Fix Version/s: 3.4.0

> Document for Flexible Auto Queue Creation in Capacity Scheduler
> ---
>
> Key: YARN-10522
> URL: https://issues.apache.org/jira/browse/YARN-10522
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10522.001.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We should update document to support this feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10576) Update Capacity Scheduler documentation with JSON-based placement mapping

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10576:
--
Fix Version/s: 3.4.0

> Update Capacity Scheduler documentation with JSON-based placement mapping
> -
>
> Key: YARN-10576
> URL: https://issues.apache.org/jira/browse/YARN-10576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10576-001.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The weight mode and AQC also affect how the new placement engine in CS works 
> and the documentation has to reflect that.
> Certain statements in the documentation are no longer valid, for example:
> * create flag: "Only applies to managed queue parents" - there is no 
> ManagedParentQueue in weight mode.
> * "The nested rules primaryGroupUser and secondaryGroupUser expects the 
> parent queues to exist, ie. they cannot be created automatically". This only 
> applies to the legacy absolute/percentage mode.
> Find all statements that mention possible limitations and fix them if 
> necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10576) Update Capacity Scheduler documentation with JSON-based placement mapping

2021-09-08 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411925#comment-17411925
 ] 

Szilard Nemeth commented on YARN-10576:
---

Hi [~bteke],
Thanks for working on this. 
The PR looks good to me, committed to trunk.
As a matter of fact, [~bteke] and I agreed on some minor fixes, so I made the 
changes locally and pushed the commit myself; that's why I closed the PR.
Thanks.

> Update Capacity Scheduler documentation with JSON-based placement mapping
> -
>
> Key: YARN-10576
> URL: https://issues.apache.org/jira/browse/YARN-10576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10576-001.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The weight mode and AQC also affect how the new placement engine in CS works 
> and the documentation has to reflect that.
> Certain statements in the documentation are no longer valid, for example:
> * create flag: "Only applies to managed queue parents" - there is no 
> ManagedParentQueue in weight mode.
> * "The nested rules primaryGroupUser and secondaryGroupUser expects the 
> parent queues to exist, ie. they cannot be created automatically". This only 
> applies to the legacy absolute/percentage mode.
> Find all statements that mention possible limitations and fix them if 
> necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10576) Update Capacity Scheduler documentation with JSON-based placement mapping

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10576:
--
Summary: Update Capacity Scheduler documentation with JSON-based placement 
mapping  (was: Update Capacity Scheduler documentation about JSON-based 
placement mapping)

> Update Capacity Scheduler documentation with JSON-based placement mapping
> -
>
> Key: YARN-10576
> URL: https://issues.apache.org/jira/browse/YARN-10576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-10576-001.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The weight mode and AQC also affect how the new placement engine in CS works 
> and the documentation has to reflect that.
> Certain statements in the documentation are no longer valid, for example:
> * create flag: "Only applies to managed queue parents" - there is no 
> ManagedParentQueue in weight mode.
> * "The nested rules primaryGroupUser and secondaryGroupUser expects the 
> parent queues to exist, ie. they cannot be created automatically". This only 
> applies to the legacy absolute/percentage mode.
> Find all statements that mention possible limitations and fix them if 
> necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10929) Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs

2021-09-08 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411903#comment-17411903
 ] 

Szilard Nemeth commented on YARN-10929:
---

Hi [~jackwangcs],
Please try again, I modified your permissions.

> Refrain from creating new Configuration object in 
> AbstractManagedParentQueue#initializeLeafQueueConfigs
> ---
>
> Key: YARN-10929
> URL: https://issues.apache.org/jira/browse/YARN-10929
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> AbstractManagedParentQueue#initializeLeafQueueConfigs creates a new 
> CapacitySchedulerConfiguration with templated configs only. We should stop 
> doing this. 
> Also, there is a sorting of config keys in this method, but in the end the 
> configs are added to the Configuration object, which is an enhanced Map, so 
> the sorting has no effect.
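A minimal sketch of the suggested direction (the method name is illustrative, not the actual patch): write the template entries into the existing Configuration instead of allocating a new CapacitySchedulerConfiguration, and skip the sort, since Configuration's backing store is unordered anyway.

{code:java}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

// Sketch: apply leaf-queue template entries in place; sorting the keys
// first has no effect because Configuration stores them unordered.
final class TemplateConfigs {
  static void apply(Configuration conf, Map<String, String> templateEntries) {
    for (Map.Entry<String, String> e : templateEntries.entrySet()) {
      conf.set(e.getKey(), e.getValue());
    }
  }
}
{code}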



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollák updated YARN-10870:
--
Attachment: YARN-10870.002.patch

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Attachments: YARN-10870.001.patch, YARN-10870.002.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted by 
> another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners and are not present in the 
> application ACL -> mapreduce.job.acl-view-job, nor present in the Queue ACL 
> as a Queue admin of the queue to which this job was submitted (see [1] where 
> both the filter setting introduced by YARN-8319 & ACL checks are performed):
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> Following pre-requisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add kerberos princs for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from Mac as an example (this assumes you've copied the 
> /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
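For reference, a sketch of the per-application visibility check the Scheduler page would need, mirroring what getApps() at [1] already does when the filter setting is enabled; the wiring into the scheduler page renderer is an assumption:

{code:java}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;

// Sketch: hide an app unless the caller is its owner or passes the
// VIEW_APP ACL check (queue-admin checks would be layered on top).
final class SchedulerPageFilter {
  static boolean canView(UserGroupInformation callerUGI,
      ApplicationACLsManager aclsManager, String appOwner,
      ApplicationId appId) {
    if (callerUGI == null) {
      return false; // filter enabled and caller unknown: hide the app
    }
    return callerUGI.getShortUserName().equals(appOwner)
        || aclsManager.checkAccess(callerUGI, ApplicationAccessType.VIEW_APP,
            appOwner, appId);
  }
}
{code}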



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page

2021-09-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411891#comment-17411891
 ] 

Hadoop QA commented on YARN-10870:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 12s{color} 
| {color:red}{color} | {color:red} YARN-10870 does not apply to trunk. Rebase 
required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for 
help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10870 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13033209/YARN-10870.001.patch |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1204/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM 
> Scheduler page
> 
>
> Key: YARN-10870
> URL: https://issues.apache.org/jira/browse/YARN-10870
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Siddharth Ahuja
>Assignee: Gergely Pollák
>Priority: Major
> Attachments: YARN-10870.001.patch
>
>
> Non-permissible users are (incorrectly) able to view applications submitted by 
> another user on the RM's Scheduler UI (not the Applications UI), where 
> _non-permissible users_ are non-application-owners and are not present in the 
> application ACL -> mapreduce.job.acl-view-job, nor present in the Queue ACL 
> as a Queue admin of the queue to which this job was submitted (see [1] where 
> both the filter setting introduced by YARN-8319 & ACL checks are performed):
> The issue can be reproduced easily by having the setting 
> {{yarn.webapp.filter-entity-list-by-user}} set to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's 
> applications in the Applications page, but not in the Scheduler's page.
> The filter setting seems to be getting checked only on the getApps() call but 
> not while rendering the apps information on the Scheduler page. This seems to 
> be a "missed" feature from YARN-8319.
> Following pre-requisites are needed to reproduce the issue:
> * Kerberized cluster,
> * SPNEGO enabled for HDFS & YARN,
> * Add test users - systest and user1 on all nodes.
> * Add kerberos princs for the above users.
> * Create HDFS user dirs for above users and chown them appropriately.
> * Run a sample MR Sleep job and test.
> Steps to reproduce the issue:
> * kinit as "systest" user and run a sample MR sleep job from one of the nodes 
> in the cluster:
> {code}
> yarn jar  sleep -m 1 -mt 
> 360
> {code}
> * kinit as "user1" from Mac as an example (this assumes you've copied the 
> /etc/krb5.conf from the cluster to your Mac's /private/etc folder already for 
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by systest. 
> This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest. 
> This is *INCORRECT*.
> [1] 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10914) Simplify duplicated code for tracking ResourceUsage in AbstractCSQueue

2021-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-10914:
--
Labels: pull-request-available  (was: )

> Simplify duplicated code for tracking ResourceUsage in AbstractCSQueue
> --
>
> Key: YARN-10914
> URL: https://issues.apache.org/jira/browse/YARN-10914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Szilard Nemeth
>Assignee: Tamas Domok
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Alternatively, those could be moved to some computation class, too.
> Relevant methods: 
> incReservedResource, decReservedResource, incPendingResource, 
> decPendingResource, incUsedResource, decUsedResource



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411636#comment-17411636
 ] 

Yuan Luo edited comment on YARN-10934 at 9/8/21, 7:26 AM:
--

[~snemeth] Thanks for your reply; I have fixed the title, it is an NPE error. I 
have added some YARN config in the attachment. We use DefaultResourceCalculator 
and the queue vcore configuration is 0, which I suspect is related, but I have 
not found the problem in the code.


was (Author: luoyuan):
[~snemeth] Thanks for your reply, have fixed title, it is a NPE Error. I will 
add some information in the attachment.  

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our prod YARN cluster runs Hadoop version 3.3.1. We changed 
> DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; 
> then our RM crashed with the exception stack below. I think this is a serious 
> bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher 
> (MarkerIgnoringBase.java:error(159)) - Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
> at java.base/java.lang.Thread.run(Thread.java:834)
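A hedged illustration of the zero-vCores suspicion raised in the comments, with made-up values; this does not pinpoint the NPE, it only shows why switching calculators can surface latent configs that the memory-only calculator never touched:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class CalculatorSwitchSketch {
  public static void main(String[] args) {
    Resource limit = Resource.newInstance(8192, 0); // vCores accidentally 0
    Resource used  = Resource.newInstance(4096, 4);
    // DefaultResourceCalculator only looks at memory:
    float rDefault =
        Resources.ratio(new DefaultResourceCalculator(), used, limit); // 0.5
    // DominantResourceCalculator considers every dimension, so the
    // 4 used / 0 limit vCores ratio degenerates:
    float rDrf =
        Resources.ratio(new DominantResourceCalculator(), used, limit); // Infinity
    System.out.println(rDefault + " " + rDrf);
  }
}
{code}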



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: (was: RM-capacity-scheduler.xml)

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our prod YARN cluster runs Hadoop version 3.3.1. We changed 
> DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; 
> then our RM crashed with the exception stack below. I think this is a serious 
> bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher 
> (MarkerIgnoringBase.java:error(159)) - Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
> at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: RM-capacity-scheduler.xml

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our prod YARN cluster runs Hadoop version 3.3.1. We changed 
> DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; 
> then our RM crashed with the exception stack below. I think this is a serious 
> bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher 
> (MarkerIgnoringBase.java:error(159)) - Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
> at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: RM-capacity-scheduler.xml

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our prod YARN cluster runs Hadoop version 3.3.1. We changed 
> DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; 
> then our RM crashed with the exception stack below. I think this is a serious 
> bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher 
> (MarkerIgnoringBase.java:error(159)) - Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
> at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: (was: RM-capacity-scheduler.xml)

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-yarn-site.xml
>
>
> Our prod YARN cluster runs Hadoop version 3.3.1. We changed 
> DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; 
> then our RM crashed with the exception stack below. I think this is a serious 
> bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher 
> (MarkerIgnoringBase.java:error(159)) - Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
> at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE

2021-09-08 Thread Yuan Luo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Luo updated YARN-10934:

Attachment: RM-capacity-scheduler.xml
RM-yarn-site.xml

> LeafQueue activateApplications NPE
> --
>
> Key: YARN-10934
> URL: https://issues.apache.org/jira/browse/YARN-10934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 3.3.1
>Reporter: Yuan Luo
>Priority: Major
> Attachments: RM-capacity-scheduler.xml, RM-yarn-site.xml
>
>
> Our prod YARN cluster runs Hadoop version 3.3.1. We changed 
> DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; 
> then our RM crashed with the exception stack below. I think this is a serious 
> bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher 
> (MarkerIgnoringBase.java:error(159)) - Error in handling event type 
> APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
> at java.base/java.lang.Thread.run(Thread.java:834)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org