[ https://issues.apache.org/jira/browse/YARN-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355179#comment-17355179 ]

Hadoop QA commented on YARN-10787:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 51s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-3.3 Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 34s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 53s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 39s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 55s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 55s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 22m 26s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 56s{color} | {color:green}{color} | {color:green} branch-3.3 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 26s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 27s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 27s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 32s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 8 new + 21 unchanged - 2 fixed = 29 total (was 23) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 27s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 40s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-shadedclient.txt{color} | {color:red} patch has errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} spotbugs {color} | {color:red}  0m 28s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 30s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 27s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/patch-asflicense-problems.txt{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 42s{color} | {color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/artifact/out/Dockerfile |
| JIRA Issue | YARN-10787 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13026251/YARN-10787.branch-3.3.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle spotbugs |
| uname | Linux 7f3b1a839031 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | branch-3.3 / ede03cc35c9 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/testReport/ |
| Max. process+thread count | 513 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1033/console |
| versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Queue submit ACL check is wrong when CS queue is ambiguous
> ----------------------------------------------------------
>
>                 Key: YARN-10787
>                 URL: https://issues.apache.org/jira/browse/YARN-10787
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Szilard Nemeth
>            Assignee: Gergely Pollák
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-10787.001.patch, YARN-10787.branch-3.3.001.patch
>
>
> Let's suppose we have a Capacity Scheduler configuration with 2 or more leaf 
> queues with the same name in the queue hierarchy. That's what we call an 
> ambiguous queue name.
>  Let's also enable ACL checks and define acl_submit_applications / 
> acl_administer_queue configs with the correct value, adding the username to 
> the ACL value there.
> Here's a minimalistic YARN + CS config:
> h2. 1. YARN config snippet:
> {code:java}
> <property><name>yarn.acl.enable</name><value>true</value></property>
> {code}
> h2. 2. CS config snippet:
> {code:java}
> <property>
>       <name>yarn.scheduler.capacity.root.someparent1.queues</name>
>       <value>anyotherqueue1,somequeue,anyotherqueue2</value>
> </property>
> <property>
>       <name>yarn.scheduler.capacity.root.someparent2.queues</name>
>       <value>anyotherqueue3,somequeue,anyotherqueue4</value>
> </property>
> <property>
>       <name>yarn.scheduler.capacity.root.someparent1.somequeue.acl_submit_applications</name>
>       <value>someuser1 </value>
> </property>
> <property>
>       <name>yarn.scheduler.capacity.root.someparent2.somequeue.acl_submit_applications</name>
>       <value>someuser1 </value>
> </property>
> <property>
>       <name>yarn.scheduler.capacity.root.someparent1.somequeue.acl_administer_queue</name>
>       <value>someuser1 </value>
> </property>
> <property>
>       <name>yarn.scheduler.capacity.root.someparent2.somequeue.acl_administer_queue</name>
>       <value>someuser1 </value>
> </property>
> {code}
> So in this case, we have an ambiguous queue named "somequeue" under 2 
> different paths:
>  - root.someparent1.somequeue
>  - root.someparent2.somequeue
> When a user correctly submits an application with the full queue path, e.g. 
> root.someparent1.somequeue, and ACL checking is enabled, YARN still fails to 
> place the application into that queue because it uses the short queue name 
> for the ACL check.
> h2. 3. LOG SNIPPET
> {code:java}
> 2021-05-20 22:04:32,031 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.placement.CSMappingPlacementRule:
>  Placement final result 'root.someparent1.somequeue' for application 
> 'application_1621540945412_0001'
>  2021-05-20 22:04:32,031 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Placed 
> application with ID application_1621540945412_0001 in queue: somequeue, 
> original submission queue was: root.someparent1.somequeue
>  2021-05-20 22:04:32,031 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Ambiguous queue reference: somequeue please use full queue path instead.
>  2021-05-20 22:04:32,031 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application 'application_1621540945412_0001' is submitted without priority 
> hence considering default queue/cluster priority: 0
>  2021-05-20 22:04:32,032 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Priority '0' is acceptable in queue : somequeue for application: 
> application_1621540945412_0001
>  2021-05-20 22:04:32,993 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Exception in 
> submitting application_1621540945412_0001
>  org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User someuser1 does not 
> have permission to submit application_1621540945412_0001 to queue somequeue
>  at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> {code}
> h2. 4. FULL STACKTRACE:
> {code:java}
>  org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User someuser1 does not 
> have permission to submit application_1621540945412_0001 to queue somequeue
>       at 
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:433)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:330)
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:650)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
>       at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>       at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> Caused by: org.apache.hadoop.security.AccessControlException: User someuser1 
> does not have permission to submit application_1621540945412_0001 to queue 
> somequeue
>       at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:436)
>       ... 12 more
> 2021-05-20 22:04:32,994 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=someuser1   
>    IP=172.17.61.133        OPERATION=Submit Application Request    
> TARGET=ClientRMService  RESULT=FAILURE  DESCRIPTION=Exception in submitting 
> application PERMISSIONS=org.apache.hadoop.security.AccessControlException: 
> User someuser1 does not have permission to submit 
> application_1621540945412_0001 to queue somequeue      
> APPID=application_1621540945412_0001    QUEUENAME=somequeue
> {code}
> h1. DETAILS:
> *1. The whole thing happens in RMAppManager#createAndPopulateNewRMApp:*
>  Class / method: 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager#createAndPopulateNewRMApp
> [LINK|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L407]
> *2. RMAppManager#copyPlacementQueueToSubmissionContext is called* for new 
> applications, i.e. when an application is submitted normally rather than 
> recovered:
>  Class / method: 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager#copyPlacementQueueToSubmissionContext
> [Called 
> at|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L420]
> [Method 
> link|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L991]
> The problem is that copyPlacementQueueToSubmissionContext sets the queue of 
> context (ApplicationSubmissionContext object) from placementContext.getQueue 
> (ApplicationPlacementContext object). If placementContext holds the queue 
> name in the short form, this overrides the original submission queue value, 
> even if that value was the full queue path.
>  An example of a generated log from this method:
> {code:java}
>  2021-05-20 22:04:32,031 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Placed 
> application with ID application_1621540945412_0001 in queue: somequeue, 
> original submission queue was: root.someparent1.somequeue
> {code}
> *3. The problematic code block is here:* [Code 
> block|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L446-L475]
> 3.1 First, the short queuename will be gathered from submissionContext, as it 
> was overridden by 'copyPlacementQueueToSubmissionContext': 
> [Link|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L448]
>  This is bad design, as the code here relies on the queue name having been 
> overridden in the submission context object.
> 3.2 Since the queue name will be in the short form and it's ambiguous, the 
> call to 
> [scheduler.getQueue()|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L450]
>  will return null. This is by design: if the queue name is ambiguous, 
> getQueue() returns null.
> 3.3 The condition of checking if csqueue is null AND placementContext is not 
> null will evaluate to true 
> [here|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L452]
> *3.4. The Parent queue will be queried from CS* by the parent queue name of 
> the placement context: 
> [Link|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L456]
> *3.5 Finally, the ACL check fails* because csqueue is now the queue object of 
> the parent of 'root.someparent1.somequeue', i.e. the queue 
> 'root.someparent1'.
>  In this case, the user has a submission ACL set on the leaf queue but not on 
> the parent queue, so the ACL check fails.
> h2. LIST OF THINGS TO FIX / DO:
>  - Add a unit testcase that replicates the above config and the issue.
>  - Rename copyPlacementQueueToSubmissionContext: This method does not really 
> copy anything, it simply overrides the queue value.
>  - Add Debug log to print csqueue object before the authorization code: [Auth 
> code 
> block|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L459-L475]
>  - Fix log messages: As 'copyPlacementQueueToSubmissionContext' overrides 
> (not copies) the original queue name with the queue name from the 
> PlacementContext, all calls to submissionContext.getQueue() will return the 
> short queue name. This results in very misleading log messages as well, 
> including the exception message itself:
> {code:java}
>  org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.security.AccessControlException: User someuser1 does not 
> have permission to submit application_1621540945412_0001 to queue somequeue
> {code}
> All log messages should print the original submission queue, if possible.
>  - Actual code fix for the issue: Use full queue path to get the queue object.
>  Again, this is the code block where the fix should happen: 
> [LINK|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L447-L458]
> 'queueName' should have the value set from: 
> *org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext#getFullQueuePath.*
> The equivalent of this in the linked code block:
> {code:java}
> placementContext.getFullQueuePath()
> {code}
> This should happen only if placementContext is not null.
> h2. LONG TERM FIX:
> Investigate if it's possible to eliminate 
> copyPlacementQueueToSubmissionContext.
>  This could introduce nasty backward-incompatible issues with recovery, so it 
> should be thought through very carefully.
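
The lookup sequence described in steps 3.1 to 3.5 of the quoted description, and the proposed fix of resolving the queue by its full path, can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual RMAppManager or CapacityScheduler code: the class AmbiguousQueueAclSketch and all of its methods are hypothetical, ACL inheritance from parent queues is deliberately not modeled, and the parent queues are assumed to carry no submit ACL entry for someuser1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the ambiguous-queue ACL lookup; all names here are hypothetical. */
class AmbiguousQueueAclSketch {

    // Full queue path -> users allowed to submit (stand-in for the CS ACL config).
    private final Map<String, Set<String>> submitAcls = new HashMap<>();
    // Short queue name -> every full path that ends with that name.
    private final Map<String, List<String>> pathsByShortName = new HashMap<>();

    void addQueue(String fullPath, String... submitUsers) {
        submitAcls.put(fullPath, new HashSet<>(Arrays.asList(submitUsers)));
        String shortName = fullPath.substring(fullPath.lastIndexOf('.') + 1);
        pathsByShortName.computeIfAbsent(shortName, k -> new ArrayList<>()).add(fullPath);
    }

    /** Resolves a full path or an unambiguous short name; returns null otherwise. */
    String getQueue(String name) {
        if (submitAcls.containsKey(name)) {
            return name;
        }
        List<String> candidates = pathsByShortName.get(name);
        return (candidates != null && candidates.size() == 1) ? candidates.get(0) : null;
    }

    boolean canSubmit(String user, String queuePath) {
        Set<String> users = submitAcls.get(queuePath);
        return users != null && users.contains(user);
    }

    /** Reported behavior: look up by short name, then fall back to the parent queue. */
    boolean checkWithShortName(String user, String shortName, String parentPath) {
        String queue = getQueue(shortName);
        if (queue == null) {
            queue = getQueue(parentPath);
        }
        return queue != null && canSubmit(user, queue);
    }

    /** Proposed fix: look up the queue by its full path. */
    boolean checkWithFullPath(String user, String fullPath) {
        String queue = getQueue(fullPath);
        return queue != null && canSubmit(user, queue);
    }

    /** Builds the queue layout from the config snippet in the issue description. */
    static AmbiguousQueueAclSketch issueConfig() {
        AmbiguousQueueAclSketch cs = new AmbiguousQueueAclSketch();
        cs.addQueue("root");
        cs.addQueue("root.someparent1");
        cs.addQueue("root.someparent2");
        cs.addQueue("root.someparent1.somequeue", "someuser1");
        cs.addQueue("root.someparent2.somequeue", "someuser1");
        return cs;
    }

    public static void main(String[] args) {
        AmbiguousQueueAclSketch cs = issueConfig();
        System.out.println("short-name check: "
            + cs.checkWithShortName("someuser1", "somequeue", "root.someparent1"));
        System.out.println("full-path check: "
            + cs.checkWithFullPath("someuser1", "root.someparent1.somequeue"));
    }
}
```

In this model the short-name check resolves to the parent queue and is denied, while the full-path check resolves to the leaf queue and is allowed, which mirrors the behavior reported in the description.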



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
