[ 
https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265208#comment-17265208
 ] 

Gergely Pollak edited comment on YARN-10506 at 1/14/21, 8:29 PM:
-----------------------------------------------------------------

Thank you [~wangda] [~pbacsko] for your insights!

Actually the queue creation checks were present in the legacy engine 
([https://github.com/apache/hadoop/blob/9d5144e66d71551ad6e1a8d3f1670e49ba181999/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java#L333-L340]),
 if I see correctly since 3.0. But the first commit seems much older: 
[https://github.com/apache/hadoop/commit/0987a7b8cbbbb2c1e4c2095821d98a7db19644df]
 . 

I've just kept it during the refactor, and as Peter pointed out now we actually 
build on it, since we need to be able to determine if a rule will be executed 
successfully (ie. the queue can be created), and if it cannot we can move onto 
the next rule. This fallback behaviour is part of the FS Placement engine as 
well, so we need it in order to server FS customers, so it was somewhat lucky 
that it was already in place. (In the original implementation we just used this 
check to throw exceptions, which was quite pointless, since the CS would have 
rejected the application if the queue was uncreatable anyway)

So this is why I say we can safely remove those flags from the application 
placement context, this way we can reduce the code complexity.

This way we only need what [~pbacsko] mentioned:
{code:java}
1) "create" flag on the rule itself in the JSON - true/false
2) "yarh.scheduler.capacity.<path>.auto-queue-creation-v2.enabled" - 
true/false{code}
Also I'm currently adopting this change in the CSMappingPlacementRule, and I 
don't use these flags. What I'll need in the future a better, centralised way 
to determine if a path can be created, and both CS and the MappingEngine should 
use the same methods, to do it. Currently we are implementing the same logic 
differently, and it's much harder to maintain the code this way, but during the 
followup code cleanup jiras we'll take care of this.

About the race condition, I think (and I might be very wrong, because race 
conditions can be really sneaky), we cannot do much about, theoretically it can 
happen that the Mapping engine determines a rule can be executed, because the 
target queue exists or there is a proper dynamic parent, then a config change 
occurs, and by the time we get to the actual creation the queue structure 
changes, at this point the rejection is the proper action to take since the 
submission IS against the configuration. And I don't see a way around it since 
at the time the MappingEngine makes the decision that decision is the correct 
one, but we shouldn't enforce this decision if the configuration changes. But 
this race condition should only occur when the user changes the queue 
configuration, and in this case they should expect failing application 
submissions for the queues which have been changed or removed.


was (Author: shuzirra):
Actually the queue creation checks were present in the legacy engine 
(https://github.com/apache/hadoop/blob/9d5144e66d71551ad6e1a8d3f1670e49ba181999/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java#L333-L340),
 if I see correctly since 3.0. But the first commit seems much older: 
[https://github.com/apache/hadoop/commit/0987a7b8cbbbb2c1e4c2095821d98a7db19644df]
 . 

I've just kept it during the refactor, and as Peter pointed out now we actually 
build on it, since we need to be able to determine if a rule will be executed 
successfully (ie. the queue can be created), and if it cannot we can move onto 
the next rule. This fallback behaviour is part of the FS Placement engine as 
well, so we need it in order to server FS customers, so it was somewhat lucky 
that it was already in place. (In the original implementation we just used this 
check to throw exceptions, which was quite pointless, since the CS would have 
rejected the application if the queue was uncreatable anyway)

So this is why I say we can safely remove those flags from the application 
placement context, this way we can reduce the code complexity.

This way we only need what [~pbacsko] mentioned:
{code:java}
1) "create" flag on the rule itself in the JSON - true/false
2) "yarh.scheduler.capacity.<path>.auto-queue-creation-v2.enabled" - 
true/false{code}
Also I'm currently adopting this change in the CSMappingPlacementRule, and I 
don't use these flags. What I'll need in the future a better, centralised way 
to determine if a path can be created, and both CS and the MappingEngine should 
use the same methods, to do it. Currently we are implementing the same logic 
differently, and it's much harder to maintain the code this way, but during the 
followup code cleanup jiras we'll take care of this.

About the race condition, I think (and I might be very wrong, because race 
conditions can be really sneaky), we cannot do much about, theoretically it can 
happen that the Mapping engine determines a rule can be executed, because the 
target queue exists or there is a proper dynamic parent, then a config change 
occurs, and by the time we get to the actual creation the queue structure 
changes, at this point the rejection is the proper action to take since the 
submission IS against the configuration. And I don't see a way around it since 
at the time the MappingEngine makes the decision that decision is the correct 
one, but we shouldn't enforce this decision if the configuration changes. But 
this race condition should only occur when the user changes the queue 
configuration, and in this case they should expect failing application 
submissions for the queues which have been changed or removed.

> Update queue creation logic to use weight mode and allow the flexible 
> static/dynamic creation
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-10506
>                 URL: https://issues.apache.org/jira/browse/YARN-10506
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: YARN-10506-006-10504-010.patch, 
> YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, 
> YARN-10506-012.patch, YARN-10506.001.patch, YARN-10506.002.patch, 
> YARN-10506.003.patch, YARN-10506.004.patch, YARN-10506.005.patch, 
> YARN-10506.006-combined.patch, YARN-10506.006.patch, YARN-10506.007.patch, 
> YARN-10506.009.patch, YARN-10506.011.patch
>
>
> The queue creation logic should be updated to use weight mode and support the 
> flexible creation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to