[jira] [Comment Edited] (YARN-11114) RMWebServices returns only apps matching exactly the submitted queue name

2022-04-26 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528191#comment-17528191
 ] 

Szilard Nemeth edited comment on YARN-11114 at 4/26/22 2:04 PM:


Let me update this with my progress.
I can see 3 ways to solve this.

*Option 1. Just running apps, no other apps (Current implementation)*

The current solution in the PR implements this.
1. Running apps can be queried by both the short and the full queue name.
2. Non-running apps can't be queried by any name other than the submitted 
name. 
For example, if the application is submitted to "root.default", only this 
exact name can be queried, so a query for "default" won't return the 
application.
This is a consequence of how the queue is stored inside RMAppImpl: only the 
submitted queue name is stored, not both versions (leaf name, full queue path).
As there's a clear way to translate between leaf queue name and full path only 
for running apps, only running apps can be queried by both queue notations.
However, I don't really like this solution, as it works differently for 
non-running apps due to the shortcoming mentioned above.

Advantages: 
 - No API / interface change is required

Disadvantages: 
 - Inconsistent API responses for running vs. non-running apps
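
The inconsistency can be illustrated with a minimal sketch. Everything in it is 
hypothetical and heavily simplified: `App`, `leafOf` and `filterByQueue` are 
illustration-only names, not Hadoop APIs, and the leaf name is assumed to be 
the last dot-separated segment of the path, whereas the real translation is 
done by the scheduler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Option1FilterSketch {

    /** Hypothetical stand-in for an application record; not the real RMAppImpl. */
    static final class App {
        final String id;
        final String submittedQueue; // stored exactly as given at submission
        final boolean running;

        App(String id, String submittedQueue, boolean running) {
            this.id = id;
            this.submittedQueue = submittedQueue;
            this.running = running;
        }
    }

    /** Simplifying assumption: the leaf name is the last dot-separated segment. */
    static String leafOf(String queue) {
        return queue.substring(queue.lastIndexOf('.') + 1);
    }

    /** Running apps match in either notation, non-running apps only by exact name. */
    static List<String> filterByQueue(List<App> apps, String query) {
        List<String> ids = new ArrayList<>();
        for (App app : apps) {
            boolean match = app.running
                ? leafOf(app.submittedQueue).equals(leafOf(query))
                : app.submittedQueue.equals(query);
            if (match) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<App> apps = Arrays.asList(
            new App("runningApp", "root.default", true),
            new App("finishedApp", "root.default", false));
        System.out.println(filterByQueue(apps, "default"));      // [runningApp]
        System.out.println(filterByQueue(apps, "root.default")); // [runningApp, finishedApp]
    }
}
```

Both apps were submitted to the same queue, yet the short-name query only 
returns the running one.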

 

*Option 2. Store short queue name / full queue path in RMAppImpl with new 
fields*

This could be achieved in: RMAppManager#createAndPopulateNewRMApp

Advantages
 - ClientRMService#getApplications could clearly filter for queue name / full 
queue path, without any hassle.

Disadvantages
 - RMAppImpl has to be modified and new fields have to be added
 - Impact on RM State store
 - Impact on all schedulers: They need to translate between leaf queue / full 
queue path in order to store both values.
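
A minimal sketch of what storing both notations could look like. `QueueNames` 
and `fromFullPath` are hypothetical names and not existing fields or methods 
of RMAppImpl; the sketch also assumes the full path has already been resolved 
and derives the leaf name from its last dot-separated segment.

```java
public class Option2QueueNamesSketch {

    /** Hypothetical value object holding both queue notations for one app. */
    static final class QueueNames {
        final String leafName;
        final String fullPath;

        QueueNames(String leafName, String fullPath) {
            this.leafName = leafName;
            this.fullPath = fullPath;
        }

        /** A query matches if it equals either stored notation. */
        boolean matches(String query) {
            return leafName.equals(query) || fullPath.equals(query);
        }
    }

    /** Hypothetical helper; assumes the caller passes an already resolved full path. */
    static QueueNames fromFullPath(String fullPath) {
        String leaf = fullPath.substring(fullPath.lastIndexOf('.') + 1);
        return new QueueNames(leaf, fullPath);
    }

    public static void main(String[] args) {
        QueueNames names = fromFullPath("root.default");
        System.out.println(names.matches("default"));      // true
        System.out.println(names.matches("root.default")); // true
        System.out.println(names.matches("root.other"));   // false
    }
}
```

With both values stored per application, the filter no longer depends on 
whether the app is still running. Matching on the leaf name alone would still 
be ambiguous wherever two leaf queues share a name.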

 

*Option 3. Resolve full queue path from leaf queue name and vice versa*
As ClientRMService has a reference to the scheduler (type: YarnScheduler), a 
new method could be added to it to resolve the full queue path from a given 
queue name.

Advantages
 - ClientRMService#getApplications could clearly filter for both queue notations

Disadvantages
 - Impact on the YarnScheduler interface
 - Impact on all scheduler implementations
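
A minimal sketch of such a resolver. `QueuePathResolver`, 
`resolveFullQueuePath`, `MapBackedResolver` and `sameQueue` are hypothetical 
names: no such method exists on YarnScheduler today, and the map-backed 
implementation assumes leaf queue names are unique, which a real scheduler 
could not take for granted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class Option3ResolverSketch {

    /** Hypothetical addition; not part of the real YarnScheduler interface. */
    interface QueuePathResolver {
        Optional<String> resolveFullQueuePath(String queueName);
    }

    /** Toy implementation that assumes every leaf queue name is unique. */
    static final class MapBackedResolver implements QueuePathResolver {
        private final Map<String, String> nameToFullPath = new HashMap<>();

        MapBackedResolver addQueue(String fullPath) {
            String leaf = fullPath.substring(fullPath.lastIndexOf('.') + 1);
            nameToFullPath.put(leaf, fullPath);     // short notation
            nameToFullPath.put(fullPath, fullPath); // full notation
            return this;
        }

        @Override
        public Optional<String> resolveFullQueuePath(String queueName) {
            return Optional.ofNullable(nameToFullPath.get(queueName));
        }
    }

    /** Two names refer to the same queue if both resolve to the same full path. */
    static boolean sameQueue(QueuePathResolver resolver, String submitted, String query) {
        Optional<String> a = resolver.resolveFullQueuePath(submitted);
        Optional<String> b = resolver.resolveFullQueuePath(query);
        return a.isPresent() && a.equals(b);
    }

    public static void main(String[] args) {
        QueuePathResolver resolver = new MapBackedResolver().addQueue("root.default");
        System.out.println(sameQueue(resolver, "root.default", "default")); // true
        System.out.println(sameQueue(resolver, "default", "root.default")); // true
        System.out.println(sameQueue(resolver, "default", "root.missing")); // false
    }
}
```

A filter built on top of this would normalize both the stored queue name and 
the queried name before comparing them, so the state of the app would not 
matter.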



> RMWebServices returns only apps matching exactly the submitted queue name
> -
>
> Key: YARN-11114
> URL: https://issues.apache.org/jira/browse/YARN-11114
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, webapp
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've added 2 testcases that demonstrate the issue with [this 
> commit|https://github.com/szilard-nemeth/hadoop/com

[jira] [Comment Edited] (YARN-11114) RMWebServices returns only apps matching exactly the submitted queue name

2022-04-22 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525453#comment-17525453
 ] 

Szilard Nemeth edited comment on YARN-11114 at 4/22/22 2:28 PM:


For NOTE#3: It's probably going to be useful to go back in time, check out a 
commit just before YARN-9879 got merged, and add the same testcases to see how 
this worked before the changes introduced by YARN-9879.

If it turns out that the ClientRMService was indeed only able to filter by 
either the short or the full queue name, then the scope of this Jira can be 
narrowed down.



> RMWebServices returns only apps matching exactly the submitted queue name
> -
>
> Key: YARN-11114
> URL: https://issues.apache.org/jira/browse/YARN-11114
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, webapp
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>
> I've added 2 testcases that demonstrate the issue with [this 
> commit|https://github.com/szilard-nemeth/hadoop/commit/88dcf40f4dab564477542b8efb82f4f20d132eee].
> 1. With 'testAppsQueryByQueueShortname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default".
> The testcase queries the apps by queue name "default" and the response only 
> contains the runningApp, which is submitted to "default" so the other app 
> that is submitted to "root.default" is not returned.
> 2. With 'testAppsQueryByQueueFullname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default" (same 
> setup as above).
> The testcase queries the apps by queue name "root.default" (which is the full 
> queue path) and the response only contains the finishedApp, which is 
> submitted to "root.default" so the other app that is submitted to "default" 
> is not returned.
> A trivial conclusion of this is that the response only includes those 
> applications whose queue name, either specified explicitly at submission or 
> resolved by the placement engine, exactly matches the queried name.
> Before YARN-9879 was implemented, Capacity Scheduler only allowed a leaf 
> queue with a specific name to be defined once in the whole hierarchy, 
> meaning that leaf queue names were unique.
> For example root.a.testQueue and root.b.testQueue couldn't coexist, as the 
> leaf queue name is the same.
> At this point, I supposed that YARN-9879 was causing this issue, but as CS 
> didn't allow two leaf queues with the same name before YARN-9879 was merged, 
> a query for either "root.default" or "default" could easily work, as it was 
> guaranteed that there was only one "default" leaf queue in the hierarchy. 
> I dug a bit further.
> I also noticed that YARN-8659 ([commit 
> link|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797])
>  could have introduced this issue a long time ago, as it removed the iterator 
> logic that queried the applications with method YarnScheduler#getAppsInQueue 
> (see 
> [this|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797#diff-5b432bf3a8eb3e039878300ffb9db1f728226b9e3f63c4eb53be5ed5a833390aL843]).
> Let's follow the implementation of YarnScheduler#getAppsInQueue for CS: 
> 1. First of all, 
> [here|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L2501-L2509]
>  is the method definition.
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is called from here.
> 2. 
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>