[jira] [Assigned] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler

2022-04-26 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang reassigned YARN-11115:
---

Assignee: Junfan Zhang

> Add configuration to disable AM preemption for capacity scheduler
> -
>
> Key: YARN-11115
> URL: https://issues.apache.org/jira/browse/YARN-11115
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yuan Luo
>Assignee: Junfan Zhang
>Priority: Major
>
> I think it's necessary to add a configuration to disable AM preemption for 
> the capacity scheduler, like the fair-scheduler feature in YARN-9537.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-11115) Add configuration to disable AM preemption for capacity scheduler

2022-04-26 Thread Yuan Luo (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528508#comment-17528508
 ] 

Yuan Luo commented on YARN-11115:
-

[~zuston] If you could help with that, that would be great.

> Add configuration to disable AM preemption for capacity scheduler
> -
>
> Key: YARN-11115
> URL: https://issues.apache.org/jira/browse/YARN-11115
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Yuan Luo
>Priority: Major
>
> I think it's necessary to add a configuration to disable AM preemption for 
> the capacity scheduler, like the fair-scheduler feature in YARN-9537.






[jira] [Commented] (YARN-10080) Support show app id on localizer thread pool

2022-04-26 Thread Ashutosh Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528458#comment-17528458
 ] 

Ashutosh Gupta commented on YARN-10080:
---

[~cane] - This seems to be useful while debugging. Can you raise a PR, or 
shall I do that?

> Support show app id on localizer thread pool
> 
>
> Key: YARN-10080
> URL: https://issues.apache.org/jira/browse/YARN-10080
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-10080-001.patch, YARN-10080.002.patch
>
>
> Currently, when we are troubleshooting a container localizer issue and want 
> to analyze the jstack output in thread detail, we cannot figure out which 
> thread is processing a given container. So I want to add the app id to the 
> thread name.
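
To illustrate the request, here is a minimal Java sketch of a thread factory that puts the app id into the thread name, so it shows up in a jstack dump. It is not the NodeManager's actual localizer code or the attached patch: the class name, the "ContainerLocalizer-" prefix and the name format are assumptions made up for this example.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, for illustration only.
class AppIdThreadNameSketch {

  /** Builds a thread factory whose thread names carry the given app id. */
  static ThreadFactory namedFactory(String appId) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable);
      // The name prefix and layout are assumptions, not the real format.
      t.setName("ContainerLocalizer-" + appId + "-#" + counter.getAndIncrement());
      return t;
    };
  }
}
```

Such a factory could be passed to any executor that accepts a ThreadFactory, for example Executors.newFixedThreadPool(n, factory).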






[jira] [Commented] (YARN-11092) Upgrade jquery ui to 1.13.1

2022-04-26 Thread D M Murali Krishna Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528335#comment-17528335
 ] 

D M Murali Krishna Reddy commented on YARN-11092:
-

[~groot], you can take up this task.

> Upgrade jquery ui to 1.13.1
> ---
>
> Key: YARN-11092
> URL: https://issues.apache.org/jira/browse/YARN-11092
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
>
> The jquery-ui version currently used in trunk (1.12.1) has the following 
> vulnerabilities: CVE-2021-41182, CVE-2021-41183, CVE-2021-41184, so we need to 
> upgrade to at least 1.13.0.
>  
> Also currently for the UI2 we are using the shims repo which is not being 
> maintained as per the discussion 
> [https://github.com/components/jqueryui/issues/70] , so if possible we should 
> move to the main jquery repo [https://github.com/jquery/jquery-ui] 






[jira] [Commented] (YARN-11114) RMWebServices returns only apps matching exactly the submitted queue name

2022-04-26 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528276#comment-17528276
 ] 

Szilard Nemeth commented on YARN-11114:
---

I just checked how this worked before YARN-9879.
I checked out git commit 25a03bfeced (the parent of the YARN-9879 commit).

*testAppsQueryByQueueShortname*
        runningApp1, queue: default
        runningApp2, queue: root.default
        finishedApp, queue: root.default
        query: default

        runningApp1 is in the result list
        runningApp2 is NOT in the result list
        finishedApp is NOT in the result list
  
*testAppsQueryByQueueFullname*
        runningApp1, queue: default
        runningApp2, queue: root.default
        finishedApp, queue: root.default
        query: root.default

        runningApp1 is NOT in the result list
        runningApp2 is in the result list
        finishedApp is in the result list

Conclusion: only an exact match between the submitted and the queried queue 
name works.

So, Option 1 above just improves on this as it queries running apps based on 
both queue notations.

I think it's okay to keep Option 1 for now.
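
The exact-match behaviour observed in the two testcases can be modelled with a few lines of Java. This is only a sketch of the observed behaviour, not the RMWebServices or ClientRMService code; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the observed behaviour, for illustration only.
class ExactQueueMatchSketch {

  /** Returns the apps whose submitted queue equals the queried name exactly. */
  static List<String> query(Map<String, String> appToSubmittedQueue,
                            String queriedQueue) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, String> e : appToSubmittedQueue.entrySet()) {
      if (e.getValue().equals(queriedQueue)) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  /** The three apps and submitted queue names used in the testcases above. */
  static Map<String, String> testApps() {
    Map<String, String> apps = new LinkedHashMap<>();
    apps.put("runningApp1", "default");
    apps.put("runningApp2", "root.default");
    apps.put("finishedApp", "root.default");
    return apps;
  }
}
```

Querying this model with "default" returns only runningApp1, and querying with "root.default" returns runningApp2 and finishedApp, which matches the result lists above.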

> RMWebServices returns only apps matching exactly the submitted queue name
> -
>
> Key: YARN-11114
> URL: https://issues.apache.org/jira/browse/YARN-11114
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, webapp
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've added 2 testcases that demonstrate the issue with [this 
> commit|https://github.com/szilard-nemeth/hadoop/commit/88dcf40f4dab564477542b8efb82f4f20d132eee].
> 1. With 'testAppsQueryByQueueShortname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default".
> The testcase queries the apps by queue name "default" and the response only 
> contains the runningApp, which is submitted to "default" so the other app 
> that is submitted to "root.default" is not returned.
> 2. With 'testAppsQueryByQueueFullname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default" (same 
> setup as above).
> The testcase queries the apps by queue name "root.default" (which is the full 
> queue path) and the response only contains the finishedApp, which is 
> submitted to "root.default" so the other app that is submitted to "default" 
> is not returned.
> A trivial conclusion is that the response includes only those applications 
> whose submitted queue name exactly matches the queried name, whether that 
> name was specified explicitly at submission or resolved by the placement 
> engine.
> Before YARN-9879 was implemented, Capacity Scheduler could define a leaf 
> queue with a given name only once in the whole hierarchy, meaning that leaf 
> queue names were unique.
> For example root.a.testQueue and root.b.testQueue couldn't coexist, as the 
> leaf queue name is the same.
> At this point, I supposed that YARN-9879 is causing this issue, but as the 
> behaviour of CS before YARN-9879 was merged didn't allow two leaf queues with 
> the same name, a query of "root.default" and "default" could easily work as 
> it was guaranteed that there's not another "default" leaf queue in the 
> hierarchy, just one. I dug a bit further.
> I also noticed that YARN-8659 ([commit 
> link|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797])
>  could have introduced this issue a long time ago, as it removed the iterator 
> logic that queried the applications with method YarnScheduler#getAppsInQueue 
> (see 
> [this|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797#diff-5b432bf3a8eb3e039878300ffb9db1f728226b9e3f63c4eb53be5ed5a833390aL843]).
> Let's follow the implementation of YarnScheduler#getAppsInQueue for CS: 
> 1. First of all, 
> [here|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L2501-L2509]
>  is the method definition.
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is called from here.
> 2. 
> 

[jira] [Comment Edited] (YARN-11114) RMWebServices returns only apps matching exactly the submitted queue name

2022-04-26 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528191#comment-17528191
 ] 

Szilard Nemeth edited comment on YARN-11114 at 4/26/22 2:04 PM:


Let me update this with my progress.
I can see 3 ways to solve this.

*Option 1. Just running apps, no other apps (Current implementation)*

The current solution in the PR implements this.
1. Queries running apps by short and full queue names
2. It doesn't / can't query non-running apps by any name other than the 
submitted name. 
For example, if the application is submitted to "root.default", only this 
exact name can be queried, so a query with the value "default" won't return 
the application.
This is a downside of how the queue is stored inside RmAppImpl: only the 
submitted queue is stored, not both versions (leaf name, full queue path).
As there's a clear way to translate a leaf queue to its full path and vice 
versa only for running apps, running apps can be queried by both queue 
notations.
However, I don't really like this solution because it works differently for 
non-running apps due to the shortcoming mentioned above.

Advantages: 
 - No API / interface change is required

Disadvantages: 
 - Inconsistent API responses for running vs. non-running apps
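
The inconsistency of Option 1 can be shown with a small Java model. This is a sketch only and not the code in the PR: the App class is a made-up stand-in for RmAppImpl, and comparing leaf names is a simplification that assumes leaf queue names are unique, which is no longer guaranteed after YARN-9879.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Option 1, for illustration only.
class Option1Sketch {

  /** Made-up stand-in for an application record; not the real RmAppImpl. */
  static final class App {
    final String name;
    final String submittedQueue;
    final boolean running;

    App(String name, String submittedQueue, boolean running) {
      this.name = name;
      this.submittedQueue = submittedQueue;
      this.running = running;
    }
  }

  /** Leaf name of a queue: "root.default" and "default" both give "default". */
  static String leafOf(String queue) {
    return queue.substring(queue.lastIndexOf('.') + 1);
  }

  static List<String> query(List<App> apps, String queriedQueue) {
    List<String> result = new ArrayList<>();
    for (App app : apps) {
      boolean match = app.running
          // Running apps: both notations match (assumes unique leaf names).
          ? leafOf(app.submittedQueue).equals(leafOf(queriedQueue))
          // Non-running apps: only the exact submitted name matches.
          : app.submittedQueue.equals(queriedQueue);
      if (match) {
        result.add(app.name);
      }
    }
    return result;
  }
}
```

With a running app submitted to "default", a running app submitted to "root.default" and a finished app submitted to "root.default", a query for "default" returns both running apps but not the finished one, while a query for "root.default" returns all three.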

 

*Option 2. Store short queue name / full queue path in RmAppImpl with new 
fields*

This could be achieved in: RMAppManager#createAndPopulateNewRMApp

Advantages
 - ClientRMService#getApplications could clearly filter for queue name / full 
queue path, without any hassle.

Disadvantages
 - RmAppImpl should be touched and new fields should be added
 - Impact on RM State store
 - Impact on all schedulers: They need to translate between leaf queue / full 
queue path in order to store both values.

 

*Option 3. Resolve full queue path from leaf queue name and vice-versa*
As ClientRMService has a reference to the scheduler (type: YarnScheduler), a 
new method could be added to resolve full queue path from the given queue name.

Advantages
 - ClientRMService#getApplications could clearly filter for both queue notations

Disadvantages
 - Impact on the YarnScheduler interface
 - Impact on all scheduler implementations
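
A rough Java sketch of what such a resolver could look like follows. It is not an existing YarnScheduler method: the class, the method names and the map-based lookup are assumptions for illustration, and the sketch assumes each leaf name maps to a single full path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver for Option 3, for illustration only.
class QueuePathResolverSketch {
  // Leaf queue name -> full queue path (assumes unique leaf names).
  private final Map<String, String> leafToFullPath = new HashMap<>();

  /** Registers a queue by its full path, e.g. "root.default". */
  void addQueue(String fullPath) {
    String leaf = fullPath.substring(fullPath.lastIndexOf('.') + 1);
    leafToFullPath.put(leaf, fullPath);
  }

  /** Resolves a leaf name or a full path to the full path; null if unknown. */
  String resolveFullPath(String queueName) {
    if (leafToFullPath.containsValue(queueName)) {
      return queueName;
    }
    return leafToFullPath.get(queueName);
  }

  /** True if both names refer to the same known queue, in either notation. */
  boolean sameQueue(String submittedQueue, String queriedQueue) {
    String submitted = resolveFullPath(submittedQueue);
    return submitted != null && submitted.equals(resolveFullPath(queriedQueue));
  }
}
```

A filter in ClientRMService#getApplications could then compare the resolved full paths instead of the raw submitted name, so running and non-running apps would be treated the same way.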


was (Author: snemeth):
Let me update this with my progress.
I can see 3 ways to solve this.


Option 1. Just running apps, no other apps (Current implementation)

The current solution in the PR implements this.
1. Queries running apps by short and full queue names
2. It doesn't / can't query non-running apps by any name other than the 
submitted name. 
For example, if the application is submitted to "root.default", only this 
exact name can be queried, so a query with the value "default" won't return 
the application.
This is a downside of how the queue is stored inside RmAppImpl: only the 
submitted queue is stored, not both versions (leaf name, full queue path).
As there's a clear way to translate a leaf queue to its full path and vice 
versa only for running apps, running apps can be queried by both queue 
notations.
However, I don't really like this solution because it works differently for 
non-running apps due to the shortcoming mentioned above.

Advantages: 
- No API / interface change is required

Disadvantages: 
- Inconsistent API responses for running vs. non-running apps

 

Option 2. Store short queue name / full queue path in RmAppImpl with new fields

This could be achieved in: RMAppManager#createAndPopulateNewRMApp

Advantages
- ClientRMService#getApplications could clearly filter for queue name / full 
queue path, without any hassle.

Disadvantages
- RmAppImpl should be touched and new fields should be added
- Impact on RM State store
- Impact on all schedulers: They need to translate between leaf queue / full 
queue path in order to store both values.

 


Option 3. Resolve full queue path from leaf queue name and vice-versa
As ClientRMService has a reference to the scheduler (type: YarnScheduler), a 
new method could be added to resolve full queue path from the given queue name.

Advantages
- ClientRMService#getApplications could clearly filter for both queue notations

Disadvantages
- Impact on the YarnScheduler interface
- Impact on all scheduler implementations

> RMWebServices returns only apps matching exactly the submitted queue name
> -
>
> Key: YARN-11114
> URL: https://issues.apache.org/jira/browse/YARN-11114
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, webapp
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've added 2 testcases that demonstrate the issue with [this 
> 

[jira] [Commented] (YARN-11114) RMWebServices returns only apps matching exactly the submitted queue name

2022-04-26 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528191#comment-17528191
 ] 

Szilard Nemeth commented on YARN-11114:
---

Let me update this with my progress.
I can see 3 ways to solve this.


Option 1. Just running apps, no other apps (Current implementation)

The current solution in the PR implements this.
1. Queries running apps by short and full queue names
2. It doesn't / can't query non-running apps by any name other than the 
submitted name. 
For example, if the application is submitted to "root.default", only this 
exact name can be queried, so a query with the value "default" won't return 
the application.
This is a downside of how the queue is stored inside RmAppImpl: only the 
submitted queue is stored, not both versions (leaf name, full queue path).
As there's a clear way to translate a leaf queue to its full path and vice 
versa only for running apps, running apps can be queried by both queue 
notations.
However, I don't really like this solution because it works differently for 
non-running apps due to the shortcoming mentioned above.

Advantages: 
- No API / interface change is required

Disadvantages: 
- Inconsistent API responses for running vs. non-running apps

 

Option 2. Store short queue name / full queue path in RmAppImpl with new fields

This could be achieved in: RMAppManager#createAndPopulateNewRMApp

Advantages
- ClientRMService#getApplications could clearly filter for queue name / full 
queue path, without any hassle.

Disadvantages
- RmAppImpl should be touched and new fields should be added
- Impact on RM State store
- Impact on all schedulers: They need to translate between leaf queue / full 
queue path in order to store both values.

 


Option 3. Resolve full queue path from leaf queue name and vice-versa
As ClientRMService has a reference to the scheduler (type: YarnScheduler), a 
new method could be added to resolve full queue path from the given queue name.

Advantages
- ClientRMService#getApplications could clearly filter for both queue notations

Disadvantages
- Impact on the YarnScheduler interface
- Impact on all scheduler implementations

> RMWebServices returns only apps matching exactly the submitted queue name
> -
>
> Key: YARN-11114
> URL: https://issues.apache.org/jira/browse/YARN-11114
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, webapp
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've added 2 testcases that demonstrate the issue with [this 
> commit|https://github.com/szilard-nemeth/hadoop/commit/88dcf40f4dab564477542b8efb82f4f20d132eee].
> 1. With 'testAppsQueryByQueueShortname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default".
> The testcase queries the apps by queue name "default" and the response only 
> contains the runningApp, which is submitted to "default" so the other app 
> that is submitted to "root.default" is not returned.
> 2. With 'testAppsQueryByQueueFullname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default" (same 
> setup as above).
> The testcase queries the apps by queue name "root.default" (which is the full 
> queue path) and the response only contains the finishedApp, which is 
> submitted to "root.default" so the other app that is submitted to "default" 
> is not returned.
> A trivial conclusion is that the response includes only those applications 
> whose submitted queue name exactly matches the queried name, whether that 
> name was specified explicitly at submission or resolved by the placement 
> engine.
> Before YARN-9879 was implemented, Capacity Scheduler could define a leaf 
> queue with a given name only once in the whole hierarchy, meaning that leaf 
> queue names were unique.
> For example root.a.testQueue and root.b.testQueue couldn't coexist, as the 
> leaf queue name is the same.
> At this point, I supposed that YARN-9879 is causing this issue, but as the 
> behaviour of CS before YARN-9879 was merged didn't allow two leaf queues with 
> the same name, a query of "root.default" and "default" could easily work as 
> it was guaranteed that there's not another "default" leaf queue in the 
> hierarchy, just one. I dug a bit further.
> I also noticed that YARN-8659 ([commit 
> link|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797])
>  could have introduced this issue a long time ago, as it removed the iterator 
> logic that queried the applications with method YarnScheduler#getAppsInQueue 
> (see 
> 

[jira] [Updated] (YARN-11114) RMWebServices returns only apps matching exactly the submitted queue name

2022-04-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated YARN-11114:
--
Labels: pull-request-available  (was: )

> RMWebServices returns only apps matching exactly the submitted queue name
> -
>
> Key: YARN-11114
> URL: https://issues.apache.org/jira/browse/YARN-11114
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, webapp
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've added 2 testcases that demonstrate the issue with [this 
> commit|https://github.com/szilard-nemeth/hadoop/commit/88dcf40f4dab564477542b8efb82f4f20d132eee].
> 1. With 'testAppsQueryByQueueShortname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default".
> The testcase queries the apps by queue name "default" and the response only 
> contains the runningApp, which is submitted to "default" so the other app 
> that is submitted to "root.default" is not returned.
> 2. With 'testAppsQueryByQueueFullname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default" (same 
> setup as above).
> The testcase queries the apps by queue name "root.default" (which is the full 
> queue path) and the response only contains the finishedApp, which is 
> submitted to "root.default" so the other app that is submitted to "default" 
> is not returned.
> A trivial conclusion is that the response includes only those applications 
> whose submitted queue name exactly matches the queried name, whether that 
> name was specified explicitly at submission or resolved by the placement 
> engine.
> Before YARN-9879 was implemented, Capacity Scheduler could define a leaf 
> queue with a given name only once in the whole hierarchy, meaning that leaf 
> queue names were unique.
> For example root.a.testQueue and root.b.testQueue couldn't coexist, as the 
> leaf queue name is the same.
> At this point, I supposed that YARN-9879 is causing this issue, but as the 
> behaviour of CS before YARN-9879 was merged didn't allow two leaf queues with 
> the same name, a query of "root.default" and "default" could easily work as 
> it was guaranteed that there's not another "default" leaf queue in the 
> hierarchy, just one. I dug a bit further.
> I also noticed that YARN-8659 ([commit 
> link|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797])
>  could have introduced this issue a long time ago, as it removed the iterator 
> logic that queried the applications with method YarnScheduler#getAppsInQueue 
> (see 
> [this|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797#diff-5b432bf3a8eb3e039878300ffb9db1f728226b9e3f63c4eb53be5ed5a833390aL843]).
> Let's follow the implementation of YarnScheduler#getAppsInQueue for CS: 
> 1. First of all, 
> [here|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L2501-L2509]
>  is the method definition.
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is called from here.
> 2. 
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is then calling 
> [QueueManager#getQueue|https://github.com/apache/hadoop/blob/da09d68056d4e6a9490ddc6d9ae816b65217e117/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java#L136-L138].
> 3. 
> [QueueManager#getQueue|https://github.com/apache/hadoop/blob/da09d68056d4e6a9490ddc6d9ae816b65217e117/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java#L136-L138]
>  is then calling [CSQueueStore#get|#get].
> 4. [CSQueueStore#get|#get] calls the 'getMap' fields getOrDefault