from:"Shilun Fan \(Jira\)"

[jira] [Updated] (YARN-11684) PriorityQueueComparator violates general contract

2024-04-20 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11684:
--
Fix Version/s: 3.4.1
   3.5.0

> PriorityQueueComparator violates general contract
> -
>
> Key: YARN-11684
> URL: https://issues.apache.org/jira/browse/YARN-11684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.5.0
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> YARN-10178 tried to fix the issue, but there are still two properties that 
> might change during sorting, which causes an exception.
> {code}
> 2024-04-10 12:36:56,420 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-28,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
> at java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:348)
> at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:483)
> at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
> at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:260)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:1100)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:942)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1719)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1654)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1811)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1557)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:539)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:591)
> {code}
> The values returned by `queue.getAccessibleNodeLabels()` and 
> `queue.getPriority()` could change in another thread while the `queues` are 
> being sorted. Those values should be saved when constructing the 
> PriorityQueueResourcesForSorting helper object.
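
The fix described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual YARN patch: MutableQueue, QueueSnapshot, and SnapshotSort are hypothetical names standing in for the scheduler queue and the PriorityQueueResourcesForSorting helper. The idea is that every value the comparator needs is read once, before sorting starts, so a concurrent update cannot make two comparisons of the same element disagree.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a scheduler queue whose priority
// may be changed by another thread at any time.
class MutableQueue {
    private final String name;
    private volatile int priority;

    MutableQueue(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    String getName() { return name; }
    int getPriority() { return priority; }
    void setPriority(int priority) { this.priority = priority; }
}

// Immutable snapshot taken once per queue before sorting.
final class QueueSnapshot {
    final MutableQueue queue;
    final int priority;

    QueueSnapshot(MutableQueue queue) {
        this.queue = queue;
        this.priority = queue.getPriority(); // read exactly once
    }
}

final class SnapshotSort {
    // Higher priority first; ties are broken by the (immutable) queue name.
    static final Comparator<QueueSnapshot> BY_PRIORITY_DESC =
        Comparator.comparingInt((QueueSnapshot s) -> s.priority).reversed()
            .thenComparing(s -> s.queue.getName());

    static List<MutableQueue> sortByPriority(List<MutableQueue> queues) {
        return queues.stream()
            .map(QueueSnapshot::new)   // snapshot the mutable state first
            .sorted(BY_PRIORITY_DESC)  // the comparator sees only final fields
            .map(s -> s.queue)
            .collect(Collectors.toList());
    }
}
```

Because the comparator reads only the final fields of each snapshot, a setPriority() call from another thread during the sort can affect the next sort at the earliest, never the one in progress.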



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Resolved] (YARN-11444) Improve YARN md documentation format

2024-04-07 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11444.
---
   Fix Version/s: 3.4.1
  3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.4.1, 3.5.0  (was: 3.5.0)
  Resolution: Fixed

> Improve YARN md documentation format
> 
>
> Key: YARN-11444
> URL: https://issues.apache.org/jira/browse/YARN-11444
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> 1. Fix some typos




[jira] [Updated] (YARN-11444) Improve YARN md documentation format

2024-04-05 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11444:
--
Description: 1. Fix some typos  (was: 1. Improve the table format for better 
readability
2. Fix some typos
3. Fix the list numbering so it displays correctly)

> Improve YARN md documentation format
> 
>
> Key: YARN-11444
> URL: https://issues.apache.org/jira/browse/YARN-11444
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> 1. Fix some typos




[jira] [Resolved] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-04-01 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11663.
---
   Fix Version/s: 3.4.1
  3.5.0
Target Version/s: 3.4.0
Assignee: Shilun Fan
  Resolution: Fixed

> [Federation] Add Cache Entity Nums Limit.
> -
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our production environment, 
> I found that the Router's memory keeps growing over time. This is because, 
> after jobs finish, the expired keys are never accessed again, so the cleanup 
> mechanism is never triggered. Would it be better to add a limit on the 
> maximum number of cache entries?
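
The suggestion above, a cap on the number of cache entries, can be sketched as follows. This is only an illustration of the general technique, assuming a simple LRU eviction policy; BoundedCache is a hypothetical class and not the Router's actual cache implementation. It relies on the standard java.util.LinkedHashMap access-order mode and its removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded LRU cache. Entries are evicted on insertion once
// the limit is exceeded, so memory stays bounded even if expired keys are
// never read again.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: least recently used first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true drops the least recently used entry.
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so a real deployment would need external synchronization or a concurrent cache library. The sketch only shows that a size limit bounds memory independently of time-based expiry.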




[jira] [Resolved] (YARN-11668) Potential concurrent modification exception for node attributes of node manager

2024-03-28 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11668.
---
   Fix Version/s: 3.4.1
  3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.4.1
Assignee: Junfan Zhang
  Resolution: Fixed

> Potential concurrent modification exception for node attributes of node 
> manager
> ---
>
> Key: YARN-11668
> URL: https://issues.apache.org/jira/browse/YARN-11668
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: img_v3_029c_55ac6b50-64aa-4cbe-81a0-5f8d22c623fg.jpg
>
>
> The RM crashes when it encounters the stack trace shown in the attachment.




[jira] [Updated] (YARN-11668) Potential concurrent modification exception for node attributes of node manager

2024-03-28 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11668:
--
Affects Version/s: 3.4.0

> Potential concurrent modification exception for node attributes of node 
> manager
> ---
>
> Key: YARN-11668
> URL: https://issues.apache.org/jira/browse/YARN-11668
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: img_v3_029c_55ac6b50-64aa-4cbe-81a0-5f8d22c623fg.jpg
>
>
> The RM crashes when it encounters the stack trace shown in the attachment.




[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-03-23 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11663:
--
Issue Type: Improvement  (was: Bug)

> [Federation] Add Cache Entity Nums Limit.
> -
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our production environment, 
> I found that the Router's memory keeps growing over time. This is because, 
> after jobs finish, the expired keys are never accessed again, so the cleanup 
> mechanism is never triggered. Would it be better to add a limit on the 
> maximum number of cache entries?




[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.

2024-03-23 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11663:
--
Summary: [Federation] Add Cache Entity Nums Limit.  (was: Router cache 
expansion issue)

> [Federation] Add Cache Entity Nums Limit.
> -
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our production environment, 
> I found that the Router's memory keeps growing over time. This is because, 
> after jobs finish, the expired keys are never accessed again, so the cleanup 
> mechanism is never triggered. Would it be better to add a limit on the 
> maximum number of cache entries?




[jira] [Comment Edited] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid

2024-03-22 Thread Shilun Fan (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830001#comment-17830001
 ] 

Shilun Fan edited comment on YARN-11387 at 3/22/24 11:08 PM:
-

I will resubmit PR to follow up on this issue.


was (Author: slfan1989):
I will resubmit PR to follow up on this issue.I will resubmit PR to follow up 
on this issue.

> [GPG] YARN GPG mistakenly deleted applicationid
> ---
>
> Key: YARN-11387
> URL: https://issues.apache.org/jira/browse/YARN-11387
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.2.1, 3.4.0
>Reporter: zhangjunj
>Assignee: Shilun Fan
>Priority: Major
>  Labels: federation, gpg, pull-request-available
> Attachments: YARN-11387-YARN-11387.v1.patch, 
> yarn-gpg-mistakenly-deleted-applicationid.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> With [YARN-7599|https://issues.apache.org/jira/browse/YARN-7599], 
> Federation can delete expired application IDs, but YARN GPG uses the 
> getRouter() method to obtain application information for multiple clusters. 
> If there are too many application IDs (more than 200,000), it is not possible 
> to pull all of the application ID information at one time, which can result 
> in accidental deletion. The following error is reported for the Spark 
> component.
>  




[jira] [Commented] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid

2024-03-22 Thread Shilun Fan (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830001#comment-17830001
 ] 

Shilun Fan commented on YARN-11387:
---

I will resubmit PR to follow up on this issue.I will resubmit PR to follow up 
on this issue.

> [GPG] YARN GPG mistakenly deleted applicationid
> ---
>
> Key: YARN-11387
> URL: https://issues.apache.org/jira/browse/YARN-11387
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.2.1, 3.4.0
>Reporter: zhangjunj
>Assignee: Shilun Fan
>Priority: Major
>  Labels: federation, gpg, pull-request-available
> Attachments: YARN-11387-YARN-11387.v1.patch, 
> yarn-gpg-mistakenly-deleted-applicationid.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> With [YARN-7599|https://issues.apache.org/jira/browse/YARN-7599], 
> Federation can delete expired application IDs, but YARN GPG uses the 
> getRouter() method to obtain application information for multiple clusters. 
> If there are too many application IDs (more than 200,000), it is not possible 
> to pull all of the application ID information at one time, which can result 
> in accidental deletion. The following error is reported for the Spark 
> component.
>  




[jira] [Commented] (YARN-11663) Router cache expansion issue

2024-03-14 Thread Shilun Fan (Jira)



[ 
https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827300#comment-17827300
 ] 

Shilun Fan commented on YARN-11663:
---

[~luoyuan] Thank you for raising this question. What is the cache time set to 
in your configuration? From the monitoring, it appears that memory is being 
reclaimed.

> Router cache expansion issue
> 
>
> Key: YARN-11663
> URL: https://issues.apache.org/jira/browse/YARN-11663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, yarn
>Affects Versions: 3.4.0
>Reporter: Yuan Luo
>Priority: Major
> Attachments: image-2024-03-14-18-12-28-426.png, 
> image-2024-03-14-18-12-49-950.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our production environment, 
> I found that the Router's memory keeps growing over time. This is because, 
> after jobs finish, the expired keys are never accessed again, so the cleanup 
> mechanism is never triggered. Would it be better to add a limit on the 
> maximum number of cache entries?




[jira] [Resolved] (YARN-11660) SingleConstraintAppPlacementAllocator performance regression

2024-03-14 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-11660.
---
Resolution: Fixed

> SingleConstraintAppPlacementAllocator performance regression
> 
>
> Key: YARN-11660
> URL: https://issues.apache.org/jira/browse/YARN-11660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.4.1
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>





[jira] [Updated] (YARN-11660) SingleConstraintAppPlacementAllocator performance regression

2024-03-14 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11660:
--
  Component/s: scheduler
Fix Version/s: 3.4.1
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.1
Affects Version/s: 3.4.1

> SingleConstraintAppPlacementAllocator performance regression
> 
>
> Key: YARN-11660
> URL: https://issues.apache.org/jira/browse/YARN-11660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.4.1
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>





[jira] [Updated] (YARN-11037) Add configurable logic to split resource request to least loaded SC

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11037:
--
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add configurable logic to split resource request to least loaded SC
> ---
>
> Key: YARN-11037
> URL: https://issues.apache.org/jira/browse/YARN-11037
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Add configurable logic to split resource request to least loaded subcluster.




[jira] [Updated] (YARN-11026) Make AppPlacementAllocator configurable in AppSchedulingInfo

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11026:
--
  Component/s: scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Make AppPlacementAllocator configurable in AppSchedulingInfo
> 
>
> Key: YARN-11026
> URL: https://issues.apache.org/jira/browse/YARN-11026
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: scheduler
>Affects Versions: 3.4.0
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10805) Replace Guava Lists usage by Hadoop's own Lists in hadoop-yarn-project

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10805:
--
  Component/s: yarn-common
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Replace Guava Lists usage by Hadoop's own Lists in hadoop-yarn-project
> --
>
> Key: YARN-10805
> URL: https://issues.apache.org/jira/browse/YARN-10805
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-common
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10750) TestMetricsInvariantChecker.testManyRuns is broken since HADOOP-17524

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10750:
--
  Component/s: test
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> TestMetricsInvariantChecker.testManyRuns is broken since HADOOP-17524
> -
>
> Key: YARN-10750
> URL: https://issues.apache.org/jira/browse/YARN-10750
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Gergely Pollák
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10750.001.patch
>
>
> HADOOP-17524 removed the metrics:
>   LogFatal
>   LogError
>   LogWarn
>   LogInfo
> These removals need to be reflected in the invariant list of 
> TestMetricsInvariantChecker as well.




[jira] [Updated] (YARN-10746) RmWebApp add default-node-label-expression to the queue info

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10746:
--
  Component/s: resourcemanager
   webapp
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RmWebApp add default-node-label-expression to the queue info
> 
>
> Key: YARN-10746
> URL: https://issues.apache.org/jira/browse/YARN-10746
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: resourcemanager, webapp
>Affects Versions: 3.4.0
>Reporter: Gergely Pollák
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10746.001.patch, YARN-10746.002.patch, 
> YARN-10746.003.patch
>
>





[jira] [Updated] (YARN-10278) CapacityScheduler test framework ProportionalCapacityPreemptionPolicyMockFramework need some review

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10278:
--
  Component/s: capacity scheduler
   test
 Hadoop Flags: Reviewed
 Target Version/s: 3.2.3, 3.3.1, 3.4.0
Affects Version/s: 3.2.3
   3.3.1
   3.4.0

> CapacityScheduler test framework 
> ProportionalCapacityPreemptionPolicyMockFramework need some review
> ---
>
> Key: YARN-10278
> URL: https://issues.apache.org/jira/browse/YARN-10278
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler, test
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Gergely Pollák
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
> Attachments: YARN-10278.001.patch, YARN-10278.002.patch, 
> YARN-10278.002.patch, YARN-10278.002.patch, YARN-10278.branch-3.1.001.patch, 
> YARN-10278.branch-3.1.002.patch, YARN-10278.branch-3.1.003.patch, 
> YARN-10278.branch-3.2.001.patch, YARN-10278.branch-3.2.002.patch, 
> YARN-10278.branch-3.2.002.patch, YARN-10278.branch-3.3.001.patch
>
>
> This test framework class mocks a bit too heavily and simulates CS internal 
> behaviour with mock methods beyond the point where it is reasonably 
> maintainable; any internal change in CS is a major headache.
> A lot of tests depend on this class, so we should approach it carefully, but 
> I think it is worth examining whether this class can be made a bit more 
> resilient to changes and easier to maintain, or at least documented better.




[jira] [Updated] (YARN-10277) CapacityScheduler test TestUserGroupMappingPlacementRule should build proper hierarchy

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10277:
--
  Component/s: capacity scheduler
 Target Version/s: 3.3.1, 3.4.0
Affects Version/s: 3.3.1
   3.4.0

> CapacityScheduler test TestUserGroupMappingPlacementRule should build proper 
> hierarchy
> --
>
> Key: YARN-10277
> URL: https://issues.apache.org/jira/browse/YARN-10277
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Gergely Pollák
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10277.001.patch, YARN-10277.002.patch, 
> YARN-10277.003.patch, YARN-10277.branch-3.3.001.patch
>
>
> Since the CapacityScheduler internal implementation depends more and more on 
> queues being hierarchical, the test has become really hard to maintain. A lot 
> of test cases were failing because they used non-existent queues. The older 
> placement rule solution ignored missing parents, but since the leaf queue 
> change in CS, we must be able to get a full path for any queue, because all 
> queues are referenced by their full path.
> This test should reflect this: instead of creating and expecting the 
> existence of fictional queues, it should create a proper queue hierarchy, 
> with a better way to describe it. 
> Currently we set up a bunch of Mockito "when" statements to simulate the 
> queue behavior, but this is a hassle to maintain, and it is easy to miss a 
> few methods.




[jira] [Updated] (YARN-10279) Avoid unnecessary QueueMappingEntity creations

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10279:
--
  Component/s: resourcemanager
 Target Version/s: 3.3.1, 3.4.0
Affects Version/s: 3.3.1
   3.4.0

> Avoid unnecessary QueueMappingEntity creations
> --
>
> Key: YARN-10279
> URL: https://issues.apache.org/jira/browse/YARN-10279
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: resourcemanager
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Gergely Pollák
>Assignee: Hudáky Márton Gyula
>Priority: Minor
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10279.001.patch, YARN-10279.003.patch, 
> YARN-10279.004.patch, YARN-10279.005.patch, YARN-10279.006.patch
>
>
> In CS UserGroupMappingPlacementRule and AppNameMappingPlacementRule classes 
> we create new instances of QueueMappingEntity class. In some cases we simply 
> copy the already received class, so we just duplicate it, which is 
> unnecessary since the class is immutable.
> This is just a minor improvement, probably doesn't have much impact, but 
> still puts some unnecessary load on GC.
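
The point about immutability can be illustrated with a small sketch. ImmutableMapping below is a hypothetical stand-in, not the real QueueMappingEntity API. It only shows that an immutable object can be handed back as-is when nothing changes, so no copy is allocated.

```java
import java.util.Objects;

// Hypothetical immutable mapping used only for illustration.
final class ImmutableMapping {
    private final String source;
    private final String queue;

    ImmutableMapping(String source, String queue) {
        this.source = Objects.requireNonNull(source);
        this.queue = Objects.requireNonNull(queue);
    }

    String getSource() { return source; }
    String getQueue() { return queue; }

    // Returns this same instance when the queue is unchanged; a new object
    // is allocated only when the value actually differs.
    ImmutableMapping withQueue(String newQueue) {
        return queue.equals(newQueue) ? this : new ImmutableMapping(source, newQueue);
    }
}
```

Sharing the instance is safe here because no caller can modify its fields after construction.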




[jira] [Updated] (YARN-10281) Redundant QueuePath usage in UserGroupMappingPlacementRule and AppNameMappingPlacementRule

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10281:
--
  Component/s: capacity scheduler
 Target Version/s: 3.3.1, 3.4.0
Affects Version/s: 3.3.1
   3.4.0

> Redundant QueuePath usage in UserGroupMappingPlacementRule and 
> AppNameMappingPlacementRule
> --
>
> Key: YARN-10281
> URL: https://issues.apache.org/jira/browse/YARN-10281
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: capacity scheduler
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Gergely Pollák
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10281.001.patch, YARN-10281.002.patch, 
> YARN-10281.003.patch, YARN-10281.004.patch, YARN-10281.branch-3.3.001.patch
>
>
> We use the QueuePath and QueueMapping (or QueueMappingEntity) objects in the 
> aforementioned classes. These technically store the same kind of 
> information, yet we keep converting between them. Let's examine whether we 
> can use only the QueueMapping(Entity) instead, since that holds more 
> information.




[jira] [Assigned] (YARN-10746) RmWebApp add default-node-label-expression to the queue info

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned YARN-10746:
-

Assignee: Gergely Pollák

> RmWebApp add default-node-label-expression to the queue info
> 
>
> Key: YARN-10746
> URL: https://issues.apache.org/jira/browse/YARN-10746
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Gergely Pollák
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10746.001.patch, YARN-10746.002.patch, 
> YARN-10746.003.patch
>
>





[jira] [Assigned] (YARN-10777) Bump node-sass from 4.13.0 to 4.14.1 in /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned YARN-10777:
-

Assignee: Wei-Chiu Chuang

> Bump node-sass from 4.13.0 to 4.14.1 in 
> /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp
> ---
>
> Key: YARN-10777
> URL: https://issues.apache.org/jira/browse/YARN-10777
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.4.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Assigned] (YARN-1115) Provide optional means for a scheduler to check real user ACLs

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned YARN-1115:


Assignee: Eric Payne

> Provide optional means for a scheduler to check real user ACLs
> --
>
> Key: YARN-1115
> URL: https://issues.apache.org/jira/browse/YARN-1115
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler
>Affects Versions: 2.8.5
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
> Attachments: YARN-1115.001.patch, YARN-1115.002.patch, 
> YARN-1115.003.patch, YARN-1115.004.patch, YARN-1115.branch-2.10.004.patch, 
> YARN-1115.branch-3.2.004.patch, YARN-1115.branch-3.3.004.patch
>
>
> In the framework for secure implementation using UserGroupInformation.doAs 
> (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html),
>  a trusted superuser can submit jobs on behalf of another user in a secure 
> way. In this framework, the superuser is referred to as the real user and the 
> proxied user is referred to as the effective user.
> Currently when a job is submitted as an effective user, the ACLs for the 
> effective user are checked against the queue on which the job is to be run. 
> Depending on an optional configuration, the scheduler should also check the 
> ACLs of the real user if the configuration to do so is set.
> For example, suppose my superuser name is super, and super is configured to 
> securely proxy as joe. Also suppose there is a Hadoop queue named ops which 
> only allows ACLs for super, not for joe.
> When super proxies to joe in order to submit a job to the ops queue, it will 
> fail because joe, as the effective user, does not have ACLs on the ops queue.
> In many cases this is what you want, in order to protect queues that joe 
> should not be using.
> However, there are times when super may need to proxy to many users, and the 
> client running as super just wants to use the ops queue because the ops queue 
> is already dedicated to the client's purpose, and, to keep the ops queue 
> dedicated to that purpose, super doesn't want to open up ACLs to joe in 
> general on the ops queue. Without this functionality, in this case, the 
> client running as super needs to figure out which queue each user has ACLs 
> opened up for, and then coordinate with other tasks using those queues.
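The check described above can be sketched as follows. This is only an illustration under stated assumptions, not the YARN implementation: the class RealUserAclSketch, the method canSubmit, and the boolean checkRealUser (standing in for the optional configuration) are hypothetical names, and queue ACLs are reduced to a plain set of user names.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not YARN code.
final class RealUserAclSketch {
    // Queue name -> users that are allowed to submit to that queue.
    private final Map<String, Set<String>> submitAcls;
    // Stand-in for the optional configuration mentioned in the description.
    private final boolean checkRealUser;

    RealUserAclSketch(Map<String, Set<String>> submitAcls, boolean checkRealUser) {
        this.submitAcls = submitAcls;
        this.checkRealUser = checkRealUser;
    }

    // The effective user is always checked; the real user is consulted
    // only as a fallback, and only when the option is enabled.
    boolean canSubmit(String queue, String effectiveUser, String realUser) {
        Set<String> allowed = submitAcls.getOrDefault(queue, Set.of());
        if (allowed.contains(effectiveUser)) {
            return true;
        }
        return checkRealUser && realUser != null && allowed.contains(realUser);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> acls = Map.of("ops", Set.of("super"));
        // super proxies as joe and submits to the ops queue.
        System.out.println(new RealUserAclSketch(acls, false).canSubmit("ops", "joe", "super"));
        System.out.println(new RealUserAclSketch(acls, true).canSubmit("ops", "joe", "super"));
    }
}
```

With the option disabled the submission is rejected, as in the current behaviour; with it enabled, super's ACL on the ops queue is enough, and joe still has no ACL of his own on that queue.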




[jira] [Updated] (YARN-9048) Add znode hierarchy in Federation ZK State Store

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9048:
-
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add znode hierarchy in Federation ZK State Store
> 
>
> Key: YARN-9048
> URL: https://issues.apache.org/jira/browse/YARN-9048
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Bibin Chundatt
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Similar to YARN-2962, consider having a hierarchy in the ZK federation store for 
> applications.




[jira] [Updated] (YARN-8980) Mapreduce application container start fail after AM restart.

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8980:
-
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Mapreduce application container start  fail after AM restart.
> -
>
> Key: YARN-8980
> URL: https://issues.apache.org/jira/browse/YARN-8980
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Bibin Chundatt
>Assignee: Chenyu Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> UAMs to sub-clusters are always launched with keepContainers.
> In AM restart scenarios, the UAM registers again with the RM and receives the 
> running containers along with their NMTokens. The NMTokens received by the UAM in 
> getPreviousAttemptContainersNMToken are never used by the MapReduce application.  
> The Federation Interceptor should take care of such scenarios too: merge the 
> NMTokens received at registration into the allocate response.
> A container allocation response on the same node will have an empty NMToken.
> Issue credits: [~Nallasivan]
>  




[jira] [Updated] (YARN-11090) [GPG] Support Secure Mode

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11090:
--
  Component/s: gpg
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> [GPG] Support Secure Mode
> -
>
> Key: YARN-11090
> URL: https://issues.apache.org/jira/browse/YARN-11090
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: gpg
>Affects Versions: 3.4.0
>Reporter: tuyu
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-11090-YARN-7402.v1.patch, YARN-11090.001.patch
>
>
> GPG should support configuring a keytab and principal to communicate with the Router.




[jira] [Updated] (YARN-8898) Fix FederationInterceptor#allocate to set application priority in allocateResponse

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8898:
-
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix FederationInterceptor#allocate to set application priority in 
> allocateResponse
> --
>
> Key: YARN-8898
> URL: https://issues.apache.org/jira/browse/YARN-8898
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Bibin Chundatt
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-8898.wip.patch
>
>
> FederationInterceptor#mergeAllocateResponses skips application_priority in 
> the returned response.




[jira] [Updated] (YARN-7720) Race condition between second app attempt and UAM timeout when first attempt node is down

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-7720:
-
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Race condition between second app attempt and UAM timeout when first attempt 
> node is down
> -
>
> Key: YARN-7720
> URL: https://issues.apache.org/jira/browse/YARN-7720
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Botong Huang
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-7720.v1.patch, YARN-7720.v2.patch
>
>
> In Federation, multiple attempts of an application share the same UAM in each 
> secondary sub-cluster. When the first attempt fails, we rely on the fact that 
> the secondary RM won't kill the existing UAM before the AM heartbeat timeout 
> (10 min by default). When the second attempt comes up in the home sub-cluster, it 
> will pick up the UAM token from the YARN Registry and resume the UAM heartbeat to 
> the secondary RMs. 
> The default heartbeat timeouts for NM and AM are both 10 min. The problem is 
> that when the first attempt's node goes down or loses connectivity, only after 
> 10 min will the home RM mark the first attempt as failed and then schedule 
> the second attempt on some other node. By then the UAMs in the secondaries are 
> already timing out, and they might not survive until the second attempt comes 
> up. 
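The timing problem above comes down to simple arithmetic, sketched here. The class and method names are hypothetical, and the sketch assumes that the last UAM heartbeat coincides with the moment the first attempt's node is lost (t = 0) and that the second attempt starts some rescheduling delay after the NM timeout fires; real clusters will differ in the details.

```java
import java.time.Duration;

// Hypothetical sketch of the race; not taken from the YARN code base.
final class UamTimeoutRaceSketch {
    // Time left on the secondary RM's UAM heartbeat timer at the moment the
    // second attempt starts. Zero or negative means the UAM may already be gone.
    static Duration uamMargin(Duration nmTimeout, Duration amTimeout, Duration reschedulingDelay) {
        // Assumption: the home RM notices the lost node only after nmTimeout,
        // then needs reschedulingDelay to bring up the second attempt.
        Duration secondAttemptStart = nmTimeout.plus(reschedulingDelay);
        // Assumption: the UAM timer started at t = 0 and runs for amTimeout.
        return amTimeout.minus(secondAttemptStart);
    }

    public static void main(String[] args) {
        Duration tenMin = Duration.ofMinutes(10);
        Duration margin = uamMargin(tenMin, tenMin, Duration.ofSeconds(30));
        System.out.println("UAM margin in seconds: " + margin.getSeconds());
    }
}
```

With both timeouts at 10 min the margin can never be positive, whatever the rescheduling delay, which is the race the issue describes.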




[jira] [Updated] (YARN-5604) Add versioning for FederationStateStore

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5604:
-
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add versioning for FederationStateStore
> ---
>
> Key: YARN-5604
> URL: https://issues.apache.org/jira/browse/YARN-5604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Subramaniam Krishnan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently we don't have versioning (null version) for the 
> FederationStateStore. This JIRA proposes adding the versioning support that is 
> needed to support upgrades.




[jira] [Updated] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8972:
-
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch, YARN-8972.v5.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router that prevents 
> users from submitting applications with an oversized ASC.
> This keeps the YARN cluster from failing over.




[jira] [Updated] (YARN-9049) Add application submit data to state store

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9049:
-
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add application submit data to state store
> --
>
> Key: YARN-9049
> URL: https://issues.apache.org/jira/browse/YARN-9049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Bibin Chundatt
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-9049.001.path
>
>
> As per the discussion in YARN-8898, we need to persist trimmed 
> ApplicationSubmissionContext details to the federation State Store.




[jira] [Updated] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10946:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> AbstractCSQueue: Create separate class for constructing Queue API objects
> -
>
> Key: YARN-10946
> URL: https://issues.apache.org/jira/browse/YARN-10946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Peter Szucs
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Relevant methods are: 
> - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations
> - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo
> - 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics




[jira] [Updated] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6667:
-
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Handle containerId duplicate without failing the heartbeat in Federation 
> Interceptor
> 
>
> Key: YARN-6667
> URL: https://issues.apache.org/jira/browse/YARN-6667
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Botong Huang
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In practice, the probability of this happening is very low. 
> It can only be caused by a master-slave failover of YARN and a wrong 
> Epoch parameter configuration.
> We will try to be compatible with this situation and let the Application run 
> as much as possible, using the following measures:
> 1. Select a node whose heartbeat has not timed out for allocation, and at the 
> same time require the node to be in the RUNNING state.
> 2. If the heartbeat of neither RM has timed out, and both are in the 
> RUNNING state, select the previously allocated RM for Container processing.




[jira] [Updated] (YARN-8482) [Router] Add cache for fast answers to getApps

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8482:
-
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> [Router] Add cache for fast answers to getApps
> --
>
> Key: YARN-8482
> URL: https://issues.apache.org/jira/browse/YARN-8482
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>





[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6539:
-
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, 
> YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, 
> YARN-6539.006.patch, YARN-6539.007.patch, YARN-6539.008.patch, 
> YARN-6539_3.patch, YARN-6539_4.patch
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-6972) Adding RM ClusterId in AppInfo

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6972:
-
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Adding RM ClusterId in AppInfo
> --
>
> Key: YARN-6972
> URL: https://issues.apache.org/jira/browse/YARN-6972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Tanuj Nayak
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-6972.001.patch, YARN-6972.002.patch, 
> YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, 
> YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, 
> YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, 
> YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch, 
> YARN-6972.015.patch, YARN-6972.016.patch, YARN-6972.016.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10793) Upgrade Junit from 4 to 5 in hadoop-yarn-server-applicationhistoryservice

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10793:
--
  Component/s: test
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Upgrade Junit from 4 to 5 in hadoop-yarn-server-applicationhistoryservice
> -
>
> Key: YARN-10793
> URL: https://issues.apache.org/jira/browse/YARN-10793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: ANANDA G B
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Upgrade Junit from 4 to 5 in hadoop-yarn-server-applicationhistoryservice




[jira] [Updated] (YARN-8973) [Router] Add missing methods in RMWebProtocol

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-8973:
-
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> [Router] Add missing methods in RMWebProtocol
> -
>
> Key: YARN-8973
> URL: https://issues.apache.org/jira/browse/YARN-8973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Giovanni Matteo Fumarola
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-8973.v1.patch, YARN-8973.v2.patch, 
> YARN-8973.v3.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10883) [Router] Router Audit Log Add Client IP Address.

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10883:
--
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> [Router] Router Audit Log Add Client IP Address.
> 
>
> Key: YARN-10883
> URL: https://issues.apache.org/jira/browse/YARN-10883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: chaosju
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The Router should record the address of the client that killed the application.
> Currently the log information is printed as follows:
> {code:java}
> 2022-06-10 08:06:26,322 INFO  [main] router.RouterAuditLogger 
> (RouterAuditLogger.java:logSuccess(89)) - USER=test-user    OPERATION=Submit 
> New App    TARGET=RouterClientRMService    RESULT=SUCCESS    
> APPID=application_1654873569440_0001    SUBCLUSTERID=2{code}
> With the IP information added, the log looks as follows:
> {code:java}
> 2022-06-10 08:09:05,392 INFO  [main] router.RouterAuditLogger 
> (RouterAuditLogger.java:logSuccess(89)) - USER=test-user    IP=127.0.0.1    
> OPERATION=Submit New App    TARGET=RouterClientRMService    RESULT=SUCCESS    
> APPID=application_1654873732359_0001    SUBCLUSTERID=3 {code}
>  




[jira] [Updated] (YARN-10487) Support getQueueUserAcls, listReservations, getApplicationAttempts, getContainerReport, getContainers, getResourceTypeInfo API's for Federation

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10487:
--
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support getQueueUserAcls, listReservations, getApplicationAttempts, 
> getContainerReport, getContainers, getResourceTypeInfo API's for Federation
> ---
>
> Key: YARN-10487
> URL: https://issues.apache.org/jira/browse/YARN-10487
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: D M Murali Krishna Reddy
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10487.001.patch
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Support getQueueUserAcls, listReservations, getApplicationAttempts, 
> getContainerReport, getContainers, getResourceTypeInfo API's for Federation




[jira] [Updated] (YARN-10565) Follow-up to YARN-10504

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10565:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Follow-up to YARN-10504
> ---
>
> Key: YARN-10565
> URL: https://issues.apache.org/jira/browse/YARN-10565
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In YARN-10504 weight mode support was introduced to CS. This jira is a 
> followup to simplify and restructure the initialization, so that the weight 
> calculation/absolute/percentage mode is easier to understand and modify.
> To be refactored:
> * In ParentQueue.java#1099 the error message should be more specific, instead 
> of the {{LOG.error("Fatal issue found: e", e);}}
> * -AutoCreatedLeafQueue.clearConfigurableFields should clear 
> NORMALIZED_WEIGHT just to be on the safe side-
> * -Uncomment the commented assertions in 
> TestCapacitySchedulerAutoCreatedQueueBase.validateEffectiveMinResource-
> * -Check whether the assertion modification in TestRMWebServices is 
> absolutely necessary or could be hiding a bug.-
> * -Same for TestRMWebServicesForCSWithPartitions.java-
> Additional information:
> The original flow was modified to allow the dynamic weight-capacity 
> calculation. 
> This resulted in a new flow, which is now harder to understand.
> With a cleanup it could be made simpler, the duplicate calculations could be 
> avoided. 
> The changed functionality should either be explained (if deemed correct) or 
> fixed (see YARN-10590).
> Investigate how the CS reinit works; it could contain some possibly redundant 
> initialization code fragments.
> Note: Since most of the items were completed in other refactor items, only 
> the first one is being patched here.




[jira] [Updated] (YARN-11036) Do not inherit from TestRMWebServicesCapacitySched

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11036:
--
  Component/s: capacity scheduler
   test
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Do not inherit from TestRMWebServicesCapacitySched
> --
>
> Key: YARN-11036
> URL: https://issues.apache.org/jira/browse/YARN-11036
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, test
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code:java}
> public class TestRMWebServicesSchedulerActivities
> extends TestRMWebServicesCapacitySched { {code}
> This is bad practice: the tests of TestRMWebServicesCapacitySched run 
> twice.
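Why the inherited tests run twice can be illustrated without JUnit. The sketch below is an assumption-laden stand-in: FakeTest is a hypothetical marker annotation playing the role of a test annotation, and countDiscovered mimics a runner that discovers test methods by reflection. Because a subclass inherits the public methods of its parent, a runner that visits both classes picks up the parent's tests once per class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical sketch; FakeTest and the nested classes are illustrative only.
final class InheritedTestsSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface FakeTest {}

    // Plays the role of the parent test class.
    public static class BaseTests {
        @FakeTest public void testA() {}
        @FakeTest public void testB() {}
    }

    // Plays the role of the test class that extends the parent.
    public static class DerivedTests extends BaseTests {
        @FakeTest public void testC() {}
    }

    // Counts the marked methods a reflection-based runner would find,
    // including the public methods inherited from superclasses.
    static int countDiscovered(Class<?> testClass) {
        int count = 0;
        for (Method m : testClass.getMethods()) {
            if (m.isAnnotationPresent(FakeTest.class)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("base: " + countDiscovered(BaseTests.class));
        System.out.println("derived: " + countDiscovered(DerivedTests.class));
    }
}
```

Running both classes executes testA and testB once for the base class and again for the derived class, so composition or a shared helper is usually the cheaper way to reuse test setup.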




[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10049:
--
  Component/s: scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 3.4.0
>Reporter: Manikandan R
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch, 
> YARN-10049.003.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> FIFOPolicy of FS does the following comparisons in addition to the app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> The scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.
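The comparison chain described above might look like the sketch below. It is not the FIFOOrderingPolicy code: the App class and its fields are hypothetical, and the sketch assumes that a higher priority value sorts first, an earlier start time breaks priority ties, and the name breaks remaining ties.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; App and its fields are illustrative, not YARN types.
final class FifoOrderingSketch {
    static final class App {
        final String name;
        final int priority;
        final long startTime;

        App(String name, int priority, long startTime) {
            this.name = name;
            this.priority = priority;
            this.startTime = startTime;
        }
    }

    // Assumed order: higher priority first, then earlier start time, then name.
    static final Comparator<App> FIFO =
        Comparator.comparingInt((App a) -> a.priority).reversed()
            .thenComparingLong((App a) -> a.startTime)
            .thenComparing((App a) -> a.name);

    // Returns the application names in scheduling order.
    static List<String> order(List<App> apps) {
        List<App> sorted = new ArrayList<>(apps);
        sorted.sort(FIFO);
        List<String> names = new ArrayList<>();
        for (App a : sorted) {
            names.add(a.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("b", 1, 100L));
        apps.add(new App("a", 1, 100L));
        apps.add(new App("c", 5, 300L));
        apps.add(new App("d", 1, 50L));
        System.out.println(order(apps));
    }
}
```

Every key in the chain is immutable on the App object, so the comparator cannot change its answer while a sort is in progress.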




[jira] [Updated] (YARN-10918) Simplify method: CapacitySchedulerQueueManager#parseQueue

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10918:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Simplify method: CapacitySchedulerQueueManager#parseQueue
> -
>
> Key: YARN-10918
> URL: https://issues.apache.org/jira/browse/YARN-10918
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Ideas for simplifying this method:
> - Define a queue factory
> - Separate validation logic




[jira] [Updated] (YARN-10945) Add javadoc to all methods of AbstractCSQueue

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10945:
--
  Component/s: capacity scheduler
   documentation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add javadoc to all methods of AbstractCSQueue
> -
>
> Key: YARN-10945
> URL: https://issues.apache.org/jira/browse/YARN-10945
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, documentation
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: András Győri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10947) Simplify AbstractCSQueue#initializeQueueState

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10947:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Simplify AbstractCSQueue#initializeQueueState
> -
>
> Key: YARN-10947
> URL: https://issues.apache.org/jira/browse/YARN-10947
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10995) Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10995:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy
> -
>
> Key: YARN-10995
> URL: https://issues.apache.org/jira/browse/YARN-10995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> GuaranteedOrZeroCapacityOverTimePolicy has a comparator class that orders 
> applications by their submit time. It gets the applications from the 
> RMContext and doesn't need any data from the 
> GuaranteedOrZeroCapacityOverTimePolicy class, so it could easily be moved 
> to RMContext, allowing the reference to the RMContext/SchedulerContext to 
> be removed from GuaranteedOrZeroCapacityOverTimePolicy.
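
To illustrate why such a comparator has no real dependency on the policy class, here is a minimal sketch of a standalone comparator that orders by submit time. PendingApp and its fields are hypothetical stand-ins rather than the actual YARN types, and the tie-break on the id is an assumption added to keep the ordering consistent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an application record; not a YARN class.
final class PendingApp {
  final String id;
  final long submitTime;

  PendingApp(String id, long submitTime) {
    this.id = id;
    this.submitTime = submitTime;
  }
}

// Standalone comparator: it needs only the applications themselves,
// so it can live outside the policy class.
final class SubmitTimeComparator implements Comparator<PendingApp> {
  @Override
  public int compare(PendingApp a, PendingApp b) {
    int byTime = Long.compare(a.submitTime, b.submitTime);
    // Assumed tie-break on id so equal timestamps still order consistently.
    return byTime != 0 ? byTime : a.id.compareTo(b.id);
  }
}

final class SubmitTimeDemo {
  // Returns the ids of the given applications in submit-time order.
  static List<String> sortedIds(List<PendingApp> apps) {
    List<PendingApp> copy = new ArrayList<>(apps);
    copy.sort(new SubmitTimeComparator());
    List<String> ids = new ArrayList<>();
    for (PendingApp app : copy) {
      ids.add(app.id);
    }
    return ids;
  }
}
```

Because the comparator reads only immutable fields of its arguments, its result cannot change while a sort is in progress.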




[jira] [Updated] (YARN-10944) AbstractCSQueue: Eliminate code duplication in overloaded versions of setMaxCapacity

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10944:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> AbstractCSQueue: Eliminate code duplication in overloaded versions of 
> setMaxCapacity
> 
>
> Key: YARN-10944
> URL: https://issues.apache.org/jira/browse/YARN-10944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Methods are:
> - AbstractCSQueue#setMaxCapacity(float)
> - AbstractCSQueue#setMaxCapacity(java.lang.String, float)
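
One common way to remove this kind of duplication is to have the single-argument overload delegate to the labelled one. The sketch below assumes that approach; SketchQueue, NO_LABEL and the range check are hypothetical simplifications, not the actual AbstractCSQueue code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified queue; the real class has more state and locking.
class SketchQueue {
  static final String NO_LABEL = "";
  private final Map<String, Float> maxCapacityByLabel = new HashMap<>();

  // The single-argument overload delegates instead of repeating the logic.
  void setMaxCapacity(float maximumCapacity) {
    setMaxCapacity(NO_LABEL, maximumCapacity);
  }

  // All checks and the state update live in one place.
  void setMaxCapacity(String nodeLabel, float maximumCapacity) {
    if (maximumCapacity < 0.0f || maximumCapacity > 1.0f) {
      throw new IllegalArgumentException(
          "Illegal maximum capacity: " + maximumCapacity);
    }
    maxCapacityByLabel.put(nodeLabel, maximumCapacity);
  }

  float getMaxCapacity(String nodeLabel) {
    return maxCapacityByLabel.getOrDefault(nodeLabel, 0.0f);
  }
}
```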




[jira] [Updated] (YARN-10963) Split TestCapacityScheduler by test categories

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10963:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Split TestCapacityScheduler by test categories
> --
>
> Key: YARN-10963
> URL: https://issues.apache.org/jira/browse/YARN-10963
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Tests in the TestCapacityScheduler can be categorised and split into multiple 
> test files, e.g.:
>  - refresh related tests
>  - app related tests (move, etc)
>  - node handling related tests




[jira] [Updated] (YARN-11034) Add enhanced headroom in AllocateResponse

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11034:
--
  Component/s: federation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add enhanced headroom in AllocateResponse
> -
>
> Key: YARN-11034
> URL: https://issues.apache.org/jira/browse/YARN-11034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Add enhanced headroom in allocate response. This provides a channel for RMs 
> to return load information for AMRMProxy and decision making when rerouting 
> resource requests. 




[jira] [Updated] (YARN-10632) Make auto queue creation maximum allowed depth configurable

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10632:
--
  Component/s: capacity scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Make auto queue creation maximum allowed depth configurable
> ---
>
> Key: YARN-10632
> URL: https://issues.apache.org/jira/browse/YARN-10632
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Qi Zhu
>Assignee: Andras Gyori
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10632.001.patch, YARN-10632.002.patch, 
> YARN-10632.003.patch, YARN-10632.004.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the maximum allowed depth is fixed at 2, but I think this should be 
> configurable.
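
A minimal sketch of what a configurable depth check could look like, assuming the limit is read from a property with a default of 2. The class, the property name and the path handling are hypothetical and are not taken from the Capacity Scheduler code.

```java
import java.util.Properties;

// Hypothetical helper; the property name below is made up for illustration.
final class AutoQueueDepthCheck {
  static final String MAX_DEPTH_KEY = "example.auto-queue-creation.max-depth";
  static final int DEFAULT_MAX_DEPTH = 2;

  private final int maxDepth;

  AutoQueueDepthCheck(Properties conf) {
    // Fall back to the previously hard-coded value when nothing is configured.
    this.maxDepth = Integer.parseInt(
        conf.getProperty(MAX_DEPTH_KEY, String.valueOf(DEFAULT_MAX_DEPTH)));
  }

  // Counts how many path segments below the existing parent would be created.
  boolean isAllowed(String existingParentPath, String requestedPath) {
    if (!requestedPath.startsWith(existingParentPath + ".")) {
      return false;
    }
    String remainder = requestedPath.substring(existingParentPath.length() + 1);
    int depth = remainder.split("\\.").length;
    return depth <= maxDepth;
  }
}
```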




[jira] [Updated] (YARN-10907) Minimize usages of AbstractCSQueue#csContext

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10907:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Minimize usages of AbstractCSQueue#csContext
> 
>
> Key: YARN-10907
> URL: https://issues.apache.org/jira/browse/YARN-10907
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Context objects can be a sign of a code smell as they can contain many, 
> possibly loosely related references to other objects.
> CapacitySchedulerContext seems like this.
> This task is to investigate how the field AbstractCSQueue#csContext is being 
> used from this class and possibly keep the usage of this context class to 
> the bare minimum. 
> Related article: https://wiki.c2.com/?ContextObjectsAreEvil




[jira] [Updated] (YARN-10929) Do not use a separate config in legacy CS AQC

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10929:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Do not use a separate config in legacy CS AQC
> -
>
> Key: YARN-10929
> URL: https://issues.apache.org/jira/browse/YARN-10929
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> AbstractManagedParentQueue#initializeLeafQueueConfigs creates a new 
> CapacitySchedulerConfiguration with templated configs only. We should stop 
> doing this. 
> Also, there is a sorting of config keys in this method, but in the end the 
> configs are added to the Configuration object which is an enhanced Map.




[jira] [Updated] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11043:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Clean up checkstyle warnings from YARN-11024/10907/10929
> 
>
> Key: YARN-11043
> URL: https://issues.apache.org/jira/browse/YARN-11043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: checkstyle_warnings.txt
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> YARN-11024, YARN-10907, YARN-10929 are consecutive changes built on top of 
> each other. This jira is a followup to clean up the checkstyle warnings 
> present in the modified files.




[jira] [Updated] (YARN-11031) Improve the maintainability of RM webapp tests like TestRMWebServicesCapacitySched

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11031:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Improve the maintainability of RM webapp tests like 
> TestRMWebServicesCapacitySched
> --
>
> Key: YARN-11031
> URL: https://issues.apache.org/jira/browse/YARN-11031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It's hard to maintain the asserts in TestRMWebServicesCapacitySched, 
> TestRMWebServicesCapacitySchedDynamicConfig test classes when the scheduler 
> response is modified. Currently only a subset of the scheduler response is 
> asserted in these tests.




[jira] [Updated] (YARN-11024) Create an AbstractLeafQueue to store the common LeafQueue + AutoCreatedLeafQueue functionality

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11024:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Create an AbstractLeafQueue to store the common LeafQueue + 
> AutoCreatedLeafQueue functionality
> --
>
> Key: YARN-11024
> URL: https://issues.apache.org/jira/browse/YARN-11024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> AbstractAutoCreatedLeafQueue extends the LeafQueue class which is an 
> instantiable class, so every time an AutoCreatedLeafQueue is created a normal 
> LeafQueue is configured as well. This setup results in some strange behaviour 
> like having to pass the template configs of an auto created queue to a leaf 
> queue. To make the whole structure more flexible an AbstractLeafQueue should 
> be created which stores the common methods.
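
The proposed structure can be sketched roughly as follows, assuming a shared abstract base with two concrete siblings. All class names carry a Sketch suffix to mark them as hypothetical; the real classes hold far more state.

```java
import java.util.Map;

// Common behaviour lives in an abstract base that cannot be instantiated.
abstract class AbstractLeafQueueSketch {
  private final String queuePath;

  AbstractLeafQueueSketch(String queuePath) {
    this.queuePath = queuePath;
  }

  String getQueuePath() {
    return queuePath;
  }

  // Each concrete queue type decides where its configuration comes from.
  abstract Map<String, String> effectiveConfig();
}

// A statically configured leaf queue.
final class LeafQueueSketch extends AbstractLeafQueueSketch {
  private final Map<String, String> conf;

  LeafQueueSketch(String queuePath, Map<String, String> conf) {
    super(queuePath);
    this.conf = conf;
  }

  @Override
  Map<String, String> effectiveConfig() {
    return conf;
  }
}

// An auto-created leaf queue: a sibling of LeafQueueSketch, not a subclass,
// so template configs no longer need to be passed through a plain leaf queue.
final class AutoCreatedLeafQueueSketch extends AbstractLeafQueueSketch {
  private final Map<String, String> templateConf;

  AutoCreatedLeafQueueSketch(String queuePath, Map<String, String> templateConf) {
    super(queuePath);
    this.templateConf = templateConf;
  }

  @Override
  Map<String, String> effectiveConfig() {
    return templateConf;
  }
}
```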




[jira] [Updated] (YARN-11003) Make RMNode aware of all (OContainer inclusive) allocated resources

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-11003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-11003:
--
  Component/s: container
   resourcemanager
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Make RMNode aware of all (OContainer inclusive) allocated resources
> ---
>
> Key: YARN-11003
> URL: https://issues.apache.org/jira/browse/YARN-11003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: container, resourcemanager
>Affects Versions: 3.4.0
>Reporter: Andrew Chung
>Assignee: Andrew Chung
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> In order to facilitate resource-aware Opportunistic container allocation, we 
> will need to pass allocated container information to {{ClusterNode}}, which 
> in turn gets its information from {{RMNode}}.
> However, {{RMNode}} currently only holds containers and node utilization 
> based on the actual physical resource utilization, not at the allocated 
> container level.
> This sub-task aims to allow {{RMNode}} to be aware of all allocated resources 
> on the node upon a node heartbeat.




[jira] [Updated] (YARN-10909) AbstractCSQueue: Annotate all methods with VisibleForTesting that are only used by test code

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10909:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> AbstractCSQueue: Annotate all methods with VisibleForTesting that are only 
> used by test code
> 
>
> Key: YARN-10909
> URL: https://issues.apache.org/jira/browse/YARN-10909
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> For example, AbstractCSQueue#setMaxCapacity(float) is only used for testing, 
> but is not annotated. There may be other methods like this in the class.




[jira] [Updated] (YARN-10998) Add YARN_ROUTER_HEAPSIZE to yarn-env for routers

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10998:
--
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add YARN_ROUTER_HEAPSIZE to yarn-env for routers
> 
>
> Key: YARN-10998
> URL: https://issues.apache.org/jira/browse/YARN-10998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> YARN services such as NM and RM have a YARN_\{SERVICENAME}_HEAPSIZE variable 
> defined; we should have a similar parameter for the Router service as well.




[jira] [Updated] (YARN-10948) Rename SchedulerQueue#activeQueue to activateQueue

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10948:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Rename SchedulerQueue#activeQueue to activateQueue
> --
>
> Key: YARN-10948
> URL: https://issues.apache.org/jira/browse/YARN-10948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Adam Antal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10958) Use correct configuration for Group service init in CSMappingPlacementRule

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10958:
--
  Component/s: capacity scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Use correct configuration for Group service init in CSMappingPlacementRule
> --
>
> Key: YARN-10958
> URL: https://issues.apache.org/jira/browse/YARN-10958
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Peter Bacsko
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> There is a potential problem in {{CSMappingPlacementRule.java}}:
> {noformat}
> if (groups == null) {
>   groups = Groups.getUserToGroupsMappingService(conf);
> }
> {noformat}
> The problem is, we're supposed to pass {{scheduler.getConf()}}. The "conf" 
> object is the config for capacity scheduler, which does not include the 
> property which selects the group service provider. Therefore, the current 
> code just works by chance, because Group mapping service is already 
> initialized at this point. See the original fix in YARN-10053.
> A unit test is also needed to verify it.
> Idea:
>  # Create a Configuration object in which the property 
> "hadoop.security.group.mapping" refers to an existing test implementation.
>  # Add a new method to {{Groups}} which nulls out the singleton instance, e.g. 
> {{Groups.reset()}}.
>  # Create a mock CapacityScheduler where {{getConf()}} and 
> {{getConfiguration()}} contain different settings for 
> "hadoop.security.group.mapping". Since {{getConf()}} is the service config, 
> this should return the config object created in step #1.
>  # Create an instance of {{CSMappingPlacementRule}} with a single primary 
> group rule.
>  # Run the placement evaluation.
>  # Expected: returned queue matches what is supposed to be coming from the 
> test group mapping service ("testuser" --> "testqueue").
>  # Modify "hadoop.security.group.mapping" in the config object created in 
> step #1.
>  # Call {{Groups.refresh()}} which changes the group mapping ("testuser" --> 
> "testqueue2"). This requires that the test group mapping service implement 
> {{GroupMappingServiceProvider.cacheGroupsRefresh()}}.
>  # Create a new instance of {{CSMappingPlacementRule}}.
>  # Run the placement evaluation again.
>  # Expected: with the same user, the target queue has changed.
> This looks convoluted, but these steps make sure that:
>  # {{CSMappingPlacementRule}} will force the initialization of groups.
>  # We select the correct configuration for group service init.
>  # We don't create a new {{Groups}} instance if the singleton is initialized, 
> so we cover the original problem described in YARN-10597.
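
The "works by chance" behaviour described above can be reproduced with a small, self-contained sketch of a lazily created singleton. GroupServiceSketch is a hypothetical stand-in, not Hadoop's Groups class; only the property name "hadoop.security.group.mapping" comes from the description, and the default provider name is made up.

```java
import java.util.Map;

// Hypothetical stand-in for a lazily created singleton service.
final class GroupServiceSketch {
  static final String PROVIDER_KEY = "hadoop.security.group.mapping";
  static final String DEFAULT_PROVIDER = "default-provider";

  private static GroupServiceSketch instance;
  private final String provider;

  private GroupServiceSketch(String provider) {
    this.provider = provider;
  }

  // Only the first caller's configuration is honoured;
  // later calls reuse the existing instance whatever they pass in.
  static synchronized GroupServiceSketch get(Map<String, String> conf) {
    if (instance == null) {
      instance = new GroupServiceSketch(
          conf.getOrDefault(PROVIDER_KEY, DEFAULT_PROVIDER));
    }
    return instance;
  }

  // Test-only hook, mirroring the Groups.reset() idea from step 2.
  static synchronized void reset() {
    instance = null;
  }

  String provider() {
    return provider;
  }
}

final class GroupServiceDemo {
  // Returns the provider chosen when the given config performs the first init.
  static String providerAfterFreshInit(Map<String, String> conf) {
    GroupServiceSketch.reset();
    return GroupServiceSketch.get(conf).provider();
  }
}
```

With a config that lacks the provider property, a fresh initialization silently falls back to the default, which is why a test has to reset the singleton before it can observe which configuration was actually used.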




[jira] [Updated] (YARN-10954) Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10954:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
> 
>
> Key: YARN-10954
> URL: https://issues.apache.org/jira/browse/YARN-10954
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Assigned] (YARN-10954) Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned YARN-10954:
-

Assignee: Andras Gyori

> Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
> 
>
> Key: YARN-10954
> URL: https://issues.apache.org/jira/browse/YARN-10954
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10957) Use invokeConcurrent Overload with Collection in getClusterMetrics

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10957:
--
  Component/s: federation
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Use invokeConcurrent Overload with Collection in getClusterMetrics
> --
>
> Key: YARN-10957
> URL: https://issues.apache.org/jira/browse/YARN-10957
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Akshat Bordia
>Assignee: Akshat Bordia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In [PR #3135|https://github.com/apache/hadoop/pull/3135], we added a new 
> overload of invokeConcurrent to avoid ArrayList initialization in multiple 
> places. Use the new overload in getClusterMetrics as well.




[jira] [Updated] (YARN-10913) AbstractCSQueue: Group preemption methods and fields into a separate class

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10913:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> AbstractCSQueue: Group preemption methods and fields into a separate class 
> ---
>
> Key: YARN-10913
> URL: https://issues.apache.org/jira/browse/YARN-10913
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Relevant methods: isQueueHierarchyPreemptionDisabled, 
> isIntraQueueHierarchyPreemptionDisabled, getTotalKillableResource, 
> getKillableContainers




[jira] [Updated] (YARN-10917) Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10917:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Investigate and simplify 
> CapacitySchedulerConfigValidator#validateQueueHierarchy
> 
>
> Key: YARN-10917
> URL: https://issues.apache.org/jira/browse/YARN-10917
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Tamas Domok
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10950) Code cleanup in QueueCapacities

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10950:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Code cleanup in QueueCapacities
> ---
>
> Key: YARN-10950
> URL: https://issues.apache.org/jira/browse/YARN-10950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Adam Antal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> - Make fields final: capacitiesMap, readLock, writeLock
> - Remove explicit type arguments, e.g. new HashMap();
> - Remove abbreviations and avoid string concatenation in 
> QueueCapacities.Capacities#toString
> - Remove unnecessary comments, e.g. "/* Used Capacity Getter and Setter */" & 
> "/* Absolute Used Capacity Getter and Setter */"
> - And probably many more..
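
A few of the cleanups listed above, applied to a trimmed-down example. CapacitiesSketch and its describe method are hypothetical and far smaller than the real QueueCapacities; the sketch only shows final fields, the diamond operator and a StringBuilder in place of string concatenation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, trimmed-down capacities holder.
final class CapacitiesSketch {
  // Fields assigned once in the constructor are declared final.
  private final Map<String, Float> capacitiesMap;
  private final ReentrantReadWriteLock.ReadLock readLock;
  private final ReentrantReadWriteLock.WriteLock writeLock;

  CapacitiesSketch() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    readLock = lock.readLock();
    writeLock = lock.writeLock();
    // Diamond operator instead of repeating the type arguments.
    capacitiesMap = new HashMap<>();
  }

  void setUsedCapacity(String label, float value) {
    writeLock.lock();
    try {
      capacitiesMap.put(label, value);
    } finally {
      writeLock.unlock();
    }
  }

  float getUsedCapacity(String label) {
    readLock.lock();
    try {
      return capacitiesMap.getOrDefault(label, 0.0f);
    } finally {
      readLock.unlock();
    }
  }

  // Full words and a StringBuilder instead of abbreviations and "+" chains.
  String describe(String label) {
    return new StringBuilder()
        .append("{usedCapacity=")
        .append(getUsedCapacity(label))
        .append('}')
        .toString();
  }
}
```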




[jira] [Updated] (YARN-10915) AbstractCSQueue: Simplify complex logic in methods: deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10915:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> AbstractCSQueue: Simplify complex logic in methods: 
> deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources
> -
>
> Key: YARN-10915
> URL: https://issues.apache.org/jira/browse/YARN-10915
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





[jira] [Updated] (YARN-10912) AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10912:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation 
> logic from initialization logic
> --
>
> Key: YARN-10912
> URL: https://issues.apache.org/jira/browse/YARN-10912
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Tamas Domok
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> AbstractCSQueue#updateConfigurableResourceRequirement contains initialization 
> + validation logic. The task is to factor out validation logic from this 
> method to a separate method.
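
Factoring validation out of an initializing method typically looks like the sketch below. ResourceBoundsSketch and its long-valued bounds are hypothetical simplifications and do not reflect the real method's signature or the Resource type.

```java
// Hypothetical, simplified resource bounds holder.
final class ResourceBoundsSketch {
  private long minResource;
  private long maxResource;

  // Initialization: validate first, then assign, so a rejected
  // configuration never leaves half-updated state behind.
  void updateConfiguredBounds(long configuredMin, long configuredMax) {
    validateBounds(configuredMin, configuredMax);
    this.minResource = configuredMin;
    this.maxResource = configuredMax;
  }

  // Validation: no side effects, so it can be unit-tested on its own.
  static void validateBounds(long min, long max) {
    if (min < 0 || max < 0) {
      throw new IllegalArgumentException("Resource bounds must be non-negative");
    }
    if (min > max) {
      throw new IllegalArgumentException(
          "Min resource " + min + " exceeds max resource " + max);
    }
  }

  long getMinResource() {
    return minResource;
  }

  long getMaxResource() {
    return maxResource;
  }
}
```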




[jira] [Updated] (YARN-10910) AbstractCSQueue#setupQueueConfigs: Separate validation logic from initialization logic

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10910:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> AbstractCSQueue#setupQueueConfigs: Separate validation logic from 
> initialization logic
> --
>
> Key: YARN-10910
> URL: https://issues.apache.org/jira/browse/YARN-10910
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> AbstractCSQueue#setupQueueConfigs contains initialization + validation logic. 
> The task is to factor out validation logic from this method to a separate 
> method.




[jira] [Updated] (YARN-10914) Simplify duplicated code for tracking ResourceUsage in AbstractCSQueue

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10914:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Simplify duplicated code for tracking ResourceUsage in AbstractCSQueue
> --
>
> Key: YARN-10914
> URL: https://issues.apache.org/jira/browse/YARN-10914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Tamas Domok
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Alternatively, those could be moved to some computation class, too.
> Relevant methods: 
> incReservedResource, decReservedResource, incPendingResource, 
> decPendingResource, incUsedResource, decUsedResource
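
One possible shape for the deduplicated methods is a single private helper keyed by usage type, with the six public methods reduced to one-line wrappers. UsageTrackerSketch, its enum and the long-valued amounts are hypothetical; the real code works with Resource objects and node labels.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical tracker; a long stands in for a Resource.
final class UsageTrackerSketch {
  enum UsageType { USED, PENDING, RESERVED }

  private final Map<UsageType, Long> usage = new EnumMap<>(UsageType.class);

  // The one place that carries the shared update logic.
  private synchronized void add(UsageType type, long delta) {
    usage.merge(type, delta, Long::sum);
  }

  void incUsedResource(long amount) { add(UsageType.USED, amount); }
  void decUsedResource(long amount) { add(UsageType.USED, -amount); }
  void incPendingResource(long amount) { add(UsageType.PENDING, amount); }
  void decPendingResource(long amount) { add(UsageType.PENDING, -amount); }
  void incReservedResource(long amount) { add(UsageType.RESERVED, amount); }
  void decReservedResource(long amount) { add(UsageType.RESERVED, -amount); }

  synchronized long get(UsageType type) {
    return usage.getOrDefault(type, 0L);
  }
}
```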




[jira] [Updated] (YARN-10893) Add metrics for getClusterMetrics and getApplications APIs in FederationClientInterceptor

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10893:
--
  Component/s: federation
   metrics
   router
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add metrics for getClusterMetrics and getApplications APIs in 
> FederationClientInterceptor
> -
>
> Key: YARN-10893
> URL: https://issues.apache.org/jira/browse/YARN-10893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, metrics, router
>Affects Versions: 3.4.0
>Reporter: Akshat Bordia
>Assignee: Akshat Bordia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently the getClusterMetrics and getApplications APIs in 
> FederationClientInterceptor do not record any metrics. We need to add 
> metrics for latency and for the counts of successful and failed attempts.
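
A minimal sketch of the kind of bookkeeping described above, assuming plain counters rather than the Hadoop metrics system. The class and field names (ApiCallMetrics, succeeded, failed, totalSuccessLatencyNanos) are hypothetical and are not taken from the Router code.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical per-API metrics holder; not the Router's real metrics class. */
final class ApiCallMetrics {
  final AtomicLong succeeded = new AtomicLong();
  final AtomicLong failed = new AtomicLong();
  final AtomicLong totalSuccessLatencyNanos = new AtomicLong();

  /** Runs the call, counting success or failure and timing successful calls. */
  <T> T record(Supplier<T> call) {
    long start = System.nanoTime();
    try {
      T result = call.get();
      totalSuccessLatencyNanos.addAndGet(System.nanoTime() - start);
      succeeded.incrementAndGet();
      return result;
    } catch (RuntimeException e) {
      // Failed attempts are counted separately and the exception is rethrown.
      failed.incrementAndGet();
      throw e;
    }
  }
}
```

One instance per API (for example one for getClusterMetrics and one for getApplications) would keep the counts separate.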




[jira] [Updated] (YARN-10522) Document for Flexible Auto Queue Creation in Capacity Scheduler

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10522:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Document for Flexible Auto Queue Creation in Capacity Scheduler
> ---
>
> Key: YARN-10522
> URL: https://issues.apache.org/jira/browse/YARN-10522
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Qi Zhu
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10522.001.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We should update the documentation to cover this feature.




[jira] [Updated] (YARN-10576) Update Capacity Scheduler documentation with JSON-based placement mapping

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10576:
--
  Component/s: capacity scheduler
   documentation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Update Capacity Scheduler documentation with JSON-based placement mapping
> -
>
> Key: YARN-10576
> URL: https://issues.apache.org/jira/browse/YARN-10576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, documentation
>Affects Versions: 3.4.0
>Reporter: Peter Bacsko
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10576-001.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The weight mode and AQC also affect how the new placement engine in CS works, 
> and the documentation has to reflect that.
> Certain statements in the documentation are no longer valid, for example:
> * create flag: "Only applies to managed queue parents" - there is no 
> ManagedParentQueue in weight mode.
> * "The nested rules primaryGroupUser and secondaryGroupUser expects the 
> parent queues to exist, ie. they cannot be created automatically". This only 
> applies to the legacy absolute/percentage mode.
> Find all statements that mention possible limitations and fix them if 
> necessary.




[jira] [Updated] (YARN-10919) Remove LeafQueue#scheduler field

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10919:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Remove LeafQueue#scheduler field 
> -
>
> Key: YARN-10919
> URL: https://issues.apache.org/jira/browse/YARN-10919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Szilard Nemeth
>Assignee: Benjamin Teke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As it is the same object as AbstractCSQueue#csContext (from parent class).




[jira] [Updated] (YARN-10838) Implement an optimised version of Configuration getPropsWithPrefix

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10838:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Implement an optimised version of Configuration getPropsWithPrefix
> --
>
> Key: YARN-10838
> URL: https://issues.apache.org/jira/browse/YARN-10838
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10838.001.patch, YARN-10838.002.patch, 
> YARN-10838.003.patch, YARN-10838.004.patch, YARN-10838.005.patch
>
>
> AutoCreatedQueueTemplate also makes multiple calls to 
> Configuration#getPropsWithPrefix. They must be eliminated in order to improve 
> the performance of reinitialisation.
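
For illustration only, one way to make repeated prefix lookups cheap is to sort the keys once and then read a contiguous range. This is a sketch of that general technique, not the change made in the attached patches. PrefixIndex is a hypothetical class; only the behaviour of returning keys with the prefix stripped is modelled on Configuration#getPropsWithPrefix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical index built once, then queried many times by prefix. */
final class PrefixIndex {
  private final TreeMap<String, String> sorted;

  PrefixIndex(Map<String, String> props) {
    this.sorted = new TreeMap<>(props);
  }

  /** Returns entries whose key starts with prefix, with the prefix stripped from each key. */
  Map<String, String> getPropsWithPrefix(String prefix) {
    Map<String, String> result = new HashMap<>();
    // Keys sharing a prefix are contiguous in sorted order, so start at the
    // prefix and stop at the first key that no longer matches.
    for (Map.Entry<String, String> e : sorted.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        break;
      }
      result.put(e.getKey().substring(prefix.length()), e.getValue());
    }
    return result;
  }
}
```

Each lookup then costs roughly the number of matching keys plus a logarithmic seek, instead of a scan over every property.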




[jira] [Updated] (YARN-10790) CS Flexible AQC: Add separate parent and leaf template property.

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10790:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> CS Flexible AQC: Add separate parent and leaf template property.
> 
>
> Key: YARN-10790
> URL: https://issues.apache.org/jira/browse/YARN-10790
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10790.001.patch
>
>
> There are certain properties that make sense only in a Parent/Leaf context 
> (e.g. ordering-policy). We need a way to limit the inheritance scope for the 
> new auto queue creation templates. The proposal is to have the following 
> templates:
>  * auto-queue-creation-v2.template -> child ParentQueues and child LeafQueues 
> inherit this
>  * auto-queue-creation-v2.leaf-template -> only child LeafQueues inherit this
>  * auto-queue-creation-v2.parent-template -> only ParentQueues inherit this
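
The three scopes listed above could be resolved roughly as follows. This is a sketch under assumptions: the TemplateResolver class is hypothetical, properties are modelled as a plain map relative to the parent queue, and letting the leaf- or parent-specific template override the common one is an assumed precedence that the proposal does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical resolver for the three template scopes in the proposal. */
final class TemplateResolver {
  static final String COMMON = "auto-queue-creation-v2.template.";
  static final String LEAF = "auto-queue-creation-v2.leaf-template.";
  static final String PARENT = "auto-queue-creation-v2.parent-template.";

  /** Collects the template properties that apply to a new child queue. */
  static Map<String, String> resolve(Map<String, String> parentProps, boolean childIsLeaf) {
    Map<String, String> result = new LinkedHashMap<>();
    copyWithPrefix(parentProps, COMMON, result);
    // Assumption: the more specific template wins over the common one.
    copyWithPrefix(parentProps, childIsLeaf ? LEAF : PARENT, result);
    return result;
  }

  private static void copyWithPrefix(Map<String, String> src, String prefix,
      Map<String, String> dst) {
    for (Map.Entry<String, String> e : src.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        dst.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
  }
}
```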




[jira] [Updated] (YARN-10841) Fix token reset synchronization for UAM response token

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10841:
--
  Component/s: federation
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix token reset synchronization for UAM response token
> --
>
> Key: YARN-10841
> URL: https://issues.apache.org/jira/browse/YARN-10841
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Affects Versions: 3.4.0
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: YARN-10841.v1.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] 
> |impl.AMRMClientAsyncImpl|: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.YarnException: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM 
> cluster-0 should be null here
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782)
>  
>  
> *2021-06-24T10:10:12,608* INFO  [616916] FederationInterceptor: Received new 
> UAM amrmToken with keyId 843616604 
> The heartbeat callback sets the token to null. But because of a synchronization 
> issue, this happened after mergeAllocate was called. The value should be set to 
> null while the allocate merge is happening, inside the lock.
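
As a sketch of the idea only (not the FederationInterceptor code), reading and clearing the token inside one critical section removes the window in which another thread can observe a token that was about to be reset. UamTokenHolder and its methods are hypothetical, and a String stands in for the real token type.

```java
/** Hypothetical holder that reads and resets the token under a single lock. */
final class UamTokenHolder {
  private final Object lock = new Object();
  private String amrmToken; // stand-in for the real token type

  /** Called when a new token arrives in a UAM response. */
  void onNewToken(String token) {
    synchronized (lock) {
      amrmToken = token;
    }
  }

  /** Returns the pending token and clears it atomically; null if none is pending. */
  String takeToken() {
    synchronized (lock) {
      String token = amrmToken;
      amrmToken = null;
      return token;
    }
  }
}
```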




[jira] [Updated] (YARN-10727) ParentQueue does not validate the queue on removal

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10727:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> ParentQueue does not validate the queue on removal
> --
>
> Key: YARN-10727
> URL: https://issues.apache.org/jira/browse/YARN-10727
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10727.001.patch
>
>
> With the addition of YARN-10532 ParentQueue has a public method, removeQueue, 
> which allows the deletion of a queue at runtime. However, there is no 
> validation regarding the queue which is to be removed, therefore it is 
> possible to remove a queue from the CSQueueManager that is not a child of the 
> ParentQueue. Since it is a public method, there must be validations such as:
>  * check whether the parent of the queue to be removed is the current ParentQueue
>  * check whether the parent actually contains the queue in its childQueues 
> collection
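
The two checks above can be sketched on a simplified queue tree. QueueNode is a hypothetical stand-in for ParentQueue/CSQueue, and throwing IllegalArgumentException is an assumption about how a rejected removal would be reported.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal stand-in for a queue node; the real queue classes are much richer. */
class QueueNode {
  final String name;
  QueueNode parent;
  final List<QueueNode> children = new ArrayList<>();

  QueueNode(String name) {
    this.name = name;
  }

  void addChild(QueueNode child) {
    child.parent = this;
    children.add(child);
  }

  /** Removes a child only after both validations have passed. */
  void removeChild(QueueNode queue) {
    if (queue.parent != this) {
      throw new IllegalArgumentException(queue.name + " is not a child of " + name);
    }
    if (!children.contains(queue)) {
      throw new IllegalArgumentException(name + " does not contain " + queue.name);
    }
    children.remove(queue);
    queue.parent = null;
  }
}
```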




[jira] [Updated] (YARN-10829) Support getApplications API in FederationClientInterceptor

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10829:
--
  Component/s: federation
   router
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support getApplications API in FederationClientInterceptor
> --
>
> Key: YARN-10829
> URL: https://issues.apache.org/jira/browse/YARN-10829
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Affects Versions: 3.4.0
>Reporter: Akshat Bordia
>Assignee: Akshat Bordia
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Currently the getApplications API is not supported in FederationClientInterceptor 
> and needs to be implemented there.




[jira] [Updated] (YARN-10801) Fix Auto Queue template to properly set all configuration properties

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10801:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix Auto Queue template to properly set all configuration properties
> 
>
> Key: YARN-10801
> URL: https://issues.apache.org/jira/browse/YARN-10801
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10801.001.patch, YARN-10801.002.patch, 
> YARN-10801.003.patch, YARN-10801.004.patch, YARN-10801.005.patch, 
> YARN-10801.006.patch
>
>
> Currently Auto Queue templates set configuration properties only on the 
> Configuration object passed in the constructor. Because a lot 
> of configuration values are read from the Configuration object in csContext, 
> template properties are not set in every case. 




[jira] [Updated] (YARN-10780) Optimise retrieval of configured node labels in CS queues

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10780:
--
  Component/s: capacity scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Optimise retrieval of configured node labels in CS queues
> -
>
> Key: YARN-10780
> URL: https://issues.apache.org/jira/browse/YARN-10780
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10780.001.patch, YARN-10780.002.patch, 
> YARN-10780.003.patch, YARN-10780.004.patch, YARN-10780.005.patch
>
>
> CapacitySchedulerConfiguration#getConfiguredNodeLabels scales poorly with 
> respect to the number of queues (it's O(n*m), where n is the number of queues and m 
> is the number of properties set by each queue). During CS reinit, the node 
> labels are often queried, however looking at the code:
> {code:java}
> for (Entry<String, String> stringStringEntry : this) {
>   e = stringStringEntry;
>   String key = e.getKey();
>   if (key.startsWith(getQueuePrefix(queuePath) + ACCESSIBLE_NODE_LABELS
>       + DOT)) {
>     // Find <label-name> in
>     // <queue-path>.accessible-node-labels.<label-name>.property
>     int labelStartIdx =
>         key.indexOf(ACCESSIBLE_NODE_LABELS)
>             + ACCESSIBLE_NODE_LABELS.length() + 1;
>     int labelEndIndx = key.indexOf('.', labelStartIdx);
>     String labelName = key.substring(labelStartIdx, labelEndIndx);
>     configuredNodeLabels.add(labelName);
>   }
> }
> {code}
>  This method iterates through ALL properties set in the configuration. For 
> example, when initialising 2500 queues, each having at least 2 
> properties:
> 2500 * 5000 ~= over 12 million iterations + additional properties
> There are some ways to resolve this issue while keeping backward 
> compatibility:
>  # Create a property like the original accessible-node-labels, which contains 
> predefined labels. If it is set, then getConfiguredNodeLabels gets the value 
> of this property, otherwise it falls back to the old logic. I think 
> accessible-node-labels is not used for this purpose (though I have a feeling 
> that it should have been).
>  # Collect node labels for all queues at the beginning of parseQueue and only 
> iterate through the properties once. This will increase the space complexity 
> in exchange for not requiring any intervention from the user. 
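
The second option above (a single pass that groups labels by queue) might look like the following. It is a sketch, not the code in the attached patches: NodeLabelIndex is a hypothetical helper, the configuration is modelled as a plain map, and the key layout is assumed to be yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label-name>.<property>.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not the actual CapacitySchedulerConfiguration code. */
final class NodeLabelIndex {
  // Assumed constants mirroring the property layout described above.
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final String MARKER = ".accessible-node-labels.";

  /** Scans all properties once and groups configured labels by queue path. */
  static Map<String, Set<String>> collect(Map<String, String> props) {
    Map<String, Set<String>> labelsByQueue = new HashMap<>();
    for (String key : props.keySet()) {
      if (!key.startsWith(PREFIX)) {
        continue;
      }
      int markerIdx = key.indexOf(MARKER);
      if (markerIdx < PREFIX.length()) {
        continue; // marker missing, or no queue path in front of it
      }
      int labelStart = markerIdx + MARKER.length();
      int labelEnd = key.indexOf('.', labelStart);
      if (labelEnd < 0) {
        continue; // no ".property" suffix after the label
      }
      String queuePath = key.substring(PREFIX.length(), markerIdx);
      String label = key.substring(labelStart, labelEnd);
      labelsByQueue.computeIfAbsent(queuePath, q -> new TreeSet<>()).add(label);
    }
    return labelsByQueue;
  }
}
```

After this one scan, each queue's labels are a map lookup, at the cost of holding the index in memory.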




[jira] [Updated] (YARN-10807) Parents node labels are incorrectly added to child queues in weight mode

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10807:
--
  Component/s: capacity scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Parents node labels are incorrectly added to child queues in weight mode 
> -
>
> Key: YARN-10807
> URL: https://issues.apache.org/jira/browse/YARN-10807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10807.001.patch, YARN-10807.002.patch
>
>
> In ParentQueue.updateClusterResource, when calculating the normalized weights, 
> CS iterates through the parent's node labels. 
> If the parent has a node label that a specific child doesn't, it will 
> incorrectly add it to the child's node label list through the 
> queueCapacities.setNormalizedWeight(label, weight) call:
> {code:java}
> // Normalize weight of children
> if (getCapacityConfigurationTypeForQueues(childQueues)
>     == QueueCapacityType.WEIGHT) {
>   for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
>     float sumOfWeight = 0;
>     for (CSQueue queue : childQueues) {
>       float weight = Math.max(0,
>           queue.getQueueCapacities().getWeight(nodeLabel));
>       sumOfWeight += weight;
>     }
>     // When sum of weight == 0, skip setting normalized_weight (so
>     // normalized weight will be 0).
>     if (Math.abs(sumOfWeight) > 1e-6) {
>       for (CSQueue queue : childQueues) {
>         queue.getQueueCapacities().setNormalizedWeight(nodeLabel,
>             queue.getQueueCapacities().getWeight(nodeLabel) /
>                 sumOfWeight);
>       }
>     }
>   }
> }
> {code}
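
One possible guard, shown on simplified types, is to normalize a label only for children that actually configure it. This is an illustrative sketch and not necessarily the committed fix. ChildWeights and WeightNormalizer are hypothetical, and plain maps stand in for QueueCapacities.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for a child queue's capacities; not the real CSQueue type. */
final class ChildWeights {
  final Map<String, Float> weights = new HashMap<>();           // label -> configured weight
  final Map<String, Float> normalizedWeights = new HashMap<>(); // label -> normalized weight
}

final class WeightNormalizer {
  /** Normalizes weights for one label, touching only children that configure it. */
  static void normalize(String label, List<ChildWeights> children) {
    float sum = 0f;
    for (ChildWeights c : children) {
      if (c.weights.containsKey(label)) {
        sum += Math.max(0f, c.weights.get(label));
      }
    }
    if (Math.abs(sum) <= 1e-6) {
      return; // nothing to normalize for this label
    }
    for (ChildWeights c : children) {
      // Children without the label are skipped, so it is never added to them.
      if (c.weights.containsKey(label)) {
        c.normalizedWeights.put(label, c.weights.get(label) / sum);
      }
    }
  }
}
```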




[jira] [Updated] (YARN-10771) Add cluster metric for size of SchedulerEventQueue and RMEventQueue

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10771:
--
  Component/s: metrics
   resourcemanager
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add cluster metric for size of SchedulerEventQueue and RMEventQueue
> ---
>
> Key: YARN-10771
> URL: https://issues.apache.org/jira/browse/YARN-10771
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: metrics, resourcemanager
>Affects Versions: 3.4.0
>Reporter: chaosju
>Assignee: chaosju
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10763.001.patch, YARN-10771.002.patch, 
> YARN-10771.003.patch, YARN-10771.004.patch, YARN-10771.005.patch
>
>
> Add cluster metrics for the size of the Scheduler event queue and the RM event queue. This 
> lets us know the load on the RM and makes the metrics convenient to monitor.
>  
>  




[jira] [Updated] (YARN-10783) Allow definition of auto queue template properties in root

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10783:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Allow definition of auto queue template properties in root
> --
>
> Key: YARN-10783
> URL: https://issues.apache.org/jira/browse/YARN-10783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10783.001.patch
>
>
> YARN-10564 introduced template properties set on auto queue creation eligible 
> queues, however root does not take it into consideration.




[jira] [Updated] (YARN-10571) Refactor dynamic queue handling logic

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10571:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Refactor dynamic queue handling logic
> -
>
> Key: YARN-10571
> URL: https://issues.apache.org/jira/browse/YARN-10571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: YARN-10571.001.patch, YARN-10571.002.patch, 
> YARN-10571.003.patch, YARN-10571.004.patch
>
>
> As per YARN-10506 we have introduced another mode for auto queue creation 
> and a new class, which handles it. We should move the old, managed queue 
> related logic to CSAutoQueueHandler as well, and do additional cleanup 
> regarding queue management.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9615:
-
  Component/s: metrics
   resourcemanager
 Target Version/s: 2.10.2, 3.3.1, 3.4.0  (was: 2.10.2)
Affects Version/s: 3.3.1
   3.4.0

> Add dispatcher metrics to RM
> 
>
> Key: YARN-9615
> URL: https://issues.apache.org/jira/browse/YARN-9615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: metrics, resourcemanager
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Jonathan Hung
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-9615-branch-3.3-001.patch, YARN-9615.001.patch, 
> YARN-9615.002.patch, YARN-9615.003.patch, YARN-9615.004.patch, 
> YARN-9615.005.patch, YARN-9615.006.patch, YARN-9615.007.patch, 
> YARN-9615.008.patch, YARN-9615.009.patch, YARN-9615.010.patch, 
> YARN-9615.011.patch, YARN-9615.011.patch, YARN-9615.poc.patch, 
> image-2021-03-04-10-35-10-626.png, image-2021-03-04-10-36-12-441.png, 
> screenshot-1.png
>
>
> It'd be good to have counts/processing times for each event type in RM async 
> dispatcher and scheduler async dispatcher.




[jira] [Updated] (YARN-10637) fs2cs: add queue autorefresh policy during conversion

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10637:
--
  Component/s: fairscheduler
   fs-cs
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> fs2cs: add queue autorefresh policy during conversion
> -
>
> Key: YARN-10637
> URL: https://issues.apache.org/jira/browse/YARN-10637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler, fs-cs
>Affects Versions: 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: fs2cs
> Fix For: 3.4.0
>
> Attachments: YARN-10637.001.patch, YARN-10637.002.patch, 
> YARN-10637.003.patch, YARN-10637.004.patch
>
>
> cc [~pbacsko] [~gandras] [~bteke]
> We should also fill this in once YARN-10623 is finished.




[jira] [Updated] (YARN-10654) Dots '.' in CSMappingRule path variables should be replaced

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10654:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Dots '.' in CSMappingRule path variables should be replaced
> ---
>
> Key: YARN-10654
> URL: https://issues.apache.org/jira/browse/YARN-10654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Gergely Pollák
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10654-001.patch
>
>
> Dots are used as separators, so we should escape them somehow in the 
> variables when substituting them.
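
A minimal sketch of the substitution step, assuming a literal replacement token. The issue text does not say what dots should be replaced with, so the "_dot_" token here is an assumption, and PathVariableEscaper is a hypothetical helper rather than the placement engine's code.

```java
/** Hypothetical helper that keeps a substituted variable within one path segment. */
final class PathVariableEscaper {
  // Assumed replacement token; the issue text does not name one.
  static final String DOT_REPLACEMENT = "_dot_";

  /** Replaces every dot in a variable value with the replacement token. */
  static String escape(String value) {
    return value.replace(".", DOT_REPLACEMENT);
  }

  /** Substitutes the escaped value for a placeholder such as "%user" in a queue path. */
  static String substitute(String path, String placeholder, String value) {
    return path.replace(placeholder, escape(value));
  }
}
```

With this, a user named john.doe mapped through root.users.%user lands in a single queue segment instead of being read as root.users.john with a child queue doe.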



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-10564) Support Auto Queue Creation template configurations

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10564:
--
  Component/s: capacity scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Support Auto Queue Creation template configurations
> ---
>
> Key: YARN-10564
> URL: https://issues.apache.org/jira/browse/YARN-10564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10564.001.patch, YARN-10564.002.patch, 
> YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, 
> YARN-10564.006.patch, YARN-10564.poc.001.patch
>
>
> Similar to how the template configuration works for ManagedParents, we need 
> to support templates for the new auto queue creation logic. The proposal is to 
> allow wildcards in template configs such as:
> {noformat}
> yarn.scheduler.capacity.root.*.*.weight 10{noformat}
> which would set the weight to 10 for every leaf of every parent under 
> root.
> We should possibly take an approach that could support an arbitrary depth of 
> template configuration, because we might need to lift the limitation on auto 
> queue nesting.
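
The wildcard matching described above can be sketched as a segment-by-segment comparison. TemplatePathMatcher is hypothetical, and treating "*" as exactly one path segment is an assumption consistent with the root.*.* example. Because the loop is not tied to a fixed depth, it also covers the arbitrary-depth case.

```java
/** Hypothetical matcher for wildcard queue paths such as "root.*.*". */
final class TemplatePathMatcher {
  /** Returns true if queuePath matches pattern, where "*" stands for exactly one segment. */
  static boolean matches(String pattern, String queuePath) {
    String[] p = pattern.split("\\.");
    String[] q = queuePath.split("\\.");
    if (p.length != q.length) {
      return false;
    }
    for (int i = 0; i < p.length; i++) {
      if (!p[i].equals("*") && !p[i].equals(q[i])) {
        return false;
      }
    }
    return true;
  }
}
```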



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-10714) Remove dangling dynamic queues on reinitialization

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10714:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Remove dangling dynamic queues on reinitialization
> --
>
> Key: YARN-10714
> URL: https://issues.apache.org/jira/browse/YARN-10714
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10714.001.patch, YARN-10714.002.patch, 
> YARN-10714.003.patch
>
>
> Current logic does not handle orphaned auto created child queues. The 
> following example steps show a scenario in which it is possible to submit 
> applications to an orphaned queue, that has an invalid (already removed) 
> ParentQueue.
>  # Auto create a queue root.a.a-auto
>  # Remove root.a from the config
>  # Reinitialize CS without restarting it (possible via mutation API)
>  # Submit application to root.a.a-auto, while root.a is a non-existent queue
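
Detecting the orphaned queues from the scenario above could be sketched as an ancestor check on queue paths. DanglingQueueFinder is hypothetical and works on sets of path strings rather than on the real queue manager. A dynamic queue counts as dangling if any of its ancestors is neither configured nor itself a dynamic queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical finder operating on dot-separated queue paths. */
final class DanglingQueueFinder {
  /** Returns the dynamic queue paths that have at least one missing ancestor. */
  static List<String> findDangling(Set<String> configured, Set<String> dynamic) {
    List<String> dangling = new ArrayList<>();
    for (String path : dynamic) {
      int idx = path.lastIndexOf('.');
      while (idx > 0) {
        String ancestor = path.substring(0, idx);
        if (!configured.contains(ancestor) && !dynamic.contains(ancestor)) {
          dangling.add(path);
          break;
        }
        // Keep walking up, so a dynamic parent with a removed grandparent is caught too.
        idx = ancestor.lastIndexOf('.');
      }
    }
    return dangling;
  }
}
```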



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-9618) NodesListManager event improvement

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9618:
-
  Component/s: resourcemanager
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> NodesListManager event improvement
> --
>
> Key: YARN-9618
> URL: https://issues.apache.org/jira/browse/YARN-9618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.4.0
>Reporter: Bibin Chundatt
>Assignee: Qi Zhu
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, 
> YARN-9618.006.patch, YARN-9618.007.patch
>
>
> In the current implementation, the NodesListManager event blocks the async dispatcher and can 
> crash the RM and slow down event processing.
> # Cluster restart with 1K running apps. Each usable event will create 1K 
> events; overall there could be 5K*1K events for a 5K-node cluster.
> # Event processing is blocked till new events are added to the queue.
> Solution:
> # Add another async event handler similar to the scheduler's.
> # Instead of adding events to the dispatcher, directly call the RMApp event handler.
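
The first part of the solution, a dedicated asynchronous handler, can be illustrated with a single-threaded executor that takes these events off the caller's thread. This is a generic sketch and not the RM dispatcher API. DedicatedEventHandler is a hypothetical class and the event type is left generic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical handler that processes one kind of event on its own thread. */
final class DedicatedEventHandler<E> {
  private final ExecutorService worker = Executors.newSingleThreadExecutor();
  private final Consumer<E> delegate;

  DedicatedEventHandler(Consumer<E> delegate) {
    this.delegate = delegate;
  }

  /** Queues the event and returns immediately, so the caller is not blocked. */
  void handle(E event) {
    worker.execute(() -> delegate.accept(event));
  }

  /** Stops accepting events and waits for queued ones to finish. */
  boolean shutdownAndWait(long timeoutMillis) throws InterruptedException {
    worker.shutdown();
    return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
  }
}
```

A single worker thread keeps events in submission order, which matters if later events depend on earlier ones.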




[jira] [Updated] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10597:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> CSMappingPlacementRule should not create new instance of Groups
> ---
>
> Key: YARN-10597
> URL: https://issues.apache.org/jira/browse/YARN-10597
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Gergely Pollák
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10597.001.patch, YARN-10597.002.patch
>
>
> As [~ahussein] pointed out in YARN-10425, no new Groups instance should be 
> created.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Updated] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10713:
--
  Component/s: metrics
 Hadoop Flags: Reviewed
 Target Version/s: 3.3.1, 3.4.0
Affects Version/s: 3.3.1
   3.4.0

> ClusterMetrics should support custom resource capacity related metrics.
> ---
>
> Key: YARN-10713
> URL: https://issues.apache.org/jira/browse/YARN-10713
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 3.4.0, 3.3.1
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10713.001.patch, YARN-10713.002.patch
>
>
> YARN-10688 only added GPU resource capacity related metrics. I think we should 
> improve it to support custom resources, as [~ebadger] suggested.




[jira] [Updated] (YARN-10641) Refactor the max app related update, and fix maxApplications update error when add new queues.

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10641:
--
  Component/s: capacity scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Refactor the max app related update, and fix maxApplications update error 
> when add new queues.
> --
>
> Key: YARN-10641
> URL: https://issues.apache.org/jira/browse/YARN-10641
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
> Fix For: 3.4.0
>
> Attachments: YARN-10641.001.patch, YARN-10641.002.patch, 
> YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, 
> YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, 
> image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, 
> image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png
>
>
> Related to the refactoring of the update logic in YARN-10504.
> The max applications update based on abs/cap is wrong and should be fixed, 
> because max applications is a key part of limiting applications in CS.
> For example: 
> When adding a dynamic queue, the max app of the parent queue's other children 
> is not updated correctly:
> !image-2021-02-20-15-53-51-099.png|width=639,height=509!  
> The newly added queue's max app is updated correctly:
> !image-2021-02-20-15-55-44-780.png|width=542,height=426!




[jira] [Updated] (YARN-10659) Improve CS MappingRule %secondary_group evaluation

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10659:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Improve CS MappingRule %secondary_group evaluation
> --
>
> Key: YARN-10659
> URL: https://issues.apache.org/jira/browse/YARN-10659
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Gergely Pollák
>Assignee: Gergely Pollák
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10659.001.patch, YARN-10659.002.patch, 
> YARN-10659.003.patch
>
>
> Since the leaf queue names are not unique, there are a lot of use cases where 
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it is under a defined parent, 
> %secondary_group evaluation should only check for queue existence under that 
> queue. E.g. root.group.%secondary_group should only evaluate to groups which 
> exist under root.group, while the legacy %secondary_group.%user should still 
> look for groups by their leaf name globally.
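
The scoping rule described above can be illustrated with a minimal sketch. This is not the CSMappingPlacementRule code: the class SecondaryGroupSketch, the method resolveSecondaryGroup, and the representation of queues as a set of full path strings are all hypothetical, chosen only to show the difference between a parent-scoped lookup and the legacy global leaf-name lookup.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class SecondaryGroupSketch {

    /**
     * Hypothetical helper, not a YARN API.
     * Returns the first secondary group that matches an existing queue.
     * If parentPath is non-null, only queues directly under that parent count.
     * If parentPath is null, any queue whose leaf name equals the group counts
     * (the legacy, global behaviour).
     */
    static Optional<String> resolveSecondaryGroup(
            List<String> secondaryGroups, Set<String> queuePaths, String parentPath) {
        for (String group : secondaryGroups) {
            if (parentPath != null) {
                // Scoped lookup: the queue must exist under the given parent.
                if (queuePaths.contains(parentPath + "." + group)) {
                    return Optional.of(group);
                }
            } else {
                // Legacy lookup: compare against the leaf name of every queue.
                for (String path : queuePaths) {
                    String leaf = path.substring(path.lastIndexOf('.') + 1);
                    if (leaf.equals(group)) {
                        return Optional.of(group);
                    }
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed example data: two queues and a user with two secondary groups.
        Set<String> queues = Set.of("root.group.dev", "root.other.ops");
        List<String> groups = List.of("ops", "dev");

        // root.group.%secondary_group: "ops" is skipped, it is not under root.group.
        System.out.println(resolveSecondaryGroup(groups, queues, "root.group"));

        // Legacy %secondary_group: "ops" matches by leaf name anywhere in the tree.
        System.out.println(resolveSecondaryGroup(groups, queues, null));
    }
}
```

With the assumed data, the scoped call picks "dev" and the legacy call picks "ops", which is the inconsistency the issue wants to make explicit and controllable.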




[jira] [Updated] (YARN-10689) Fix the findbugs issues in extractFloatValueFromWeightConfig.

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10689:
--
  Component/s: capacity scheduler
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix the findbugs issues in extractFloatValueFromWeightConfig.
> -
>
> Key: YARN-10689
> URL: https://issues.apache.org/jira/browse/YARN-10689
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Float.valueOf causes the findbugs issues.
> I will help fix it.




[jira] [Updated] (YARN-10686) Fix TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode

2024-02-11 Thread Shilun Fan (Jira)



 [ 
https://issues.apache.org/jira/browse/YARN-10686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-10686:
--
  Component/s: capacity scheduler
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix 
> TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode
> -
>
> Key: YARN-10686
> URL: https://issues.apache.org/jira/browse/YARN-10686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.4.0
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10686.001.patch
>
>




