[jira] [Updated] (YARN-11684) PriorityQueueComparator violates general contract
[ https://issues.apache.org/jira/browse/YARN-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-11684:
------------------------------
    Fix Version/s: 3.4.1
                   3.5.0

> PriorityQueueComparator violates general contract
> -------------------------------------------------
>
>                 Key: YARN-11684
>                 URL: https://issues.apache.org/jira/browse/YARN-11684
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.5.0
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.1, 3.5.0
>
>
> YARN-10178 tried to fix the issue, but there are still two properties that
> might change during sorting, which causes an exception.
> {code}
> 2024-04-10 12:36:56,420 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-28,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:899)
>     at java.util.TimSort.mergeAt(TimSort.java:516)
>     at java.util.TimSort.mergeCollapse(TimSort.java:441)
>     at java.util.TimSort.sort(TimSort.java:245)
>     at java.util.Arrays.sort(Arrays.java:1512)
>     at java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:348)
>     at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:483)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:260)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:1100)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:942)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1719)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1654)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1811)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1557)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:539)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:591)
> {code}
> The values returned by `queue.getAccessibleNodeLabels()` and
> `queue.getPriority()` could change in another thread while the `queues` are
> being sorted. Those values should be saved when constructing the
> PriorityQueueResourcesForSorting helper object.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
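Editorial note: the idea described in YARN-11684 (capture the mutable values once, then sort only the captured copies) can be illustrated with a minimal sketch. This is not the Hadoop patch. The names MutableQueue, QueueSnapshot and sortByPriorityDesc are hypothetical stand-ins, and only a priority field is modelled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SnapshotSortSketch {
    /** Hypothetical stand-in for a queue whose priority another thread may change. */
    static final class MutableQueue {
        final String name;
        volatile int priority;

        MutableQueue(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** Immutable copy of the sort key, read exactly once per queue. */
    static final class QueueSnapshot {
        final MutableQueue queue;
        final int priority;

        QueueSnapshot(MutableQueue queue) {
            this.queue = queue;
            this.priority = queue.priority; // single read, before sorting starts
        }
    }

    /** Sorts by snapshotted priority (highest first), then by name for a stable tie-break. */
    static List<MutableQueue> sortByPriorityDesc(List<MutableQueue> queues) {
        return queues.stream()
            .map(QueueSnapshot::new) // all snapshots are built before the sort runs
            .sorted(Comparator.comparingInt((QueueSnapshot s) -> s.priority).reversed()
                .thenComparing(s -> s.queue.name))
            .map(s -> s.queue)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MutableQueue> queues = new ArrayList<>();
        queues.add(new MutableQueue("b", 1));
        queues.add(new MutableQueue("a", 5));
        queues.add(new MutableQueue("c", 5));
        List<String> names = sortByPriorityDesc(queues).stream()
            .map(q -> q.name)
            .collect(Collectors.toList());
        System.out.println(names); // prints [a, c, b]
    }
}
```

Because the comparator only reads final fields of QueueSnapshot, a concurrent write to MutableQueue.priority cannot make two comparisons of the same pair disagree, which is the condition TimSort detects and reports.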
[jira] [Resolved] (YARN-11444) Improve YARN md documentation format
[ https://issues.apache.org/jira/browse/YARN-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan resolved YARN-11444. --- Fix Version/s: 3.4.1 3.5.0 Hadoop Flags: Reviewed Target Version/s: 3.4.1, 3.5.0 (was: 3.5.0) Resolution: Fixed > Improve YARN md documentation format > > > Key: YARN-11444 > URL: https://issues.apache.org/jira/browse/YARN-11444 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1, 3.5.0 > > > 1. Modify some typo errors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11444) Improve YARN md documentation format
[ https://issues.apache.org/jira/browse/YARN-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11444: -- Description: 1. Modify some typo errors (was: 1. Improve the table format to make the readability better 2. Modify some typo errors 3. Modify the list number to display correctly) > Improve YARN md documentation format > > > Key: YARN-11444 > URL: https://issues.apache.org/jira/browse/YARN-11444 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.4.0 >Reporter: Shilun Fan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > > 1. Modify some typo errors -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan resolved YARN-11663.
-------------------------------
    Fix Version/s: 3.4.1
                   3.5.0
 Target Version/s: 3.4.0
         Assignee: Shilun Fan
       Resolution: Fixed

> [Federation] Add Cache Entity Nums Limit.
> -----------------------------------------
>
>                 Key: YARN-11663
>                 URL: https://issues.apache.org/jira/browse/YARN-11663
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: federation, yarn
>    Affects Versions: 3.4.0
>            Reporter: Yuan Luo
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.1, 3.5.0
>
>         Attachments: image-2024-03-14-18-12-28-426.png,
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that
> the memory of the router keeps growing over time. This is because, after jobs
> finish, we no longer access the expired keys, so the cleanup mechanism is
> never triggered. Would it be better to add a limit on the maximum number of
> cache entries?
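Editorial note: the request in YARN-11663 is for a cap on the number of cached entries, so that memory stays bounded even when expired keys are never touched again. A minimal sketch of such a cap, using only the JDK, is shown below. It is not the Router's cache implementation. The class name BoundedCacheSketch and the sample keys are hypothetical, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU map that evicts the least recently used entry once maxEntries is exceeded. */
class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: iteration order is least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put() after insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCacheSketch<String, String> cache = new BoundedCacheSketch<>(2);
        cache.put("app_1", "subcluster-1");
        cache.put("app_2", "subcluster-2");
        cache.get("app_1");                 // app_1 becomes the most recently used
        cache.put("app_3", "subcluster-1"); // exceeds the cap, so app_2 is evicted
        System.out.println(cache.keySet()); // prints [app_1, app_3]
    }
}
```

A size cap complements time-based expiry: expiry bounds how stale an entry can be, while the cap bounds how many entries can exist at once regardless of access patterns.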
[jira] [Resolved] (YARN-11668) Potential concurrent modification exception for node attributes of node manager
[ https://issues.apache.org/jira/browse/YARN-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan resolved YARN-11668.
-------------------------------
    Fix Version/s: 3.4.1
                   3.5.0
     Hadoop Flags: Reviewed
 Target Version/s: 3.4.1
         Assignee: Junfan Zhang
       Resolution: Fixed

> Potential concurrent modification exception for node attributes of node
> manager
> ---------------------------------------------------------------------------
>
>                 Key: YARN-11668
>                 URL: https://issues.apache.org/jira/browse/YARN-11668
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junfan Zhang
>            Assignee: Junfan Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.1, 3.5.0
>
>         Attachments: img_v3_029c_55ac6b50-64aa-4cbe-81a0-5f8d22c623fg.jpg
>
>
> The RM crashes with the stack trace shown in the attachment.
[jira] [Updated] (YARN-11668) Potential concurrent modification exception for node attributes of node manager
[ https://issues.apache.org/jira/browse/YARN-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-11668:
------------------------------
    Affects Version/s: 3.4.0

> Potential concurrent modification exception for node attributes of node
> manager
> ---------------------------------------------------------------------------
>
>                 Key: YARN-11668
>                 URL: https://issues.apache.org/jira/browse/YARN-11668
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.0
>            Reporter: Junfan Zhang
>            Assignee: Junfan Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.1, 3.5.0
>
>         Attachments: img_v3_029c_55ac6b50-64aa-4cbe-81a0-5f8d22c623fg.jpg
>
>
> The RM crashes with the stack trace shown in the attachment.
[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-11663:
------------------------------
    Issue Type: Improvement  (was: Bug)

> [Federation] Add Cache Entity Nums Limit.
> -----------------------------------------
>
>                 Key: YARN-11663
>                 URL: https://issues.apache.org/jira/browse/YARN-11663
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: federation, yarn
>    Affects Versions: 3.4.0
>            Reporter: Yuan Luo
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-03-14-18-12-28-426.png,
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that
> the memory of the router keeps growing over time. This is because, after jobs
> finish, we no longer access the expired keys, so the cleanup mechanism is
> never triggered. Would it be better to add a limit on the maximum number of
> cache entries?
[jira] [Updated] (YARN-11663) [Federation] Add Cache Entity Nums Limit.
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-11663:
------------------------------
    Summary: [Federation] Add Cache Entity Nums Limit.  (was: Router cache expansion issue)

> [Federation] Add Cache Entity Nums Limit.
> -----------------------------------------
>
>                 Key: YARN-11663
>                 URL: https://issues.apache.org/jira/browse/YARN-11663
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: federation, yarn
>    Affects Versions: 3.4.0
>            Reporter: Yuan Luo
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-03-14-18-12-28-426.png,
> image-2024-03-14-18-12-49-950.png, image-2024-03-15-10-50-32-860.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that
> the memory of the router keeps growing over time. This is because, after jobs
> finish, we no longer access the expired keys, so the cleanup mechanism is
> never triggered. Would it be better to add a limit on the maximum number of
> cache entries?
[jira] [Comment Edited] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid
[ https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830001#comment-17830001 ]

Shilun Fan edited comment on YARN-11387 at 3/22/24 11:08 PM:
-------------------------------------------------------------

I will resubmit PR to follow up on this issue.


was (Author: slfan1989):
I will resubmit PR to follow up on this issue.I will resubmit PR to follow up on this issue.

> [GPG] YARN GPG mistakenly deleted applicationid
> -----------------------------------------------
>
>                 Key: YARN-11387
>                 URL: https://issues.apache.org/jira/browse/YARN-11387
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: federation
>    Affects Versions: 3.2.1, 3.4.0
>            Reporter: zhangjunj
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: federation, gpg, pull-request-available
>         Attachments: YARN-11387-YARN-11387.v1.patch,
> yarn-gpg-mistakenly-deleted-applicationid.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In [YARN-7599|https://issues.apache.org/jira/browse/YARN-7599], the
> Federation can delete expired application IDs, but YARN GPG uses the
> getRouter() method to obtain application information for multiple clusters.
> If there are more than 200,000 application IDs, it is not possible to pull
> all of the application ID information at one time, which can result in
> accidental deletion. The following error is reported for the Spark component.
>
[jira] [Commented] (YARN-11387) [GPG] YARN GPG mistakenly deleted applicationid
[ https://issues.apache.org/jira/browse/YARN-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830001#comment-17830001 ]

Shilun Fan commented on YARN-11387:
-----------------------------------

I will resubmit PR to follow up on this issue.I will resubmit PR to follow up on this issue.

> [GPG] YARN GPG mistakenly deleted applicationid
> -----------------------------------------------
>
>                 Key: YARN-11387
>                 URL: https://issues.apache.org/jira/browse/YARN-11387
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: federation
>    Affects Versions: 3.2.1, 3.4.0
>            Reporter: zhangjunj
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: federation, gpg, pull-request-available
>         Attachments: YARN-11387-YARN-11387.v1.patch,
> yarn-gpg-mistakenly-deleted-applicationid.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In [YARN-7599|https://issues.apache.org/jira/browse/YARN-7599], the
> Federation can delete expired application IDs, but YARN GPG uses the
> getRouter() method to obtain application information for multiple clusters.
> If there are more than 200,000 application IDs, it is not possible to pull
> all of the application ID information at one time, which can result in
> accidental deletion. The following error is reported for the Spark component.
>
[jira] [Commented] (YARN-11663) Router cache expansion issue
[ https://issues.apache.org/jira/browse/YARN-11663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827300#comment-17827300 ]

Shilun Fan commented on YARN-11663:
-----------------------------------

[~luoyuan] Thank you for raising this question. How long is the cache time set to in your configuration? From monitoring, it appears that memory is being reclaimed.

> Router cache expansion issue
> ----------------------------
>
>                 Key: YARN-11663
>                 URL: https://issues.apache.org/jira/browse/YARN-11663
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: federation, yarn
>    Affects Versions: 3.4.0
>            Reporter: Yuan Luo
>            Priority: Major
>         Attachments: image-2024-03-14-18-12-28-426.png,
> image-2024-03-14-18-12-49-950.png
>
>
> !image-2024-03-14-18-12-28-426.png!
> !image-2024-03-14-18-12-49-950.png!
> Hi [~slfan1989], after applying this feature to our prod env, I found that
> the memory of the router keeps growing over time. This is because, after jobs
> finish, we no longer access the expired keys, so the cleanup mechanism is
> never triggered. Would it be better to add a limit on the maximum number of
> cache entries?
[jira] [Resolved] (YARN-11660) SingleConstraintAppPlacementAllocator performance regression
[ https://issues.apache.org/jira/browse/YARN-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan resolved YARN-11660. --- Resolution: Fixed > SingleConstraintAppPlacementAllocator performance regression > > > Key: YARN-11660 > URL: https://issues.apache.org/jira/browse/YARN-11660 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 3.4.1 >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11660) SingleConstraintAppPlacementAllocator performance regression
[ https://issues.apache.org/jira/browse/YARN-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11660: -- Component/s: scheduler Fix Version/s: 3.4.1 Hadoop Flags: Reviewed Target Version/s: 3.4.1 Affects Version/s: 3.4.1 > SingleConstraintAppPlacementAllocator performance regression > > > Key: YARN-11660 > URL: https://issues.apache.org/jira/browse/YARN-11660 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 3.4.1 >Reporter: Junfan Zhang >Assignee: Junfan Zhang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11037) Add configurable logic to split resource request to least loaded SC
[ https://issues.apache.org/jira/browse/YARN-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11037: -- Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add configurable logic to split resource request to least loaded SC > --- > > Key: YARN-11037 > URL: https://issues.apache.org/jira/browse/YARN-11037 > Project: Hadoop YARN > Issue Type: Task > Components: federation >Affects Versions: 3.4.0 >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Add configurable logic to split resource request to least loaded subcluster. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11026) Make AppPlacementAllocator configurable in AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11026: -- Component/s: scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Make AppPlacementAllocator configurable in AppSchedulingInfo > > > Key: YARN-11026 > URL: https://issues.apache.org/jira/browse/YARN-11026 > Project: Hadoop YARN > Issue Type: Task > Components: scheduler >Affects Versions: 3.4.0 >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10805) Replace Guava Lists usage by Hadoop's own Lists in hadoop-yarn-project
[ https://issues.apache.org/jira/browse/YARN-10805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10805: -- Component/s: yarn-common Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Replace Guava Lists usage by Hadoop's own Lists in hadoop-yarn-project > -- > > Key: YARN-10805 > URL: https://issues.apache.org/jira/browse/YARN-10805 > Project: Hadoop YARN > Issue Type: Task > Components: yarn-common >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10750) TestMetricsInvariantChecker.testManyRuns is broken since HADOOP-17524
[ https://issues.apache.org/jira/browse/YARN-10750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-10750:
------------------------------
          Component/s: test
     Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> TestMetricsInvariantChecker.testManyRuns is broken since HADOOP-17524
> ---------------------------------------------------------------------
>
>                 Key: YARN-10750
>                 URL: https://issues.apache.org/jira/browse/YARN-10750
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: test
>    Affects Versions: 3.4.0
>            Reporter: Gergely Pollák
>            Assignee: Gergely Pollák
>            Priority: Major
>             Fix For: 3.4.0
>
>         Attachments: YARN-10750.001.patch
>
>
> HADOOP-17524 removed the metrics:
> LogFatal
> LogError
> LogWarn
> LogInfo
> These removals need to be reflected in the invariant list of the
> TestMetricsInvariantChecker as well.
[jira] [Updated] (YARN-10746) RmWebApp add default-node-label-expression to the queue info
[ https://issues.apache.org/jira/browse/YARN-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10746: -- Component/s: resourcemanager webapp Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RmWebApp add default-node-label-expression to the queue info > > > Key: YARN-10746 > URL: https://issues.apache.org/jira/browse/YARN-10746 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager, webapp >Affects Versions: 3.4.0 >Reporter: Gergely Pollák >Assignee: Gergely Pollák >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10746.001.patch, YARN-10746.002.patch, > YARN-10746.003.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10278) CapacityScheduler test framework ProportionalCapacityPreemptionPolicyMockFramework need some review
[ https://issues.apache.org/jira/browse/YARN-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-10278:
------------------------------
          Component/s: capacity scheduler
                       test
         Hadoop Flags: Reviewed
     Target Version/s: 3.2.3, 3.3.1, 3.4.0
    Affects Version/s: 3.2.3
                       3.3.1
                       3.4.0

> CapacityScheduler test framework
> ProportionalCapacityPreemptionPolicyMockFramework need some review
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10278
>                 URL: https://issues.apache.org/jira/browse/YARN-10278
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: capacity scheduler, test
>    Affects Versions: 3.4.0, 3.3.1, 3.2.3
>            Reporter: Gergely Pollák
>            Assignee: Szilard Nemeth
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1, 3.2.3
>
>         Attachments: YARN-10278.001.patch, YARN-10278.002.patch,
> YARN-10278.002.patch, YARN-10278.002.patch, YARN-10278.branch-3.1.001.patch,
> YARN-10278.branch-3.1.002.patch, YARN-10278.branch-3.1.003.patch,
> YARN-10278.branch-3.2.001.patch, YARN-10278.branch-3.2.002.patch,
> YARN-10278.branch-3.2.002.patch, YARN-10278.branch-3.3.001.patch
>
>
> This test framework class mocks a bit too heavily, and it simulates CS
> internal behaviour with the mock methods beyond the point where it is
> reasonably maintainable. Any internal change in CS is a major head-scratcher.
> A lot of tests depend on this class, so we should approach it carefully, but
> I think it is worth examining whether this class can be made a bit more
> resilient to changes and easier to maintain, or at least be documented
> better.
[jira] [Updated] (YARN-10277) CapacityScheduler test TestUserGroupMappingPlacementRule should build proper hierarchy
[ https://issues.apache.org/jira/browse/YARN-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-10277:
------------------------------
          Component/s: capacity scheduler
     Target Version/s: 3.3.1, 3.4.0
    Affects Version/s: 3.3.1
                       3.4.0

> CapacityScheduler test TestUserGroupMappingPlacementRule should build proper
> hierarchy
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10277
>                 URL: https://issues.apache.org/jira/browse/YARN-10277
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: capacity scheduler
>    Affects Versions: 3.4.0, 3.3.1
>            Reporter: Gergely Pollák
>            Assignee: Szilard Nemeth
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1
>
>         Attachments: YARN-10277.001.patch, YARN-10277.002.patch,
> YARN-10277.003.patch, YARN-10277.branch-3.3.001.patch
>
>
> Since the CapacityScheduler internal implementation depends more and more on
> queues being hierarchical, the test is getting really hard to maintain. A lot
> of test cases were failing because they used non-existent queues. The older
> placement rule solution ignored missing parents, but since the leaf queue
> change in CS, we must be able to get a full path for any queue, because all
> queues are referenced by their full path.
> This test should reflect this: instead of creating fictional queues and
> expecting them to exist, it should create a proper queue hierarchy, with a
> better way to describe it.
> Currently we set up a bunch of mockito "when" statements to simulate the
> queue behavior, but this is a hassle to maintain, and it is easy to miss a
> few methods.
[jira] [Updated] (YARN-10279) Avoid unnecessary QueueMappingEntity creations
[ https://issues.apache.org/jira/browse/YARN-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10279: -- Component/s: resourcemanager Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Avoid unnecessary QueueMappingEntity creations > -- > > Key: YARN-10279 > URL: https://issues.apache.org/jira/browse/YARN-10279 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager >Affects Versions: 3.4.0, 3.3.1 >Reporter: Gergely Pollák >Assignee: Hudáky Márton Gyula >Priority: Minor > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-10279.001.patch, YARN-10279.003.patch, > YARN-10279.004.patch, YARN-10279.005.patch, YARN-10279.006.patch > > > In CS UserGroupMappingPlacementRule and AppNameMappingPlacementRule classes > we create new instances of QueueMappingEntity class. In some cases we simply > copy the already received class, so we just duplicate it, which is > unnecessary since the class is immutable. > This is just a minor improvement, probably doesn't have much impact, but > still puts some unnecessary load on GC. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10281) Redundant QueuePath usage in UserGroupMappingPlacementRule and AppNameMappingPlacementRule
[ https://issues.apache.org/jira/browse/YARN-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-10281:
------------------------------
          Component/s: capacity scheduler
     Target Version/s: 3.3.1, 3.4.0
    Affects Version/s: 3.3.1
                       3.4.0

> Redundant QueuePath usage in UserGroupMappingPlacementRule and
> AppNameMappingPlacementRule
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10281
>                 URL: https://issues.apache.org/jira/browse/YARN-10281
>             Project: Hadoop YARN
>          Issue Type: Task
>          Components: capacity scheduler
>    Affects Versions: 3.4.0, 3.3.1
>            Reporter: Gergely Pollák
>            Assignee: Gergely Pollák
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1
>
>         Attachments: YARN-10281.001.patch, YARN-10281.002.patch,
> YARN-10281.003.patch, YARN-10281.004.patch, YARN-10281.branch-3.3.001.patch
>
>
> We use the QueuePath and QueueMapping (or QueueMappingEntity) objects in the
> aforementioned classes. These technically store the same kind of information,
> yet we keep converting between them. Let's examine whether we can use only
> the QueueMapping(Entity) instead, since that holds more information.
[jira] [Assigned] (YARN-10746) RmWebApp add default-node-label-expression to the queue info
[ https://issues.apache.org/jira/browse/YARN-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan reassigned YARN-10746: - Assignee: Gergely Pollák > RmWebApp add default-node-label-expression to the queue info > > > Key: YARN-10746 > URL: https://issues.apache.org/jira/browse/YARN-10746 > Project: Hadoop YARN > Issue Type: Task >Reporter: Gergely Pollák >Assignee: Gergely Pollák >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10746.001.patch, YARN-10746.002.patch, > YARN-10746.003.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10777) Bump node-sass from 4.13.0 to 4.14.1 in /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp
[ https://issues.apache.org/jira/browse/YARN-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan reassigned YARN-10777: - Assignee: Wei-Chiu Chuang > Bump node-sass from 4.13.0 to 4.14.1 in > /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp > --- > > Key: YARN-10777 > URL: https://issues.apache.org/jira/browse/YARN-10777 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.4.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-1115) Provide optional means for a scheduler to check real user ACLs
[ https://issues.apache.org/jira/browse/YARN-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan reassigned YARN-1115: Assignee: Eric Payne > Provide optional means for a scheduler to check real user ACLs > -- > > Key: YARN-1115 > URL: https://issues.apache.org/jira/browse/YARN-1115 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, scheduler >Affects Versions: 2.8.5 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4 > > Attachments: YARN-1115.001.patch, YARN-1115.002.patch, > YARN-1115.003.patch, YARN-1115.004.patch, YARN-1115.branch-2.10.004.patch, > YARN-1115.branch-3.2.004.patch, YARN-1115.branch-3.3.004.patch > > > In the framework for secure implementation using UserGroupInformation.doAs > (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html), > a trusted superuser can submit jobs on behalf of another user in a secure > way. In this framework, the superuser is referred to as the real user and the > proxied user is referred to as the effective user. > Currently when a job is submitted as an effective user, the ACLs for the > effective user are checked against the queue on which the job is to be run. > Depending on an optional configuration, the scheduler should also check the > ACLs of the real user if the configuration to do so is set. > For example, suppose my superuser name is super, and super is configured to > securely proxy as joe. Also suppose there is a Hadoop queue named ops which > only allows ACLs for super, not for joe. > When super proxies to joe in order to submit a job to the ops queue, it will > fail because joe, as the effective user, does not have ACLs on the ops queue. > In many cases this is what you want, in order to protect queues that joe > should not be using. 
> However, there are times when super may need to proxy to many users, and the > client running as super just wants to use the ops queue because the ops queue > is already dedicated to the client's purpose, and, to keep the ops queue > dedicated to that purpose, super doesn't want to open up ACLs to joe in > general on the ops queue. Without this functionality, in this case, the > client running as super needs to figure out which queue each user has ACLs > opened up for, and then coordinate with other tasks using those queues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
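The super/joe/ops example above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `RealUserAclSketch`, the `checkRealUser` flag, and the map-based ACL table are hypothetical stand-ins, not the scheduler's actual API or configuration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the YARN code base.
final class RealUserAclSketch {
    // Queue name -> users allowed to submit to that queue (assumed representation).
    private final Map<String, Set<String>> submitAcls;
    // Stands in for the optional configuration mentioned in the issue.
    private final boolean checkRealUser;

    RealUserAclSketch(Map<String, Set<String>> submitAcls, boolean checkRealUser) {
        this.submitAcls = submitAcls;
        this.checkRealUser = checkRealUser;
    }

    // The effective user is always checked; the real user is only consulted
    // when the optional flag is set and a real (proxying) user is present.
    boolean canSubmit(String queue, String effectiveUser, String realUser) {
        Set<String> allowed =
            submitAcls.getOrDefault(queue, Collections.<String>emptySet());
        if (allowed.contains(effectiveUser)) {
            return true;
        }
        return checkRealUser && realUser != null && allowed.contains(realUser);
    }
}
```

With the flag off, super proxying as joe is rejected on the ops queue, which matches today's behavior; with the flag on, the submission is accepted because super holds the ACL.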
[jira] [Updated] (YARN-9048) Add znode hierarchy in Federation ZK State Store
[ https://issues.apache.org/jira/browse/YARN-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9048: - Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add znode hierarchy in Federation ZK State Store > > > Key: YARN-9048 > URL: https://issues.apache.org/jira/browse/YARN-9048 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Bibin Chundatt >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Similar to YARN-2962 consider having hierarchy in ZK federation store for > applications -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8980) Mapreduce application container start fail after AM restart.
[ https://issues.apache.org/jira/browse/YARN-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8980: - Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Mapreduce application container start fail after AM restart. > - > > Key: YARN-8980 > URL: https://issues.apache.org/jira/browse/YARN-8980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Bibin Chundatt >Assignee: Chenyu Zheng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > UAM to subclusters are always launched with keepContainers. > On AM restart scenarios , UAM register again with RM . UAM receive running > containers with NMToken. NMToken received by UAM in > getPreviousAttemptContainersNMToken is never used by mapreduce application. > Federation Interceptor should take care of such scenarios too. Merge NMToken > received at registration to allocate response. > Container allocation response on same node will have NMToken empty. > issue credits : [~Nallasivan] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11090) [GPG] Support Secure Mode
[ https://issues.apache.org/jira/browse/YARN-11090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11090: -- Component/s: gpg Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [GPG] Support Secure Mode > - > > Key: YARN-11090 > URL: https://issues.apache.org/jira/browse/YARN-11090 > Project: Hadoop YARN > Issue Type: Sub-task > Components: gpg >Affects Versions: 3.4.0 >Reporter: tuyu >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-11090-YARN-7402.v1.patch, YARN-11090.001.patch > > > GPG should support configuring a keytab and principal to communicate with the Router.
[jira] [Updated] (YARN-8898) Fix FederationInterceptor#allocate to set application priority in allocateResponse
[ https://issues.apache.org/jira/browse/YARN-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8898: - Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix FederationInterceptor#allocate to set application priority in > allocateResponse > -- > > Key: YARN-8898 > URL: https://issues.apache.org/jira/browse/YARN-8898 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Bibin Chundatt >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-8898.wip.patch > > > FederationInterceptor#mergeAllocateResponses skips > application_priority in the returned response.
[jira] [Updated] (YARN-7720) Race condition between second app attempt and UAM timeout when first attempt node is down
[ https://issues.apache.org/jira/browse/YARN-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-7720: - Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Race condition between second app attempt and UAM timeout when first attempt > node is down > - > > Key: YARN-7720 > URL: https://issues.apache.org/jira/browse/YARN-7720 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Botong Huang >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-7720.v1.patch, YARN-7720.v2.patch > > > In Federation, multiple attempts of an application share the same UAM in each > secondary sub-cluster. When the first attempt fails, we rely on the fact that > the secondary RM won't kill the existing UAM before the AM heartbeat timeout > (default 10 min). When the second attempt comes up in the home sub-cluster, it > will pick up the UAM token from Yarn Registry and resume the UAM heartbeat to > secondary RMs. > The default heartbeat timeouts for NM and AM are both 10 mins. The problem is > that when the first attempt's node goes down or loses connection, only after > 10 mins will the home RM mark the first attempt as failed, and then schedule > the 2nd attempt on some other node. By then the UAMs in secondaries are > already timing out, and they might not survive until the second attempt comes > up.
[jira] [Updated] (YARN-5604) Add versioning for FederationStateStore
[ https://issues.apache.org/jira/browse/YARN-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-5604: - Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add versioning for FederationStateStore > --- > > Key: YARN-5604 > URL: https://issues.apache.org/jira/browse/YARN-5604 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Subramaniam Krishnan >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Currently we don't have versioning (null version) for the > FederationStateStore. This JIRA proposes adding the versioning support that is needed > to support upgrades.
[jira] [Updated] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8972: - Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch, YARN-8972.v5.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with an oversized ASC (ApplicationSubmissionContext). > This avoids a failover of the YARN cluster.
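The size check described in YARN-8972 might look roughly like the sketch below. It assumes the interceptor can see the serialized submission context as a byte array and compares its length with a configured limit; the class name, the limit, and the exception type are all hypothetical and not taken from the Router code.

```java
// Hypothetical sketch of a size guard; not the Router's actual interceptor.
final class SubmissionSizeGuardSketch {
    // Maximum accepted size in bytes (assumed to come from configuration).
    private final int maxBytes;

    SubmissionSizeGuardSketch(int maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    // Rejects the submission before it is forwarded to any sub-cluster.
    void check(byte[] serializedContext) {
        if (serializedContext.length > maxBytes) {
            throw new IllegalArgumentException(
                "Submission context is " + serializedContext.length
                    + " bytes, limit is " + maxBytes);
        }
    }
}
```

Rejecting at the Router keeps an oversized context from ever reaching the state store or the RMs behind it.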
[jira] [Updated] (YARN-9049) Add application submit data to state store
[ https://issues.apache.org/jira/browse/YARN-9049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9049: - Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add application submit data to state store > -- > > Key: YARN-9049 > URL: https://issues.apache.org/jira/browse/YARN-9049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Bibin Chundatt >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-9049.001.path > > > As per the discussion in YARN-8898, we need to persist trimmed > ApplicationSubmissionContext details to the Federation State Store.
[jira] [Updated] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects
[ https://issues.apache.org/jira/browse/YARN-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10946: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > AbstractCSQueue: Create separate class for constructing Queue API objects > - > > Key: YARN-10946 > URL: https://issues.apache.org/jira/browse/YARN-10946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Peter Szucs >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > Relevant methods are: > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo > - > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor
[ https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-6667: - Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Handle containerId duplicate without failing the heartbeat in Federation > Interceptor > > > Key: YARN-6667 > URL: https://issues.apache.org/jira/browse/YARN-6667 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Botong Huang >Assignee: Shilun Fan >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > In practice, the probability of this happening is very low. > It can only be caused by a master-slave failover of YARN and a wrong > Epoch parameter configuration. > We will try to tolerate this situation and keep the application running > as far as possible, using the following measures: > 1. Select a node whose heartbeat has not timed out for allocation, and at the > same time require the node to be in the RUNNING state. > 2. If the heartbeat of neither RM has timed out, and both are in the > RUNNING state, select the previously allocated RM for Container processing.
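The two measures listed in YARN-6667 amount to a small selection rule, sketched below. The `Candidate` type and its fields are hypothetical; returning an empty result when neither RM is usable is an assumption, since the issue does not say what happens in that case.

```java
import java.util.Optional;

// Hypothetical sketch of the duplicate-containerId selection rule.
final class DuplicateContainerResolverSketch {
    static final class Candidate {
        final String subClusterId;
        final boolean heartbeatTimedOut;
        final boolean running;

        Candidate(String subClusterId, boolean heartbeatTimedOut, boolean running) {
            this.subClusterId = subClusterId;
            this.heartbeatTimedOut = heartbeatTimedOut;
            this.running = running;
        }
    }

    // Measure 1: a candidate is usable only if its heartbeat has not
    // timed out and it is in the RUNNING state.
    static boolean usable(Candidate c) {
        return !c.heartbeatTimedOut && c.running;
    }

    // Measure 2: when both are usable, the previously allocated RM wins.
    static Optional<String> choose(Candidate previous, Candidate incoming) {
        if (usable(previous)) {
            return Optional.of(previous.subClusterId);
        }
        if (usable(incoming)) {
            return Optional.of(incoming.subClusterId);
        }
        return Optional.empty(); // assumption: nothing usable, caller decides
    }
}
```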
[jira] [Updated] (YARN-8482) [Router] Add cache for fast answers to getApps
[ https://issues.apache.org/jira/browse/YARN-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8482: - Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [Router] Add cache for fast answers to getApps > -- > > Key: YARN-8482 > URL: https://issues.apache.org/jira/browse/YARN-8482 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-6539: - Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, > YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, > YARN-6539.006.patch, YARN-6539.007.patch, YARN-6539.008.patch, > YARN-6539_3.patch, YARN-6539_4.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6972) Adding RM ClusterId in AppInfo
[ https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-6972: - Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Adding RM ClusterId in AppInfo > -- > > Key: YARN-6972 > URL: https://issues.apache.org/jira/browse/YARN-6972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Tanuj Nayak >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-6972.001.patch, YARN-6972.002.patch, > YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, > YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, > YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, > YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch, > YARN-6972.015.patch, YARN-6972.016.patch, YARN-6972.016.patch > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10793) Upgrade Junit from 4 to 5 in hadoop-yarn-server-applicationhistoryservice
[ https://issues.apache.org/jira/browse/YARN-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10793: -- Component/s: test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Upgrade Junit from 4 to 5 in hadoop-yarn-server-applicationhistoryservice > - > > Key: YARN-10793 > URL: https://issues.apache.org/jira/browse/YARN-10793 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: ANANDA G B >Assignee: Ashutosh Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Upgrade Junit from 4 to 5 in hadoop-yarn-server-applicationhistoryservice -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8973) [Router] Add missing methods in RMWebProtocol
[ https://issues.apache.org/jira/browse/YARN-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-8973: - Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [Router] Add missing methods in RMWebProtocol > - > > Key: YARN-8973 > URL: https://issues.apache.org/jira/browse/YARN-8973 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Giovanni Matteo Fumarola >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-8973.v1.patch, YARN-8973.v2.patch, > YARN-8973.v3.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10883) [Router] Router Audit Log Add Client IP Address.
[ https://issues.apache.org/jira/browse/YARN-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10883: -- Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [Router] Router Audit Log Add Client IP Address. > > > Key: YARN-10883 > URL: https://issues.apache.org/jira/browse/YARN-10883 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: chaosju >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > The Router should record the address of the client that killed the application. > Currently the log information is printed as follows: > {code:java} > 2022-06-10 08:06:26,322 INFO [main] router.RouterAuditLogger > (RouterAuditLogger.java:logSuccess(89)) - USER=test-user OPERATION=Submit > New App TARGET=RouterClientRMService RESULT=SUCCESS > APPID=application_1654873569440_0001 SUBCLUSTERID=2{code} > With the IP information added, the log looks as follows: > {code:java} > 2022-06-10 08:09:05,392 INFO [main] router.RouterAuditLogger > (RouterAuditLogger.java:logSuccess(89)) - USER=test-user IP=127.0.0.1 > OPERATION=Submit New App TARGET=RouterClientRMService RESULT=SUCCESS > APPID=application_1654873732359_0001 SUBCLUSTERID=3 {code} >
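To make the before/after log lines in YARN-10883 concrete, here is a minimal sketch that assembles the key=value portion of such an audit line. It is not the RouterAuditLogger implementation: the class and method names are hypothetical, and the single-space separator simply mirrors how the sample appears in this email.

```java
// Hypothetical sketch; mirrors the key order shown in the quoted sample.
final class AuditLineSketch {
    static String successLine(String user, String clientIp, String operation,
                              String target, String appId, String subClusterId) {
        StringBuilder sb = new StringBuilder();
        add(sb, "USER", user);
        // The IP field is the new part; it is skipped when unknown.
        if (clientIp != null) {
            add(sb, "IP", clientIp);
        }
        add(sb, "OPERATION", operation);
        add(sb, "TARGET", target);
        add(sb, "RESULT", "SUCCESS");
        add(sb, "APPID", appId);
        add(sb, "SUBCLUSTERID", subClusterId);
        return sb.toString();
    }

    // Appends one KEY=value pair, separated from the previous pair by a space.
    private static void add(StringBuilder sb, String key, String value) {
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(key).append('=').append(value);
    }
}
```

Passing `null` as the client IP reproduces the old line format, so callers that cannot determine the address are unaffected.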
[jira] [Updated] (YARN-10487) Support getQueueUserAcls, listReservations, getApplicationAttempts, getContainerReport, getContainers, getResourceTypeInfo API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10487: -- Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Support getQueueUserAcls, listReservations, getApplicationAttempts, > getContainerReport, getContainers, getResourceTypeInfo API's for Federation > --- > > Key: YARN-10487 > URL: https://issues.apache.org/jira/browse/YARN-10487 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: D M Murali Krishna Reddy >Assignee: Shilun Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-10487.001.patch > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Support getQueueUserAcls, listReservations, getApplicationAttempts, > getContainerReport, getContainers, getResourceTypeInfo API's for Federation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10565) Follow-up to YARN-10504
[ https://issues.apache.org/jira/browse/YARN-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10565: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Follow-up to YARN-10504 > --- > > Key: YARN-10565 > URL: https://issues.apache.org/jira/browse/YARN-10565 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > In YARN-10504 weight mode support was introduced to CS. This jira is a > followup to simplify and restructure the initialization, so that the weight > calculation/absolute/percentage mode is easier to understand and modify. > To be refactored: > * In ParentQueue.java#1099 the error message should be more specific, instead > of the {{LOG.error("Fatal issue found: e", e);}} > * -AutoCreatedLeafQueue.clearConfigurableFields should clear > NORMALIZED_WEIGHT just to be on the safe side- > * -Uncomment the commented assertions in > TestCapacitySchedulerAutoCreatedQueueBase.validateEffectiveMinResource- > * -Check whether the assertion modification in TestRMWebServices is > absolutely necessary or could be hiding a bug.- > * -Same for TestRMWebServicesForCSWithPartitions.java- > Additional information: > The original flow was modified to allow the dynamic weight-capacity > calculation. > This resulted in a new flow, which is now harder to understand. > With a cleanup it could be made simpler, the duplicate calculations could be > avoided. > The changed functionality should either be explained (if deemed correct) or > fixed (see YARN-10590). > Investigate how the CS reinit works, it could contain some possibly redundant > initialization code fragments. > Note: Since most of the items were completed in other refactor items, only > the first one is being patched here. 
[jira] [Updated] (YARN-11036) Do not inherit from TestRMWebServicesCapacitySched
[ https://issues.apache.org/jira/browse/YARN-11036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11036: -- Component/s: capacity scheduler test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Do not inherit from TestRMWebServicesCapacitySched > -- > > Key: YARN-11036 > URL: https://issues.apache.org/jira/browse/YARN-11036 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, test >Affects Versions: 3.4.0 >Reporter: Tamas Domok >Assignee: Tamas Domok >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > {code:java} > public class TestRMWebServicesSchedulerActivities > extends TestRMWebServicesCapacitySched { {code} > This is a bad practice, the TestRMWebServicesCapacitySched's tests run 2 > times. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements
[ https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10049: -- Component/s: scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > FIFOOrderingPolicy Improvements > --- > > Key: YARN-10049 > URL: https://issues.apache.org/jira/browse/YARN-10049 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 3.4.0 >Reporter: Manikandan R >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-10049.001.patch, YARN-10049.002.patch, > YARN-10049.003.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > FIFOPolicy of FS does the following comparisons in addition to app priority > comparison: > 1. Using Start time > 2. Using Name > Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy > of CS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
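The comparison chain requested in YARN-10049 (priority, then start time, then name) can be expressed with a chained `java.util.Comparator`. The sketch below uses a hypothetical `App` type; the directions (higher priority first, earlier start first, names in lexicographic order) are assumptions, since the issue only names the keys.

```java
import java.util.Comparator;

// Hypothetical sketch of the ordering; not the FIFOOrderingPolicy code.
final class FifoOrderSketch {
    static final class App {
        final String name;
        final int priority;
        final long startTime;

        App(String name, int priority, long startTime) {
            this.name = name;
            this.priority = priority;
            this.startTime = startTime;
        }
    }

    // Higher priority first, then earlier start time, then name as tie-breaker.
    static final Comparator<App> ORDER =
        Comparator.comparingInt((App a) -> a.priority).reversed()
            .thenComparingLong((App a) -> a.startTime)
            .thenComparing((App a) -> a.name);
}
```

Because every key is read from immutable fields, the comparator stays consistent for the duration of a sort.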
[jira] [Updated] (YARN-10918) Simplify method: CapacitySchedulerQueueManager#parseQueue
[ https://issues.apache.org/jira/browse/YARN-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10918: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Simplify method: CapacitySchedulerQueueManager#parseQueue > - > > Key: YARN-10918 > URL: https://issues.apache.org/jira/browse/YARN-10918 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h > Remaining Estimate: 0h > > Ideas for simplifying this method: > - Define a queue factory > - Separate validation logic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10945) Add javadoc to all methods of AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10945: -- Component/s: capacity scheduler documentation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add javadoc to all methods of AbstractCSQueue > - > > Key: YARN-10945 > URL: https://issues.apache.org/jira/browse/YARN-10945 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, documentation >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: András Győri >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10947) Simplify AbstractCSQueue#initializeQueueState
[ https://issues.apache.org/jira/browse/YARN-10947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10947: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Simplify AbstractCSQueue#initializeQueueState > - > > Key: YARN-10947 > URL: https://issues.apache.org/jira/browse/YARN-10947 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10995) Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy
[ https://issues.apache.org/jira/browse/YARN-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10995: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy > - > > Key: YARN-10995 > URL: https://issues.apache.org/jira/browse/YARN-10995 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > GuaranteedOrZeroCapacityOverTimePolicy has a comparator class that orders > applications by their submit time. It gets the applications from the > RMContext and doesn't need any data from > GuaranteedOrZeroCapacityOverTimePolicy class, so this easily could be moved > to RMContext, so that the reference to the RMContext/SchedulerContext could > be removed from GuaranteedOrZeroCapacityOverTimePolicy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10944) AbstractCSQueue: Eliminate code duplication in overloaded versions of setMaxCapacity
[ https://issues.apache.org/jira/browse/YARN-10944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10944: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > AbstractCSQueue: Eliminate code duplication in overloaded versions of > setMaxCapacity > > > Key: YARN-10944 > URL: https://issues.apache.org/jira/browse/YARN-10944 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Methods are: > - AbstractCSQueue#setMaxCapacity(float) > - AbstractCSQueue#setMaxCapacity(java.lang.String, float) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
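One common way to remove duplication between two overloads like those named in YARN-10944 is to let the shorter one delegate to the longer one. The sketch below illustrates that pattern only; the class, the `NO_LABEL` constant, and the range check are hypothetical and are not copied from AbstractCSQueue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of overload delegation; not the AbstractCSQueue code.
final class MaxCapacitySketch {
    // Stand-in for the default (empty) node label.
    static final String NO_LABEL = "";

    private final Map<String, Float> maxCapacityByLabel = new HashMap<>();

    // The label-less overload only forwards, so the logic lives in one place.
    void setMaxCapacity(float maximumCapacity) {
        setMaxCapacity(NO_LABEL, maximumCapacity);
    }

    void setMaxCapacity(String nodeLabel, float maximumCapacity) {
        // Assumed validation: capacity is a fraction between 0 and 1.
        if (maximumCapacity < 0f || maximumCapacity > 1f) {
            throw new IllegalArgumentException(
                "Illegal maximum capacity: " + maximumCapacity);
        }
        maxCapacityByLabel.put(nodeLabel, maximumCapacity);
    }

    float getMaxCapacity(String nodeLabel) {
        return maxCapacityByLabel.getOrDefault(nodeLabel, 0f);
    }
}
```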
[jira] [Updated] (YARN-10963) Split TestCapacityScheduler by test categories
[ https://issues.apache.org/jira/browse/YARN-10963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10963: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Split TestCapacityScheduler by test categories > -- > > Key: YARN-10963 > URL: https://issues.apache.org/jira/browse/YARN-10963 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Tamas Domok >Assignee: Tamas Domok >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Tests in TestCapacityScheduler can be categorised and split into multiple > test files, e.g.: > - refresh related tests > - app related tests (move, etc) > - node handling related tests
[jira] [Updated] (YARN-11034) Add enhanced headroom in AllocateResponse
[ https://issues.apache.org/jira/browse/YARN-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11034: -- Component/s: federation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add enhanced headroom in AllocateResponse > - > > Key: YARN-11034 > URL: https://issues.apache.org/jira/browse/YARN-11034 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Add enhanced headroom in allocate response. This provides a channel for RMs > to return load information for AMRMProxy and decision making when rerouting > resource requests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10632) Make auto queue creation maximum allowed depth configurable
[ https://issues.apache.org/jira/browse/YARN-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10632: -- Component/s: capacity scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Make auto queue creation maximum allowed depth configurable > --- > > Key: YARN-10632 > URL: https://issues.apache.org/jira/browse/YARN-10632 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Qi Zhu >Assignee: Andras Gyori >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-10632.001.patch, YARN-10632.002.patch, > YARN-10632.003.patch, YARN-10632.004.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The maximum allowed depth is currently fixed at 2, but I think it should be > configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10907) Minimize usages of AbstractCSQueue#csContext
[ https://issues.apache.org/jira/browse/YARN-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10907: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Minimize usages of AbstractCSQueue#csContext > > > Key: YARN-10907 > URL: https://issues.apache.org/jira/browse/YARN-10907 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > Context objects can be a sign of a code smell as they can contain many, > possibly loosely related references to other objects. > CapacitySchedulerContext seems to be like this. > This task is to investigate how the field AbstractCSQueue#csContext is being > used from this class and, where possible, to keep the usage of this context class to > the bare minimum. > Related article: https://wiki.c2.com/?ContextObjectsAreEvil -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10929) Do not use a separate config in legacy CS AQC
[ https://issues.apache.org/jira/browse/YARN-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10929: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Do not use a separate config in legacy CS AQC > - > > Key: YARN-10929 > URL: https://issues.apache.org/jira/browse/YARN-10929 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > AbstractManagedParentQueue#initializeLeafQueueConfigs creates a new > CapacitySchedulerConfiguration with templated configs only. We should stop > doing this. > Also, there is a sorting of config keys in this method, but in the end the > configs are added to the Configuration object which is an enhanced Map. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11043) Clean up checkstyle warnings from YARN-11024/10907/10929
[ https://issues.apache.org/jira/browse/YARN-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11043: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Clean up checkstyle warnings from YARN-11024/10907/10929 > > > Key: YARN-11043 > URL: https://issues.apache.org/jira/browse/YARN-11043 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: checkstyle_warnings.txt > > Time Spent: 1h 10m > Remaining Estimate: 0h > > YARN-11024, YARN-10907, YARN-10929 are consecutive changes built on top of > each other. This jira is a followup to clean up the checkstyle warnings > present in the modified files. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11031) Improve the maintainability of RM webapp tests like TestRMWebServicesCapacitySched
[ https://issues.apache.org/jira/browse/YARN-11031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11031: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Improve the maintainability of RM webapp tests like > TestRMWebServicesCapacitySched > -- > > Key: YARN-11031 > URL: https://issues.apache.org/jira/browse/YARN-11031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Tamas Domok >Assignee: Tamas Domok >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > It's hard to maintain the asserts in TestRMWebServicesCapacitySched, > TestRMWebServicesCapacitySchedDynamicConfig test classes when the scheduler > response is modified. Currently only a subset of the scheduler response is > asserted in these tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11024) Create an AbstractLeafQueue to store the common LeafQueue + AutoCreatedLeafQueue functionality
[ https://issues.apache.org/jira/browse/YARN-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11024: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Create an AbstractLeafQueue to store the common LeafQueue + > AutoCreatedLeafQueue functionality > -- > > Key: YARN-11024 > URL: https://issues.apache.org/jira/browse/YARN-11024 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > AbstractAutoCreatedLeafQueue extends the LeafQueue class which is an > instantiable class, so every time an AutoCreatedLeafQueue is created a normal > LeafQueue is configured as well. This setup results in some strange behaviour > like having to pass the template configs of an auto created queue to a leaf > queue. To make the whole structure more flexible an AbstractLeafQueue should > be created which stores the common methods. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-11003) Make RMNode aware of all (OContainer inclusive) allocated resources
[ https://issues.apache.org/jira/browse/YARN-11003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-11003: -- Component/s: container resourcemanager Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Make RMNode aware of all (OContainer inclusive) allocated resources > --- > > Key: YARN-11003 > URL: https://issues.apache.org/jira/browse/YARN-11003 > Project: Hadoop YARN > Issue Type: Sub-task > Components: container, resourcemanager >Affects Versions: 3.4.0 >Reporter: Andrew Chung >Assignee: Andrew Chung >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > In order to facilitate resource-aware Opportunistic container allocation, we > will need to pass allocated container information to {{ClusterNode}}, which > in turn gets its information from {{RMNode}}. > However, {{RMNode}} currently only holds containers and node utilization > based on the actual physical resource utilization, not at the allocated > container level. > This sub-task aims to allow {{RMNode}} to be aware of all allocated resources > on the node upon a node heartbeat. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10909) AbstractCSQueue: Annotate all methods with VisibleForTesting that are only used by test code
[ https://issues.apache.org/jira/browse/YARN-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10909: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > AbstractCSQueue: Annotate all methods with VisibleForTesting that are only > used by test code > > > Key: YARN-10909 > URL: https://issues.apache.org/jira/browse/YARN-10909 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > For example, AbstractCSQueue#setMaxCapacity(float) is only used for testing, > but not annotated. There can be other methods in this class like this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10998) Add YARN_ROUTER_HEAPSIZE to yarn-env for routers
[ https://issues.apache.org/jira/browse/YARN-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10998: -- Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add YARN_ROUTER_HEAPSIZE to yarn-env for routers > > > Key: YARN-10998 > URL: https://issues.apache.org/jira/browse/YARN-10998 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Yarn services NM, RM etc have YARN_\{SERVICENAME}_HEAPSIZE variable defined, > we should have similar parameter for Router Service also. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10948) Rename SchedulerQueue#activeQueue to activateQueue
[ https://issues.apache.org/jira/browse/YARN-10948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10948: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Rename SchedulerQueue#activeQueue to activateQueue > -- > > Key: YARN-10948 > URL: https://issues.apache.org/jira/browse/YARN-10948 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Adam Antal >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10958) Use correct configuration for Group service init in CSMappingPlacementRule
[ https://issues.apache.org/jira/browse/YARN-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10958: -- Component/s: capacity scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Use correct configuration for Group service init in CSMappingPlacementRule > -- > > Key: YARN-10958 > URL: https://issues.apache.org/jira/browse/YARN-10958 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Peter Bacsko >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h > Remaining Estimate: 0h > > There is a potential problem in {{CSMappingPlacementRule.java}}: > {noformat} > if (groups == null) { > groups = Groups.getUserToGroupsMappingService(conf); > } > {noformat} > The problem is, we're supposed to pass {{scheduler.getConf()}}. The "conf" > object is the config for capacity scheduler, which does not include the > property which selects the group service provider. Therefore, the current > code just works by chance, because the Group mapping service is already > initialized at this point. See the original fix in YARN-10053. > Also, we need a unit test to verify it. > Idea: > # Create a Configuration object in which the property > "hadoop.security.group.mapping" refers to an existing test implementation. > # Add a new method to {{Groups}} which nulls out the singleton instance, e.g. > {{Groups.reset()}}. > # Create a mock CapacityScheduler where {{getConf()}} and > {{getConfiguration()}} contain different settings for > "hadoop.security.group.mapping". Since {{getConf()}} is the service config, > this should return the config object created in step #1. > # Create an instance of {{CSMappingPlacementRule}} with a single primary > group rule. > # Run the placement evaluation. 
> # Expected: returned queue matches what is supposed to be coming from the > test group mapping service ("testuser" --> "testqueue"). > # Modify "hadoop.security.group.mapping" in the config object created in > step #1. > # Call {{Groups.refresh()}} which changes the group mapping ("testuser" --> > "testqueue2"). This requires that the test group mapping service implement > {{GroupMappingServiceProvider.cacheGroupsRefresh()}}. > # Create a new instance of {{CSMappingPlacementRule}}. > # Run the placement evaluation again > # Expected: with the same user, the target queue has changed. > This looks convoluted, but these steps make sure that: > # {{CSMappingPlacementRule}} will force the initialization of groups. > # We select the correct configuration for group service init. > # We don't create a new {{Groups}} instance if the singleton is initialized, > so we cover the original problem described in YARN-10597. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
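A minimal sketch of why the wrong configuration can "work by chance" when a singleton is initialized lazily, as described in YARN-10958 above. The classes GroupServiceSketch and GroupServiceDemo, the plain Map standing in for a Configuration, and the "default-provider" fallback are hypothetical stand-ins for illustration only; they are not the Hadoop Groups API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lazily initialised singleton; not the real Hadoop Groups class.
final class GroupServiceSketch {
  static final String PROVIDER_KEY = "hadoop.security.group.mapping";
  static final String DEFAULT_PROVIDER = "default-provider"; // assumed fallback, for illustration

  private static GroupServiceSketch instance;
  private final String provider;

  private GroupServiceSketch(String provider) {
    this.provider = provider;
  }

  // The configuration is only read on the first call; later calls ignore their argument.
  static synchronized GroupServiceSketch get(Map<String, String> conf) {
    if (instance == null) {
      instance = new GroupServiceSketch(conf.getOrDefault(PROVIDER_KEY, DEFAULT_PROVIDER));
    }
    return instance;
  }

  // Test-only hook, analogous to the Groups.reset() idea in the issue.
  static synchronized void reset() {
    instance = null;
  }

  String provider() {
    return provider;
  }
}

final class GroupServiceDemo {
  public static void main(String[] args) {
    Map<String, String> serviceConf = new HashMap<>();
    serviceConf.put(GroupServiceSketch.PROVIDER_KEY, "test-provider");
    Map<String, String> schedulerConf = new HashMap<>(); // lacks the provider key

    // Already initialised from the service config: the wrong config goes unnoticed.
    GroupServiceSketch.get(serviceConf);
    System.out.println(GroupServiceSketch.get(schedulerConf).provider());

    // After a reset, the wrong config silently selects the fallback provider.
    GroupServiceSketch.reset();
    System.out.println(GroupServiceSketch.get(schedulerConf).provider());
  }
}
```

In this sketch the bug only becomes visible once the singleton is reset, which is presumably why the proposed unit test needs a reset hook before it can exercise the initialization path.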
[jira] [Updated] (YARN-10954) Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
[ https://issues.apache.org/jira/browse/YARN-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10954: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf > > > Key: YARN-10954 > URL: https://issues.apache.org/jira/browse/YARN-10954 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10954) Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
[ https://issues.apache.org/jira/browse/YARN-10954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan reassigned YARN-10954: - Assignee: Andras Gyori > Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf > > > Key: YARN-10954 > URL: https://issues.apache.org/jira/browse/YARN-10954 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Andras Gyori >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10957) Use invokeConcurrent Overload with Collection in getClusterMetrics
[ https://issues.apache.org/jira/browse/YARN-10957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10957: -- Component/s: federation router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Use invokeConcurrent Overload with Collection in getClusterMetrics > -- > > Key: YARN-10957 > URL: https://issues.apache.org/jira/browse/YARN-10957 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Akshat Bordia >Assignee: Akshat Bordia >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In [PR #3135|https://github.com/apache/hadoop/pull/3135], we added a new > overload of invokeConcurrent to avoid ArrayList initialization at multiple > places. Update the same in getClusterMetrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10913) AbstractCSQueue: Group preemption methods and fields into a separate class
[ https://issues.apache.org/jira/browse/YARN-10913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10913: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > AbstractCSQueue: Group preemption methods and fields into a separate class > --- > > Key: YARN-10913 > URL: https://issues.apache.org/jira/browse/YARN-10913 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Relevant methods: isQueueHierarchyPreemptionDisabled, > isIntraQueueHierarchyPreemptionDisabled, getTotalKillableResource, > getKillableContainers -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10917) Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy
[ https://issues.apache.org/jira/browse/YARN-10917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10917: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Investigate and simplify > CapacitySchedulerConfigValidator#validateQueueHierarchy > > > Key: YARN-10917 > URL: https://issues.apache.org/jira/browse/YARN-10917 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Tamas Domok >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10950) Code cleanup in QueueCapacities
[ https://issues.apache.org/jira/browse/YARN-10950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10950: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Code cleanup in QueueCapacities > --- > > Key: YARN-10950 > URL: https://issues.apache.org/jira/browse/YARN-10950 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Adam Antal >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > - Make fields final: capacitiesMap, readLock, writeLock > - Remove explicit type arguments, e.g. new HashMap(); > - Remove abbreviations and avoid string concatenation in > QueueCapacities.Capacities#toString > - Remove unnecessary comments, e.g. "/* Used Capacity Getter and Setter */" & > "/* Absolute Used Capacity Getter and Setter */" > - And probably many more. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
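The cleanup items listed in YARN-10950 above can be sketched roughly as follows. The field names capacitiesMap, readLock and writeLock come from the issue text; the class QueueCapacitiesSketch, its methods, and the Float value type are assumptions made for illustration and do not reproduce the real QueueCapacities class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class illustrating the cleanup items; not the real QueueCapacities.
final class QueueCapacitiesSketch {
  // Fields that are assigned exactly once in the constructor are declared final.
  private final Map<String, Float> capacitiesMap;
  private final ReentrantReadWriteLock.ReadLock readLock;
  private final ReentrantReadWriteLock.WriteLock writeLock;

  QueueCapacitiesSketch() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    this.readLock = lock.readLock();
    this.writeLock = lock.writeLock();
    // Diamond operator instead of repeating the explicit type arguments.
    this.capacitiesMap = new HashMap<>();
  }

  void setUsedCapacity(String label, float value) {
    writeLock.lock();
    try {
      capacitiesMap.put(label, value);
    } finally {
      writeLock.unlock();
    }
  }

  float getUsedCapacity(String label) {
    readLock.lock();
    try {
      return capacitiesMap.getOrDefault(label, 0f);
    } finally {
      readLock.unlock();
    }
  }

  // Unabbreviated field name and a StringBuilder instead of chained '+' concatenation.
  @Override
  public String toString() {
    readLock.lock();
    try {
      return new StringBuilder("QueueCapacitiesSketch{capacitiesMap=")
          .append(capacitiesMap)
          .append('}')
          .toString();
    } finally {
      readLock.unlock();
    }
  }

  public static void main(String[] args) {
    QueueCapacitiesSketch capacities = new QueueCapacitiesSketch();
    capacities.setUsedCapacity("x", 0.5f);
    System.out.println(capacities);
  }
}
```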
[jira] [Updated] (YARN-10915) AbstractCSQueue: Simplify complex logic in methods: deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources
[ https://issues.apache.org/jira/browse/YARN-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10915: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > AbstractCSQueue: Simplify complex logic in methods: > deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources > - > > Key: YARN-10915 > URL: https://issues.apache.org/jira/browse/YARN-10915 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10912) AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic
[ https://issues.apache.org/jira/browse/YARN-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10912: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation > logic from initialization logic > -- > > Key: YARN-10912 > URL: https://issues.apache.org/jira/browse/YARN-10912 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Tamas Domok >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > AbstractCSQueue#updateConfigurableResourceRequirement contains initialization > + validation logic. The task is to factor out validation logic from this > method to a separate method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10910) AbstractCSQueue#setupQueueConfigs: Separate validation logic from initialization logic
[ https://issues.apache.org/jira/browse/YARN-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10910: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > AbstractCSQueue#setupQueueConfigs: Separate validation logic from > initialization logic > -- > > Key: YARN-10910 > URL: https://issues.apache.org/jira/browse/YARN-10910 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > AbstractCSQueue#setupQueueConfigs contains initialization + validation logic. > The task is to factor out validation logic from this method to a separate > method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10914) Simplify duplicated code for tracking ResourceUsage in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10914: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Simplify duplicated code for tracking ResourceUsage in AbstractCSQueue > -- > > Key: YARN-10914 > URL: https://issues.apache.org/jira/browse/YARN-10914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Tamas Domok >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Alternatively, those could be moved to some computation class, too. > Relevant methods: > incReservedResource, decReservedResource, incPendingResource, > decPendingResource, incUsedResource, decUsedResource -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10893) Add metrics for getClusterMetrics and getApplications APIs in FederationClientInterceptor
[ https://issues.apache.org/jira/browse/YARN-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10893: -- Component/s: federation metrics router Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add metrics for getClusterMetrics and getApplications APIs in > FederationClientInterceptor > - > > Key: YARN-10893 > URL: https://issues.apache.org/jira/browse/YARN-10893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, metrics, router >Affects Versions: 3.4.0 >Reporter: Akshat Bordia >Assignee: Akshat Bordia >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently getClusterMetrics and getApplications APIs in > FederationClientInterceptor do not have metrics being recorded. Need to add > the metrics for the latency, successful and failed attempt counts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10522) Document for Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10522: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Document for Flexible Auto Queue Creation in Capacity Scheduler > --- > > Key: YARN-10522 > URL: https://issues.apache.org/jira/browse/YARN-10522 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Qi Zhu >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-10522.001.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > We should update document to support this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10576) Update Capacity Scheduler documentation with JSON-based placement mapping
[ https://issues.apache.org/jira/browse/YARN-10576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10576: -- Component/s: capacity scheduler documentation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Update Capacity Scheduler documentation with JSON-based placement mapping > - > > Key: YARN-10576 > URL: https://issues.apache.org/jira/browse/YARN-10576 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, documentation >Affects Versions: 3.4.0 >Reporter: Peter Bacsko >Assignee: Benjamin Teke >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-10576-001.patch > > Time Spent: 1h > Remaining Estimate: 0h > > The weight mode and AQC also affect how the new placement engine in CS works > and the documentation has to reflect that. > Certain statements in the documentation are no longer valid, for example: > * create flag: "Only applies to managed queue parents" - there is no > ManagedParentQueue in weight mode. > * "The nested rules primaryGroupUser and secondaryGroupUser expects the > parent queues to exist, ie. they cannot be created automatically". This only > applies to the legacy absolute/percentage mode. > Find all statements that mention possible limitations and fix them if > necessary. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10919) Remove LeafQueue#scheduler field
[ https://issues.apache.org/jira/browse/YARN-10919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10919: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove LeafQueue#scheduler field > - > > Key: YARN-10919 > URL: https://issues.apache.org/jira/browse/YARN-10919 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Szilard Nemeth >Assignee: Benjamin Teke >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > As it is the same object as AbstractCSQueue#csContext (from parent class). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10838) Implement an optimised version of Configuration getPropsWithPrefix
[ https://issues.apache.org/jira/browse/YARN-10838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10838: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Implement an optimised version of Configuration getPropsWithPrefix > -- > > Key: YARN-10838 > URL: https://issues.apache.org/jira/browse/YARN-10838 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10838.001.patch, YARN-10838.002.patch, > YARN-10838.003.patch, YARN-10838.004.patch, YARN-10838.005.patch > > > AutoCreatedQueueTemplate also makes multiple calls to > Configuration#getPropsWithPrefix. These must be eliminated in order to improve > the performance of reinitialisation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
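One generic way to make repeated prefix lookups cheaper, relevant to YARN-10838 above, is to keep the properties in a sorted map so that all keys sharing a prefix are contiguous. The sketch below is an assumption-laden illustration of that idea only: PrefixLookupSketch and propsWithPrefix are hypothetical names, and this is not the approach taken in the attached patches nor the Hadoop Configuration API. Like Configuration#getPropsWithPrefix, it returns keys with the prefix trimmed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper; illustrates a sorted-map prefix scan, not Hadoop code.
final class PrefixLookupSketch {
  // Returns every property whose key starts with the prefix, with the prefix trimmed from the key.
  static Map<String, String> propsWithPrefix(NavigableMap<String, String> sorted, String prefix) {
    Map<String, String> result = new LinkedHashMap<>();
    // In natural String order, keys sharing a prefix form one contiguous run starting at the prefix.
    for (Map.Entry<String, String> entry : sorted.tailMap(prefix, true).entrySet()) {
      if (!entry.getKey().startsWith(prefix)) {
        break; // past the run, nothing further can match
      }
      result.put(entry.getKey().substring(prefix.length()), entry.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    NavigableMap<String, String> props = new TreeMap<>();
    props.put("root.a.template.capacity", "1w");
    props.put("root.a.template.ordering-policy", "fair");
    props.put("root.b.capacity", "2w");
    System.out.println(propsWithPrefix(props, "root.a.template."));
  }
}
```

The scan visits only the matching entries plus one, rather than every property in the configuration, at the cost of maintaining the sorted map.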
[jira] [Updated] (YARN-10790) CS Flexible AQC: Add separate parent and leaf template property.
[ https://issues.apache.org/jira/browse/YARN-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10790: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > CS Flexible AQC: Add separate parent and leaf template property. > > > Key: YARN-10790 > URL: https://issues.apache.org/jira/browse/YARN-10790 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10790.001.patch > > > Certain properties make sense only in a Parent or Leaf context > (e.g. ordering-policy). We need a way to limit the inheritance scope for the > new auto queue creation templates. The proposal is to have the following > templates: > * auto-queue-creation-v2.template -> child ParentQueues and child LeafQueues > inherit this > * auto-queue-creation-v2.leaf-template -> only child LeafQueues inherit this > * auto-queue-creation-v2.parent-template -> only ParentQueues inherit this
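The inheritance scoping proposed above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the class `TemplateScopeSketch`, the enum `QueueKind` and the method `effectiveTemplate` are hypothetical names, and the rule that a kind-specific template overrides the common one is an assumption the issue text does not state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of template scoping; not the YARN implementation. */
final class TemplateScopeSketch {
  enum QueueKind { PARENT, LEAF }

  /**
   * Builds the properties a newly auto-created child queue would inherit.
   * Assumption: entries from the kind-specific template win over the
   * common template when both define the same key.
   */
  static Map<String, String> effectiveTemplate(
      QueueKind kind,
      Map<String, String> common,
      Map<String, String> parentOnly,
      Map<String, String> leafOnly) {
    Map<String, String> result = new HashMap<>(common);
    result.putAll(kind == QueueKind.PARENT ? parentOnly : leafOnly);
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> common = Collections.singletonMap("capacity", "1w");
    Map<String, String> parentOnly = Collections.emptyMap();
    Map<String, String> leafOnly =
        Collections.singletonMap("ordering-policy", "fair");
    // Only the leaf variant carries ordering-policy.
    System.out.println(
        effectiveTemplate(QueueKind.LEAF, common, parentOnly, leafOnly));
    System.out.println(
        effectiveTemplate(QueueKind.PARENT, common, parentOnly, leafOnly));
  }
}
```

With this shape, a leaf-only property such as ordering-policy never reaches an auto-created ParentQueue, which is the scoping the proposal asks for.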
[jira] [Updated] (YARN-10841) Fix token reset synchronization for UAM response token
[ https://issues.apache.org/jira/browse/YARN-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10841: -- Component/s: federation Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix token reset synchronization for UAM response token > -- > > Key: YARN-10841 > URL: https://issues.apache.org/jira/browse/YARN-10841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-10841.v1.patch > > Time Spent: 20m > Remaining Estimate: 0h > > *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] > |impl.AMRMClientAsyncImpl|: Exception on heartbeat > org.apache.hadoop.yarn.exceptions.YarnException: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM > cluster-0 should be null here > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782) > > > *2021-06-24T10:10:12,608* INFO [616916] FederationInterceptor: Received new > UAM amrmToken with keyId 843616604 > The heartbeat callback sets the token to null, but because of a synchronization issue > this happened after mergeAllocate was called. The token should be reset to null > inside the lock, while the allocate merge is happening.
[jira] [Updated] (YARN-10727) ParentQueue does not validate the queue on removal
[ https://issues.apache.org/jira/browse/YARN-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10727: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > ParentQueue does not validate the queue on removal > -- > > Key: YARN-10727 > URL: https://issues.apache.org/jira/browse/YARN-10727 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10727.001.patch > > > With the addition of YARN-10532 ParentQueue has a public method, removeQueue, > which allows the deletion of a queue at runtime. However, there is no > validation regarding the queue which is to be removed, therefore it is > possible to remove a queue from the CSQueueManager that is not a child of the > ParentQueue. Since it is a public method, there must be validations such as: > * check, if the parent of the queue to be removed is the current ParentQueue > * check, if the parent actually contains the queue in its childQueues > collection -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
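The two validations listed above could look roughly like the sketch below. It uses a simplified stand-in `Queue` class rather than the real ParentQueue/CSQueue types, and the method name `removeChildQueue` and the exception messages are hypothetical; the actual patch may validate differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of removal validation; not the YARN implementation. */
final class RemoveQueueValidationSketch {
  /** Simplified stand-in for a queue in the hierarchy. */
  static final class Queue {
    final String name;
    Queue parent;
    final List<Queue> children = new ArrayList<>();

    Queue(String name) {
      this.name = name;
    }

    void addChildQueue(Queue child) {
      child.parent = this;
      children.add(child);
    }

    void removeChildQueue(Queue child) {
      // Check 1: the queue to be removed must name this queue as its parent.
      if (child.parent != this) {
        throw new IllegalArgumentException(
            child.name + " is not a child of " + name);
      }
      // Check 2: this queue must actually hold it in its child collection.
      if (!children.contains(child)) {
        throw new IllegalArgumentException(
            name + " does not contain " + child.name);
      }
      children.remove(child);
      child.parent = null;
    }
  }

  public static void main(String[] args) {
    Queue root = new Queue("root");
    Queue a = new Queue("a");
    root.addChildQueue(a);
    root.removeChildQueue(a);
    System.out.println("children left: " + root.children.size());
  }
}
```

Failing fast with an exception keeps a caller from silently removing a queue that belongs to a different parent.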
[jira] [Updated] (YARN-10829) Support getApplications API in FederationClientInterceptor
[ https://issues.apache.org/jira/browse/YARN-10829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10829: -- Component/s: federation router Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Support getApplications API in FederationClientInterceptor > -- > > Key: YARN-10829 > URL: https://issues.apache.org/jira/browse/YARN-10829 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Affects Versions: 3.4.0 >Reporter: Akshat Bordia >Assignee: Akshat Bordia >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 9.5h > Remaining Estimate: 0h > > Currently getApplications API is not supported in FederationClientInterceptor > and needs to be implemented in FederationClientInterceptor. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10801) Fix Auto Queue template to properly set all configuration properties
[ https://issues.apache.org/jira/browse/YARN-10801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10801: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix Auto Queue template to properly set all configuration properties > > > Key: YARN-10801 > URL: https://issues.apache.org/jira/browse/YARN-10801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10801.001.patch, YARN-10801.002.patch, > YARN-10801.003.patch, YARN-10801.004.patch, YARN-10801.005.patch, > YARN-10801.006.patch > > > Currently Auto Queue templates set configuration properties only on the > Configuration object passed in the constructor. Because many > configuration values are read from the Configuration object in csContext, > template properties are not set in every case.
[jira] [Updated] (YARN-10780) Optimise retrieval of configured node labels in CS queues
[ https://issues.apache.org/jira/browse/YARN-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10780: -- Component/s: capacity scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Optimise retrieval of configured node labels in CS queues > - > > Key: YARN-10780 > URL: https://issues.apache.org/jira/browse/YARN-10780 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10780.001.patch, YARN-10780.002.patch, > YARN-10780.003.patch, YARN-10780.004.patch, YARN-10780.005.patch
>
> CapacitySchedulerConfiguration#getConfiguredNodeLabels scales poorly with respect to the number of queues: it is O(n*m), where n is the number of queues and m is the number of properties set by each queue. During CS reinit the node labels are queried often. However, looking at the code:
> {code:java}
> for (Entry<String, String> stringStringEntry : this) {
>   e = stringStringEntry;
>   String key = e.getKey();
>   if (key.startsWith(getQueuePrefix(queuePath) + ACCESSIBLE_NODE_LABELS
>       + DOT)) {
>     // Find <label-name> in
>     // <queue-path>.accessible-node-labels.<label-name>.property
>     int labelStartIdx =
>         key.indexOf(ACCESSIBLE_NODE_LABELS)
>             + ACCESSIBLE_NODE_LABELS.length() + 1;
>     int labelEndIndx = key.indexOf('.', labelStartIdx);
>     String labelName = key.substring(labelStartIdx, labelEndIndx);
>     configuredNodeLabels.add(labelName);
>   }
> }
> {code}
> This method iterates through ALL properties set in the configuration. For example, when initialising 2500 queues, each having at least 2 properties:
> 2500 * 5000 ~= over 12 million iterations + additional properties
> There are some ways to resolve this issue while keeping backward compatibility:
> # Create a property like the original accessible-node-labels, which contains predefined labels. If it is set, then getConfiguredNodeLabels gets the value of this property, otherwise it falls back to the old logic. I think accessible-node-labels is not used for this purpose (though I have a feeling that it should have been).
> # Collect node labels for all queues at the beginning of parseQueue and only iterate through the properties once. This will increase the space complexity in exchange for not requiring intervention from the user.
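The second option (a single pass that collects labels for every queue) might be sketched as below. This is an illustration under assumptions, not the attached patch: `NodeLabelIndexSketch` and `collectLabels` are hypothetical names, the configuration is modelled as a plain `Map`, and keys are assumed to have the form `yarn.scheduler.capacity.<queue-path>.accessible-node-labels.<label-name>.<property>`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical single-pass label index; not the YARN implementation. */
final class NodeLabelIndexSketch {
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final String MARKER = ".accessible-node-labels.";

  /** Walks the properties once and groups configured labels by queue path. */
  static Map<String, Set<String>> collectLabels(Map<String, String> props) {
    Map<String, Set<String>> labelsByQueue = new HashMap<>();
    for (String key : props.keySet()) {
      if (!key.startsWith(PREFIX)) {
        continue;
      }
      int markerIdx = key.indexOf(MARKER, PREFIX.length());
      if (markerIdx < 0) {
        continue; // not a per-label property
      }
      int labelStart = markerIdx + MARKER.length();
      int labelEnd = key.indexOf('.', labelStart);
      if (labelEnd < 0) {
        continue; // no property suffix after the label name
      }
      String queuePath = key.substring(PREFIX.length(), markerIdx);
      String label = key.substring(labelStart, labelEnd);
      labelsByQueue.computeIfAbsent(queuePath, q -> new HashSet<>()).add(label);
    }
    return labelsByQueue;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(PREFIX + "root.a.accessible-node-labels.gpu.capacity", "50");
    props.put(PREFIX + "root.a.accessible-node-labels.ssd.capacity", "50");
    props.put(PREFIX + "root.b.capacity", "100");
    System.out.println(collectLabels(props));
  }
}
```

Each queue then looks up its labels in the returned map, so the cost is one pass over the properties plus a hash lookup per queue, at the price of holding the index in memory.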
[jira] [Updated] (YARN-10807) Parents node labels are incorrectly added to child queues in weight mode
[ https://issues.apache.org/jira/browse/YARN-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10807: -- Component/s: capacity scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Parents node labels are incorrectly added to child queues in weight mode > - > > Key: YARN-10807 > URL: https://issues.apache.org/jira/browse/YARN-10807 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Benjamin Teke >Assignee: Benjamin Teke >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10807.001.patch, YARN-10807.002.patch
>
> In ParentQueue.updateClusterResource, when calculating the normalized weights, CS iterates through the parent's node labels.
> If the parent has a node label that a specific child doesn't, it will incorrectly add it to the child's node label list through the queueCapacities.setNormalizedWeight(label, weight) call:
> {code:java}
> // Normalize weight of children
> if (getCapacityConfigurationTypeForQueues(childQueues)
>     == QueueCapacityType.WEIGHT) {
>   for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
>     float sumOfWeight = 0;
>     for (CSQueue queue : childQueues) {
>       float weight = Math.max(0,
>           queue.getQueueCapacities().getWeight(nodeLabel));
>       sumOfWeight += weight;
>     }
>     // When sum of weight == 0, skip setting normalized_weight (so
>     // normalized weight will be 0).
>     if (Math.abs(sumOfWeight) > 1e-6) {
>       for (CSQueue queue : childQueues) {
>         queue.getQueueCapacities().setNormalizedWeight(nodeLabel,
>             queue.getQueueCapacities().getWeight(nodeLabel) /
>                 sumOfWeight);
>       }
>     }
>   }
> }
> {code}
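One way to avoid the side effect described above is to skip children that do not carry the label at all. The sketch below only illustrates that idea and is not the change from the attached patches: `WeightNormalizationSketch` and `Child` are hypothetical stand-ins for ParentQueue and QueueCapacities, a missing map key models "the child does not have this label", and negative weights are clamped to 0 in both the sum and the division, which is a simplification.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of label-aware normalization; not the YARN code. */
final class WeightNormalizationSketch {
  /** Simplified stand-in for a child queue's per-label capacities. */
  static final class Child {
    final Map<String, Float> weights = new HashMap<>();
    final Map<String, Float> normalizedWeights = new HashMap<>();
  }

  static void normalize(Set<String> parentLabels, List<Child> children) {
    for (String label : parentLabels) {
      float sumOfWeight = 0f;
      for (Child child : children) {
        Float weight = child.weights.get(label);
        if (weight != null) {
          sumOfWeight += Math.max(0f, weight);
        }
      }
      if (Math.abs(sumOfWeight) <= 1e-6) {
        continue; // nothing to normalize for this label
      }
      for (Child child : children) {
        Float weight = child.weights.get(label);
        // Children without the label are left untouched, so the label is
        // never created on them as a side effect.
        if (weight != null) {
          child.normalizedWeights.put(label, Math.max(0f, weight) / sumOfWeight);
        }
      }
    }
  }

  public static void main(String[] args) {
    Child a = new Child();
    a.weights.put("", 1f);
    a.weights.put("gpu", 3f);
    Child b = new Child();
    b.weights.put("", 3f);
    normalize(new HashSet<>(Arrays.asList("", "gpu")), Arrays.asList(a, b));
    System.out.println(a.normalizedWeights + " " + b.normalizedWeights);
  }
}
```

In the real scheduler the equivalent guard would presumably check the child's own existing or accessible labels before calling setNormalizedWeight.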
[jira] [Updated] (YARN-10771) Add cluster metric for size of SchedulerEventQueue and RMEventQueue
[ https://issues.apache.org/jira/browse/YARN-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10771: -- Component/s: metrics resourcemanager Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add cluster metric for size of SchedulerEventQueue and RMEventQueue > --- > > Key: YARN-10771 > URL: https://issues.apache.org/jira/browse/YARN-10771 > Project: Hadoop YARN > Issue Type: Sub-task > Components: metrics, resourcemanager >Affects Versions: 3.4.0 >Reporter: chaosju >Assignee: chaosju >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10763.001.patch, YARN-10771.002.patch, > YARN-10771.003.patch, YARN-10771.004.patch, YARN-10771.005.patch > > > Add cluster metrics for the sizes of the Scheduler event queue and the RM event queue. > This shows the load on the RM and makes it convenient to monitor.
[jira] [Updated] (YARN-10783) Allow definition of auto queue template properties in root
[ https://issues.apache.org/jira/browse/YARN-10783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10783: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Allow definition of auto queue template properties in root > -- > > Key: YARN-10783 > URL: https://issues.apache.org/jira/browse/YARN-10783 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10783.001.patch > > > YARN-10564 introduced template properties set on auto queue creation eligible > queues, however root does not take it into consideration. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10571) Refactor dynamic queue handling logic
[ https://issues.apache.org/jira/browse/YARN-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10571: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Refactor dynamic queue handling logic > - > > Key: YARN-10571 > URL: https://issues.apache.org/jira/browse/YARN-10571 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Minor > Fix For: 3.4.0 > > Attachments: YARN-10571.001.patch, YARN-10571.002.patch, > YARN-10571.003.patch, YARN-10571.004.patch > > > As per YARN-10506 we have introduced an other mode for auto queue creation > and a new class, which handles it. We should move the old, managed queue > related logic to CSAutoQueueHandler as well, and do additional cleanup > regarding queue management. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9615) Add dispatcher metrics to RM
[ https://issues.apache.org/jira/browse/YARN-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9615: - Component/s: metrics resourcemanager Target Version/s: 2.10.2, 3.3.1, 3.4.0 (was: 2.10.2) Affects Version/s: 3.3.1 3.4.0 > Add dispatcher metrics to RM > > > Key: YARN-9615 > URL: https://issues.apache.org/jira/browse/YARN-9615 > Project: Hadoop YARN > Issue Type: Sub-task > Components: metrics, resourcemanager >Affects Versions: 3.4.0, 3.3.1 >Reporter: Jonathan Hung >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-9615-branch-3.3-001.patch, YARN-9615.001.patch, > YARN-9615.002.patch, YARN-9615.003.patch, YARN-9615.004.patch, > YARN-9615.005.patch, YARN-9615.006.patch, YARN-9615.007.patch, > YARN-9615.008.patch, YARN-9615.009.patch, YARN-9615.010.patch, > YARN-9615.011.patch, YARN-9615.011.patch, YARN-9615.poc.patch, > image-2021-03-04-10-35-10-626.png, image-2021-03-04-10-36-12-441.png, > screenshot-1.png > > > It'd be good to have counts/processing times for each event type in RM async > dispatcher and scheduler async dispatcher. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10637) fs2cs: add queue autorefresh policy during conversion
[ https://issues.apache.org/jira/browse/YARN-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10637: -- Component/s: fairscheduler fs-cs Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > fs2cs: add queue autorefresh policy during conversion > - > > Key: YARN-10637 > URL: https://issues.apache.org/jira/browse/YARN-10637 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler, fs-cs >Affects Versions: 3.4.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Fix For: 3.4.0 > > Attachments: YARN-10637.001.patch, YARN-10637.002.patch, > YARN-10637.003.patch, YARN-10637.004.patch > > > cc [~pbacsko] [~gandras] [~bteke] > We should also fill this, when YARN-10623 finished. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10654) Dots '.' in CSMappingRule path variables should be replaced
[ https://issues.apache.org/jira/browse/YARN-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10654: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Dots '.' in CSMappingRule path variables should be replaced > --- > > Key: YARN-10654 > URL: https://issues.apache.org/jira/browse/YARN-10654 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Gergely Pollák >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10654-001.patch > > > Dots are used as separators, so we should escape them somehow in the > variables when substituting them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
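A minimal sketch of the escaping idea follows. The class `PathVariableSketch`, the method `substitute` and the `%user` example are illustrative only, and the `_dot_` replacement token is an assumption (it mirrors a convention used elsewhere in YARN for user names containing dots); the issue text itself only says the dots should be escaped "somehow".

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch of dot escaping in path variables; not the YARN code. */
final class PathVariableSketch {
  // Assumed replacement token; the issue does not name one.
  static final String DOT_REPLACEMENT = "_dot_";

  /**
   * Substitutes variables into a queue path template, escaping dots in the
   * variable values so they cannot be mistaken for path separators.
   */
  static String substitute(String pathTemplate, Map<String, String> variables) {
    String result = pathTemplate;
    for (Map.Entry<String, String> entry : variables.entrySet()) {
      String safeValue = entry.getValue().replace(".", DOT_REPLACEMENT);
      result = result.replace(entry.getKey(), safeValue);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(substitute("root.users.%user",
        Collections.singletonMap("%user", "john.doe")));
  }
}
```

Without the escaping, a user named john.doe would be read as a two-level path (queue doe under parent john) instead of a single leaf.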
[jira] [Updated] (YARN-10564) Support Auto Queue Creation template configurations
[ https://issues.apache.org/jira/browse/YARN-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10564: -- Component/s: capacity scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Support Auto Queue Creation template configurations > --- > > Key: YARN-10564 > URL: https://issues.apache.org/jira/browse/YARN-10564 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10564.001.patch, YARN-10564.002.patch, > YARN-10564.003.patch, YARN-10564.004.patch, YARN-10564.005.patch, > YARN-10564.006.patch, YARN-10564.poc.001.patch > > > Similar to how the template configuration works for ManagedParents, we need > to support templates for the new auto queue creation logic. Proposition is to > allow wildcards in template configs such as: > {noformat} > yarn.scheduler.capacity.root.*.*.weight 10{noformat} > which would mean, that set weight to 10 of every leaf of every parent under > root. > We should possibly take an approach, that could support arbitrary depth of > template configuration, because we might need to lift the limitation of auto > queue nesting. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
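The wildcard matching proposed above can be sketched as follows, assuming each `*` stands for exactly one queue path segment (that reading follows from the "every leaf of every parent under root" example, but the final implementation may differ). `WildcardTemplateSketch` and `matches` are hypothetical names.

```java
/** Hypothetical sketch of wildcard template matching; not the YARN code. */
final class WildcardTemplateSketch {
  /**
   * Returns true if the queue path matches the template path, where a "*"
   * segment in the template matches any single segment of the queue path.
   */
  static boolean matches(String templatePath, String queuePath) {
    String[] templateParts = templatePath.split("\\.");
    String[] queueParts = queuePath.split("\\.");
    if (templateParts.length != queueParts.length) {
      return false; // wildcards do not span multiple levels in this sketch
    }
    for (int i = 0; i < templateParts.length; i++) {
      if (!templateParts[i].equals("*")
          && !templateParts[i].equals(queueParts[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(matches("root.*.*", "root.a.b"));
    System.out.println(matches("root.*.*", "root.a"));
  }
}
```

Supporting arbitrary depth, as the issue suggests may be needed later, would require a different rule than this fixed-length comparison.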
[jira] [Updated] (YARN-10714) Remove dangling dynamic queues on reinitialization
[ https://issues.apache.org/jira/browse/YARN-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10714: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove dangling dynamic queues on reinitialization > -- > > Key: YARN-10714 > URL: https://issues.apache.org/jira/browse/YARN-10714 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Andras Gyori >Assignee: Andras Gyori >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10714.001.patch, YARN-10714.002.patch, > YARN-10714.003.patch > > > Current logic does not handle orphaned auto created child queues. The > following example steps show a scenario in which it is possible to submit > applications to an orphaned queue, that has an invalid (already removed) > ParentQueue. > # Auto create a queue root.a.a-auto > # Remove root.a from the config > # Reinitialize CS without restarting it (possible via mutation API) > # Submit application to root.a.a-auto, while root.a is a non-existent queue -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
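The scenario above suggests a check at reinitialization time: a dynamic queue is dangling when its nearest non-dynamic ancestor is no longer in the configuration. The sketch below illustrates that check on plain queue path strings; `DanglingQueueSketch`, `findDangling` and `parentOf` are hypothetical names and the real patch works on the scheduler's queue objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of dangling queue detection; not the YARN code. */
final class DanglingQueueSketch {
  /** Returns the parent path, or null for a top-level path such as "root". */
  static String parentOf(String path) {
    int lastDot = path.lastIndexOf('.');
    return lastDot < 0 ? null : path.substring(0, lastDot);
  }

  /**
   * Finds dynamic queues whose nearest non-dynamic ancestor is missing
   * from the set of queues defined in the new configuration.
   */
  static Set<String> findDangling(Set<String> configured, Set<String> dynamic) {
    Set<String> dangling = new TreeSet<>();
    for (String queue : dynamic) {
      String ancestor = parentOf(queue);
      // Skip over ancestors that are themselves auto-created.
      while (ancestor != null && dynamic.contains(ancestor)) {
        ancestor = parentOf(ancestor);
      }
      if (ancestor == null || !configured.contains(ancestor)) {
        dangling.add(queue);
      }
    }
    return dangling;
  }

  public static void main(String[] args) {
    Set<String> configured = new HashSet<>(Arrays.asList("root", "root.b"));
    Set<String> dynamic =
        new HashSet<>(Arrays.asList("root.a.a-auto", "root.b.b-auto"));
    System.out.println(findDangling(configured, dynamic));
  }
}
```

Queues reported by such a check would be removed during reinitialization, so an application can no longer be submitted to root.a.a-auto once root.a is gone.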
[jira] [Updated] (YARN-9618) NodesListManager event improvement
[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-9618: - Component/s: resourcemanager Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > NodesListManager event improvement > -- > > Key: YARN-9618 > URL: https://issues.apache.org/jira/browse/YARN-9618 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.4.0 >Reporter: Bibin Chundatt >Assignee: Qi Zhu >Priority: Critical > Fix For: 3.4.0 > > Attachments: YARN-9618.001.patch, YARN-9618.002.patch, > YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch, > YARN-9618.006.patch, YARN-9618.007.patch > > > The current NodesListManager event implementation blocks the async dispatcher, which can > crash the RM and slow down event processing. > # Cluster restart with 1K running apps. Each usable event will create 1K > events; overall there could be 5K*1K events for a 5K cluster. > # Event processing is blocked till new events are added to queue. > Solution: > # Add another async event handler similar to the scheduler's. > # Instead of adding events to the dispatcher, directly call the RMApp event handler.
[jira] [Updated] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10597: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > CSMappingPlacementRule should not create new instance of Groups > --- > > Key: YARN-10597 > URL: https://issues.apache.org/jira/browse/YARN-10597 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Gergely Pollák >Assignee: Gergely Pollák >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10597.001.patch, YARN-10597.002.patch > > > As [~ahussein] pointed out in YARN-10425, no new Groups instance should be > created. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10713) ClusterMetrics should support custom resource capacity related metrics.
[ https://issues.apache.org/jira/browse/YARN-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10713: -- Component/s: metrics Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > ClusterMetrics should support custom resource capacity related metrics. > --- > > Key: YARN-10713 > URL: https://issues.apache.org/jira/browse/YARN-10713 > Project: Hadoop YARN > Issue Type: Sub-task > Components: metrics >Affects Versions: 3.4.0, 3.3.1 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: YARN-10713.001.patch, YARN-10713.002.patch > > > YARN-10688 > Only add gpu resource capacity related metrics, i think we should improve it > to support custom resources as [~ebadger] suggested. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10641) Refactor the max app related update, and fix maxApplications update error when add new queues.
[ https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10641: -- Component/s: capacity scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Refactor the max app related update, and fix maxApplications update error > when add new queues. > -- > > Key: YARN-10641 > URL: https://issues.apache.org/jira/browse/YARN-10641 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Fix For: 3.4.0 > > Attachments: YARN-10641.001.patch, YARN-10641.002.patch, > YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, > YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, > image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, > image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png > > > After the update logic was refactored in YARN-10504, > the max applications update based on abs/cap is wrong. This should be fixed, > because max applications is a key part of limiting applications in CS. > For example: > When adding a dynamic queue, the max app of the parent queue's other children is > not updated correctly: > !image-2021-02-20-15-53-51-099.png|width=639,height=509! > The newly added queue's max app is updated correctly: > !image-2021-02-20-15-55-44-780.png|width=542,height=426!
[jira] [Updated] (YARN-10659) Improve CS MappingRule %secondary_group evaluation
[ https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10659: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Improve CS MappingRule %secondary_group evaluation > -- > > Key: YARN-10659 > URL: https://issues.apache.org/jira/browse/YARN-10659 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Gergely Pollák >Assignee: Gergely Pollák >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10659.001.patch, YARN-10659.002.patch, > YARN-10659.003.patch > > > Since leaf queue names are not unique, there are many use cases where > %secondary_group evaluation fails or behaves inconsistently. > We should extend its behavior: when it is under a defined parent, > %secondary_group evaluation should only check for queue existence under that > queue. E.g. root.group.%secondary_group should only evaluate to groups which > exist under root.group, while the legacy %secondary_group.%user should still > look for groups by their leaf name globally.
[jira] [Updated] (YARN-10689) Fix the findbugs issues in extractFloatValueFromWeightConfig.
[ https://issues.apache.org/jira/browse/YARN-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10689: -- Component/s: capacity scheduler Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix the findbugs issues in extractFloatValueFromWeightConfig. > - > > Key: YARN-10689 > URL: https://issues.apache.org/jira/browse/YARN-10689 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Float.valueOf causes the finding bugs. > I will help fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10686) Fix TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode
[ https://issues.apache.org/jira/browse/YARN-10686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated YARN-10686: -- Component/s: capacity scheduler Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix > TestCapacitySchedulerAutoQueueCreation#testAutoQueueCreationFailsForEmptyPathWithAQCAndWeightMode > - > > Key: YARN-10686 > URL: https://issues.apache.org/jira/browse/YARN-10686 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.4.0 >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10686.001.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org