[jira] [Updated] (YARN-10934) LeafQueue activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan LUO updated YARN-10934:
----------------------------
    Summary: LeafQueue activateApplications NPE  (was: activateApplications NPE)

> LeafQueue activateApplications NPE
> ----------------------------------
>
>                 Key: YARN-10934
>                 URL: https://issues.apache.org/jira/browse/YARN-10934
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 3.3.1
>            Reporter: Yuan LUO
>            Priority: Major
>
> Our prod YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; the RM then crashed with the exception stack below. I think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>     at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>     at java.base/java.lang.Thread.run(Thread.java:834)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
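The stack trace above does not show which value was null at LeafQueue.java:868, and the thread never identifies the root cause. As a purely hypothetical illustration (none of these class or method names are Hadoop's actual code), the sketch below shows the general failure pattern such a calculator switch can expose: a dominant-resource computation inspects a per-partition limit that a map lookup returned as null, while a guarded variant degrades gracefully.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Hadoop's implementation: a calculator that reads
// extra resource dimensions can dereference a value the single-dimension
// (memory-only) path never touched.
public class AmLimitSketch {
    // memory (MB) and vcores per partition label; unknown labels map to null
    static final Map<String, long[]> PARTITION_LIMITS = new HashMap<>();
    static { PARTITION_LIMITS.put("", new long[] {8192, 8}); }

    // Unsafe lookup: dereferences the limit without a null check.
    static double dominantShareUnsafe(String partition, long usedMem, long usedVcores) {
        long[] limit = PARTITION_LIMITS.get(partition); // null for unknown labels
        return Math.max((double) usedMem / limit[0], (double) usedVcores / limit[1]);
    }

    // Guarded variant: fall back to a zero share when the label is unknown.
    static double dominantShareGuarded(String partition, long usedMem, long usedVcores) {
        long[] limit = PARTITION_LIMITS.get(partition);
        if (limit == null) {
            return 0.0;
        }
        return Math.max((double) usedMem / limit[0], (double) usedVcores / limit[1]);
    }

    public static void main(String[] args) {
        System.out.println(dominantShareGuarded("", 4096, 2)); // max(0.5, 0.25) = 0.5
        try {
            dominantShareUnsafe("missing", 4096, 2);
        } catch (NullPointerException expected) {
            System.out.println("unsafe lookup threw NPE for unknown partition");
        }
    }
}
```

Whether the real null value is a partition limit, a user record, or something else is exactly what the yarn-site.xml requested later in this thread would help determine.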
[jira] [Commented] (YARN-10934) activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411636#comment-17411636 ]

Yuan LUO commented on YARN-10934:
---------------------------------

[~snemeth] Thanks for your reply; I have fixed the title, it is an NPE error. I will add some information in the attachment.
[jira] [Updated] (YARN-10934) activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuan LUO updated YARN-10934:
----------------------------
    Summary: activateApplications NPE  (was: activateApplications NPL)
[jira] [Commented] (YARN-10929) Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs
[ https://issues.apache.org/jira/browse/YARN-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411624#comment-17411624 ]

jackwangcs commented on YARN-10929:
-----------------------------------

Hi [~snemeth], it seems that I don't have permission to assign this to myself. Could you help assign it to me? Thanks.

> Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs
> --------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10929
>                 URL: https://issues.apache.org/jira/browse/YARN-10929
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> AbstractManagedParentQueue#initializeLeafQueueConfigs creates a new CapacitySchedulerConfiguration with templated configs only. We should stop doing this.
> Also, this method sorts the config keys, but in the end the configs are added to the Configuration object, which is essentially an enhanced Map, so the sorting has no lasting effect.
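The last remark can be illustrated in isolation: inserting keys into a hash-backed map in sorted order produces exactly the same map as inserting them unsorted, so the sort does no observable work. A minimal sketch using plain java.util maps (not Hadoop's Configuration class, which merely behaves similarly):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: pre-sorting keys before put() into a hash-backed map is a no-op,
// because the map neither preserves nor exposes insertion order.
public class SortBeforePutSketch {
    static Map<String, String> load(List<String> keys, boolean sortFirst) {
        List<String> order = new ArrayList<>(keys);
        if (sortFirst) {
            Collections.sort(order);
        }
        Map<String, String> conf = new HashMap<>();
        for (String k : order) {
            conf.put(k, "v"); // same entries regardless of insertion order
        }
        return conf;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("b.weight", "a.capacity", "c.state");
        // The two maps are equal: sorting before insertion changed nothing.
        System.out.println(load(keys, true).equals(load(keys, false))); // true
    }
}
```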
[jira] [Updated] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-10935:
------------------------------
    Attachment: YARN-10935.001.patch

> AM Total Queue Limit goes below per-user AM Limit if parent is full.
> --------------------------------------------------------------------
>
>                 Key: YARN-10935
>                 URL: https://issues.apache.org/jira/browse/YARN-10935
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, capacityscheduler
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>            Priority: Major
>         Attachments: Screen Shot 2021-09-07 at 12.49.52 PM.png, Screen Shot 2021-09-07 at 12.55.37 PM.png, YARN-10935.001.patch
>
> This happens when DRF is enabled and all of one resource is consumed but the second resource still has plenty available.
> This is reproducible by setting up a parent queue where the capacity and max capacity are the same, with 2 or more sub-queues whose max capacity is 100%. In one of the sub-queues, start a long-running app that consumes all resources in the parent queue's hierarchy. This app will consume all of the memory but not very many vcores (for example).
> In a second queue, submit an app. The *{{Max Application Master Resources Per User}}* limit is much higher than the *{{Max Application Master Resources}}* limit.
[jira] [Commented] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411398#comment-17411398 ]

Eric Payne commented on YARN-10935:
-----------------------------------

For example, in the following screenshot, the advertising queue is a child of root and a parent of 3 sub-queues. One of the sub-queues has consumed all of the advertising parent queue's resources. The second sub-queue has submitted two apps: one is schedulable and one is not. The second app is non-schedulable because starting it would put the queue above the queue's AM limit:

!Screen Shot 2021-09-07 at 12.49.52 PM.png!

See that the second app can't start because of the following:

!Screen Shot 2021-09-07 at 12.55.37 PM.png!

Note that, in this example, the max queue AM limit should never go below 2GB memory and 16 vCores.
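The arithmetic behind the report can be sketched abstractly. The formulas and names below are illustrative only, not CapacityScheduler's actual code: they show how deriving a queue-wide AM limit from the parent's *remaining* headroom collapses to zero on the exhausted resource, landing below a per-user AM limit that is derived from configured capacity instead.

```java
// Hypothetical arithmetic sketch (illustrative names and formulas, not the
// real CapacityScheduler computation): one app consumes all memory but few
// vcores, so headroom-based AM limits collapse on the memory dimension.
public class DrfAmLimitSketch {
    // Queue-wide AM limit derived from the parent's remaining headroom.
    static long queueAmLimitMem(long clusterMem, long usedMem, double amPct) {
        return (long) ((clusterMem - usedMem) * amPct);
    }

    // Per-user AM limit derived from configured capacity, not headroom.
    static long userAmLimitMem(long clusterMem, double amPct, double userLimitFactor) {
        return (long) (clusterMem * amPct * userLimitFactor);
    }

    public static void main(String[] args) {
        long clusterMem = 100_000;           // MB in the parent hierarchy
        long usedMem = 100_000;              // long-running app took all memory
        double amPct = 0.1, ulf = 1.0;

        long queueLimit = queueAmLimitMem(clusterMem, usedMem, amPct); // 0 MB
        long userLimit = userAmLimitMem(clusterMem, amPct, ulf);       // 10000 MB

        // The queue-wide AM limit (0 MB) is below the per-user AM limit
        // (10000 MB), so a new AM in a sibling queue cannot be activated.
        System.out.println("queue AM limit = " + queueLimit
            + " MB, per-user AM limit = " + userLimit + " MB");
    }
}
```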
[jira] [Updated] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-10935:
------------------------------
    Attachment: Screen Shot 2021-09-07 at 12.55.37 PM.png
[jira] [Updated] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-10935:
------------------------------
    Attachment: Screen Shot 2021-09-07 at 12.49.52 PM.png
[jira] [Updated] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-10935:
------------------------------
    Summary: AM Total Queue Limit goes below per-user AM Limit if parent is full.  (was: AM Total Queue Limit goes below per-uwer AM Limit if parent is full.)
[jira] [Created] (YARN-10935) AM Total Queue Limit goes below per-uwer AM Limit if parent is full.
Eric Payne created YARN-10935:
---------------------------------
             Summary: AM Total Queue Limit goes below per-uwer AM Limit if parent is full.
                 Key: YARN-10935
                 URL: https://issues.apache.org/jira/browse/YARN-10935
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: capacity scheduler, capacityscheduler
            Reporter: Eric Payne
[jira] [Assigned] (YARN-10935) AM Total Queue Limit goes below per-uwer AM Limit if parent is full.
[ https://issues.apache.org/jira/browse/YARN-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne reassigned YARN-10935:
---------------------------------
    Assignee: Eric Payne
[jira] [Commented] (YARN-10928) Support default queue properties of capacity scheduler to simplify configuration management
[ https://issues.apache.org/jira/browse/YARN-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411357#comment-17411357 ]

Weiwei Yang commented on YARN-10928:
------------------------------------

Sure, granted the contributor role to [~Weihao Zheng]. Thanks

> Support default queue properties of capacity scheduler to simplify configuration management
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-10928
>                 URL: https://issues.apache.org/jira/browse/YARN-10928
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>            Reporter: Weihao Zheng
>            Assignee: Weihao Zheng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In practice, one user often owns many queues in an organization's cluster for different business usages. These queues often share the same properties, such as minimum-user-limit-percent and user-limit-factor. Today, users have to set a property on every queue they use if they want to customize these shared properties. Adding default queue properties for these cases will simplify the capacity scheduler's configuration file and make it easy to adjust queues' common properties.
>
> CHANGES:
> Add two properties as queues' default values in the capacity scheduler's configuration:
> * {{yarn.scheduler.capacity.minimum-user-limit-percent}}
> * {{yarn.scheduler.capacity.user-limit-factor}}
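A hypothetical capacity-scheduler.xml fragment showing how the proposal might look in practice. The two cluster-wide property names are taken from the CHANGES list above; the queue name {{root.analytics}} and the values are made up for illustration, and the exact override semantics are an assumption pending the final patch.

```xml
<!-- Hypothetical fragment: cluster-wide defaults proposed in YARN-10928.
     Queue name root.analytics and all values are illustrative. -->
<configuration>
  <!-- Default for every queue that does not override it -->
  <property>
    <name>yarn.scheduler.capacity.minimum-user-limit-percent</name>
    <value>25</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.user-limit-factor</name>
    <value>2</value>
  </property>
  <!-- A per-queue setting would still win for that queue -->
  <property>
    <name>yarn.scheduler.capacity.root.analytics.user-limit-factor</name>
    <value>1</value>
  </property>
</configuration>
```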
[jira] [Assigned] (YARN-10928) Support default queue properties of capacity scheduler to simplify configuration management
[ https://issues.apache.org/jira/browse/YARN-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang reassigned YARN-10928:
----------------------------------
    Assignee: Weihao Zheng
[jira] [Updated] (YARN-10872) Replace getPropsWithPrefix calls in AutoCreatedQueueTemplate
[ https://issues.apache.org/jira/browse/YARN-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-10872:
----------------------------------
    Labels: pull-request-available  (was: )

> Replace getPropsWithPrefix calls in AutoCreatedQueueTemplate
> ------------------------------------------------------------
>
>                 Key: YARN-10872
>                 URL: https://issues.apache.org/jira/browse/YARN-10872
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With the introduction of YARN-10838, it is now possible to optimise AutoCreatedQueueTemplate and replace calls of getPropsWithPrefix.
[jira] [Commented] (YARN-10884) EntityGroupFSTimelineStore fails to parse log files which has empty owner
[ https://issues.apache.org/jira/browse/YARN-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411333#comment-17411333 ]

Prabhu Joseph commented on YARN-10884:
--------------------------------------

Thanks [~Swathi Chandrashekar] for the patch. Have committed it to trunk.

> EntityGroupFSTimelineStore fails to parse log files which has empty owner
> --------------------------------------------------------------------------
>
>                 Key: YARN-10884
>                 URL: https://issues.apache.org/jira/browse/YARN-10884
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver
>    Affects Versions: 3.3.1
>            Reporter: Prabhu Joseph
>            Assignee: SwathiChandrashekar
>            Priority: Major
>             Fix For: 3.3.1
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Due to [HADOOP-17848|https://issues.apache.org/jira/browse/HADOOP-17848], the Wasb FileSystem sets the owner as empty during an append operation.
> ATS 1.5 fails to read such files with the error below:
> {code:java}
> java.lang.IllegalArgumentException: Null user
>     at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1271)
>     at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1258)
>     at org.apache.hadoop.yarn.server.timeline.LogInfo.parsePath(LogInfo.java:141)
>     at org.apache.hadoop.yarn.server.timeline.LogInfo.parseForStore(LogInfo.java:114)
>     at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:701)
>     at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:675)
>     at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$ActiveLogParser.run(EntityGroupFSTimelineStore.java:888)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748){code}
> The store reads the file's owner in order to check ACLs. When the ACL check is disabled, this is not required. The suggestion is to fall back to an anonymous user when the owner is empty:
> {code}
> if (owner.isEmpty()) {
>   user = "anonymous";
> } else {
>   user = owner;
> }
> {code}
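The fallback suggested in the issue can be sketched as a self-contained helper. This is not the actual EntityGroupFSTimelineStore patch; the class and method names below are made up, and the sketch only demonstrates the substitution logic so that downstream code (such as UserGroupInformation.createRemoteUser, which rejects a null/empty user) never sees an empty principal.

```java
// Minimal standalone sketch of the empty-owner fallback suggested in
// YARN-10884 (hypothetical names, not the committed Hadoop code).
public class OwnerFallbackSketch {
    // Substitute a placeholder principal when the file owner is missing,
    // which is safe when the ACL check is disabled anyway.
    static String resolveUser(String owner) {
        if (owner == null || owner.isEmpty()) {
            return "anonymous";
        }
        return owner;
    }

    public static void main(String[] args) {
        System.out.println(resolveUser(""));       // anonymous
        System.out.println(resolveUser("hadoop")); // hadoop
    }
}
```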
[jira] [Updated] (YARN-10884) EntityGroupFSTimelineStore fails to parse log files which has empty owner
[ https://issues.apache.org/jira/browse/YARN-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-10884:
---------------------------------
    Labels:   (was: pull-request-available)
[jira] [Comment Edited] (YARN-10934) activateApplications NPL
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411281#comment-17411281 ]

Szilard Nemeth edited comment on YARN-10934 at 9/7/21, 2:38 PM:
----------------------------------------------------------------

Hi [~luoyuan],
Can you attach a full yarn-site.xml config file here? Probably something other than the DominantResourceCalculator also comes into play here. If you have sensitive info like queue names, you may mask or replace the data with dummy values.
A question: what is "NPL" in the title? Did you mean NPE (NullPointerException) or something else?
Thanks.

was (Author: snemeth):
Hi [~luoyuan],
Can you attach a full yarn-site.xml config file here? Probably something other than the DominantResourceCalculator also comes into play here. If you have sensitive info like queue names, you may mask or replace the data with dummy values.
[jira] [Commented] (YARN-10934) activateApplications NPL
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411281#comment-17411281 ]

Szilard Nemeth commented on YARN-10934:
---------------------------------------

Hi [~luoyuan],
Can you attach a full yarn-site.xml config file here? Probably something other than the DominantResourceCalculator also comes into play here. If you have sensitive info like queue names, you may mask or replace the data with dummy values.
[jira] [Updated] (YARN-10917) Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy
[ https://issues.apache.org/jira/browse/YARN-10917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-10917: -- Labels: pull-request-available (was: )
> Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy
>
> Key: YARN-10917
> URL: https://issues.apache.org/jira/browse/YARN-10917
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Szilard Nemeth
> Assignee: Tamas Domok
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Commented] (YARN-10934) activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411200#comment-17411200 ] Yuan LUO commented on YARN-10934: - Hi [~zhuqi] [~gandras] [~bteke] [~taoyang] Could you have a look at this issue? Thanks!
[jira] [Updated] (YARN-10934) activateApplications NPE
[ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan LUO updated YARN-10934:
Description:
Our prod YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator -> DominantResourceCalculator and restarted the RM; the RM then crashed with the exception stack below. I think this is a serious bug and hope someone can follow up and fix it.
2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
at java.base/java.lang.Thread.run(Thread.java:834)
was:
Our prod YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator -> DominantResourceCalculator; the RM then crashed with the exception stack below:
2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
at java.base/java.lang.Thread.run(Thread.java:834)
[jira] [Created] (YARN-10934) activateApplications NPE
Yuan LUO created YARN-10934: ---
Summary: activateApplications NPE
Key: YARN-10934
URL: https://issues.apache.org/jira/browse/YARN-10934
Project: Hadoop YARN
Issue Type: Bug
Components: RM
Affects Versions: 3.3.1
Reporter: Yuan LUO
Our prod YARN cluster runs Hadoop 3.3.1. We changed DefaultResourceCalculator -> DominantResourceCalculator; the RM then crashed with the exception stack below:
2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
at java.base/java.lang.Thread.run(Thread.java:834)
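None of the messages above identify which value was actually null at LeafQueue.java:868; the thread asks for the full yarn-site.xml precisely to narrow that down. As a purely illustrative, hypothetical sketch (none of these names are the real LeafQueue code), the general failure shape is a per-partition limit lookup that returns null and is then dereferenced during application activation:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an activation-time NPE. This is NOT the actual
 * LeafQueue code; it only models the shape of the failure: a lookup keyed
 * by partition returns null for an uninitialized partition, and the null
 * is later unboxed/dereferenced while checking whether an AM fits.
 */
public class ActivationNpeSketch {
    // Per-partition AM resource limits; a partition missing from this map
    // models a limit that was never computed for that partition.
    public static final Map<String, Long> amLimitByPartition = new HashMap<>();

    public static boolean fitsIn(Long demand, Long limit) {
        // Unboxing a null 'limit' here throws NullPointerException,
        // analogous to the crash in activateApplications.
        return demand <= limit;
    }

    public static boolean tryActivate(String partition, long amDemand) {
        Long limit = amLimitByPartition.get(partition); // may be null
        return fitsIn(amDemand, limit);
    }

    public static void main(String[] args) {
        amLimitByPartition.put("default", 4096L);
        System.out.println(tryActivate("default", 1024L)); // prints "true"
        try {
            tryActivate("gpu", 1024L); // partition never initialized -> NPE
        } catch (NullPointerException e) {
            System.out.println("NPE during activation, as in the report");
        }
    }
}
```

If the real bug follows this shape, the fix would be to initialize or null-check the per-partition value before the comparison; switching to DominantResourceCalculator would then merely be the trigger that exercises the uninitialized path.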
[jira] [Commented] (YARN-8958) Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
[ https://issues.apache.org/jira/browse/YARN-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410967#comment-17410967 ] Hadoop QA commented on YARN-8958: -
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red}{color} | {color:red} YARN-8958 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8958 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12946245/YARN-8958.002.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/1203/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
> Schedulable entities leak in fair ordering policy when recovering containers between remove app attempt and remove app
>
> Key: YARN-8958
> URL: https://issues.apache.org/jira/browse/YARN-8958
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.2.1
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-8958.001.patch, YARN-8958.002.patch
>
> We found an NPE in ClientRMService#getApplications when querying apps with a specified queue. The cause is that there is one app which can't be found by calling RMContextImpl#getRMApps (it has finished and been swapped out of memory) but can still be queried from the fair ordering policy.
> To reproduce the schedulable entities leak in the fair ordering policy:
> (1) create app1 and launch container1 on node1
> (2) restart RM
> (3) remove the app1 attempt; app1 is removed from the schedulable entities.
> (4) recover container1 after node1 reconnects to the RM; the state of container1 changes to COMPLETED, app1 is brought back into entitiesToReorder after the container is released, and app1 is then added back into the schedulable entities when the scheduler calls FairOrderingPolicy#getAssignmentIterator.
> (5) remove app1
> To solve this problem, we should make sure schedulableEntities can only be affected by adding or removing an app attempt; a new entity should not be added into schedulableEntities by the reordering process.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   schedulableEntities.add(schedulableEntity);
> }
> {code}
> The code above can be improved as follows to make sure only an existing entity can be re-added into schedulableEntities.
> {code:java}
> protected void reorderSchedulableEntity(S schedulableEntity) {
>   //remove, update comparable data, and reinsert to update position in order
>   boolean exists = schedulableEntities.remove(schedulableEntity);
>   updateSchedulingResourceUsage(
>       schedulableEntity.getSchedulingResourceUsage());
>   if (exists) {
>     schedulableEntities.add(schedulableEntity);
>   } else {
>     LOG.info("Skip reordering non-existent schedulable entity: "
>         + schedulableEntity.getId());
>   }
> }
> {code}
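The reproduction steps and proposed fix above boil down to a remove-then-re-add "reorder" racing with attempt removal. A minimal self-contained model (hypothetical names, a plain TreeSet standing in for YARN's ordering-policy collection) shows how the unguarded reorder re-introduces an already-removed entity while the guarded version, mirroring the fix, does not:

```java
import java.util.TreeSet;

/**
 * Minimal model of the schedulable-entities leak: a reorder implemented as
 * unconditional remove-then-add will re-insert an entity that a concurrent
 * removal path (remove app attempt) already took out of the set.
 */
public class ReorderLeakDemo {
    public static final TreeSet<String> entities = new TreeSet<>();

    // Unguarded reorder: always re-inserts, even if the entity was absent.
    public static void reorderUnguarded(String e) {
        entities.remove(e);
        entities.add(e);
    }

    // Guarded reorder (mirrors the proposed fix): re-insert only if the
    // entity was actually present before the remove.
    public static void reorderGuarded(String e) {
        boolean exists = entities.remove(e);
        if (exists) {
            entities.add(e);
        }
    }

    public static void main(String[] args) {
        entities.add("app1");
        entities.remove("app1");       // step (3): app attempt removed
        reorderUnguarded("app1");      // step (4): container completion reorders
        System.out.println("unguarded leak: " + entities.contains("app1")); // true

        entities.clear();
        entities.add("app1");
        entities.remove("app1");
        reorderGuarded("app1");
        System.out.println("guarded leak: " + entities.contains("app1"));   // false
    }
}
```

The design point is that TreeSet.remove returns whether the element was present, so the guard costs nothing extra: the fix simply stops discarding that return value.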