[ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607172#comment-16607172
 ] 

Eric Payne commented on YARN-8709:
----------------------------------

[~Tao Yang], thanks for the patch. The changes look good. One small nit:
Unless I am miscounting, the pending resources for queue b should be 50, not 
60, in 
{{TestProportionalCapacityPreemptionPolicyIntraQueue#testIntraQueuePreemptionAfterQueueDropped}}.

> intra-queue preemption checker always fail since one under-served queue was 
> deleted
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-8709
>                 URL: https://issues.apache.org/jira/browse/YARN-8709
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, scheduler preemption
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-8709.001.patch
>
>
> After some queues were deleted, the preemption checker in SchedulingMonitor 
> was skipped on every run because of a YarnRuntimeException.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: Exception raised while executing preemption checker, skip this run..., exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't happen, cannot find TempQueuePerPartition for queueName=1535075839208
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
>         at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field 
> in ProportionalCapacityPreemptionPolicy. Items can be added to 
> partitionToUnderServedQueues but are never removed unless the policy is 
> rebuilt. For example, once under-served queue "a" is added to this structure, 
> it stays there indefinitely. The intra-queue preemption checker tries to look 
> up every queue listed in partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and throws a 
> YarnRuntimeException if one is not found. As a result, after queue "a" is 
> deleted from the queue structure, the preemption checker always fails.
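
The failure mode described in the issue can be reproduced in miniature. The sketch below is not the YARN code and not necessarily what YARN-8709.001.patch does: the class UnderServedQueueSketch, its methods, and the clearFirst flag are all hypothetical stand-ins. It assumes a map from partition to under-served queue names that is only ever appended to, and shows one possible remedy, which is to clear that map at the start of each checker run.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for the preemption policy. */
class UnderServedQueueSketch {
    // partition -> under-served queue names; plays the role of
    // partitionToUnderServedQueues in the issue description.
    private final Map<String, Set<String>> partitionToUnderServedQueues = new HashMap<>();
    // Queues that currently exist in the (simulated) scheduler.
    private final Set<String> liveQueues = new LinkedHashSet<>();

    public void addQueue(String queue) { liveQueues.add(queue); }

    public void removeQueue(String queue) { liveQueues.remove(queue); }

    /**
     * One checker run. Records the queues that are under-served in this run,
     * then looks up every recorded queue. With clearFirst == false, entries
     * from earlier runs are kept, so a deleted queue makes every later run fail.
     * Returns the number of queues that were looked up successfully.
     */
    public int runChecker(String partition, Set<String> underServedThisRun, boolean clearFirst) {
        if (clearFirst) {
            partitionToUnderServedQueues.clear(); // assumed remedy: drop stale entries
        }
        for (String queue : underServedThisRun) {
            partitionToUnderServedQueues
                .computeIfAbsent(partition, k -> new LinkedHashSet<>())
                .add(queue);
        }
        int checked = 0;
        Set<String> recorded =
            partitionToUnderServedQueues.getOrDefault(partition, Collections.<String>emptySet());
        for (String queue : recorded) {
            if (!liveQueues.contains(queue)) {
                // Analogous to the YarnRuntimeException in the stack trace.
                throw new IllegalStateException("cannot find queue " + queue);
            }
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        UnderServedQueueSketch policy = new UnderServedQueueSketch();
        policy.addQueue("a");
        policy.addQueue("b");
        policy.runChecker("", Collections.singleton("a"), false); // "a" is recorded
        policy.removeQueue("a");                                  // "a" is deleted
        try {
            policy.runChecker("", Collections.<String>emptySet(), false);
        } catch (IllegalStateException e) {
            System.out.println("stale entry: " + e.getMessage());
        }
        // Clearing first lets the run succeed with only the live queue "b".
        System.out.println(policy.runChecker("", Collections.singleton("b"), true));
    }
}
```

Without the clear, the second run fails even though no queue is reported as under-served in that run, which matches the "always fail" behaviour in the report. Whether the real fix should clear the whole map or only prune deleted queues is a design choice for the patch itself.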


