[ 
https://issues.apache.org/jira/browse/YARN-11801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susheel Gupta updated YARN-11801:
---------------------------------
    Description: 
When enabling the ProportionalCapacityPreemptionPolicy in the YARN 
SchedulingMonitor, I encountered a NullPointerException in 
{{FifoCandidatesSelector.selectCandidates}}. This happens when an 
auto-created queue exists but does not have any child queues.

A childless ParentQueue triggers an NPE at this statement in 
FifoCandidatesSelector#selectCandidates:
{code:java}
        LeafQueue leafQueue = preemptionContext.getQueueByPartition(queueName,
          RMNodeLabelsManager.NO_LABEL).leafQueue;{code}
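One possible way to avoid the dereference is to skip any queue whose lookup returns nothing or that carries no leaf queue. The following is only a minimal, self-contained sketch of that guard, not the actual patch: {{QueueSnapshot}}, {{selectLeafQueues}} and the use of a plain String in place of LeafQueue are hypothetical stand-ins, and the real types in FifoCandidatesSelector may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeafQueueGuardSketch {

    // Hypothetical stand-in for the object returned by getQueueByPartition.
    // Only the leafQueue field mirrors the snippet above; a String replaces
    // the real LeafQueue type to keep the sketch self-contained.
    static final class QueueSnapshot {
        final String queueName;
        final String leafQueue; // null when the queue is a ParentQueue

        QueueSnapshot(String queueName, String leafQueue) {
            this.queueName = queueName;
            this.leafQueue = leafQueue;
        }
    }

    // Collects the leaf queues for the given names, skipping any queue that
    // has no snapshot or no leaf queue instead of dereferencing null.
    static List<String> selectLeafQueues(List<String> queueNames,
                                         Map<String, QueueSnapshot> snapshots) {
        List<String> selected = new ArrayList<>();
        for (String name : queueNames) {
            QueueSnapshot snapshot = snapshots.get(name);
            if (snapshot == null || snapshot.leafQueue == null) {
                continue; // nothing to preempt from for this queue
            }
            selected.add(snapshot.leafQueue);
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, QueueSnapshot> snapshots = new HashMap<>();
        snapshots.put("root.default", new QueueSnapshot("root.default", "default"));
        // A parent queue with auto-creation enabled but no children yet.
        snapshots.put("root.client", new QueueSnapshot("root.client", null));

        List<String> names = Arrays.asList("root.default", "root.client", "root.missing");
        System.out.println(selectLeafQueues(names, snapshots)); // prints [default]
    }
}
```

In this sketch the childless {{root.client}} entry and the unknown {{root.missing}} name are both skipped, so only the {{default}} leaf queue is considered.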
NullPointer stack trace:
{code:java}
2025-03-24 08:36:12,593 ERROR monitor.SchedulingMonitor: Exception raised while executing preemption checker, skip this run..., exception=
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.FifoCandidatesSelector.selectCandidates(FifoCandidatesSelector.java:104)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:515)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:344)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:100)
    at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:112)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748){code}
Capacity-scheduler used:
{noformat}
<configuration>
    <property>
        <name>yarn.scheduler.capacity.mapping-rule-json</name>
        <value/>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.client.capacity</name>
        <value>50</value>
    </property>
    <property>
        
<name>yarn.scheduler.capacity.root.client.leaf-queue-template.capacity</name>
        <value>0</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.client.maximum-capacity</name>
        <value>100</value>
    </property>
    <property>
        
<name>yarn.scheduler.capacity.root.client.auto-create-child-queue.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>50</value>
    </property>
    <property>
        
<name>yarn.scheduler.capacity.root.client.leaf-queue-template.maximum-capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,client</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.webservice.mutation-api.version</name>
        <value>1742806178771</value>
    </property>
    <property>
        
<name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
        <value>0.2</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.mapping-rule-format</name>
        <value>json</value>
    </property>
</configuration>{noformat}
Add this to yarn-site.xml:
{code:xml}
<property>
    <name>yarn.resourcemanager.scheduler.monitor.enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.monitor.policies</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy,org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueManagementDynamicEditPolicy</value>
</property>{code}



> NPE in FifoCandidatesSelector.selectCandidates when preempting resources for 
> an auto-created queue without child queues
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11801
>                 URL: https://issues.apache.org/jira/browse/YARN-11801
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Susheel Gupta
>            Assignee: Susheel Gupta
>            Priority: Major


