[ https://issues.apache.org/jira/browse/YARN-7427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236807#comment-16236807 ]
Jonathan Hung commented on YARN-7427:
-------------------------------------
The issue only appears when a queue has not used the label so far (e.g. on
startup). This method:
{noformat}
public PartitionResourcesInfo getPartitionResourceUsageInfo(
    String partitionName) {
  for (PartitionResourcesInfo partitionResourceUsageInfo :
      resourceUsagesByPartition) {
    if (partitionResourceUsageInfo.getPartitionName().equals(partitionName)) {
      return partitionResourceUsageInfo;
    }
  }
  return new PartitionResourcesInfo();
}
{noformat}
returns an empty PartitionResourcesInfo if partitionName is not found in
resourceUsagesByPartition. resourceUsagesByPartition is populated by the
ResourcesInfo constructor (via resourceUsage.getNodePartitionsSet()), but an
AbstractCSQueue's queueUsage map only contains a label if the queue has
allocated, reserved, set an AM limit, etc. for that label.
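
To make the failure path concrete, here is a minimal standalone sketch of the lookup fallback. It does not use the real YARN classes: ResourceInfoModel and PartitionInfoModel are hypothetical stand-ins, the resources field is simplified to a String, and it assumes the empty PartitionResourcesInfo still carries a ResourceInfo whose inner field is null (consistent with the NPE being raised inside ResourceInfo.toString).

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionLookupSketch {

    /** Hypothetical stand-in for ResourceInfo; the no-arg constructor leaves the field null. */
    static class ResourceInfoModel {
        private final String resources; // simplified: the real field is not a String

        ResourceInfoModel() {
            this.resources = null;
        }

        ResourceInfoModel(String resources) {
            this.resources = resources;
        }

        @Override
        public String toString() {
            // Throws NullPointerException when resources is null.
            return resources.toString();
        }
    }

    /** Hypothetical stand-in for PartitionResourcesInfo. */
    static class PartitionInfoModel {
        private final String partitionName;
        private final ResourceInfoModel used;

        PartitionInfoModel() {
            // Assumption: the empty object holds a ResourceInfo built by its no-arg constructor.
            this(null, new ResourceInfoModel());
        }

        PartitionInfoModel(String partitionName, ResourceInfoModel used) {
            this.partitionName = partitionName;
            this.used = used;
        }

        String getPartitionName() {
            return partitionName;
        }

        ResourceInfoModel getUsed() {
            return used;
        }
    }

    /** Same shape as the quoted method: linear scan, empty object on a miss. */
    static PartitionInfoModel lookup(List<PartitionInfoModel> known, String partitionName) {
        for (PartitionInfoModel info : known) {
            if (info.getPartitionName().equals(partitionName)) {
                return info;
            }
        }
        return new PartitionInfoModel();
    }

    /** Returns true if rendering the "used" value for the partition throws an NPE. */
    static boolean renderingThrowsNpe(List<PartitionInfoModel> known, String partitionName) {
        try {
            lookup(known, partitionName).getUsed().toString();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<PartitionInfoModel> known = new ArrayList<>();
        // Only the default partition "" has been used by this queue.
        known.add(new PartitionInfoModel("", new ResourceInfoModel("<memory:0, vCores:0>")));

        System.out.println(renderingThrowsNpe(known, ""));  // false
        System.out.println(renderingThrowsNpe(known, "x")); // true
    }
}
```

In this model the lookup itself never fails; the NPE only surfaces later, when the caller renders the empty object.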
I couldn't repro this issue with one queue. When
AbstractCSQueue#getNodeLabelsForQueue is called in
LeafQueue#activateApplications (which happens when node registration causes
LeafQueue#updateClusterResource to be called), it appears to grab the labels
from the AbstractCSQueue's queueCapacities object. With a single queue, that
object should contain "x", since the queue should be configured with a
capacity of 100 for "x". Because "x" is in the queueCapacities object,
activateApplications calls setAMLimit for "x" on this queue's queueUsage. But
if two queues are configured and one of them has no capacity for this label,
the label is not in that queue's queueCapacities object, and thus not in its
queueUsage map either.
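
A rough model of the one-queue versus two-queue difference described above. Everything here is a hypothetical simplification: QueueModel, capacityLabels and amLimitByLabel stand in for the queue, its queueCapacities labels and its queueUsage map, and the default partition is written as "".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class QueueLabelSketch {

    /** Hypothetical stand-in for a leaf queue. */
    static final class QueueModel {
        final String name;
        // Labels the queue has configured capacity for (stand-in for queueCapacities).
        final Set<String> capacityLabels;
        // Labels the queue has recorded usage state for (stand-in for queueUsage).
        final Map<String, Long> amLimitByLabel = new HashMap<>();

        QueueModel(String name, String... labelsWithCapacity) {
            this.name = name;
            this.capacityLabels = new LinkedHashSet<>(Arrays.asList(labelsWithCapacity));
        }

        /** Sets an AM limit only for labels that have configured capacity. */
        void activateApplications() {
            for (String label : capacityLabels) {
                amLimitByLabel.put(label, 0L);
            }
        }

        boolean usageHasLabel(String label) {
            return amLimitByLabel.containsKey(label);
        }
    }

    public static void main(String[] args) {
        QueueModel dflt = new QueueModel("default", "", "x"); // has capacity for "x"
        QueueModel a = new QueueModel("a", "");               // no capacity for "x"

        dflt.activateApplications();
        a.activateApplications();

        System.out.println(dflt.usageHasLabel("x")); // true
        System.out.println(a.usageHasLabel("x"));    // false
    }
}
```

In this model, queue "a" is the one whose lookup for "x" would miss and fall through to the empty object.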
The exception is thrown here in CapacitySchedulerPage:
{noformat}
__("Used Resources:", resourceUsages.getUsed().toString()).
{noformat}
Since ResourceInfo.resources is null when the object is created by the empty
ResourceInfo constructor, I think the easiest solution is to just change the
ResourceInfo constructor. Attached the 001 patch for this.
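
For illustration only, one way such a constructor change could look. This is a sketch of the idea, not the contents of the attached patch: ResourceModel and ResourceInfoModel are hypothetical stand-ins for the YARN classes, and the zero-valued default and its string format are assumptions.

```java
public class ResourceInfoFixSketch {

    /** Hypothetical stand-in for a YARN resource (memory and vcores only). */
    static final class ResourceModel {
        final long memory;
        final int vCores;

        ResourceModel(long memory, int vCores) {
            this.memory = memory;
            this.vCores = vCores;
        }

        @Override
        public String toString() {
            return "<memory:" + memory + ", vCores:" + vCores + ">";
        }
    }

    /** Hypothetical stand-in for ResourceInfo with a non-null default. */
    static final class ResourceInfoModel {
        private final ResourceModel resources;

        ResourceInfoModel() {
            // Assumed fix: default to an empty resource instead of leaving the field null.
            this(new ResourceModel(0, 0));
        }

        ResourceInfoModel(ResourceModel resources) {
            this.resources = resources;
        }

        @Override
        public String toString() {
            return resources.toString(); // safe: resources is never null via the no-arg path
        }
    }

    public static void main(String[] args) {
        // An "empty" info object now renders instead of throwing.
        System.out.println(new ResourceInfoModel()); // <memory:0, vCores:0>
    }
}
```

The appeal of fixing it in the constructor is that every caller that renders an empty object is covered, without adding null checks in CapacitySchedulerPage.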
> NullPointerException in ResourceInfo when queue has not used label
> ------------------------------------------------------------------
>
> Key: YARN-7427
> URL: https://issues.apache.org/jira/browse/YARN-7427
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Jonathan Hung
> Priority: Major
> Attachments: YARN-7427.001.patch
>
>
> {noformat}Caused by: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.ResourceInfo.toString(ResourceInfo.java:65)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:164)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:107)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:96)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
> at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:301)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
> at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:470)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86)
> ... 56 more{noformat}
> For example, configure:
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.root.queues</name>
>   <value>default,a</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
>   <value>x</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
>   <value>x</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.accessible-node-labels.x.maximum-capacity</name>
>   <value>100</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.default.accessible-node-labels.x.capacity</name>
>   <value>100</value>
> </property>
> {noformat}
> then the above exception is thrown when refreshing the scheduler UI
> (/cluster/scheduler).