[ https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299824#comment-17299824 ]
Eric Badger commented on YARN-10688:
------------------------------------

{noformat}
2021-03-11 19:25:11,183 ERROR [SchedulerEventDispatcher:Event Processor] event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type NODE_ADDED to the Event Dispatcher
org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: The resource manager encountered a problem that should not occur under normal circumstances. Please report this error to the Hadoop community by opening a JIRA ticket at http://issues.apache.org/jira and including the following information:
* Resource type requested: yarn.io/gpu
* Resource object: <memory:30000, vCores:300>
* The stack trace for this exception: java.lang.Exception
	at org.apache.hadoop.yarn.exceptions.ResourceNotFoundException.<init>(ResourceNotFoundException.java:47)
	at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:263)
	at org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.incrCapability(ClusterMetrics.java:222)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.addNode(ClusterNodeTracker.java:110)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addNode(CapacityScheduler.java:2201)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1937)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

This is the error I get when I start up the RM in a cluster without any GPUs.

> ClusterMetrics should support GPU related metrics.
> --------------------------------------------------
>
>                 Key: YARN-10688
>                 URL: https://issues.apache.org/jira/browse/YARN-10688
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, resourcemanager
>    Affects Versions: 3.2.2, 3.4.0
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10688.001.patch, image-2021-03-11-15-35-49-625.png
>
>
> Currently, ClusterMetrics only supports memory- and vcore-related metrics.
>
> {code:java}
> @Metric("Memory Utilization") MutableGaugeLong utilizedMB;
> @Metric("Vcore Utilization") MutableGaugeLong utilizedVirtualCores;
> @Metric("Memory Capability") MutableGaugeLong capabilityMB;
> @Metric("Vcore Capability") MutableGaugeLong capabilityVirtualCores;
> {code}
>
> !image-2021-03-11-15-35-49-625.png|width=593,height=253!
>
> In our cluster we have added GPU support, so I think GPU-related metrics
> should also be supported by ClusterMetrics.
>
> cc [~pbacsko] [~Jim_Brennan] [~ebadger] [~gandras]
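The stack trace shows `ClusterMetrics.incrCapability` asking the node's `Resource` for `yarn.io/gpu` through `getResourceInformation`, which throws `ResourceNotFoundException` when that resource type is not present, so NODE_ADDED fails on a cluster without GPUs. Below is a minimal, self-contained sketch of a guarded update, assuming a simplified stand-in for `Resource`. All class and method names here (`GpuMetricsSketch`, `SimpleResource`, `ClusterCapacity`, `hasType`, `getValueOrThrow`) are hypothetical and are not Hadoop APIs, and this is not the patch attached to the issue. The GPU gauge is only updated when the node actually reports the GPU resource type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Hadoop code): models the failing lookup and a guarded alternative.
public class GpuMetricsSketch {

    // Hypothetical stand-in for Hadoop's ResourceNotFoundException.
    static class ResourceNotFoundException extends RuntimeException {
        ResourceNotFoundException(String type) {
            super("Unknown resource type: " + type);
        }
    }

    // Hypothetical stand-in for a YARN Resource: resource type name -> value.
    static class SimpleResource {
        private final Map<String, Long> values = new HashMap<>();

        SimpleResource set(String type, long value) {
            values.put(type, value);
            return this;
        }

        // Mirrors the failing pattern: throws when the requested type is absent.
        long getValueOrThrow(String type) {
            Long v = values.get(type);
            if (v == null) {
                throw new ResourceNotFoundException(type);
            }
            return v;
        }

        boolean hasType(String type) {
            return values.containsKey(type);
        }
    }

    // Hypothetical cluster-level gauges, including an optional GPU gauge.
    static class ClusterCapacity {
        static final String GPU = "yarn.io/gpu";
        long capabilityMB;
        long capabilityVirtualCores;
        long capabilityGPUs;

        // Adds a node's capability; the GPU gauge changes only if the node reports GPUs.
        void incrCapability(SimpleResource res) {
            capabilityMB += res.getValueOrThrow("memory-mb");
            capabilityVirtualCores += res.getValueOrThrow("vcores");
            // Guarded lookup: a node (or cluster) without GPUs must not fail NODE_ADDED.
            if (res.hasType(GPU)) {
                capabilityGPUs += res.getValueOrThrow(GPU);
            }
        }
    }

    public static void main(String[] args) {
        ClusterCapacity metrics = new ClusterCapacity();
        // Node without GPUs, matching <memory:30000, vCores:300> from the log above.
        metrics.incrCapability(new SimpleResource().set("memory-mb", 30000).set("vcores", 300));
        // Node with 4 GPUs.
        metrics.incrCapability(new SimpleResource()
                .set("memory-mb", 30000).set("vcores", 300).set(ClusterCapacity.GPU, 4));
        // prints "60000 MB, 600 vcores, 4 GPUs"
        System.out.println(metrics.capabilityMB + " MB, "
                + metrics.capabilityVirtualCores + " vcores, "
                + metrics.capabilityGPUs + " GPUs");
    }
}
```

In the ResourceManager itself, the presence check would have to consult the configured resource types rather than a local map; that lookup is deliberately left out of this sketch.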