[ 
https://issues.apache.org/jira/browse/YARN-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299824#comment-17299824
 ] 

Eric Badger commented on YARN-10688:
------------------------------------

{noformat}
2021-03-11 19:25:11,183 ERROR [SchedulerEventDispatcher:Event Processor] event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type NODE_ADDED to the Event Dispatcher
org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: The resource manager encountered a problem that should not occur under normal circumstances. Please report this error to the Hadoop community by opening a JIRA ticket at http://issues.apache.org/jira and including the following information:
* Resource type requested: yarn.io/gpu
* Resource object: <memory:30000, vCores:300>
* The stack trace for this exception: java.lang.Exception
        at org.apache.hadoop.yarn.exceptions.ResourceNotFoundException.<init>(ResourceNotFoundException.java:47)
        at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:263)
        at org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.incrCapability(ClusterMetrics.java:222)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.addNode(ClusterNodeTracker.java:110)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addNode(CapacityScheduler.java:2201)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1937)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
        at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

This is the error I get when I start up the RM in a cluster without any GPUs.
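
One way to read the trace: ClusterMetrics.incrCapability looks up yarn.io/gpu on a node resource that only carries memory and vcores (<memory:30000, vCores:300>), and the lookup throws instead of returning zero. Below is a minimal sketch of that failure mode and of one possible guard. It does not use the real Hadoop API: SketchResource, GpuMetricsSketch, and the has()/get() methods are hypothetical stand-ins, and the guard is only an illustration, not necessarily what the patch should do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a node's resource vector. This is NOT the real
// org.apache.hadoop.yarn.api.records.Resource class.
class SketchResource {
    private final Map<String, Long> values = new HashMap<>();

    SketchResource set(String name, long value) {
        values.put(name, value);
        return this;
    }

    // Mirrors the behaviour seen in the trace: asking for a resource type
    // that is not present throws rather than returning zero.
    long get(String name) {
        Long value = values.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unknown resource type: " + name);
        }
        return value;
    }

    boolean has(String name) {
        return values.containsKey(name);
    }
}

// Hypothetical stand-in for the GPU part of the cluster metrics.
class GpuMetricsSketch {
    static final String GPU = "yarn.io/gpu";

    private long capabilityGpus = 0;

    // Guarded variant: only update the GPU gauge when the node reports GPUs.
    void incrCapability(SketchResource node) {
        if (node.has(GPU)) {
            capabilityGpus += node.get(GPU);
        }
    }

    long getCapabilityGpus() {
        return capabilityGpus;
    }

    public static void main(String[] args) {
        GpuMetricsSketch metrics = new GpuMetricsSketch();

        // Node without GPUs, same shape as the resource object in the trace.
        SketchResource cpuOnly = new SketchResource()
                .set("memory-mb", 30000)
                .set("vcores", 300);
        metrics.incrCapability(cpuOnly); // no exception, gauge stays at 0

        // Node that does report GPUs (the count 4 is made up for the demo).
        SketchResource gpuNode = new SketchResource()
                .set("memory-mb", 30000)
                .set("vcores", 300)
                .set(GPU, 4);
        metrics.incrCapability(gpuNode);

        System.out.println(metrics.getCapabilityGpus()); // prints 4
    }
}
```

With a guard of this kind, adding a node that has no GPU resource leaves the GPU gauge untouched, so NODE_ADDED handling would not fail on CPU-only clusters.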

> ClusterMetrics should support GPU related metrics.
> --------------------------------------------------
>
>                 Key: YARN-10688
>                 URL: https://issues.apache.org/jira/browse/YARN-10688
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, resourcemanager
>    Affects Versions: 3.2.2, 3.4.0
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10688.001.patch, image-2021-03-11-15-35-49-625.png
>
>
> Currently, ClusterMetrics only supports memory- and vcore-related metrics.
>  
> {code:java}
> @Metric("Memory Utilization") MutableGaugeLong utilizedMB;
> @Metric("Vcore Utilization") MutableGaugeLong utilizedVirtualCores;
> @Metric("Memory Capability") MutableGaugeLong capabilityMB;
> @Metric("Vcore Capability") MutableGaugeLong capabilityVirtualCores;
> {code}
>  
>  
> !image-2021-03-11-15-35-49-625.png|width=593,height=253!
> In our cluster, we have added GPU support, so I think GPU-related metrics 
> should also be supported by ClusterMetrics.
>  
> cc [~pbacsko]  [~Jim_Brennan]  [~ebadger]  [~gandras]  
>  



