[ 
https://issues.apache.org/jira/browse/YARN-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089188#comment-18089188
 ] 

ASF GitHub Bot commented on YARN-11965:
---------------------------------------

K0K0V0K commented on PR #8547:
URL: https://github.com/apache/hadoop/pull/8547#issuecomment-4709807027

   Thanks @zhengchenyu for the change!
   Overall, this looks good to me.
   
   One thing that came to mind: if I understand correctly, the 
`/ws/v1/cluster/metrics` endpoint currently has a fairly predictable response 
size and response time. With this change, both could become dependent on the 
number of partitions in the cluster.
   
   As far as I know, a single ResourceManager can handle around 10,000 
NodeManagers. In a hypothetical worst-case scenario where each NodeManager 
belongs to a different partition, could you measure how the response time is 
affected?
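   
   A rough sketch of how such a measurement could be taken, offered only as an illustration: the host name below is a placeholder, `http_fetch` and `measure` are hypothetical helper names, and 8088 is assumed to be the RM web port. It reports latency and body size, so the same script can be run before and after the patch.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict

# Assumption: placeholder RM web address; replace with the real host and port.
RM_METRICS_URL = "http://rm-host.example:8088/ws/v1/cluster/metrics"


def http_fetch(url: str = RM_METRICS_URL, timeout: float = 10.0) -> bytes:
    """Fetch the endpoint once and return the raw response body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def measure(fetch: Callable[[], bytes], runs: int = 20) -> Dict[str, float]:
    """Call fetch() `runs` times; report latency (ms) and body size (bytes)."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    latencies_ms = []
    sizes = []
    for _ in range(runs):
        start = time.perf_counter()
        body = fetch()
        # Time covers the full request, including reading the body.
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        sizes.append(len(body))
    return {
        "runs": runs,
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "max_bytes": max(sizes),
    }
```

   Calling `measure(http_fetch)` against a test cluster with the worst-case partition layout, once without and once with the change, would give comparable numbers for both response time and response size.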
   
   I’m not familiar with your use case, but have you considered exposing this 
information through a separate endpoint instead? For example, something like 
`/ws/v1/cluster/partitions/metrics` or `/ws/v1/cluster/partition-metrics` might 
avoid introducing any regression in the response size or latency of the 
existing `/ws/v1/cluster/metrics` endpoint while still making the additional data 
available.




> Support partition-aware resource metrics in RM cluster metrics REST API
> -----------------------------------------------------------------------
>
>                 Key: YARN-11965
>                 URL: https://issues.apache.org/jira/browse/YARN-11965
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Chenyu Zheng
>            Assignee: Chenyu Zheng
>            Priority: Major
>              Labels: pull-request-available
>
> When node labels are enabled, different labels represent separate resource 
> pools. However, the RM REST API `/ws/v1/cluster/metrics` currently exposes 
> fields such as totalMB and totalVirtualCores based on the default partition 
> only. As a result, resources from non-default partitions are not visible to 
> external resource management systems, which may incorrectly determine that 
> the cluster has no available capacity after new labels are added.
> The API should expose cluster resource metrics across all partitions and 
> provide partition-level metrics so clients can distinguish capacity and usage 
> by node label.



