[
https://issues.apache.org/jira/browse/YARN-11965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089000#comment-18089000
]
ASF GitHub Bot commented on YARN-11965:
---------------------------------------
zhengchenyu commented on PR #8547:
URL: https://github.com/apache/hadoop/pull/8547#issuecomment-4705229462
After this PR, curl `http://{rm-host}:{rm-port}/ws/v1/cluster/metrics` reports
fields such as `totalMB` and `totalVirtualCores` for the whole cluster instead
of for the default partition only. Additionally, `partitionClusterMetrics` has
been added to report metrics for each partition. The updated output is shown
below:
```
{
"clusterMetrics": {
"appsSubmitted": 15,
"appsCompleted": 13,
"appsPending": 1,
"appsRunning": 1,
"appsFailed": 0,
"appsKilled": 0,
"reservedMB": 0,
"availableMB": 840704,
"allocatedMB": 183296,
"pendingMB": 5983232,
"reservedVirtualCores": 0,
"availableVirtualCores": 188,
"allocatedVirtualCores": 60,
"pendingVirtualCores": 1948,
"aggregateContainersAllocated": 214,
"aggregateContainersReleased": 199,
"containersAllocated": 15,
"containersReserved": 0,
"containersPending": 487,
"totalMB": 1024000,
"totalVirtualCores": 248,
"totalNodes": 4,
"lostNodes": 0,
"unhealthyNodes": 0,
"unscheduledNodes": 0,
"decommissioningNodes": 0,
"decommissionedNodes": 0,
"rebootedNodes": 0,
"activeNodes": 4,
"shutdownNodes": 0,
"totalUsedResourcesAcrossPartition": {
"memory": 183296,
"vCores": 60,
"resourceInformations": {
"resourceInformation": [
{
"maximumAllocation": 9223372036854775807,
"minimumAllocation": 0,
"name": "memory-mb",
"resourceType": "COUNTABLE",
"units": "Mi",
"value": 183296
},
{
"maximumAllocation": 9223372036854775807,
"minimumAllocation": 0,
"name": "vcores",
"resourceType": "COUNTABLE",
"units": "",
"value": 60
}
]
}
},
"totalClusterResourcesAcrossPartition": {
"memory": 1024000,
"vCores": 248,
"resourceInformations": {
"resourceInformation": [
{
"maximumAllocation": 9223372036854775807,
"minimumAllocation": 0,
"name": "memory-mb",
"resourceType": "COUNTABLE",
"units": "Mi",
"value": 1024000
},
{
"maximumAllocation": 9223372036854775807,
"minimumAllocation": 0,
"name": "vcores",
"resourceType": "COUNTABLE",
"units": "",
"value": 248
}
]
}
},
"totalReservedResourcesAcrossPartition": {
"memory": 0,
"vCores": 0,
"resourceInformations": {
"resourceInformation": [
{
"maximumAllocation": 9223372036854775807,
"minimumAllocation": 0,
"name": "memory-mb",
"resourceType": "COUNTABLE",
"units": "Mi",
"value": 0
},
{
"maximumAllocation": 9223372036854775807,
"minimumAllocation": 0,
"name": "vcores",
"resourceType": "COUNTABLE",
"units": "",
"value": 0
}
]
}
},
"totalAllocatedContainersAcrossPartition": 15,
"partitionClusterMetrics": [
{
"partitionName": "",
"totalMB": 256000,
"totalVirtualCores": 62,
"availableMB": 72704,
"availableVirtualCores": 2,
"allocatedMB": 183296,
"allocatedVirtualCores": 60,
"reservedMB": 0,
"reservedVirtualCores": 0,
"pendingMB": 5983232,
"pendingVirtualCores": 1948,
"containersAllocated": 15,
"containersReserved": 0,
"containersPending": 487
},
{
"partitionName": "part-a",
"totalMB": 256000,
"totalVirtualCores": 62,
"availableMB": 256000,
"availableVirtualCores": 62,
"allocatedMB": 0,
"allocatedVirtualCores": 0,
"reservedMB": 0,
"reservedVirtualCores": 0,
"pendingMB": 0,
"pendingVirtualCores": 0,
"containersAllocated": 0,
"containersReserved": 0,
"containersPending": 0
},
{
"partitionName": "part-b",
"totalMB": 256000,
"totalVirtualCores": 62,
"availableMB": 256000,
"availableVirtualCores": 62,
"allocatedMB": 0,
"allocatedVirtualCores": 0,
"reservedMB": 0,
"reservedVirtualCores": 0,
"pendingMB": 0,
"pendingVirtualCores": 0,
"containersAllocated": 0,
"containersReserved": 0,
"containersPending": 0
},
{
"partitionName": "part-c",
"totalMB": 256000,
"totalVirtualCores": 62,
"availableMB": 256000,
"availableVirtualCores": 62,
"allocatedMB": 0,
"allocatedVirtualCores": 0,
"reservedMB": 0,
"reservedVirtualCores": 0,
"pendingMB": 0,
"pendingVirtualCores": 0,
"containersAllocated": 0,
"containersReserved": 0,
"containersPending": 0
}
],
"crossPartitionMetricsAvailable": true
}
}
```
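For clients consuming this response, a minimal sketch of reading the new
per-partition entries might look like the following. The helper names
`partition_headroom` and `totals_match_partitions` are hypothetical, `SAMPLE`
is a trimmed copy of the output above, and the sketch assumes the entry with
an empty `partitionName` is the default partition. The sum check holds for
this sample; it is not claimed as a guaranteed invariant of the API.

```python
def partition_headroom(payload):
    """Map each partition name to its (availableMB, totalMB) pair."""
    metrics = payload["clusterMetrics"]
    headroom = {}
    # Assumption: the field is absent on an RM without this change,
    # so fall back to an empty list instead of raising KeyError.
    for part in metrics.get("partitionClusterMetrics", []):
        # Assumption: an empty partitionName denotes the default partition.
        name = part["partitionName"] or "<default>"
        headroom[name] = (part["availableMB"], part["totalMB"])
    return headroom


def totals_match_partitions(payload):
    """Check that per-partition values add up to the top-level fields."""
    metrics = payload["clusterMetrics"]
    parts = metrics.get("partitionClusterMetrics", [])
    return all(
        sum(p[key] for p in parts) == metrics[key]
        for key in ("totalMB", "totalVirtualCores", "availableMB")
    )


# Trimmed copy of the response shown above (only the fields used here).
SAMPLE = {
    "clusterMetrics": {
        "availableMB": 840704,
        "totalMB": 1024000,
        "totalVirtualCores": 248,
        "partitionClusterMetrics": [
            {"partitionName": "", "totalMB": 256000,
             "totalVirtualCores": 62, "availableMB": 72704},
            {"partitionName": "part-a", "totalMB": 256000,
             "totalVirtualCores": 62, "availableMB": 256000},
            {"partitionName": "part-b", "totalMB": 256000,
             "totalVirtualCores": 62, "availableMB": 256000},
            {"partitionName": "part-c", "totalMB": 256000,
             "totalVirtualCores": 62, "availableMB": 256000},
        ],
    }
}

if __name__ == "__main__":
    for name, (avail, total) in partition_headroom(SAMPLE).items():
        print(f"{name}: {avail}/{total} MB available")
```

In the sample, the default partition has 72704 MB available out of 256000 MB,
while each labeled partition is fully available, which a client reading only
the default-partition figures could not see.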
> Support partition-aware resource metrics in RM cluster metrics REST API
> -----------------------------------------------------------------------
>
> Key: YARN-11965
> URL: https://issues.apache.org/jira/browse/YARN-11965
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Chenyu Zheng
> Assignee: Chenyu Zheng
> Priority: Major
> Labels: pull-request-available
>
> When node labels are enabled, different labels represent separate resource
> pools. However, the RM REST API `/ws/v1/cluster/metrics` currently exposes
> fields such as totalMB and totalVirtualCores based on the default partition
> only. As a result, resources from non-default partitions are not visible to
> external resource management systems, which may incorrectly determine that
> the cluster has no available capacity after new labels are added.
> The API should expose cluster resource metrics across all partitions and
> provide partition-level metrics so clients can distinguish capacity and usage
> by node label.