[ https://issues.apache.org/jira/browse/HDDS-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marton Elek updated HDDS-2860:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Cluster disk space metrics should reflect decommission and maintenance states
> -----------------------------------------------------------------------------
>
> Key: HDDS-2860
> URL: https://issues.apache.org/jira/browse/HDDS-2860
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Affects Versions: 0.5.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
>
> Now that we have decommission states, we need to adjust the cluster capacity,
> space used, and space available metrics that are exposed via JMX.
> For a decommissioning node, the space used on the node effectively needs to
> be transferred to other nodes via container replication before decommission
> can complete, but this is difficult to track from a space usage perspective.
> When a node completes decommission, we can assume it provides no capacity to
> the cluster and uses none. Therefore, for decommissioning and decommissioned
> nodes, the simplest calculation is to exclude the node completely, in the
> same way as a dead node.
> For maintenance nodes, things are even less clear. A maintenance node is
> read-only, so it cannot provide capacity to the cluster, but it is expected
> to return to service, so excluding it completely probably does not make
> sense. However, perhaps the simplest solution is to do the following:
> 1. For any node not IN_SERVICE, do not include its usage or space in the
> cluster capacity totals.
> 2. Introduce some new metrics to account for the maintenance and perhaps
> decommission capacity, so it is not lost, e.g.:
> {code}
> # Existing metrics
> "DiskCapacity" : 62725623808,
> "DiskUsed" : 4096,
> "DiskRemaining" : 50459619328,
> # Suggested additional new ones, with the above only considering IN_SERVICE nodes:
> "MaintenanceDiskCapacity": 0,
> "MaintenanceDiskUsed": 0,
> "MaintenanceDiskRemaining": 0,
> "DecommissionedDiskCapacity": 0,
> "DecommissionedDiskUsed": 0,
> "DecommissionedDiskRemaining": 0,
> ...
> {code}
> That way, the cluster totals cover only what is currently "online", but we
> have the other metrics to track what has been removed. The key advantage of
> this approach is that it is easy to understand.
> There is also an argument that the new DecommissionedDisk* metrics are
> not needed, as that capacity is technically lost from the cluster forever.
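The calculation proposed in the description could look roughly like the sketch below. It is only an illustration of the bucketing idea, not the committed patch: the class, enum, and field names are hypothetical and are not the real SCM types, and it assumes that decommissioning and decommissioned nodes are folded into a single DECOMMISSIONED bucket.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these types are the real SCM classes.
class ClusterDiskMetricsSketch {

  // Simplified, hypothetical set of node operational states.
  enum OpState { IN_SERVICE, IN_MAINTENANCE, DECOMMISSIONED }

  // Hypothetical per-node disk report, all values in bytes.
  static final class NodeStat {
    final OpState state;
    final long capacity;
    final long used;
    final long remaining;

    NodeStat(OpState state, long capacity, long used, long remaining) {
      this.state = state;
      this.capacity = capacity;
      this.used = used;
      this.remaining = remaining;
    }
  }

  // Running totals for one group of nodes.
  static final class Totals {
    long capacity;
    long used;
    long remaining;

    void add(NodeStat n) {
      capacity += n.capacity;
      used += n.used;
      remaining += n.remaining;
    }
  }

  // Sums node stats per state, so each node is counted in exactly one bucket.
  static Map<OpState, Totals> aggregate(List<NodeStat> nodes) {
    Map<OpState, Totals> byState = new EnumMap<>(OpState.class);
    for (OpState s : OpState.values()) {
      byState.put(s, new Totals());
    }
    for (NodeStat n : nodes) {
      byState.get(n.state).add(n);
    }
    return byState;
  }

  public static void main(String[] args) {
    List<NodeStat> nodes = Arrays.asList(
        new NodeStat(OpState.IN_SERVICE, 1000L, 100L, 900L),
        new NodeStat(OpState.IN_SERVICE, 1000L, 200L, 800L),
        new NodeStat(OpState.IN_MAINTENANCE, 500L, 50L, 450L),
        new NodeStat(OpState.DECOMMISSIONED, 300L, 30L, 270L));
    Map<OpState, Totals> m = aggregate(nodes);
    // The IN_SERVICE bucket would back DiskCapacity / DiskUsed / DiskRemaining.
    System.out.println("DiskCapacity=" + m.get(OpState.IN_SERVICE).capacity);
    System.out.println("MaintenanceDiskCapacity="
        + m.get(OpState.IN_MAINTENANCE).capacity);
    System.out.println("DecommissionedDiskCapacity="
        + m.get(OpState.DECOMMISSIONED).capacity);
  }
}
```

Keeping one bucket per state means the existing cluster totals only ever read the IN_SERVICE bucket, while the Maintenance* and Decommissioned* values come from the other buckets, so no capacity is counted twice or silently dropped.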
--
This message was sent by Atlassian Jira
(v8.3.4#803005)