[ https://issues.apache.org/jira/browse/HDDS-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marton Elek updated HDDS-2860:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Cluster disk space metrics should reflect decommission and maintenance states
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-2860
>                 URL: https://issues.apache.org/jira/browse/HDDS-2860
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> Now that we have decommission states, we need to adjust the cluster capacity,
> space used and available metrics which are exposed via JMX.
>
> For a decommissioning node, the space used on the node effectively needs to
> be transferred to other nodes via container replication before decommission
> can complete, but this is difficult to track from a space usage perspective.
> When a node completes decommission, we can assume it provides no capacity to
> the cluster and uses none. Therefore, for decommissioning + decommissioned
> nodes, the simplest calculation is to exclude the node completely, in a
> similar way to a dead node.
>
> For maintenance nodes, things are even less clear. A maintenance node is
> read only, so it cannot provide capacity to the cluster, but it is expected
> to return to service, so excluding it completely probably does not make
> sense. However, perhaps the simplest solution is to do the following:
>
> 1. For any node not IN_SERVICE, do not include its usage or space in the
> cluster capacity totals.
> 2. Introduce some new metrics to account for the maintenance and perhaps
> decommission capacity, so it is not lost, e.g.:
>
> {code}
> # Existing metrics
> "DiskCapacity" : 62725623808,
> "DiskUsed" : 4096,
> "DiskRemaining" : 50459619328,
> # Suggested additional new ones, with the above only considering IN_SERVICE nodes:
> "MaintenanceDiskCapacity": 0
> "MaintenanceDiskUsed": 0
> "MaintenanceDiskRemaining": 0
> "DecommissionedDiskCapacity": 0
> "DecommissionedDiskUsed": 0
> "DecommissionedDiskRemaining": 0
> ...
> {code}
>
> That way, the cluster totals are only what is currently "online", but we have
> the other metrics to track what has been removed etc. The key advantage of
> this is that it is easy to understand.
>
> There could also be an argument that the new DecommissionedDisk* metrics are
> not needed, as that capacity is technically lost from the cluster forever.
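
The two-step proposal in the issue description (count only IN_SERVICE nodes in the cluster totals, and report maintenance and decommission space under separate metrics) can be illustrated with a minimal Python sketch. This is not the SCM implementation: the `OpState` enum, the `NodeStat` record and the `cluster_disk_metrics` function are hypothetical names introduced here for illustration, and folding decommissioning nodes into the `Decommissioned*` metrics is an assumption, since the description leaves that point open. Only the metric names come from the description.

```python
from dataclasses import dataclass
from enum import Enum, auto


class OpState(Enum):
    """Hypothetical operational states; the issue only names IN_SERVICE."""
    IN_SERVICE = auto()
    DECOMMISSIONING = auto()
    DECOMMISSIONED = auto()
    IN_MAINTENANCE = auto()


@dataclass
class NodeStat:
    """Hypothetical per-node disk report, all values in bytes."""
    state: OpState
    capacity: int
    used: int
    remaining: int


# Metric-name prefix per state. "" maps to the existing cluster totals.
# Assumption: decommissioning nodes are grouped with decommissioned ones.
PREFIX = {
    OpState.IN_SERVICE: "",
    OpState.IN_MAINTENANCE: "Maintenance",
    OpState.DECOMMISSIONING: "Decommissioned",
    OpState.DECOMMISSIONED: "Decommissioned",
}

SUFFIXES = ("DiskCapacity", "DiskUsed", "DiskRemaining")


def cluster_disk_metrics(nodes):
    """Sum node disk stats into per-state metrics.

    Nodes that are not IN_SERVICE never contribute to the plain
    DiskCapacity / DiskUsed / DiskRemaining totals.
    """
    # Start every metric at zero so all names are always present.
    metrics = {p + s: 0 for p in set(PREFIX.values()) for s in SUFFIXES}
    for node in nodes:
        prefix = PREFIX[node.state]
        metrics[prefix + "DiskCapacity"] += node.capacity
        metrics[prefix + "DiskUsed"] += node.used
        metrics[prefix + "DiskRemaining"] += node.remaining
    return metrics


if __name__ == "__main__":
    nodes = [
        NodeStat(OpState.IN_SERVICE, 62725623808, 4096, 50459619328),
        NodeStat(OpState.IN_MAINTENANCE, 1000, 100, 900),
        NodeStat(OpState.DECOMMISSIONED, 2000, 500, 1500),
    ]
    for name, value in sorted(cluster_disk_metrics(nodes).items()):
        print(f'"{name}": {value}')
```

With this shape, a node changing state moves its space from one metric group to another rather than making it disappear, which is the "not lost" property the description asks for. If the `Decommissioned*` metrics are judged unnecessary, dropping them would mean skipping those states in the loop instead of mapping them to a prefix.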