[jira] [Updated] (HDDS-2860) Cluster disk space metrics should reflect decommission and maintenance states

Marton Elek (Jira) Mon, 10 Feb 2020 02:38:56 -0800


     [ 
https://issues.apache.org/jira/browse/HDDS-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Marton Elek updated HDDS-2860:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Cluster disk space metrics should reflect decommission and maintenance states
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-2860
>                 URL: https://issues.apache.org/jira/browse/HDDS-2860
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> Now that we have decommission states, we need to adjust the cluster
> capacity, space used and space available metrics that are exposed via JMX.
> For a decommissioning node, the space used on the node effectively needs to
> be transferred to other nodes via container replication before decommission
> can complete, but this is difficult to track from a space usage perspective.
> When a node completes decommission, we can assume it provides no capacity to
> the cluster and uses none. Therefore, for decommissioning + decommissioned
> nodes, the simplest calculation is to exclude the node completely, in a
> similar way to a dead node.
> For maintenance nodes, things are even less clear. A maintenance node is
> read-only, so it cannot provide capacity to the cluster, but it is expected
> to return to service, so excluding it completely probably does not make
> sense. However, perhaps the simplest solution is to do the following:
> 1. For any node not IN_SERVICE, do not include its usage or space in the 
> cluster capacity totals.
> 2. Introduce some new metrics to account for the maintenance and perhaps
> decommission capacity, so it is not lost, e.g.:
> {code}
> # Existing metrics
> "DiskCapacity" : 62725623808,
> "DiskUsed" : 4096,
> "DiskRemaining" : 50459619328,
> # Suggested additional new ones, with the above only considering IN_SERVICE nodes:
> "MaintenanceDiskCapacity": 0,
> "MaintenanceDiskUsed": 0,
> "MaintenanceDiskRemaining": 0,
> "DecommissionedDiskCapacity": 0,
> "DecommissionedDiskUsed": 0,
> "DecommissionedDiskRemaining": 0
> ...
> {code}
> That way, the cluster totals cover only what is currently "online", but we
> have the other metrics to track what has been removed. The key advantage of
> this is that it is easy to understand.
> There could also be an argument that the new DecommissionedDisk* metrics are
> not needed, as that capacity is technically lost from the cluster forever.
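
The two steps proposed in the description could be sketched roughly as below. This is only an illustration, not the committed patch: `OpState`, `NodeStat`, `SpaceTotals` and `ClusterSpaceMetrics` are hypothetical names invented for this sketch, node health (dead nodes) is not modelled, and folding decommissioning nodes into the same bucket as decommissioned ones is an assumption rather than something the issue settles.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical operational states for this sketch; not the actual SCM enum.
enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

// Hypothetical per-node space report, in bytes.
final class NodeStat {
    final OpState state;
    final long capacity;
    final long used;
    final long remaining;

    NodeStat(OpState state, long capacity, long used, long remaining) {
        this.state = state;
        this.capacity = capacity;
        this.used = used;
        this.remaining = remaining;
    }
}

// Running sums for one group of nodes.
final class SpaceTotals {
    long capacity;
    long used;
    long remaining;

    void add(NodeStat n) {
        capacity += n.capacity;
        used += n.used;
        remaining += n.remaining;
    }
}

final class ClusterSpaceMetrics {
    // IN_SERVICE feeds DiskCapacity/DiskUsed/DiskRemaining; the other two
    // buckets feed the suggested Maintenance* and Decommissioned* metrics.
    enum Bucket { IN_SERVICE, MAINTENANCE, DECOMMISSIONED }

    static Bucket bucketOf(OpState state) {
        switch (state) {
            case IN_SERVICE:
                return Bucket.IN_SERVICE;
            case IN_MAINTENANCE:
                return Bucket.MAINTENANCE;
            default:
                // Assumption: decommissioning nodes are counted together
                // with decommissioned ones.
                return Bucket.DECOMMISSIONED;
        }
    }

    // Step 1: only IN_SERVICE nodes land in the cluster totals.
    // Step 2: space on all other nodes is kept in separate buckets.
    static Map<Bucket, SpaceTotals> aggregate(List<NodeStat> nodes) {
        Map<Bucket, SpaceTotals> totals = new EnumMap<>(Bucket.class);
        for (Bucket b : Bucket.values()) {
            totals.put(b, new SpaceTotals());
        }
        for (NodeStat n : nodes) {
            totals.get(bucketOf(n.state)).add(n);
        }
        return totals;
    }
}
```

With this shape, the existing JMX values would be read from the `IN_SERVICE` bucket, so a node's space moves out of the cluster totals as soon as its state changes and moves back if it returns to service.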



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
