[ 
https://issues.apache.org/jira/browse/HDDS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDDS-2592:
------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Add Datanode command to allow the datanode to persist its admin state 
> ----------------------------------------------------------------------
>
>                 Key: HDDS-2592
>                 URL: https://issues.apache.org/jira/browse/HDDS-2592
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Datanode, SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the operational state of a datanode changes, an async command should be 
> triggered to persist the new state on the datanodes. For maintenance mode, 
> the datanode should also store the maintenance end time. The datanode will 
> then report the new state (and optional maintenance end time) back via its 
> heartbeat.
> The purpose of the DN persisting this information and heartbeating it back to
> SCM is to allow the operational state to be recovered after an SCM restart, as
> SCM does not persist any of this information. It also allows "Recon" to learn
> the datanode states.
> If SCM is restarted, then it will forget all knowledge of the datanodes. When 
> they register, their operational state will be reported and SCM can set it 
> correctly.
> Outside of registration (i.e. during normal heartbeats), the SCM state is the
> source of truth for the operational state. If the DN heartbeat reports a
> state that differs from SCM's, SCM should issue another command to the
> datanode to set its state to the SCM value. The mismatch may simply be due
> to a not-yet-processed command triggered by the SCM state change, but the
> worst case is one extra command sent to the datanode. This is a very
> lightweight command, so that is not an issue.
> One open question is whether to persist intermediate states on the DN. For
> example, for decommissioning, the DN would first persist "Decommissioning"
> and then transition to "Decommissioned" when SCM is satisfied all containers
> are replicated. It would be possible to persist both of these states in turn
> on the datanode quite easily. Alternatively, we set the end state
> (Decommissioned) on the datanode and allow SCM to get the node to that state.
> For the latter, if SCM is restarted, then the DN will report "Decommissioned"
> on registration, but SCM will set its internal state to Decommissioning and
> then ensure all containers are replicated before transitioning the node to
> Decommissioned. This seems like a safer approach, but there are advantages
> to tracking the intermediate states on the DNs too.
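
The registration and heartbeat rules described above can be sketched roughly
as follows. This is an illustration only, not the code from the patch: the
names OpState, NodeStatus and AdminStateReconciler, the IN_SERVICE state and
the epoch-seconds end time are all assumptions made for the example, and the
open question about intermediate decommission states is not modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical state names, loosely based on the states named in the issue.
enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

// Operational state plus maintenance end time (epoch seconds, 0 if unused).
final class NodeStatus {
  final OpState state;
  final long maintenanceEndEpochSec;

  NodeStatus(OpState state, long maintenanceEndEpochSec) {
    this.state = state;
    this.maintenanceEndEpochSec = maintenanceEndEpochSec;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof NodeStatus)) {
      return false;
    }
    NodeStatus other = (NodeStatus) o;
    return state == other.state
        && maintenanceEndEpochSec == other.maintenanceEndEpochSec;
  }

  @Override
  public int hashCode() {
    return Objects.hash(state, maintenanceEndEpochSec);
  }
}

// Hypothetical SCM-side helper holding the in-memory view of datanode states.
final class AdminStateReconciler {
  // Not persisted, so it is empty again after an SCM restart.
  private final Map<String, NodeStatus> scmView = new HashMap<>();

  // Registration: SCM has no record of the node, so it adopts the DN's report.
  NodeStatus onRegister(String dnId, NodeStatus reported) {
    scmView.put(dnId, reported);
    return reported;
  }

  // An admin action changes the state held by SCM.
  void setState(String dnId, NodeStatus desired) {
    scmView.put(dnId, desired);
  }

  // Heartbeat: SCM is the source of truth. If the report differs, return the
  // status to send back in a "set state" command; otherwise return empty.
  // Unknown nodes also yield empty here (a real SCM would ask them to register).
  Optional<NodeStatus> onHeartbeat(String dnId, NodeStatus reported) {
    NodeStatus expected = scmView.get(dnId);
    if (expected == null || expected.equals(reported)) {
      return Optional.empty();
    }
    return Optional.of(expected);
  }
}
```

In this sketch a duplicate command is harmless because the command only
carries the full desired state, so applying it twice leaves the datanode in
the same state.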



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
