[
https://issues.apache.org/jira/browse/HDDS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDDS-2592:
------------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Add Datanode command to allow the datanode to persist its admin state
> ----------------------------------------------------------------------
>
> Key: HDDS-2592
> URL: https://issues.apache.org/jira/browse/HDDS-2592
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Datanode, SCM
> Affects Versions: 0.5.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When the operational state of a datanode changes, an async command should be
> triggered to persist the new state on the datanodes. For maintenance mode,
> the datanode should also store the maintenance end time. The datanode will
> then report the new state (and optional maintenance end time) back via its
> heartbeat.
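The persist-and-report step described above could look roughly like the sketch below. It is illustrative only and not taken from the Ozone code base: the JSON file layout, the state strings, and the names `AdminState`, `persist_admin_state`, `load_admin_state` and `heartbeat_payload` are all assumptions made for this example.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AdminState:
    # State names are illustrative strings, not the project's real enum.
    operational_state: str = "IN_SERVICE"
    # Only meaningful for maintenance mode; None otherwise.
    maintenance_end_epoch_s: Optional[int] = None


def persist_admin_state(path: str, state: AdminState) -> None:
    """Write the state to a temp file, then rename it over the target,
    so a crash mid-write never leaves a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_admin_state(path: str) -> AdminState:
    """Read the persisted state; assume IN_SERVICE if nothing was stored."""
    if not os.path.exists(path):
        return AdminState()
    with open(path) as f:
        return AdminState(**json.load(f))


def heartbeat_payload(path: str) -> dict:
    """The admin-state portion a datanode would attach to its heartbeat."""
    return asdict(load_admin_state(path))
```

With this shape, the datanode would call `persist_admin_state` when the command arrives and `heartbeat_payload` on each heartbeat, so the reported value always reflects what is on disk.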
> The purpose of the DN persisting this information and heartbeating it back to
> SCM is to allow the operational state to be recovered after an SCM restart,
> as SCM does not persist any of this information. It also allows "Recon" to
> learn the datanode states.
> If SCM is restarted, then it will forget all knowledge of the datanodes. When
> they register, their operational state will be reported and SCM can set it
> correctly.
> Outside of registration (i.e. during normal heartbeats), the SCM state is the
> source of truth for the operational state. If the DN heartbeat reports a
> state that differs from the SCM state, SCM should issue another command to
> the datanode to set its state to the SCM value. The mismatch may simply be
> due to a command, triggered by the SCM state change, that the datanode has
> not yet processed, but the worst case is an extra command sent to the
> datanode. This is a very lightweight command, so that is not an issue.
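A minimal sketch of that heartbeat reconciliation follows, assuming SCM keeps an in-memory map of expected states. `ScmNodeStates` and `SetStateCommand` are hypothetical names, and comparing the maintenance end time along with the state is an assumption of this example rather than something the description specifies.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SetStateCommand:
    """Hypothetical command asking a datanode to persist a given state."""
    datanode_id: str
    operational_state: str
    maintenance_end_epoch_s: Optional[int] = None


class ScmNodeStates:
    """In-memory view of the expected state per datanode (SCM side)."""

    def __init__(self) -> None:
        self._expected: Dict[str, Tuple[str, Optional[int]]] = {}

    def set_state(self, dn_id: str, state: str,
                  end: Optional[int] = None) -> SetStateCommand:
        # An admin-triggered change: record it and queue a command to the DN.
        self._expected[dn_id] = (state, end)
        return SetStateCommand(dn_id, state, end)

    def on_heartbeat(self, dn_id: str, reported_state: str,
                     reported_end: Optional[int] = None
                     ) -> Optional[SetStateCommand]:
        # SCM is the source of truth: on any mismatch, re-send the expected
        # state. A duplicate command is harmless because it is idempotent.
        expected = self._expected[dn_id]
        if (reported_state, reported_end) == expected:
            return None
        return SetStateCommand(dn_id, *expected)
```

Because the corrective command carries the full expected state rather than a delta, sending it twice leaves the datanode in the same state, which is why the extra command in the race described above is not a problem.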
> One open question is whether to persist intermediate states on the DN. For
> example, for decommissioning, the DN would first persist "Decommissioning"
> and then transition to "Decommissioned" when SCM is satisfied all containers
> are replicated. It would be quite easy to persist both of these states in
> turn on the datanode. Alternatively, we set only the end state
> (Decommissioned) on the datanode and allow SCM to get the node to that
> state. For the latter, if SCM is restarted, then the DN will report
> "Decommissioned" on registration, but SCM will set its internal state to
> Decommissioning and then ensure all containers are replicated before
> transitioning the node to Decommissioned. This seems like a safer approach,
> but there are also advantages to tracking the intermediate states on the DNs.
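The second option (persist only the end state, and let SCM re-verify after a restart) can be sketched as below. The function names and state strings are hypothetical, and only the decommission path discussed above is modelled.

```python
# Hypothetical mapping from the end state a datanode reports on registration
# to the state SCM resumes from. Only decommission is covered here.
RESUME_FROM = {
    "DECOMMISSIONED": "DECOMMISSIONING",
}


def state_on_registration(reported_state: str) -> str:
    """SCM has no memory after a restart, so it does not trust a reported
    end state outright; it restarts the workflow one step earlier."""
    return RESUME_FROM.get(reported_state, reported_state)


def advance(scm_state: str, all_containers_replicated: bool) -> str:
    """Move to the end state only once SCM has re-checked replication."""
    if scm_state == "DECOMMISSIONING" and all_containers_replicated:
        return "DECOMMISSIONED"
    return scm_state
```

For example, a node that registers as "DECOMMISSIONED" is tracked as "DECOMMISSIONING" until `advance` sees that all of its containers are replicated, at which point it returns to "DECOMMISSIONED".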