[
https://issues.apache.org/jira/browse/HDDS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephen O'Donnell updated HDDS-2592:
------------------------------------
Description:
When the operational state of a datanode changes, an async command should be
triggered to persist the new state on that datanode. For maintenance mode, the
datanode should also store the maintenance end time. The datanode will then
report the new state (and optional maintenance end time) back via its heartbeat.
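As a rough illustration of what such a command might carry, here is a minimal
Java sketch. The names NodeOperationalState and SetOperationalStateCommand, the
enum values, and the use of epoch seconds for the end time are all assumptions
made for illustration; none of them is taken from the Ozone code base.

```java
import java.util.OptionalLong;

// Hypothetical operational states; the names are assumptions for illustration.
enum NodeOperationalState {
  IN_SERVICE,
  DECOMMISSIONING,
  DECOMMISSIONED,
  ENTERING_MAINTENANCE,
  IN_MAINTENANCE;

  boolean isMaintenance() {
    return this == ENTERING_MAINTENANCE || this == IN_MAINTENANCE;
  }
}

// Hypothetical payload SCM could send to a datanode. The datanode would
// persist both fields and echo them back in its heartbeat.
final class SetOperationalStateCommand {
  private final NodeOperationalState state;
  // Empty when no end time applies; only meaningful for maintenance states.
  private final OptionalLong maintenanceEndEpochSeconds;

  SetOperationalStateCommand(NodeOperationalState state,
                             OptionalLong maintenanceEndEpochSeconds) {
    if (maintenanceEndEpochSeconds.isPresent() && !state.isMaintenance()) {
      throw new IllegalArgumentException(
          "An end time is only valid for maintenance states, got " + state);
    }
    this.state = state;
    this.maintenanceEndEpochSeconds = maintenanceEndEpochSeconds;
  }

  NodeOperationalState getState() {
    return state;
  }

  OptionalLong getMaintenanceEndEpochSeconds() {
    return maintenanceEndEpochSeconds;
  }
}
```

Keeping the end time optional matches the "optional maintenance end time"
above, and rejecting it for non-maintenance states is a design choice of this
sketch, not a stated requirement.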
The purpose of the DN persisting this information and heartbeating it back to
SCM is to allow the operational state to be recovered after an SCM restart, as
SCM does not persist any of this information. It also allows "Recon" to learn
the datanode states.
If SCM is restarted, then it will forget all knowledge of the datanodes. When
they register, their operational state will be reported and SCM can set it
correctly.
Outside of registration (i.e. during normal heartbeats), the SCM state is the
source of truth for the operational state. If the DN heartbeat reports a state
that differs from SCM's, SCM should issue another command to the datanode to
set its state to the SCM value. There is a chance the mismatch is due to an
unprocessed command triggered by the SCM state change, but the worst case is an
extra command sent to the datanode. This is a very lightweight command, so that
is not an issue.
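The heartbeat check described above could be sketched roughly as follows. This
is only an illustration: HeartbeatReconciler, commandFor and the enum values
are hypothetical names, and the end time is left out to keep it short.

```java
import java.util.Optional;

// Hypothetical operational states; the names are assumptions for illustration.
enum NodeOperationalState {
  IN_SERVICE,
  DECOMMISSIONING,
  DECOMMISSIONED,
  ENTERING_MAINTENANCE,
  IN_MAINTENANCE
}

// Hypothetical helper run by SCM for each normal (non-registration) heartbeat.
final class HeartbeatReconciler {
  private HeartbeatReconciler() {
  }

  // Returns the state SCM should re-send to the datanode, or empty when the
  // reported state already matches. SCM's view always wins here.
  static Optional<NodeOperationalState> commandFor(
      NodeOperationalState scmState, NodeOperationalState reportedState) {
    if (scmState == reportedState) {
      return Optional.empty();
    }
    // A mismatch may just mean an earlier command is still in flight; in that
    // case the datanode receives one redundant, cheap command.
    return Optional.of(scmState);
  }
}
```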
One open question is whether to persist intermediate states on the DN. For
example, for decommissioning, the DN would first persist "Decommissioning" and
then transition to "Decommissioned" when SCM is satisfied all containers are
replicated. It would be quite easy to persist both of these states in turn on
the datanode. Alternatively, we set only the end state (Decommissioned) on the
datanode and allow SCM to get the node to that state. For the latter, if SCM is
restarted, then the DN will report "Decommissioned" on registration, but SCM
will set its internal state to Decommissioning and then ensure all containers
are replicated before transitioning the node to Decommissioned. This seems like
a safer approach, but there are advantages to tracking the intermediate states
on the DNs too.
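For the second option (only the end state is stored on the DN), the mapping SCM
applies at registration might look like the sketch below. The class and method
names are hypothetical, and treating maintenance the same way as decommission
is an assumption by analogy; only the decommission case is described above.

```java
// Hypothetical operational states; the names are assumptions for illustration.
enum NodeOperationalState {
  IN_SERVICE,
  DECOMMISSIONING,
  DECOMMISSIONED,
  ENTERING_MAINTENANCE,
  IN_MAINTENANCE
}

// Hypothetical mapping from the state a datanode reports at registration to
// the state SCM starts tracking internally after a restart.
final class RegistrationStateMapper {
  private RegistrationStateMapper() {
  }

  static NodeOperationalState initialScmState(NodeOperationalState reported) {
    switch (reported) {
      case DECOMMISSIONED:
        // Re-check replication before trusting the end state.
        return NodeOperationalState.DECOMMISSIONING;
      case IN_MAINTENANCE:
        // Assumption: maintenance is re-verified in the same way.
        return NodeOperationalState.ENTERING_MAINTENANCE;
      default:
        return reported;
    }
  }
}
```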
was:
When a node is decommissioned or put into maintenance, SCM will receive the
command to kick off the workflow. As part of that workflow, it should issue a
further command to the datanode to set the datanode as either:
maintenance
decommissioned
in_service (this is the default state)
This state should be persisted in the datanode yaml file so it survives reboots.
Upon receiving this command, the datanode will return a new state for all its
containers in the next container report.
For all closed containers it should return a state of DECOMMISSIONED or
MAINTENANCE accordingly, while non-closed containers should return their
original value until they are closed. That way SCM can monitor for unclosed
containers as part of the decommission flow.
I don't believe there is any need for the datanode to have multiple states for
each admin state (e.g. decommissioning + decommissioned / entering_maintenance +
in_maintenance) as those are only really relevant to SCM. Instead it should be
enough to set the datanode state once and assume SCM will cause it to
eventually reach that state.
These states will be added via HDDS-2459 to progress the changes in the
Replication Manager on the SCM side:
{code}
ContainerReplicaProto.State.DECOMMISSIONED
ContainerReplicaProto.State.MAINTENANCE
{code}
> Add Datanode command to allow the datanode to persist its admin state
> ----------------------------------------------------------------------
>
> Key: HDDS-2592
> URL: https://issues.apache.org/jira/browse/HDDS-2592
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Datanode, SCM
> Affects Versions: 0.5.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major