[ https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shashikant Banerjee updated HDDS-2696:
--------------------------------------
Target Version/s: 0.7.0 (was: 0.6.0)
> Document recovery from RATIS-677
> --------------------------------
>
> Key: HDDS-2696
> URL: https://issues.apache.org/jira/browse/HDDS-2696
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Istvan Fajth
> Priority: Critical
> Labels: Triaged
>
> RATIS-677 was resolved in a way that requires changing a setting so that the
> RatisServer implementation ignores the corruption. At the moment, due to
> HDDS-2647, we do not have a clear recovery path from a Ratis corruption in
> the pipeline data.
> We should document how to recover from this. I have an idea that involves
> closing the pipeline in SCM and removing the Ratis metadata for the pipeline
> on the DataNodes, which effectively clears the corrupted pipeline out of the
> system.
> I have two problems with finding and documenting a recovery path:
> - I am not sure we have strong enough guarantees that writes completed
> correctly if the Ratis metadata can become corrupt, so this needs to be
> investigated.
> - At the moment I cannot validate this approach: if I perform the steps (stop
> the 3 DNs, move out the Ratis data for the pipeline, close the pipeline with
> scmcli, then restart the DNs), the pipeline is not closed properly, and SCM
> fails as described in HDDS-2695.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)