[ 
https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2696:
--------------------------------------
    Target Version/s: 0.7.0  (was: 0.6.0)

> Document recovery from RATIS-677
> --------------------------------
>
>                 Key: HDDS-2696
>                 URL: https://issues.apache.org/jira/browse/HDDS-2696
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: Ozone Datanode
>            Reporter: Istvan Fajth
>            Priority: Critical
>              Labels: Triaged
>
> RATIS-677 was resolved by adding a setting that has to be changed and set on
> the RatisServer implementation so that it ignores the corruption. Because of
> HDDS-2647, we currently have no clear recovery path from a Ratis
> corruption in the pipeline data.
> We should document how to recover from this. One idea is to close the
> pipeline in SCM and remove the Ratis metadata for the pipeline on
> the DataNodes, which effectively clears the corrupted pipeline out of the
> system.
> I have two problems with finding and documenting a recovery path:
> - I am not sure we have strong enough guarantees that the writes happened
> properly if the Ratis metadata can become corrupt, so this needs to be
> investigated.
> - At the moment I cannot validate this approach: when I follow the steps
> (stop the 3 DNs, move out the Ratis data for the pipeline, close the pipeline
> with scmcli, then restart the DNs), the pipeline is not closed properly and
> SCM fails as described in HDDS-2695.
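
The "move out ratis data for pipeline" step from the description could be sketched roughly as below. This is only an illustration, not a validated recovery procedure: the function name, the backup location, and the assumption that each DataNode keeps one subdirectory per pipeline (named after the pipeline ID) under its Ratis storage directory are all hypothetical. It is meant to run on a DataNode that has already been stopped; closing the pipeline with scmcli and restarting the DataNodes are separate steps not shown here.

```python
import shutil
from pathlib import Path


def move_out_pipeline_dir(ratis_root, pipeline_id, backup_root):
    """Move <ratis_root>/<pipeline_id> to <backup_root>/<pipeline_id>.

    Hypothetical helper. Assumes the DataNode is stopped and that the
    Ratis storage directory holds one subdirectory per pipeline, named
    after the pipeline ID (an assumption, not confirmed by the issue).

    Returns the new location, or None if this DataNode has no data for
    the pipeline. The data is moved rather than deleted so it can still
    be inspected afterwards.
    """
    src = Path(ratis_root) / pipeline_id
    if not src.is_dir():
        return None

    dst = Path(backup_root) / pipeline_id
    if dst.exists():
        # Refuse to overwrite an earlier backup of the same pipeline.
        raise FileExistsError(f"backup target already exists: {dst}")

    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```

Moving the directory instead of deleting it keeps the possibly corrupted Raft log available for the investigation mentioned in the first bullet.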



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

