[ https://issues.apache.org/jira/browse/HDDS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marton Elek resolved HDDS-2695.
-------------------------------
Resolution: Cannot Reproduce
> SCM is not able to start under certain conditions
> -------------------------------------------------
>
> Key: HDDS-2695
> URL: https://issues.apache.org/jira/browse/HDDS-2695
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: Istvan Fajth
> Assignee: Istvan Fajth
> Priority: Critical
> Labels: Triaged
>
> Given
> - a cluster where RATIS-677 happened, and DataNodes are already failing to
> start properly due to the issue
> When
> - I restart the cluster and start to see the exceptions described in
> RATIS-677
> - I stop the 3 DNs that host the failing pipeline
> - remove the Ratis metadata for the pipeline
> - close the pipeline with scmcli
> - restart the 3 DNs
> Then
> - SCM is unable to come out of safe mode, the log shows the following
> possible reason:
> {code}
> 2019-12-09 01:13:38,437 INFO
> org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode.
> Pipelines with at least one datanode reported count is 0, required at least
> one datanode reported per pipeline count is 4
> {code}
> If I then restart the SCM, it fails without logging any exception, and
> the last message written to standard error is the following:
> {code}
> PipelineID=<id_of_pipeline_that_has_been_closed> not found
> {code}
> Also, after I closed the pipeline, scmcli no longer listed it when I checked
> the active pipelines.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)