[
https://issues.apache.org/jira/browse/HDDS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arpit Agarwal resolved HDDS-3066.
---------------------------------
Fix Version/s: 0.5.0 (was: 0.6.0)
Resolution: Fixed
Cherry-picked to ozone-0.5.0.
> SCM startup failed during loading containers from DB
> -----------------------------------------------------
>
> Key: HDDS-3066
> URL: https://issues.apache.org/jira/browse/HDDS-3066
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Blocker
> Labels: OMHATest, pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> This happens because the pipeline scrubber closed the pipeline, removed it
> from the DB, and triggered close-container events to move its containers to
> CLOSING. If SCM is restarted before the close container command is handled
> and the state changes to CLOSING, the issue below can occur.
>
> This can also happen in other scenarios, for example when safeModeHandler
> calls finalizeAndDestroyPipeline and SCM is then restarted.
>
> The root cause is that the pipeline has been removed from the DB while the
> container is still in the OPEN state. When SCM tries to get the pipeline
> during startup, it crashes with a {{PipelineNotFoundException}}.
> {code:java}
> 2020-02-21 13:57:34,888 [main] ERROR org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start failed with exception
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e not found
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.addContainerToPipeline(PipelineStateManager.java:59)
>     at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.addContainerToPipeline(SCMPipelineManager.java:309)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.loadExistingContainers(SCMContainerManager.java:121)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.<init>(SCMContainerManager.java:107)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:412)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:283)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:215)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:612)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:142)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:117)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:66)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42)
>     at picocli.CommandLine.execute(CommandLine.java:1173)
>     at picocli.CommandLine.access$800(CommandLine.java:141)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
>     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>     at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
>     at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:55)
> 2020-02-21 13:57:34,892 [shutdown-hook-0] INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down StorageContainerManager at om-ha-1.vpc.cloudera.com/10.65.51.49
> ************************************************************/{code}
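
The failure mode in the description (an OPEN container whose pipeline is no longer in the DB) can be illustrated with a minimal, self-contained sketch. Every type here (ContainerLoadSketch, PipelineRegistry, Container, and the local PipelineNotFoundException) is a hypothetical stand-in, not the real SCM class of the same or similar name. Moving the orphaned container to CLOSING is an assumption about one reasonable way to tolerate the missing pipeline; it is not taken from the committed patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainerLoadSketch {

  /** Simplified lifecycle states; the real SCM tracks more than these. */
  enum State { OPEN, CLOSING, CLOSED }

  /** Hypothetical stand-in for the exception seen in the stack trace. */
  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String pipelineId) {
      super("PipelineID=" + pipelineId + " not found");
    }
  }

  /** Hypothetical, minimal container record as it might be read from the DB. */
  static class Container {
    final long id;
    final String pipelineId;
    State state;

    Container(long id, String pipelineId, State state) {
      this.id = id;
      this.pipelineId = pipelineId;
      this.state = state;
    }
  }

  /** Hypothetical registry that only knows which pipeline IDs survived in the DB. */
  static class PipelineRegistry {
    private final Set<String> pipelines;

    PipelineRegistry(Set<String> pipelines) {
      this.pipelines = new HashSet<>(pipelines);
    }

    void addContainerToPipeline(String pipelineId, long containerId)
        throws PipelineNotFoundException {
      if (!pipelines.contains(pipelineId)) {
        throw new PipelineNotFoundException(pipelineId);
      }
      // A real implementation would record the container-to-pipeline mapping here.
    }
  }

  /**
   * Re-attaches OPEN containers to their pipelines. Instead of letting a
   * missing pipeline abort the whole load, the container is moved to CLOSING
   * (an assumption of this sketch) and its ID is returned to the caller.
   */
  static List<Long> loadExistingContainers(List<Container> containers,
                                           PipelineRegistry registry) {
    List<Long> orphaned = new ArrayList<>();
    for (Container c : containers) {
      if (c.state != State.OPEN) {
        continue; // only OPEN containers are attached to a pipeline
      }
      try {
        registry.addContainerToPipeline(c.pipelineId, c.id);
      } catch (PipelineNotFoundException e) {
        // The pipeline was removed before the close command was handled.
        c.state = State.CLOSING;
        orphaned.add(c.id);
      }
    }
    return orphaned;
  }

  public static void main(String[] args) {
    Set<String> surviving = new HashSet<>();
    surviving.add("pipeline-a");

    List<Container> containers = new ArrayList<>();
    containers.add(new Container(1L, "pipeline-a", State.OPEN));
    containers.add(new Container(2L, "pipeline-removed", State.OPEN));

    List<Long> orphaned =
        loadExistingContainers(containers, new PipelineRegistry(surviving));
    System.out.println("Containers moved to CLOSING: " + orphaned);
  }
}
```

In this sketch the exception is handled per container, so one stale pipeline reference does not prevent the remaining containers from loading.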