[ https://issues.apache.org/jira/browse/HDDS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bharat Viswanadham updated HDDS-3066:
-------------------------------------
    Status: Patch Available  (was: Open)

> SCM startup failed during loading containers from DB
> -----------------------------------------------------
>
>                 Key: HDDS-3066
>                 URL: https://issues.apache.org/jira/browse/HDDS-3066
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This happens when the pipeline scrubber removes a pipeline: it closes the
> pipeline, removes it from the DB, and triggers close container commands to
> set the pipeline's containers to CLOSING. If SCM is restarted before the
> close container command is handled and the container state changes to
> CLOSING, the issue below can occur.
>
> The same can happen in other scenarios, for example when safeModeHandler
> calls finalizeAndDestroyPipeline and SCM is then restarted.
>
> The root cause is that the pipeline has been removed from the DB while the
> container is still in the OPEN state. When SCM then tries to get the
> pipeline for that container, it crashes with a
> {{PipelineNotFoundException}}.
>
> {code:java}
> 2020-02-21 13:57:34,888 [main] ERROR org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start failed with exception
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e not found
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.addContainerToPipeline(PipelineStateManager.java:59)
>     at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.addContainerToPipeline(SCMPipelineManager.java:309)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.loadExistingContainers(SCMContainerManager.java:121)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.<init>(SCMContainerManager.java:107)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:412)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:283)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:215)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:612)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:142)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:117)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:66)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42)
>     at picocli.CommandLine.execute(CommandLine.java:1173)
>     at picocli.CommandLine.access$800(CommandLine.java:141)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
>     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>     at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
>     at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
>     at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:55)
> 2020-02-21 13:57:34,892 [shutdown-hook-0] INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down StorageContainerManager at om-ha-1.vpc.cloudera.com/10.65.51.49
> ************************************************************/
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
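
The failure mode in the description (an OPEN container whose pipeline no longer exists in the DB) can be illustrated with a small, self-contained sketch. Everything below is a hypothetical stand-in: ContainerLoadSketch, PipelineMap, ContainerRecord and MissingPipelineException are invented for illustration and are not the real SCM classes, and the tolerant variant is only one possible way to handle the condition, not necessarily what the attached patch does. The sketch also assumes that only OPEN containers are attached to their pipeline at load time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; NOT the real Ozone SCM classes.
class ContainerLoadSketch {

  /** Plays the role that PipelineNotFoundException plays in the stack trace. */
  static class MissingPipelineException extends Exception {
    MissingPipelineException(String pipelineId) {
      super("PipelineID=" + pipelineId + " not found");
    }
  }

  /** A container entry as it might be read back from the DB at startup. */
  static class ContainerRecord {
    final long id;
    final String state;      // "OPEN", "CLOSING", "CLOSED", ...
    final String pipelineId;

    ContainerRecord(long id, String state, String pipelineId) {
      this.id = id;
      this.state = state;
      this.pipelineId = pipelineId;
    }
  }

  /** In-memory pipeline-to-containers map that is rebuilt on startup. */
  static class PipelineMap {
    private final Map<String, Set<Long>> containersByPipeline = new HashMap<>();

    void addPipeline(String pipelineId) {
      containersByPipeline.put(pipelineId, new HashSet<>());
    }

    void addContainer(String pipelineId, long containerId)
        throws MissingPipelineException {
      Set<Long> containers = containersByPipeline.get(pipelineId);
      if (containers == null) {
        throw new MissingPipelineException(pipelineId);
      }
      containers.add(containerId);
    }
  }

  /** Strict load: the first missing pipeline aborts the whole startup. */
  static void loadStrict(List<ContainerRecord> records, PipelineMap pipelines)
      throws MissingPipelineException {
    for (ContainerRecord r : records) {
      if ("OPEN".equals(r.state)) {
        pipelines.addContainer(r.pipelineId, r.id);
      }
    }
  }

  /**
   * Tolerant load (an assumption, not the confirmed fix): keep loading and
   * return the IDs of OPEN containers whose pipeline is gone, so the caller
   * can decide what to do with them, e.g. move them towards CLOSING.
   */
  static List<Long> loadTolerant(List<ContainerRecord> records,
      PipelineMap pipelines) {
    List<Long> orphaned = new ArrayList<>();
    for (ContainerRecord r : records) {
      if (!"OPEN".equals(r.state)) {
        continue;
      }
      try {
        pipelines.addContainer(r.pipelineId, r.id);
      } catch (MissingPipelineException e) {
        orphaned.add(r.id);
      }
    }
    return orphaned;
  }

  public static void main(String[] args) {
    PipelineMap pipelines = new PipelineMap();
    pipelines.addPipeline("p1"); // pipeline "p2" was removed before restart

    List<ContainerRecord> records = new ArrayList<>();
    records.add(new ContainerRecord(1L, "OPEN", "p1"));
    records.add(new ContainerRecord(2L, "OPEN", "p2")); // still OPEN, pipeline gone

    System.out.println("Orphaned open containers: "
        + loadTolerant(records, pipelines));
  }
}
```

With loadStrict, the second record makes startup fail in the same way as the reported trace; with loadTolerant, startup continues and the orphaned container IDs are handed back to the caller.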