bharatviswa504 opened a new pull request #596: HDDS-3066. SCM crash during 
loading containers to DB.
URL: https://github.com/apache/hadoop-ozone/pull/596
 
 
   ## What changes were proposed in this pull request?
   
   This happens when the pipeline scrubber removes a pipeline: it closes the pipeline, removes it from the DB, and triggers close container commands to move the pipeline's containers to CLOSING. If SCM is restarted before a close container command is handled and the container state is changed to CLOSING, the issue below can happen.
   
    
   
   The same can happen in other scenarios, for example when safeModeHandler calls finalizeAndDestroyPipeline and SCM is then restarted.
   
    
   
   The root cause is that the pipeline has been removed from the DB while the container is still in the OPEN state. On restart, looking up the pipeline for that container throws PipelineNotFoundException, which crashes SCM.
   
   
   
   `2020-02-21 13:57:34,888 [main] ERROR org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start failed with exception
   org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e not found
    at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
    at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110)
    at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.addContainerToPipeline(PipelineStateManager.java:59)
    at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.addContainerToPipeline(SCMPipelineManager.java:309)
    at org.apache.hadoop.hdds.scm.container.SCMContainerManager.loadExistingContainers(SCMContainerManager.java:121)
    at org.apache.hadoop.hdds.scm.container.SCMContainerManager.<init>(SCMContainerManager.java:107)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:412)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:283)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:215)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:612)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:142)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:117)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:66)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42)
    at picocli.CommandLine.execute(CommandLine.java:1173)
    at picocli.CommandLine.access$800(CommandLine.java:141)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
    at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
    at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:55)
   2020-02-21 13:57:34,892 [shutdown-hook-0] INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
   /************************************************************
   SHUTDOWN_MSG: Shutting down StorageContainerManager at om-ha-1.vpc.cloudera.com/10.65.51.49
   ************************************************************/
   `
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3066
   
   ## How was this patch tested?
   
   Existing tests. Also deployed the fix on the cluster; SCM is now able to boot up.
   
   `2020-02-24 12:02:12,531 [main] WARN 
org.apache.hadoop.hdds.scm.container.SCMContainerManager: Found a Container 
ContainerInfo{id=3, state=OPEN, 
pipelineID=PipelineID=afb60e8a-0a69-410a-8699-d2a75e053225, 
stateEnterTime=1159646, owner=om2} which is in OPEN state with out a pipeline 
PipelineID=afb60e8a-0a69-410a-8699-d2a75e053225. Triggering Close Container.`
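
   For readers who want the idea in code, here is a minimal, self-contained sketch of the behavior the warning above suggests: while loading containers, an OPEN container whose pipeline is missing is collected for closing instead of letting the exception abort startup. This is not the actual patch. `ContainerLoadSketch`, `PipelineRegistry`, `Container`, `State` and `containersOf` are hypothetical stand-ins, and the assumption that only OPEN containers are attached to a pipeline during load is mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-ins for SCM types; NOT the real Ozone classes. */
class ContainerLoadSketch {

  enum State { OPEN, CLOSING, CLOSED }

  /** Local stand-in for the exception named in the stack trace. */
  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String message) { super(message); }
  }

  static class Container {
    final long id;
    final String pipelineId;
    final State state;

    Container(long id, String pipelineId, State state) {
      this.id = id;
      this.pipelineId = pipelineId;
      this.state = state;
    }
  }

  /** Tracks which containers belong to which pipeline; throws for unknown pipelines. */
  static class PipelineRegistry {
    private final Map<String, Set<Long>> containersByPipeline = new HashMap<>();

    void addPipeline(String pipelineId) {
      containersByPipeline.put(pipelineId, new HashSet<>());
    }

    void addContainerToPipeline(String pipelineId, long containerId)
        throws PipelineNotFoundException {
      Set<Long> containers = containersByPipeline.get(pipelineId);
      if (containers == null) {
        throw new PipelineNotFoundException("PipelineID=" + pipelineId + " not found");
      }
      containers.add(containerId);
    }

    Set<Long> containersOf(String pipelineId) {
      return containersByPipeline.getOrDefault(pipelineId, new HashSet<>());
    }
  }

  /**
   * Attaches every OPEN container to its pipeline. If the pipeline is gone,
   * the container ID is returned for closing instead of failing the load.
   */
  static List<Long> loadExistingContainers(List<Container> containers,
                                           PipelineRegistry pipelines) {
    List<Long> toClose = new ArrayList<>();
    for (Container container : containers) {
      if (container.state != State.OPEN) {
        continue; // assumption: only OPEN containers are attached to a pipeline
      }
      try {
        pipelines.addContainerToPipeline(container.pipelineId, container.id);
      } catch (PipelineNotFoundException e) {
        // Pipeline was removed before the container left OPEN: schedule a close.
        toClose.add(container.id);
      }
    }
    return toClose;
  }

  public static void main(String[] args) {
    PipelineRegistry pipelines = new PipelineRegistry();
    pipelines.addPipeline("live-pipeline");
    List<Container> containers = Arrays.asList(
        new Container(1L, "live-pipeline", State.OPEN),
        new Container(3L, "removed-pipeline", State.OPEN));
    System.out.println("Containers to close: "
        + loadExistingContainers(containers, pipelines));
  }
}
```

   The caller would then issue close container commands for the returned IDs, so startup continues and the orphaned containers still move towards CLOSING.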
   
