Nilotpal Nandi created HDDS-3317:
------------------------------------

             Summary: replication manager failing to re-replicate under-replicated containers
                 Key: HDDS-3317
                 URL: https://issues.apache.org/jira/browse/HDDS-3317
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
            Reporter: Nilotpal Nandi


Properties set:
------------------
"ozone.scm.stale.node.interval": "2m",
"ozone.scm.dead.node.interval": "4m",
"hdds.scm.replication.thread.interval": "12s",
"ozone.scm.container.size": "1GB"

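For reference, a rough sketch of the time budget these settings imply. It assumes (this is not confirmed anywhere in this report) that SCM marks a datanode dead once ozone.scm.dead.node.interval has elapsed without a heartbeat, and that the replication manager picks up the under-replicated container on its next pass. Heartbeat and report processing delays are ignored. parse_duration is a hypothetical helper written for this sketch, not an Ozone API.

```python
import re
from datetime import timedelta

# Map the unit suffixes used in the settings to timedelta keyword arguments.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(value):
    """Parse strings such as '2m' or '12s' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|m|h|d)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


# Values copied from the "properties set" list in this report.
settings = {
    "ozone.scm.stale.node.interval": "2m",
    "ozone.scm.dead.node.interval": "4m",
    "hdds.scm.replication.thread.interval": "12s",
}

dead_after = parse_duration(settings["ozone.scm.dead.node.interval"])
replication_tick = parse_duration(settings["hdds.scm.replication.thread.interval"])

# Assumption: node declared dead, then one replication manager pass at most.
expected_window = dead_after + replication_tick
observed_wait = timedelta(minutes=25)  # "more than 25 minutes" in this report

print("expected window:", expected_window)
print("observed wait exceeds budget:", observed_wait > expected_window)
```

Under those assumptions re-replication should have been scheduled within roughly 4m12s of the shutdown, so the wait of more than 25 minutes described in this report is about six times over that budget.
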
Steps taken:
-----------------
1) Wrote a key (smaller than the block size).

2) Shut down two of the datanodes holding replicas of the container.

3) Queried container info for the container.

After some time, the entries for the shut-down datanodes disappeared from the 
output of the query command.

But the container, which is now under-replicated, was never re-replicated even 
after waiting more than 25 minutes.
{noformat}
ozone scmcli container info 34 | egrep 'Container|Datanodes'
Wed Apr 1 11:20:09 UTC 2020
Container id: 34
Container State: CLOSED
Container Path: /hadoop-ozone/datanode/data/hdds/48873cab-51ef-44f9-8995-86a377038c78/current/containerDir0/34/metadata
Container Metadata:
Datanodes: [quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site]

{noformat}
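
The check done by eye on the output above can be sketched as follows. This is only an illustration: the replication factor of three is inferred from the report (two replica datanodes shut down, one left), the comma-separated form of a multi-host "Datanodes: [...]" list is an assumption, and datanodes_from_info is a hypothetical helper, not part of the ozone CLI.

```python
import re

# Assumption: the container was written with replication factor three.
REPLICATION_FACTOR = 3


def datanodes_from_info(output):
    """Return the hosts listed in the 'Datanodes: [...]' field of the output."""
    match = re.search(r"Datanodes:\s*\[([^\]]*)\]", output)
    if match is None:
        return []
    # Assumption: several hosts would be separated by commas.
    return [host.strip() for host in match.group(1).split(",") if host.strip()]


# Fields copied from the container info output in this report.
sample = (
    "Container id: 34\n"
    "Container State: CLOSED\n"
    "Datanodes: [quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site]\n"
)

replicas = datanodes_from_info(sample)
missing = REPLICATION_FACTOR - len(replicas)

print("replicas:", replicas)
print("missing replicas:", missing)
```

With the output from this report the sketch finds one replica and two missing, which is the under-replicated state described above.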
 

It seems the pipeline close command is failing.
SCM log snippet:
------------------
{noformat}
2020-04-01 11:02:40,757 ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Could not execute pipeline action=action: CLOSE
closePipeline {
  pipelineID {
    id: "7eea320c-7d56-45f4-bc39-a3f492b0c74e"
  }
  reason: PIPELINE_FAILED
  detailedReason: "ea2322d9-8ede-4f48-a72d-693e809d2b95 has not seen follower/s 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1 for 304999ms"
}
 pipeline=PipelineID=7eea320c-7d56-45f4-bc39-a3f492b0c74e {}
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=7eea320c-7d56-45f4-bc39-a3f492b0c74e not found
        at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getPipeline(PipelineStateManager.java:63)
        at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getPipeline(SCMPipelineManager.java:243)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:59)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:35)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2020-04-01 11:02:40,758 ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Could not execute pipeline action=action: CLOSE
closePipeline {
  pipelineID {
    id: "7eea320c-7d56-45f4-bc39-a3f492b0c74e"
  }
  reason: PIPELINE_FAILED
  detailedReason: "ea2322d9-8ede-4f48-a72d-693e809d2b95 has not seen follower/s 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1 for 305000ms"
}
 pipeline=PipelineID=7eea320c-7d56-45f4-bc39-a3f492b0c74e {}
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=7eea320c-7d56-45f4-bc39-a3f492b0c74e not found
        at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getPipeline(PipelineStateManager.java:63)
        at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getPipeline(SCMPipelineManager.java:243)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:59)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:35)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2020-04-01 11:02:40,758 ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Could not execute pipeline action=action: CLOSE
closePipeline {
  pipelineID {
    id: "7eea320c-7d56-45f4-bc39-a3f492b0c74e"
  }
  reason: PIPELINE_FAILED
  detailedReason: "ea2322d9-8ede-4f48-a72d-693e809d2b95 has not seen follower/s 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1 for 307501ms"
}
 pipeline=PipelineID=7eea320c-7d56-45f4-bc39-a3f492b0c74e {}
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=7eea320c-7d56-45f4-bc39-a3f492b0c74e not found
        at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getPipeline(PipelineStateManager.java:63)
        at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getPipeline(SCMPipelineManager.java:243)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:59)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:35)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
{noformat}
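
All three errors in the snippet above name the same pipeline and the same follower, with a gap of about 305 seconds. A small sketch of how that can be pulled out of a longer SCM log follows. The regular expression is written against the closePipeline text exactly as it appears in this snippet; whether other Ozone versions print it the same way is an assumption, and failed_closes is a hypothetical helper.

```python
import re
from collections import defaultdict

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

# Matches one closePipeline action; \s+ tolerates the mail line wrap
# between "follower/s" and the follower ID.
CLOSE_ACTION = re.compile(
    r'closePipeline \{\s*pipelineID \{\s*id: "(?P<pipeline>' + UUID + r')"\s*\}\s*'
    r'reason: (?P<reason>\w+)\s*'
    r'detailedReason: "(?P<leader>' + UUID + r') has not seen follower/s\s+'
    r'(?P<follower>' + UUID + r') for (?P<gap_ms>\d+)ms"'
)


def failed_closes(log_text):
    """Group the reported follower gaps (in ms) by pipeline ID."""
    gaps = defaultdict(list)
    for match in CLOSE_ACTION.finditer(log_text):
        gaps[match["pipeline"]].append(int(match["gap_ms"]))
    return dict(gaps)


# Two of the actions from this report, one wrapped and one on a single line.
SAMPLE_LOG = '''\
closePipeline {
  pipelineID {
    id: "7eea320c-7d56-45f4-bc39-a3f492b0c74e"
  }
  reason: PIPELINE_FAILED
  detailedReason: "ea2322d9-8ede-4f48-a72d-693e809d2b95 has not seen follower/s
92f73ec3-9ed8-41c8-9103-c4c1b2b365e1 for 304999ms"
}
closePipeline {
  pipelineID {
    id: "7eea320c-7d56-45f4-bc39-a3f492b0c74e"
  }
  reason: PIPELINE_FAILED
  detailedReason: "ea2322d9-8ede-4f48-a72d-693e809d2b95 has not seen follower/s 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1 for 307501ms"
}
'''

gaps = failed_closes(SAMPLE_LOG)
for pipeline, values in gaps.items():
    print(pipeline, "close failed", len(values), "times, gaps (ms):", values)
```

Grouping by pipeline ID makes it easy to see whether the PipelineNotFoundException is limited to one pipeline, as it is here, or affects several.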


