Yiqun Lin created HDDS-3241:
-------------------------------

             Summary: Invalid container reported to SCM should be deleted
                 Key: HDDS-3241
                 URL: https://issues.apache.org/jira/browse/HDDS-3241
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
            Reporter: Yiqun Lin
            Assignee: Yiqun Lin


For an invalid or outdated container reported by a Datanode, 
ContainerReportHandler in SCM only prints an error log and doesn't take any action.

{noformat}
2020-03-15 05:19:41,072 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 37 from datanode 0d98dfab-9d34-46c3-93fd-6b64b65ff543{ip: xx.xx.xx.xx, host: lyq-xx.xx.xx.xx, networkLocation: /dc2/rack1, certSerialId: null}.
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: Container with id #37 not found.
        at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:542)
        at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.getContainerInfo(ContainerStateMap.java:188)
        at org.apache.hadoop.hdds.scm.container.ContainerStateManager.getContainer(ContainerStateManager.java:484)
        at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainer(SCMContainerManager.java:204)
        at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:126)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2020-03-15 05:19:41,073 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 38 from datanode 0d98dfab-9d34-46c3-93fd-6b64b65ff543{ip: xx.xx.xx.xx, host: lyq-xx.xx.xx.xx, networkLocation: /dc2/rack1, certSerialId: null}.
org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: Container with id #38 not found.
        at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:542)
        at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.getContainerInfo(ContainerStateMap.java:188)
        at org.apache.hadoop.hdds.scm.container.ContainerStateManager.getContainer(ContainerStateManager.java:484)
        at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainer(SCMContainerManager.java:204)
        at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:126)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
        at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
{noformat}

SCM should instead instruct the Datanode to delete its outdated container. 
Otherwise, the Datanode will keep reporting this invalid container, and the 
stale container data will remain on the Datanode indefinitely. This can happen, 
for example, when we bring back a node that has been repaired and it still 
stores stale data.

If this approach is considered risky, we could add a setting to control the 
auto-deletion behavior.
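
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the actual SCM code: the class UnknownContainerSketch, the DeleteCommand type, and the deleteUnknownContainers flag are all hypothetical names invented for illustration, and the real handler would queue a command through SCM's own command mechanism rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; none of these names are real Ozone APIs. */
public class UnknownContainerSketch {

  /** Stand-in for a "delete container" command queued for a Datanode. */
  static final class DeleteCommand {
    final String datanodeId;
    final long containerId;

    DeleteCommand(String datanodeId, long containerId) {
      this.datanodeId = datanodeId;
      this.containerId = containerId;
    }
  }

  // Container IDs that SCM knows about (assumed to come from SCM's state).
  private final Set<Long> knownContainers;
  // Assumed setting that guards the auto-deletion behavior.
  private final boolean deleteUnknownContainers;
  // Commands that would be sent back to Datanodes.
  private final List<DeleteCommand> queuedCommands = new ArrayList<>();

  UnknownContainerSketch(Set<Long> knownContainers,
                         boolean deleteUnknownContainers) {
    this.knownContainers = new HashSet<>(knownContainers);
    this.deleteUnknownContainers = deleteUnknownContainers;
  }

  /** Processes one container report from a single Datanode. */
  void processReport(String datanodeId, List<Long> reportedIds) {
    for (long id : reportedIds) {
      if (knownContainers.contains(id)) {
        continue; // Known container: normal replica processing goes here.
      }
      // Current behavior: only an error is logged.
      System.err.println("Received container report for an unknown container "
          + id + " from datanode " + datanodeId);
      // Proposed behavior: also ask the Datanode to delete it, if enabled.
      if (deleteUnknownContainers) {
        queuedCommands.add(new DeleteCommand(datanodeId, id));
      }
    }
  }

  List<DeleteCommand> getQueuedCommands() {
    return queuedCommands;
  }

  public static void main(String[] args) {
    Set<Long> known = new HashSet<>();
    known.add(1L);
    UnknownContainerSketch handler = new UnknownContainerSketch(known, true);

    List<Long> report = new ArrayList<>();
    report.add(1L);
    report.add(37L);
    report.add(38L);
    handler.processReport("dn-1", report);

    for (DeleteCommand cmd : handler.getQueuedCommands()) {
      System.out.println("delete container " + cmd.containerId
          + " on " + cmd.datanodeId);
    }
  }
}
```

With the flag set to false, the sketch falls back to today's behavior and only logs the error, which is the safer default if auto-deletion is considered risky.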
 


