Sammi Chen created HDDS-3945:
--------------------------------
Summary: ContainerReplicaNotFoundException when removing a replica in ContainerReportHandler
Key: HDDS-3945
URL: https://issues.apache.org/jira/browse/HDDS-3945
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Sammi Chen
It's not easy to reproduce.
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Container #54339 is
over replicated. Expected replica count is 3, but found 16.
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
826dda09-1259-4c5c-9a80-56b985665dc4{ip: 9.180.6.157, host: host-9-180-6-157,
networkLocation: /rack10, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
6f87886a-745b-4eb6-9b4b-54e1f909f20c{ip: 9.180.13.218, host: host-9-180-13-218,
networkLocation: /rack2, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
d3336357-8920-4a4e-a12f-e57da1640c4d{ip: 9.180.20.94, host: host-9-180-20-94,
networkLocation: /rack1, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
7b4edd6e-5787-4574-9928-810514a05d2b{ip: 9.179.142.222, host: host222,
networkLocation: /rack2, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
5b36ed4f-4a6b-4014-b181-235789956d34{ip: 9.180.8.67, host: host-9-180-8-67,
networkLocation: /rack10, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
d35f7754-3914-4e3a-ac91-4ae26e08e8a7{ip: 9.180.19.144, host: host-9-180-19-144,
networkLocation: /rack3, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
db854037-4846-4093-89de-e492e0f14239{ip: 9.179.142.198, host: host198,
networkLocation: /rack3, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
228dacd3-36cf-4473-93ec-c06a739a8a2d{ip: 9.180.8.87, host: host-9-180-8-87,
networkLocation: /rack10, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
2e1b2fdd-f8fb-4252-bfc1-31d5339681be{ip: 9.179.144.104, host:
host-9-179-144-104, networkLocation: /rack2, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
1904b912-998d-43ba-9e54-f7e7c40c1759{ip: 9.180.21.100, host: host-9-180-21-100,
networkLocation: /rack2, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
dd64e953-bdef-4dae-a4c5-51aa7114ea0a{ip: 9.180.8.40, host: host-9-180-8-40,
networkLocation: /rack10, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
47cdfded-e88f-44f3-81b9-4f95e65e364f{ip: 9.180.8.78, host: host-9-180-8-78,
networkLocation: /rack10, certSerialId: null}
2020-07-04 16:14:19,820 [ReplicationMonitor] INFO
org.apache.hadoop.hdds.scm.container.ReplicationManager: Sending delete
container command for container #54339 to datanode
11974d80-c4ff-4963-81fa-873888feaa24{ip: 9.180.8.58, host: host-9-180-8-58,
networkLocation: /rack10, certSerialId: null}
2020-07-04 16:18:29,709 [EventQueue-ContainerReportForContainerReportHandler]
ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Exception
while processing container report for container 54339 from datanode
7b4edd6e-5787-4574-9928-810514a05d2b{ip: 9.179.142.222, host: host222,
networkLocation: /rack2, certSerialId: null}.
org.apache.hadoop.hdds.scm.container.ContainerReplicaNotFoundException:
Container #54339, replica: ContainerReplica{containerID=#54339,
datanodeDetails=7b4edd6e-5787-4574-9928-810514a05d2b{ip: 9.179.142.222, host:
host222, networkLocation: /rack2, certSerialId: null},
placeOfBirth=ca0dedd0-f586-4f99-986b-3a953dfc2dde, sequenceId=4249}
at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.removeContainerReplica(ContainerStateMap.java:256)
at org.apache.hadoop.hdds.scm.container.ContainerStateManager.removeContainerReplica(ContainerStateManager.java:534)
at org.apache.hadoop.hdds.scm.container.SCMContainerManager.removeContainerReplica(SCMContainerManager.java:560)
at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerReplica(AbstractContainerReportHandler.java:234)
at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:81)
at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:163)
at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:131)
at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:51)
at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)