[ https://issues.apache.org/jira/browse/HDDS-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118920#comment-17118920 ]
maobaolong commented on HDDS-3669:
----------------------------------

Add the following code to PipelineStateMap#removePipeline:

    // Also evict the removed pipeline from the cached OPEN pipeline list for its query.
    List<Pipeline> list = query2OpenPipelines.get(new PipelineQuery(pipeline));
    if (list != null) {
      if (list.remove(pipeline)) {
        LOG.warn("Remove a pipeline {} in query2OpenPipelines.", pipeline);
      }
    }


> SCM Infinite loop in BlockManagerImpl.allocateBlock
> ---------------------------------------------------
>
>                 Key: HDDS-3669
>                 URL: https://issues.apache.org/jira/browse/HDDS-3669
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.6.0
>            Reporter: maobaolong
>            Assignee: maobaolong
>            Priority: Major
>
> The following steps reproduce this issue:
> - Start a new Ozone cluster with only one factor-three pipeline.
> - Put a large file (1 GB) into the cluster, and kill the leader datanode of this pipeline while the put is in progress.
> The put command hangs, and the following log fills the SCM log file:
> 2020-05-27 17:32:46,988 [IPC Server handler 23 on default port 9863] WARN org.apache.hadoop.hdds.scm.container.SCMContainerManager: Container allocation failed for pipeline=Pipeline[ Id: bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1, Nodes: e859cad9-c7f6-451a-a039-af06103aa978{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}1cd2bf20-a791-42a0-b4cd-b26d995cb8eb{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}0827f3bb-0d94-435a-a157-4db2c84cdedf{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:3, State:OPEN, leaderId:0827f3bb-0d94-435a-a157-4db2c84cdedf, CreationTimestamp2020-05-27T08:05:36.590Z] requiredSize=268435456 {}
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1 not found
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getContainers(PipelineStateMap.java:301)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getContainers(PipelineStateManager.java:95)
>     at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getContainersInPipeline(SCMPipelineManager.java:360)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainersForOwner(SCMContainerManager.java:507)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getMatchingContainer(SCMContainerManager.java:428)
>     at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:230)
>     at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:190)
>     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:167)
>     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:119)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
>     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:100)
>     at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13303)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
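A minimal, self-contained Java sketch of the failure mode that the proposed removePipeline change targets, under clearly stated assumptions: StateMap, addOpenPipeline, getOpenPipelines and allocateBlock below are hypothetical simplifications, not the real PipelineStateMap or BlockManagerImpl APIs. Pipelines are plain String IDs, the PipelineQuery key is a plain String, and IllegalStateException stands in for PipelineNotFoundException. The sketch assumes that allocateBlock keeps retrying over the cached OPEN pipelines until one yields a container. If removePipeline drops a pipeline from the main map but leaves it in query2OpenPipelines, every retry picks the same stale ID and hits the "not found" error. Evicting it from the cached list lets the loop move on (here, by creating a fresh pipeline). The demo caps the retries so that it terminates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StaleOpenPipelineDemo {

  // Hypothetical, simplified stand-in for PipelineStateMap: an authoritative
  // pipeline map plus a per-query cache of OPEN pipeline IDs.
  static final class StateMap {
    private final Map<String, Set<Long>> pipelineMap = new HashMap<>();
    private final Map<String, List<String>> query2OpenPipelines = new HashMap<>();
    private final boolean evictFromOpenCache;

    StateMap(boolean evictFromOpenCache) {
      this.evictFromOpenCache = evictFromOpenCache;
    }

    void addOpenPipeline(String query, String id) {
      pipelineMap.put(id, new HashSet<>());
      query2OpenPipelines.computeIfAbsent(query, k -> new ArrayList<>()).add(id);
    }

    void removePipeline(String query, String id) {
      pipelineMap.remove(id);
      if (evictFromOpenCache) {
        // Mirrors the proposed change: also drop the ID from the cached OPEN list.
        List<String> list = query2OpenPipelines.get(query);
        if (list != null) {
          list.remove(id);
        }
      }
    }

    List<String> getOpenPipelines(String query) {
      List<String> cached = query2OpenPipelines.get(query);
      if (cached == null) {
        return new ArrayList<>();
      }
      return new ArrayList<>(cached);
    }

    Set<Long> getContainers(String id) {
      Set<Long> containers = pipelineMap.get(id);
      if (containers == null) {
        // Stand-in for PipelineNotFoundException.
        throw new IllegalStateException("PipelineID=" + id + " not found");
      }
      return containers;
    }
  }

  // Hypothetical allocation loop: retry until some OPEN pipeline yields a container.
  // Capped at maxAttempts so the demo terminates; returns null when it gives up.
  static String allocateBlock(StateMap state, String query, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      List<String> open = state.getOpenPipelines(query);
      if (open.isEmpty()) {
        // Stand-in for creating a new pipeline when none are OPEN.
        state.addOpenPipeline(query, "pipeline-" + attempt);
        open = state.getOpenPipelines(query);
      }
      String pipeline = open.get(0);
      try {
        state.getContainers(pipeline);
        return pipeline;
      } catch (IllegalStateException e) {
        // In the log above, a WARN is emitted at this point; the stale ID is picked again.
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String query = "RATIS/THREE";
    String id = "bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1";

    StateMap buggy = new StateMap(false);
    buggy.addOpenPipeline(query, id);
    buggy.removePipeline(query, id);
    System.out.println("without fix: " + allocateBlock(buggy, query, 5));

    StateMap fixed = new StateMap(true);
    fixed.addOpenPipeline(query, id);
    fixed.removePipeline(query, id);
    System.out.println("with fix: " + allocateBlock(fixed, query, 5));
  }
}
```

Running main prints "without fix: null" (every attempt hits the stale ID) and "with fix: pipeline-1" (the empty OPEN list triggers creation of a fresh pipeline).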
This message was sent by Atlassian Jira (v8.3.4#803005)