[ 
https://issues.apache.org/jira/browse/HDDS-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118920#comment-17118920
 ] 

maobaolong commented on HDDS-3669:
----------------------------------

In PipelineStateMap#removePipeline, add the following code so that the
removed pipeline is also dropped from query2OpenPipelines:

  List<Pipeline> list = query2OpenPipelines.get(new PipelineQuery(pipeline));
  if (list != null) {
    if (list.remove(pipeline)) {
      LOG.warn("Remove a pipeline {} in query2OpenPipelines.", pipeline);
    }
  }
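
The effect of the change above can be illustrated with a minimal, self-contained sketch. It is not the actual Ozone code: PipelineIndexSketch, its simplified Pipeline class, the string query key and the alsoCleanIndex flag are hypothetical stand-ins for PipelineStateMap, Pipeline and PipelineQuery. It only shows why removing a pipeline from the primary map without also removing it from a secondary index such as query2OpenPipelines leaves a stale entry that callers keep picking up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a pipeline map with a secondary index.
class PipelineIndexSketch {

  // Stand-in for Pipeline: an id plus a query key such as "RATIS/THREE".
  static final class Pipeline {
    final String id;
    final String queryKey;

    Pipeline(String id, String queryKey) {
      this.id = id;
      this.queryKey = queryKey;
    }
  }

  // Primary map: pipeline id -> pipeline.
  private final Map<String, Pipeline> pipelineMap = new HashMap<>();
  // Secondary index: query key -> open pipelines matching that key.
  private final Map<String, List<Pipeline>> query2OpenPipelines = new HashMap<>();

  void addOpenPipeline(Pipeline p) {
    pipelineMap.put(p.id, p);
    query2OpenPipelines.computeIfAbsent(p.queryKey, k -> new ArrayList<>()).add(p);
  }

  // Returns a copy of the open pipelines recorded in the secondary index.
  List<Pipeline> getOpenPipelines(String queryKey) {
    return new ArrayList<>(query2OpenPipelines.getOrDefault(queryKey, new ArrayList<>()));
  }

  boolean contains(String id) {
    return pipelineMap.containsKey(id);
  }

  // alsoCleanIndex = false models the behaviour before the proposed change,
  // alsoCleanIndex = true models the behaviour after it.
  boolean removePipeline(String id, boolean alsoCleanIndex) {
    Pipeline p = pipelineMap.remove(id);
    if (p == null) {
      return false;
    }
    if (alsoCleanIndex) {
      List<Pipeline> list = query2OpenPipelines.get(p.queryKey);
      if (list != null) {
        list.remove(p);
      }
    }
    return true;
  }

  public static void main(String[] args) {
    PipelineIndexSketch before = new PipelineIndexSketch();
    before.addOpenPipeline(new Pipeline("p1", "RATIS/THREE"));
    before.removePipeline("p1", false);
    // The index still offers p1 although the primary map no longer has it.
    System.out.println("stale entries without cleanup: "
        + before.getOpenPipelines("RATIS/THREE").size());

    PipelineIndexSketch after = new PipelineIndexSketch();
    after.addOpenPipeline(new Pipeline("p1", "RATIS/THREE"));
    after.removePipeline("p1", true);
    System.out.println("stale entries with cleanup: "
        + after.getOpenPipelines("RATIS/THREE").size());
  }
}
```

In this model, a caller that picks a pipeline from the index and then looks it up in the primary map fails on every attempt for as long as the stale entry remains, which matches the repeated PipelineNotFoundException in the quoted log.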

> SCM Infinite loop in BlockManagerImpl.allocateBlock
> ---------------------------------------------------
>
>                 Key: HDDS-3669
>                 URL: https://issues.apache.org/jira/browse/HDDS-3669
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.6.0
>            Reporter: maobaolong
>            Assignee: maobaolong
>            Priority: Major
>
> The following steps reproduce this issue:
> - Start a new Ozone cluster with only a single factor-three pipeline.
> - Put a big file (1 GB) into the cluster and, during the put, kill the
> leader datanode of this pipeline.
> The put command hangs, and the following log fills the SCM log file:
> 2020-05-27 17:32:46,988 [IPC Server handler 23 on default port 9863] WARN 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager: Container 
> allocation failed for pipeline=Pipeline[ Id: 
> bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1, Nodes: 
> e859cad9-c7f6-451a-a039-af06103aa978{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: 
> null}1cd2bf20-a791-42a0-b4cd-b26d995cb8eb{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: 
> null}0827f3bb-0d94-435a-a157-4db2c84cdedf{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:3, 
> State:OPEN, leaderId:0827f3bb-0d94-435a-a157-4db2c84cdedf, 
> CreationTimestamp2020-05-27T08:05:36.590Z] requiredSize=268435456 {}
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1 not found
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getContainers(PipelineStateMap.java:301)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getContainers(PipelineStateManager.java:95)
>         at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getContainersInPipeline(SCMPipelineManager.java:360)
>         at 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainersForOwner(SCMContainerManager.java:507)
>         at 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager.getMatchingContainer(SCMContainerManager.java:428)
>         at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:230)
>         at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:190)
>         at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:167)
>         at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:119)
>         at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
>         at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:100)
>         at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13303)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)