[jira] [Updated] (HDDS-3669) SCM Infinite loop in BlockManagerImpl.allocateBlock

2020-06-13 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-3669:
--
Target Version/s: 0.7.0  (was: 0.6.0)

> SCM Infinite loop in BlockManagerImpl.allocateBlock
> ---
>
>                 Key: HDDS-3669
>                 URL: https://issues.apache.org/jira/browse/HDDS-3669
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.6.0
>            Reporter: maobaolong
>            Assignee: maobaolong
>            Priority: Major
>              Labels: Triaged
>
> The following steps reproduce this issue:
> - Start a new Ozone cluster with only a factor-three pipeline.
> - Put a large file (1 GB) into the cluster and, during the put, kill the 
> leader datanode of this pipeline.
> The put command hangs, and the following log entry fills the SCM log file:
> 2020-05-27 17:32:46,988 [IPC Server handler 23 on default port 9863] WARN 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager: Container 
> allocation failed for pipeline=Pipeline[ Id: 
> bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1, Nodes: 
> e859cad9-c7f6-451a-a039-af06103aa978{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: 
> null}1cd2bf20-a791-42a0-b4cd-b26d995cb8eb{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: 
> null}0827f3bb-0d94-435a-a157-4db2c84cdedf{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:3, 
> State:OPEN, leaderId:0827f3bb-0d94-435a-a157-4db2c84cdedf, 
> CreationTimestamp2020-05-27T08:05:36.590Z] requiredSize=268435456 {}
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1 not found
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getContainers(PipelineStateMap.java:301)
>     at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getContainers(PipelineStateManager.java:95)
>     at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getContainersInPipeline(SCMPipelineManager.java:360)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainersForOwner(SCMContainerManager.java:507)
>     at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getMatchingContainer(SCMContainerManager.java:428)
>     at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:230)
>     at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:190)
>     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:167)
>     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:119)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
>     at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:100)
>     at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13303)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
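
The ticket does not include the source of BlockManagerImpl.allocateBlock, so the following is only a sketch of the failure pattern the trace suggests: a container lookup is attempted for a pipeline that is still listed as OPEN but is no longer known to the pipeline state map. If a caller kept retrying the same stale candidate, it would never exit. Every name below (AllocateBlockSketch, the local PipelineNotFoundException, getMatchingContainer, allocate) is a hypothetical stand-in, not the Ozone API and not the actual fix. The sketch tries each candidate pipeline at most once, so the loop is guaranteed to terminate.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for SCM block allocation; this is not the Ozone code. */
class AllocateBlockSketch {

  /** Local class that only mirrors the message seen in the log. */
  static class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String pipelineId) {
      super("PipelineID=" + pipelineId + " not found");
    }
  }

  /** Assumed lookup: fails when the pipeline is no longer in the live set. */
  static String getMatchingContainer(String pipelineId, Set<String> livePipelines)
      throws PipelineNotFoundException {
    if (!livePipelines.contains(pipelineId)) {
      throw new PipelineNotFoundException(pipelineId);
    }
    return "container-on-" + pipelineId;
  }

  /**
   * Tries each candidate pipeline at most once. A candidate is removed from
   * the queue before its lookup, so a stale pipeline cannot be retried and
   * the loop ends after at most candidates.size() iterations.
   */
  static Optional<String> allocate(List<String> candidates, Set<String> livePipelines) {
    Deque<String> remaining = new ArrayDeque<>(candidates);
    while (!remaining.isEmpty()) {
      String pipelineId = remaining.poll();
      try {
        return Optional.of(getMatchingContainer(pipelineId, livePipelines));
      } catch (PipelineNotFoundException e) {
        // Stale candidate: it was already dropped by poll(), move to the next one.
      }
    }
    // No usable pipeline: report failure to the caller instead of spinning.
    return Optional.empty();
  }

  public static void main(String[] args) {
    // "p1" is listed as a candidate but is no longer live, as in the report.
    System.out.println(allocate(List.of("p1"), Set.of()));
    System.out.println(allocate(List.of("p1", "p2"), Set.of("p2")));
  }
}
```

In this sketch an empty result is returned once every candidate has been tried, which lets the caller fail the request or create a new pipeline rather than hang.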



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3669) SCM Infinite loop in BlockManagerImpl.allocateBlock

2020-06-01 Thread Arpit Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3669:

Target Version/s: 0.6.0
  Labels: TriagePending  (was: )

> SCM Infinite loop in BlockManagerImpl.allocateBlock
> ---
>
>                 Key: HDDS-3669
>                 URL: https://issues.apache.org/jira/browse/HDDS-3669
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.6.0
>            Reporter: maobaolong
>            Assignee: maobaolong
>            Priority: Major
>              Labels: TriagePending
>






[jira] [Updated] (HDDS-3669) SCM Infinite loop in BlockManagerImpl.allocateBlock

2020-05-27 Thread maobaolong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maobaolong updated HDDS-3669:
-
Description: 
The following steps reproduce this issue:

- Start a new Ozone cluster with only a factor-three pipeline.
- Put a large file (1 GB) into the cluster and, during the put, kill the leader 
datanode of this pipeline.

The put command hangs, and the following log entry fills the SCM log file:
2020-05-27 17:32:46,988 [IPC Server handler 23 on default port 9863] WARN 
org.apache.hadoop.hdds.scm.container.SCMContainerManager: Container allocation 
failed for pipeline=Pipeline[ Id: bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1, Nodes: 
e859cad9-c7f6-451a-a039-af06103aa978{ip: 127.0.0.1, host: localhost, 
networkLocation: /default-rack, certSerialId: 
null}1cd2bf20-a791-42a0-b4cd-b26d995cb8eb{ip: 127.0.0.1, host: localhost, 
networkLocation: /default-rack, certSerialId: 
null}0827f3bb-0d94-435a-a157-4db2c84cdedf{ip: 127.0.0.1, host: localhost, 
networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:3, 
State:OPEN, leaderId:0827f3bb-0d94-435a-a157-4db2c84cdedf, 
CreationTimestamp2020-05-27T08:05:36.590Z] requiredSize=268435456 {}
org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
PipelineID=bf7dd356-2d97-4b2a-8a81-e2ddd25bc5a1 not found
    at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getContainers(PipelineStateMap.java:301)
    at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getContainers(PipelineStateManager.java:95)
    at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getContainersInPipeline(SCMPipelineManager.java:360)
    at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainersForOwner(SCMContainerManager.java:507)
    at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getMatchingContainer(SCMContainerManager.java:428)
    at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:230)
    at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:190)
    at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:167)
    at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:119)
    at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:74)
    at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:100)
    at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13303)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)




> SCM Infinite loop in BlockManagerImpl.allocateBlock
> ---
>
>                 Key: HDDS-3669
>                 URL: https://issues.apache.org/jira/browse/HDDS-3669
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.6.0
>            Reporter: maobaolong
>            Assignee: maobaolong
>            Priority: Major
>