[jira] [Commented] (HDDS-1899) DeleteBlocksCommandHandler is unable to find the container in SCM
[ https://issues.apache.org/jira/browse/HDDS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924282#comment-16924282 ]

Lokesh Jain commented on HDDS-1899:
-----------------------------------

[~nandakumar131] I think the exception is harmless. It is thrown when the container cannot be found before a DeleteBlocks command is processed. As you mentioned, this can happen because the replication manager deleted the container before the block deletion was processed.

There is another issue, however. Currently all synchronization is done by locking the container object itself. When a container is deleted, it is removed from containerSet, but the container object may still be alive and can still be used to acquire a lock on the container. Also, in deleteContainer we delete the container outside the lock, which could race with other operations.

With the current locking semantics, we need to check whether the container still exists after acquiring a lock on it. Container deletion should also be done inside the lock itself.

> DeleteBlocksCommandHandler is unable to find the container in SCM
> -----------------------------------------------------------------
>
>                 Key: HDDS-1899
>                 URL: https://issues.apache.org/jira/browse/HDDS-1899
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.4.0
>            Reporter: Mukul Kumar Singh
>            Priority: Major
>              Labels: MiniOzoneChaosCluster
>
> DeleteBlocksCommandHandler is unable to find a container in SCM.
> {code}
> 2019-08-02 14:04:56,735 WARN commandhandler.DeleteBlocksCommandHandler (DeleteBlocksCommandHandler.java:lambda$handle$0(140)) - Failed to delete blocks for container=33, TXID=184
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Unable to find the container 33
>     at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.lambda$handle$0(DeleteBlocksCommandHandler.java:122)
>     at java.util.ArrayList.forEach(ArrayList.java:1257)
>     at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
>     at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.handle(DeleteBlocksCommandHandler.java:114)
>     at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
>     at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
>     at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
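
To illustrate the locking change proposed in the comment above, here is a minimal sketch. It is not the Ozone code: SketchContainer and SketchContainerSet are hypothetical stand-ins, and a plain ReentrantLock per container is assumed. It shows the two points raised: re-check that the container is still in the set after acquiring its lock, and remove the container while holding that lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a container replica on a datanode.
class SketchContainer {
  final long id;
  final ReentrantLock lock = new ReentrantLock();
  int blockCount;

  SketchContainer(long id, int blockCount) {
    this.id = id;
    this.blockCount = blockCount;
  }
}

// Hypothetical stand-in for the set of containers on a datanode.
class SketchContainerSet {
  private final Map<Long, SketchContainer> containers = new ConcurrentHashMap<>();

  void add(SketchContainer c) {
    containers.put(c.id, c);
  }

  SketchContainer get(long id) {
    return containers.get(id);
  }

  // Deletion happens while holding the container lock, so it cannot
  // interleave with a block deletion on the same container.
  boolean deleteContainer(long id) {
    SketchContainer c = containers.get(id);
    if (c == null) {
      return false;
    }
    c.lock.lock();
    try {
      // remove(key, value) succeeds only if this object is still mapped.
      return containers.remove(id, c);
    } finally {
      c.lock.unlock();
    }
  }

  // Returns false when the container is gone, including the case where it
  // was deleted while this thread was waiting for the lock.
  boolean deleteBlocks(long id, int count) {
    SketchContainer c = containers.get(id);
    if (c == null) {
      return false;
    }
    c.lock.lock();
    try {
      if (containers.get(id) != c) {
        return false; // stale object: container was removed meanwhile
      }
      c.blockCount = Math.max(0, c.blockCount - count);
      return true;
    } finally {
      c.lock.unlock();
    }
  }
}
```

With this shape, a DeleteBlocks call that loses the race returns false instead of operating on a container object that is no longer tracked; whether the caller logs that as a warning or ignores it is a separate choice.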
[jira] [Commented] (HDDS-1899) DeleteBlocksCommandHandler is unable to find the container in SCM
[ https://issues.apache.org/jira/browse/HDDS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924057#comment-16924057 ]

Nanda kumar commented on HDDS-1899:
-----------------------------------

There are valid scenarios in which this could happen.

When we get a delete block request for an over-replicated container, the block deleting service will try to send a delete block command to all the datanodes that have a replica. At the same time, the replication manager will send a delete container (replica) command to one or more datanodes. There could be a race condition here: if the delete container command is added to the {{SCMNodeManager#commandQueue}} before the delete block command, the container will be deleted on the datanode before the delete block request is processed, which results in a {{StorageContainerException}}.
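
The ordering described in this comment can be reproduced with a small simulation. This is only a sketch under stated assumptions: CommandOrderSketch, its Command type, and the FIFO queue are hypothetical and do not reflect the real SCMNodeManager or datanode handler classes. It shows that when the delete container command is queued first, the later delete block command finds no container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of one datanode draining its command queue in order.
class CommandOrderSketch {
  enum Type { DELETE_BLOCKS, DELETE_CONTAINER }

  static final class Command {
    final Type type;
    final long containerId;

    Command(Type type, long containerId) {
      this.type = type;
      this.containerId = containerId;
    }
  }

  // Processes commands first-in, first-out against the container ids
  // present on the datanode and returns one log line per command.
  static List<String> process(Queue<Command> queue, Set<Long> containers) {
    List<String> log = new ArrayList<>();
    Command cmd;
    while ((cmd = queue.poll()) != null) {
      if (cmd.type == Type.DELETE_CONTAINER) {
        containers.remove(cmd.containerId);
        log.add("deleted container " + cmd.containerId);
      } else if (containers.contains(cmd.containerId)) {
        log.add("deleted blocks in container " + cmd.containerId);
      } else {
        // Same situation as the StorageContainerException in the report.
        log.add("Unable to find the container " + cmd.containerId);
      }
    }
    return log;
  }
}
```

Queuing DELETE_CONTAINER for container 33 ahead of DELETE_BLOCKS for the same container makes the second log line read "Unable to find the container 33"; reversing the order makes both commands succeed.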