[jira] [Commented] (HDDS-1899) DeleteBlocksCommandHandler is unable to find the container in SCM

2019-09-06 Thread Lokesh Jain (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924282#comment-16924282
 ] 

Lokesh Jain commented on HDDS-1899:
---

[~nandakumar131] I think the exception seems harmless. This exception is thrown 
when the container can not be found before processing a DeleteBlocks command. 
As mentioned by you it can be because replication manager deleted a container 
before block deletion was processed.

There is another issue however. Currently all the synchronization is done via 
locking the container object itself. In case of delete container the container 
is removed from containerSet but the container object may still be alive and 
can be used to acquire a lock on the container. Also in deleteContainer we 
delete the container outside the lock which could race with the other 
operations.

With the current locking semantics we need to check if container exists or not 
after acquiring a lock on it. Also container deletion should be done inside the 
lock itself.

> DeleteBlocksCommandHandler is unable to find the container in SCM
> -
>
> Key: HDDS-1899
> URL: https://issues.apache.org/jira/browse/HDDS-1899
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> DeleteBlocksCommandHandler is unable to find a container in SCM.
> {code}
> 2019-08-02 14:04:56,735 WARN  commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:lambda$handle$0(140)) - Failed to delete 
> blocks for container=33, TXID=184
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the container 33
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.lambda$handle$0(DeleteBlocksCommandHandler.java:122)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.handle(DeleteBlocksCommandHandler.java:114)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1899) DeleteBlocksCommandHandler is unable to find the container in SCM

2019-09-06 Thread Nanda kumar (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924057#comment-16924057
 ] 

Nanda kumar commented on HDDS-1899:
---

There are valid scenarios in which this could happen.

When we get delete block request on an over replicated container, the block 
deleting service will try to send delete block command to all the datanodes 
which has a replica.
At the same time replication manager will send delete container (replica) 
command to one (or more) datanode(s).

There could be a race condition here and if the delete container command is 
added to the {{SCMNodeManager#commandQueue}} before delete block command, 
deletion of the container in datanode will happen before processing the delete 
block request which will result in {{StorageContainerException}}.

> DeleteBlocksCommandHandler is unable to find the container in SCM
> -
>
> Key: HDDS-1899
> URL: https://issues.apache.org/jira/browse/HDDS-1899
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> DeleteBlocksCommandHandler is unable to find a container in SCM.
> {code}
> 2019-08-02 14:04:56,735 WARN  commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:lambda$handle$0(140)) - Failed to delete 
> blocks for container=33, TXID=184
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  Unable to find the container 33
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.lambda$handle$0(DeleteBlocksCommandHandler.java:122)
> at java.util.ArrayList.forEach(ArrayList.java:1257)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.DeleteBlocksCommandHandler.handle(DeleteBlocksCommandHandler.java:114)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:93)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$1(DatanodeStateMachine.java:432)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org