Istvan Fajth created HDFS-15304:
-----------------------------------

             Summary: Infinite loop between DN and NN at rare condition
                 Key: HDFS-15304
                 URL: https://issues.apache.org/jira/browse/HDFS-15304
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Istvan Fajth


During the investigation that led to HDFS-15303, we identified the following 
infinite loop between the NN and the DNs affected by the data directory layout 
problem:
- for a particular misplaced block, the VolumeScanner finds the block file, and 
realizes that it is not part of the block map
- the block is added to the block map
- at the next FBR the block is reported to the NN
- the NN finds that the block should have been deleted already, as the 
corresponding inode was already deleted
- NN issues the deletion of the block on the DataNode
- DataNode runs the delete routine, but it silently fails to delete anything, 
as it is trying to delete the block from the wrong internal subdir, which is 
calculated from the block id with a different algorithm.
- block is removed from the blockmap
- VolumeScanner finds the block again, and adds it back to the blockmap
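
The cycle above can be illustrated with a small toy model. This is only a 
sketch under stated assumptions, not the actual DataNode or NameNode code: the 
class MisplacedBlockLoop, its fields, and the 5-bit masks in expectedDir are 
hypothetical and exist only to show why the block file survives every round.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the DN/NN loop; all names are hypothetical, not Hadoop classes. */
class MisplacedBlockLoop {
    // On-disk state: block id -> directory the block file actually lives in.
    final Map<Long, String> onDisk = new HashMap<>();
    // In-memory block map of the DataNode.
    final Set<Long> blockMap = new HashSet<>();
    // Blocks the NameNode still considers live (empty: the inode is already gone).
    final Set<Long> liveOnNameNode = new HashSet<>();

    // Directory the running DN version expects for a block (assumed 5-bit masks).
    static String expectedDir(long blockId) {
        return "subdir" + ((blockId >> 16) & 0x1F) + "/subdir" + ((blockId >> 8) & 0x1F);
    }

    /** Runs one scan -> FBR -> delete round; returns true if a block file survived. */
    boolean runRound() {
        // VolumeScanner: every block file found on disk is (re)added to the block map.
        blockMap.addAll(onDisk.keySet());
        // FBR: the NN requests deletion of every reported block it does not know.
        Set<Long> toDelete = new HashSet<>(blockMap);
        toDelete.removeAll(liveOnNameNode);
        for (long id : toDelete) {
            // Delete routine: removes the file only if it sits in the expected directory.
            if (expectedDir(id).equals(onDisk.get(id))) {
                onDisk.remove(id);
            }
            // The block map entry is dropped whether or not a file was deleted.
            blockMap.remove(id);
        }
        return !onDisk.isEmpty();
    }
}
```

In this model, a block placed in a directory other than the one returned by 
expectedDir is rediscovered in every round, while a correctly placed block is 
removed in the first round.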

The problem can happen only when there is a mixed layout on the DataNode due to 
some issue: there are blocks in a subdir that is correct according to the 
Hadoop 2 format while the DN is already running Hadoop 3, or vice versa if the 
problematic layout was created during a rollback.
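
To make the layout mismatch concrete, here is a minimal sketch of two 
id-to-directory mappings. It assumes that the Hadoop 2 style layout takes 8 
bits of the block id per directory level and the Hadoop 3 style layout takes 5 
bits. The masks, the class BlockDirLayout, and its method names are assumptions 
for illustration and should be checked against the DataNode source of the 
versions involved.

```java
/** Sketch of two id-to-directory mappings; the masks are assumptions. */
class BlockDirLayout {
    // Assumed Hadoop 2 style layout: 8 bits of the block id per directory level.
    static String hadoop2Dir(long blockId) {
        long d1 = (blockId >> 16) & 0xFF;
        long d2 = (blockId >> 8) & 0xFF;
        return "subdir" + d1 + "/subdir" + d2;
    }

    // Assumed Hadoop 3 style layout: 5 bits of the block id per directory level.
    static String hadoop3Dir(long blockId) {
        long d1 = (blockId >> 16) & 0x1F;
        long d2 = (blockId >> 8) & 0x1F;
        return "subdir" + d1 + "/subdir" + d2;
    }

    // True if the two layouts place the block in different directories.
    static boolean layoutsDisagree(long blockId) {
        return !hadoop2Dir(blockId).equals(hadoop3Dir(blockId));
    }
}
```

Under these assumptions, block id 0x12345678 maps to subdir52/subdir86 in the 
first layout and to subdir20/subdir22 in the second, so a delete routine using 
one mapping does not find a file stored under the other. Ids whose relevant 
bits are all below 32, such as 0x00010203, map to the same directory in both 
layouts and would not be affected.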



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
