Hi Michael! Thank you for the report. I'm sorry, I don't have anything beyond the generic advice: please try a newer version of Hadoop (say Hadoop-2.8.2). You seem to already know that the BlockManager is the place to look.
If you find it to be a legitimate issue that could affect Apache Hadoop and still hasn't been fixed in trunk ( https://github.com/apache/hadoop ), could you please create a new JIRA for it here: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=116&projectKey=HDFS ?

Thanks
Ravi

On Wed, Nov 8, 2017 at 7:50 PM, Michael Parkin <mpar...@siftscience.com> wrote:

> Hello,
>
> We're seeing some unusual behavior in our two HDFS 2.6.0
> (CDH5.11.1) clusters and were wondering if you could help. When we fail over
> our Namenodes we observe a large number of PendingDeletionBlocks blocks -
> i.e., the metric is zero before failover and several thousand after.
>
> This seems different to the PostponedMisreplicatedBlocks [1] (expected
> before all the datanodes have sent their block reports to the new active
> namenode and the number of NumStaleStorages is zero) - we see that metric
> become zero once all the block reports have been received. What we're
> seeing is that PendingDeletionBlocks increases immediately after
> failover, when NumStaleStorages is ~equal to the number of datanodes in
> the cluster.
>
> The amount of extra space used is a problem as we have to increase our
> cluster size to accommodate these blocks until the Namenodes are
> failed over. We've checked the debug logs, metasave report, and other jmx
> metrics and everything appears fine before we fail over - apart from the
> amount of dfs used growing then decreasing.
>
> We can't find anything obviously wrong with the HDFS configuration, HA
> setup, etc. Any help on where to look/debug next would be appreciated.
>
> Thanks,
>
> Michael.
>
> [1] https://github.com/cloudera/hadoop-common/blob/cdh5-2.6.0_5.11.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3047
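
For comparing the metrics named in this thread before and after a failover, here is a minimal sketch. It assumes the JSON from the NameNode's /jmx servlet has already been saved as a string for each snapshot. The helper names, the sample payloads, and the bean names inside them are made up for illustration and are not taken from the cluster in question. The thread does not say which bean exposes each metric, so the sketch simply searches every bean.

```python
import json

# Metric names taken from the thread. Which JMX bean carries each one is not
# stated there, so every bean in the payload is searched.
METRICS = ("PendingDeletionBlocks", "PostponedMisreplicatedBlocks", "NumStaleStorages")


def extract_metrics(jmx_json, names=METRICS):
    """Return {metric: value} using the first bean that carries each name."""
    found = {}
    for bean in json.loads(jmx_json).get("beans", []):
        for name in names:
            if name in bean and name not in found:
                found[name] = bean[name]
    return found


def diff_metrics(before, after):
    """Return {metric: after - before} for metrics present in both snapshots."""
    return {k: after[k] - before[k] for k in before if k in after}


# Hypothetical payloads shaped like a /jmx response ({"beans": [...]}).
# The values and bean names are invented for this example.
SAMPLE_BEFORE = json.dumps({"beans": [
    {"name": "Hadoop:service=NameNode,name=FSNamesystem",
     "PendingDeletionBlocks": 0, "PostponedMisreplicatedBlocks": 0},
    {"name": "Hadoop:service=NameNode,name=FSNamesystemState",
     "NumStaleStorages": 0},
]})
SAMPLE_AFTER = json.dumps({"beans": [
    {"name": "Hadoop:service=NameNode,name=FSNamesystem",
     "PendingDeletionBlocks": 4200, "PostponedMisreplicatedBlocks": 0},
    {"name": "Hadoop:service=NameNode,name=FSNamesystemState",
     "NumStaleStorages": 12},
]})

if __name__ == "__main__":
    before = extract_metrics(SAMPLE_BEFORE)
    after = extract_metrics(SAMPLE_AFTER)
    for metric, delta in diff_metrics(before, after).items():
        print(f"{metric}: {before[metric]} -> {after[metric]} (delta {delta})")
```

Taking one snapshot just before the failover and a few at intervals afterwards would show whether PendingDeletionBlocks rises while NumStaleStorages is still non-zero, which is the pattern described in the report.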