Hello,

We're seeing some unusual behavior in our two HDFS 2.6.0
(CDH 5.11.1) clusters and were wondering if you could help. When we fail over
our Namenodes we observe a large spike in PendingDeletionBlocks -
i.e., the metric is zero before the failover and several thousand after.

This seems different from PostponedMisreplicatedBlocks [1], which is expected
to be non-zero until all the datanodes have sent their block reports to the
new active namenode and NumStaleStorages drops to zero - and indeed we see
that metric return to zero once all the block reports have been received.
What we're seeing is that PendingDeletionBlocks increases immediately after
failover, while NumStaleStorages is still roughly equal to the number of
datanodes in the cluster.
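In case it's useful for reproducing this, here's a minimal Python sketch that
polls the NameNode's /jmx servlet for the counters mentioned above. The
hostname is a placeholder and 50070 is just the stock Hadoop 2.x NameNode HTTP
port; adjust both for your cluster.

import json
import urllib.request

# Placeholder NameNode web UI address - adjust host/port for your cluster
# (50070 is the default NameNode HTTP port in Hadoop 2.x).
JMX_URL = "http://namenode.example.com:50070/jmx"

WATCHED = ("PendingDeletionBlocks",
           "PostponedMisreplicatedBlocks",
           "NumStaleStorages")

def fetch_metrics(url=JMX_URL):
    """Return a dict of the watched attributes found anywhere in the JMX dump."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        beans = json.load(resp).get("beans", [])
    found = {}
    for bean in beans:
        for name in WATCHED:
            if name in bean:
                found[name] = bean[name]
    return found

if __name__ == "__main__":
    # Run this in a loop around a failover to see when each counter moves.
    for name, value in sorted(fetch_metrics().items()):
        print(f"{name} = {value}")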

The amount of extra space used is a problem, as we have to increase our
cluster size to accommodate these blocks until the Namenodes are failed over
again. We've checked the debug logs, the metasave report, and other JMX
metrics, and everything appears fine before we fail over - apart from the
amount of DFS space used growing and then decreasing.

We can't find anything obviously wrong with the HDFS configuration, HA
setup, etc. Any help on where to look/debug next would be appreciated.

Thanks,

Michael.

[1] https://github.com/cloudera/hadoop-common/blob/cdh5-2.6.0_5.11.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3047
