Hi Michael!

Thank you for the report. I'm sorry, but I have nothing to offer beyond
the generic advice: please try a newer version of Hadoop (say,
Hadoop-2.8.2). You seem to already know that the BlockManager is the place
to look.

If you find it to be a legitimate issue that could affect Apache Hadoop
and still hasn't been fixed in trunk (https://github.com/apache/hadoop),
could you please create a new JIRA for it here?
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=116&projectKey=HDFS
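
If it helps when writing up the JIRA, here is a minimal sketch for
snapshotting the counters you mention before and after a failover. It
assumes the NameNode's /jmx servlet is reachable over HTTP and returns the
usual {"beans": [...]} JSON. The function names are made up for
illustration, and it scans every bean for the metric names rather than
assuming which bean exposes them.

```python
import json
from urllib.request import urlopen

# Metric names taken from the report quoted below.
METRICS = (
    "PendingDeletionBlocks",
    "PostponedMisreplicatedBlocks",
    "NumStaleStorages",
)


def extract_metrics(jmx_doc, names=METRICS):
    """Return {name: value}, using the first bean that exposes each metric."""
    found = {}
    for bean in jmx_doc.get("beans", []):
        for name in names:
            if name in bean and name not in found:
                found[name] = bean[name]
    return found


def fetch_metrics(base_url, names=METRICS, timeout=10):
    """Read <base_url>/jmx and pull out the metrics of interest."""
    with urlopen(base_url.rstrip("/") + "/jmx", timeout=timeout) as resp:
        return extract_metrics(json.load(resp), names)


def diff_metrics(before, after):
    """Return {name: after - before} for metrics present in both snapshots."""
    return {k: after[k] - before[k] for k in before if k in after}
```

Usage would be something like calling
fetch_metrics("http://namenode.example.com:50070") on the NameNode that is
active at the time, once before and once after the failover, then passing
the two results to diff_metrics. The host name is a placeholder, and 50070
is only the default NameNode HTTP port in Hadoop 2.x, so adjust both for
your setup.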

Thanks
Ravi

On Wed, Nov 8, 2017 at 7:50 PM, Michael Parkin <mpar...@siftscience.com>
wrote:

> Hello,
>
> We're seeing some unusual behavior in our two HDFS 2.6.0
> (CDH5.11.1) clusters and were wondering if you could help. When we fail
> over our Namenodes we observe a large number of PendingDeletionBlocks -
> i.e., the metric is zero before failover and several thousand after.
>
> This seems different from PostponedMisreplicatedBlocks [1], which is
> expected to be non-zero until all the datanodes have sent their block
> reports to the new active namenode and NumStaleStorages is zero - we see
> that metric become zero once all the block reports have been received.
> What we're seeing is that PendingDeletionBlocks increases immediately
> after failover, when NumStaleStorages is ~equal to the number of
> datanodes in the cluster.
>
> The amount of extra space used is a problem, as we have to increase our
> cluster size to accommodate these blocks until the Namenodes are failed
> over. We've checked the debug logs, metasave report, and other JMX
> metrics, and everything appears fine before we fail over - apart from
> the amount of DFS used growing and then decreasing.
>
> We can't find anything obviously wrong with the HDFS configuration, HA
> setup, etc. Any help on where to look/debug next would be appreciated.
>
> Thanks,
>
> Michael.
>
> [1] https://github.com/cloudera/hadoop-common/blob/cdh5-2.6.0_5.11.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3047