Hi,
1.
I noticed that the output of "hadoop fs -count /" is way off compared to the
metrics reported by the NameNode JMX and the NameNode UI.
Stats from NameNode UI: total files and directories = ~30 million
Stats from CLI: total files and directories = ~4.2 million
$ hadoop fs -count -v -h /
   DIR_COUNT   FILE_COUNT   CONTENT_SIZE PATHNAME
     396.5 K        3.9 M        240.6 G /
Why is there a difference of over 25 million files and directories, and a
size difference of 664 TB vs. 240.6 GB?
I found that "hadoop fs -count" fetches its info from the specified path's
org.apache.hadoop.fs.ContentSummary
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.1/org/apache/hadoop/fs/ContentSummary.java#ContentSummary>
class, which tracks fields such as directoryCount, fileCount, and
spaceConsumed.
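For reference, here is a minimal sketch of fetching that same summary through
the Java API (FileSystem.getContentSummary(); it assumes the cluster's
core-site.xml is on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RootCount {
      public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // The same summary that "hadoop fs -count" prints
        ContentSummary cs = fs.getContentSummary(new Path("/"));
        System.out.println("dirs          = " + cs.getDirectoryCount());
        System.out.println("files         = " + cs.getFileCount());
        System.out.println("length        = " + cs.getLength());
        System.out.println("spaceConsumed = " + cs.getSpaceConsumed());
      }
    }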
Are these fields always kept up to date? It doesn't seem like it. Is there a
command I can run to force an update?
2.
How can I find the total number of blocks used across the entire HDFS from
the CLI?
I think I can get it from "hdfs fsck", but I'd rather not run that, since it
is very resource-intensive and often times out or errors out on a big
cluster.
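For context, the NameNode JMX I am comparing against exposes a BlocksTotal
counter on its FSNamesystem bean; a rough sketch of reading it over the /jmx
HTTP servlet (the host and port below are placeholders, 50070 being the 2.x
default) would be:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class BlocksTotal {
      public static void main(String[] args) throws Exception {
        // Placeholder NameNode HTTP address; 50070 is the Hadoop 2.x default
        URL jmx = new URL("http://namenode-host:50070/jmx"
            + "?qry=Hadoop:service=NameNode,name=FSNamesystem");
        try (BufferedReader in =
            new BufferedReader(new InputStreamReader(jmx.openStream()))) {
          String line;
          while ((line = in.readLine()) != null) {
            // The JSON response has one attribute per line, e.g. "BlocksTotal" : <n>
            if (line.contains("BlocksTotal")) {
              System.out.println(line.trim());
            }
          }
        }
      }
    }

But if there is a proper CLI command that reports this, I'd prefer to use
that instead.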
--
Thanks,
Guru