Hi,
1.
I noticed that the output of "hadoop fs -count /" is way off compared to the
metrics reported by the NameNode JMX and the NameNode UI.
Stats from NameNode UI: total files and directories = ~30 million
Stats from CLI: total files and directories = ~4.2 million
$ hadoop fs -count -v -h /
   DIR_COUNT   FILE_COUNT   CONTENT_SIZE PATHNAME
     396.5 K        3.9 M        240.6 G /
Why is there a difference of over 25 million files and directories, and a
size difference of 664 TB vs. 240.6 GB?
I found that "hadoop fs -count" fetches its info from the specified path's
org.apache.hadoop.fs.ContentSummary
<http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.1/org/apache/hadoop/fs/ContentSummary.java#ContentSummary>
class, which tracks fields such as directoryCount, fileCount, and
spaceConsumed.
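For reference, here is a minimal sketch of fetching that same summary through
the Java API (FileSystem.getContentSummary(); it assumes the cluster's
core-site.xml is on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RootCount {
      public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // The same summary that "hadoop fs -count" prints
        ContentSummary cs = fs.getContentSummary(new Path("/"));
        System.out.println("dirs          = " + cs.getDirectoryCount());
        System.out.println("files         = " + cs.getFileCount());
        System.out.println("length        = " + cs.getLength());
        System.out.println("spaceConsumed = " + cs.getSpaceConsumed());
      }
    }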
Are these fields always kept up to date? It doesn't seem like it. Is there a
command I can run to force an update?
2.
How can I find the total number of blocks used across the entire HDFS from
the CLI?
I think I can get it from "hdfs fsck", but I'd rather not run that, since it
is very resource-intensive and often times out or errors out on a big
cluster.
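For context, the NameNode JMX I am comparing against exposes a BlocksTotal
counter on its FSNamesystem bean; a rough sketch of reading it over the /jmx
HTTP servlet (the host and port below are placeholders, 50070 being the 2.x
default) would be:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class BlocksTotal {
      public static void main(String[] args) throws Exception {
        // Placeholder NameNode HTTP address; 50070 is the Hadoop 2.x default
        URL jmx = new URL("http://namenode-host:50070/jmx"
            + "?qry=Hadoop:service=NameNode,name=FSNamesystem");
        try (BufferedReader in =
            new BufferedReader(new InputStreamReader(jmx.openStream()))) {
          String line;
          while ((line = in.readLine()) != null) {
            // The JSON response has one attribute per line, e.g. "BlocksTotal" : <n>
            if (line.contains("BlocksTotal")) {
              System.out.println(line.trim());
            }
          }
        }
      }
    }

But if there is a proper CLI command that reports this, I'd prefer to use
that instead.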
--
Thanks,
Guru