Please check your logs directory usage.
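
A rough way to do that check on each slave is sketched below. It assumes Python is available on the node. `dir_usage_bytes` is a hypothetical helper, and the directory passed to it is a placeholder for wherever your Hadoop logs are actually written. Log files sit on the local file system, so `hadoop dfs -dus` does not count them.

```python
import os
import sys


def dir_usage_bytes(root):
    """Sum the sizes of all regular files under root (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Symlinks are skipped so linked files are not counted twice.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # Logs may be rotated or deleted while we walk the tree.
                pass
    return total


if __name__ == "__main__":
    # Placeholder default; pass the real log directory as the first argument.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print("%s: %.2f GB" % (log_dir, dir_usage_bytes(log_dir) / 1e9))
```

Running it against the log directory on every data node and adding up the results gives a figure to compare with the non-DFS usage shown on the name node status page.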

On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak <biswajit.na...@inmobi.com> wrote:

> Whats the replication factor you have? I believe it should be 3. hadoop
> dus shows that disk usage without replication. While name node ui page
> gives with replication.
>
> 38gb * 3 =114gb ~ 1TB
>
> ~Biswa
> -----oThe important thing is not to stop questioning o-----
>
>
> On Mon, Apr 14, 2014 at 9:38 AM, Saumitra <saumitra.offic...@gmail.com> wrote:
>
>> Hi Biswajeet,
>>
>> Non-dfs usage is ~100GB over the cluster. But still the number are
>> nowhere near 1TB.
>>
>> Basically I wanted to point out discrepancy in name node status page and
>> hadoop dfs -dus. In my case, earlier one reports DFS usage as 1TB and
>> later one reports it to be 35GB. What are the factors that can cause this
>> difference? And why is just 35GB data causing DFS to hit its limits?
>>
>>
>> On 14-Apr-2014, at 8:31 am, Biswajit Nayak <biswajit.na...@inmobi.com>
>> wrote:
>>
>> Hi Saumitra,
>>
>> Could you please check the non-dfs usage. They also contribute to filling
>> up the disk space.
>>
>> ~Biswa
>> -----oThe important thing is not to stop questioning o-----
>>
>>
>> On Mon, Apr 14, 2014 at 1:24 AM, Saumitra <saumitra.offic...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> We are running HDFS on 9-node hadoop cluster, hadoop version is 1.2.1.
>>> We are using default HDFS block size.
>>>
>>> We have noticed that disks of slaves are almost full. From name node’s
>>> status page (namenode:50070), we could see that disks of live nodes are 90%
>>> full and DFS Used% in cluster summary page is ~1TB.
>>>
>>> However hadoop dfs -dus / shows that file system size is merely 38GB.
>>> 38GB number looks to be correct because we keep only few Hive tables and
>>> hadoop’s /tmp (distributed cache and job outputs) in HDFS. All other data
>>> is cleaned up. I cross-checked this from hadoop dfs -ls. Also I think
>>> that there is no internal fragmentation because the files in our Hive
>>> tables are well-chopped in ~50MB chunks. Here are last few lines of
>>> hadoop fsck / -files -blocks
>>>
>>> Status: HEALTHY
>>>  Total size: 38086441332 B
>>>  Total dirs: 232
>>>  Total files: 802
>>>  Total blocks (validated): 796 (avg. block size 47847288 B)
>>>  Minimally replicated blocks: 796 (100.0 %)
>>>  Over-replicated blocks: 0 (0.0 %)
>>>  Under-replicated blocks: 6 (0.75376886 %)
>>>  Mis-replicated blocks: 0 (0.0 %)
>>>  Default replication factor: 2
>>>  Average block replication: 3.0439699
>>>  Corrupt blocks: 0
>>>  Missing replicas: 6 (0.24762692 %)
>>>  Number of data-nodes: 9
>>>  Number of racks: 1
>>> FSCK ended at Sun Apr 13 19:49:23 UTC 2014 in 135 milliseconds
>>>
>>>
>>> My question is that why disks of slaves are getting full even though
>>> there are only few files in DFS?
>>>
>>
>> _____________________________________________________________
>> The information contained in this communication is intended solely for
>> the use of the individual or entity to whom it is addressed and others
>> authorized to receive it. It may contain confidential or legally privileged
>> information. If you are not the intended recipient you are hereby notified
>> that any disclosure, copying, distribution or taking any action in reliance
>> on the contents of this information is strictly prohibited and may be
>> unlawful. If you have received this communication in error, please notify
>> us immediately by responding to this email and then delete it from your
>> system. The firm is neither liable for the proper and complete transmission
>> of the information contained in this communication nor for any delay in its
>> receipt.

--
--Regards
Sandeep Nemuri
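
The replication arithmetic raised in the thread can be checked against the fsck figures quoted above. This is only a sketch: the two input numbers are copied from that fsck output, the 1TB figure is the approximate value read off the name node status page, and `expected_raw_bytes` is a hypothetical helper name. Sizes are in decimal GB.

```python
# Figures copied from the fsck output quoted in the thread.
total_size_bytes = 38086441332   # "Total size"
avg_replication = 3.0439699      # "Average block replication"

# Approximate DFS usage reported by the name node status page (~1TB).
reported_dfs_used_bytes = 1e12


def expected_raw_bytes(logical_bytes, replication):
    """Raw disk space needed to store logical_bytes at the given replication."""
    return logical_bytes * replication


raw = expected_raw_bytes(total_size_bytes, avg_replication)
gap = reported_dfs_used_bytes - raw

print("expected raw usage: %.1f GB" % (raw / 1e9))
print("not explained by replication: %.1f GB" % (gap / 1e9))
```

Under these assumptions replication accounts for roughly 116 GB of raw disk, so most of the ~1TB reported by the name node would still have to come from something else.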