Thanks Sandeep. Yes, that's correct; I was more interested to know about
the uneven distribution within the DN.
Thanks,
Rahul

On Fri, Jun 14, 2013 at 6:12 PM, Sandeep L <[email protected]> wrote:

> Rahul,
>
> In general, Hadoop tries to compute data locally most of the time. That
> is, if you run a MapReduce task on a particular input, Hadoop will try
> to compute locally and write the output locally (the majority of the
> time this will happen), then replicate it to other nodes.
>
> In your scenario the majority of your input data may be coming from a
> single datanode, so Hadoop is trying to write the output data to the
> same datanode.
>
> Thanks,
> Sandeep.
>
> ------------------------------
> From: [email protected]
> Date: Fri, 14 Jun 2013 17:50:46 +0530
> Subject: Re: Application errors with one disk on datanode getting filled up to 100%
> To: [email protected]
>
> Thanks Sandeep,
>
> I was thinking that the overall HDFS cluster might get unbalanced over
> time and the balancer might be useful in that case.
> I was more interested to know why only one disk out of the four
> configured disks of the DN is getting all the writes. From whatever I
> have read, writes should happen in a round-robin fashion, which should
> ideally lead to all the configured disks in the DN being similarly
> loaded.
>
> Not sure how the balancer is fixing this issue.
>
> Rgds,
> Rahul
>
> On Fri, Jun 14, 2013 at 4:45 PM, Sandeep L <[email protected]> wrote:
>
> Rahul,
>
> In general this issue happens sometimes in Hadoop; there is no single
> exact reason for it.
> To mitigate it you need to run the balancer at regular intervals.
>
> Thanks,
> Sandeep.
>
> ------------------------------
> Date: Fri, 14 Jun 2013 16:39:02 +0530
> Subject: Re: Application errors with one disk on datanode getting filled up to 100%
> From: [email protected]
> To: [email protected]
>
> No, as of this moment we've no idea about the reasons for that behavior.
>
> On Fri, Jun 14, 2013 at 4:04 PM, Rahul Bhattacharjee
> <[email protected]> wrote:
>
> Thanks Mayank. Any clue on why only one disk was getting all the writes?
>
> Rahul
>
> On Thu, Jun 13, 2013 at 11:47 AM, Mayank <[email protected]> wrote:
>
> So we did a manual rebalance (followed the instructions at:
> http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F)
> and also reserved 30 GB of space for non-DFS usage via
> dfs.datanode.du.reserved, and restarted our apps.
>
> Things have been going fine till now.
>
> Keeping fingers crossed :)
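For reference, the reservation Mayank describes would look roughly like
this in hdfs-site.xml (a minimal sketch, assuming the Hadoop 1.x property
name; 32212254720 is 30 GB expressed in bytes, and the value applies per
configured volume):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- 30 GB = 30 * 1024^3 bytes, kept free on each volume for non-DFS use -->
    <value>32212254720</value>
  </property>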
On Wed, Jun 12, 2013 at 12:58 PM, Rahul Bhattacharjee
<[email protected]> wrote:

> I have a few points to make; these may not be very helpful for the said
> problem.
>
> + The "All datanodes are bad" exception does not really point to a
> problem related to disk space being full.
> + hadoop.tmp.dir acts as the base location of other Hadoop-related
> properties; not sure if any particular directory is created specifically.
> + Only one disk getting filled looks strange. The other disks were part
> of the configuration when the NN was formatted.
>
> Would be interesting to know the reason for this.
> Please keep us posted.
>
> Thanks,
> Rahul
>
> On Mon, Jun 10, 2013 at 3:39 PM, Nitin Pawar <[email protected]> wrote:
>
> From the snapshot, you have around 3 TB left for writing data.
>
> Can you check the individual datanodes' storage health?
> As you said, you have 80 servers writing in parallel to HDFS; I am not
> sure whether that can be an issue.
> As suggested in past threads, you can do a rebalance of the blocks, but
> that will take some time to finish and will not solve your issue right
> away.
>
> You can wait for others to reply; I am sure there will be far better
> solutions from the experts for this.
>
> On Mon, Jun 10, 2013 at 3:18 PM, Mayank <[email protected]> wrote:
>
> No, it's not a map-reduce job. We've a Java app running on around 80
> machines which writes to HDFS. The error that I'd mentioned is being
> thrown by the application, and yes, we've the replication factor set to
> 3. Following is the status of HDFS:
>
> Configured Capacity: 16.15 TB
> DFS Used: 11.84 TB
> Non DFS Used: 872.66 GB
> DFS Remaining: 3.46 TB
> DFS Used%: 73.3 %
> DFS Remaining%: 21.42 %
> Live Nodes: 10
> Dead Nodes: 0
> Decommissioning Nodes: 0
> Number of Under-Replicated Blocks: 0
>
> On Mon, Jun 10, 2013 at 3:11 PM, Nitin Pawar <[email protected]> wrote:
>
> When you say the application errors out, does that mean your MapReduce
> job is erroring? In that case, apart from HDFS space, you will need to
> look at the mapred tmp directory space as well.
>
> You've got 400 GB * 4 * 10 = 16 TB of disk, and let's assume you have a
> replication factor of 3, so at most you can hold about 5 TB of data.
> I am also assuming you are not scheduling your program to run on the
> entire 5 TB with just 10 nodes.
>
> I suspect your cluster's mapred tmp space is getting filled while the
> job is running.
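The mapred tmp space Nitin refers to is governed by mapred.local.dir,
which in Hadoop 1.x defaults to a directory under hadoop.tmp.dir. A
minimal mapred-site.xml sketch spreading it across the same four disks
(the paths below are illustrative, not taken from this cluster):

  <property>
    <name>mapred.local.dir</name>
    <!-- Comma-separated local dirs for MapReduce intermediate data;
         these fill up independently of HDFS space. -->
    <value>/data1/mapred/local,/data2/mapred/local,/data3/mapred/local,/data4/mapred/local</value>
  </property>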
On Mon, Jun 10, 2013 at 3:06 PM, Mayank <[email protected]> wrote:

> We are running a Hadoop cluster with 10 datanodes and a namenode. Each
> datanode is set up with 4 disks (/data1, /data2, /data3, /data4), with
> each disk having a capacity of 414 GB.
>
> hdfs-site.xml has the following property set:
>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs,/data4/hadoopfs</value>
>     <description>Data dirs for DFS.</description>
>   </property>
>
> Now we are facing an issue wherein we find /data1 getting filled up
> quickly, and many times we see its usage running at 100% with just a
> few megabytes of free space. This issue is visible on 7 out of 10
> datanodes at present.
>
> We've some Java applications which are writing to HDFS, and many times
> we are seeing the following errors in our application logs:
>
> java.io.IOException: All datanodes xxx.xxx.xxx.xxx:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3093)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
>
> I went through some old discussions, and it looks like manual
> rebalancing is what is required in this case; we should also have
> dfs.datanode.du.reserved set up.
>
> However, I'd like to understand whether this issue, with one disk
> getting filled up to 100%, can result in the errors which we are seeing
> in our application.
>
> Also, are there any other performance implications due to some of the
> disks running at 100% usage on a datanode.
>
> --
> Mayank Joshi
