From the snapshot, you have around 3.5 TB remaining for writing data. Can you check each individual datanode's storage health? As you said, you have 80 servers writing to HDFS in parallel; I am not sure whether that by itself could be the issue. As suggested in past threads, you can rebalance the blocks, but that will take some time to finish and will not solve your issue right away. Example commands follow below, and a sketch of the dfs.datanode.du.reserved setting mentioned further down the thread is at the end of this mail.
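For reference, something like the following should show per-datanode usage and start a rebalance. The threshold is the allowed deviation of each datanode's utilization from the cluster average, in percent; 10 is the default, and a lower value rebalances more aggressively:

    hadoop dfsadmin -report
    hadoop balancer -threshold 10

Note that the balancer evens out usage across datanodes, not across the individual disks within one datanode, so a single hot volume like /data1 can still fill up even after a rebalance.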
You can wait for others to reply. I am sure there will be far better solutions from the experts here.

On Mon, Jun 10, 2013 at 3:18 PM, Mayank <[email protected]> wrote:

> No, it's not a map-reduce job. We have a Java app running on around 80
> machines which writes to HDFS. The error that I'd mentioned is being
> thrown by the application, and yes, we have the replication factor set
> to 3. The status of HDFS is as follows:
>
> Configured Capacity : 16.15 TB
> DFS Used : 11.84 TB
> Non DFS Used : 872.66 GB
> DFS Remaining : 3.46 TB
> DFS Used% : 73.3 %
> DFS Remaining% : 21.42 %
> Live Nodes <http://hmaster.production.indix.tv:50070/dfsnodelist.jsp?whatNodes=LIVE> : 10
> Dead Nodes <http://hmaster.production.indix.tv:50070/dfsnodelist.jsp?whatNodes=DEAD> : 0
> Decommissioning Nodes <http://hmaster.production.indix.tv:50070/dfsnodelist.jsp?whatNodes=DECOMMISSIONING> : 0
> Number of Under-Replicated Blocks : 0
>
> On Mon, Jun 10, 2013 at 3:11 PM, Nitin Pawar <[email protected]> wrote:
>
>> When you say the application errors out, does that mean your mapreduce
>> job is erroring? In that case, apart from HDFS space, you will need to
>> look at the mapred tmp directory space as well.
>>
>> You have 400 GB * 4 * 10 = 16 TB of disk, and assuming a replication
>> factor of 3, at most you will have a data size of about 5 TB. I am also
>> assuming you are not scheduling your program to run on the entire 5 TB
>> with just 10 nodes.
>>
>> I suspect your cluster's mapred tmp space is getting filled up while
>> the job is running.
>>
>> On Mon, Jun 10, 2013 at 3:06 PM, Mayank <[email protected]> wrote:
>>
>>> We are running a Hadoop cluster with 10 datanodes and a namenode. Each
>>> datanode is set up with 4 disks (/data1, /data2, /data3, /data4), with
>>> each disk having a capacity of 414 GB.
>>>
>>> hdfs-site.xml has the following property set:
>>>
>>> <property>
>>>   <name>dfs.data.dir</name>
>>>   <value>/data1/hadoopfs,/data2/hadoopfs,/data3/hadoopfs,/data4/hadoopfs</value>
>>>   <description>Data dirs for DFS.</description>
>>> </property>
>>>
>>> Now we are facing an issue where we find /data1 getting filled up
>>> quickly, and many times we see its usage at 100% with just a few
>>> megabytes of free space. This issue is visible on 7 out of 10
>>> datanodes at present.
>>>
>>> We have some Java applications which write to HDFS, and we often see
>>> the following errors in our application logs:
>>>
>>> java.io.IOException: All datanodes xxx.xxx.xxx.xxx:50010 are bad. Aborting...
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3093)
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
>>>
>>> I went through some old discussions, and it looks like manual
>>> rebalancing is what is required in this case; we should also have
>>> dfs.datanode.du.reserved set up.
>>>
>>> However, I'd like to understand whether this issue, with one disk
>>> getting filled up to 100%, can result in the errors we are seeing in
>>> our application.
>>>
>>> Also, are there any other performance implications due to some of the
>>> disks running at 100% usage on a datanode?
>>> --
>>> Mayank Joshi
>>>
>>> Skype: mail2mayank
>>> Mb.: +91 8690625808
>>>
>>> Blog: http://www.techynfreesouls.co.nr
>>> PhotoStream: http://picasaweb.google.com/mail2mayank
>>>
>>> Today is tomorrow I was so worried about yesterday ...
>>
>> --
>> Nitin Pawar

--
Nitin Pawar
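P.S. For reference, a minimal hdfs-site.xml sketch of the dfs.datanode.du.reserved setting discussed above. The value is the number of bytes reserved per volume for non-DFS use, so HDFS stops allocating blocks before a disk hits 100%; the 10 GB figure here is only an illustrative assumption, not a recommendation:

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value>
      <description>Bytes reserved per volume for non-DFS use (10 GB, illustrative).</description>
    </property>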
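And a minimal sketch of the kind of HDFS write path where the exception above surfaces (the class name, fs.default.name URI, and file path are assumptions, not from the thread). The "All datanodes ... are bad" IOException is raised by the client's background DataStreamer once every datanode in the write pipeline has been marked failed, and it is reported to the application on a subsequent write() or on close():

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed namenode RPC URI; substitute your own.
            conf.set("fs.default.name", "hdfs://hmaster.production.indix.tv:9000");
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/tmp/write-example.txt"));
            try {
                // Pipeline failures in the background DataStreamer are surfaced
                // here or at close() as:
                // java.io.IOException: All datanodes ... are bad. Aborting...
                out.writeBytes("sample record\n");
            } finally {
                out.close();
            }
            fs.close();
        }
    }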
