Re: hadoop balancing data
I did not think about that. Good points. I found a way to keep it from happening: I set dfs.datanode.du.reserved in the config file.

"Hairong Kuang" wrote in message news:c59f9164.ed09%hair...@yahoo-inc.com...
> %Remaining fluctuates much more than %DFS used. This is because DFS shares
> the disks with mapred, and mapred tasks may temporarily use a lot of disk
> space. So trying to keep the same %free is impossible most of the time.
>
> Hairong
>
> On 1/19/09 10:28 PM, "Billy Pearson" wrote:
>
>> Why do we not use Remaining % in place of Used % when we are selecting a
>> datanode for new data and when running the balancer?
>> From what I can tell, we are using % used and we do not factor in non-DFS
>> used at all.
>> I see a datanode with only a 60GB hard drive fill up completely (100%)
>> before the other servers that have 130+GB hard drives get half full.
>> Seems like trying to keep the same % free on the drives in the cluster
>> would be more optimal in production.
>> I know this still may not be perfect, but it would be nice if we tried.
>>
>> Billy
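
For readers who want to try the same workaround, here is a minimal sketch of what such an entry might look like. The property name dfs.datanode.du.reserved comes from the message above and takes a value in bytes per volume. The 10 GB figure is only an illustrative assumption, not the value Billy used, and which file it belongs in (hadoop-site.xml on older releases, hdfs-site.xml on later ones) depends on the Hadoop version.

```xml
<!-- Sketch only: the value below is an assumed example, not taken from the thread. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- Bytes per volume to leave free for non-DFS use; 10737418240 = 10 GB (assumed). -->
  <value>10737418240</value>
</property>
```

The datanode usually needs a restart to pick up the change.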
Re: hadoop balancing data
%Remaining fluctuates much more than %DFS used. This is because DFS shares the disks with mapred, and mapred tasks may temporarily use a lot of disk space. So trying to keep the same %free is impossible most of the time.

Hairong

On 1/19/09 10:28 PM, "Billy Pearson" wrote:

> Why do we not use Remaining % in place of Used % when we are selecting a
> datanode for new data and when running the balancer?
> From what I can tell, we are using % used and we do not factor in non-DFS
> used at all.
> I see a datanode with only a 60GB hard drive fill up completely (100%)
> before the other servers that have 130+GB hard drives get half full.
> Seems like trying to keep the same % free on the drives in the cluster
> would be more optimal in production.
> I know this still may not be perfect, but it would be nice if we tried.
>
> Billy
hadoop balancing data
Why do we not use Remaining % in place of Used % when we are selecting a datanode for new data and when running the balancer? From what I can tell, we are using % used and we do not factor in non-DFS used at all.

I see a datanode with only a 60GB hard drive fill up completely (100%) before the other servers that have 130+GB hard drives get half full. Seems like trying to keep the same % free on the drives in the cluster would be more optimal in production. I know this still may not be perfect, but it would be nice if we tried.

Billy
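
To make the question concrete, here is a small Python sketch of the arithmetic. It is not Hadoop code. The DataNode class, the node names, and all usage figures are hypothetical; only the 60GB and 130GB drive sizes come from the message above. It shows how two nodes can report the same DFS used % while their remaining % differs widely once non-DFS usage is counted.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class DataNode:
    """Hypothetical model of one datanode's disk accounting (not a Hadoop class)."""
    name: str
    capacity: int      # total disk size in bytes
    dfs_used: int      # bytes held by HDFS blocks
    non_dfs_used: int  # bytes used by everything else, e.g. MapReduce temp files

    @property
    def remaining(self) -> int:
        # Space left after both DFS and non-DFS usage, never negative.
        return max(self.capacity - self.dfs_used - self.non_dfs_used, 0)

    @property
    def dfs_used_pct(self) -> float:
        return 100.0 * self.dfs_used / self.capacity

    @property
    def remaining_pct(self) -> float:
        return 100.0 * self.remaining / self.capacity


# Assumed figures: both nodes are balanced at 50% DFS used,
# but the small node carries far more non-DFS data relative to its size.
nodes = [
    DataNode("small", 60 * GB, 30 * GB, 27 * GB),
    DataNode("large", 130 * GB, 65 * GB, 13 * GB),
]

for n in nodes:
    print(f"{n.name}: DFS used {n.dfs_used_pct:.0f}%, remaining {n.remaining_pct:.0f}%")
# small: DFS used 50%, remaining 5%
# large: DFS used 50%, remaining 40%
```

Under these assumed numbers, a policy that looks only at DFS used % treats both nodes as equally full, even though the small node is almost out of space.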