Re: hadoop balancing data

2009-01-24 Thread Billy Pearson

I did not think about that; good points.
I found a way to keep it from happening:
I set dfs.datanode.du.reserved in the config file.
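
For reference, a minimal sketch of what such a setting might look like. The
property name is the one mentioned above; the 10 GB value is only an
illustrative assumption, and the file it belongs in depends on the Hadoop
version (hadoop-site.xml in releases of this era, hdfs-site.xml later).

```xml
<!-- Assumed example value: reserve 10 GB per volume for non-DFS use. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- Value is in bytes: 10 * 1024^3 = 10737418240 -->
  <value>10737418240</value>
</property>
```

The reservation applies per volume, so a datanode with several data
directories on separate disks holds back this amount on each of them.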


"Hairong Kuang"  wrote in 
message news:c59f9164.ed09%hair...@yahoo-inc.com...
> %Remaining fluctuates much more than %DFS used. This is because DFS shares
> the disks with mapred, and mapred tasks may temporarily use a lot of disk
> space. So trying to keep the same %free is impossible most of the time.
>
> Hairong
>
> On 1/19/09 10:28 PM, "Billy Pearson" wrote:
>
> > Why do we not use Remaining % in place of Used % when we are
> > selecting a datanode for new data and when running the balancer?
> > From what I can tell, we are using % used, and we do not factor in
> > non-DFS Used at all.
> > I see a datanode with only a 60GB hard drive fill up completely (100%)
> > before the other servers that have 130+GB hard drives get half full.
> > Seems like trying to keep the same % free on the drives in the cluster
> > would be more optimal in production.
> > I know this still may not be perfect, but it would be nice if we tried.
> >
> > Billy



Re: hadoop balancing data

2009-01-23 Thread Hairong Kuang
%Remaining fluctuates much more than %DFS used. This is because DFS shares
the disks with mapred, and mapred tasks may temporarily use a lot of disk
space. So trying to keep the same %free is impossible most of the time.
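
To make the fluctuation concrete, here is a rough sketch in Python. It is a
simplified model, not Hadoop's actual accounting: the function names, the
130 GB disk, the 40 GB of DFS data, and the temporary mapred usage figures
are all assumptions chosen for illustration.

```python
# Illustrative model only: the names and numbers below are assumptions,
# not values reported by Hadoop.
GB = 1024 ** 3


def pct_dfs_used(capacity, dfs_used):
    """Percentage of the disk occupied by DFS block data."""
    return 100.0 * dfs_used / capacity


def pct_remaining(capacity, dfs_used, non_dfs_used):
    """Percentage of the disk still free after DFS and non-DFS usage."""
    remaining = max(capacity - dfs_used - non_dfs_used, 0)
    return 100.0 * remaining / capacity


capacity = 130 * GB   # hypothetical datanode disk
dfs_used = 40 * GB    # DFS block data, unchanged while a job runs

# Temporary mapred usage comes and goes as tasks run.
for temp_gb in (0, 25, 50):
    used = pct_dfs_used(capacity, dfs_used)
    free = pct_remaining(capacity, dfs_used, temp_gb * GB)
    print(f"mapred temp {temp_gb:2d} GB: "
          f"%DFS used = {used:.1f}, %remaining = {free:.1f}")
```

In this model %DFS used stays the same on every iteration, while %remaining
moves with the temporary data, which is why it makes an unstable target.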

Hairong


On 1/19/09 10:28 PM, "Billy Pearson"  wrote:

> Why do we not use Remaining % in place of Used % when we are
> selecting a datanode for new data and when running the balancer?
> From what I can tell, we are using % used, and we do not factor in non-DFS
> Used at all.
> I see a datanode with only a 60GB hard drive fill up completely (100%) before
> the other servers that have 130+GB hard drives get half full.
> Seems like trying to keep the same % free on the drives in the cluster would
> be more optimal in production.
> I know this still may not be perfect, but it would be nice if we tried.
> 
> Billy
> 
> 



hadoop balancing data

2009-01-19 Thread Billy Pearson
Why do we not use Remaining % in place of Used % when we are
selecting a datanode for new data and when running the balancer?
From what I can tell, we are using % used, and we do not factor in non-DFS
Used at all.
I see a datanode with only a 60GB hard drive fill up completely (100%) before
the other servers that have 130+GB hard drives get half full.
Seems like trying to keep the same % free on the drives in the cluster would
be more optimal in production.

I know this still may not be perfect, but it would be nice if we tried.

Billy