You kind of don't have a choice. Running out of space in a datanode is Very Bad.
We run an HDFS balance on a semi-regular basis, whenever I do a bunch of node replacements/rebuilds or additions, and also over time as entropy builds and we do data cleanups. You do lose locality, but there's an easy fix: just run a major compaction. When the RS rewrites the datafile it will be local again. Also, as you do flushes and such, data will slowly become local again.

--Dave

On Tue, Jun 5, 2012 at 11:29 AM, Eric Raymond <[email protected]> wrote:
> Hello all,
>
> I am not sure this goes here or not, but I wanted to get a formal answer on
> using the HDFS balancer with HBase. From what I heard a while back, it's
> not recommended to run alongside HBase, as it can destroy the regions. How
> can I rebalance my datanode/dfs cluster, as it doesn't seem to do this
> automatically? I have dfs.balance.bandwidthPerSec set to 100Mbit, but I am
> seeing that a majority of the nodes are reaching 99% full, while 3-4 others
> are only at 30%.
>
> Can anyone advise on this?
>
> Thanks,
>
> Eric
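For anyone following along, a rough sketch of the sequence described above (the table name 'my_table' and the 10% threshold are just examples, and exact command names vary a bit between Hadoop/HBase versions):

```shell
# Run the HDFS balancer until no datanode's usage deviates more than
# 10% from the cluster average. Bandwidth is capped by the
# dfs.balance.bandwidthPerSec setting in hdfs-site.xml.
hadoop balancer -threshold 10

# Afterwards, restore HBase locality by major-compacting each table;
# the regionservers rewrite their storefiles onto local datanodes.
# ('my_table' is a placeholder -- repeat per table.)
echo "major_compact 'my_table'" | hbase shell
```

Note that a major compaction rewrites every storefile, so it generates significant I/O; you'd typically schedule it off-peak rather than immediately after the balance.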
