I don’t know that it is such a good idea. Let me ask it this way…
What are you balancing with the HBase load balancer?
Locations of HFiles on HDFS or which RS is responsible for the HFile?

-Mike

> On Apr 2, 2015, at 12:42 PM, lars hofhansl <[email protected]> wrote:
>
> What Kevin says.
> The best we can do is exclude HBase from the HDFS balancer (HDFS-6133).
> The HDFS balancer will destroy data locality for HBase. If you don't care -
> maybe you have a fat network tree, and your network bandwidth matches the
> aggregate disk throughput for each machine - you can run it. Even then, as
> Kevin says, HBase will just happily rewrite it as before.
>
> Balancing of HBase data has to happen on the HBase level. Then we have to
> decide what we use as a basis for distribution. CPU? RAM? disk space? IOPs?
> disk throughput? It depends... So some configurable function of those.
>
> -- Lars
>
> From: Kevin O'dell <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: lars hofhansl <[email protected]>
> Sent: Thursday, April 2, 2015 5:41 AM
> Subject: Re: introducing nodes w/ more storage
>
> Hi Mike,
> Sorry for the delay here.
>
> How does the HDFS load balancer impact the load balancing of HBase? <-- The
> HDFS load balancer is not automatically run; it is a manual process that is
> kicked off. It is not recommended to *ever run the HDFS balancer on a cluster
> running HBase. Similar to how HBase has no concept or care about the
> underlying storage, HDFS has no concept or care of the region layout, nor the
> locality we worked so hard to build through compactions.
>
> Furthermore, once the HDFS balancer has saved us from running out of space on
> the smaller nodes, we will run a major compaction, and re-write all of the
> HBase data right back to where it was before.
>
> Of course there are two loads… one is the number of regions managed by a
> region server that’s HBase’s load, right? And then there’s the data
> distribution of HBase files that is really managed by HDFS load balancer,
> right?
> <--- Right, until we run major compaction and "restore" locality by moving
> the data back.
>
> Even still… eventually the data will be distributed equally across the
> cluster. What’s happening with the HDFS balancer? Is that heterogeneous or
> homogeneous in terms of storage? <-- Not quite. As I said before, the HDFS
> balancer is manual, so it is quite easy to build up a skew, especially if you
> use a datanode as an edge node or thrift gateway etc. Yes, the HDFS balancer
> is heterogeneous, but it doesn't play nice with HBase.
>
> *The use of the word ever should not be construed as a true definitive. Ever
> is being used to represent a best practice. In many cases the HDFS balancer
> needs to be run, especially in multi-tenant clusters with archive data. It
> is best to immediately run a major compaction to restore HBase locality if
> the HDFS balancer is used.
>
>
> On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel <[email protected]> wrote:
>
> @lars,
>
> How does the HDFS load balancer impact the load balancing of HBase?
>
> Of course there are two loads… one is the number of regions managed by a
> region server that’s HBase’s load, right?
> And then there’s the data distribution of HBase files that is really managed
> by HDFS load balancer, right?
>
> OP’s question is having a heterogeneous cluster where he would like to see a
> more even distribution of data/free space based on the capacity of the newer
> machines in the cluster.
>
> This is a storage question, not a memory/cpu core question.
>
> Or am I missing something?
>
>
> -Mike
>
>> On Mar 22, 2015, at 10:56 PM, lars hofhansl <[email protected]> wrote:
>>
>> Seems that it should not be too hard to add that to the stochastic load
>> balancer.
>> We could add a spaceCost or something.
>>
>>
>> ----- Original Message -----
>> From: Jean-Marc Spaggiari <[email protected]>
>> To: user <[email protected]>
>> Cc: Development <[email protected]>
>> Sent: Thursday, March 19, 2015 12:55 PM
>> Subject: Re: introducing nodes w/ more storage
>>
>> You can extend the default balancer and assign the regions based on that.
>> But in the end, the replicated blocks might still go all over the cluster,
>> and your "small" nodes are going to be full and will not be able to take
>> any more writes, even for the regions they are supposed to get.
>>
>> I'm not sure there is a good solution for what you are looking for :(
>>
>> I built my own balancer, but because of differences in the CPUs, not because
>> of differences in storage space...
>>
>>
>> 2015-03-19 15:50 GMT-04:00 Nick Dimiduk <[email protected]>:
>>
>>> Seems more fantasy than fact, I'm afraid. The default load balancer [0]
>>> takes store file size into account, but has no concept of capacity. It
>>> doesn't know that nodes in a heterogeneous environment have different
>>> capacity.
>>>
>>> This would be a good feature to add though.
>>>
>>> [0]:
>>> https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
>>>
>>> On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle <[email protected]> wrote:
>>>
>>>> Hello-
>>>>
>>>> Sometime back I asked a question about introducing new nodes w/ more
>>>> storage than existing nodes. I was told at the time that HBase will not
>>>> be able to utilize the additional storage; I assumed at the time that
>>>> regions are allocated to nodes in something like a round-robin fashion
>>>> and the node with the least storage sets the limit for how much each node
>>>> can utilize.
>>>>
>>>> My question this time around has to do with nodes w/ unequal numbers of
>>>> volumes: Does HBase allocate regions based on nodes or volumes on the
>>>> nodes?
>>>> I am hoping I can add a node with 8 volumes totaling 8X TB and all
>>>> the volumes will be filled. This even though legacy nodes have 5 volumes
>>>> and total storage of 5X TB.
>>>>
>>>> Fact or fantasy?
>>>>
>>>> Thanks,
>>>> Ted
>>>>
>>>
>>
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
> --
> Kevin O'Dell
> Field Enablement, Cloudera

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
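
For illustration only, here is a rough sketch of the "spaceCost" idea floated in the thread. It is not HBase code and does not use the StochasticLoadBalancer API. The function name space_cost and its two inputs (bytes stored per server, raw capacity per server) are assumptions made up for this sketch. It scores a layout 0 when every server holds data in proportion to its capacity, and 1 when all the data sits on the smallest server, so a balancer could in principle weigh it alongside its other costs.

```python
def space_cost(used, capacity):
    """Hypothetical capacity-aware cost in [0, 1].

    used[i]     -- amount of store file data currently hosted by server i
    capacity[i] -- total storage of server i (same unit as `used`)

    0.0 means every server holds a share of the data proportional to its
    capacity; 1.0 means all data sits on the smallest-capacity server.
    """
    if len(used) != len(capacity) or not used:
        raise ValueError("used and capacity must be non-empty and equal length")
    if any(c <= 0 for c in capacity) or any(u < 0 for u in used):
        raise ValueError("capacities must be positive and usage non-negative")

    total_used = sum(used)
    total_cap = sum(capacity)
    if total_used == 0:
        return 0.0  # nothing stored, nothing to balance

    # Distance between the actual layout and the capacity-proportional one.
    deviation = sum(
        abs(u - total_used * c / total_cap) for u, c in zip(used, capacity)
    )

    # Largest possible deviation: everything on the smallest server.
    worst = 2 * total_used * (total_cap - min(capacity)) / total_cap
    return deviation / worst if worst > 0 else 0.0


if __name__ == "__main__":
    # Ted's scenario: two legacy nodes with 5 units of storage, one new node with 8.
    caps = [5, 5, 8]
    print(space_cost([6, 6, 6], caps))   # data spread evenly by byte count
    print(space_cost([5, 5, 8], caps))   # data spread in proportion to capacity
```

Note that this only scores where HBase would like the data to live. As Kevin and Jean-Marc point out above, HDFS replica placement is decided separately, so a cost like this would not by itself keep the smaller nodes from filling up.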
