Mike, I agree with all of the above. I am just saying from experience that even clusters that do not run HBase at all rarely run the HDFS balancer, except during major overhauls such as adding nodes or racks.

And no, you do not use a data node as an edge node.
(Really saying that? C’mon, really? ) Never a good design. Ever. <--
Sometimes you have to make do with what you got :)

On Thu, Apr 2, 2015 at 10:33 AM, Michael Segel <[email protected]> wrote:

> When you say … "It is not recommended to *ever run the HDFS balancer on a
> cluster running HBase." … that's a very scary statement.
>
> Not really a good idea. Unless you are building a cluster for a specific
> use case.
>
> When you look at the larger picture… in most use cases, the cluster will
> contain more data in flat files (HDFS) than they would inside HBase.
> (which you allude to in your last paragraph) so balancing is a good idea.
> (Even manual processes can be run in cron jobs ;-)
>
> And no, you do not use a data node as an edge node.
> (Really saying that? C’mon, really? ) Never a good design. Ever.
>
> I agree that you should run major compactions after running the load
> balancer. (HDFS)
> But the point I am trying to make is that with respect to HBase, you still
> need to think about the cluster as a whole.
>
> > On Apr 2, 2015, at 7:41 AM, Kevin O'dell <[email protected]> wrote:
> >
> > Hi Mike,
> >
> > Sorry for the delay here.
> >
> > How does the HDFS load balancer impact the load balancing of HBase? <--
> > The HDFS load balancer is not automatically run, it is a manual process
> > that is kicked off. It is not recommended to *ever run the HDFS balancer
> > on a cluster running HBase. Similar to how HBase has no concept or care
> > about the underlying storage, HDFS has no concept or care of the region
> > layout, nor the locality we worked so hard to build through compactions.
> >
> > Furthermore, once the HDFS balancer has saved us from running out of
> > space on the smaller nodes, we will run a major compaction, and re-write
> > all of the HBase data right back to where it was before.
> >
> > one is the number of regions managed by a region server that’s HBase’s
> > load, right? And then there’s the data distribution of HBase files that
> > is really managed by HDFS load balancer, right? <--- Right, until we run
> > major compaction and "restore" locality by moving the data back
> >
> > Even still… eventually the data will be distributed equally across the
> > cluster. What’s happening with the HDFS balancer? Is that heterogeneous
> > or homogeneous in terms of storage? <-- Not quite, as I said before the
> > HDFS balancer is manual, so it is quite easy to build up a skew,
> > especially if you use a datanode as an edge node or thrift gateway etc.
> > Yes, the HDFS balancer is heterogeneous, but it doesn't play nice with
> > HBase.
> >
> > *The use of the word ever should not be construed as a true definitive.
> > Ever is being used to represent a best practice. In many cases the HDFS
> > balancer needs to be run, especially in multi-tenant clusters
> > with archive data. It is best to immediately run a major compaction to
> > restore HBase locality if the HDFS balancer is used.
> >
> > On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel <[email protected]>
> > wrote:
> >
> >> @lars,
> >>
> >> How does the HDFS load balancer impact the load balancing of HBase?
> >>
> >> Of course there are two loads… one is the number of regions managed by a
> >> region server that’s HBase’s load, right?
> >> And then there’s the data distribution of HBase files that is really
> >> managed by HDFS load balancer, right?
> >>
> >> OP’s question is having a heterogeneous cluster where he would like to
> >> see a more even distribution of data/free space based on the capacity of
> >> the newer machines in the cluster.
> >>
> >> This is a storage question, not a memory/cpu core question.
> >>
> >> Or am I missing something?
> >>
> >> -Mike
> >>
> >>> On Mar 22, 2015, at 10:56 PM, lars hofhansl <[email protected]> wrote:
> >>>
> >>> Seems that it should not be too hard to add that to the stochastic load
> >>> balancer.
> >>> We could add a spaceCost or something.
> >>>
> >>> ----- Original Message -----
> >>> From: Jean-Marc Spaggiari <[email protected]>
> >>> To: user <[email protected]>
> >>> Cc: Development <[email protected]>
> >>> Sent: Thursday, March 19, 2015 12:55 PM
> >>> Subject: Re: introducing nodes w/ more storage
> >>>
> >>> You can extend the default balancer and assign the regions based on
> >>> that. But at the end, the replicated blocks might still go all over the
> >>> cluster and your "small" nodes are going to be full and will not be
> >>> able to get any more writes even for the regions they are supposed to
> >>> get.
> >>>
> >>> I'm not sure there is a good solution for what you are looking for :(
> >>>
> >>> I built my own balancer but because of differences in the CPUs, not
> >>> because of differences of the storage space...
> >>>
> >>> 2015-03-19 15:50 GMT-04:00 Nick Dimiduk <[email protected]>:
> >>>
> >>>> Seems more fantasy than fact, I'm afraid. The default load balancer [0]
> >>>> takes store file size into account, but has no concept of capacity. It
> >>>> doesn't know that nodes in a heterogeneous environment have different
> >>>> capacity.
> >>>>
> >>>> This would be a good feature to add though.
> >>>>
> >>>> [0]:
> >>>> https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
> >>>>
> >>>> On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle <[email protected]> wrote:
> >>>>
> >>>>> Hello-
> >>>>>
> >>>>> Sometime back I asked a question about introducing new nodes w/ more
> >>>>> storage than existing nodes. I was told at the time that HBase will
> >>>>> not be able to utilize the additional storage; I assumed at the time
> >>>>> that regions are allocated to nodes in something like a round-robin
> >>>>> fashion and the node with the least storage sets the limit for how
> >>>>> much each node can utilize.
> >>>>>
> >>>>> My question this time around has to do with nodes w/ unequal numbers
> >>>>> of volumes: Does HBase allocate regions based on nodes or volumes on
> >>>>> the nodes? I am hoping I can add a node with 8 volumes totaling 8X TB
> >>>>> and all the volumes will be filled. This even though legacy nodes
> >>>>> have 5 volumes and total storage of 5X TB.
> >>>>>
> >>>>> Fact or fantasy?
> >>>>>
> >>>>> Thanks,
> >>>>> Ted
> >>>>>
> >>
> >> The opinions expressed here are mine, while they may reflect a cognitive
> >> thought, that is purely accidental.
> >> Use at your own risk.
> >> Michael Segel
> >> michael_segel (AT) hotmail.com
> >>
> >
> > --
> > Kevin O'Dell
> > Field Enablement, Cloudera
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com

--
Kevin O'Dell
Field Enablement, Cloudera
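
For anyone skimming the thread: a rough Python sketch of what a "spaceCost"-style term might look like, using Ted's 5X TB vs 8X TB numbers with X = 1. Everything here is an assumption for illustration. `space_cost`, the node list, and the 20 TB data size are made up, this is not the StochasticLoadBalancer API (which is Java), and it ignores the point raised above that HDFS replicas can still land anywhere in the cluster.

```python
from typing import Sequence


def space_cost(used_tb: Sequence[float], capacity_tb: Sequence[float]) -> float:
    """Hypothetical cost term: 0.0 when all nodes are equally full, larger when skewed.

    Computed as the mean absolute deviation of per-node utilization
    (used / capacity) from the average utilization across nodes.
    """
    if len(used_tb) != len(capacity_tb) or not used_tb:
        raise ValueError("need one used/capacity pair per node")
    utilization = [u / c for u, c in zip(used_tb, capacity_tb)]
    mean = sum(utilization) / len(utilization)
    return sum(abs(x - mean) for x in utilization) / len(utilization)


# Illustrative cluster: four legacy nodes with 5 TB each, one new node with 8 TB.
capacity = [5.0, 5.0, 5.0, 5.0, 8.0]
total_data = 20.0  # TB of region data to place (made-up number)

# Capacity-blind placement: every node holds the same amount of data.
even = [total_data / len(capacity)] * len(capacity)

# Capacity-proportional placement: every node is filled to the same fraction.
proportional = [total_data * c / sum(capacity) for c in capacity]

print(f"even placement cost:         {space_cost(even, capacity):.3f}")
print(f"proportional placement cost: {space_cost(proportional, capacity):.3f}")
```

With capacity-blind placement the legacy nodes sit at 80% full while the new node is at 50%, so the cost is non-zero; proportional placement drives it to roughly zero. A real balancer would have to weigh a term like this against region count and locality costs.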
