Re: introducing nodes w/ more storage

Kevin O'dell Thu, 02 Apr 2015 08:32:42 -0700

Mike,

  I agree with all of the above. I am just saying that, in my experience,
even clusters that do not run HBase at all rarely run the HDFS balancer,
except when doing major overhauls such as adding nodes/racks.



And no, you do not use a data node as an edge node.
(Really saying that? C’mon, really? ) Never a good design. Ever. <--
Sometimes you have to make do with what you got :)


On Thu, Apr 2, 2015 at 10:33 AM, Michael Segel <[email protected]>
wrote:

>
>
> When you say … "It is not recommended to *ever run the HDFS balancer on a
> cluster running HBase." … that's a very scary statement.
>
> Not really a good idea.  Unless you are building a cluster for a specific
> use case.
>
>
> When you look at the larger picture… in most use cases, the cluster will
> contain more data in flat files (HDFS) than they would inside HBase.
> (which you allude to in your last paragraph) so balancing is a good idea.
> (Even manual processes can be run in cron jobs ;-)
>
> And no, you do not use a data node as an edge node.
> (Really saying that? C’mon, really? ) Never a good design. Ever.
>
>
> I agree that you should run major compactions after running the (HDFS)
> load balancer.
> But the point I am trying to make is that with respect to HBase, you still
> need to think about the cluster as a whole.
>
>
> > On Apr 2, 2015, at 7:41 AM, Kevin O'dell <[email protected]>
> wrote:
> >
> > Hi Mike,
> >
> >  Sorry for the delay here.
> >
> > How does the HDFS load balancer impact the load balancing of HBase? <--
> > The HDFS load balancer is not run automatically; it is a manual process
> > that is kicked off. It is not recommended to *ever run the HDFS balancer
> > on a cluster running HBase.  Similar to how HBase has no concept of or
> > care about the underlying storage, HDFS has no concept of or care about
> > the region layout, nor the locality we worked so hard to build through
> > compactions.
> >
> > Furthermore, once the HDFS balancer has saved us from running out of
> > space on the smaller nodes, we will run a major compaction and rewrite
> > all of the HBase data right back to where it was before.
> >
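A minimal sketch of the locality effect described above, as a toy Python model rather than anything HDFS or HBase actually exposes. Assumptions: five made-up DataNodes, replication factor 3, a writer whose first replica lands on its own node, and a balancer run that happens to move 40 of the region server's local replicas elsewhere. The names `locality` and `write_from` are invented for illustration.

```python
import random

def locality(blocks, block_hosts, server):
    """Fraction of a region's HDFS blocks with a replica on `server`."""
    return sum(1 for b in blocks if server in block_hosts[b]) / len(blocks)

def write_from(server, nodes, rng):
    """Replica set for a block written on `server`: local copy plus two others."""
    others = rng.sample([n for n in nodes if n != server], 2)
    return {server, *others}

nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]   # hypothetical DataNodes
server = "dn1"                                 # region server co-located with dn1
blocks = [f"blk_{i}" for i in range(100)]      # blocks of one region's store files
rng = random.Random(0)

# Freshly compacted: every block was written by the region server itself.
block_hosts = {b: write_from(server, nodes, rng) for b in blocks}
after_compaction = locality(blocks, block_hosts, server)

# Balancer run (modeled): the dn1 replica of 40 blocks moves to another node.
for b in blocks[:40]:
    hosts = block_hosts[b]
    target = rng.choice(sorted(n for n in nodes if n not in hosts))
    hosts.remove(server)
    hosts.add(target)
after_balancer = locality(blocks, block_hosts, server)

# Major compaction: the region server rewrites the data, so it is local again.
block_hosts = {b: write_from(server, nodes, rng) for b in blocks}
after_major_compaction = locality(blocks, block_hosts, server)

print(after_compaction, after_balancer, after_major_compaction)
```

In this model locality goes from 1.0 to 0.6 after the balancer run and back to 1.0 after the major compaction, which also puts the bytes back on the node the balancer had just relieved.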
> > one is the number of regions managed by a region server that’s HBase’s
> > load, right? And then there’s the data distribution of HBase files that
> > is really managed by HDFS load balancer, right? <--- Right, until we run
> > major compaction and "restore" locality by moving the data back.
> >
> > Even still… eventually the data will be distributed equally across the
> > cluster. What’s happening with the HDFS balancer?  Is that heterogeneous
> > or homogeneous in terms of storage? <-- Not quite. As I said before, the
> > HDFS balancer is manual, so it is quite easy to build up a skew,
> > especially if you use a datanode as an edge node or thrift gateway etc.
> > Yes, the HDFS balancer is heterogeneous, but it doesn't play nice with
> > HBase.
> >
> > *The use of the word ever should not be construed as a true definitive.
> > Ever is being used to represent a best practice.  In many cases the HDFS
> > balancer needs to be run, especially in multi-tenant clusters
> > with archive data.  It is best to immediately run a major compaction to
> > restore HBase locality if the HDFS balancer is used.
> >
> > On Mon, Mar 23, 2015 at 10:50 AM, Michael Segel <
> [email protected]>
> > wrote:
> >
> >> @lars,
> >>
> >> How does the HDFS load balancer impact the load balancing of HBase?
> >>
> >> Of course there are two loads… one is the number of regions managed by a
> >> region server that’s HBase’s load, right?
> >> And then there’s the data distribution of HBase files that is really
> >> managed by HDFS load balancer, right?
> >>
> >> OP’s question is about having a heterogeneous cluster where he would
> >> like to see a more even distribution of data/free space based on the
> >> capacity of the newer machines in the cluster.
> >>
> >> This is a storage question, not a memory/cpu core question.
> >>
> >> Or am I missing something?
> >>
> >>
> >> -Mike
> >>
> >>> On Mar 22, 2015, at 10:56 PM, lars hofhansl <[email protected]> wrote:
> >>>
> >>> Seems that it should not be too hard to add that to the stochastic load
> >>> balancer.
> >>> We could add a spaceCost or something.
> >>>
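A rough sketch of what a "spaceCost" term could measure, in plain Python. This is not HBase code and not the StochasticLoadBalancer API; `space_cost` is a hypothetical function, and the per-node numbers (three nodes with 5, 5 and 8 TB) are made up. The assumption is that the cost should be zero when every node is filled to the same fraction of its own capacity and should grow as the fill fractions diverge.

```python
def space_cost(used_tb, capacity_tb):
    """Hypothetical cost in [0, 1]: spread between the fullest and emptiest node,
    measured as a fraction of each node's own capacity."""
    fractions = [used / cap for used, cap in zip(used_tb, capacity_tb)]
    return max(fractions) - min(fractions)

capacity_tb = [5.0, 5.0, 8.0]        # two legacy nodes and one larger node (made up)

# 9 TB spread evenly by node: the small nodes are 60% full, the big one 37.5%.
even_by_node = [3.0, 3.0, 3.0]

# The same 9 TB spread in proportion to capacity: every node is 50% full.
proportional = [2.5, 2.5, 4.0]

print(space_cost(even_by_node, capacity_tb))
print(space_cost(proportional, capacity_tb))
```

A balancer that minimizes a term like this would prefer the capacity-proportional layout. It would still only steer region placement; where HDFS puts the replicas is a separate matter.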
> >>>
> >>>
> >>> ----- Original Message -----
> >>> From: Jean-Marc Spaggiari <[email protected]>
> >>> To: user <[email protected]>
> >>> Cc: Development <[email protected]>
> >>> Sent: Thursday, March 19, 2015 12:55 PM
> >>> Subject: Re: introducing nodes w/ more storage
> >>>
> >>> You can extend the default balancer and assign the regions based on
> >>> that. But in the end, the replicated blocks might still go all over the
> >>> cluster, and your "small" nodes are going to be full and will not be
> >>> able to get any more writes, even for the regions they are supposed to
> >>> get.
> >>>
> >>> I'm not sure there is a good solution for what you are looking for :(
> >>>
> >>> I built my own balancer, but because of differences in the CPUs, not
> >>> because of differences in the storage space...
> >>>
> >>>
> >>> 2015-03-19 15:50 GMT-04:00 Nick Dimiduk <[email protected]>:
> >>>
> >>>> Seems more fantasy than fact, I'm afraid. The default load balancer [0]
> >>>> takes store file size into account, but has no concept of capacity. It
> >>>> doesn't know that nodes in a heterogeneous environment have different
> >>>> capacity.
> >>>>
> >>>> This would be a good feature to add though.
> >>>>
> >>>> [0]:
> >>>> https://github.com/apache/hbase/blob/branch-1.0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
> >>>>
> >>>> On Tue, Mar 17, 2015 at 7:26 AM, Ted Tuttle <[email protected]>
> >> wrote:
> >>>>
> >>>>> Hello-
> >>>>>
> >>>>> Sometime back I asked a question about introducing new nodes w/ more
> >>>>> storage than existing nodes.  I was told at the time that HBase will
> >>>>> not be able to utilize the additional storage; I assumed at the time
> >>>>> that regions are allocated to nodes in something like a round-robin
> >>>>> fashion and the node with the least storage sets the limit for how much
> >>>>> each node can utilize.
> >>>>>
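The "smallest node sets the limit" assumption can be put into numbers with a short Python sketch. Everything here is illustrative: it takes X = 1 TB per volume, invents a cluster of four legacy nodes plus one new node, assumes data is spread evenly per node, and ignores HDFS replication entirely.

```python
def usable_if_even(capacities_tb):
    """Total data that fits when every node must hold the same amount:
    the smallest node fills first and caps all the others."""
    return min(capacities_tb) * len(capacities_tb)

legacy_tb = 5.0                       # 5 volumes at an assumed 1 TB each
new_tb = 8.0                          # 8 volumes at an assumed 1 TB each
cluster = [legacy_tb] * 4 + [new_tb]  # hypothetical: four legacy nodes, one new

raw_tb = sum(cluster)                 # 28.0
even_tb = usable_if_even(cluster)     # 25.0
stranded_tb = raw_tb - even_tb        # 3.0, all of it on the new node

print(raw_tb, even_tb, stranded_tb)
```

Under those assumptions the three extra volumes on the new node stay empty, which is the outcome the question is trying to avoid.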
> >>>>> My question this time around has to do with nodes w/ unequal numbers of
> >>>>> volumes: Does HBase allocate regions based on nodes or volumes on the
> >>>>> nodes?  I am hoping I can add a node with 8 volumes totaling 8X TB and
> >>>>> all the volumes will be filled.  This even though legacy nodes have 5
> >>>>> volumes and total storage of 5X TB.
> >>>>>
> >>>>> Fact or fantasy?
> >>>>>
> >>>>> Thanks,
> >>>>> Ted
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >> The opinions expressed here are mine, while they may reflect a cognitive
> >> thought, that is purely accidental.
> >> Use at your own risk.
> >> Michael Segel
> >> michael_segel (AT) hotmail.com
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > Kevin O'Dell
> > Field Enablement, Cloudera
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>


-- 
Kevin O'Dell
Field Enablement, Cloudera
