We at VerizonMedia (Yahoo!) have been running FavoredNodes for about 4
years now in production and it has helped us a lot with our scale.

Since we started working on this many years ago, we have contributed
patches to upstream, although not much recently. We will resume
contributing the remaining patches as part of our migration to 2.x; they
will be part of https://issues.apache.org/jira/browse/HBASE-15531. Most of
the code lives in the favored-node-based classes, which I think helps with
maintenance as well.

Hello Mallikarjun,
We are glad to see you use and benefit from it. As Stack mentioned, it
would be good to see a writeup of your experience and enhancements.

Thanks!
Thiruvel

On Tue, Apr 27, 2021 at 1:35 PM Stack <st...@duboce.net> wrote:

> On Mon, Apr 26, 2021 at 7:30 PM Mallikarjun <mallik.v.ar...@gmail.com>
> wrote:
>
> > Inline reply
> >
> > On Tue, Apr 27, 2021 at 1:03 AM Stack <st...@duboce.net> wrote:
> >
> > > On Mon, Apr 26, 2021 at 12:30 PM Stack <st...@duboce.net> wrote:
> > >
> > > > On Mon, Apr 26, 2021 at 8:10 AM Mallikarjun <mallik.v.ar...@gmail.com>
> > > > wrote:
> > > >
> > > >> We use FavoredStochasticBalancer, whose description says essentially
> > > >> the same thing as FavoredNodeLoadBalancer. That aside, the problem
> > > >> appears to be
> > > >>
> > > >>
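> > > >> For reference, switching between the two is just the balancer class
> > > >> setting. A minimal sketch, set programmatically here purely for
> > > >> illustration (in practice it lives in hbase-site.xml; class name as on
> > > >> our 2.x build, so double-check against your version):
> > > >>
> > > >>   import org.apache.hadoop.conf.Configuration;
> > > >>   import org.apache.hadoop.hbase.HBaseConfiguration;
> > > >>
> > > >>   public class BalancerConfigSketch {
> > > >>     // Select the favored-node-aware stochastic balancer on the master.
> > > >>     static Configuration favoredNodeBalancerConf() {
> > > >>       Configuration conf = HBaseConfiguration.create();
> > > >>       conf.set("hbase.master.loadbalancer.class",
> > > >>           "org.apache.hadoop.hbase.master.balancer.FavoredStochasticBalancer");
> > > >>       return conf;
> > > >>     }
> > > >>   }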
> > > >
> > > > Other concerns:
> > > >
> > > >  * Hard-coded triplet of nodes that will inevitably rot as machines
> > > >    come and go (Are there tools for remediation?)
> > >
> >
> > It doesn't really rot, if you consider that the balancer is responsible
> > for assigning regions:
> >
> > 1. Whenever a region is assigned to a particular regionserver, the
> > balancer reassigns its favored-node triplet, so there is no scope for rot
> > (the same logic applies to WALs as well). On compaction, HDFS blocks are
> > pulled back if any have spilled over.
> >
> >
> I don't follow the above but no harm; I can wait for the write-up (smile).
>
>
>
> > 2. We use hostnames only (so nodes that come and go keep the same
> > hostnames rather than appearing as new nodes).
> >
> >
> Ack.
>
>
> > A couple of outstanding problems, though:
> >
> > 1. We can't increase the replication factor beyond 3. That has been fine
> > for our use cases so far, but we have thought about fixing it.
> >
> >
> Not the end-of-the-world I'd say. Would be nice to have though.
>
>
>
> > 2. The balancer doesn't understand the favored-nodes construct, so a
> > perfectly balanced FN distribution among the rsgroup datanodes isn't
> > possible; some variance, on the order of a 10-20% difference, is expected.
> >
> >
> Can be worked on.....
>
>
>
> >
> > > >  * A workaround for a facility that belongs in the NN
> > >
> >
> > Probably; you can argue it both ways. HBase is the owner of the data.
>
>
>
> Sort-of. The NN decides where replicas should be placed according to its
> configured policies. Then there is the HDFS balancer....
>
> ....
>
>
>
> > > One more concern was that the feature was dead/unused. You seem to
> > > refute this notion of mine.
> > > S
> > >
> >
> > We have been using this for more than a year with HBase 2.1 for highly
> > critical workloads at our company, and for several years before that with
> > HBase 1.2, with rsgroup backported from master at the time (2017-18 ish).
> >
> > It has been very smooth operationally on HBase 2.1.
> >
> >
> Sweet.
>
> Trying to get the other FN users to show up here on this thread to speak of
> their experience....
>
> Thanks for speaking up,
> S
>
>
> >
> > >
> > >
> > > >
> > > >
> > > >> Going a step back.
> > > >>
> > > >> Did we ever consider a truly multi-tenant HBase?
> > > >>
> > > >
> > > > Always.
> > > >
> > > >
> > > >> Where each rsgroup has its own group of datanodes, and data for
> > > >> namespaces and tables created under that particular rsgroup sits on
> > > >> those datanodes only? We have attempted that and have largely been
> > > >> very successful, running clusters of hundreds of terabytes with
> > > >> hundreds of regionservers (datanodes) per cluster.
> > > >>
> > > >>
> > > > So isolation of load by node? (I believe this is where the rsgroup
> > > > feature came from originally: the desire for a deploy like you describe
> > > > above. IIUC, it's what Thiru and crew run.)
> > > >
> > > >
> > > >
> > > >> 1. We use a modified version of RSGroupBasedFavoredNodeLoadBalancer
> > > >> contributed by Thiruvel Thirumoolan -->
> > > >> https://issues.apache.org/jira/browse/HBASE-15533
> > > >>
> > > >> On each balance operation, when a region is moved around (or when a
> > > >> table is created), favored nodes are assigned based on the rsgroup
> > > >> that the region is pinned to, and hence its data is pinned to those
> > > >> datanodes only. (Favored-node pinning is best effort on the HDFS side,
> > > >> but there are only a few exception scenarios where data spills over,
> > > >> and those recover after a major compaction.)
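> > > >>
> > > >> Roughly, the selection side of that looks like the sketch below. Class
> > > >> and method names here are illustrative only, not the actual
> > > >> RSGroupBasedFavoredNodeLoadBalancer internals (which also account for
> > > >> racks, dead nodes, existing assignments, etc.):
> > > >>
> > > >>   import java.util.ArrayList;
> > > >>   import java.util.Collections;
> > > >>   import java.util.List;
> > > >>
> > > >>   public class RsGroupFavoredNodePicker {
> > > >>     static final int FAVORED_NODES_NUM = 3;
> > > >>
> > > >>     // Pick a favored-node triplet for a region strictly from the
> > > >>     // servers of the rsgroup its table is pinned to, so HFile blocks
> > > >>     // land on that tenant's datanodes only.
> > > >>     static List<String> pickFavoredNodes(List<String> rsGroupServers) {
> > > >>       if (rsGroupServers.size() < FAVORED_NODES_NUM) {
> > > >>         throw new IllegalStateException("rsgroup too small for FN pinning");
> > > >>       }
> > > >>       List<String> shuffled = new ArrayList<>(rsGroupServers);
> > > >>       Collections.shuffle(shuffled);
> > > >>       return shuffled.subList(0, FAVORED_NODES_NUM);
> > > >>     }
> > > >>   }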
> > > >>
> > > >>
> > > > Sounds like you have studied this deploy in operation. Write it up?
> > > > Blog post on hbase.apache.org?
> > > >
> > >
> >
> > Definitely will write up.
> >
> >
> > > >
> > > >
> > > >> 2. We have introduced several balancer cost functions to restore
> > > >> things to normalcy (multi-tenancy with FN pinning), such as when a
> > > >> node is dead or when FNs are imbalanced within the same rsgroup, etc.
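> > > >>
> > > >> Conceptually these are just spread measures over favored-node load
> > > >> within one rsgroup; a standalone sketch of the idea (not the actual
> > > >> CostFunction subclasses we ship):
> > > >>
> > > >>   public class FavoredNodeImbalanceCost {
> > > >>     // Higher cost the further per-server favored-node counts within a
> > > >>     // single rsgroup drift apart; 0 means perfectly even.
> > > >>     static double cost(int[] fnCountPerServer) {
> > > >>       int n = fnCountPerServer.length;
> > > >>       if (n == 0) return 0.0;
> > > >>       double mean = 0.0;
> > > >>       for (int c : fnCountPerServer) mean += c;
> > > >>       mean /= n;
> > > >>       double sumSq = 0.0;
> > > >>       for (int c : fnCountPerServer) sumSq += (c - mean) * (c - mean);
> > > >>       double stdev = Math.sqrt(sumSq / n);
> > > >>       // Normalize so the stochastic balancer can weigh this against
> > > >>       // its other cost functions.
> > > >>       return mean == 0 ? 0.0 : Math.min(1.0, stdev / mean);
> > > >>     }
> > > >>   }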
> > > >>
> > > >> 3. We had diverse workloads on the same cluster and WAL isolation
> > > >> became a requirement, so we went with the same philosophy as in point
> > > >> 1 above: WALs are created with FN pinning so that they are tied to
> > > >> datanodes belonging to the same rsgroup. Some discussion seems to have
> > > >> happened here --> https://issues.apache.org/jira/browse/HBASE-21641
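> > > >>
> > > >> Mechanically, both the HFile and the WAL paths boil down to handing
> > > >> HDFS a favored-nodes hint at file-creation time, along these lines
> > > >> (rough sketch only; the exact create() overload varies by Hadoop
> > > >> version, and the NN is free to ignore the hint):
> > > >>
> > > >>   import java.io.IOException;
> > > >>   import java.net.InetSocketAddress;
> > > >>   import org.apache.hadoop.fs.FileSystem;
> > > >>   import org.apache.hadoop.fs.Path;
> > > >>   import org.apache.hadoop.fs.permission.FsPermission;
> > > >>   import org.apache.hadoop.hdfs.DistributedFileSystem;
> > > >>
> > > >>   public class FavoredNodeWrite {
> > > >>     // favoredNodes = the datanodes backing the rsgroup the region/WAL
> > > >>     // is pinned to.
> > > >>     static void createPinned(FileSystem fs, Path path,
> > > >>         InetSocketAddress[] favoredNodes) throws IOException {
> > > >>       if (fs instanceof DistributedFileSystem) {
> > > >>         DistributedFileSystem dfs = (DistributedFileSystem) fs;
> > > >>         dfs.create(path, FsPermission.getFileDefault(),
> > > >>             true /* overwrite */, 64 * 1024 /* buffer */,
> > > >>             (short) 3 /* replication */, 128L * 1024 * 1024 /* block */,
> > > >>             null /* progress */, favoredNodes).close();
> > > >>       } else {
> > > >>         fs.create(path).close(); // no pinning on a plain FileSystem
> > > >>       }
> > > >>     }
> > > >>   }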
> > > >>
> > > >> There are several other enhancements we have worked on, such as
> > > >> rsgroup-aware export snapshot, rsgroup-aware region mover,
> > > >> rsgroup-aware cluster replication, etc.
> > > >>
> > > >> For the above use cases, we need FN information in hbase:meta.
> > > >>
> > > >> If this use case seems to fit how we want HBase to be taken forward,
> > > >> as one of the supported use cases, we are willing to contribute our
> > > >> changes back to the community. (I was planning to initiate this
> > > >> discussion anyway.)
> > > >>
> > > >
> > > > Contribs always welcome.
> > >
> >
> > Happy to see our thoughts are aligned. We will prepare a plan for these
> > contributions.
> >
> >
> > > >
> > > > Thanks Mallikarjun,
> > > > S
> > > >
> > > >
> > > >
> > > >>
> > > >> To strengthen the above use case, here is what one of our
> > > >> multi-tenant clusters looks like:
> > > >>
> > > >> RSGroups(Tenants): 21 (With tenant isolation)
> > > >> Regionservers: 275
> > > >> Regions Hosted: 6k
> > > >> Tables Hosted: 87
> > > >> Capacity: 250 TB (100TB used)
> > > >>
> > > >> ---
> > > >> Mallikarjun
> > > >>
> > > >>
> > > >> On Mon, Apr 26, 2021 at 9:15 AM 张铎(Duo Zhang) <palomino...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > As you all know, we always want to reduce the size of the
> > > >> > hbase-server module. This time we want to separate the
> > > >> > balancer-related code into another sub-module.
> > > >> >
> > > >> > The design doc:
> > > >> >
> > > >> > https://docs.google.com/document/d/1T7WSgcQBJTtbJIjqi8sZYLxD2Z7JbIHx4TJaKKdkBbE/edit#
> > > >> >
> > > >> > As you can see at the bottom of the design doc, the favored node
> > > >> > balancer is a problem, as it stores the favored node information in
> > > >> > hbase:meta. Stack mentioned that the feature is already dead; maybe
> > > >> > we could just purge it from our code base.
> > > >> >
> > > >> > So here we want to know whether there are still users in the
> > > >> > community who use the favored node balancer. Please share your
> > > >> > experience and whether you still want to use it.
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >>
> > > >
> > >
> >
>
