Hi Mallikarjun, thanks for the response.

I agree that it is hard to fully mitigate a bad rowkey design. We do make
pretty heavy use of hash prefixes, and we don't really have many examples
of the common problem you describe where the "latest" data is in 1-2
regions. Our distribution issues instead come from the fact that we have
customers both big and small living in the same table -- a small customer
might have only a few rows and might only use the product enough to access
those rows a few times a day. A large customer might have thousands or
millions of rows, and be constantly using the product and accessing those
rows.

While we do use hash prefixes, developers need to choose what to put in
that hash -- for example if the rowkey is customerId + emailAddress +
somethingElse, ideally the hash could at the very least be
hash(customerId + emailAddress). This would result in a relatively well
distributed schema, assuming the cardinality of somethingElse wasn't too
big. But often developers might need to scan data for a given customer, so
the hash might just be hash(customerId). For a large customer with many
email addresses, this presents a problem. In these cases we do try to lean
on small index tables plus multi-gets, but that isn't always feasible and
doesn't fully eliminate the potential for hotspotting.
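
To make that tradeoff concrete, here is a rough sketch of the two prefixing
options. The class and helper names are made up, and I'm using Guava's
murmur3 purely as an example of a stable hash; this is a sketch, not our
actual client code.

import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {

  // Hashing only customerId keeps a customer's rows contiguous, so a
  // prefix-aware scan over one customer still works, but a huge customer
  // lands in only a few regions.
  static byte[] scanFriendlyKey(String customerId, String email, String rest) {
    return key(bucket(customerId), customerId, email, rest);
  }

  // Hashing customerId + emailAddress spreads a big customer across many
  // buckets, at the cost of no longer being able to scan the whole customer
  // in one contiguous range.
  static byte[] wellDistributedKey(String customerId, String email, String rest) {
    return key(bucket(customerId + email), customerId, email, rest);
  }

  // 3-digit bucket in 000-999, assuming the table is pre-split on those
  // boundaries.
  private static String bucket(String input) {
    long h = Hashing.murmur3_128().hashString(input, StandardCharsets.UTF_8).asLong();
    return String.format("%03d", Math.floorMod(h, 1000L));
  }

  private static byte[] key(String prefix, String customerId, String email, String rest) {
    return String.join("-", prefix, customerId, email, rest)
        .getBytes(StandardCharsets.UTF_8);
  }
}

For us the scan-friendly variant is often the one product teams actually
need, which is exactly where the hotspotting comes back in.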

We have about 1000 tables across 40 clusters and over 5000 regionservers.
This sort of scenario plays out over and over, where large customers end up
causing certain regions to get more traffic than others. One might hope
that these "heavy" regions would also be bigger in size, if read and write
volume were roughly proportional to data size. If that were true, the HFile
size-based normalizer might be enough. For us though, that is not always
the case. My thought is to identify these read-heavy regions and split them
so we can better distribute the load with the balancer.
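
For what it's worth, here is roughly the shape I have in mind, just flagging
candidates from a single snapshot. The names are hypothetical, a real
implementation would average several samples per region over time, and I'm
assuming the 2.x RegionMetrics API (the replacement for RegionLoad):

import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import org.apache.hadoop.hbase.ClusterMetrics;
import org.apache.hadoop.hbase.RegionMetrics;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class ReadLoadNormalizerSketch {

  // Flag regions whose read request count is well above (split candidate) or
  // well below (merge candidate) the table-wide average.
  static void findCandidates(Admin admin, TableName table,
      double splitFactor, double mergeFactor) throws Exception {
    List<RegionMetrics> regions = new ArrayList<>();
    ClusterMetrics cluster =
        admin.getClusterMetrics(EnumSet.of(ClusterMetrics.Option.LIVE_SERVERS));
    for (ServerName server : cluster.getLiveServerMetrics().keySet()) {
      regions.addAll(admin.getRegionMetrics(server, table));
    }
    double avgReads = regions.stream()
        .mapToLong(RegionMetrics::getReadRequestCount).average().orElse(0);
    for (RegionMetrics region : regions) {
      long reads = region.getReadRequestCount();
      if (reads > avgReads * splitFactor) {
        System.out.println("split candidate: " + region.getNameAsString());
      } else if (reads < avgReads * mergeFactor) {
        System.out.println("merge candidate: " + region.getNameAsString());
      }
    }
  }
}

The open questions for me are mostly around how many samples to keep per
region and how to pick sane split/merge factors so the normalizer doesn't
end up fighting the balancer.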

Perhaps this problem is very specific to our use case, which would explain
why it hasn't been solved more generally by a load-based normalizer.

On Mon, May 17, 2021 at 10:09 PM Mallikarjun <mallik.v.ar...@gmail.com>
wrote:

> I think no matter how well a balancer cost function is written, it cannot
> cover for a suboptimal row key design. Say, for example, you have 10
> regionservers and 100 regions, and your application is heavy on the latest
> data, which is mostly 1 or 2 regions; no matter how many splits and/or
> merges you do, it becomes very hard to balance the load among the
> regionservers.
>
> Here is how we have solved this problem for our clients. It might not
> work for existing clients, but it can be a thought for new ones.
>
> Every request with a row key goes through an enrichment process, which
> prefixes it with a hash (from, say, murmurhash) based on the client-requested
> distribution (this stays fixed throughout the lifetime of that table for that
> client). We also wrote an HBase client abstraction to take care of this in a
> seamless manner for our clients.
>
> Example: an actual row key --> *0QUPHSBTLGM*, where the client requested a
> 3-digit prefix based on the table region range (000 - 999), would translate
> to *115-0QUPHSBTLGM* with murmurhash.
>
> ---
> Mallikarjun
>
>
> On Tue, May 18, 2021 at 1:33 AM Bryan Beaudreault
> <bbeaudrea...@hubspot.com.invalid> wrote:
>
> > Hey all,
> >
> > We run a bunch of big hbase clusters that get used by hundreds of product
> > teams for a variety of real-time workloads. We are a B2B company, so most
> > data has a customerId somewhere in the rowkey. As the team that owns the
> > hbase infrastructure, we try to help product teams properly design schemas
> > to avoid hotspotting, but inevitably it happens. It may not necessarily
> > just be hotspotting, but for example request volume may not be evenly
> > distributed across all regions of a table.
> >
> > This hotspotting/distribution issue makes it hard for the balancer to keep
> > the cluster balanced from a load perspective -- sure, all RS have the same
> > number of regions, but those regions are not all created equal from a load
> > perspective. This results in cases where one RS might be consistently at
> > 70% cpu, another might be at 30%, and all the rest are in a band
> > in-between.
> >
> > We already have a normalizer job which works similarly to the
> > SimpleRegionNormalizer -- keeping regions approximately the same size from
> > a data size perspective. I'm considering updating our normalizer to also
> > take into account region load.
> >
> > My general plan is to follow a similar strategy to the balancer -- keep a
> > configurable number of RegionLoad objects in memory per-region, and extract
> > averages for readRequestsCount from those. If a region's average load is >
> > some threshold relative to other regions in the same table, split it. If
> > it's < some threshold relative to other regions in the same table, merge
> > it.
> >
> > I'm writing because I'm wondering if anyone else has had this problem and
> > if there exists prior art here. Is there a reason HBase does not provide a
> > configurable load-based normalizer (beyond typical OSS reasons -- no one
> > contributed it yet)?
> >
> > Thanks!
> >
>
