Re: HotSpot detection/mitigation worker?

2021-05-19 Thread Mallikarjun
Agree. It solves the problem in terms of randomizing distribution, but does not increase cardinality.

On Tue, May 18, 2021 at 6:29 PM Bryan Beaudreault wrote:
> Hi Mallikarjun, thanks for the response.
>
> I agree that it is hard to fully mitigate a bad rowkey design. We do make
> pretty heavy

Re: HotSpot detection/mitigation worker?

2021-05-18 Thread Bryan Beaudreault
Hi Mallikarjun, thanks for the response. I agree that it is hard to fully mitigate a bad rowkey design. We do make pretty heavy use of hash prefixes, and we don't really have many examples of the common problem you describe where the "latest" data is in 1-2 regions. Our distribution issues
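
For readers unfamiliar with the "hash prefixes" mentioned above, here is a minimal sketch of the general idea, not the poster's actual scheme. The key layout, the `prefixed_rowkey` helper, and the bucket count of 16 are all assumptions made for illustration. A short prefix derived from a hash of the customerId is prepended to the rowkey, so different customers scatter across the keyspace. Every row for a given customer still shares the same prefix, so the prefix alone does not spread one very large customer.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count; not a value from the thread


def prefixed_rowkey(customer_id: str, rest: str, buckets: int = NUM_BUCKETS) -> bytes:
    """Build a rowkey shaped like <2-hex-digit bucket>|<customerId>|<rest>.

    The bucket is derived from a hash of the customerId only, so it is
    stable for a given customer but varies between customers.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets  # integer in [0, buckets)
    return f"{bucket:02x}|{customer_id}|{rest}".encode("utf-8")


if __name__ == "__main__":
    # Many customers: prefixes are spread over the available buckets.
    many = {prefixed_rowkey(f"customer-{i}", "event-0001")[:2] for i in range(1000)}
    print(len(many), "distinct prefixes across 1000 customers")

    # One customer: every row carries the same prefix.
    one = {prefixed_rowkey("customer-42", f"event-{n:04d}")[:2] for n in range(1000)}
    print(len(one), "distinct prefixes for a single customer's rows")
```

Hashing only the customerId keeps a customer's rows contiguous for scans, at the cost of leaving a heavy customer concentrated in one key range.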

Re: HotSpot detection/mitigation worker?

2021-05-17 Thread Mallikarjun
I think that no matter how well a balancer cost function is written, it cannot compensate for a suboptimal row key design. For example, say you have 10 regionservers and 100 regions, and your application is heavy on the latest data, which mostly lives in 1 or 2 regions; however many splits and/or merges it
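
The arithmetic behind this example can be sketched as follows. The 10 regionservers, 100 regions, and 2 hot regions come from the message; the 90% share of requests going to the hot regions is an assumed figure for illustration only, as is the `best_case_hot_server_share` helper. The sketch takes the most favourable placement (each hot region on a different server, hot traffic split evenly between them) and ignores the remaining cold traffic, so it gives a lower bound on the busiest server's load.

```python
def best_case_hot_server_share(hot_regions: int, hot_fraction: float, num_servers: int) -> float:
    """Lower bound on the fraction of all requests hitting the busiest server.

    Assumes hot traffic is split evenly across the hot regions and that the
    balancer places each hot region on a different server where possible.
    """
    servers_carrying_hot = min(hot_regions, num_servers)
    return hot_fraction / servers_carrying_hot


if __name__ == "__main__":
    num_servers = 10
    hot_regions = 2
    hot_fraction = 0.9  # assumed: share of requests aimed at the "latest" data

    busiest = best_case_hot_server_share(hot_regions, hot_fraction, num_servers)
    even = 1 / num_servers
    print(f"busiest server, best case: {busiest:.0%} of requests")
    print(f"perfectly even spread:     {even:.0%} of requests")
```

Under these assumptions the busiest server carries several times the load of an even spread even though every server hosts the same number of regions, which is the point being made: moving regions around cannot split traffic that targets a single region.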

HotSpot detection/mitigation worker?

2021-05-17 Thread Bryan Beaudreault
Hey all, We run a bunch of big HBase clusters that get used by hundreds of product teams for a variety of real-time workloads. We are a B2B company, so most data has a customerId somewhere in the rowkey. As the team that owns the HBase infrastructure, we try to help product teams properly design