[jira] [Assigned] (HBASE-28453) Support a middle ground between the Average and Fixed interval rate limiters

Ray Mattingly (Jira) Thu, 21 Mar 2024 11:44:27 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-28453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ray Mattingly reassigned HBASE-28453:
-------------------------------------

    Assignee: Ray Mattingly

> Support a middle ground between the Average and Fixed interval rate limiters
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-28453
>                 URL: https://issues.apache.org/jira/browse/HBASE-28453
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>         Attachments: Screenshot 2024-03-21 at 2.08.51 PM.png, Screenshot 
> 2024-03-21 at 2.30.01 PM.png
>
>
> h3. Background
> HBase quotas support two rate limiters: a "fixed" and an "average" interval 
> rate limiter.
> h4. FixedIntervalRateLimiter
> The fixed interval rate limiter is simpler: it has a TimeUnit, say 1 second, 
> and it refills a resource allotment on the recurring interval. So you may get 
> 10 resources every second, and if you exhaust all 10 resources in the first 
> millisecond of an interval then you will need to wait 999ms to acquire even 1 
> more resource.
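
To make the fixed-interval behavior concrete, here is a minimal Java sketch. It is not HBase's FixedIntervalRateLimiter; the class name `FixedIntervalSketch`, the `tryAcquire` method, and the explicit `nowMs` clock argument are all hypothetical, chosen only so the 10-per-second example can be replayed deterministically.

```java
// Hypothetical sketch of a fixed-interval limiter; NOT the HBase implementation.
final class FixedIntervalSketch {
    private final long limit;       // resources granted per interval
    private final long intervalMs;  // length of the refill interval
    private long available;         // resources left in the current interval
    private long intervalStartMs;   // start of the current interval

    FixedIntervalSketch(long limit, long intervalMs, long nowMs) {
        this.limit = limit;
        this.intervalMs = intervalMs;
        this.available = limit;
        this.intervalStartMs = nowMs;
    }

    /** Returns 0 if the amount was consumed, else the ms until the next full refill. */
    long tryAcquire(long amount, long nowMs) {
        if (nowMs - intervalStartMs >= intervalMs) {
            // Jump to the start of the interval that contains nowMs and refill fully.
            long intervalsElapsed = (nowMs - intervalStartMs) / intervalMs;
            intervalStartMs += intervalsElapsed * intervalMs;
            available = limit;
        }
        if (amount <= available) {
            available -= amount;
            return 0;
        }
        // Nothing is handed out before the next full refill.
        return intervalStartMs + intervalMs - nowMs;
    }
}
```

With a limit of 10 per 1000ms created at time 0, consuming all 10 at 1ms and then asking for 1 more at 1ms yields a 999ms wait in this sketch, matching the example in the description.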
> h4. AverageIntervalRateLimiter
> The average interval rate limiter, HBase's default, allows for more flexibly 
> timed refilling of the resource allotment. Extending our previous example, 
> say you have a 10 reads/sec quota and you have exhausted all 10 resources 
> within 1ms of the last full refill. If you request 1 more read then, rather 
> than returning a 999ms wait interval indicating the next full refill time, 
> the rate limiter will recognize that you only need to wait 99ms before 1 read 
> can be available. After 100ms has passed in aggregate since the last full 
> refill, it will refill 1/10th of the limit, which is enough to serve the 
> request for 1/10th of the resources.
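
The averaging behavior can be sketched the same way. This is a simplified illustration under stated assumptions, not HBase's AverageIntervalRateLimiter: the class `AverageIntervalSketch`, its `tryAcquire` method, and the explicit `nowMs` argument are hypothetical, and the sketch only accrues whole resources using integer arithmetic.

```java
// Hypothetical sketch of an average-interval limiter; NOT the HBase implementation.
final class AverageIntervalSketch {
    private final long limit;       // resources granted per interval
    private final long intervalMs;  // length of the interval
    private long available;         // resources currently available
    private long lastRefillMs;      // time up to which accrual has been credited

    AverageIntervalSketch(long limit, long intervalMs, long nowMs) {
        this.limit = limit;
        this.intervalMs = intervalMs;
        this.available = limit;
        this.lastRefillMs = nowMs;
    }

    /** Returns 0 if the amount was consumed, else the ms until enough has accrued. */
    long tryAcquire(long amount, long nowMs) {
        refill(nowMs);
        if (amount <= available) {
            available -= amount;
            return 0;
        }
        long missing = amount - available;
        // Time to accrue `missing` resources at limit/intervalMs per ms, rounded up.
        long neededMs = (missing * intervalMs + limit - 1) / limit;
        return Math.max(1, neededMs - (nowMs - lastRefillMs));
    }

    private void refill(long nowMs) {
        long elapsedMs = nowMs - lastRefillMs;
        long accrued = elapsedMs * limit / intervalMs; // whole resources only
        if (accrued > 0) {
            available = Math.min(limit, available + accrued);
            // Credit only the time that was converted into resources.
            lastRefillMs += accrued * intervalMs / limit;
        }
    }
}
```

Replaying the 10 reads/sec example: after consuming 10 at 1ms, a request for 1 more at 1ms returns a 99ms wait, and the same request at 100ms succeeds because 1/10th of the limit has accrued by then.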
> h3. The Problems with Current RateLimiters
> The problem with the fixed interval rate limiter is that it is too strict 
> from a latency perspective. It results in quota limits to which we cannot 
> fully subscribe with any consistency.
> The problem with the average interval rate limiter is that, in practice, it 
> is far too optimistic. For example, a real rate limiter might limit to 
> 100MB/sec of read IO per machine. Any multigets that come in will require 
> only a tiny fraction of this limit; for example, a 64KB block is only 0.06% 
> of the total. As a result, the vast majority of wait intervals end up being 
> tiny, e.g. <5ms. This can actually have the opposite of the intended effect: 
> setting up a throttle causes a DDoS of your RPC layer via continuous 
> throttling and ~immediate retrying. I've discussed this problem in 
> https://issues.apache.org/jira/browse/HBASE-28429 and proposed a minimum wait 
> interval as the solution there; after some more thinking, I believe this new 
> rate limiter would be a less hacky solution to this deficit so I'd like to 
> close that Jira in favor of this one.
> See the attached chart where I put in place a 10k req/sec/machine throttle 
> for this user at 10:43 to try to curb this high traffic, and it resulted in a 
> huge spike of req/sec due to the throttle/retry loop created by the 
> AverageIntervalRateLimiter.
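
The arithmetic behind the tiny wait intervals can be checked with a few lines of Java. This is only a worked calculation, assuming binary units (1KB = 1024 bytes, 1MB = 1024KB) and a limiter that accrues resources linearly over the interval; the class and method names are hypothetical.

```java
// Hypothetical helper for the 100MB/sec example; assumes linear accrual and binary units.
final class WaitIntervalArithmetic {
    /** Fraction of a one-second limit that a single request consumes. */
    static double fractionOfLimit(long requestBytes, long limitBytesPerSec) {
        return (double) requestBytes / limitBytesPerSec;
    }

    /** Milliseconds needed to accrue requestBytes at limitBytesPerSec. */
    static double millisToAccrue(long requestBytes, long limitBytesPerSec) {
        return 1000.0 * requestBytes / limitBytesPerSec;
    }
}
```

For a 64KB block against a 100MB/sec limit, `fractionOfLimit` gives 0.000625 (about 0.06%) and `millisToAccrue` gives 0.625ms, so a throttled client told to wait only that long will retry almost immediately.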
> h3. PartialIntervalRateLimiter as a Solution
> I've implemented a RateLimiter which allows for partial chunks of the overall 
> interval to be refilled; by default these chunks are 10% (or 100ms of a 1s 
> interval). I've deployed this to a test cluster at my day job and have seen 
> this really help our ability to fully subscribe to a quota limit without 
> executing superfluous retries. See the other attached chart which shows a 
> cluster undergoing a rolling restart from using FixedIntervalRateLimiter to 
> my new PartialIntervalRateLimiter and how it is then able to fully subscribe 
> to its allotted 25MB/sec/machine read IO quota.
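
One way to picture chunked refilling is the Java sketch below. It is a guess at the general idea only, not the proposed PartialIntervalRateLimiter: the class `PartialIntervalSketch`, the `chunksPerInterval` parameter, and the explicit `nowMs` argument are hypothetical, and the sketch assumes the limit and interval divide evenly by the chunk count.

```java
// Hypothetical sketch of chunked refilling; NOT the proposed HBase implementation.
final class PartialIntervalSketch {
    private final long limit;      // resources granted per full interval
    private final long chunkMs;    // e.g. 100ms of a 1s interval
    private final long perChunk;   // resources refilled at each chunk boundary
    private long available;        // resources currently available
    private long lastChunkStartMs; // most recent chunk boundary already credited

    PartialIntervalSketch(long limit, long intervalMs, int chunksPerInterval, long nowMs) {
        this.limit = limit;
        this.chunkMs = Math.max(1, intervalMs / chunksPerInterval);
        this.perChunk = Math.max(1, limit / chunksPerInterval);
        this.available = limit;
        this.lastChunkStartMs = nowMs;
    }

    /** Returns 0 if the amount was consumed, else the ms until the chunk boundary that covers it. */
    long tryAcquire(long amount, long nowMs) {
        long chunksElapsed = (nowMs - lastChunkStartMs) / chunkMs;
        if (chunksElapsed > 0) {
            // Refill one chunk's worth per elapsed chunk, capped at the full limit.
            available = Math.min(limit, available + chunksElapsed * perChunk);
            lastChunkStartMs += chunksElapsed * chunkMs;
        }
        if (amount <= available) {
            available -= amount;
            return 0;
        }
        long missing = amount - available;
        long chunksNeeded = (missing + perChunk - 1) / perChunk; // round up
        return lastChunkStartMs + chunksNeeded * chunkMs - nowMs;
    }
}
```

In this sketch, wait intervals always end on a chunk boundary, so a caller that has exhausted the quota is never told to come back sooner than the next boundary, yet it waits at most one chunk (100ms by default) for a small request rather than the rest of the interval.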



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
