Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-05-06 Thread Jaydeep Chovatia
Sure, Caleb. I will include the work as part of CASSANDRA-19534 in CEP-41.

Jaydeep

On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe wrote:

> FYI, there is some ongoing sort-of-related work going on in
> CASSANDRA-19534 
Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-05-03 Thread Caleb Rackliffe
FYI, there is some ongoing sort-of-related work going on in CASSANDRA-19534


On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia wrote:

> Just created an official CEP-41
> 
> incorporating the feedback from this discussion. Feel free to let me know
> if I may have missed some important feedback in this thread that is not
> captured in the CEP-41.
>
> Jaydeep

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-04-10 Thread Jaydeep Chovatia
Just created an official CEP-41

incorporating the feedback from this discussion. Feel free to let me know
if I may have missed some important feedback in this thread that is not
captured in the CEP-41.

Jaydeep

On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:

> Thanks, Josh. I will file an official CEP with all the details in a few
> days and update this thread with that CEP number.
> Thanks a lot everyone for providing valuable insights!
>
> Jaydeep

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-22 Thread Jaydeep Chovatia
Thanks, Josh. I will file an official CEP with all the details in a few
days and update this thread with that CEP number.
Thanks a lot everyone for providing valuable insights!

Jaydeep

On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie  wrote:

> Do folks think we should file an official CEP and take it there?
>
> +1 here.
>
> Synthesizing your gdoc, Caleb's work, and the feedback from this thread
> into a draft seems like a solid next step.

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-22 Thread Josh McKenzie
> Do folks think we should file an official CEP and take it there?
+1 here.

Synthesizing your gdoc, Caleb's work, and the feedback from this thread into a 
draft seems like a solid next step.

On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
> I see a lot of great ideas being discussed or proposed in the past to cover 
> the most common rate limiter candidate use cases. Do folks think we should 
> file an official CEP and take it there?
> 
> Jaydeep

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-07 Thread Jaydeep Chovatia
I see a lot of great ideas being discussed or proposed in the past to cover
the most common rate limiter candidate use cases. Do folks think we should
file an official CEP and take it there?

Jaydeep

On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe wrote:

> I just remembered the other day that I had done a quick writeup on the
> state of compaction stress-related throttling in the project:
>
>
> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>
> I'm sure most of it is old news to the people on this thread, but I
> figured I'd post it just in case :)

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-02 Thread Caleb Rackliffe
I just remembered the other day that I had done a quick writeup on the
state of compaction stress-related throttling in the project:

https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing

I'm sure most of it is old news to the people on this thread, but I figured
I'd post it just in case :)

On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie  wrote:

> 2.) We should make sure the links between the "known" root causes of
> cascading failures and the mechanisms we introduce to avoid them remain
> very strong.
>
> Seems to me that our historical strategy was to address individual known
> cases one-by-one rather than looking for a more holistic load-balancing and
> load-shedding solution. While the engineer in me likes the elegance of a
> broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me
> wonders how far we think we are today from a stable set-point.
>
> i.e. are we facing a handful of cases where nodes can still get pushed
> over and then cascade that we can surgically address, or are we facing a
> broader lack of back-pressure that rears its head in different domains
> (client -> coordinator, coordinator -> replica, internode with other
> operations, etc) at surprising times and should be considered more
> holistically?
>
> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>
> I almost forgot CASSANDRA-15817, which introduced
> reject_repair_compaction_threshold, which provides a mechanism to stop
> repairs while compaction is underwater.
>
> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe 
> wrote:
>
> 
> Hey all,
>
> I'm a bit late to the discussion. I see that we've already discussed
> CASSANDRA-15013 
>  and CASSANDRA-16663
>  at least in
> passing. Having written the latter, I'd be the first to admit it's a crude
> tool, although it's been useful here and there, and provides a couple
> primitives that may be useful for future work. As Scott mentions, while it
> is configurable at runtime, it is not adaptive, although we did
> make configuration easier in CASSANDRA-17423
> . It also is
> global to the node, although we've lightly discussed some ideas around
> making it more granular. (For example, keyspace-based limiting, or limiting
> "domains" tagged by the client in requests, could be interesting.) It also
> does not deal with inter-node traffic, of course.
>
> Something we've not yet mentioned (that does address internode traffic) is
> CASSANDRA-17324 ,
> which I proposed shortly after working on the native request limiter (and
> have just not had much time to return to). The basic idea is this:
>
> When a node is struggling under the weight of a compaction backlog and
> becomes a cause of increased read latency for clients, we have two safety
> valves:
>
> 1.) Disabling the native protocol server, which stops the node from
> coordinating reads and writes.
> 2.) Jacking up the severity on the node, which tells the dynamic snitch to
> avoid the node for reads from other coordinators.
>
> These are useful, but we don’t appear to have any mechanism that would
> allow us to temporarily reject internode hint, batch, and mutation messages
> that could further delay resolution of the compaction backlog.
>
>
> Whether it's done as part of a larger framework or on its own, it still
> feels like a good idea.
>
> Thinking in terms of opportunity costs here (i.e. where we spend our
> finite engineering time to holistically improve the experience of operating
> this database) is healthy, but we probably haven't reached the point of
> diminishing returns on nodes being able to protect themselves from clients
> and from other nodes. I would just keep in mind two things:
>
> 1.) The effectiveness of rate-limiting in the system (which includes the
> database and all clients) as a whole necessarily decreases as we move from
> the application to the lowest-level database internals. Limiting correctly
> at the client will save more resources than limiting at the native protocol
> server, and limiting correctly at the native protocol server will save more
> resources than limiting after we've dispatched requests to some thread pool
> for processing.
> 2.) We should make sure the links between the "known" root causes of
> cascading failures and the mechanisms we introduce to avoid them remain
> very strong.
>
> In any case, I'd be happy to help out in any way I can as this moves
> forward (especially as it relates to our past/current attempts to address
> this problem space).
>
>
>


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-30 Thread Josh McKenzie
> 2.) We should make sure the links between the "known" root causes of 
> cascading failures and the mechanisms we introduce to avoid them remain very 
> strong.
Seems to me that our historical strategy was to address individual known cases 
one-by-one rather than looking for a more holistic load-balancing and 
load-shedding solution. While the engineer in me likes the elegance of a broad, 
more-inclusive *actual SEDA-like* approach, the pragmatist in me wonders how 
far we think we are today from a stable set-point. 

i.e. are we facing a handful of cases where nodes can still get pushed over and 
then cascade that we can surgically address, or are we facing a broader lack of 
back-pressure that rears its head in different domains (client -> coordinator, 
coordinator -> replica, internode with other operations, etc) at surprising 
times and should be considered more holistically?

On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
> I almost forgot CASSANDRA-15817, which introduced 
> reject_repair_compaction_threshold, which provides a mechanism to stop 
> repairs while compaction is underwater.


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-29 Thread Caleb Rackliffe
I almost forgot CASSANDRA-15817, which introduced
reject_repair_compaction_threshold, which provides a mechanism to stop
repairs while compaction is underwater.


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-26 Thread Caleb Rackliffe
Hey all,

I'm a bit late to the discussion. I see that we've already discussed
CASSANDRA-15013  and
CASSANDRA-16663  at
least in passing. Having written the latter, I'd be the first to admit it's
a crude tool, although it's been useful here and there, and provides a
couple primitives that may be useful for future work. As Scott mentions,
while it is configurable at runtime, it is not adaptive, although we did
make configuration easier in CASSANDRA-17423
. It also is global
to the node, although we've lightly discussed some ideas around making it
more granular. (For example, keyspace-based limiting, or limiting "domains"
tagged by the client in requests, could be interesting.) It also does not
deal with inter-node traffic, of course.
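
To make the granularity idea concrete, a minimal sketch of a keyspace-scoped
limit might look like the following (hypothetical names, assuming Guava's
RateLimiter is on the classpath; this is not the CASSANDRA-16663
implementation):

    import com.google.common.util.concurrent.RateLimiter;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: one token bucket per keyspace instead of a single node-global limit.
    final class KeyspaceRequestLimiter
    {
        private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();
        private final double permitsPerSecond;

        KeyspaceRequestLimiter(double permitsPerSecond)
        {
            this.permitsPerSecond = permitsPerSecond;
        }

        // Returns false if the request against this keyspace should be shed/rejected.
        boolean tryAcquire(String keyspace)
        {
            return limiters.computeIfAbsent(keyspace, k -> RateLimiter.create(permitsPerSecond))
                           .tryAcquire();
        }
    }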

Something we've not yet mentioned (that does address internode traffic) is
CASSANDRA-17324 ,
which I proposed shortly after working on the native request limiter (and
have just not had much time to return to). The basic idea is this:

> When a node is struggling under the weight of a compaction backlog and
> becomes a cause of increased read latency for clients, we have two safety
> valves:
>
> 1.) Disabling the native protocol server, which stops the node from
> coordinating reads and writes.
> 2.) Jacking up the severity on the node, which tells the dynamic snitch to
> avoid the node for reads from other coordinators.
>
> These are useful, but we don’t appear to have any mechanism that would
> allow us to temporarily reject internode hint, batch, and mutation messages
> that could further delay resolution of the compaction backlog.
>

Whether it's done as part of a larger framework or on its own, it still
feels like a good idea.
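
As a rough sketch of the CASSANDRA-17324 idea (not the actual patch; the Verb
enum and compaction signal below are placeholders), the check could be as
simple as refusing write-path verbs while the pending-compaction count is
above a threshold:

    import java.util.EnumSet;
    import java.util.Set;
    import java.util.function.IntSupplier;

    // Hypothetical sketch: reject internode write-path messages while compaction is underwater.
    final class CompactionBackpressure
    {
        // Placeholder verbs, not the real org.apache.cassandra.net.Verb constants.
        enum Verb { MUTATION, HINT, BATCH, READ }

        private static final Set<Verb> WRITE_PATH = EnumSet.of(Verb.MUTATION, Verb.HINT, Verb.BATCH);

        private final int pendingCompactionThreshold;
        private final IntSupplier pendingCompactions; // e.g. fed from compaction metrics

        CompactionBackpressure(int pendingCompactionThreshold, IntSupplier pendingCompactions)
        {
            this.pendingCompactionThreshold = pendingCompactionThreshold;
            this.pendingCompactions = pendingCompactions;
        }

        // True if an inbound internode message of this verb should be rejected for now,
        // so the sender hints/retries elsewhere instead of piling more work on this node.
        boolean shouldReject(Verb verb)
        {
            return WRITE_PATH.contains(verb)
                   && pendingCompactions.getAsInt() > pendingCompactionThreshold;
        }
    }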

Thinking in terms of opportunity costs here (i.e. where we spend our finite
engineering time to holistically improve the experience of operating this
database) is healthy, but we probably haven't reached the point of
diminishing returns on nodes being able to protect themselves from clients
and from other nodes. I would just keep in mind two things:

1.) The effectiveness of rate-limiting in the system (which includes the
database and all clients) as a whole necessarily decreases as we move from
the application to the lowest-level database internals. Limiting correctly
at the client will save more resources than limiting at the native protocol
server, and limiting correctly at the native protocol server will save more
resources than limiting after we've dispatched requests to some thread pool
for processing.
2.) We should make sure the links between the "known" root causes of
cascading failures and the mechanisms we introduce to avoid them remain
very strong.

In any case, I'd be happy to help out in any way I can as this moves
forward (especially as it relates to our past/current attempts to address
this problem space).


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-22 Thread Jaydeep Chovatia
>>>> On Tue, Jan 16, 2024 at 1:16 PM, Jaydeep Chovatia wrote:
>>>>
>>>> Hi Stefan,
>>>>
>>>> Please find my response below:
>>>> 1) Currently, I am keeping the signals as interface, so one can
>>>> override with a different implementation, but a point noted that even the
>>>> interface APIs could be also made dynamic so one can define APIs and its
>>>> implementation, if they wish to override.
>>>> 2) I've not looked into that yet, but I will look into it and see if it
>>>> can be easily integrated into the Guardrails framework.
>>>> 3) On the server side, when the framework detects that a node is
>>>> overloaded, then it will throw *OverloadedException* back to the
>>>> client. Because if the node while busy continues to serve additional
>>>> requests, then it will slow down other peer nodes due to dependencies on
>>>> meeting the QUORUM, etc. In this, we are at least preventing server nodes
>>>> from melting down, and giving the control to the client via
>>>> *OverloadedException.* Now, it will be up to the client policy, if
>>>> client wishes to retry immediately on a different server node then
>>>> eventually that server node might be impacted, but if client wishes to do
>>>> exponential back off or throw exception back to the application then that
>>>> server node will not be impacted.
>>>>
>>>>
>>>> Jaydeep
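
To illustrate the client side of that OverloadedException contract, a minimal
retry sketch might look like the following (hypothetical, not tied to a
specific driver API; the OverloadedException class below is just a stand-in
for whatever exception the client actually sees):

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.function.Supplier;

    // Hypothetical sketch: exponential backoff with jitter when the server sheds the request.
    final class BackoffRetry
    {
        // Stand-in for whatever overload exception the client/driver actually surfaces.
        static class OverloadedException extends RuntimeException {}

        static <T> T executeWithBackoff(Supplier<T> request, int maxAttempts) throws InterruptedException
        {
            long backoffMillis = 50;
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    return request.get();
                }
                catch (OverloadedException e)
                {
                    if (attempt >= maxAttempts)
                        throw e; // give up and surface the overload to the application
                    Thread.sleep(backoffMillis + ThreadLocalRandom.current().nextLong(backoffMillis));
                    backoffMillis = Math.min(backoffMillis * 2, 5_000); // cap the backoff
                }
            }
        }
    }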

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jon Haddad
> for this to work optimally, b/c you might not want to wait
> for query timeouts before you throttle.  I think there's a lot of value in
> dynamically adaptive compaction, repair, and streaming since it would
> prioritize user requests, but again, if you're not willing to work on that,
> it's your call.
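
As a rough illustration (not an existing Cassandra facility; names are
hypothetical), an adaptive compaction throttle could be a small feedback loop
over the node's own latency signal:

    // Hypothetical sketch: nudge compaction throughput down when client-facing read latency
    // degrades, and let it recover slowly when there is headroom.
    final class AdaptiveCompactionThrottle
    {
        interface Node
        {
            double readLatencyP99Millis();                  // observed client-facing latency
            int compactionThroughputMbPerSec();             // current setting
            void setCompactionThroughputMbPerSec(int mbps); // apply a new setting
        }

        private final Node node;
        private final double latencyTargetMillis;

        AdaptiveCompactionThrottle(Node node, double latencyTargetMillis)
        {
            this.node = node;
            this.latencyTargetMillis = latencyTargetMillis;
        }

        void tick() // call periodically, e.g. every 30 seconds
        {
            int current = node.compactionThroughputMbPerSec();
            if (node.readLatencyP99Millis() > latencyTargetMillis)
                node.setCompactionThroughputMbPerSec(Math.max(8, current / 2));    // back off quickly
            else
                node.setCompactionThroughputMbPerSec(Math.min(256, current + 8));  // recover slowly
        }
    }
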
>
> Anyways - I like the idea of putting more safeguards in the database
> itself, we're fundamentally in agreement there.  I see a ton of value in
> having flexible rate limiters, whether it be per-table, keyspace, or
> user+table combination.  I'd also like to ensure the feature doesn't cause
> more disruptions than it solves, which I think would be the case from using
> CPU usage as a signal.
>
> Jon
>
>
> On Wed, Jan 17, 2024 at 10:26 AM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>
>> Jon,
>>
>> The major challenge with latency based rate limiters is that the latency
>> is subjective from one workload to another. As a result, in the proposal I
>> have described, the idea is to make decision on the following combinations:
>>
>>1. System parameters (such as CPU usage, etc.)
>>2. Cassandra thread pools health (are they dropping requests, etc.)
>>
>> And if these two are +ve then consider the server under pressure. And
>> once it is under the pressure, then shed the traffic from less aggressive
>> to more aggressive, etc. The idea is to prevent Cassandra server from
>> melting (by considering the above two signals to begin with and add any
>> more based on the learnings)
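
A rough sketch of that decision logic (hypothetical names; the proposal keeps
the actual signals behind a pluggable interface) could look like:

    // Hypothetical sketch: declare overload only when both signal families agree,
    // then shed traffic from least to most aggressive.
    final class OverloadDetector
    {
        interface Signals
        {
            double cpuUtilization();       // 0.0 - 1.0, from system metrics
            long droppedThreadPoolTasks(); // e.g. dropped mutations/reads since the last check
        }

        enum Action { NONE, SHED_BACKGROUND, SHED_LOW_PRIORITY, REJECT_NEW_REQUESTS }

        private final Signals signals;

        OverloadDetector(Signals signals)
        {
            this.signals = signals;
        }

        Action evaluate()
        {
            boolean cpuHot = signals.cpuUtilization() > 0.85;
            boolean dropping = signals.droppedThreadPoolTasks() > 0;

            if (!(cpuHot && dropping))
                return Action.NONE;                // both signals must be positive
            if (signals.cpuUtilization() > 0.95)
                return Action.REJECT_NEW_REQUESTS; // most aggressive: OverloadedException to clients
            if (signals.cpuUtilization() > 0.90)
                return Action.SHED_LOW_PRIORITY;
            return Action.SHED_BACKGROUND;         // least aggressive first
        }
    }
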
>>
>> Scott,
>>
>> Yes, I did look at some of the implementations, but they are all great
>> systems and helping quite a lot. But they are still not relying on system
>> health, etc. and also not in the generic coordinator/replication read/write
>> path. The idea here is on the similar lines as the existing
>> implementations, but making it a bit more generic and trying to cover as
>> many paths as possible.
>>
>> German,
>>
>> Sure, let's first continue the discussions here. If it turns out that
>> there is no widespread interest in the idea then we can do 1:1 and see how
>> we can help each other on a private fork, etc.
>>
>> Jaydeep
>>
>> On Wed, Jan 17, 2024 at 7:57 AM German Eichberger via dev <dev@cassandra.apache.org> wrote:
>>
>>> Jaydeep,
>>>
>>> I concur with Stefan that extensibility of this  should be a design goal:
>>>
>>>    - It should be easy to add additional metrics (e.g. write queue
>>>      depth) and decision logic
>>>    - There should be a way to interact with other systems to signal a
>>>      resource need which then could kick off things like scaling
>>>
>>>
>>> Super interested in this and we have been thinking about similar things
>>> internally 
>>>
>>> Thanks,
>>> German
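
Both asks (pluggable metrics and signalling external systems) could be served
by two small extension points, along the lines of this hypothetical sketch:

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Hypothetical sketch: pluggable inputs (signals) and pluggable outputs (subscribers),
    // so new metrics and external integrations (diagnostics, scaling hooks, Slack/e-mail)
    // can be added without touching the core decision logic.
    final class RateLimiterPlugins
    {
        interface Signal
        {
            String name();  // e.g. "cpu", "write-queue-depth"
            double value(); // current reading
        }

        interface DecisionListener
        {
            void onDecision(String decision, List<Signal> inputs); // e.g. forward to an autoscaler
        }

        private final List<Signal> signals = new CopyOnWriteArrayList<>();
        private final List<DecisionListener> listeners = new CopyOnWriteArrayList<>();

        void register(Signal signal)              { signals.add(signal); }
        void subscribe(DecisionListener listener) { listeners.add(listener); }

        void publish(String decision)
        {
            for (DecisionListener listener : listeners)
                listener.onDecision(decision, signals);
        }
    }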

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jeff Jirsa
On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:

> Hi Jaydeep,
>
> That seems quite interesting. Couple points though:
>
> 1) It would be nice if there is a way to "subscribe" to decisions your
> detection framework comes up with. Integration with e.g. diagnostics
> subsystem would be beneficial. This should be pluggable - just coding up
> an interface to dump / react on the decisions how I want. This might also
> act as a notifier to other systems, e-mail, slack channels ...
>
> 2) Have you tried to incorporate this with the Guardrails framework? I
> think that if something is detected to be throttled or rejected (e.g
> writing to a table), there might be a guardrail which would be triggered
> dynamically in runtime. Guardrails are useful as such but here we might
> reuse them so we do not need to code it twice.
>
> 3) I am curious how complex this detection framework would be, it can be
> complicated pretty fast I guess. What would be desirable is to act on it
> in such a way that you will not put that node under even more pressure.
> In other words, your detection system should work in such a way that
> there will not be any "doom loop" whereby mere throttling of various
> parts of Cassandra you make it even worse for other nodes in the cluster.
> For example, if a particular node starts to be overwhelmed and you detect
> this and requests start to be rejected, is it not possible that Java
> driver would start to see this node as "erroneous" with delayed response
> time etc and it would start to prefer other nodes in the cluster when
> deciding what node to contact for query coordination? So you would put
> more load on other nodes, making them more susceptible to be throttled
> as well ...
>
> Regards
>
> Stefan Miklosovic
>
> On Tue, Jan 16, 2024 at 6:41 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>
>> Hi,
>>
>> Happy New Year!
>>
>> I would like to discuss the following idea:
>>
>> Open-source Cassandra (CASSANDRA-15013) has an elementary built-in
>> memory rate limiter based on the incoming payload from user requests.
>> This rate limiter activates if

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jon Haddad
ntations, but they are all great
> systems and helping quite a lot. But they are still not relying on system
> health, etc. and also not in the generic coordinator/replication read/write
> path. The idea here is on the similar lines as the existing
> implementations, but making it a bit more generic and trying to cover as
> many paths as possible.
>
> German,
>
> Sure, let's first continue the discussions here. If it turns out that
> there is no widespread interest in the idea then we can do 1:1 and see how
> we can help each other on a private fork, etc.
>
> Jaydeep
>
> On Wed, Jan 17, 2024 at 7:57 AM German Eichberger via dev <
> dev@cassandra.apache.org> wrote:
>
>> Jaydeep,
>>
>> I concur with Stefan that extensibility of this  should be a design goal:
>>
>>- It should be easy to add additional metrics (e.g. write queue
>>depth) and decision logic
>>- There should be a way to interact with other systems to signal a
>>resource need  which then could kick off things like scaling
>>
>>
>> Super interested in this and we have been thinking about siimilar things
>> internally 
>>
>> Thanks,
>> German
>> --
>> *From:* Jaydeep Chovatia 
>> *Sent:* Tuesday, January 16, 2024 1:16 PM
>> *To:* dev@cassandra.apache.org 
>> *Subject:* [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in
>> Cassandra
>>
>> Hi Stefan,
>>
>> Please find my response below:
>> 1) Currently, I am keeping the signals as interface, so one can override
>> with a different implementation, but a point noted that even the interface
>> APIs could be also made dynamic so one can define APIs and its
>> implementation, if they wish to override.
>> 2) I've not looked into that yet, but I will look into it and see if it
>> can be easily integrated into the Guardrails framework.
>> 3) On the server side, when the framework detects that a node is
>> overloaded, then it will throw *OverloadedException* back to the client.
>> Because if the node while busy continues to serve additional requests, then
>> it will slow down other peer nodes due to dependencies on meeting the
>> QUORUM, etc. In this, we are at least preventing server nodes from melting
>> down, and giving the control to the client via *OverloadedException.*
>> Now, it will be up to the client policy, if client wishes to retry
>> immediately on a different server node then eventually that server node
>> might be impacted, but if client wishes to do exponential back off or throw
>> exception back to the application then that server node will not be
>> impacted.
>>
>>
>> Jaydeep
>>
>> On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>> Hi Jaydeep,
>>
>> That seems quite interesting. Couple points though:
>>
>> 1) It would be nice if there is a way to "subscribe" to decisions your
>> detection framework comes up with. Integration with e.g. diagnostics
>> subsystem would be beneficial. This should be pluggable - just coding up an
>> interface to dump / react on the decisions how I want. This might also act
>> as a notifier to other systems, e-mail, slack channels ...
>>
>> 2) Have you tried to incorporate this with the Guardrails framework? I
>> think that if something is detected to be throttled or rejected (e.g
>> writing to a table), there might be a guardrail which would be triggered
>> dynamically in runtime. Guardrails are useful as such but here we might
>> reuse them so we do not need to code it twice.
>>
>> 3) I am curious how complex this detection framework would be, it can be
>> complicated pretty fast I guess. What would be desirable is to act on it in
>> such a way that you will not put that node under even more pressure. In
>> other words, your detection system should work in such a way that there
>> will not be any "doom loop" whereby mere throttling of various parts of
>> Cassandra you make it even worse for other nodes in the cluster. For
>> example, if a particular node starts to be overwhelmed and you detect this
>> and requests start to be rejected, is it not possible that Java driver
>> would start to see this node as "erroneous" with delayed response time etc
>> and it would start to prefer other nodes in the cluster when deciding what
>> node to contact for query coordination? So you would put 

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread Jaydeep Chovatia
Jon,

The major challenge with latency-based rate limiters is that latency is
subjective from one workload to another. As a result, the proposal I have
described makes its decision based on a combination of the following:

   1. System parameters (such as CPU usage, etc.)
   2. Cassandra thread pool health (are they dropping requests, etc.)

If both of these indicate pressure, the server is considered overloaded. Once
it is under pressure, traffic is shed starting with the least aggressive
action and escalating to more aggressive ones as needed. The idea is to keep
the Cassandra server from melting down, starting with the above two signals
and adding more based on what we learn.
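
As a rough sketch of how those two signals could be combined (the names,
thresholds, and shedding levels below are illustrative assumptions, not the
proposed implementation):

    /** Sketch only: signal sources, thresholds, and levels are illustrative. */
    final class PressureDetectorSketch {
        enum SheddingLevel { NONE, REJECT_BACKGROUND, REJECT_LOW_PRIORITY, REJECT_ALL_NON_CRITICAL }

        interface SystemSignal { double cpuUtilization(); }              // 0.0 .. 1.0
        interface ThreadPoolSignal { boolean droppingOrBackloggedRequests(); }

        private final SystemSignal system;
        private final ThreadPoolSignal pools;

        PressureDetectorSketch(SystemSignal system, ThreadPoolSignal pools) {
            this.system = system;
            this.pools = pools;
        }

        /** Both signals must agree before the node is treated as overloaded. */
        SheddingLevel evaluate() {
            boolean cpuHot = system.cpuUtilization() > 0.85;             // illustrative threshold
            boolean poolsUnhealthy = pools.droppingOrBackloggedRequests();
            if (!(cpuHot && poolsUnhealthy)) {
                return SheddingLevel.NONE;
            }
            // Start with the least aggressive shedding; escalation to higher
            // levels (driven by how long the pressure persists) is elided here.
            return SheddingLevel.REJECT_BACKGROUND;
        }
    }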

Scott,

Yes, I did look at some of the existing implementations; they are all great
systems and help quite a lot. However, they still do not rely on system
health, etc., and they do not sit in the generic coordinator/replication
read/write path. The idea here is along similar lines to the existing
implementations, but a bit more generic, trying to cover as many paths as
possible.

German,

Sure, let's first continue the discussions here. If it turns out that there
is no widespread interest in the idea then we can do 1:1 and see how we can
help each other on a private fork, etc.

Jaydeep

On Wed, Jan 17, 2024 at 7:57 AM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Jaydeep,
>
> I concur with Stefan that extensibility of this  should be a design goal:
>
>- It should be easy to add additional metrics (e.g. write queue depth)
>and decision logic
>- There should be a way to interact with other systems to signal a
>resource need  which then could kick off things like scaling
>
>
> Super interested in this and we have been thinking about siimilar things
> internally 
>
> Thanks,
> German
> --
> *From:* Jaydeep Chovatia 
> *Sent:* Tuesday, January 16, 2024 1:16 PM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in
> Cassandra
>
> Hi Stefan,
>
> Please find my response below:
> 1) Currently, I am keeping the signals as interface, so one can override
> with a different implementation, but a point noted that even the interface
> APIs could be also made dynamic so one can define APIs and its
> implementation, if they wish to override.
> 2) I've not looked into that yet, but I will look into it and see if it
> can be easily integrated into the Guardrails framework.
> 3) On the server side, when the framework detects that a node is
> overloaded, then it will throw *OverloadedException* back to the client.
> Because if the node while busy continues to serve additional requests, then
> it will slow down other peer nodes due to dependencies on meeting the
> QUORUM, etc. In this, we are at least preventing server nodes from melting
> down, and giving the control to the client via *OverloadedException.*
> Now, it will be up to the client policy, if client wishes to retry
> immediately on a different server node then eventually that server node
> might be impacted, but if client wishes to do exponential back off or throw
> exception back to the application then that server node will not be
> impacted.
>
>
> Jaydeep
>
> On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
> Hi Jaydeep,
>
> That seems quite interesting. Couple points though:
>
> 1) It would be nice if there is a way to "subscribe" to decisions your
> detection framework comes up with. Integration with e.g. diagnostics
> subsystem would be beneficial. This should be pluggable - just coding up an
> interface to dump / react on the decisions how I want. This might also act
> as a notifier to other systems, e-mail, slack channels ...
>
> 2) Have you tried to incorporate this with the Guardrails framework? I
> think that if something is detected to be throttled or rejected (e.g
> writing to a table), there might be a guardrail which would be triggered
> dynamically in runtime. Guardrails are useful as such but here we might
> reuse them so we do not need to code it twice.
>
> 3) I am curious how complex this detection framework would be, it can be
> complicated pretty fast I guess. What would be desirable is to act on it in
> such a way that you will not put that node under even more pressure. In
> other words, your detection system should work in such a way that there
> will not be any "doom loop" whereby mere throttling of various parts of
> Cassandra you make it even worse for other nodes in the cluster. For
> example, if a particular node starts to be overw

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread German Eichberger via dev
Jaydeep,

I concur with Stefan that extensibility of this should be a design goal:

  *   It should be easy to add additional metrics (e.g. write queue depth) and
decision logic
  *   There should be a way to interact with other systems to signal a resource
need, which could then kick off things like scaling (see the sketch below)

Super interested in this; we have been thinking about similar things
internally.
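
One hedged way to picture the second bullet: the detection framework emits a
"resource needed" hint that an operator-side component translates into a
scaling action. Everything below is hypothetical and only meant to show the
shape of such an integration point:

    /** Sketch: hypothetical bridge from detection decisions to an external autoscaler. */
    interface ResourceNeedSink {
        void resourceNeeded(String resource, String clusterName);   // e.g. "cpu", "prod-ring-1"
    }

    final class ScalingHintForwarder {
        private final ResourceNeedSink autoscaler;

        ScalingHintForwarder(ResourceNeedSink autoscaler) { this.autoscaler = autoscaler; }

        /** Invoked by the detection framework when pressure persists beyond a grace period. */
        void onSustainedPressure(String resource, String clusterName) {
            autoscaler.resourceNeeded(resource, clusterName);        // kick off external scaling
        }
    }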

Thanks,
German

From: Jaydeep Chovatia 
Sent: Tuesday, January 16, 2024 1:16 PM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

Hi Stefan,

Please find my response below:
1) Currently, I am keeping the signals as interface, so one can override with a 
different implementation, but a point noted that even the interface APIs could 
be also made dynamic so one can define APIs and its implementation, if they 
wish to override.
2) I've not looked into that yet, but I will look into it and see if it can be 
easily integrated into the Guardrails framework.
3) On the server side, when the framework detects that a node is overloaded, 
then it will throw OverloadedException back to the client. Because if the node 
while busy continues to serve additional requests, then it will slow down other 
peer nodes due to dependencies on meeting the QUORUM, etc. In this, we are at 
least preventing server nodes from melting down, and giving the control to the 
client via OverloadedException. Now, it will be up to the client policy, if 
client wishes to retry immediately on a different server node then eventually 
that server node might be impacted, but if client wishes to do exponential back 
off or throw exception back to the application then that server node will not 
be impacted.


Jaydeep

On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič 
mailto:stefan.mikloso...@gmail.com>> wrote:
Hi Jaydeep,

That seems quite interesting. Couple points though:

1) It would be nice if there is a way to "subscribe" to decisions your 
detection framework comes up with. Integration with e.g. diagnostics subsystem 
would be beneficial. This should be pluggable - just coding up an interface to 
dump / react on the decisions how I want. This might also act as a notifier to 
other systems, e-mail, slack channels ...

2) Have you tried to incorporate this with the Guardrails framework? I think 
that if something is detected to be throttled or rejected (e.g writing to a 
table), there might be a guardrail which would be triggered dynamically in 
runtime. Guardrails are useful as such but here we might reuse them so we do 
not need to code it twice.

3) I am curious how complex this detection framework would be, it can be 
complicated pretty fast I guess. What would be desirable is to act on it in 
such a way that you will not put that node under even more pressure. In other 
words, your detection system should work in such a way that there will not be 
any "doom loop" whereby mere throttling of various parts of Cassandra you make 
it even worse for other nodes in the cluster. For example, if a particular node 
starts to be overwhelmed and you detect this and requests start to be rejected, 
is it not possible that Java driver would start to see this node as "erroneous" 
with delayed response time etc and it would start to prefer other nodes in the 
cluster when deciding what node to contact for query coordination? So you would 
put more load on other nodes, making them more susceptible to be throttled as 
well ...

Regards

Stefan Miklosovic

On Tue, Jan 16, 2024 at 6:41 PM Jaydeep Chovatia 
mailto:chovatia.jayd...@gmail.com>> wrote:
Hi,

Happy New Year!

I would like to discuss the following idea:

Open-source Cassandra
(CASSANDRA-15013<https://issues.apache.org/jira/browse/CASSANDRA-15013>) has an
elementary built-in memory rate limiter based on the incoming payload from user
requests. This rate limiter activates if any incoming user request’s payload
exceeds certain thresholds. However, the existing rate limiter only solves
limited-scope issues. Cassandra's server-side meltdown due to overload is a
known problem. Often we see a couple of busy nodes take down the entire
Cassandra ring due to the ripple effect. The following document proposes a
general-purpose, comprehensive rate limiter that makes its decisions based on
system signals, such as CPU, and internal signals, such as thread pools. The
rate limiter will have knobs to filter out internal traffic, system traffic,
replication traffic, and, furthermore, specific types of queries.
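
To illustrate the kind of knobs meant here, a purely hypothetical sketch (none
of these options exist in Cassandra today):

    /** Sketch: hypothetical per-traffic-category switches for the rate limiter. */
    final class RateLimiterOptionsSketch {
        enum TrafficCategory { USER_READ, USER_WRITE, REPLICATION, INTERNAL, SYSTEM }

        // Only user-facing reads and writes are subject to shedding in this example;
        // replication, internal, and system traffic are filtered out.
        private final java.util.EnumSet<TrafficCategory> limited =
                java.util.EnumSet.of(TrafficCategory.USER_READ, TrafficCategory.USER_WRITE);

        boolean isLimited(TrafficCategory category) {
            return limited.contains(category);
        }
    }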

More design details to this doc: [OSS] Cassandra Generic Purpose Rate Limiter - 
Google 
Docs<https://docs.google.com/document/d/1w-A3fnoeBS6tS1ffBda_R0QR90olzFoMqLE7znFEUrQ/edit>

Please let me know your thoughts.

Jaydeep