Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-04-10 Thread Jaydeep Chovatia
I have just created an official CEP-41
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter>
incorporating the feedback from this discussion. Feel free to let me know
if I have missed any important feedback from this thread that is not
captured in CEP-41.

Jaydeep

On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> Thanks, Josh. I will file an official CEP with all the details in a few
> days and update this thread with that CEP number.
> Thanks a lot everyone for providing valuable insights!
>
> Jaydeep
>
> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie 
> wrote:
>
>> Do folks think we should file an official CEP and take it there?
>>
>> +1 here.
>>
>> Synthesizing your gdoc, Caleb's work, and the feedback from this thread
>> into a draft seems like a solid next step.
>>
>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>
>> I see a lot of great ideas being discussed or proposed in the past to
>> cover the most common rate limiter candidate use cases. Do folks think we
>> should file an official CEP and take it there?
>>
>> Jaydeep
>>
>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe 
>> wrote:
>>
>> I just remembered the other day that I had done a quick writeup on the
>> state of compaction stress-related throttling in the project:
>>
>>
>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>
>> I'm sure most of it is old news to the people on this thread, but I
>> figured I'd post it just in case :)
>>
>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie 
>> wrote:
>>
>>
>> 2.) We should make sure the links between the "known" root causes of
>> cascading failures and the mechanisms we introduce to avoid them remain
>> very strong.
>>
>> Seems to me that our historical strategy was to address individual known
>> cases one-by-one rather than looking for a more holistic load-balancing and
>> load-shedding solution. While the engineer in me likes the elegance of a
>> broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me
>> wonders how far we think we are today from a stable set-point.
>>
>> i.e. are we facing a handful of cases where nodes can still get pushed
>> over and then cascade that we can surgically address, or are we facing a
>> broader lack of back-pressure that rears its head in different domains
>> (client -> coordinator, coordinator -> replica, internode with other
>> operations, etc) at surprising times and should be considered more
>> holistically?
>>
>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>
>> I almost forgot CASSANDRA-15817, which introduced
>> reject_repair_compaction_threshold, which provides a mechanism to stop
>> repairs while compaction is underwater.
>>
>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe 
>> wrote:
>>
>> 
>> Hey all,
>>
>> I'm a bit late to the discussion. I see that we've already discussed
>> CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013>
>>  and CASSANDRA-16663
>> <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in
>> passing. Having written the latter, I'd be the first to admit it's a crude
>> tool, although it's been useful here and there, and provides a couple
>> primitives that may be useful for future work. As Scott mentions, while it
>> is configurable at runtime, it is not adaptive, although we did
>> make configuration easier in CASSANDRA-17423
>> <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It also is
>> global to the node, although we've lightly discussed some ideas around
>> making it more granular. (For example, keyspace-based limiting, or limiting
>> "domains" tagged by the client in requests, could be interesting.) It also
>> does not deal with inter-node traffic, of course.
>>
>> Something we've not yet mentioned (that does address internode traffic)
>> is CASSANDRA-17324
>> <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I
>> proposed shortly after working on the native request limiter (and have just
>> not had much time to return to). The basic idea is this:
>>
>> When a node is struggling under the weight of a compaction backlog and
>> becomes a cause of increased read latency for clients, we have two safety
>> valves:
>>
>>
>> 1.) Disabling the native protocol server, which stops the node from
>> coordinating reads and writes.

Re: [Discuss] Repair inside C*

2024-02-25 Thread Jaydeep Chovatia
Thanks, Josh. I've just updated the CEP
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Official+Repair+Solution>
and included all the solutions you mentioned below.

Jaydeep

On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie  wrote:

> Very late response from me here (basically necro'ing this thread).
>
> I think it'd be useful to get this condensed into a CEP that we can then
> discuss in that format. It's clearly something we all agree we need and
> having an implementation that works, even if it's not in your preferred
> execution domain, is vastly better than nothing IMO.
>
> I don't have cycles (nor background ;) ) to do that, but it sounds like
> you do Jaydeep given the implementation you have on a private fork + design.
>
> A non-exhaustive list of things that might be useful incorporating into or
> referencing from a CEP:
> Slack thread:
> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> Joey's old C* ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-14346
> Even older automatic repair scheduling:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Your design gdoc:
> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
> PR with automated repair:
> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>
> My intuition is that we're all basically in agreement that this is
> something the DB needs, we're all willing to bikeshed for our personal
> preference on where it lives and how it's implemented, and at the end of
> the day, code talks. I don't think anyone's said they'll die on the hill of
> implementation details, so that feels like CEP time to me.
>
> If you were willing and able to get a CEP together for automated repair
> based on the above material, given you've done the work and have the proof
> points it's working at scale, I think this would be a *huge contribution*
> to the community.
>
> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>
> Is anyone going to file an official CEP for this?
> As mentioned in this email thread, here is one of the solution's design
> doc
> <https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
> and source code on a private Apache Cassandra patch. Could you go through
> it and let me know what you think?
>
> Jaydeep
>
> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
> wrote:
>
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
>
> This is something I hadn't thought much about, and is a pretty good
> argument for using the sidecar initially.  There's a lot of deployments out
> there and having an official repair option would be a big win.
>
>
> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > I agree that it would be ideal for Cassandra to have a repair scheduler
> in-DB.
> >
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
> >
> > Once TCM has landed, we’ll have much stronger primitives for repair
> orchestration in the database itself. But I don’t think that should block
> progress on a repair scheduling solution in the sidecar, and there is
> nothing that would prevent someone from continuing to use a sidecar-based
> solution in perpetuity if they preferred.
> >
> > - Scott
> >
> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad 
> wrote:
> > >
> > > I'm 100% in favor of repair being part of the core DB, not the
> sidecar.  The current (and past) state of things where running the DB
> correctly *requires* running a separate process (either community
> maintained or official C* sidecar) is incredibly painful for folks.  The
> idea that your data integrity needs to be opt-in has never made sense to me
> from the perspective of either the product or the end user.
> > >
> > > I've worked with way too many teams that have either configured this
> incorrectly or not at all.
> > >
> > > Ideally Cassandra would ship with repair built in and on by default.
> Power users can disable if they want to continue to maintain their own
> repair tooling for some reason.
> > &

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-22 Thread Jaydeep Chovatia
Thanks, Josh. I will file an official CEP with all the details in a few
days and update this thread with that CEP number.
Thanks a lot everyone for providing valuable insights!

Jaydeep

On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie  wrote:

> Do folks think we should file an official CEP and take it there?
>
> +1 here.
>
> Synthesizing your gdoc, Caleb's work, and the feedback from this thread
> into a draft seems like a solid next step.
>
> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>
> I see a lot of great ideas being discussed or proposed in the past to
> cover the most common rate limiter candidate use cases. Do folks think we
> should file an official CEP and take it there?
>
> Jaydeep
>
> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe 
> wrote:
>
> I just remembered the other day that I had done a quick writeup on the
> state of compaction stress-related throttling in the project:
>
>
> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>
> I'm sure most of it is old news to the people on this thread, but I
> figured I'd post it just in case :)
>
> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie 
> wrote:
>
>
> 2.) We should make sure the links between the "known" root causes of
> cascading failures and the mechanisms we introduce to avoid them remain
> very strong.
>
> Seems to me that our historical strategy was to address individual known
> cases one-by-one rather than looking for a more holistic load-balancing and
> load-shedding solution. While the engineer in me likes the elegance of a
> broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me
> wonders how far we think we are today from a stable set-point.
>
> i.e. are we facing a handful of cases where nodes can still get pushed
> over and then cascade that we can surgically address, or are we facing a
> broader lack of back-pressure that rears its head in different domains
> (client -> coordinator, coordinator -> replica, internode with other
> operations, etc) at surprising times and should be considered more
> holistically?
>
> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>
> I almost forgot CASSANDRA-15817, which introduced
> reject_repair_compaction_threshold, which provides a mechanism to stop
> repairs while compaction is underwater.
>
> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe 
> wrote:
>
> 
> Hey all,
>
> I'm a bit late to the discussion. I see that we've already discussed
> CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013>
>  and CASSANDRA-16663
> <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in
> passing. Having written the latter, I'd be the first to admit it's a crude
> tool, although it's been useful here and there, and provides a couple
> primitives that may be useful for future work. As Scott mentions, while it
> is configurable at runtime, it is not adaptive, although we did
> make configuration easier in CASSANDRA-17423
> <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It also is
> global to the node, although we've lightly discussed some ideas around
> making it more granular. (For example, keyspace-based limiting, or limiting
> "domains" tagged by the client in requests, could be interesting.) It also
> does not deal with inter-node traffic, of course.
>
> Something we've not yet mentioned (that does address internode traffic) is
> CASSANDRA-17324 <https://issues.apache.org/jira/browse/CASSANDRA-17324>,
> which I proposed shortly after working on the native request limiter (and
> have just not had much time to return to). The basic idea is this:
>
> When a node is struggling under the weight of a compaction backlog and
> becomes a cause of increased read latency for clients, we have two safety
> valves:
>
>
> 1.) Disabling the native protocol server, which stops the node from
> coordinating reads and writes.
> 2.) Jacking up the severity on the node, which tells the dynamic snitch to
> avoid the node for reads from other coordinators.
>
>
> These are useful, but we don’t appear to have any mechanism that would
> allow us to temporarily reject internode hint, batch, and mutation messages
> that could further delay resolution of the compaction backlog.
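
A minimal sketch of such a gate, not CASSANDRA-17324's actual implementation; the Verb names, backlog supplier, and threshold below are assumptions for illustration only:

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.IntSupplier;

// Hypothetical verb names; not Cassandra's real Verb enum.
enum Verb { MUTATION, HINT, BATCH, READ, OTHER }

final class InternodeBackpressure {
    // Message types that can safely be deferred (the sender can hint and retry later).
    private static final Set<Verb> DEFERRABLE = EnumSet.of(Verb.MUTATION, Verb.HINT, Verb.BATCH);

    private final IntSupplier pendingCompactions; // fed from compaction metrics
    private final int rejectThreshold;

    InternodeBackpressure(IntSupplier pendingCompactions, int rejectThreshold) {
        this.pendingCompactions = pendingCompactions;
        this.rejectThreshold = rejectThreshold;
    }

    /** True if an inbound message of this verb should be rejected while the backlog is high. */
    boolean shouldReject(Verb verb) {
        return DEFERRABLE.contains(verb) && pendingCompactions.getAsInt() > rejectThreshold;
    }
}
```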
>
>
> Whether it's done as part of a larger framework or on its own, it still
> feels like a good idea.
>
> Thinking in terms of opportunity costs here (i.e. where we spend our
> finite engineering time to holistically improve the experience of operating
> this database) is healthy, but we probably haven't reached the point of
> diminishing returns on nodes being able to protect the

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-02-07 Thread Jaydeep Chovatia
I see a lot of great ideas that have been discussed or proposed in the past
to cover the most common rate-limiter use cases. Do folks think we should
file an official CEP and take it there?

Jaydeep

On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe 
wrote:

> I just remembered the other day that I had done a quick writeup on the
> state of compaction stress-related throttling in the project:
>
>
> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>
> I'm sure most of it is old news to the people on this thread, but I
> figured I'd post it just in case :)
>
> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie 
> wrote:
>
>> 2.) We should make sure the links between the "known" root causes of
>> cascading failures and the mechanisms we introduce to avoid them remain
>> very strong.
>>
>> Seems to me that our historical strategy was to address individual known
>> cases one-by-one rather than looking for a more holistic load-balancing and
>> load-shedding solution. While the engineer in me likes the elegance of a
>> broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me
>> wonders how far we think we are today from a stable set-point.
>>
>> i.e. are we facing a handful of cases where nodes can still get pushed
>> over and then cascade that we can surgically address, or are we facing a
>> broader lack of back-pressure that rears its head in different domains
>> (client -> coordinator, coordinator -> replica, internode with other
>> operations, etc) at surprising times and should be considered more
>> holistically?
>>
>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>
>> I almost forgot CASSANDRA-15817, which introduced
>> reject_repair_compaction_threshold, which provides a mechanism to stop
>> repairs while compaction is underwater.
>>
>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe 
>> wrote:
>>
>> 
>> Hey all,
>>
>> I'm a bit late to the discussion. I see that we've already discussed
>> CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013>
>>  and CASSANDRA-16663
>> <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in
>> passing. Having written the latter, I'd be the first to admit it's a crude
>> tool, although it's been useful here and there, and provides a couple
>> primitives that may be useful for future work. As Scott mentions, while it
>> is configurable at runtime, it is not adaptive, although we did
>> make configuration easier in CASSANDRA-17423
>> <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It also is
>> global to the node, although we've lightly discussed some ideas around
>> making it more granular. (For example, keyspace-based limiting, or limiting
>> "domains" tagged by the client in requests, could be interesting.) It also
>> does not deal with inter-node traffic, of course.
>>
>> Something we've not yet mentioned (that does address internode traffic)
>> is CASSANDRA-17324
>> , which I
>> proposed shortly after working on the native request limiter (and have just
>> not had much time to return to). The basic idea is this:
>>
>> When a node is struggling under the weight of a compaction backlog and
>> becomes a cause of increased read latency for clients, we have two safety
>> valves:
>>
>> 1.) Disabling the native protocol server, which stops the node from
>> coordinating reads and writes.
>> 2.) Jacking up the severity on the node, which tells the dynamic snitch
>> to avoid the node for reads from other coordinators.
>>
>> These are useful, but we don’t appear to have any mechanism that would
>> allow us to temporarily reject internode hint, batch, and mutation messages
>> that could further delay resolution of the compaction backlog.
>>
>>
>> Whether it's done as part of a larger framework or on its own, it still
>> feels like a good idea.
>>
>> Thinking in terms of opportunity costs here (i.e. where we spend our
>> finite engineering time to holistically improve the experience of operating
>> this database) is healthy, but we probably haven't reached the point of
>> diminishing returns on nodes being able to protect themselves from clients
>> and from other nodes. I would just keep in mind two things:
>>
>> 1.) The effectiveness of rate-limiting in the system (which includes the
>> database and all clients) as a whole necessarily decreases as we move from
>> the application to the lowest-level database internals. Limiting correctly
>> at the client will save more resources than limiting at the native protocol
>> server, and limiting correctly at the native protocol server will save more
>> resources than limiting after we've dispatched requests to some thread pool
>> for processing.
>> 2.) We should make sure the links between the "known" root causes of
>> cascading failures and the mechanisms we introduce to avoid them remain
>> very strong.
>>
>> In any case, I'd be happy to help out in any way I can as this moves

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-22 Thread Jaydeep Chovatia
a big win.
>>
>> > The major challenge with latency based rate limiters is that the
>> latency is subjective from one workload to another.
>>
>> You're absolutely right.  This goes to my other suggestion that
>> client-side rate limiting would be a higher priority (on my list at least)
>> as it is perfectly suited for multiple varying workloads.  Of course, if
>> you're not interested in working on the drivers and only on C* itself, this
>> is a moot point.  You're free to work on whatever you want - I just think
>> there's a ton more value in the drivers being able to throttle requests to
>> deal than server side.
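
As a rough illustration of the client-side throttling mentioned here (generic, not the Java driver's actual request-throttler API), a simple token bucket on the request path might look like this:

```java
// Simple token bucket: allows ratePerSecond requests on average, with bursts up to `burst`.
final class ClientRequestThrottle {
    private final double ratePerSecond;
    private final double burst;
    private double tokens;
    private long lastRefillNanos = System.nanoTime();

    ClientRequestThrottle(double ratePerSecond, double burst) {
        this.ratePerSecond = ratePerSecond;
        this.burst = burst;
        this.tokens = burst;
    }

    /** Non-blocking: returns true if the caller may send a request now. */
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill tokens according to elapsed time, capped at the burst size.
        tokens = Math.min(burst, tokens + (now - lastRefillNanos) / 1_000_000_000.0 * ratePerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```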
>>
>> > And if these two are +ve then consider the server under pressure. And
>> once it is under the pressure, then shed the traffic from less aggressive
>> to more aggressive, etc. The idea is to prevent Cassandra server from
>> melting (by considering the above two signals to begin with and add any
>> more based on the learnings)
>>
>> Yes, I agree using dropped metrics (errors) is useful, as well as queue
>> length.  I can't remember offhand all the details of the request queue and
>> how load shedding works there, I need to go back and look.  If we don't
>> already have load shedding based on queue depth that seems like an easy
>> thing to do immediately, and is a high quality signal.  Maybe someone can
>> remind me if we have that already?
>>
>> My issue with using CPU to rate limit clients is that I think it's a very
>> low quality signal, and I suspect it'll trigger a ton of false positives.
>> For example, there's a big difference from performance being impacted by
>> repair vs large reads vs backing up a snapshot to an object store, but they
>> have similar effects on the CPU - high I/O, high CPU usage, both sustained
>> over time.  Imo it would be a pretty bad decision to throttle clients when
>> we should be throttling repair instead, and we should only do so if it's
>> actually causing an issue for the client, something CPU usage can't tell
>> us, only the response time and error rates can.
>>
>> In the case of a backup, throttling might make sense, or might not, it
>> really depends on the environment and if backups are happening
>> concurrently.  If a backup's configured with nice +19 (as it should be),
>> I'd consider throttling user requests to be a false positive, potentially
>> one that does more harm than good to the cluster, since the OS should be
>> deprioritizing the backup for us rather than us deprioritizing C*.
>>
>> In my ideal world, if C* detected problematic response times (possibly
>> violating a per-table, target latency time) or query timeouts, it would
>> start by throttling back compactions, repairs, and streaming to ensure
>> client requests can be serviced.  I think we'd need to define the latency
>> targets in order for this to work optimally, b/c you might not want to wait
>> for query timeouts before you throttle.  I think there's a lot of value in
>> dynamically adaptive compaction, repair, and streaming since it would
>> prioritize user requests, but again, if you're not willing to work on that,
>> it's your call.
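
A crude sketch of that adaptive idea, assuming we can periodically read a recent p99 read latency and adjust a compaction throughput cap; the names and the simple multiplicative step are illustrative, not an existing Cassandra knob:

```java
// Backs compaction throughput off when the observed p99 exceeds the target,
// and recovers it slowly once latency is back under the target.
final class AdaptiveCompactionThrottle {
    private final double targetP99Millis;   // per-table latency target
    private final double minMbPerSec;
    private final double maxMbPerSec;
    private double currentMbPerSec;

    AdaptiveCompactionThrottle(double targetP99Millis, double minMbPerSec, double maxMbPerSec) {
        this.targetP99Millis = targetP99Millis;
        this.minMbPerSec = minMbPerSec;
        this.maxMbPerSec = maxMbPerSec;
        this.currentMbPerSec = maxMbPerSec;
    }

    /** Call periodically (e.g. every few seconds) with the latest observed p99 read latency. */
    double adjust(double observedP99Millis) {
        if (observedP99Millis > targetP99Millis)
            currentMbPerSec = Math.max(minMbPerSec, currentMbPerSec * 0.8); // shed background work
        else
            currentMbPerSec = Math.min(maxMbPerSec, currentMbPerSec * 1.1); // recover gradually
        return currentMbPerSec;
    }
}
```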
>>
>> Anyways - I like the idea of putting more safeguards in the database
>> itself, we're fundamentally in agreement there.  I see a ton of value in
>> having flexible rate limiters, whether it be per-table, keyspace, or
>> user+table combination.  I'd also like to ensure the feature doesn't cause
>> more disruptions than it solves, which I think would be the case from using
>> CPU usage as a signal.
>>
>> Jon
>>
>>
>> On Wed, Jan 17, 2024 at 10:26 AM Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> Jon,
>>>
>>> The major challenge with latency based rate limiters is that the latency
>>> is subjective from one workload to another. As a result, in the proposal I
>>> have described, the idea is to make decision on the following combinations:
>>>
>>>1. System parameters (such as CPU usage, etc.)
>>>2. Cassandra thread pools health (are they dropping requests, etc.)
>>>
>>> And if these two are +ve then consider the server under pressure. And
>>> once it is under the pressure, then shed the traffic from less aggressive
>>> to more aggressive, etc. The idea is to prevent Cassandra server from
>>> melting (by considering the above two signals to begin with and add any
>>> more based on the learnings)
>>>
>>> Scott,
>>>
>>> Yes, I did look at some of the implementations, but

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-17 Thread Jaydeep Chovatia
Jon,

The major challenge with latency-based rate limiters is that latency is
subjective from one workload to another. As a result, in the proposal I
have described, the idea is to make decisions based on the following
combination of signals:

   1. System parameters (such as CPU usage, etc.)
   2. Cassandra thread pools health (are they dropping requests, etc.)

If both of these are positive, then consider the server under pressure.
Once it is under pressure, shed traffic, starting with less aggressive
measures and escalating to more aggressive ones. The idea is to prevent the
Cassandra server from melting down (starting with the above two signals and
adding more based on what we learn).
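
A minimal sketch of that decision logic, assuming hypothetical signal and shed-level names; this is an illustration of the proposal, not code from the design doc:

```java
import java.util.concurrent.atomic.AtomicReference;

enum ShedLevel { NONE, LESS_AGGRESSIVE, MORE_AGGRESSIVE }

interface PressureSignal {
    boolean underPressure(); // e.g. CPU above a threshold, or thread pools dropping requests
}

final class OverloadDetector {
    private final PressureSignal systemSignal;     // signal #1: system parameters
    private final PressureSignal threadPoolSignal; // signal #2: thread pool health
    private final AtomicReference<ShedLevel> level = new AtomicReference<>(ShedLevel.NONE);

    OverloadDetector(PressureSignal systemSignal, PressureSignal threadPoolSignal) {
        this.systemSignal = systemSignal;
        this.threadPoolSignal = threadPoolSignal;
    }

    /** Re-evaluate periodically; both signals must be positive to declare pressure. */
    void evaluate() {
        if (!(systemSignal.underPressure() && threadPoolSignal.underPressure()))
            level.set(ShedLevel.NONE);
        else if (level.get() == ShedLevel.NONE)
            level.set(ShedLevel.LESS_AGGRESSIVE);  // start by shedding the least critical traffic
        else
            level.set(ShedLevel.MORE_AGGRESSIVE);  // escalate while pressure persists
    }

    ShedLevel currentLevel() { return level.get(); }
}
```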

Scott,

Yes, I did look at some of the implementations. They are all great systems
and help quite a lot, but they do not rely on system health, etc., and they
are not in the generic coordinator/replication read/write path. The idea
here is along the same lines as the existing implementations, but making it
a bit more generic and trying to cover as many paths as possible.

German,

Sure, let's first continue the discussion here. If it turns out that there
is no widespread interest in the idea, then we can talk 1:1 and see how we
can help each other on a private fork, etc.

Jaydeep

On Wed, Jan 17, 2024 at 7:57 AM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Jaydeep,
>
> I concur with Stefan that extensibility of this  should be a design goal:
>
>- It should be easy to add additional metrics (e.g. write queue depth)
>and decision logic
>- There should be a way to interact with other systems to signal a
>resource need  which then could kick off things like scaling
>
>
> Super interested in this and we have been thinking about similar things
> internally.
>
> Thanks,
> German
> ------
> *From:* Jaydeep Chovatia 
> *Sent:* Tuesday, January 16, 2024 1:16 PM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in
> Cassandra
>
> Hi Stefan,
>
> Please find my response below:
> 1) Currently, I am keeping the signals as interface, so one can override
> with a different implementation, but a point noted that even the interface
> APIs could be also made dynamic so one can define APIs and its
> implementation, if they wish to override.
> 2) I've not looked into that yet, but I will look into it and see if it
> can be easily integrated into the Guardrails framework.
> 3) On the server side, when the framework detects that a node is
> overloaded, then it will throw *OverloadedException* back to the client.
> Because if the node while busy continues to serve additional requests, then
> it will slow down other peer nodes due to dependencies on meeting the
> QUORUM, etc. In this, we are at least preventing server nodes from melting
> down, and giving the control to the client via *OverloadedException.*
> Now, it will be up to the client policy, if client wishes to retry
> immediately on a different server node then eventually that server node
> might be impacted, but if client wishes to do exponential back off or throw
> exception back to the application then that server node will not be
> impacted.
>
>
> Jaydeep
>
> On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
> Hi Jaydeep,
>
> That seems quite interesting. Couple points though:
>
> 1) It would be nice if there is a way to "subscribe" to decisions your
> detection framework comes up with. Integration with e.g. diagnostics
> subsystem would be beneficial. This should be pluggable - just coding up an
> interface to dump / react on the decisions how I want. This might also act
> as a notifier to other systems, e-mail, slack channels ...
>
> 2) Have you tried to incorporate this with the Guardrails framework? I
> think that if something is detected to be throttled or rejected (e.g
> writing to a table), there might be a guardrail which would be triggered
> dynamically in runtime. Guardrails are useful as such but here we might
> reuse them so we do not need to code it twice.
>
> 3) I am curious how complex this detection framework would be, it can be
> complicated pretty fast I guess. What would be desirable is to act on it in
> such a way that you will not put that node under even more pressure. In
> other words, your detection system should work in such a way that there
> will not be any "doom loop" whereby merely throttling various parts of
> Cassandra makes things even worse for other nodes in the cluster. For
> example, if a particular node starts to be overw

Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread Jaydeep Chovatia
Hi Stefan,

Please find my response below:
1) Currently, I am keeping the signals behind an interface, so one can
override them with a different implementation. Point noted, though, that
even the interface APIs could be made dynamic, so one can define both the
APIs and their implementation if they wish to override them.
2) I've not looked into that yet, but I will look into it and see if it can
be easily integrated into the Guardrails framework.
3) On the server side, when the framework detects that a node is
overloaded, it will throw *OverloadedException* back to the client, because
if the node keeps serving additional requests while busy, it will slow down
its peer nodes due to the dependencies involved in meeting QUORUM, etc.
With this, we are at least preventing server nodes from melting down and
giving control back to the client via *OverloadedException*. It is then up
to the client policy: if the client retries immediately on a different
server node, that node might eventually be impacted as well, but if the
client does exponential back-off or surfaces the exception to the
application, then the server nodes will not be impacted.
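
To make points 1 and 3 concrete, a sketch of a pluggable signal plus the rejection path might look roughly like this; the interface, the sample implementation, and the exception class are stand-ins, not Cassandra's real internals:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Point 1: signals behind an interface so operators can plug in their own implementation.
interface OverloadSignal {
    boolean underPressure();
}

final class LoadAverageSignal implements OverloadSignal {
    private final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    private final double maxLoadPerCore;

    LoadAverageSignal(double maxLoadPerCore) { this.maxLoadPerCore = maxLoadPerCore; }

    @Override
    public boolean underPressure() {
        double load = os.getSystemLoadAverage(); // -1 if not available on this platform
        return load >= 0 && load / os.getAvailableProcessors() > maxLoadPerCore;
    }
}

// Point 3: reject new requests instead of queueing them when the node is under pressure,
// pushing the back-off/retry decision to the client.
final class RequestGate {
    static final class OverloadedException extends RuntimeException {
        OverloadedException(String message) { super(message); }
    }

    private final OverloadSignal signal;

    RequestGate(OverloadSignal signal) { this.signal = signal; }

    void checkAdmission() {
        if (signal.underPressure())
            throw new OverloadedException("Node under pressure; request rejected");
    }
}
```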


Jaydeep

On Tue, Jan 16, 2024 at 10:03 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Hi Jaydeep,
>
> That seems quite interesting. Couple points though:
>
> 1) It would be nice if there is a way to "subscribe" to decisions your
> detection framework comes up with. Integration with e.g. diagnostics
> subsystem would be beneficial. This should be pluggable - just coding up an
> interface to dump / react on the decisions how I want. This might also act
> as a notifier to other systems, e-mail, slack channels ...
>
> 2) Have you tried to incorporate this with the Guardrails framework? I
> think that if something is detected to be throttled or rejected (e.g
> writing to a table), there might be a guardrail which would be triggered
> dynamically in runtime. Guardrails are useful as such but here we might
> reuse them so we do not need to code it twice.
>
> 3) I am curious how complex this detection framework would be, it can be
> complicated pretty fast I guess. What would be desirable is to act on it in
> such a way that you will not put that node under even more pressure. In
> other words, your detection system should work in such a way that there
> will not be any "doom loop" whereby mere throttling of various parts of
> Cassandra you make it even worse for other nodes in the cluster. For
> example, if a particular node starts to be overwhelmed and you detect this
> and requests start to be rejected, is it not possible that Java driver
> would start to see this node as "erroneous" with delayed response time etc
> and it would start to prefer other nodes in the cluster when deciding what
> node to contact for query coordination? So you would put more load on other
> nodes, making them more susceptible to be throttled as well ...
>
> Regards
>
> Stefan Miklosovic
>
> On Tue, Jan 16, 2024 at 6:41 PM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Hi,
>>
>> Happy New Year!
>>
>> I would like to discuss the following idea:
>>
>> Open-source Cassandra (CASSANDRA-15013
>> <https://issues.apache.org/jira/browse/CASSANDRA-15013>) has an
>> elementary built-in memory rate limiter based on the incoming payload from
>> user requests. This rate limiter activates if any incoming user request’s
>> payload exceeds certain thresholds. However, the existing rate limiter only
>> solves limited-scope issues. Cassandra's server-side meltdown due to
>> overload is a known problem. Often we see that a couple of busy nodes take
>> down the entire Cassandra ring due to the ripple effect. The following
>> document proposes a generic purpose comprehensive rate limiter that works
>> considering system signals, such as CPU, and internal signals, such as
>> thread pools. The rate limiter will have knobs to filter out internal
>> traffic, system traffic, replication traffic, and furthermore based on the
>> types of queries.
>>
>> More design details to this doc: [OSS] Cassandra Generic Purpose Rate
>> Limiter - Google Docs
>> <https://docs.google.com/document/d/1w-A3fnoeBS6tS1ffBda_R0QR90olzFoMqLE7znFEUrQ/edit>
>>
>> Please let me know your thoughts.
>>
>> Jaydeep
>>
>


[Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread Jaydeep Chovatia
Hi,

Happy New Year!

I would like to discuss the following idea:

Open-source Cassandra (CASSANDRA-15013
<https://issues.apache.org/jira/browse/CASSANDRA-15013>) has an elementary
built-in memory rate limiter based on the incoming payload from user
requests. This rate limiter activates if any incoming user request’s
payload exceeds certain thresholds. However, the existing rate limiter only
solves limited-scope issues. Cassandra's server-side meltdown due to
overload is a known problem. Often we see that a couple of busy nodes take
down the entire Cassandra ring due to the ripple effect. The following
document proposes a general-purpose, comprehensive rate limiter that works
by considering system signals, such as CPU, and internal signals, such as
thread pools. The rate limiter will have knobs to filter out internal
traffic, system traffic, and replication traffic, and, furthermore, to
filter based on the types of queries.

More design details are in this doc: [OSS] Cassandra Generic Purpose Rate
Limiter - Google Docs
<https://docs.google.com/document/d/1w-A3fnoeBS6tS1ffBda_R0QR90olzFoMqLE7znFEUrQ/edit>

Please let me know your thoughts.

Jaydeep


Re: Need Confluence "Create" permission for filing a CEP

2023-10-10 Thread Jaydeep Chovatia
Thank you!

On Tue, Oct 10, 2023 at 2:58 AM Brandon Williams  wrote:

> I've added you, you should have access now.
>
> Kind Regards,
> Brandon
>
> On Tue, Oct 10, 2023 at 1:24 AM Jaydeep Chovatia
>  wrote:
> >
> > Hi,
> >
> > I want to create a new CEP request but do not see the "Create" page
> permission on Confluence. Could someone grant me access?
> > Here is the CEP draft: [DRAFT] CEP - Apache Cassandra Official Repair
> Solution - Google Docs
> >
> > My Confluence user id is: chovatia.jayd...@gmail.com
> >
> > Jaydeep
>


Need Confluence "Create" permission for filing a CEP

2023-10-10 Thread Jaydeep Chovatia
Hi,

I want to create a new CEP request but do not see the "Create" page
permission on Confluence. Could someone grant me access?
Here is the CEP draft: [DRAFT] CEP - Apache Cassandra Official Repair
Solution - Google Docs


My Confluence user id is: chovatia.jayd...@gmail.com

Jaydeep


Re: [Discuss] Repair inside C*

2023-08-24 Thread Jaydeep Chovatia
Is anyone going to file an official CEP for this?
As mentioned in this email thread, here is one of the solution's design doc
<https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0>
and source code on a private Apache Cassandra patch. Could you go through
it and let me know what you think?

Jaydeep

On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
wrote:

> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
>
> This is something I hadn't thought much about, and is a pretty good
> argument for using the sidecar initially.  There's a lot of deployments out
> there and having an official repair option would be a big win.
>
>
> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > I agree that it would be ideal for Cassandra to have a repair scheduler
> in-DB.
> >
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
> >
> > Once TCM has landed, we’ll have much stronger primitives for repair
> orchestration in the database itself. But I don’t think that should block
> progress on a repair scheduling solution in the sidecar, and there is
> nothing that would prevent someone from continuing to use a sidecar-based
> solution in perpetuity if they preferred.
> >
> > - Scott
> >
> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad 
> wrote:
> > >
> > > I'm 100% in favor of repair being part of the core DB, not the
> sidecar.  The current (and past) state of things where running the DB
> correctly *requires* running a separate process (either community
> maintained or official C* sidecar) is incredibly painful for folks.  The
> idea that your data integrity needs to be opt-in has never made sense to me
> from the perspective of either the product or the end user.
> > >
> > > I've worked with way too many teams that have either configured this
> incorrectly or not at all.
> > >
> > > Ideally Cassandra would ship with repair built in and on by default.
> Power users can disable if they want to continue to maintain their own
> repair tooling for some reason.
> > >
> > > Jon
> > >
> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> > >> All,
> > >> We had a brief discussion in [2] about the Uber article [1] where
> they talk about having integrated repair into Cassandra and how great that
> is. I expressed my disappointment that they didn't work with the community
> on that (Uber, if you are listening time to make amends ) and it turns
> out Joey already had the idea and wrote the code [3] - so I wanted to start
> a discussion to gauge interest and maybe how to revive that effort.
> > >> Thanks,
> > >> German
> > >> [1]
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> > >> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> > >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
> >
>


[Discuss] Detecting token-ownership mismatch

2023-08-16 Thread Jaydeep Chovatia
Hi,


As we know, Cassandra exchanges important topology and
token-ownership-related details over Gossip. Cassandra internally maintains
two separate caches that hold the token-ownership information: 1) the
Gossip cache and 2) the Storage Service cache. On a node, the Gossip cache
is updated first, followed by the storage service cache. In the hot path,
ownership is calculated from the storage service cache. Since two separate
caches maintain the same information, inconsistencies are bound to happen.
It is quite possible that the Gossip cache has up-to-date ownership of the
Cassandra cluster while the storage service cache does not, and in that
scenario inconsistent data will be served to the user.

Currently, there is no mechanism in Cassandra that detects and fixes
inconsistencies between these two caches.

*Long-term solution*
We are going with the long-term transactional metadata (
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21) to handle
such inconsistencies, and that’s the right thing to do.

*Short-term solution*
But CEP-21 might take some time, and until then there is a need to *detect*
such inconsistencies. Once we detect an inconsistency, we have two options:
1) restart the node, or 2) fix the inconsistency on the fly.
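
As a sketch of what the short-term detection could look like (the OwnershipView interface below is a stand-in for the real Gossip and StorageService lookups, which are internal to Cassandra):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Stand-in for a token-ownership lookup backed by either the Gossip cache or
// the StorageService (TokenMetadata) cache.
interface OwnershipView {
    Set<String> endpointsFor(long token);
}

final class OwnershipMismatchDetector {
    private final OwnershipView gossipView;
    private final OwnershipView storageServiceView;

    OwnershipMismatchDetector(OwnershipView gossipView, OwnershipView storageServiceView) {
        this.gossipView = gossipView;
        this.storageServiceView = storageServiceView;
    }

    /** Run periodically over a sample of tokens; any non-empty result means the caches diverge. */
    List<Long> findMismatches(List<Long> sampleTokens) {
        return sampleTokens.stream()
                .filter(t -> !gossipView.endpointsFor(t).equals(storageServiceView.endpointsFor(t)))
                .collect(Collectors.toList());
    }
}
```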

I've created the following JIRA for the short-term fix:
https://issues.apache.org/jira/browse/CASSANDRA-18758


Does this sound valuable?


Jaydeep


Re: [Discuss] Repair inside C*

2023-07-25 Thread Jaydeep Chovatia
Sounds good, German. Feel free to let me know if you need my help in filing
the CEP, adding supporting content to the CEP, etc.
As I mentioned previously, I have already been working (it is going through
internal review) on a one-pager doc, code, etc., covering the solution that
has been working for us at immense scale for the last six years, and I will
share it soon on a private fork.

Thanks,
Jaydeep

On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> In [2] we suggested that the next step should be a CEP.
>
> I am happy to lend a hand to this effort as well.
>
> Thanks Jaydeep and David - really appreciated.
>
> German
>
> --
> *From:* David Capwell 
> *Sent:* Tuesday, July 25, 2023 8:32 AM
> *To:* dev 
> *Cc:* German Eichberger 
> *Subject:* [EXTERNAL] Re: [Discuss] Repair inside C*
>
> As someone who has done a lot of work trying to make repair stable, I
> approve of this message ^_^
>
> More than glad to help mentor this work
>
> On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia 
> wrote:
>
> To clarify the repair solution timing, the one we have listed in the
> article is not the recently developed one. We were hitting some
> high-priority production challenges back in early 2018, and to address
> that, we developed and rolled out the solution in production in just a few
> months. The timing-wise, the solution was developed and productized by Q3
> 2018, of course, continued to evolve thereafter. Usually, we explore the
> existing solutions we can leverage, but when we started our journey in
> early 2018, most of the solutions were based on sidecar solutions. There is
> nothing against the sidecar solution; it was just a pure business decision,
> and in that, we wanted to avoid the sidecar to avoid a dependency on the
> control plane. Every solution developed has its deep context, merits, and
> pros and cons; they are all great solutions!
>
> An appeal to the community members is to think one more time about having
> repairs in the Open Source Cassandra itself. As mentioned in my previous
> email, any solution getting adopted is fine; the important aspect is to
> have a repair solution in the OSS Cassandra itself!
>
> Yours Faithfully,
> Jaydeep
>
> On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
> Hi German,
>
> The goal is always to backport our learnings back to the community. For
> example, I have already successfully backported the following two
> enhancements/bug fixes back to the Open Source Cassandra, which are
> described in the article. I am already currently working on open-source a
> few more enhancements mentioned in the article back to the open-source.
>
>1. https://issues.apache.org/jira/browse/CASSANDRA-18555
>2. https://issues.apache.org/jira/browse/CASSANDRA-13740
>
> There is definitely heavy interest in having the repair solution inside
> the Open Source Cassandra itself, very much like Compaction. As I write
> this email, we are internally working on a one-pager proposal doc to all
> the community members on having a repair inside the OSS Apache Cassandra
> along with our private fork - I will share it soon.
>
> Generally, we are ok with any solution getting adopted (either Joey's
> solution or our repair solution or any other solution). The primary
> motivation is to have the repair embedded inside the open-source Cassandra
> itself, so we can retire all various privately developed solutions
> eventually :)
>
> I am also happy to help (drive conversation, discussion, etc.) in any way
> to have a repair solution adopted inside Cassandra itself, please let me
> know. Happy to help!
>
> Yours Faithfully,
> Jaydeep
>
> On Mon, Jul 24, 2023 at 1:44 PM German Eichberger via dev <
> dev@cassandra.apache.org> wrote:
>
> All,
>
> We had a brief discussion in [2] about the Uber article [1] where they
> talk about having integrated repair into Cassandra and how great that is. I
> expressed my disappointment that they didn't work with the community on
> that (Uber, if you are listening time to make amends ) and it turns out
> Joey already had the idea and wrote the code [3] - so I wanted to start a
> discussion to gauge interest and maybe how to revive that effort.
>
> Thanks,
> German
>
> [1]
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>
>
>


Re: [Discuss] Repair inside C*

2023-07-24 Thread Jaydeep Chovatia
To clarify the repair solution timing: the one we have listed in the
article is not recently developed. We were hitting some high-priority
production challenges back in early 2018, and to address them we developed
and rolled out the solution in production in just a few months.
Timing-wise, the solution was developed and productized by Q3 2018 and, of
course, continued to evolve thereafter. We usually explore existing
solutions we can leverage, but when we started our journey in early 2018,
most of the available solutions were sidecar-based. There is nothing
against the sidecar approach; it was purely a business decision, in that we
wanted to avoid the sidecar so as to avoid a dependency on the control
plane. Every solution developed has its own deep context, merits, and pros
and cons; they are all great solutions!

An appeal to the community members is to think one more time about having
repairs in the Open Source Cassandra itself. As mentioned in my previous
email, any solution getting adopted is fine; the important aspect is to
have a repair solution in the OSS Cassandra itself!

Yours Faithfully,
Jaydeep

On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia 
wrote:

> Hi German,
>
> The goal is always to backport our learnings back to the community. For
> example, I have already successfully backported the following two
> enhancements/bug fixes back to the Open Source Cassandra, which are
> described in the article. I am already currently working on open-source a
> few more enhancements mentioned in the article back to the open-source.
>
>1. https://issues.apache.org/jira/browse/CASSANDRA-18555
>2. https://issues.apache.org/jira/browse/CASSANDRA-13740
>
> There is definitely heavy interest in having the repair solution inside
> the Open Source Cassandra itself, very much like Compaction. As I write
> this email, we are internally working on a one-pager proposal doc to all
> the community members on having a repair inside the OSS Apache Cassandra
> along with our private fork - I will share it soon.
>
> Generally, we are ok with any solution getting adopted (either Joey's
> solution or our repair solution or any other solution). The primary
> motivation is to have the repair embedded inside the open-source Cassandra
> itself, so we can retire all various privately developed solutions
> eventually :)
>
> I am also happy to help (drive conversation, discussion, etc.) in any way
> to have a repair solution adopted inside Cassandra itself, please let me
> know. Happy to help!
>
> Yours Faithfully,
> Jaydeep
>
> On Mon, Jul 24, 2023 at 1:44 PM German Eichberger via dev <
> dev@cassandra.apache.org> wrote:
>
>> All,
>>
>> We had a brief discussion in [2] about the Uber article [1] where they
>> talk about having integrated repair into Cassandra and how great that is. I
>> expressed my disappointment that they didn't work with the community on
>> that (Uber, if you are listening time to make amends ) and it turns out
>> Joey already had the idea and wrote the code [3] - so I wanted to start a
>> discussion to gauge interest and maybe how to revive that effort.
>>
>> Thanks,
>> German
>>
>> [1]
>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>>
>


Re: [Discuss] Repair inside C*

2023-07-24 Thread Jaydeep Chovatia
Hi German,

The goal has always been to contribute our learnings back to the community.
For example, I have already successfully backported the following two
enhancements/bug fixes, which are described in the article, to Open Source
Cassandra. I am currently working on open-sourcing a few more enhancements
mentioned in the article.

   1. https://issues.apache.org/jira/browse/CASSANDRA-18555
   2. https://issues.apache.org/jira/browse/CASSANDRA-13740

There is definitely heavy interest in having the repair solution inside
Open Source Cassandra itself, very much like compaction. As I write this
email, we are internally working on a one-pager proposal doc for the
community on having repair inside OSS Apache Cassandra, along with our
private fork - I will share it soon.

Generally, we are ok with any solution getting adopted (either Joey's
solution or our repair solution or any other solution). The primary
motivation is to have the repair embedded inside the open-source Cassandra
itself, so we can retire all various privately developed solutions
eventually :)

I am also happy to help (drive conversation, discussion, etc.) in any way
to have a repair solution adopted inside Cassandra itself, please let me
know. Happy to help!

Yours Faithfully,
Jaydeep

On Mon, Jul 24, 2023 at 1:44 PM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> All,
>
> We had a brief discussion in [2] about the Uber article [1] where they
> talk about having integrated repair into Cassandra and how great that is. I
> expressed my disappointment that they didn't work with the community on
> that (Uber, if you are listening time to make amends ) and it turns out
> Joey already had the idea and wrote the code [3] - so I wanted to start a
> discussion to gauge interest and maybe how to revive that effort.
>
> Thanks,
> German
>
> [1]
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>


Cassandra 3.0.27 - Tombstone disappeared during node replacement

2022-11-14 Thread Jaydeep Chovatia
Hi,

I am running Cassandra 3.0.27 in my production. In some corner case
scenarios, we have tombstones disappearing during bootstrap/decommission.
I've outlined a possible theory with the root cause in this ticket:
https://issues.apache.org/jira/browse/CASSANDRA-17991

Could someone please help validate this?

Jaydeep


Re: Cassandra Token ownership split-brain (3.0.14)

2022-09-06 Thread Jaydeep Chovatia
Thanks Scott. I will prioritize upgrading to 3.0.27 and will circle back if
this issue persists.

Jaydeep


On Tue, Sep 6, 2022 at 3:45 PM C. Scott Andreas 
wrote:

> Hi Jaydeep,
>
> Thanks for reaching out and for bumping this thread.
>
> This is probably not the answer you’re after, but mentioning as it may
> address the issue.
>
> C* 3.0.14 was released over five years ago, with many hundreds of
> important bug fixes landing since July 2017. These include fixes for issues
> that have affected gossip in the past which may be related to this issue.
> Note that 3.0.14 also is susceptible to several critical data loss bugs
> including C-14513 and C-14515.
>
> I’d strongly recommend upgrading to Cassandra 3.0.27 as a starting point.
> If this doesn’t resolve your issue, members of the community may be in a
> better position to help triage a bug report against a current release of
> the database.
>
> - Scott
>
> On Sep 6, 2022, at 5:13 PM, Jaydeep Chovatia 
> wrote:
>
> 
> If anyone has seen this issue and knows a fix, it would be a great help!
> Thanks in advance.
>
> Jaydeep
>
> On Fri, Sep 2, 2022 at 1:56 PM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Hi,
>>
>> We are running a production Cassandra version (3.0.14) with 256 tokens
>> v-node configuration. Occasionally, we see that different nodes show
>> different ownership for the same key. Only a node restart corrects;
>> otherwise, it continues to behave in a split-brain.
>>
>> Say, for example,
>>
>> *NodeA*
>> nodetool getendpoints ks1 table1 10
>> - n1
>> - n2
>> - n3
>>
>> *NodeB*
>> nodetool getendpoints ks1 table1 10
>> - n1
>> - n2
>> *- n5*
>>
>> If I restart NodeB, then it shows the correct ownership {n1,n2,n3}. The
>> majority of the nodes in the ring show correct ownership {n1,n2,n3}, only a
>> few show this issue, and restarting them solves the problem.
>>
>> To me, it seems Cassandra's Gossip cache and StorageService cache
>> (TokenMetadata) have some sort of cache-coherence problem.
>>
>> Has anyone observed this behavior?
>> Any help would be highly appreciated.
>>
>> Jaydeep
>>
>


Re: Cassandra Token ownership split-brain (3.0.14)

2022-09-06 Thread Jaydeep Chovatia
If anyone has seen this issue and knows a fix, it would be a great help!
Thanks in advance.

Jaydeep

On Fri, Sep 2, 2022 at 1:56 PM Jaydeep Chovatia 
wrote:

> Hi,
>
> We are running a production Cassandra version (3.0.14) with 256 tokens
> v-node configuration. Occasionally, we see that different nodes show
> different ownership for the same key. Only a node restart corrects;
> otherwise, it continues to behave in a split-brain.
>
> Say, for example,
>
> *NodeA*
> nodetool getendpoints ks1 table1 10
> - n1
> - n2
> - n3
>
> *NodeB*
> nodetool getendpoints ks1 table1 10
> - n1
> - n2
> *- n5*
>
> If I restart NodeB, then it shows the correct ownership {n1,n2,n3}. The
> majority of the nodes in the ring show correct ownership {n1,n2,n3}, only a
> few show this issue, and restarting them solves the problem.
>
> To me, it seems Cassandra's Gossip cache and StorageService cache
> (TokenMetadata) have some sort of cache-coherence problem.
>
> Has anyone observed this behavior?
> Any help would be highly appreciated.
>
> Jaydeep
>


Cassandra Token ownership split-brain (3.0.14)

2022-09-02 Thread Jaydeep Chovatia
Hi,

We are running a production Cassandra version (3.0.14) with 256 tokens
v-node configuration. Occasionally, we see that different nodes show
different ownership for the same key. Only a node restart corrects it;
otherwise, the node continues to behave in a split-brain manner.

Say, for example,

*NodeA*
nodetool getendpoints ks1 table1 10
- n1
- n2
- n3

*NodeB*
nodetool getendpoints ks1 table1 10
- n1
- n2
*- n5*

If I restart NodeB, then it shows the correct ownership {n1,n2,n3}. The
majority of the nodes in the ring show correct ownership {n1,n2,n3}, only a
few show this issue, and restarting them solves the problem.

To me, it seems Cassandra's Gossip cache and StorageService cache
(TokenMetadata) have some sort of cache-coherence problem.

Has anyone observed this behavior?
Any help would be highly appreciated.

Jaydeep


Cassandra 3.0 - A new node is not able to join the cluster

2022-05-02 Thread Jaydeep Chovatia
Hi,

I have a production Cassandra cluster on the 3.0.14 branch. Each node holds
roughly 1.5 TB of data, with a ring size of 70+70. I need to add more
capacity to meet production demand, but when I add the 71st node, it
streams data from other nodes as expected, and then after some time it
spends an enormous amount of time doing GCs and freezes.

Snippet from the log file...

{"@timestamp":"2022-04-23T02:21:38.030+00:00","@version":1,"message":"G1
Old Generation GC in 18288ms.  G1 Old Gen: 34206148032 -> 33725019032;
","logger_name":"o.a.c.service.GCInspector","thread_name":"Service
Thread","level":"WARN","level_value":3}


The new node has around 1.5M SSTables (from multiple tables). If I reduce
the SSTable count to below 500K, then it joins fine.
I am using LCS compaction with default settings. I've tried changing to
STCS, but no luck :(

Any help would be highly appreciated. Thanks a lot!

Jaydeep


Re: [VOTE] Accept GoCQL driver donation and begin incubation process

2018-09-12 Thread Jaydeep Chovatia
+1

On Wed, Sep 12, 2018 at 10:00 AM Roopa Tangirala
 wrote:

> +1
>
>
> *Regards,*
>
> *Roopa Tangirala*
>
> Engineering Manager CDE
>
> *(408) 438-3156 - mobile*
>
>
>
>
>
>
> On Wed, Sep 12, 2018 at 8:51 AM Sylvain Lebresne 
> wrote:
>
> > -0
> >
> > The project seems to have a hard time getting on top of reviewing its
> > backlog
> > of 'patch available' issues, so I'm skeptical that adopting more code to
> > maintain is the thing the project needs the most right now. Besides, I'm
> > also
> > generally skeptical that augmenting the scope of a project makes it
> better:
> > I feel
> > keeping this project focused on the core server is better. I see risks
> > here, but
> > the upsides haven't been made very clear for me, even for end users: yes,
> > it
> > may provide a tiny bit more clarity around which Golang driver to choose
> by
> > default, but I'm not sure users are that lost, and I think there is other
> > ways to
> > solve that if we really want.
> >
> > Anyway, I reckon I may be overly pessimistic here and it's not that
> strong
> > of
> > an objection if a large majority is on-board, so giving my opinion but
> not
> > opposing.
> >
> > --
> > Sylvain
> >
> >
> > On Wed, Sep 12, 2018 at 5:36 PM Jeremiah D Jordan <
> > jeremiah.jor...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > But I also think getting this through incubation might take a while/be
> > > impossible given how large the contributor list looks…
> > >
> > > > On Sep 12, 2018, at 10:22 AM, Jeff Jirsa  wrote:
> > > >
> > > > +1
> > > >
> > > > (Incubation looks like it may be challenging to get acceptance from
> all
> > > existing contributors, though)
> > > >
> > > > --
> > > > Jeff Jirsa
> > > >
> > > >
> > > >> On Sep 12, 2018, at 8:12 AM, Nate McCall 
> wrote:
> > > >>
> > > >> This will be the same process used for dtest. We will need to walk
> > > >> this through the incubator per the process outlined here:
> > > >>
> > > >>
> > > >> https://incubator.apache.org/guides/ip_clearance.html
> > > >>
> > > >> Pending the outcome of this vote, we will create the JIRA issues for
> > > >> tracking and after we go through the process, and discuss adding
> > > >> committers in a separate thread (we need to do this atomically
> anyway
> > > >> per general ASF committer adding processes).
> > > >>
> > > >> Thanks,
> > > >> -Nate
> > > >>
> > > >>
> -
> > > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >>
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
> >
>


Re: [VOTE] Branching Change for 4.0 Freeze

2018-07-13 Thread Jaydeep Chovatia
+1

On Wed, Jul 11, 2018 at 2:46 PM sankalp kohli 
wrote:

> Hi,
> As discussed in the thread[1], we are proposing that we will not branch
> on 1st September but will only allow following merges into trunk.
>
> a. Bug and Perf fixes to 4.0.
> b. Critical bugs in any version of C*.
> c. Testing changes to help test 4.0
>
> If someone has a change which does not fall under these three, we can
> always discuss it and have an exception.
>
> Vote will be open for 72 hours.
>
> Thanks,
> Sankalp
>
> [1]
>
> https://lists.apache.org/thread.html/494c3ced9e83ceeb53fa127e44eec6e2588a01b769896b25867fd59f@%3Cdev.cassandra.apache.org%3E
>


Re: Real time bad query logging framework in C*

2018-06-20 Thread Jaydeep Chovatia
Thanks, Stefan, for reviewing this; please find my comments inline:


>We already provide tons of metrics and provide some useful logging (e.g.
when reading too many tombstones), but I think we should still be able to
implement further checks in-code that highlight potential issues. Maybe
we could really use a framework for that, I don't know.


I agree, Cassandra already has details coming out as part of metrics,
logging (like tombstones), etc.

The current log messages (tombstone warnings, large partition warnings, slow
query warnings, etc.) are very useful, but one important aspect is missing:
all of these try to solve the same problem, yet each was implemented on its
own (at different times). As a result there is duplicate code, and important
things are lacking, such as changing thresholds without a restart,
commonality among log messages, and a common interface so that users can
consume the output in different ways. This new effort simply unifies them so
we have one common way of doing these things in Cassandra, with additional
features such as changing thresholds at runtime, consistent log messages,
and pluggable consumption.
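
To make the idea concrete, here is a rough sketch of what such a unified
reporting surface could look like. It is written in Python only for brevity;
the framework proposed in CASSANDRA-14527 would live in Cassandra's Java
code, and every name below is hypothetical:

    """Sketch of a unified 'bad query' reporting surface. All names are hypothetical."""
    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class BadQueryEvent:
        kind: str        # e.g. "tombstone_scan", "large_partition", "slow_query"
        table: str
        value: float     # observed value, e.g. tombstones read or latency in ms
        threshold: float # threshold that was crossed when the event fired

    class BadQueryReporter:
        """One place that owns thresholds (changeable at runtime) and consumers."""

        def __init__(self, thresholds: Dict[str, float]):
            self.thresholds = thresholds
            self.consumers: List[Callable[[BadQueryEvent], None]] = []

        def set_threshold(self, kind: str, value: float) -> None:
            # Runtime-adjustable, no restart needed.
            self.thresholds[kind] = value

        def register(self, consumer: Callable[[BadQueryEvent], None]) -> None:
            # Users decide how to consume events: log line, metric, external sink, ...
            self.consumers.append(consumer)

        def record(self, kind: str, table: str, value: float) -> None:
            threshold = self.thresholds.get(kind)
            if threshold is not None and value > threshold:
                event = BadQueryEvent(kind, table, value, threshold)
                for consume in self.consumers:
                    consume(event)

    # Usage: the same code path reports tombstone overreads and slow queries alike.
    reporter = BadQueryReporter({"tombstone_scan": 1000, "slow_query": 500})
    reporter.register(lambda e: print(f"[bad-query] {e.kind} on {e.table}: "
                                      f"{e.value} > {e.threshold}"))
    reporter.record("tombstone_scan", "ks1.table1", 2500)
    reporter.set_threshold("slow_query", 250)   # changed at runtime
    reporter.record("slow_query", "ks1.table1", 300)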


>If you followed the discussions a while ago, we also talked about moving some
of the code out of Cassandra into side-car processes. Although this will
likely not manifest for 4.0, most of the devs seem to be fond of the idea
and so am I.


I agree that the side-car is a very useful project, but in my opinion it will
be difficult to get internal details out in real time without modifying
Cassandra.


>Not wanting to derail this discussion (about your proposed solution), but
let me just briefly mention that I've been working on a related approach
(diagnostic events, CASSANDRA-12944), which would allow exposing internal
events to external processes that would be able to analyze these events,
alert users, or even act on them. It's a different approach from what
you're suggesting, but I just wanted to mention it, and maybe you'd agree
that having external processes for monitoring Cassandra has some
advantages.


Thanks for sharing this; it is a really useful feature and will make the
operational aspects even easier.

My proposal is just picking low-hanging fruit; in other words, it
rearchitects the existing log messages (tombstone warnings, large partition
warnings, slow query warnings, etc.) and adds a few more in a generic way,
with extra features (thresholds changeable at runtime, commonality in log
messages, output that users can consume in different ways, etc.). The idea
is to make this a framework for reporting these types of messages so that
all of them, existing and new, share a common shape.



On Wed, Jun 20, 2018 at 1:35 AM Stefan Podkowinski  wrote:

> Jaydeep, thanks for taking this discussion to the dev list. I think it's
> the best place to introduce new ideas, discuss them in general, and see how
> they potentially fit in. As already mentioned in the ticket, I do share
> your assessment that we should try to make operational issues
> more visible to users. We already provide tons of metrics and provide
> some useful logging (e.g. when reading too many tombstones), but I think
> we should still be able to implement further checks in-code that
> highlight potential issues. Maybe we could really use a framework for
> that, I don't know.
>
> If you followed the discussions a while ago, we also talked about moving
> some of the code out of Cassandra into side-car processes. Although this
> will likely not manifest for 4.0, most of the devs seem to be fond of
> the idea and so am I. Not wanting to derail this discussion (about your
> proposed solution), but let me just briefly mention that I've been
> working on a related approach (diagnostic events, CASSANDRA-12944),
> which would allow exposing internal events to external processes that
> would be able to analyze these events, alert users, or even act on
> them. It's a different approach from what you're suggesting, but I just
> wanted to mention it, and maybe you'd agree that having external
> processes for monitoring Cassandra has some advantages.
>
>
>
> On 20.06.2018 06:33, Jaydeep Chovatia wrote:
> > Hi,
> >
> > We have worked on developing a common framework to detect and log
> > anti-patterns/bad queries in Cassandra. The target for this effort is
> > to reduce the burden on ops of running Cassandra at large scale, as well
> > as to help beginners quickly identify performance problems with Cassandra.
> > Initially we wanted to try it out to make sure it really works and
> > provides value. We've opened a JIRA with all the details. Would you please
> > review and provide your feedback on this effort?
> > https://issues.apache.org/jira/browse/CASSANDRA-14527
> >
> >
> > Thank You!!!
> >
>

Real time bad query logging framework in C*

2018-06-19 Thread Jaydeep Chovatia
Hi,

We have worked on developing a common framework to detect and log
anti-patterns/bad queries in Cassandra. The target for this effort is to
reduce the burden on ops of running Cassandra at large scale, as well as to
help beginners quickly identify performance problems with Cassandra.
Initially we wanted to try it out to make sure it really works and provides
value. We've opened a JIRA with all the details. Would you please review and
provide your feedback on this effort?
https://issues.apache.org/jira/browse/CASSANDRA-14527


Thank You!!!


Jaydeep


Re: Proposing an Apache Cassandra Management process

2018-04-12 Thread Jaydeep Chovatia
In my opinion this will be a great addition to Cassandra and will take the
overall project to the next level. It will also improve the user
experience, especially for new users.

Jaydeep

On Thu, Apr 12, 2018 at 2:42 PM Dinesh Joshi 
wrote:

> Hey all -
> With the uptick in discussion around Cassandra operability and after
> discussing potential solutions with various members of the community, we
> would like to propose the addition of a management process/sub-project into
> Apache Cassandra. The process would be responsible for common operational
> tasks like bulk execution of nodetool commands, backup/restore, and health
> checks, among others. We feel we have a proposal that will garner some
> discussion and debate but is likely to reach consensus.
> While the community, in large part, agrees that these features should
> exist “in the database”, there is debate on how they should be implemented.
> Primarily, whether or not to use an external process or build on
> CassandraDaemon. This is an important architectural decision but we feel
> the most critical aspect is not where the code runs but that the operator
> still interacts with the notion of a single database. Multi-process
> databases are as old as Postgres and continue to be common in newer systems
> like Druid. As such, we propose a separate management process for the
> following reasons:
>
>- Resource isolation & Safety: Features in the management process will
> not affect C*'s read/write path which is critical for stability. An
> isolated process has several technical advantages including preventing use
> of unnecessary dependencies in CassandraDaemon, separation of JVM resources
> like thread pools and heap, and preventing bugs from adversely affecting
> the main process. In particular, GC tuning can be done separately for the
> two processes, hopefully helping to improve, or at least not adversely
> affect, tail latencies of the main process.
>
>- Health Checks & Recovery: Currently users implement health checks in
> their own sidecar process. Implementing them in the serving process does
> not make sense because if the JVM running the CassandraDaemon goes south,
> the healthchecks and potentially any recovery code may not be able to run.
> Having a management process running in isolation opens up the possibility
> to not only report the health of the C* process such as long GC pauses or
> stuck JVM but also to recover from it. Having a list of basic health checks
> that are tested with every C* release and officially supported will help
> boost confidence in C* quality and make it easier to operate.
>
>- Reduced Risk: By having a separate Daemon we open the possibility to
> contribute features that otherwise would not have been considered before
> eg. a UI. A library that started many background threads and is operated
> completely differently would likely be considered too risky for
> CassandraDaemon but is a good candidate for the management process.
>
>
> What can go into the management process?
>- Features that are non-essential for serving reads & writes for eg.
> Backup/Restore or Running Health Checks against the CassandraDaemon, etc.
>
>- Features that do not make the management process critical for
> functioning of the serving process. In other words, if someone does not
> wish to use this management process, they are free to disable it.
>
> We would initially like to build a minimal set of features such as health
> checks and bulk commands into the first iteration of the management
> process. We would use the same software stack that is used to build the
> current CassandraDaemon binary. This would be critical for sharing code
> between CassandraDaemon & management processes. The code should live
> in-tree to make this easy.
> With regards to more in-depth features like repair scheduling and
> discussions around compaction in or out of CassandraDaemon, while the
> management process may be a suitable host, it is not our goal to decide
> that at this time. The management process could be used in these cases, as
> they meet the criteria above, but other technical/architectural reasons may
> exist for why it should not be.
> We are looking forward to your comments on our proposal,
> Dinesh Joshi and Jordan West
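
For illustration only, here is a minimal sketch of the kind of out-of-process
health check and recovery loop the proposal above describes. It is not part
of the proposal itself; the port probe, timeout, and restart command are
assumptions made for the example:

    #!/usr/bin/env python3
    """Minimal out-of-process health check loop, in the spirit of the proposal above.
    Illustrative only: the native-port probe, timeout, and restart command are assumptions."""
    import socket
    import subprocess
    import time

    CASSANDRA_HOST = "127.0.0.1"
    NATIVE_PORT = 9042              # assumed default CQL native transport port
    CHECK_INTERVAL_SECONDS = 10
    FAILURES_BEFORE_RECOVERY = 3

    def native_port_open() -> bool:
        """Crude liveness probe: can we open a TCP connection to the native port?"""
        try:
            with socket.create_connection((CASSANDRA_HOST, NATIVE_PORT), timeout=5):
                return True
        except OSError:
            return False

    def recover() -> None:
        # Hypothetical recovery action; a real sidecar would do something far more careful.
        subprocess.run(["systemctl", "restart", "cassandra"], check=False)

    failures = 0
    while True:
        if native_port_open():
            failures = 0
        else:
            failures += 1
            print(f"health check failed ({failures}/{FAILURES_BEFORE_RECOVERY})")
            if failures >= FAILURES_BEFORE_RECOVERY:
                recover()
                failures = 0
        time.sleep(CHECK_INTERVAL_SECONDS)

Because such a loop runs in its own process, it keeps working even when the
CassandraDaemon JVM is stuck in a long GC pause, which is exactly the
argument the proposal makes for an isolated management process.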


Re: Flakey Dtests

2017-11-27 Thread Jaydeep Chovatia
This is useful info, Thanks!

Jaydeep

On Mon, Nov 27, 2017 at 2:43 PM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Complicated question unfortunately — and something we’re actively working
> on improving:
>
> Cassci is no longer being offered/run by Datastax, so we've needed to
> come up with a new solution, and what that ultimately looks like is still a
> WIP — its loss was huge, obviously, and a testament to the awesome resources
> and effort that were put into providing it to the community for all those
> years.
>
>  - Short Term/Current: Tests (both dtests and unit tests) are being run
> via the ASF Jenkins (https://builds.apache.org) - but that solution isn’t
> hugely helpful as it’s resource constrained.
>  - Short-Medium Term: we hope to get a fully baked CircleCI solution to
> get reliable fast test runs.
>  - Long Term: Actively being discussed but I’m optimistic that we can get
> something awesome for the project with some stable combination of CircleCI
> + ASF Jenkins, and once we do I’m sure this will change any long term plans.
>
> For Unit Tests (a.k.a. the Java ones in tree - https://github.com/apache/cassandra/tree/trunk/test/unit/org/apache/cassandra):
> Take a look at https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/…
> looks like the last successful job to finish was #389
> (https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/389/testReport/). There are currently a total of 6
> tests  (all from CompressedInputStreamTest) failing on trunk via ASF
> Jenkins. These specific test failures are environmental. The only *unit*
> test on trunk that I currently know to be flaky is
> org.apache.cassandra.cql3.ViewTest.testRegularColumnTimestampUpdates
> (tracked as https://issues.apache.org/jira/browse/CASSANDRA-14054)
>
> For Distributed Tests (DTests) (a.k.a the Python ones -
> https://github.com/apache/cassandra-dtest):
> The situation is a great deal more complicated due to the length of time
> and number of resources executing all of the dtests take (and executing the
> tests across the various configurations)...
>
> There are 4 dtest jobs on ASF Jenkins for trunk:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-large/
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-novnode/
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest-offheap/
>
> It looks like you’ll need to go back to run #353
> (https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/353/testReport/)
> to see the test results as the
> last 2 jobs that were triggered failed to execute. Depending on the
> environment variables set tests are executed or skipped — so you’ll see
> different tests being run on the no-vnode job/off-heap job/regular dtest
> job (or some tests might be run multiple times)
>
>
> More recently we’ve been working on getting CircleCI running. Some sample
> runs from my personal fork can be seen at
> https://circleci.com/gh/mkjellman/cassandra/tree/trunk_circle. I’m personally using a paid
> account to get more CircleCI resources (with 100 containers we can actually
> build the project, run all of the unit tests, and run all of the dtests in
> roughly 28 minutes!). I’m actively working to determine out exactly can
> (and cannot) be executed reliably, routinely, and easily by anyone with
> just a simple free CircleCI account.
>
> I’m also working on getting scheduled CircleCI daily runs set up against
> trunk/3.0 — more on both of those when we’ve got that story fully baked.
> Hope this answers your question! There are quite a few dtests currently
> failing and, as Jeff mentioned, I’ve created JIRAs for a lot of them
> already, so any help (no matter how trivial or annoying it might be or
> seem) to get everything green again would be much appreciated.
>
> best,
> kjellman
>
>
> On Nov 27, 2017, at 1:54 PM, Jaydeep Chovatia <chovatia.jayd...@gmail.com> wrote:
>
> Is there a way to check which tests are failing in trunk currently?
> Previously this URL <http://cassci.datastax.com/> was giving such results
> but is no longer working.
>
> Jaydeep
>
> On Wed, Nov 15, 2017 at 5:44 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>
> If you haven't been paying attention to JIRA, you likely didn't notice that
> Josh went through and triaged/categorized a bunch of issues by adding
> components, and Michael took the time to open a bunch of JIRAs for failing
> tests.

Re: Flakey Dtests

2017-11-27 Thread Jaydeep Chovatia
Is there a way to check which tests are failing in trunk currently?
Previously this URL <http://cassci.datastax.com/> was giving such results,
but it is no longer working.

Jaydeep

On Wed, Nov 15, 2017 at 5:44 PM, Jeff Jirsa  wrote:

> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
>
> If you haven't been paying attention to JIRA, you likely didn't notice that
> Josh went through and triaged/categorized a bunch of issues by adding
> components, and Michael took the time to open a bunch of JIRAs for failing
> tests.
>
> How many is a bunch? Something like 35 or so just for tests currently
> failing on trunk.  If you're a regular contributor, you already know that
> dtests are flakey - it'd be great if a few of us can go through and fix a
> few. Even incremental improvements are improvements. Here's an easy search
> to find them:
>
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC=hide
>
> If you're a new contributor, fixing tests is often a good way to learn a
> new part of the codebase. Many of these are dtests, which live in a
> different repo ( https://github.com/apache/cassandra-dtest ) and are in
> python, but have no fear, the repo has instructions for setting up and
> running dtests
> (https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md)
>
> Normal contribution workflow applies: self-assign the ticket if you want to
> work on it, click on 'start progress' to indicate that you're working on
> it, mark it 'patch available' when you've uploaded code to be reviewed (in
> a github branch, or as a standalone patch file attached to the JIRA). If
> you have questions, feel free to email the dev list (that's what it's here
> for).
>
> Many thanks will be given,
> - Jeff
>