Re: [Discuss] Repair inside C*

2024-11-01 Thread Jaydeep Chovatia
FYI..I've updated the CEP-37 content to include all the improvements we
have discussed over this ML, ongoing work, and future work. I've also
tagged JIRAs for each ongoing/future work item and assigned them to some
individuals with a tentative timeline.
Thanks, Mick, for suggesting capturing the ML discussion in the CEP-37!

Jaydeep


Re: [Discuss] Repair inside C*

2024-10-30 Thread Jaydeep Chovatia
Thanks for the kind words, Mick!

Jaydeep

On Wed, Oct 30, 2024 at 1:35 AM Mick Semb Wever  wrote:

> Thanks Jaydeep.  I've exhausted my lines of enquiry and am happy that
> thought has gone into them.
>
>
>
>> On top of the above list, here is my recommendation (this is just an initial
>> thought and subject to change depending on how all the community
>> members see it):
>>
>>- Nov-Dec: We can definitely prioritize the table-level priority
>>feature, which would address many concerns - Jaydeep (I can take the lead
>>for a small discussion followed by implementation)
>>- Nov-Feb: For table-level tracking, we can divide it into two parts:
>>   - (Part-1) Nov-Dec: A video meeting discussion among a few of us
>>   and see how we want to design, etc. -  Jaydeep
>>   - (Part-2) Dec-Feb: Based on the above design, implement
>>   accordingly - *TODO*
>>
>>
>
> It's the discussion that's important to me, that everyone involved has
> taken onboard the input and has a PoV on it.
>
>


Re: [Discuss] Repair inside C*

2024-10-30 Thread Mick Semb Wever
Thanks Jaydeep.  I've exhausted my lines of enquiry and am happy that
thought has gone into them.



> On top of the above list, here is my recommendation (this is just an initial
> thought and subject to change depending on how all the community
> members see it):
>
>- Nov-Dec: We can definitely prioritize the table-level priority
>feature, which would address many concerns - Jaydeep (I can take the lead
>for a small discussion followed by implementation)
>- Nov-Feb: For table-level tracking, we can divide it into two parts:
>   - (Part-1) Nov-Dec: A video meeting discussion among a few of us
>   and see how we want to design, etc. -  Jaydeep
>   - (Part-2) Dec-Feb: Based on the above design, implement
>   accordingly - *TODO*
>
>

It's the discussion that's important to me, that everyone involved has
taken onboard the input and has a PoV on it.


Re: [Discuss] Repair inside C*

2024-10-29 Thread Jaydeep Chovatia
>Repairs causing a node to OOM is not unusual.   I've been working with a
customer in this situation the past few weeks.  Getting fixes out, or
mitigating the problem, is not always as quick as one hopes (see my
previous comment about how the repair_session_size setting gets easily
clobbered today).  This situation would be much improved if table
priority and tracking were added to the system_distributed table(s).
I agree we would need to tackle this OOM / JVM crashing scenario
eventually. On the other hand, adding table-level tracking looks easy, but
perfecting it would take some effort: we would have to handle all
corner-case scenarios, such as cleaning up the state metadata, and what
happens if a race condition means a table was dropped but its metadata
could not be removed. The architecture extension is simple, but making it
bug-free and robust is a bit complex.

>Does this emergency list imply then not doing --partitioner-range ?
The emergency list is to prioritize a few nodes over others, but those
nodes will continue to honor the same repair configuration that has been
provided. The default configuration is to repair primary token ranges only.

>For per-table custom-priorities and tracking it sounds like adding a
clustering column.  So the number of records would go from ~number of nodes
in the cluster, to ~number of nodes multiplied by up to the number of
tables in the cluster.  We do see clusters too often with up to a thousand
tables, despite strong recommendations not to go over two hundred.  Do you
see here any concern ?
My initial thoughts are to add this as a CQL table property, something like
"repair_priority=0.0", with all tables having the same priority by default. But the
user can change the priority through ALTER, say, "ALTER TABLE T1 WITH
repair_priority=0.1", and then T1 will be prioritized over other tables. Again,
I need to give more thought to it and need to do a small discussion either
in a bi-weekly meeting or on a ticket to ensure all folks are on the same
page. If we go with this approach, we do not need to add any additional
columns to the repair metadata tables, so that way the design continues to
remain lightweight, etc.
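As an illustration of the ordering such a property could drive (the `repair_priority` name and 0.0 default come from this thread and CASSANDRA-20013; everything else in this sketch is a hypothetical assumption, not the MVP's implementation):

```python
# Sketch of priority-based table ordering for the repair scheduler.
# The property name "repair_priority" and its 0.0 default are taken
# from this thread; the ordering rule itself is an assumption.

def order_tables_for_repair(tables):
    """Return table names ordered by descending repair_priority;
    ties keep a stable alphabetical order so runs are deterministic."""
    return [name for name, prio in
            sorted(tables.items(), key=lambda kv: (-kv[1], kv[0]))]

# All tables default to priority 0.0; t1 was bumped via something
# like "ALTER TABLE t1 WITH repair_priority = 0.1".
tables = {"t1": 0.1, "t2": 0.0, "t3": 0.0}
print(order_tables_for_repair(tables))  # ['t1', 't2', 't3']
```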
For a moment, let's just assume we add a new clustering column to track
tables. After that, the number of rows will be (number of nodes) *
(number of tables), which is still not an issue. As I mentioned above, the
bigger problem for table-tracking is not the architecture extension, but
handling all the race conditions correctly, which is a bit complex.

>Also, in what versions will we be able to introduce such improvements ? We
will be waiting until the next major release ?  Playing around with the
schema of system tables in release branches is not much fun.
There are a few priority items beyond the CEP-37 MVP scope that some
of us are working on:
1. Extend disk-capacity check for full repair - Jaydeep
2. Making the incremental repair more reliable by having an unrepaired
size-based token splitter - Andy T, Chris L
3. Add support for the Preview Repair - Kristijonas
4. Start a new ML discussion to gauge consensus on whether repairs should be
backward/forward compatible between major versions in the future - Andy T

On top of the above list, here is my recommendation (this is just an initial
thought and subject to change depending on how all the community
members see it):

   - Nov-Dec: We can definitely prioritize the table-level priority
   feature, which would address many concerns - Jaydeep (I can take the lead
   for a small discussion followed by implementation)
   - Nov-Feb: For table-level tracking, we can divide it into two parts:
  - (Part-1) Nov-Dec: A video meeting discussion among a few of us and
  see how we want to design, etc. -  Jaydeep
  - (Part-2) Dec-Feb: Based on the above design, implement accordingly
  - *TODO*


Jaydeep


On Tue, Oct 29, 2024 at 12:06 PM Mick Semb Wever  wrote:

>
> Jaydeep,
>   your replies address my main concerns; there are a few questions of
> curiosity as replies inline below…
>
>
>
>
>
>> >Without any per-table scheduling and history (IIUC)  a node would have
>> to restart the repairs for all keyspaces and tables.
>>
>> The above-mentioned quote should work fine and will make sure the bad
>> tables/keyspaces are skipped, allowing the good keyspaces/tables to proceed
>> on a node, unless the Cassandra JVM itself keeps crashing. If a JVM
>> keeps crashing, then repairs will restart all over again, but then fixing the
>> JVM crashes might be a more significant issue, and such crashes do not
>> happen regularly, IMO.
>>
>
>
> Repairs causing a node to OOM is not unusual.   I've been working with a
> customer in this situation the past few weeks.  Getting fixes out, or
> mitigating the problem, is not always as quick as one hopes (see my
> previous comment about how the repair_session_size setting gets easily
> clobbered today).  This situation would be much improved if table
> priority and tracking were added to the system_distributed table(s).
>
>
>
>> If an admin sets some no

Re: [Discuss] Repair inside C*

2024-10-29 Thread Mick Semb Wever
Jaydeep,
  your replies address my main concerns; there are a few questions of
curiosity as replies inline below…





> >Without any per-table scheduling and history (IIUC)  a node would have
> to restart the repairs for all keyspaces and tables.
>
> The above-mentioned quote should work fine and will make sure the bad
> tables/keyspaces are skipped, allowing the good keyspaces/tables to proceed
> on a node, unless the Cassandra JVM itself keeps crashing. If a JVM
> keeps crashing, then repairs will restart all over again, but then fixing the
> JVM crashes might be a more significant issue, and such crashes do not
> happen regularly, IMO.
>


Repairs causing a node to OOM is not unusual.   I've been working with a
customer in this situation the past few weeks.  Getting fixes out, or
mitigating the problem, is not always as quick as one hopes (see my
previous comment about how the repair_session_size setting gets easily
clobbered today).  This situation would be much improved if table
priority and tracking were added to the system_distributed table(s).



> If an admin sets some nodes on a priority queue, those nodes will be
> repaired over the scheduler's own list. If an admin tags some nodes on the
> emergency list, then those nodes will repair immediately. Basically, an
> admin tells the scheduler, "*Just do what I say instead of using your
> list of nodes*".
>


Does this emergency list imply then not doing --partitioner-range ?


>I am also curious as to how the impact of these tables changes as we
> address (1) and (2).
>
> Quite a lot of (1) & (2) can be addressed by just adding a new CQL
> property, which won't even touch these metadata tables. In case we need to,
> depending on the design for (1) & (2), it can be either addressed by adding
> new columns and/or adding a new metadata table.
>

For per-table custom-priorities and tracking it sounds like adding a
clustering column.  So the number of records would go from ~number of nodes
in the cluster, to ~number of nodes multiplied by up to the number of
tables in the cluster.  We do see clusters too often with up to a thousand
tables, despite strong recommendations not to go over two hundred.  Do you
see here any concern ?

Also, in what versions will we be able to introduce such improvements ? We
will be waiting until the next major release ?  Playing around with the
schema of system tables in release branches is not much fun.


Re: [Discuss] Repair inside C*

2024-10-29 Thread Jaydeep Chovatia
>Since the auto repair is running from within Cassandra, we might have more
control over this and implement a proper cleanup of such snapshots.
Rightly said, Alexander. Having internal knowledge of Cassandra, we can do
a lot more. For example, for better incremental repair reliability,
Andy T and Chris L have developed a new token-split algorithm on top of the
MVP based on unrepaired data in SSTables (soon it will be added to the MVP
as they are working on writing test cases, etc.), and that
requires internal SSTable data-structure access, etc.

Jaydeep

On Mon, Oct 28, 2024 at 10:51 PM Jaydeep Chovatia <
[email protected]> wrote:

>
>> > That's inaccurate, we can check the replica set for the subrange we're
> about to run and see if it overlaps with the replica set of other ranges
> which are being processed already.
> We can definitely check the replicas for the subrange we plan to run and
> see if they overlap with the ongoing one. I am saying that for a smaller
> cluster if we want to repair multiple token ranges in parallel, it is tough
> to guarantee that replica sets won't overlap.
>
> >Jira to auto-delete snapshots at X% disk full ?
> Sure, just created a new JIRA
> https://issues.apache.org/jira/browse/CASSANDRA-20035
>
> Jaydeep
>


Re: [Discuss] Repair inside C*

2024-10-28 Thread Jaydeep Chovatia
>
>
> > That's inaccurate, we can check the replica set for the subrange we're
about to run and see if it overlaps with the replica set of other ranges
which are being processed already.
We can definitely check the replicas for the subrange we plan to run and
see if they overlap with the ongoing one. I am saying that for a smaller
cluster if we want to repair multiple token ranges in parallel, it is tough
to guarantee that replica sets won't overlap.

>Jira to auto-delete snapshots at X% disk full ?
Sure, just created a new JIRA
https://issues.apache.org/jira/browse/CASSANDRA-20035

Jaydeep


Re: [Discuss] Repair inside C*

2024-10-28 Thread Jeff Jirsa


> On Oct 28, 2024, at 9:52 PM, Alexander Dejanovski 
>  wrote:
> 
> 
> 
>> If a repair session finishes gracefully, then this timeout is not 
>> applicable. Anyway, I do not have any strong opinion on the value. I am open 
>> to lowering it to 1h or something.
> True, it will only delay killing hanging repairs.
> One thing that we cannot solve in Reaper at the moment is that sequential and 
> dc aware repair sessions that get terminated due to the timeout leave 
> ephemeral snapshots behind. Since they're only reclaimed on restart, having a 
> lot of timeouts can end up filling the disks if the snapshots get 
> materialized.
> Since the auto repair is running from within Cassandra, we might have more 
> control over this and implement a proper cleanup of such snapshots.

Jira to auto-delete snapshots at X% disk full ? 



Re: [Discuss] Repair inside C*

2024-10-28 Thread Alexander Dejanovski
>
> The scheduler repairs, by default, the primary ranges for all the nodes
> going through the repair. Since it uses the primary ranges, all the nodes
> repairing in parallel would not overlap in any form for the primary ranges.
> However, the replica set for the nodes going through repair may or may not
> overlap, but it totally depends on the cluster size and parallelism used.
> If a cluster is small, there is a possibility, but if it is large, the
> possibility reduces. Even if we go with a range-centric approach and if we
> repair N token ranges in parallel, there is no guarantee that their replica
> sets won't overlap for smaller clusters.

That's inaccurate, we can check the replica set for the subrange we're
about to run and see if it overlaps with the replica set of other ranges
which are being processed already.


The only solution is to reduce the repair parallelism to one node at a
> time.

Yes, I agree.

This is supported with the MVP, we can set "min_repair_interval: 7d"  (the
> default is 24h) and the nodes will repair only once every 7 days.
>
The MVP implementation allows running full and incremental repairs (and
> Preview repair code changes are done and it is coming soon) independently
> and in parallel. One can set the above config for each repair type with
> their preferred schedule.

Nice, sorry I missed these in the CEP doc.


>  I have already created a ticket to add this as an enhancement
> https://issues.apache.org/jira/browse/CASSANDRA-20013

Thanks,  table level repair priority could be a very interesting
improvement, that's something Reaper lacks as well at the moment.

If a repair session finishes gracefully, then this timeout is not
> applicable. Anyway, I do not have any strong opinion on the value. I am
> open to lowering it to *1h* or something.

True, it will only delay killing hanging repairs.
One thing that we cannot solve in Reaper at the moment is that sequential
and dc aware repair sessions that get terminated due to the timeout leave
ephemeral snapshots behind. Since they're only reclaimed on restart, having
a lot of timeouts can end up filling the disks if the snapshots get
materialized.
Since the auto repair is running from within Cassandra, we might have more
control over this and implement a proper cleanup of such snapshots.
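The kind of cleanup policy being floated here (later filed as CASSANDRA-20035) could look roughly like the sketch below. The oldest-first strategy, the threshold knob, and the snapshot tuple shape are all illustrative assumptions, not the actual design:

```python
# Sketch: reclaim leftover ephemeral repair snapshots, oldest first,
# once disk usage exceeds a threshold. Field names and the 80%
# default are illustrative assumptions, not the CASSANDRA-20035 design.

def snapshots_to_delete(snapshots, disk_used, disk_total, threshold=0.80):
    """snapshots: list of (name, created_at, size_bytes).
    Returns snapshot names to delete, oldest first, until the
    projected disk usage drops to or below the threshold."""
    doomed = []
    for name, _created, size in sorted(snapshots, key=lambda s: s[1]):
        if disk_used / disk_total <= threshold:
            break
        disk_used -= size  # account for the space we would reclaim
        doomed.append(name)
    return doomed

snaps = [("repair-b", 200, 30), ("repair-a", 100, 50)]
# 90 of 100 bytes used (90%); deleting the oldest snapshot (repair-a,
# 50 bytes) brings projected usage to 40%, below the 80% threshold.
print(snapshots_to_delete(snaps, disk_used=90, disk_total=100))  # ['repair-a']
```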


Alexander Dejanovski

Astra Managed Clusters / Mission Control

w. www.datastax.com

 


On Mon, Oct 28, 2024 at 7:01 PM Jaydeep Chovatia 
wrote:

> Thanks a lot, Alexander, for the review! Please find my response below:
>
> >  making these replicas process 3 concurrent repairs while others could
> be left uninvolved in any repair at all...Taking a range centric approach
> (we're not repairing nodes, we're repairing the token ranges) allows
> spreading the load evenly without overlap in the replica sets.
> The scheduler repairs, by default, the primary ranges for all the nodes
> going through the repair. Since it uses the primary ranges, all the nodes
> repairing in parallel would not overlap in any form for the primary ranges.
> However, the replica set for the nodes going through repair may or may not
> overlap, but it totally depends on the cluster size and parallelism used.
> If a cluster is small, there is a possibility, but if it is large, the
> possibility reduces. Even if we go with a range-centric approach and if we
> repair N token ranges in parallel, there is no guarantee that their replica
> sets won't overlap for smaller clusters.
>
> > I'm more worried even with incremental repair here, because you might
> end up with some conflicts around sstables which would be in the pending
> repair pool but would be needed by a competing repair job.
> This can happen regardless of whether we go by "node-centric" vs.
> "range-centric" if we run multiple parallel repair sessions. The reason is
> that SSTables for all the nodes going through repair may not be physically
> isolated 1:1 as per the token ranges being repaired. We just had a detailed
> discussion about the SSTable overlap for incremental repair (IR) last week
> in Slack (#cassandra-repair-scheduling-cep37), and the general consensus
> was that there is no better way to address it than just to retry a few
> times. The only solution is to reduce the repair parallelism to one node
> at a time.
> The ideal and reliable way to repair IR is to calculate the token ranges
> based on the unrepaired data size and also apply the upper cap on the data
> size being repaired. The good news is that Andy Tolbert already extended
> the CEP-37 MVP for this, and he is working on making it perfect by adding
> necessary tests, etc., so it can be landed on top of this MVP. tl;dr Andy T
> and Chris L are already on top of this and soon it will be available on top
> of CEP-37 MVP.
>
> >I don't know if in the latest versions such sstables would be totally
> ignored or if the competing repair job would fail.
> The competing IR session would be 

Re: [Discuss] Repair inside C*

2024-10-28 Thread Jaydeep Chovatia
Thanks, Mick, for the comment, please find my response below.

>(1)

I think I covered most of the points in my response to Alexander (except
one, which I am responding to below separately). Tl;dr: the MVP can
be easily extended to do a table-level schedule; it is just going to be
another CQL table property as opposed to a yaml config (as currently in the MVP).
I had already added this as a near-term feature here and added that when we
add repair priority table-wise, we need to ensure the table-level
scheduling is also taken care of. Please visit my latest few comments to
the ticket https://issues.apache.org/jira/browse/CASSANDRA-20013

>You may also want to do repairs in different DCs differently.

Currently, the MVP allows one to skip one or more DCs if they wish to do so;
by default, all DCs are included. This again points to the similar theme of
allowing a schedule (or priority) at a table level followed by a DC level. The
MVP can be easily extended to whatever granularity we want scheduling to be
without many architectural changes. We all just have to finalize the
granularity we want. I've also added to the ticket above that scheduling
support at a table level followed by DC-level granularity is needed.

>I'm curious as to how crashed repairs are handled and resumed

The MVP has a max allowed quota at a keyspace level and at a table level.
So, if a table and/or keyspace takes much longer than its quota due to
failures, more data to repair, etc., then the scheduler will skip to the next
table/keyspace.
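To make the skip-on-quota behavior concrete, a minimal sketch (the quota value and the simulated repair callback are assumptions for illustration, not the MVP's code):

```python
# Sketch: walk the repair list with a per-table time budget; a table
# that exceeds its quota is recorded as skipped so the rest can
# proceed. The quota value and callback shape are illustrative
# assumptions; durations are simulated rather than measured live.

def run_with_quota(tables, repair_fn, quota_s=3600):
    """repair_fn(table) -> seconds the repair took (simulated).
    Tables whose repair exceeds quota_s are skipped, not retried."""
    completed, skipped = [], []
    for table in tables:
        elapsed = repair_fn(table)
        (skipped if elapsed > quota_s else completed).append(table)
    return completed, skipped

durations = {"good_ks.t1": 120, "bad_ks.t2": 7200, "good_ks.t3": 300}
done, skipped = run_with_quota(durations, durations.get)
print(done, skipped)  # ['good_ks.t1', 'good_ks.t3'] ['bad_ks.t2']
```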

>Without any per-table scheduling and history (IIUC)  a node would have to
restart the repairs for all keyspaces and tables.

The above-mentioned quote should work fine and will make sure the bad
tables/keyspaces are skipped, allowing the good keyspaces/tables to proceed
on a node as long as the Cassandra JVM itself keeps crashing. If a JVM
keeps crashing, then it will restart all over again, but then fixing the
JVM crashing might be a more significant issue and does not happen
regularly, IMO.

>And without such per-table tracking, I'm also kinda curious as to how we
interact with manual repair invocations the user makes.  There are
operational requirements to do manual repairs, e.g. node replacement or if
a node has been down for too long, and consistency breakages until such
repair is complete.  Leaving such operational requirements to this CEP's
in-built scheduler is a limited approach, it may be many days before it
gets to doing it, and even with node priority will it appropriately switch
from primary-range to all-replica-ranges?

To alleviate some of this, the MVP has two options one can configure
dynamically through *nodetool*: 1) Setting priority for nodes, 2) Telling
the scheduler to repair one or more nodes immediately
If an admin sets some nodes on a priority queue, those nodes will be
repaired over the scheduler's own list. If an admin tags some nodes on the
emergency list, then those nodes will repair immediately. Basically, an
admin tells the scheduler, "*Just do what I say instead of using your list
of nodes*".
Even with this, if an admin decides to trigger repair manually directly
through *nodetool repair*, then the scheduler should not interfere with
that manually triggered operation - they can progress independently. The
MVP has options to disable the scheduler's repair dynamically without any
cluster restart, etc., so the admin can use some of the combinations and
decide what to do when they invoke any manual repair operation.
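The override order described above could be sketched roughly like this (the list names and the de-duplication rule are assumptions based on this description, not the actual nodetool interface):

```python
# Sketch: how operator-supplied lists could override the scheduler's
# own node ordering. Emergency nodes repair first, then prioritized
# nodes, then the scheduler's own list. List names and semantics are
# assumptions from this thread, not a finalized interface.

def next_nodes_to_repair(emergency, priority, scheduler_order):
    """Concatenate the three lists in precedence order; a node listed
    more than once keeps its earliest (highest-precedence) position."""
    seen, result = set(), []
    for node in emergency + priority + scheduler_order:
        if node not in seen:
            seen.add(node)
            result.append(node)
    return result

order = next_nodes_to_repair(
    emergency=["n7"], priority=["n2"], scheduler_order=["n1", "n2", "n3"])
print(order)  # ['n7', 'n2', 'n1', 'n3']
```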

>What if the user accidentally invokes an incremental repair when the
in-built scheduler is expecting only to ever perform full repairs? Does it
know how to detect/remedy that?

The user invocation and the scheduler invocations go through two different
repair sessions. If the MVP scheduler has been configured only to perform
FR, then the scheduler will never fire IR, but it does not prohibit the
user from firing IR through *nodetool repair*. As an enhancement to the
MVP, in the future, we should warn the user that it might not be safe to run
IR when the in-built scheduler has been configured not to do IR.

>Having read the design doc and PR, I am impressed how lightweight the
design of the tables are.

Thanks. To reiterate, the number of records in system_distributed will
be equivalent to the number of nodes in the cluster.

>But I do still think we deserve some numbers, and a further line of
questioning:  what consistency guarantees do we need, how does this work
cross-dc, during topology changes, does an event that introduces
data-at-rest inconsistencies in the cluster then become
confused/inefficient when the mechanism to repair it also now has its
metadata inconsistent.  For the most part this is a problem not unique to
any table in system_distributed and otherwise handled, but how does the
system_distributed keyspace's handling of such failures impact repairs.

Keeping practicality in mind, the record count to the table should be as
small as three rows and 

Re: [Discuss] Repair inside C*

2024-10-28 Thread Jaydeep Chovatia
Thanks a lot, Alexander, for the review! Please find my response below:

>  making these replicas process 3 concurrent repairs while others could be
left uninvolved in any repair at all...Taking a range centric approach
(we're not repairing nodes, we're repairing the token ranges) allows
spreading the load evenly without overlap in the replica sets.
The scheduler repairs, by default, the primary ranges for all the nodes
going through the repair. Since it uses the primary ranges, all the nodes
repairing in parallel would not overlap in any form for the primary ranges.
However, the replica set for the nodes going through repair may or may not
overlap, but it totally depends on the cluster size and parallelism used.
If a cluster is small, there is a possibility, but if it is large, the
possibility reduces. Even if we go with a range-centric approach and if we
repair N token ranges in parallel, there is no guarantee that their replica
sets won't overlap for smaller clusters.

> I'm more worried even with incremental repair here, because you might end
up with some conflicts around sstables which would be in the pending repair
pool but would be needed by a competing repair job.
This can happen regardless of whether we go by "node-centric" vs.
"range-centric" if we run multiple parallel repair sessions. The reason is
that SSTables for all the nodes going through repair may not be physically
isolated 1:1 as per the token ranges being repaired. We just had a detailed
discussion about the SSTable overlap for incremental repair (IR) last week
in Slack (#cassandra-repair-scheduling-cep37), and the general consensus
was that there is no better way to address it than just to retry a few
times. The only solution is to reduce the repair parallelism to one node
at a time.
The ideal and reliable way to repair IR is to calculate the token ranges
based on the unrepaired data size and also apply the upper cap on the data
size being repaired. The good news is that Andy Tolbert already extended
the CEP-37 MVP for this, and he is working on making it perfect by adding
necessary tests, etc., so it can be landed on top of this MVP. tl;dr Andy T
and Chris L are already on top of this and soon it will be available on top
of CEP-37 MVP.
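The unrepaired-size-based splitting idea can be sketched roughly as follows. The greedy grouping, the byte cap, and the segment tuple shape are illustrative assumptions on my part, not Andy T and Chris L's actual algorithm:

```python
# Sketch of the idea behind an unrepaired-size-based token splitter:
# cut a token range into sub-ranges whose estimated unrepaired bytes
# stay under a cap, so each incremental repair session handles a
# bounded amount of data. Greedy grouping and the cap are assumptions.

def split_by_unrepaired_size(segments, cap_bytes):
    """segments: list of (token_subrange, estimated_unrepaired_bytes).
    Greedily groups consecutive segments so each group stays at or
    under cap_bytes; a single oversized segment forms its own group."""
    groups, current, current_size = [], [], 0
    for rng, size in segments:
        if current and current_size + size > cap_bytes:
            groups.append(current)       # close the full group
            current, current_size = [], 0
        current.append(rng)
        current_size += size
    if current:
        groups.append(current)
    return groups

segs = [("r1", 40), ("r2", 50), ("r3", 90), ("r4", 10)]
print(split_by_unrepaired_size(segs, cap_bytes=100))
# [['r1', 'r2'], ['r3', 'r4']]
```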

>I don't know if in the latest versions such sstables would be totally
ignored or if the competing repair job would fail.
The competing IR session would be aborted, and the scheduler would retry a
few times.

>Continuous repair might create a lot of overhead for full repairs which
often don't require more than 1 run per week.
This is supported with the MVP, we can set "min_repair_interval: 7d"  (the
default is 24h) and the nodes will repair only once every 7 days.
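A minimal sketch of the schedule gate implied by "min_repair_interval: 7d" (the timestamp-based check is my assumption about the mechanism; the MVP's actual config parsing and state tracking may differ):

```python
# Sketch: decide whether a node is due for a given repair type based
# on its last successful repair time and min_repair_interval. The
# seconds-based arithmetic is an illustrative assumption.

def is_due(last_repair_ts, now_ts, min_interval_s):
    """True when at least min_interval_s seconds have passed since the
    last successful repair of this type on this node."""
    return (now_ts - last_repair_ts) >= min_interval_s

WEEK = 7 * 24 * 3600  # "min_repair_interval: 7d" expressed in seconds

# 3 days after the last repair: not due yet; 8 days after: due.
print(is_due(0, 3 * 24 * 3600, WEEK))  # False
print(is_due(0, 8 * 24 * 3600, WEEK))  # True
```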

>It also will not allow running a mix of scheduled full/incremental repairs
The MVP implementation allows running full and incremental repairs
independently and in parallel (Preview repair code changes are done and
coming soon). One can set the above config for each repair type with
their preferred schedule.

>Here, nodes will be processed sequentially and each node will process the
keyspaces sequentially, tying the repair cycle of all keyspaces together.
The keyspaces and tables on each node will be randomly shuffled to avoid
multiple nodes working on the same table/keyspaces.

>There are many cases where one might have differentiated gc_grace_seconds
settings to optimize reclaiming tombstones when applicable. That requires
having some fine control over the repair cycle for a given keyspace/set of
tables.
As I mentioned, there is already a way to schedule a frequency of repair
cycle, but the frequency is currently a global config on a node; hence
applicable to all the tables on a node. However, the MVP design is flexible
enough to be easily extended to add the schedule as a new CQL table-level
property, which will then honor the table-level schedule as opposed to a
global schedule. There was another suggestion from @masokol (from
Ecchronos) to maybe assign a repair priority on a table level to prioritize
one table over the other, and that can also solve this problem, which is
also feasible on top of the MVP. I have already created a ticket to add
this as an enhancement https://issues.apache.org/jira/browse/CASSANDRA-20013

>I think the 3 hours timeout might be quite large and probably means a lot
of data is being repaired for each split. That usually involves some level
of overstreaming
This timeout exists to terminate stuck repair sessions caused by some bug in
the repair code path, e.g.
https://issues.apache.org/jira/browse/CASSANDRA-14674
If a repair session finishes gracefully, then this timeout is not
applicable. Anyway, I do not have any strong opinion on the value. I am
open to lowering it to *1h* or something.
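The watchdog behavior described here, sketched minimally (the 3h default comes from this discussion; the session tuple shape and selection logic are illustrative assumptions):

```python
# Sketch: a stuck-session watchdog. Sessions that finished gracefully
# are never touched; still-running sessions older than the timeout
# are flagged for termination. Session shape is an assumption; the
# 3h default reflects the value discussed in this thread.

def sessions_to_terminate(sessions, now_ts, timeout_s=3 * 3600):
    """sessions: list of (session_id, started_ts, finished).
    Returns ids of still-running sessions older than timeout_s."""
    return [sid for sid, started, finished in sessions
            if not finished and (now_ts - started) > timeout_s]

sessions = [("s1", 0, True),        # finished gracefully: never killed
            ("s2", 0, False),       # stuck for 4h: terminated
            ("s3", 10000, False)]   # running, under the limit: kept
print(sessions_to_terminate(sessions, now_ts=4 * 3600))  # ['s2']
```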

Jaydeep

On Mon, Oct 28, 2024 at 4:45 AM Alexander DEJANOVSKI 
wrote:

> Hi Jaydeep,
>
> I've taken a look at the proposed design and have a few comments/questions.
> As one of the maintainers of Reaper, I'm looking th

Re: [Discuss] Repair inside C*

2024-10-28 Thread Mick Semb Wever
any name works for me, Jaydeep :-)

I've taken a run through of the CEP, design doc, and current PR.  Below are
my four (rough categories of) questions.
I am keen to see a MVP land, so I'm more looking at what the CEP's design
might not be able to do, rather than what may or may not land in an initial
implementation.  There's a bit below, and some of it really would be better
in the PR, feel free to take it there if deemed more constructive.


1) The need for different schedules for different tables
2) Failure mode: repairs failing and thrashing repairs for all
keyspaces+tables
3) Small concerns on relying on system tables
4) Small concerns on tuning requirements


(1)
Alex also touched on this.  I'm aware of too many reasons where this is a
must-have.  Many users cannot repair their clusters without tuning
per-table schedules.  Different gc_grace_seconds is the biggest reason.
But there's also running full repairs infrequently for disk rot (or similar
reason) on a table that's otherwise frequently incremental repaired (also
means an incremental repair could be skipped if the full repair was
currently running).  Or TWCS tables where you benefit from higher frequency
of incremental repair (and/or want to minimise repairs older than the
current time_window).   You may also want to do repairs in different DCs
differently.

(2)
I'm curious as to how crashed repairs are handled and resumed…
A problem a lot of users struggle with is where the repair on one table is
enigmatically problematic, crashing or timing out, and it takes a long time
to figure it out.
Without any per-table scheduling and history (IIUC)  a node would have to
restart the repairs for all keyspaces and tables.  This will lead to
over-repairing some tables and never repairing others.

And without such per-table tracking, I'm also kinda curious as to how we
interact with manual repair invocations the user makes.

There are operational requirements to do manual repairs, e.g. node
replacement or if a node has been down for too long, and consistency
breakages until such repair is complete.  Leaving such operational
requirements to this CEP's in-built scheduler is a limited approach, it may
be many days before it gets to doing it, and even with node priority will
it appropriately switch from primary-range to all-replica-ranges?

What if the user accidentally invokes an incremental repair when the in-built
scheduler is expecting only ever to perform full repairs? Does it know how
to detect/remedy that?


(3)
Having stuff in system tables is brittle and a write-amplification; we have
plenty of experience of this from DSE NodeSync and Reaper.  Reaper's
ability to store its metadata out-of-cluster is a huge benefit.  Having
read the design doc and PR, I am impressed how lightweight the design of
the tables are.  But I do still think we deserve some numbers, and a
further line of questioning:  what consistency guarantees do we need, how
does this work cross-dc, during topology changes, does an event that
introduces data-at-rest inconsistencies in the cluster then become
confused/inefficient when the mechanism to repair it also now has its
metadata inconsistent.  For the most part this is a problem not unique to
any table in system_distributed and otherwise handled, but how does the
system_distributed keyspace's handling of such failures impact repairs?

Even with strong consistency, I would assume the design needs to be
pessimistic, e.g. multiple node repairs can be started at the same time.  Is
this true, if so how is it handled ?

I am also curious as to how the impact of these tables changes as we
address (1) and (2).

(4)
I can see how the CEP's design works well for the biggest clusters, and
those with heterogeneous data-models (which often comes with larger
deployment sets).  But I don't think we can use this as the bar to quality
or acceptance.   Many smaller clusters that come with lots of keyspaces and
tables have real troubles trying to get repairs to run weekly.  We can't
simply blame users for not having optimal data models and deployments.

Carefully tuning the schedules of tables, and the cluster itself, is often
a requirement: time-consuming and a real pain point.  The CEP as it stands
today, I can say with confidence, will simply not work for many users.
Worse than that, it will provide false hope, and cost users time and effort
until they realise it won't work, leaving them to revert to their previous
solution.  No one expects the CEP to initially handle and solve every
situation, especially poor data models and over-capacity clusters.  My hope
here is just that a bit of discussion can help us be upfront about our
limitations, and possibly save some users from thinking this is their
silver bullet.

The biggest aspect to this I believe is (1), but operational stability and
tuning is also critical.  Alex mentions the range-centric approach, which
helps balance load, which in turn gives you more head room.  But there's
also stuff like p

Re: [Discuss] Repair inside C*

2024-10-28 Thread Alexander DEJANOVSKI
Hi Jaydeep,

I've taken a look at the proposed design and have a few comments/questions.
As one of the maintainers of Reaper, I'm looking this through the lens of
how Reaper does things.


*The approach taken in the CEP-37 design is "node-centric" vs a
"range-centric" approach (which is the one Reaper takes).*
I'm worried that this
will not allow spreading the repair load evenly across the cluster, since
nodes are the concurrency unit. You could allow running repair on 3 nodes
concurrently for example, but these 3 nodes could all involve the same
replicas, making these replicas process 3 concurrent repairs while others
could be left uninvolved in any repair at all.
Taking a range-centric approach (we're not repairing nodes, we're repairing
the token ranges) allows spreading the load evenly without overlap in the
replica sets.
I'm more worried even with incremental repair here, because you might end
up with some conflicts around sstables which would be in the pending repair
pool but would be needed by a competing repair job.
I don't know if in the latest versions such sstables would be totally
ignored or if the competing repair job would fail.
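The range-centric admission rule described above can be sketched as follows (names are illustrative, not Reaper's actual code): admit a new repair only when its replica set is disjoint from the replicas of every repair already running, so concurrency spreads load instead of piling onto the same replicas.

```python
def pick_non_overlapping(candidates, max_concurrent):
    """Select token ranges to repair concurrently.

    candidates: list of (token_range, replica_set) pairs, in priority order.
    Returns the ranges admitted without any replica overlap.
    """
    running, busy_replicas = [], set()
    for token_range, replicas in candidates:
        if len(running) >= max_concurrent:
            break
        # Admit only if none of this range's replicas are already repairing.
        if busy_replicas.isdisjoint(replicas):
            running.append(token_range)
            busy_replicas |= set(replicas)
    return running


candidates = [
    ("range-1", {"n1", "n2", "n3"}),
    ("range-2", {"n2", "n3", "n4"}),  # shares n2/n3 with range-1: skipped
    ("range-3", {"n4", "n5", "n6"}),  # disjoint from range-1: admitted
]
assert pick_non_overlapping(candidates, max_concurrent=3) == ["range-1", "range-3"]
```

A node-centric scheduler with concurrency 3 could instead pick three nodes whose ranges all land on the same replicas, which is exactly the concern raised here.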

*Each repair command will repair all keyspaces (with the ability to fully
exclude some tables), and I haven't seen a notion of a schedule, which seems
to suggest repairs are running continuously (unless I missed something?).*
There are many cases where one might have differentiated gc_grace_seconds
settings to optimize reclaiming tombstones when applicable. That requires
having some fine control over the repair cycle for a given keyspace/set of
tables.
Here, nodes will be processed sequentially and each node will process the
keyspaces sequentially, tying the repair cycle of all keyspaces together.
If one of the ranges for a specific keyspace cannot be repaired within the
3 hours timeout, it could block all the other keyspaces repairs.
Continuous repair might create a lot of overhead for full repairs which
often don't require more than 1 run per week.
It also will not allow running a mix of scheduled full/incremental repairs
(I'm unsure if that is still a recommendation, but it was still recommended
not so long ago).
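The gc_grace_seconds point can be sketched like this (the safety margin and names are illustrative assumptions, not a recommendation): each table must complete a repair cycle well inside its own gc_grace_seconds, so tables with differentiated settings need independent schedules rather than one cluster-wide continuous loop.

```python
def repair_due(last_repair_ts, now_ts, gc_grace_seconds, safety_factor=0.5):
    """Return True if the table should be repaired now.

    safety_factor=0.5 aims to finish a repair within half of gc_grace,
    leaving headroom before tombstones become eligible for purging.
    """
    return (now_ts - last_repair_ts) >= gc_grace_seconds * safety_factor


DAY = 86_400
# Table with the default 10-day gc_grace: due roughly every 5 days.
assert repair_due(0, 5 * DAY, 10 * DAY) is True
assert repair_due(0, 4 * DAY, 10 * DAY) is False
# Table tuned down to a 2-day gc_grace needs a much tighter cycle.
assert repair_due(0, 1 * DAY, 2 * DAY) is True
```

Processing keyspaces sequentially per node, as the CEP describes, ties all of these cycles together, which is the concern raised above.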

*The timeout base duration is large*
I think the 3 hours timeout might be quite large and probably means a lot
of data is being repaired for each split. That usually involves some level
of overstreaming. I don't have numbers to support this, it's more about my
own experience on sizing splits in production with Reaper to reduce the
impact as much as possible on cluster performance.
We use 30 minutes as default in Reaper with subsequent attempts growing the
timeout dynamically for challenging splits.
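The growing-timeout policy described above can be sketched as exponential backoff per split (the growth factor and cap are assumptions; only the 30-minute base comes from the text):

```python
def split_timeout(attempt, base_minutes=30, growth=2.0, cap_minutes=180):
    """Timeout in minutes for the given (0-based) retry attempt on a split.

    Starts small so an unlucky split fails fast, then grows for
    challenging splits instead of budgeting 3 hours for every split.
    """
    return min(base_minutes * growth ** attempt, cap_minutes)


assert split_timeout(0) == 30    # first attempt: the 30-minute default
assert split_timeout(1) == 60
assert split_timeout(2) == 120
assert split_timeout(3) == 180   # capped at the 3-hour ceiling
```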

Finally thanks for picking this up, I'm eager to see Reaper not being
needed anymore and having the database manage its own repairs!


On Tue, Oct 22, 2024 at 9:10 PM Benedict  wrote:

> I realise it’s out of scope, but to counterbalance all of the
> pro-decomposition messages I wanted to chime in with a strong -1. But we
> can debate that in a suitable context later.
>
> On 22 Oct 2024, at 16:36, Jordan West  wrote:
>
> 
> Agreed with the sentiment that decomposition is a good target but out of
> scope here. I’m personally excited to see an in-tree repair scheduler and
> am supportive of the approach shared here.
>
> Jordan
>
> On Tue, Oct 22, 2024 at 08:12 Dinesh Joshi  wrote:
>
>> Decomposing Cassandra may be architecturally desirable but that is not
>> the goal of this CEP. This CEP brings value to operators today so it should
>> be considered on that merit. We definitely need to have a separate
>> conversation on Cassandra's architectural direction.
>>
>> On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch 
>> wrote:
>>
>>> Definitely like this in C* itself. We only changed our proposal to
>>> putting repair scheduling in the sidecar before because trunk was frozen
>>> for the foreseeable future at that time. With trunk unfrozen and
>>> development on the main process going at a fast pace I think it makes way
>>> more sense to integrate natively as table properties as this CEP proposes.
>>> Completely agree the scheduling overhead should be minimal.
>>>
>>> Moving the actual repair operation (comparing data and streaming
>>> mismatches) along with compaction operations to a separate process long
>>> term makes a lot of sense but imo only once we both have a release of
>>> sidecar and a contract figured out between them on communication. I'm
>>> watching CEP-38 there as I think CQL and virtual tables are looking much
>>> stronger than when we wrote CEP-1 and chose HTTP but that's for that
>>> discussion and not this one.
>>>
>>> -Joey
>>>
>>> On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
>>> wrote:
>>>
 Like others have said, I was expecting the scheduling portion of repair
 is
 negligible. I was mostly curious if you had something handy that you can
 quickly share.

 On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
 > >Jaydeep, do 

Re: [Discuss] Repair inside C*

2024-10-22 Thread Benedict
I realise it’s out of scope, but to counterbalance all of the
pro-decomposition messages I wanted to chime in with a strong -1. But we
can debate that in a suitable context later.

On 22 Oct 2024, at 16:36, Jordan West  wrote:

Agreed with the sentiment that decomposition is a good target but out of
scope here. I’m personally excited to see an in-tree repair scheduler and
am supportive of the approach shared here.

Jordan

On Tue, Oct 22, 2024 at 08:12 Dinesh Joshi  wrote:

Decomposing Cassandra may be architecturally desirable but that is not the
goal of this CEP. This CEP brings value to operators today so it should be
considered on that merit. We definitely need to have a separate
conversation on Cassandra's architectural direction.

On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch  wrote:

Definitely like this in C* itself. We only changed our proposal to putting
repair scheduling in the sidecar before because trunk was frozen for the
foreseeable future at that time. With trunk unfrozen and development on the
main process going at a fast pace I think it makes way more sense to
integrate natively as table properties as this CEP proposes. Completely
agree the scheduling overhead should be minimal.

Moving the actual repair operation (comparing data and streaming
mismatches) along with compaction operations to a separate process long
term makes a lot of sense but imo only once we both have a release of
sidecar and a contract figured out between them on communication. I'm
watching CEP-38 there as I think CQL and virtual tables are looking much
stronger than when we wrote CEP-1 and chose HTTP but that's for that
discussion and not this one.

-Joey

On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero  wrote:

Like others have said, I was expecting the scheduling portion of repair is
negligible. I was mostly curious if you had something handy that you can
quickly share.

On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> >Jaydeep, do you have any metrics on your clusters comparing them before
> and after introducing repair scheduling into the Cassandra process?
> 
> Yes, I had made some comparisons when I started rolling this feature out to
> our production five years ago :)  Here are the details:
> *The Scheduling*
> The scheduling itself is exceptionally lightweight, as only one additional
> thread monitors the repair activity, updating the status to a system table
> once every few minutes or so. So, it does not appear anywhere in the CPU
> charts, etc. Unfortunately, I do not have those graphs now, but I can do a
> quick comparison if it helps!
> 
> *The Repair Itself*
> As we all know, the Cassandra repair algorithm is a heavy-weight process
> due to Merkle tree/streaming, etc., no matter how we schedule it. But it is
> an orthogonal topic and folks are already discussing creating a new CEP.
> 
> Jaydeep
> 
> 
> On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero 
> wrote:
> 
> > Jaydeep, do you have any metrics on your clusters comparing them before
> > and after introducing repair scheduling into the Cassandra process?
> >
> > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > Sounds good. Just wanted to bring it up. I agree that the scheduling bit
> > is
> > > pretty light weight and the ideal would be to bring the whole of the
> > repair
> > > external, which is a much bigger can of worms to open.
> > >
> > >
> > >
> > > -Jeremiah
> > >
> > >
> > >
> > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink 
> > wrote:
> > > >
> > > >
> > >
> > > > 
> > > >
> > > > > I actually think we should be looking at how we can move things out
> > of the
> > > > database process.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > While worth pursuing, I think we would need a different CEP just to
> > figure
> > > > out how to do that. Not only is there a lot of infrastructure
> > difficulty in
> > > > running multi process, the inter app communication needs to be figured
> > out
> > > > better then JMX. Even the sidecar we dont have a solid story on how to
> > > > ensure both are running or anything yet. It's up to each app owner to
> > figure
> > > > it out. Once we have a good thing in place I think we can start moving
> > > > compactions, repairs, etc out of the database. Even then it's the
> > _repairs_
> > > > that is expensive, not the scheduling.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
> > > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >
> > >
> > > >> I love the idea of a repair service being there by default for an
> > install
> > > of C*.  My main concern here is that it is putting more services into
> > the main
> > > database process.  I actually think we should be looking at how we can
> > move
> > > things out of the database process.  The C* process being a giant
> > monolith has
> > > always been a pain point.  Is there anyway it makes 

Re: [Discuss] Repair inside C*

2024-10-22 Thread Jordan West
Agreed with the sentiment that decomposition is a good target but out of
scope here. I’m personally excited to see an in-tree repair scheduler and
am supportive of the approach shared here.

Jordan

On Tue, Oct 22, 2024 at 08:12 Dinesh Joshi  wrote:

> Decomposing Cassandra may be architecturally desirable but that is not the
> goal of this CEP. This CEP brings value to operators today so it should be
> considered on that merit. We definitely need to have a separate
> conversation on Cassandra's architectural direction.
>
> On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch 
> wrote:
>
>> Definitely like this in C* itself. We only changed our proposal to
>> putting repair scheduling in the sidecar before because trunk was frozen
>> for the foreseeable future at that time. With trunk unfrozen and
>> development on the main process going at a fast pace I think it makes way
>> more sense to integrate natively as table properties as this CEP proposes.
>> Completely agree the scheduling overhead should be minimal.
>>
>> Moving the actual repair operation (comparing data and streaming
>> mismatches) along with compaction operations to a separate process long
>> term makes a lot of sense but imo only once we both have a release of
>> sidecar and a contract figured out between them on communication. I'm
>> watching CEP-38 there as I think CQL and virtual tables are looking much
>> stronger than when we wrote CEP-1 and chose HTTP but that's for that
>> discussion and not this one.
>>
>> -Joey
>>
>> On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
>> wrote:
>>
>>> Like others have said, I was expecting the scheduling portion of repair
>>> is
>>> negligible. I was mostly curious if you had something handy that you can
>>> quickly share.
>>>
>>> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
>>> > >Jaydeep, do you have any metrics on your clusters comparing them
>>> before
>>> > and after introducing repair scheduling into the Cassandra process?
>>> >
>>> > Yes, I had made some comparisons when I started rolling this feature
>>> out to
>>> > our production five years ago :)  Here are the details:
>>> > *The Scheduling*
>>> > The scheduling itself is exceptionally lightweight, as only one
>>> additional
>>> > thread monitors the repair activity, updating the status to a system
>>> table
>>> > once every few minutes or so. So, it does not appear anywhere in the
>>> CPU
>>> > charts, etc. Unfortunately, I do not have those graphs now, but I can
>>> do a
>>> > quick comparison if it helps!
>>> >
>>> > *The Repair Itself*
>>> > As we all know, the Cassandra repair algorithm is a heavy-weight
>>> process
>>> > due to Merkle tree/streaming, etc., no matter how we schedule it. But
>>> it is
>>> > an orthogonal topic and folks are already discussing creating a new
>>> CEP.
>>> >
>>> > Jaydeep
>>> >
>>> >
>>> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero <
>>> [email protected]>
>>> > wrote:
>>> >
>>> > > Jaydeep, do you have any metrics on your clusters comparing them
>>> before
>>> > > and after introducing repair scheduling into the Cassandra process?
>>> > >
>>> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
>>> > > > Sounds good. Just wanted to bring it up. I agree that the
>>> scheduling bit
>>> > > is
>>> > > > pretty light weight and the ideal would be to bring the whole of
>>> the
>>> > > repair
>>> > > > external, which is a much bigger can of worms to open.
>>> > > >
>>> > > >
>>> > > >
>>> > > > -Jeremiah
>>> > > >
>>> > > >
>>> > > >
>>> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink <
>>> [email protected]>
>>> > > wrote:
>>> > > > >
>>> > > > >
>>> > > >
>>> > > > > 
>>> > > > >
>>> > > > > > I actually think we should be looking at how we can move
>>> things out
>>> > > of the
>>> > > > > database process.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > While worth pursuing, I think we would need a different CEP just
>>> to
>>> > > figure
>>> > > > > out how to do that. Not only is there a lot of infrastructure
>>> > > difficulty in
>>> > > > > running multi process, the inter app communication needs to be
>>> figured
>>> > > out
>>> > > > > better then JMX. Even the sidecar we dont have a solid story on
>>> how to
>>> > > > > ensure both are running or anything yet. It's up to each app
>>> owner to
>>> > > figure
>>> > > > > it out. Once we have a good thing in place I think we can start
>>> moving
>>> > > > > compactions, repairs, etc out of the database. Even then it's the
>>> > > _repairs_
>>> > > > > that is expensive, not the scheduling.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
>>> > > > > <[[email protected]](mailto:[email protected])>
>>> > > wrote:
>>> > > > >
>>> > > > >
>>> > > >
>>> > > > >> I love the idea of a repair service being there by default for
>>> an
>>> > > install
>>> > > > of C*.  My main concern here is that it is putting more services
>>> into
>>> > > the main
>>>

Re: [Discuss] Repair inside C*

2024-10-22 Thread Dinesh Joshi
On Mon, Oct 21, 2024 at 9:18 AM David Capwell  wrote:

> One thing to keep in mind is that larger clusters require you “smartly”
> split the ranges else you nuke your cluster… knowing how to split requires
> internal knowledge from the database which we could expose, but then we
> need to expose a new public API (most likely a set of APIs) just to do
> this.  When you do the scheduling internal to the database you can make
> “breaking” changes that improve stability into a patch fix rather than have
> to wait for the next major…
>

As the project and its ecosystem grow, we need to have a conversation about
what a public API is. I do not want to derail this thread, but very briefly:
we should make a distinction between a `project internal` private API that
is exposed to Cassandra's components (which very well could run as a
separate local or remote process) and a public API that the rest of the
world outside of the project uses. The backward compatibility expectations
will be different for the `project internal` private API and the public API.
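David's quoted point above about "smartly" splitting the ranges can be illustrated with a naive equal-width split (a hypothetical helper; real splitting would weight subranges by the size and partition estimates the database holds internally, which is exactly the knowledge an external scheduler would need exposed):

```python
def split_range(start, end, parts):
    """Split the token interval (start, end] into `parts` contiguous pieces.

    Equal-width splitting is the naive baseline; uneven data distribution
    is why production splitting must consult internal size estimates.
    """
    width = (end - start) / parts
    bounds = [start + round(width * i) for i in range(parts + 1)]
    bounds[-1] = end  # pin the last boundary to avoid rounding drift
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


subranges = split_range(0, 1000, 4)
assert subranges == [(0, 250), (250, 500), (500, 750), (750, 1000)]
# The subranges tile the parent range exactly, with no gaps or overlap.
assert subranges[0][0] == 0 and subranges[-1][1] == 1000
```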


Re: [Discuss] Repair inside C*

2024-10-22 Thread Dinesh Joshi
Decomposing Cassandra may be architecturally desirable but that is not the
goal of this CEP. This CEP brings value to operators today so it should be
considered on that merit. We definitely need to have a separate
conversation on Cassandra's architectural direction.

On Tue, Oct 22, 2024 at 7:51 AM Joseph Lynch  wrote:

> Definitely like this in C* itself. We only changed our proposal to putting
> repair scheduling in the sidecar before because trunk was frozen for the
> foreseeable future at that time. With trunk unfrozen and development on the
> main process going at a fast pace I think it makes way more sense to
> integrate natively as table properties as this CEP proposes. Completely
> agree the scheduling overhead should be minimal.
>
> Moving the actual repair operation (comparing data and streaming
> mismatches) along with compaction operations to a separate process long
> term makes a lot of sense but imo only once we both have a release of
> sidecar and a contract figured out between them on communication. I'm
> watching CEP-38 there as I think CQL and virtual tables are looking much
> stronger than when we wrote CEP-1 and chose HTTP but that's for that
> discussion and not this one.
>
> -Joey
>
> On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
> wrote:
>
>> Like others have said, I was expecting the scheduling portion of repair is
>> negligible. I was mostly curious if you had something handy that you can
>> quickly share.
>>
>> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
>> > >Jaydeep, do you have any metrics on your clusters comparing them before
>> > and after introducing repair scheduling into the Cassandra process?
>> >
>> > Yes, I had made some comparisons when I started rolling this feature
>> out to
>> > our production five years ago :)  Here are the details:
>> > *The Scheduling*
>> > The scheduling itself is exceptionally lightweight, as only one
>> additional
>> > thread monitors the repair activity, updating the status to a system
>> table
>> > once every few minutes or so. So, it does not appear anywhere in the CPU
>> > charts, etc. Unfortunately, I do not have those graphs now, but I can
>> do a
>> > quick comparison if it helps!
>> >
>> > *The Repair Itself*
>> > As we all know, the Cassandra repair algorithm is a heavy-weight process
>> > due to Merkle tree/streaming, etc., no matter how we schedule it. But
>> it is
>> > an orthogonal topic and folks are already discussing creating a new CEP.
>> >
>> > Jaydeep
>> >
>> >
>> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero > >
>> > wrote:
>> >
>> > > Jaydeep, do you have any metrics on your clusters comparing them
>> before
>> > > and after introducing repair scheduling into the Cassandra process?
>> > >
>> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
>> > > > Sounds good. Just wanted to bring it up. I agree that the
>> scheduling bit
>> > > is
>> > > > pretty light weight and the ideal would be to bring the whole of the
>> > > repair
>> > > > external, which is a much bigger can of worms to open.
>> > > >
>> > > >
>> > > >
>> > > > -Jeremiah
>> > > >
>> > > >
>> > > >
>> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink > >
>> > > wrote:
>> > > > >
>> > > > >
>> > > >
>> > > > > 
>> > > > >
>> > > > > > I actually think we should be looking at how we can move things
>> out
>> > > of the
>> > > > > database process.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > While worth pursuing, I think we would need a different CEP just
>> to
>> > > figure
>> > > > > out how to do that. Not only is there a lot of infrastructure
>> > > difficulty in
>> > > > > running multi process, the inter app communication needs to be
>> figured
>> > > out
>> > > > > better then JMX. Even the sidecar we dont have a solid story on
>> how to
>> > > > > ensure both are running or anything yet. It's up to each app
>> owner to
>> > > figure
>> > > > > it out. Once we have a good thing in place I think we can start
>> moving
>> > > > > compactions, repairs, etc out of the database. Even then it's the
>> > > _repairs_
>> > > > > that is expensive, not the scheduling.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
>> > > > > <[[email protected]](mailto:[email protected])>
>> > > wrote:
>> > > > >
>> > > > >
>> > > >
>> > > > >> I love the idea of a repair service being there by default for an
>> > > install
>> > > > of C*.  My main concern here is that it is putting more services
>> into
>> > > the main
>> > > > database process.  I actually think we should be looking at how we
>> can
>> > > move
>> > > > things out of the database process.  The C* process being a giant
>> > > monolith has
>> > > > always been a pain point.  Is there anyway it makes sense for this
>> to be
>> > > an
>> > > > external process rather than a new thread pool inside the C*
>> process?
>> > > >
>> > > > >>
>> > > >
>> > > > >>
>> > > > >
>> > > > >>
>> > > >
>> > > > >> -Jeremiah

Re: [Discuss] Repair inside C*

2024-10-22 Thread Joseph Lynch
Definitely like this in C* itself. We only changed our proposal to putting
repair scheduling in the sidecar before because trunk was frozen for the
foreseeable future at that time. With trunk unfrozen and development on the
main process going at a fast pace I think it makes way more sense to
integrate natively as table properties as this CEP proposes. Completely
agree the scheduling overhead should be minimal.

Moving the actual repair operation (comparing data and streaming
mismatches) along with compaction operations to a separate process long
term makes a lot of sense but imo only once we both have a release of
sidecar and a contract figured out between them on communication. I'm
watching CEP-38 there as I think CQL and virtual tables are looking much
stronger than when we wrote CEP-1 and chose HTTP but that's for that
discussion and not this one.

-Joey

On Mon, Oct 21, 2024 at 3:25 PM Francisco Guerrero 
wrote:

> Like others have said, I was expecting the scheduling portion of repair is
> negligible. I was mostly curious if you had something handy that you can
> quickly share.
>
> On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> > >Jaydeep, do you have any metrics on your clusters comparing them before
> > and after introducing repair scheduling into the Cassandra process?
> >
> > Yes, I had made some comparisons when I started rolling this feature out
> to
> > our production five years ago :)  Here are the details:
> > *The Scheduling*
> > The scheduling itself is exceptionally lightweight, as only one
> additional
> > thread monitors the repair activity, updating the status to a system
> table
> > once every few minutes or so. So, it does not appear anywhere in the CPU
> > charts, etc. Unfortunately, I do not have those graphs now, but I can do
> a
> > quick comparison if it helps!
> >
> > *The Repair Itself*
> > As we all know, the Cassandra repair algorithm is a heavy-weight process
> > due to Merkle tree/streaming, etc., no matter how we schedule it. But it
> is
> > an orthogonal topic and folks are already discussing creating a new CEP.
> >
> > Jaydeep
> >
> >
> > On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero 
> > wrote:
> >
> > > Jaydeep, do you have any metrics on your clusters comparing them before
> > > and after introducing repair scheduling into the Cassandra process?
> > >
> > > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > > Sounds good. Just wanted to bring it up. I agree that the scheduling
> bit
> > > is
> > > > pretty light weight and the ideal would be to bring the whole of the
> > > repair
> > > > external, which is a much bigger can of worms to open.
> > > >
> > > >
> > > >
> > > > -Jeremiah
> > > >
> > > >
> > > >
> > > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink 
> > > wrote:
> > > > >
> > > > >
> > > >
> > > > > 
> > > > >
> > > > > > I actually think we should be looking at how we can move things
> out
> > > of the
> > > > > database process.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > While worth pursuing, I think we would need a different CEP just to
> > > figure
> > > > > out how to do that. Not only is there a lot of infrastructure
> > > difficulty in
> > > > > running multi process, the inter app communication needs to be
> figured
> > > out
> > > > > better then JMX. Even the sidecar we dont have a solid story on
> how to
> > > > > ensure both are running or anything yet. It's up to each app owner
> to
> > > figure
> > > > > it out. Once we have a good thing in place I think we can start
> moving
> > > > > compactions, repairs, etc out of the database. Even then it's the
> > > _repairs_
> > > > > that is expensive, not the scheduling.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
> > > > > <[[email protected]](mailto:[email protected])>
> > > wrote:
> > > > >
> > > > >
> > > >
> > > > >> I love the idea of a repair service being there by default for an
> > > install
> > > > of C*.  My main concern here is that it is putting more services into
> > > the main
> > > > database process.  I actually think we should be looking at how we
> can
> > > move
> > > > things out of the database process.  The C* process being a giant
> > > monolith has
> > > > always been a pain point.  Is there anyway it makes sense for this
> to be
> > > an
> > > > external process rather than a new thread pool inside the C* process?
> > > >
> > > > >>
> > > >
> > > > >>
> > > > >
> > > > >>
> > > >
> > > > >> -Jeremiah Jordan
> > > >
> > > > >>
> > > >
> > > > >>
> > > > >
> > > > >>
> > > >
> > > > >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever
> > > > <[[email protected]](mailto:[email protected])> wrote:
> > > > >
> > > > >>
> > > >
> > > > >>>
> > > > >
> > > > >>>
> > > >
> > > > >>> This is looking strong, thanks Jaydeep.
> > > >
> > > > >>>
> > > >
> > > > >>>
> > > > >
> > > > >>>
> > > >
> > > > >>> I would suggest folk take a look at the design doc and the PR in
> the
> > > CEP.
> > > > A l

Re: [Discuss] Repair inside C*

2024-10-21 Thread Francisco Guerrero
Like others have said, I was expecting the scheduling portion of repair to
be negligible. I was mostly curious if you had something handy that you
could quickly share.

On 2024/10/21 18:59:41 Jaydeep Chovatia wrote:
> >Jaydeep, do you have any metrics on your clusters comparing them before
> and after introducing repair scheduling into the Cassandra process?
> 
> Yes, I had made some comparisons when I started rolling this feature out to
> our production five years ago :)  Here are the details:
> *The Scheduling*
> The scheduling itself is exceptionally lightweight, as only one additional
> thread monitors the repair activity, updating the status to a system table
> once every few minutes or so. So, it does not appear anywhere in the CPU
> charts, etc. Unfortunately, I do not have those graphs now, but I can do a
> quick comparison if it helps!
> 
> *The Repair Itself*
> As we all know, the Cassandra repair algorithm is a heavy-weight process
> due to Merkle tree/streaming, etc., no matter how we schedule it. But it is
> an orthogonal topic and folks are already discussing creating a new CEP.
> 
> Jaydeep
> 
> 
> On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero 
> wrote:
> 
> > Jaydeep, do you have any metrics on your clusters comparing them before
> > and after introducing repair scheduling into the Cassandra process?
> >
> > On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > > Sounds good. Just wanted to bring it up. I agree that the scheduling bit
> > is
> > > pretty light weight and the ideal would be to bring the whole of the
> > repair
> > > external, which is a much bigger can of worms to open.
> > >
> > >
> > >
> > > -Jeremiah
> > >
> > >
> > >
> > > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink 
> > wrote:
> > > >
> > > >
> > >
> > > > 
> > > >
> > > > > I actually think we should be looking at how we can move things out
> > of the
> > > > database process.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > While worth pursuing, I think we would need a different CEP just to
> > figure
> > > > out how to do that. Not only is there a lot of infrastructure
> > difficulty in
> > > > running multi process, the inter app communication needs to be figured
> > out
> > > > better then JMX. Even the sidecar we dont have a solid story on how to
> > > > ensure both are running or anything yet. It's up to each app owner to
> > figure
> > > > it out. Once we have a good thing in place I think we can start moving
> > > > compactions, repairs, etc out of the database. Even then it's the
> > _repairs_
> > > > that is expensive, not the scheduling.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan
> > > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >
> > >
> > > >> I love the idea of a repair service being there by default for an
> > install
> > > of C*.  My main concern here is that it is putting more services into
> > the main
> > > database process.  I actually think we should be looking at how we can
> > move
> > > things out of the database process.  The C* process being a giant
> > monolith has
> > > always been a pain point.  Is there anyway it makes sense for this to be
> > an
> > > external process rather than a new thread pool inside the C* process?
> > >
> > > >>
> > >
> > > >>
> > > >
> > > >>
> > >
> > > >> -Jeremiah Jordan
> > >
> > > >>
> > >
> > > >>
> > > >
> > > >>
> > >
> > > >> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever
> > > <[[email protected]](mailto:[email protected])> wrote:
> > > >
> > > >>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> This is looking strong, thanks Jaydeep.
> > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> I would suggest folk take a look at the design doc and the PR in the
> > CEP.
> > > A lot is there (that I have completely missed).
> > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> I would especially ask all authors of prior art (Reaper, DSE
> > nodesync,
> > > ecchronos)  to take a final review of the proposal
> > > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> Jaydeep, can we ask for a two week window while we reach out to these
> > > people ?  There's a lot of prior art in this space, and it feels like
> > we're in
> > > a good place now where it's clear this has legs and we can use that to
> > bring
> > > folk in and make sure there's no remaining blindspots.
> > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>>
> > > >
> > > >>>
> > >
> > > >>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia
> > > <[[email protected]](mailto:[email protected])>
> > wrote:
> > > >
> > > >>>
> > >
> > >  Sorry, there is a typo in the CEP-37 link; here is the correct
> > > [link](
> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution
> > )
> > >
> > > 
> > >
> > > 
> > > >
> > > 
> > >
> > > 
> > > >
> > > 
> > >
> > >  On Thu, Oct 17, 2024 at 4:36 P

Re: [Discuss] Repair inside C*

2024-10-21 Thread Jaydeep Chovatia
>Jaydeep, do you have any metrics on your clusters comparing them before
and after introducing repair scheduling into the Cassandra process?

Yes, I had made some comparisons when I started rolling this feature out to
our production five years ago :)  Here are the details:
*The Scheduling*
The scheduling itself is exceptionally lightweight: only one additional
thread monitors the repair activity, updating the status in a system table
once every few minutes or so. As a result, it does not show up anywhere in
the CPU charts, etc. Unfortunately, I do not have those graphs anymore, but
I can do a quick comparison if it helps!

*The Repair Itself*
As we all know, the Cassandra repair algorithm is a heavyweight process
due to Merkle trees/streaming, etc., no matter how we schedule it. But that
is an orthogonal topic, and folks are already discussing creating a new CEP
for it.
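To make the "heavyweight" point concrete, here is a toy sketch of the Merkle-tree idea behind repair (not Cassandra's actual implementation): every partition in the token range must be read and hashed just to discover whether the replicas differ at all, and that cost is paid no matter how the repair was scheduled:

```python
import hashlib

def leaf_hashes(partitions):
    # The expensive part: every partition in the token range is read
    # and hashed, whether or not any data actually differs.
    return [hashlib.sha256(p.encode()).digest() for p in partitions]

def merkle_root(leaves):
    """Fold leaf hashes pairwise up to a single root (toy version)."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last hash if odd
            level.append(level[-1])
        level = [hashlib.sha256(a + b).digest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]

# Two replicas of the same token range, with one divergent partition.
replica_a = ["k1:v1", "k2:v2", "k3:v3", "k4:v4"]
replica_b = ["k1:v1", "k2:vX", "k3:v3", "k4:v4"]

in_sync = merkle_root(leaf_hashes(replica_a)) == merkle_root(leaf_hashes(replica_b))
```

Comparing subtree hashes then narrows the mismatch to a subrange that gets streamed between replicas; the hashing and streaming dominate the cost, not the decision of when to run it.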

Jaydeep


On Mon, Oct 21, 2024 at 10:02 AM Francisco Guerrero 
wrote:

> Jaydeep, do you have any metrics on your clusters comparing them before
> and after introducing repair scheduling into the Cassandra process?
>
> On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> > Sounds good. Just wanted to bring it up. I agree that the scheduling bit
> is
> > pretty light weight and the ideal would be to bring the whole of the
> repair
> > external, which is a much bigger can of worms to open.
> >
> >
> >
> > -Jeremiah
> >
> >
> >
> > > On Oct 21, 2024, at 11:21 AM, Chris Lohfink 
> wrote:
> > >
> > >
> >
> > > 
> > >
> > > > I actually think we should be looking at how we can move things out
> of the
> > > database process.
> > >
> > >
> > >
> > >
> > >
> > > While worth pursuing, I think we would need a different CEP just to
> figure
> > > out how to do that. Not only is there a lot of infrastructure
> difficulty in
> > > running multi process, the inter app communication needs to be figured
> out
> > > better then JMX. Even the sidecar we dont have a solid story on how to
> > > ensure both are running or anything yet. It's up to each app owner to
> figure
> > > it out. Once we have a good thing in place I think we can start moving
> > > compactions, repairs, etc out of the database. Even then it's the
> _repairs_
> > > that is expensive, not the scheduling.

Re: [Discuss] Repair inside C*

2024-10-21 Thread Patrick McFadin
> While worth pursuing, I think we would need a different CEP just to figure 
> out how to do that. Not only is there a lot of infrastructure difficulty in 
> running multi process, the inter app communication needs to be figured out 
> better then JMX.

I strongly agree, and with TCM pending this is a good time to start
considering it. The subtle difference is how much runs in process versus
being coordinated by the Cassandra core. Alex and I were talking about the
role etcd plays in Kubernetes: it stands as the source of truth for cluster
operations. How many pods? Where are they? Which services map to which
pod? Etc. Separate processes could operate independently by letting the
database be the database: Cassandra could be the coordinator and source of
truth, with a transactional guarantee of correctness.
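As a rough illustration of that shape (names here are purely hypothetical, not a proposed API): the database holds the authoritative work assignments, and external processes only read their assignment and report back:

```python
class ClusterState:
    """Toy 'source of truth': the database decides who repairs what."""

    def __init__(self):
        self.assignments = {}   # worker_id -> token range to repair
        self.status = {}        # worker_id -> last reported status

    def assign(self, worker_id, token_range):
        # In the real system this would be a transactional (TCM-backed)
        # update, so every observer sees one consistent view.
        self.assignments[worker_id] = token_range

    def report(self, worker_id, state):
        self.status[worker_id] = state

class Worker:
    """External process: reads its assignment, acts, reports back."""

    def __init__(self, worker_id, cluster):
        self.worker_id, self.cluster = worker_id, cluster

    def run_once(self):
        work = self.cluster.assignments.get(self.worker_id)
        if work is not None:
            # ... perform the repair of `work` out of process ...
            self.cluster.report(self.worker_id, ("done", work))

state = ClusterState()
state.assign("worker-1", (0, 1000))
Worker("worker-1", state).run_once()
```

The design point is that workers carry no cluster-wide state of their own; correctness comes from the single transactional record of assignments, exactly as etcd anchors Kubernetes controllers.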


On Mon, Oct 21, 2024 at 9:22 AM Chris Lohfink  wrote:
>
> > I actually think we should be looking at how we can move things out of the 
> > database process.
>
> While worth pursuing, I think we would need a different CEP just to figure 
> out how to do that. Not only is there a lot of infrastructure difficulty in 
> running multi process, the inter app communication needs to be figured out 
> better then JMX. Even the sidecar we dont have a solid story on how to ensure 
> both are running or anything yet. It's up to each app owner to figure it out. 
> Once we have a good thing in place I think we can start moving compactions, 
> repairs, etc out of the database. Even then it's the _repairs_ that is 
> expensive, not the scheduling.
>
> On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan  
> wrote:
>>
>> I love the idea of a repair service being there by default for an install of 
>> C*.  My main concern here is that it is putting more services into the main 
>> database process.  I actually think we should be looking at how we can move 
>> things out of the database process.  The C* process being a giant monolith 
>> has always been a pain point.  Is there anyway it makes sense for this to be 
>> an external process rather than a new thread pool inside the C* process?
>>
>> -Jeremiah Jordan
>>
>> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever  wrote:
>>>
>>>
>>> This is looking strong, thanks Jaydeep.
>>>
>>> I would suggest folk take a look at the design doc and the PR in the CEP.  
>>> A lot is there (that I have completely missed).
>>>
>>> I would especially ask all authors of prior art (Reaper, DSE nodesync, 
>>> ecchronos)  to take a final review of the proposal
>>>
>>> Jaydeep, can we ask for a two week window while we reach out to these 
>>> people ?  There's a lot of prior art in this space, and it feels like we're 
>>> in a good place now where it's clear this has legs and we can use that to 
>>> bring folk in and make sure there's no remaining blindspots.
>>>
>>>
>>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia  
>>> wrote:

 Sorry, there is a typo in the CEP-37 link; here is the correct link


 On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia 
  wrote:
>
> First, thank you for your patience while we strengthened the CEP-37.
>
>
> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, 
> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online 
> discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) 
> to come up with the best possible design that not only significantly 
> simplifies repair operations but also includes the most common features 
> that everyone will benefit from running at Scale.
>
> For example,
>
> Apache Cassandra must be capable of running multiple repair types, such 
> as Full, Incremental, Paxos, and Preview - so the framework should be 
> easily extendable with no additional overhead from the operator’s point 
> of view.
>
> An easy way to extend the token-split calculation algorithm with a 
> default implementation should exist.
>
> Running incremental repair reliably at Scale is pretty challenging, so we 
> need to place safeguards, such as migration/rollback w/o restart and 
> stopping incremental repair automatically if the disk is about to get 
> full.
>
> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is 
> now officially ready for review after multiple rounds of design, testing, 
> code reviews, documentation reviews, and, more importantly, validation 
> that it runs at Scale!
>
>
> Some facts about CEP-37.
>
> Multiple members have verified all aspects of CEP-37 numerous times.
>
> The design proposed in CEP-37 has been thoroughly tried and tested on an 
> immense scale (hundreds of unique Cassandra clusters, tens of thousands 
> of Cassandra nodes, with tens of millions of QPS) on top of 4.1 
> open-source for more than five years; please see more details here.
>
> The following presentat

Re: [Discuss] Repair inside C*

2024-10-21 Thread Francisco Guerrero
Jaydeep, do you have any metrics on your clusters comparing them before
and after introducing repair scheduling into the Cassandra process?

On 2024/10/21 16:57:57 "J. D. Jordan" wrote:
> Sounds good. Just wanted to bring it up. I agree that the scheduling bit is
> pretty light weight and the ideal would be to bring the whole of the repair
> external, which is a much bigger can of worms to open.
> 
>   
> 
> -Jeremiah
> 
>   
> 
> > On Oct 21, 2024, at 11:21 AM, Chris Lohfink  wrote:  
> >  
> >
> 
> > 
> >
> > > I actually think we should be looking at how we can move things out of the
> > database process.  
> >
> >
> >  
> >
> >
> > While worth pursuing, I think we would need a different CEP just to figure
> > out how to do that. Not only is there a lot of infrastructure difficulty in
> > running multi process, the inter app communication needs to be figured out
> > better then JMX. Even the sidecar we dont have a solid story on how to
> > ensure both are running or anything yet. It's up to each app owner to figure
> > it out. Once we have a good thing in place I think we can start moving
> > compactions, repairs, etc out of the database. Even then it's the _repairs_
> > that is expensive, not the scheduling.

Re: [Discuss] Repair inside C*

2024-10-21 Thread J. D. Jordan
Sounds good. Just wanted to bring it up. I agree that the scheduling bit is pretty light weight and the ideal would be to bring the whole of the repair external, which is a much bigger can of worms to open.

-Jeremiah

On Oct 21, 2024, at 11:21 AM, Chris Lohfink wrote:
> > I actually think we should be looking at how we can move things out of the database process.
>
> While worth pursuing, I think we would need a different CEP just to figure out how to do that. Not only is there a lot of infrastructure difficulty in running multi process, the inter app communication needs to be figured out better then JMX. Even the sidecar we dont have a solid story on how to ensure both are running or anything yet. It's up to each app owner to figure it out. Once we have a good thing in place I think we can start moving compactions, repairs, etc out of the database. Even then it's the _repairs_ that is expensive, not the scheduling.
>
> On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan wrote:
>> I love the idea of a repair service being there by default for an install of C*.  My main concern here is that it is putting more services into the main database process.  I actually think we should be looking at how we can move things out of the database process.  The C* process being a giant monolith has always been a pain point.  Is there anyway it makes sense for this to be an external process rather than a new thread pool inside the C* process?
>>
>> -Jeremiah Jordan
>>
>> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever wrote:
>>> This is looking strong, thanks Jaydeep.
>>>
>>> I would suggest folk take a look at the design doc and the PR in the CEP.  A lot is there (that I have completely missed).
>>>
>>> I would especially ask all authors of prior art (Reaper, DSE nodesync, ecchronos) to take a final review of the proposal
>>>
>>> Jaydeep, can we ask for a two week window while we reach out to these people?  There's a lot of prior art in this space, and it feels like we're in a good place now where it's clear this has legs and we can use that to bring folk in and make sure there's no remaining blindspots.
>>>
>>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia wrote:
>>>> Sorry, there is a typo in the CEP-37 link; here is the correct link: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution
>>>>
>>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia wrote:
>>>>> First, thank you for your patience while we strengthened the CEP-37.
>>>>>
>>>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) to come up with the best possible design that not only significantly simplifies repair operations but also includes the most common features that everyone will benefit from running at Scale.
>>>>>
>>>>> For example,
>>>>> - Apache Cassandra must be capable of running multiple repair types, such as Full, Incremental, Paxos, and Preview - so the framework should be easily extendable with no additional overhead from the operator’s point of view.
>>>>> - An easy way to extend the token-split calculation algorithm with a default implementation should exist.
>>>>> - Running incremental repair reliably at Scale is pretty challenging, so we need to place safeguards, such as migration/rollback w/o restart and stopping incremental repair automatically if the disk is about to get full.
>>>>>
>>>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is now officially ready for review after multiple rounds of design, testing, code reviews, documentation reviews, and, more importantly, validation that it runs at Scale!
>>>>>
>>>>> Some facts about CEP-37.
>>>>> - Multiple members have verified all aspects of CEP-37 numerous times.
>>>>> - The design proposed in CEP-37 has been thoroughly tried and tested on an immense scale (hundreds of unique Cassandra clusters, tens of thousands of Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source for more than five years; please see more details here.
>>>>> - The following presentation highlights the rigor applied to CEP-37, which was given during last week’s Apache Cassandra Bay Area Meetup.
>>>>>
>>>>> Since things are massively overhauled, we believe it is almost ready for a final pass pre-VOTE. We would like you to please review the CEP-37 and the associated detailed design doc.
>>>>>
>>>>> Thank you everyone!
>>>>> Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep
>>>>>
>>>>> On Thu, Sep 19, 2024 at 11:26 AM Josh McKenzie wrote:
>>>>>> Not quite; finishing touches on the CEP and design doc are in flight (as of last / this week).
>>>>>> Soon(tm).
>>>>>>
>>>>>> On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
>>>>>>> Is this CEP ready for a VOTE thread? https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution
>>>>>>>
>>>>>>> On Sun, Feb 25, 2024 at 12:25 PM Jaydeep Chovatia wrote:
>>>>>>>> Thanks, Josh.
>>>>>>>> I've just updated the CEP and included all the solutions you mentioned below.
>>>>>>>>
>>>>>>>> Jaydeep
>>>>>>>>
>>>>>>>> On Thu, Feb 22, 2024 at 9:33 AM Josh Mc

Re: [Discuss] Repair inside C*

2024-10-21 Thread Chris Lohfink
> I actually think we should be looking at how we can move things out of
the database process.

While worth pursuing, I think we would need a different CEP just to figure
out how to do that. Not only is there a lot of infrastructure difficulty in
running multi-process, but the inter-app communication also needs to be
figured out better than JMX. Even for the sidecar we don't have a solid
story on how to ensure both are running or anything yet. It's up to each app
owner to figure it out. Once we have a good thing in place, I think we can
start moving compactions, repairs, etc. out of the database. Even then, it's
the _repairs_ that are expensive, not the scheduling.

On Mon, Oct 21, 2024 at 9:45 AM Jeremiah Jordan 
wrote:

> I love the idea of a repair service being there by default for an install
> of C*.  My main concern here is that it is putting more services into the
> main database process.  I actually think we should be looking at how we can
> move things out of the database process.  The C* process being a giant
> monolith has always been a pain point.  Is there anyway it makes sense for
> this to be an external process rather than a new thread pool inside the C*
> process?
>
> -Jeremiah Jordan
>
> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever  wrote:
>
>>
>> This is looking strong, thanks Jaydeep.
>>
>> I would suggest folk take a look at the design doc and the PR in the
>> CEP.  A lot is there (that I have completely missed).
>>
>> I would especially ask all authors of prior art (Reaper, DSE nodesync,
>> ecchronos)  to take a final review of the proposal
>>
>> Jaydeep, can we ask for a two week window while we reach out to these
>> people ?  There's a lot of prior art in this space, and it feels like we're
>> in a good place now where it's clear this has legs and we can use that to
>> bring folk in and make sure there's no remaining blindspots.
>>
>>
>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia <
>> [email protected]> wrote:
>>
>>> Sorry, there is a typo in the CEP-37 link; here is the correct link
>>> 
>>>
>>>
>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia <
>>> [email protected]> wrote:
>>>
 First, thank you for your patience while we strengthened the CEP-37.


 Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie,
 Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online
 discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37)
 to come up with the best possible design that not only significantly
 simplifies repair operations but also includes the most common features
 that everyone will benefit from running at Scale.

 For example,

-

Apache Cassandra must be capable of running multiple repair types,
such as Full, Incremental, Paxos, and Preview - so the framework should 
 be
easily extendable with no additional overhead from the operator’s point 
 of
view.
-

An easy way to extend the token-split calculation algorithm with a
default implementation should exist.
-

Running incremental repair reliably at Scale is pretty challenging,
so we need to place safeguards, such as migration/rollback w/o restart 
 and
stopping incremental repair automatically if the disk is about to get 
 full.

 We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra)
 is now officially ready for review after multiple rounds of design,
 testing, code reviews, documentation reviews, and, more importantly,
 validation that it runs at Scale!


 Some facts about CEP-37.

-

Multiple members have verified all aspects of CEP-37 numerous times.
-

The design proposed in CEP-37 has been thoroughly tried and tested
on an immense scale (hundreds of unique Cassandra clusters, tens of
thousands of Cassandra nodes, with tens of millions of QPS) on top of 
 4.1
open-source for more than five years; please see more details here

 
.
-

The following presentation

 
highlights the rigor applied to CEP-37, which was given during last
week’s Apache Cassandra Bay Area Meetup

,


 Since things are massively overhauled, we believe it is almost ready
 for a final pass pre-VOTE. We would like you to please review the
 CEP-37
 

Re: [Discuss] Repair inside C*

2024-10-21 Thread David Capwell
> Is there anyway it makes sense for this to be an external process rather than 
> a new thread pool inside the C* process?

One thing to keep in mind is that larger clusters require you to “smartly”
split the ranges, or else you nuke your cluster… Knowing how to split
requires internal knowledge from the database, which we could expose, but
then we would need a new public API (most likely a set of APIs) just to do
this.  When you do the scheduling internal to the database, you can ship
“breaking” changes that improve stability in a patch fix rather than having
to wait for the next major…

To me this problem is the main reason I am in favor of repair scheduling
being inside the database…
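For illustration, a naive version of such a split, assuming a per-range partition estimate is available — in Cassandra that internal knowledge lives in places like `system.size_estimates`, but this sketch and its names are hypothetical:

```python
def split_range(start, end, estimated_partitions, target_per_split):
    """Split the token range (start, end] into subranges of roughly
    target_per_split partitions each, assuming partitions are uniformly
    distributed across the range.

    `estimated_partitions` stands in for the internal knowledge the
    database has (e.g. what system.size_estimates tracks); getting this
    wrong is how an overly large repair can nuke a cluster.
    """
    splits = max(1, round(estimated_partitions / target_per_split))
    width = (end - start) / splits
    bounds = [start + round(i * width) for i in range(splits)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))
```

A real splitter also has to account for skew, vnodes, and replica placement, which is exactly why exposing it externally would mean a whole set of new public APIs rather than one call.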


> On Oct 21, 2024, at 8:55 AM, Josh McKenzie  wrote:
> 
>> Is there anyway it makes sense for this to be an external process rather 
>> than a new thread pool inside the C* process?
> I'm personally more irked by the merkle tree building / streaming / merging / 
> etc resource utilization being in the primary C* process. My intuition is 
> that the scheduling of things is so lightweight as to be a non-issue when it 
> comes to impact on reads and writes.
> 
> That said, if you're more alluding to a meta conversation about the 
> architecture of the DB and whether having a monolithic :allthethings: process 
> is preferable to breaking things apart, well, that's an entirely different 
> conversation on which I have... different thoughts. :D
> 
> On Mon, Oct 21, 2024, at 10:44 AM, Jeremiah Jordan wrote:
>> I love the idea of a repair service being there by default for an install of 
>> C*.  My main concern here is that it is putting more services into the main 
>> database process.  I actually think we should be looking at how we can move 
>> things out of the database process.  The C* process being a giant monolith 
>> has always been a pain point.  Is there anyway it makes sense for this to be 
>> an external process rather than a new thread pool inside the C* process?
>> 
>> -Jeremiah Jordan
>> 
>> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever wrote:
>>> 
>>> This is looking strong, thanks Jaydeep.
>>> 
>>> I would suggest folk take a look at the design doc and the PR in the CEP.  
>>> A lot is there (that I have completely missed).
>>> 
>>> I would especially ask all authors of prior art (Reaper, DSE nodesync, 
>>> ecchronos)  to take a final review of the proposal
>>> 
>>> Jaydeep, can we ask for a two week window while we reach out to these 
>>> people ?  There's a lot of prior art in this space, and it feels like we're 
>>> in a good place now where it's clear this has legs and we can use that to 
>>> bring folk in and make sure there's no remaining blindspots.
>>> 
>>> 
>>> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia wrote:
>>> Sorry, there is a typo in the CEP-37 link; here is the correct link 
>>> 
>>> 
>>> 
>>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia
>>> <[email protected]> wrote:
>>> First, thank you for your patience while we strengthened the CEP-37.
>>> 
>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie, 
>>> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online 
>>> discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37) 
>>> to come up with the best possible design that not only significantly 
>>> simplifies repair operations but also includes the most common features 
>>> that everyone will benefit from running at Scale. 
>>> For example,
>>> Apache Cassandra must be capable of running multiple repair types, such as 
>>> Full, Incremental, Paxos, and Preview - so the framework should be easily 
>>> extendable with no additional overhead from the operator’s point of view.
>>> An easy way to extend the token-split calculation algorithm with a default 
>>> implementation should exist.
>>> Running incremental repair reliably at Scale is pretty challenging, so we 
>>> need to place safeguards, such as migration/rollback w/o restart and 
>>> stopping incremental repair automatically if the disk is about to get full.
>>> 
>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is 
>>> now officially ready for review after multiple rounds of design, testing, 
>>> code reviews, documentation reviews, and, more importantly, validation that 
>>> it runs at Scale!
>>> 
>>> Some facts about CEP-37.
>>> Multiple members have verified all aspects of CEP-37 numerous times.
>>> The design proposed in CEP-37 has been thoroughly tried and tested on an 
>>> immense scale (hundreds of unique Cassandra clusters, tens of thousands of 
>>> Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source 
>>> for more than five years; please see more details here
>>> 

Re: [Discuss] Repair inside C*

2024-10-21 Thread Josh McKenzie
> Is there anyway it makes sense for this to be an external process rather than 
> a new thread pool inside the C* process?
I'm personally more irked by the merkle tree building / streaming / merging / 
etc resource utilization being in the primary C* process. My intuition is that 
the *scheduling* of things is so lightweight as to be a non-issue when it comes 
to impact on reads and writes.

That said, if you're more alluding to a meta conversation about the 
*architecture* of the DB and whether having a monolithic :allthethings: process 
is preferable to breaking things apart, well, that's an entirely different 
conversation on which I have... different thoughts. :D

On Mon, Oct 21, 2024, at 10:44 AM, Jeremiah Jordan wrote:
> I love the idea of a repair service being there by default for an install of 
> C*.  My main concern here is that it is putting more services into the main 
> database process.  I actually think we should be looking at how we can move 
> things out of the database process.  The C* process being a giant monolith 
> has always been a pain point.  Is there anyway it makes sense for this to be 
> an external process rather than a new thread pool inside the C* process?
> 
> -Jeremiah Jordan
> 
> On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever  wrote:
>> 
>> This is looking strong, thanks Jaydeep.
>> 
>> I would suggest folk take a look at the design doc and the PR in the CEP.  A 
>> lot is there (that I have completely missed).
>> 
>> I would especially ask all authors of prior art (Reaper, DSE nodesync, 
>> ecchronos)  to take a final review of the proposal
>> 
>> Jaydeep, can we ask for a two week window while we reach out to these people 
>> ?  There's a lot of prior art in this space, and it feels like we're in a 
>> good place now where it's clear this has legs and we can use that to bring 
>> folk in and make sure there's no remaining blindspots.
>> 
>> 

Re: [Discuss] Repair inside C*

2024-10-21 Thread Jeremiah Jordan
 I love the idea of a repair service being there by default for an install
of C*.  My main concern here is that it is putting more services into the
main database process.  I actually think we should be looking at how we can
move things out of the database process.  The C* process being a giant
monolith has always been a pain point.  Is there any way it makes sense for
this to be an external process rather than a new thread pool inside the C*
process?

-Jeremiah Jordan

On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever  wrote:

>
> This is looking strong, thanks Jaydeep.
>
> I would suggest folk take a look at the design doc and the PR in the CEP.
> A lot is there (that I have completely missed).
>
> I would especially ask all authors of prior art (Reaper, DSE nodesync,
> ecchronos)  to take a final review of the proposal
>
> Jaydeep, can we ask for a two week window while we reach out to these
> people ?  There's a lot of prior art in this space, and it feels like we're
> in a good place now where it's clear this has legs and we can use that to
> bring folk in and make sure there's no remaining blindspots.
>
>

Re: [Discuss] Repair inside C*

2024-10-18 Thread Jaydeep Chovatia
Mick, I am very sorry for misspelling your name.

Indeed, Mick  - two additional weeks is not an issue at all.

Jaydeep

On Fri, Oct 18, 2024 at 1:41 PM Jaydeep Chovatia 
wrote:

> Indeed, Mike - two additional weeks is not an issue at all.
> Thanks!
>
> Jaydeep
>


Re: [Discuss] Repair inside C*

2024-10-18 Thread Jaydeep Chovatia
Indeed, Mike - two additional weeks is not an issue at all.
Thanks!

Jaydeep


Re: [Discuss] Repair inside C*

2024-10-18 Thread Mick Semb Wever
This is looking strong, thanks Jaydeep.

I would suggest folk take a look at the design doc and the PR in the CEP.
A lot is there (that I have completely missed).

I would especially ask all authors of prior art (Reaper, DSE nodesync,
ecchronos)  to take a final review of the proposal

Jaydeep, can we ask for a two-week window while we reach out to these
people?  There's a lot of prior art in this space, and it feels like we're
in a good place now where it's clear this has legs and we can use that to
bring folk in and make sure there are no remaining blind spots.


On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia 
wrote:

> Sorry, there is a typo in the CEP-37 link; here is the correct link
> 
>
>

Re: [Discuss] Repair inside C*

2024-10-17 Thread Jaydeep Chovatia
Sorry, there is a typo in the CEP-37 link; here is the correct link




Re: [Discuss] Repair inside C*

2024-10-17 Thread Jaydeep Chovatia
First, thank you for your patience while we strengthened the CEP-37.


Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie,
Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online
discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37)
to come up with the best possible design that not only significantly
simplifies repair operations but also includes the most common features
that everyone will benefit from when running at scale.

For example,

   - Apache Cassandra must be capable of running multiple repair types, such
     as Full, Incremental, Paxos, and Preview - so the framework should be
     easily extendable with no additional overhead from the operator’s point
     of view.
   - An easy way to extend the token-split calculation algorithm with a
     default implementation should exist.
   - Running incremental repair reliably at scale is pretty challenging, so
     we need to place safeguards, such as migration/rollback w/o restart and
     stopping incremental repair automatically if the disk is about to get
     full.
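The token-split extension point mentioned above can be pictured with a short sketch. This is purely illustrative and not from the CEP: the `TokenSplitStrategy` interface and `EvenTokenSplit` names are hypothetical, showing a default strategy that divides a token range into roughly equal contiguous subranges, which an operator-supplied implementation could replace with, say, a size-aware split.

```python
# Hypothetical sketch (not CEP-37 code): a pluggable token-split strategy.
# The default implementation divides a token range into `count` roughly
# equal, contiguous subranges.

from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class TokenRange:
    start: int  # exclusive, matching Cassandra's (start, end] convention
    end: int    # inclusive


class TokenSplitStrategy(Protocol):
    def split(self, rng: TokenRange, count: int) -> List[TokenRange]: ...


class EvenTokenSplit:
    """Default strategy: equal-width contiguous subranges."""

    def split(self, rng: TokenRange, count: int) -> List[TokenRange]:
        width = rng.end - rng.start
        if count <= 0 or width <= 0:
            raise ValueError("count and range width must be positive")
        count = min(count, width)  # never emit empty subranges
        bounds = [rng.start + width * i // count for i in range(count + 1)]
        return [TokenRange(bounds[i], bounds[i + 1]) for i in range(count)]


splits = EvenTokenSplit().split(TokenRange(0, 100), 4)
```

Subranges produced this way cover the parent range exactly, with no gaps or overlaps, which is the property any replacement strategy would need to preserve.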

We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is
now officially ready for review after multiple rounds of design, testing,
code reviews, documentation reviews, and, more importantly, validation that
it runs at scale!
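The last safeguard above, stopping incremental repair when the disk is close to full, reduces to a simple pre-flight check before each session. The sketch below is a hypothetical illustration rather than CEP-37 code; the threshold value and function names are invented:

```python
# Hypothetical sketch of the "stop incremental repair when the disk is
# about to get full" safeguard; threshold and helper names are invented.

import shutil


def disk_usage_ratio(path: str = "/") -> float:
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_run_incremental_repair(path: str = "/", threshold: float = 0.80) -> bool:
    # Incremental repair's anticompaction temporarily needs extra space,
    # so skip (and alert) rather than risk filling the data volume.
    return disk_usage_ratio(path) < threshold


if should_run_incremental_repair("/", threshold=0.80):
    pass  # schedule the next incremental repair session
else:
    pass  # pause repair and emit a metric/log for operators
```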


Some facts about CEP-37.

   - Multiple members have verified all aspects of CEP-37 numerous times.
   - The design proposed in CEP-37 has been thoroughly tried and tested on an
     immense scale (hundreds of unique Cassandra clusters, tens of thousands of
     Cassandra nodes, with tens of millions of QPS) on top of 4.1 open-source
     for more than five years; please see more details here.
   - The following presentation, given during last week’s Apache Cassandra Bay
     Area Meetup, highlights the rigor applied to CEP-37.


Since things have been massively overhauled, we believe it is almost ready
for a final pass pre-VOTE. We would like you to please review the CEP-37

and the associated detailed design doc.

Thank you everyone!

Chris, Andy, Josh, Dinesh, Kristijonas, and Jaydeep




Re: [Discuss] Repair inside C*

2024-09-19 Thread Josh McKenzie
Not quite; finishing touches on the CEP and design doc are in flight (as of 
last / this week).

Soon(tm).

On Thu, Sep 19, 2024, at 2:07 PM, Patrick McFadin wrote:
> Is this CEP ready for a VOTE thread? 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution
> 

Re: [Discuss] Repair inside C*

2024-09-19 Thread Patrick McFadin
Is this CEP ready for a VOTE thread?
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution


Re: [Discuss] Repair inside C*

2024-02-25 Thread Jaydeep Chovatia
Thanks, Josh. I've just updated the CEP

and included all the solutions you mentioned below.

Jaydeep

On Thu, Feb 22, 2024 at 9:33 AM Josh McKenzie  wrote:

> Very late response from me here (basically necro'ing this thread).
>
> I think it'd be useful to get this condensed into a CEP that we can then
> discuss in that format. It's clearly something we all agree we need and
> having an implementation that works, even if it's not in your preferred
> execution domain, is vastly better than nothing IMO.
>
> I don't have cycles (nor background ;) ) to do that, but it sounds like
> you do Jaydeep given the implementation you have on a private fork + design.
>
> A non-exhaustive list of things that might be useful incorporating into or
> referencing from a CEP:
> Slack thread:
> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> Joey's old C* ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-14346
> Even older automatic repair scheduling:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Your design gdoc:
> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
> PR with automated repair:
> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>
> My intuition is that we're all basically in agreement that this is
> something the DB needs, we're all willing to bikeshed for our personal
> preference on where it lives and how it's implemented, and at the end of
> the day, code talks. I don't think anyone's said they'll die on the hill of
> implementation details, so that feels like CEP time to me.
>
> If you were willing and able to get a CEP together for automated repair
> based on the above material, given you've done the work and have the proof
> points it's working at scale, I think this would be a *huge contribution*
> to the community.
>
> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>
> Is anyone going to file an official CEP for this?
> As mentioned in this email thread, here is one of the solution's design
> doc
> 
> and source code on a private Apache Cassandra patch. Could you go through
> it and let me know what you think?
>
> Jaydeep
>
> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
> wrote:
>
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
>
> This is something I hadn't thought much about, and is a pretty good
> argument for using the sidecar initially.  There's a lot of deployments out
> there and having an official repair option would be a big win.
>
>
> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > I agree that it would be ideal for Cassandra to have a repair scheduler
> in-DB.
> >
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
> >
> > Once TCM has landed, we’ll have much stronger primitives for repair
> orchestration in the database itself. But I don’t think that should block
> progress on a repair scheduling solution in the sidecar, and there is
> nothing that would prevent someone from continuing to use a sidecar-based
> solution in perpetuity if they preferred.
> >
> > - Scott
> >
> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad 
> wrote:
> > >
> > > I'm 100% in favor of repair being part of the core DB, not the
> sidecar.  The current (and past) state of things where running the DB
> correctly *requires* running a separate process (either community
> maintained or official C* sidecar) is incredibly painful for folks.  The
> idea that your data integrity needs to be opt-in has never made sense to me
> from the perspective of either the product or the end user.
> > >
> > > I've worked with way too many teams that have either configured this
> incorrectly or not at all.
> > >
> > > Ideally Cassandra would ship with repair built in and on by default.
> Power users can disable if they want to continue to maintain their own
> repair tooling for some reason.
> > >
> > > Jon
> > >
> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> > >> All,
> > >> We had a brief discussion in [2] about the Uber article [1] where
> they talk about having integrated repair into Cassandra and how great that
> is. I expressed my disappointment that they didn't work with the community
> on that (Uber, if you are listening time to make amends 🙂) and it turns
> out Joey already had the idea and wrote the code [3] - so I wanted to start
> a discussion to gauge interest and maybe how to revive that effort.

Re: [Discuss] Repair inside C*

2024-02-23 Thread Štefan Miklošovič
There are already some community solutions for scheduled repairs, like this
(1), though it runs alongside the Cassandra node ... anyway. I would like to
see us pick the best of what is already out there and try to integrate it
rather than trying to figure it all out again. That seems like a waste of
time and resources. If there is already something which "works", it would be
cool to spend some time first getting as much value from it as possible.

Just my 2 cents here

(1) https://github.com/Ericsson/ecchronos
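For concreteness, the core of what tools like ecchronos or cassandra-reaper automate is issuing per-table repairs on a schedule. A minimal dry-run sketch of that idea (the keyspace/table names are made up for illustration; real schedulers add locking, retries, and progress tracking on top of this):

```python
def repair_commands(schema, full=False):
    """Generate one primary-range repair invocation per table.

    Dry-run only: returns the nodetool command lines a scheduler
    would issue, without executing anything.
    """
    cmds = []
    for keyspace, tables in schema.items():
        for table in tables:
            args = ["nodetool", "repair", "-pr"]  # -pr: primary range only
            if full:
                args.append("-full")  # otherwise incremental on modern versions
            args += [keyspace, table]
            cmds.append(" ".join(args))
    return cmds

# Hypothetical schema, for illustration only.
schema = {"ks1": ["users", "events"], "ks2": ["ledger"]}
for cmd in repair_commands(schema):
    print(cmd)
```

Running each table's primary range (`-pr`) from every node in turn is the usual way these tools cover the full ring exactly once per cycle.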

On Fri, Feb 23, 2024 at 3:31 PM Josh McKenzie  wrote:

> we're all willing to bikeshed for our personal preference on where it
> lives and how it's implemented, and at the end of the day, code talks. I
> don't think anyone's said they'll die on the hill of implementation details
>
>
> :D
>
> I don't think we're going to be able to reach a consensus on an email
> thread with higher level abstractions and indicative statements. For
> instance: "a lot of complexity around repair in the main process" vs. "a
> lot of complexity in signaling between a sidecar and a main process and
> supporting multiple versions of C*". Both resonate with me at face value
> and neither contain enough detail to weigh against one another.
>
> A more granular, lower level CEP that includes a tradeoff of the two
> designs with a recommendation on a path forward might help unstick us from
> the ML back-and-forth.
>
> We could also take an indicative vote on "in-process vs. in-sidecar" to
> see if we can get a read on temperature.
>
> On Thu, Feb 22, 2024, at 2:06 PM, Paulo Motta wrote:
>
> Apologies, I just read the previous message and missed the previous
> discussion on sidecar vs main process on this thread. :-)
>
> It does not look like a final agreement was reached about this and there
> are lots of good arguments for both sides, but perhaps it would be nice to
> agree on this before a CEP is proposed since this will significantly
> influence the initial design?
>
> I tend to agree with Dinesh and Scott's pragmatic stance of providing
> initial support to repair scheduling on the sidecar, since this has fewer
> dependencies, and progressively move what makes sense to the main process
> as TCM/Accord primitives become available and mature.
>
> On Thu, Feb 22, 2024 at 1:44 PM Paulo Motta  wrote:
>
> +1 to Josh's points,  The project has considered native repair scheduling
> for a long time but it was never made a reality due to the complex
> considerations involved and availability of custom implementations/tools
> like cassandra-reaper, which is a popular way of scheduling repairs in
> Cassandra.
>
> Unfortunately I did not have cycles to review this proposal, but it looks
> promising from a quick glance.
>
> One important consideration that I think we need to discuss is: where
> should repair scheduling live: in the main process or the sidecar?
>
> I think there is a lot of complexity around repair in the main process and
> we need to be extra careful about adding additional complexity on top of
> that.
>
> Perhaps this could be a good opportunity to consider the sidecar to host
> repair scheduling, since this looks to be a control plane responsibility?
> One downside is that this would not make repair scheduling available to
> users who do not use the sidecar.
>
> What do you think? It would be great to have input from sidecar
> maintainers if this is something that would make sense for that subproject.
>
> On Thu, Feb 22, 2024 at 12:33 PM Josh McKenzie 
> wrote:
>
>
> Very late response from me here (basically necro'ing this thread).
>
> I think it'd be useful to get this condensed into a CEP that we can then
> discuss in that format. It's clearly something we all agree we need and
> having an implementation that works, even if it's not in your preferred
> execution domain, is vastly better than nothing IMO.
>
> I don't have cycles (nor background ;) ) to do that, but it sounds like
> you do Jaydeep given the implementation you have on a private fork + design.
>
> A non-exhaustive list of things that might be useful incorporating into or
> referencing from a CEP:
> Slack thread:
> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> Joey's old C* ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-14346
> Even older automatic repair scheduling:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Your design gdoc:
> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
> PR with automated repair:
> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>
> My intuition is that we're all basically in agreement that this is
> something the DB needs, we're all willing to bikeshed for our personal
> preference on where it lives and how it's implemented, and at the end of
> the day, code talks. I don't think anyone's said they'll die on the hill of
> implementation details, so that feels like CEP time to me.
>
> If you were willing and able to get a CEP together for automated repair
> based on the above material, given you've done the work and have the proof
> points it's working at scale, I think this would be a *huge contribution*
> to the community.

Re: [Discuss] Repair inside C*

2024-02-23 Thread Josh McKenzie
> we're all willing to bikeshed for our personal preference on where it lives 
> and how it's implemented, and at the end of the day, code talks. I don't 
> think anyone's said they'll die on the hill of implementation details

:D

I don't think we're going to be able to reach a consensus on an email thread 
with higher level abstractions and indicative statements. For instance: "a lot 
of complexity around repair in the main process" vs. "a lot of complexity in 
signaling between a sidecar and a main process and supporting multiple versions 
of C*". Both resonate with me at face value and neither contain enough detail 
to weigh against one another.

A more granular, lower level CEP that includes a tradeoff of the two designs 
with a recommendation on a path forward might help unstick us from the ML 
back-and-forth.

We could also take an indicative vote on "in-process vs. in-sidecar" to see if 
we can get a read on temperature.

On Thu, Feb 22, 2024, at 2:06 PM, Paulo Motta wrote:
> Apologies, I just read the previous message and missed the previous 
> discussion on sidecar vs main process on this thread. :-)
> 
> It does not look like a final agreement was reached about this and there are 
> lots of good arguments for both sides, but perhaps it would be nice to agree 
> on this before a CEP is proposed since this will significantly influence the 
> initial design?
> 
> I tend to agree with Dinesh and Scott's pragmatic stance of providing initial 
> support to repair scheduling on the sidecar, since this has fewer 
> dependencies, and progressively move what makes sense to the main process as 
> TCM/Accord primitives become available and mature.
> 
> On Thu, Feb 22, 2024 at 1:44 PM Paulo Motta  wrote:
>> +1 to Josh's points,  The project has considered native repair scheduling 
>> for a long time but it was never made a reality due to the complex 
>> considerations involved and availability of custom implementations/tools 
>> like cassandra-reaper, which is a popular way of scheduling repairs in 
>> Cassandra.
>> 
>> Unfortunately I did not have cycles to review this proposal, but it looks 
>> promising from a quick glance.
>> 
>> One important consideration that I think we need to discuss is: where should 
>> repair scheduling live: in the main process or the sidecar?
>> 
>> I think there is a lot of complexity around repair in the main process and 
>> we need to be extra careful about adding additional complexity on top of 
>> that.
>> 
>> Perhaps this could be a good opportunity to consider the sidecar to host 
>> repair scheduling, since this looks to be a control plane responsibility? 
>> One downside is that this would not make repair scheduling available to 
>> users who do not use the sidecar.
>> 
>> What do you think? It would be great to have input from sidecar maintainers 
>> if this is something that would make sense for that subproject.
>> 
>> On Thu, Feb 22, 2024 at 12:33 PM Josh McKenzie  wrote:
>>> Very late response from me here (basically necro'ing this thread).
>>> 
>>> I think it'd be useful to get this condensed into a CEP that we can then 
>>> discuss in that format. It's clearly something we all agree we need and 
>>> having an implementation that works, even if it's not in your preferred 
>>> execution domain, is vastly better than nothing IMO.
>>> 
>>> I don't have cycles (nor background ;) ) to do that, but it sounds like you 
>>> do Jaydeep given the implementation you have on a private fork + design.
>>> 
>>> A non-exhaustive list of things that might be useful incorporating into or 
>>> referencing from a CEP:
>>> Slack thread: https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>>> Joey's old C* ticket: https://issues.apache.org/jira/browse/CASSANDRA-14346
>>> Even older automatic repair scheduling: 
>>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>>> Your design gdoc: 
>>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>>> PR with automated repair: 
>>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>> 
>>> My intuition is that we're all basically in agreement that this is 
>>> something the DB needs, we're all willing to bikeshed for our personal 
>>> preference on where it lives and how it's implemented, and at the end of 
>>> the day, code talks. I don't think anyone's said they'll die on the hill of 
>>> implementation details, so that feels like CEP time to me.
>>> 
>>> If you were willing and able to get a CEP together for automated repair 
>>> based on the above material, given you've done the work and have the proof 
>>> points it's working at scale, I think this would be a *huge contribution* 
>>> to the community.
>>> 
>>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
 Is anyone going to file an official CEP for this?
 As mentioned in this email thread, here is one of the solution's design 
 doc 
 

Re: [Discuss] Repair inside C*

2024-02-22 Thread Paulo Motta
Apologies, I just read the previous message and missed the previous
discussion on sidecar vs main process on this thread. :-)

It does not look like a final agreement was reached about this and there
are lots of good arguments for both sides, but perhaps it would be nice to
agree on this before a CEP is proposed since this will significantly
influence the initial design?

I tend to agree with Dinesh and Scott's pragmatic stance of providing initial 
support for repair scheduling in the sidecar, since this has fewer 
dependencies, and progressively moving what makes sense to the main process 
as TCM/Accord primitives become available and mature.

On Thu, Feb 22, 2024 at 1:44 PM Paulo Motta  wrote:

> +1 to Josh's points,  The project has considered native repair scheduling
> for a long time but it was never made a reality due to the complex
> considerations involved and availability of custom implementations/tools
> like cassandra-reaper, which is a popular way of scheduling repairs in
> Cassandra.
>
> Unfortunately I did not have cycles to review this proposal, but it looks
> promising from a quick glance.
>
> One important consideration that I think we need to discuss is: where
> should repair scheduling live: in the main process or the sidecar?
>
> I think there is a lot of complexity around repair in the main process and
> we need to be extra careful about adding additional complexity on top of
> that.
>
> Perhaps this could be a good opportunity to consider the sidecar to host
> repair scheduling, since this looks to be a control plane responsibility?
> One downside is that this would not make repair scheduling available to
> users who do not use the sidecar.
>
> What do you think? It would be great to have input from sidecar
> maintainers if this is something that would make sense for that subproject.
>
> On Thu, Feb 22, 2024 at 12:33 PM Josh McKenzie 
> wrote:
>
>> Very late response from me here (basically necro'ing this thread).
>>
>> I think it'd be useful to get this condensed into a CEP that we can then
>> discuss in that format. It's clearly something we all agree we need and
>> having an implementation that works, even if it's not in your preferred
>> execution domain, is vastly better than nothing IMO.
>>
>> I don't have cycles (nor background ;) ) to do that, but it sounds like
>> you do Jaydeep given the implementation you have on a private fork + design.
>>
>> A non-exhaustive list of things that might be useful incorporating into
>> or referencing from a CEP:
>> Slack thread:
>> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> Joey's old C* ticket:
>> https://issues.apache.org/jira/browse/CASSANDRA-14346
>> Even older automatic repair scheduling:
>> https://issues.apache.org/jira/browse/CASSANDRA-10070
>> Your design gdoc:
>> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
>> PR with automated repair:
>> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>>
>> My intuition is that we're all basically in agreement that this is
>> something the DB needs, we're all willing to bikeshed for our personal
>> preference on where it lives and how it's implemented, and at the end of
>> the day, code talks. I don't think anyone's said they'll die on the hill of
>> implementation details, so that feels like CEP time to me.
>>
>> If you were willing and able to get a CEP together for automated repair
>> based on the above material, given you've done the work and have the proof
>> points it's working at scale, I think this would be a *huge contribution*
>> to the community.
>>
>> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>>
>> Is anyone going to file an official CEP for this?
>> As mentioned in this email thread, here is one of the solution's design
>> doc
>> 
>> and source code on a private Apache Cassandra patch. Could you go through
>> it and let me know what you think?
>>
>> Jaydeep
>>
>> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
>> wrote:
>>
>> > That said I would happily support an effort to bring repair scheduling
>> to the sidecar immediately. This has nothing blocking it, and would
>> potentially enable the sidecar to provide an official repair scheduling
>> solution that is compatible with current or even previous versions of the
>> database.
>>
>> This is something I hadn't thought much about, and is a pretty good
>> argument for using the sidecar initially.  There's a lot of deployments out
>> there and having an official repair option would be a big win.
>>
>>
>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>> > I agree that it would be ideal for Cassandra to have a repair scheduler
>> in-DB.
>> >
>> > That said I would happily support an effort to bring repair scheduling
>> to the sidecar immediately. This has nothing blocking it, and would
>> potentially enable the sidecar to provide an official repair scheduling
>> solution that is compatible with current or even previous versions of the
>> database.

Re: [Discuss] Repair inside C*

2024-02-22 Thread Paulo Motta
+1 to Josh's points. The project has considered native repair scheduling
for a long time, but it never became a reality due to the complex
considerations involved and the availability of custom implementations/tools
like cassandra-reaper, which is a popular way of scheduling repairs in
Cassandra.

Unfortunately I did not have cycles to review this proposal, but it looks
promising from a quick glance.

One important consideration that I think we need to discuss: where should
repair scheduling live, in the main process or the sidecar?

I think there is a lot of complexity around repair in the main process and
we need to be extra careful about adding additional complexity on top of
that.

Perhaps this could be a good opportunity to consider the sidecar to host
repair scheduling, since this looks to be a control plane responsibility?
One downside is that this would not make repair scheduling available to
users who do not use the sidecar.

What do you think? It would be great to have input from sidecar maintainers
if this is something that would make sense for that subproject.
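Whichever process ends up hosting it, the scheduling core itself is small: repair each table before gc_grace_seconds elapses since its last successful repair, prioritizing whichever table is closest to (or past) that deadline. A hedged sketch of just that prioritization step (keyspace/table names are invented; a real scheduler would persist repair history and coordinate across nodes):

```python
import time

GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's default gc_grace_seconds (10 days)

def pick_next_table(last_repaired, now=None, gc_grace=GC_GRACE_SECONDS):
    """Return the (keyspace, table) most at risk: the one with the
    highest fraction of gc_grace elapsed since its last repair."""
    now = now if now is not None else time.time()

    def urgency(item):
        _table, repaired_at = item
        return (now - repaired_at) / gc_grace

    table, _ = max(last_repaired.items(), key=urgency)
    return table

# Illustrative state: tables last repaired 9, 2, and 12 days ago.
day = 24 * 3600
state = {
    ("ks1", "events"): time.time() - 9 * day,
    ("ks1", "users"):  time.time() - 2 * day,
    ("ks2", "ledger"): time.time() - 12 * day,  # already past gc_grace
}
print(pick_next_table(state))  # -> ('ks2', 'ledger')
```

Table-level priorities (as discussed elsewhere in this thread) would slot in naturally as a weight on the urgency function.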

On Thu, Feb 22, 2024 at 12:33 PM Josh McKenzie  wrote:

> Very late response from me here (basically necro'ing this thread).
>
> I think it'd be useful to get this condensed into a CEP that we can then
> discuss in that format. It's clearly something we all agree we need and
> having an implementation that works, even if it's not in your preferred
> execution domain, is vastly better than nothing IMO.
>
> I don't have cycles (nor background ;) ) to do that, but it sounds like
> you do Jaydeep given the implementation you have on a private fork + design.
>
> A non-exhaustive list of things that might be useful incorporating into or
> referencing from a CEP:
> Slack thread:
> https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> Joey's old C* ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-14346
> Even older automatic repair scheduling:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Your design gdoc:
> https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
> PR with automated repair:
> https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c
>
> My intuition is that we're all basically in agreement that this is
> something the DB needs, we're all willing to bikeshed for our personal
> preference on where it lives and how it's implemented, and at the end of
> the day, code talks. I don't think anyone's said they'll die on the hill of
> implementation details, so that feels like CEP time to me.
>
> If you were willing and able to get a CEP together for automated repair
> based on the above material, given you've done the work and have the proof
> points it's working at scale, I think this would be a *huge contribution*
> to the community.
>
> On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
>
> Is anyone going to file an official CEP for this?
> As mentioned in this email thread, here is one of the solution's design
> doc
> 
> and source code on a private Apache Cassandra patch. Could you go through
> it and let me know what you think?
>
> Jaydeep
>
> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
> wrote:
>
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
>
> This is something I hadn't thought much about, and is a pretty good
> argument for using the sidecar initially.  There's a lot of deployments out
> there and having an official repair option would be a big win.
>
>
> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > I agree that it would be ideal for Cassandra to have a repair scheduler
> in-DB.
> >
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
> >
> > Once TCM has landed, we’ll have much stronger primitives for repair
> orchestration in the database itself. But I don’t think that should block
> progress on a repair scheduling solution in the sidecar, and there is
> nothing that would prevent someone from continuing to use a sidecar-based
> solution in perpetuity if they preferred.
> >
> > - Scott
> >
> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad 
> wrote:
> > >
> > > I'm 100% in favor of repair being part of the core DB, not the
> sidecar.  The current (and past) state of things where running the DB
> correctly *requires* running a separate process (either community
> maintained or official C* sidecar) is incredibly painful for folks.  The
> idea that your data integrity needs to be opt-in has never made sense to me
> from the perspective of either the product or the end user.

Re: [Discuss] Repair inside C*

2024-02-22 Thread Josh McKenzie
Very late response from me here (basically necro'ing this thread).

I think it'd be useful to get this condensed into a CEP that we can then 
discuss in that format. It's clearly something we all agree we need and having 
an implementation that works, even if it's not in your preferred execution 
domain, is vastly better than nothing IMO.

I don't have cycles (nor background ;) ) to do that, but it sounds like you do 
Jaydeep given the implementation you have on a private fork + design.

A non-exhaustive list of things that might be useful to incorporate into or 
reference from a CEP:
Slack thread: https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
Joey's old C* ticket: https://issues.apache.org/jira/browse/CASSANDRA-14346
Even older automatic repair scheduling: 
https://issues.apache.org/jira/browse/CASSANDRA-10070
Your design gdoc: 
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0
PR with automated repair: 
https://github.com/jaydeepkumar1984/cassandra/commit/ef6456d652c0d07cf29d88dfea03b73704814c2c

My intuition is that we're all basically in agreement that this is something 
the DB needs, we're all willing to bikeshed for our personal preference on 
where it lives and how it's implemented, and at the end of the day, code talks. 
I don't think anyone's said they'll die on the hill of implementation details, 
so that feels like CEP time to me.

If you were willing and able to get a CEP together for automated repair based 
on the above material, given you've done the work and have the proof points 
it's working at scale, I think this would be a *huge contribution* to the 
community.

On Thu, Aug 24, 2023, at 7:26 PM, Jaydeep Chovatia wrote:
> Is anyone going to file an official CEP for this?
> As mentioned in this email thread, here is one of the solution's design doc 
> 
>  and source code on a private Apache Cassandra patch. Could you go through it 
> and let me know what you think?
> 
> Jaydeep
> 
> On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad  wrote:
>> > That said I would happily support an effort to bring repair scheduling to 
>> > the sidecar immediately. This has nothing blocking it, and would 
>> > potentially enable the sidecar to provide an official repair scheduling 
>> > solution that is compatible with current or even previous versions of the 
>> > database.
>> 
>> This is something I hadn't thought much about, and is a pretty good argument 
>> for using the sidecar initially.  There's a lot of deployments out there and 
>> having an official repair option would be a big win. 
>> 
>> 
>> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
>> > I agree that it would be ideal for Cassandra to have a repair scheduler 
>> > in-DB.
>> >
>> > That said I would happily support an effort to bring repair scheduling to 
>> > the sidecar immediately. This has nothing blocking it, and would 
>> > potentially enable the sidecar to provide an official repair scheduling 
>> > solution that is compatible with current or even previous versions of the 
>> > database.
>> >
>> > Once TCM has landed, we’ll have much stronger primitives for repair 
>> > orchestration in the database itself. But I don’t think that should block 
>> > progress on a repair scheduling solution in the sidecar, and there is 
>> > nothing that would prevent someone from continuing to use a sidecar-based 
>> > solution in perpetuity if they preferred.
>> >
>> > - Scott
>> >
>> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad  
>> > > wrote:
>> > >
>> > > I'm 100% in favor of repair being part of the core DB, not the sidecar. 
>> > >  The current (and past) state of things where running the DB correctly 
>> > > *requires* running a separate process (either community maintained or 
>> > > official C* sidecar) is incredibly painful for folks.  The idea that 
>> > > your data integrity needs to be opt-in has never made sense to me from 
>> > > the perspective of either the product or the end user.
>> > >
>> > > I've worked with way too many teams that have either configured this 
>> > > incorrectly or not at all. 
>> > >
>> > > Ideally Cassandra would ship with repair built in and on by default.  
>> > > Power users can disable if they want to continue to maintain their own 
>> > > repair tooling for some reason.
>> > >
>> > > Jon
>> > >
>> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> > >> All,
>> > >> We had a brief discussion in [2] about the Uber article [1] where they 
>> > >> talk about having integrated repair into Cassandra and how great that 
>> > >> is. I expressed my disappointment that they didn't work with the 
>> > >> community on that (Uber, if you are listening time to make amends 🙂) 
>> > >> and it turns out Joey already had the idea and wrote the code [3] - so 
>> > >> I wanted to start a discussion to gauge interest and maybe how to 
>> > >> revive that effort.

Re: [Discuss] Repair inside C*

2023-08-24 Thread Jaydeep Chovatia
Is anyone going to file an official CEP for this?
As mentioned in this email thread, here is one solution's design doc

and source code on a private Apache Cassandra patch. Could you go through
it and let me know what you think?

Jaydeep

On Wed, Aug 2, 2023 at 3:54 PM Jon Haddad 
wrote:

> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
>
> This is something I hadn't thought much about, and is a pretty good
> argument for using the sidecar initially.  There's a lot of deployments out
> there and having an official repair option would be a big win.
>
>
> On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> > I agree that it would be ideal for Cassandra to have a repair scheduler
> in-DB.
> >
> > That said I would happily support an effort to bring repair scheduling
> to the sidecar immediately. This has nothing blocking it, and would
> potentially enable the sidecar to provide an official repair scheduling
> solution that is compatible with current or even previous versions of the
> database.
> >
> > Once TCM has landed, we’ll have much stronger primitives for repair
> orchestration in the database itself. But I don’t think that should block
> progress on a repair scheduling solution in the sidecar, and there is
> nothing that would prevent someone from continuing to use a sidecar-based
> solution in perpetuity if they preferred.
> >
> > - Scott
> >
> > > On Jul 26, 2023, at 3:25 PM, Jon Haddad 
> wrote:
> > >
> > > I'm 100% in favor of repair being part of the core DB, not the
> sidecar.  The current (and past) state of things where running the DB
> correctly *requires* running a separate process (either community
> maintained or official C* sidecar) is incredibly painful for folks.  The
> idea that your data integrity needs to be opt-in has never made sense to me
> from the perspective of either the product or the end user.
> > >
> > > I've worked with way too many teams that have either configured this
> incorrectly or not at all.
> > >
> > > Ideally Cassandra would ship with repair built in and on by default.
> Power users can disable if they want to continue to maintain their own
> repair tooling for some reason.
> > >
> > > Jon
> > >
> > >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> > >> All,
> > >> We had a brief discussion in [2] about the Uber article [1] where
> they talk about having integrated repair into Cassandra and how great that
> is. I expressed my disappointment that they didn't work with the community
> on that (Uber, if you are listening time to make amends 🙂) and it turns
> out Joey already had the idea and wrote the code [3] - so I wanted to start
> a discussion to gauge interest and maybe how to revive that effort.
> > >> Thanks,
> > >> German
> > >> [1]
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> > >> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> > >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
> >
>


Re: [Discuss] Repair inside C*

2023-08-02 Thread Jon Haddad
> That said I would happily support an effort to bring repair scheduling to the 
> sidecar immediately. This has nothing blocking it, and would potentially 
> enable the sidecar to provide an official repair scheduling solution that is 
> compatible with current or even previous versions of the database.

This is something I hadn't thought much about, and is a pretty good argument 
for using the sidecar initially.  There's a lot of deployments out there and 
having an official repair option would be a big win.  


On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.
> 
> That said I would happily support an effort to bring repair scheduling to the 
> sidecar immediately. This has nothing blocking it, and would potentially 
> enable the sidecar to provide an official repair scheduling solution that is 
> compatible with current or even previous versions of the database.
> 
> Once TCM has landed, we’ll have much stronger primitives for repair 
> orchestration in the database itself. But I don’t think that should block 
> progress on a repair scheduling solution in the sidecar, and there is nothing 
> that would prevent someone from continuing to use a sidecar-based solution in 
> perpetuity if they preferred.
> 
> - Scott
> 
> > On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> > 
> > I'm 100% in favor of repair being part of the core DB, not the sidecar.  
> > The current (and past) state of things where running the DB correctly 
> > *requires* running a separate process (either community maintained or 
> > official C* sidecar) is incredibly painful for folks.  The idea that your 
> > data integrity needs to be opt-in has never made sense to me from the 
> > perspective of either the product or the end user.
> > 
> > I've worked with way too many teams that have either configured this 
> > incorrectly or not at all.  
> > 
> > Ideally Cassandra would ship with repair built in and on by default.  Power 
> > users can disable if they want to continue to maintain their own repair 
> > tooling for some reason.
> > 
> > Jon
> > 
> >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> >> All,
> >> We had a brief discussion in [2] about the Uber article [1] where they 
> >> talk about having integrated repair into Cassandra and how great that is. 
> >> I expressed my disappointment that they didn't work with the community on 
> >> that (Uber, if you are listening time to make amends 🙂) and it turns out 
> >> Joey already had the idea and wrote the code [3] - so I wanted to start a 
> >> discussion to gauge interest and maybe how to revive that effort.
> >> Thanks,
> >> German
> >> [1] 
> >> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> >> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
> 


Re: [Discuss] Repair inside C*

2023-07-27 Thread Josh McKenzie
> The idea that your data integrity needs to be opt-in has never made sense to 
> me from the perspective of either the product or the end user.
I could not agree with this more. 100%.

> The current (and past) state of things where running the DB correctly 
> *requires* running a separate process (either community maintained or 
> official C* sidecar) is incredibly painful for folks.
I'm 50/50 on this (and I have some opinions here; bear with me :D ).

To me this goes beyond the question of just "where do we coordinate repair" 
into "what role does a node play vs. the sidecar and how does that intersect 
w/the industry today".

Having just 1 process you run on N machines is much nicer from an operations 
standpoint and it's *much* cleaner and easier for us as a project to not have 
to deal with signaling, shmem, and going down the IPC rabbit hole. A modular 
monolith, if you will.

That said, I feel like the zeitgeist has been all-in on microservices and 
control planes, whether they're the right solution or not. The affordances for 
being able to build out independent teams and large-organization dev velocity, 
never mind the ideal of being able to cleanly upgrade or rewrite internal 
components, are attractive enough on paper that it feels like most groups have 
gone that direction and accepted the perceived costs; I view Cassandra as 
something of an architectural anachronism at this point. And to call back to 
the prior paragraph, I *think* you get all those positive affordances w/a 
modular monolith. Sadly, google trends don't really give me a lot of hope 
there.

In an ideal world operators (or better yet, an automated operations process) 
would be able to dynamically adjust resource allocation to nodes based on 
"burstiness of the buffering" (i.e. lots of data building up in CL's needing to 
be flushed, or compaction need, or repair need); It's not immediately obvious 
to me how we'd gracefully do that in a single process paradigm in containers 
w/out becoming a noisy neighbor but it's not impossible. Kind of goes meta 
outside C*'s scope into how you're coordinating your hardware and software 
interactions; maybe that's the cleaner route: we clearly signal metrics for 
each major operation the DB needs to do to indicate their backlog and an 
external orchestration process / system / ??? handles the resource allocation. 
i.e. we don't take that on.
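As a rough illustration of that "signal the backlog, let something external act on it" idea — purely a hypothetical sketch, not Cassandra's actual metrics code (its real metrics go through a metrics registry and are exposed over JMX) — the node-side half could be as small as:

```java
// Hypothetical sketch: per-operation backlog gauges that an external
// orchestration process could poll to decide resource allocation.
// Class and operation names are illustrative, not Cassandra internals.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class BacklogGauges {
    private final Map<String, AtomicLong> pendingBytes = new ConcurrentHashMap<>();

    // Each subsystem (flush, compaction, repair, ...) reports how much
    // work it has queued up, in bytes.
    public void add(String operation, long bytes) {
        pendingBytes.computeIfAbsent(operation, k -> new AtomicLong()).addAndGet(bytes);
    }

    // What an orchestrator would scrape, e.g. via JMX or a /metrics endpoint.
    public long backlog(String operation) {
        AtomicLong v = pendingBytes.get(operation);
        return v == null ? 0L : v.get();
    }

    public static void main(String[] args) {
        BacklogGauges gauges = new BacklogGauges();
        gauges.add("commitlog_flush", 64_000_000);
        gauges.add("compaction", 512_000_000);
        if (gauges.backlog("compaction") != 512_000_000L) throw new AssertionError();
        if (gauges.backlog("repair") != 0L) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The point of the sketch is the division of labor: the database only publishes honest backlog numbers; deciding whether to grant a container more CPU/IO stays with the external system.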

Certainly we can do a lot better when it comes to internal scheduling of DB 
operations to one another than we are today (start using cql rate limiting, 
dynamically determine a rolling average of needs to smooth out burst requests, 
make byte-based rate-limiting an option, user-space threads w/loom and some 
kind of QoS prioritization based on backlogs, etc).
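To make the byte-based rate-limiting idea concrete, here is a minimal token-bucket sketch; the class name and shape are illustrative assumptions, not an existing Cassandra API (Cassandra's actual throttles, e.g. compaction throughput, are built on Guava's RateLimiter):

```java
// Hypothetical sketch: a byte-based token bucket for throttling background
// I/O such as compaction or repair streaming. Not a Cassandra class.
public class ByteRateLimiter {
    private final long bytesPerSecond;
    private double availableBytes;
    private long lastRefillNanos;

    public ByteRateLimiter(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.availableBytes = bytesPerSecond; // start with a full bucket
        this.lastRefillNanos = System.nanoTime();
    }

    // Refill tokens proportionally to elapsed time, capped at one
    // second's worth so idle periods can't build an unbounded burst.
    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        availableBytes = Math.min(bytesPerSecond,
                                  availableBytes + elapsedSeconds * bytesPerSecond);
        lastRefillNanos = now;
    }

    // Returns true if the caller may perform `bytes` of I/O right now.
    public synchronized boolean tryAcquire(long bytes) {
        refill();
        if (availableBytes >= bytes) {
            availableBytes -= bytes;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ByteRateLimiter limiter = new ByteRateLimiter(1_000_000); // 1 MB/s
        if (!limiter.tryAcquire(900_000)) throw new AssertionError("bucket starts full");
        if (limiter.tryAcquire(900_000)) throw new AssertionError("budget exhausted");
        System.out.println("ok");
    }
}
```

A QoS layer along the lines Josh describes would then hand each operation class its own bucket, sized dynamically from the backlog it reports.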

I personally view moving maintenance tasks into the sidecar as a reasonable 
"first step satisficing compromise". If anything, that'd potentially give us 
some breathing room to get our house in order on the "I/O" process (as opposed 
to sidecar as "maintenance process") to then re-integrate things back in in a 
more clean, planned fashion with some better tools to do it right.

~Josh


On Wed, Jul 26, 2023, at 7:20 PM, C. Scott Andreas wrote:
> I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.
> 
> That said I would happily support an effort to bring repair scheduling to the 
> sidecar immediately. This has nothing blocking it, and would potentially 
> enable the sidecar to provide an official repair scheduling solution that is 
> compatible with current or even previous versions of the database.
> 
> Once TCM has landed, we’ll have much stronger primitives for repair 
> orchestration in the database itself. But I don’t think that should block 
> progress on a repair scheduling solution in the sidecar, and there is nothing 
> that would prevent someone from continuing to use a sidecar-based solution in 
> perpetuity if they preferred.
> 
> - Scott
> 
> > On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> > 
> > I'm 100% in favor of repair being part of the core DB, not the sidecar.  
> > The current (and past) state of things where running the DB correctly 
> > *requires* running a separate process (either community maintained or 
> > official C* sidecar) is incredibly painful for folks.  The idea that your 
> > data integrity needs to be opt-in has never made sense to me from the 
> > perspective of either the product or the end user.
> > 
> > I've worked with way too many teams that have either configured this 
> > incorrectly or not at all.  
> > 
> > Ideally Cassandra would ship with repair built in and on by default.  Power 
> > users can disable if they want to continue to maintain their own repair 
> > tooling for some reason.
> > 
> > Jon
> > 
> >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> >> All,
> >> We had a brief discussi

Re: [Discuss] Repair inside C*

2023-07-26 Thread C. Scott Andreas
I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.

That said I would happily support an effort to bring repair scheduling to the 
sidecar immediately. This has nothing blocking it, and would potentially enable 
the sidecar to provide an official repair scheduling solution that is 
compatible with current or even previous versions of the database.

Once TCM has landed, we’ll have much stronger primitives for repair 
orchestration in the database itself. But I don’t think that should block 
progress on a repair scheduling solution in the sidecar, and there is nothing 
that would prevent someone from continuing to use a sidecar-based solution in 
perpetuity if they preferred.

- Scott

> On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> 
> I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
> current (and past) state of things where running the DB correctly *requires* 
> running a separate process (either community maintained or official C* 
> sidecar) is incredibly painful for folks.  The idea that your data integrity 
> needs to be opt-in has never made sense to me from the perspective of either 
> the product or the end user.
> 
> I've worked with way too many teams that have either configured this 
> incorrectly or not at all.  
> 
> Ideally Cassandra would ship with repair built in and on by default.  Power 
> users can disable if they want to continue to maintain their own repair 
> tooling for some reason.
> 
> Jon
> 
>> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> All,
>> We had a brief discussion in [2] about the Uber article [1] where they talk 
>> about having integrated repair into Cassandra and how great that is. I 
>> expressed my disappointment that they didn't work with the community on that 
>> (Uber, if you are listening time to make amends 🙂) and it turns out Joey 
>> already had the idea and wrote the code [3] - so I wanted to start a 
>> discussion to gauge interest and maybe how to revive that effort.
>> Thanks,
>> German
>> [1] 
>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346


Re: [Discuss] Repair inside C*

2023-07-26 Thread Dinesh Joshi
I concur, repair is an intrinsic part of the database and belongs inside it. We 
can certainly expose a REST control plane API via the sidecar for triggering it 
on demand, scheduling, etc.

That said, there are various implementations of repair scheduling and 
orchestration that a lot of organizations maintain in their proprietary 
sidecars. It would be beneficial in the interim to consolidate on a common 
solution in the sidecar. Eventually we need a version of repair in the database 
that just works, without the need for any operator intervention.


> On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> 
> I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
> current (and past) state of things where running the DB correctly *requires* 
> running a separate process (either community maintained or official C* 
> sidecar) is incredibly painful for folks.  The idea that your data integrity 
> needs to be opt-in has never made sense to me from the perspective of either 
> the product or the end user.
> 
> I've worked with way too many teams that have either configured this 
> incorrectly or not at all.  
> 
> Ideally Cassandra would ship with repair built in and on by default.  Power 
> users can disable if they want to continue to maintain their own repair 
> tooling for some reason. 
> 
> Jon
> 
> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
>> All,
>> 
>> We had a brief discussion in [2] about the Uber article [1] where they talk 
>> about having integrated repair into Cassandra and how great that is. I 
>> expressed my disappointment that they didn't work with the community on that 
>> (Uber, if you are listening time to make amends 🙂) and it turns out Joey 
>> already had the idea and wrote the code [3] - so I wanted to start a 
>> discussion to gauge interest and maybe how to revive that effort.
>> 
>> Thanks,
>> German
>> 
>> [1] 
>> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
>> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
>> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>> 



Re: [Discuss] Repair inside C*

2023-07-26 Thread Jon Haddad
I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
current (and past) state of things where running the DB correctly *requires* 
running a separate process (either community maintained or official C* sidecar) 
is incredibly painful for folks.  The idea that your data integrity needs to be 
opt-in has never made sense to me from the perspective of either the product or 
the end user.

I've worked with way too many teams that have either configured this 
incorrectly or not at all.  

Ideally Cassandra would ship with repair built in and on by default.  Power 
users can disable if they want to continue to maintain their own repair tooling 
for some reason. 

Jon

On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> All,
> 
> We had a brief discussion in [2] about the Uber article [1] where they talk 
> about having integrated repair into Cassandra and how great that is. I 
> expressed my disappointment that they didn't work with the community on that 
> (Uber, if you are listening time to make amends 🙂) and it turns out Joey 
> already had the idea and wrote the code [3] - so I wanted to start a 
> discussion to gauge interest and maybe how to revive that effort.
> 
> Thanks,
> German
> 
> [1] 
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
> 


Re: [Discuss] Repair inside C*

2023-07-26 Thread David Capwell
+0 to sidecar. In order to make that work well we need to expose state that the 
node has so the sidecar can make good calls; if repair runs in the node then 
nothing has to be exposed. One thing to flesh out is where the “smarts” live. 
If the range has too many partitions, which system knows to subdivide the range 
and sequence the repairs (else you OOM)? “Should” repair itself be better and 
take all input and make sure it works correctly, so the caller just worries 
about scheduling? “Should” the scheduler understand limitations with repair and 
work around them?
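For illustration, the kind of "smarts" being asked about — subdividing a dense range before repairing it — might look like the following sketch. Token math is simplified to plain longs and all names are hypothetical, not Cassandra internals:

```java
// Hypothetical sketch: split a token range into subranges so each repair
// session covers at most maxPartitions of the estimated partition count,
// bounding the per-session merkle-tree / memory cost.
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {
    // Evenly subdivide [start, end) into enough pieces that each piece
    // holds no more than maxPartitions of the estimated total.
    public static List<long[]> split(long start, long end,
                                     long estimatedPartitions, long maxPartitions) {
        int pieces = (int) Math.max(1, (estimatedPartitions + maxPartitions - 1) / maxPartitions);
        List<long[]> subranges = new ArrayList<>();
        long width = (end - start) / pieces;
        long cur = start;
        for (int i = 0; i < pieces; i++) {
            // Last piece absorbs rounding so the full range stays covered.
            long next = (i == pieces - 1) ? end : cur + width;
            subranges.add(new long[]{cur, next});
            cur = next;
        }
        return subranges;
    }

    public static void main(String[] args) {
        // 10M estimated partitions, cap 3M per session -> 4 subranges.
        List<long[]> parts = split(0, 1000, 10_000_000, 3_000_000);
        if (parts.size() != 4) throw new AssertionError();
        if (parts.get(0)[0] != 0 || parts.get(3)[1] != 1000) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Whether this logic belongs in repair itself or in the scheduler is exactly the question David raises: in-process, the estimates come for free; in a sidecar, the node has to expose them first.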

> On Jul 25, 2023, at 11:26 AM, Jeremiah Jordan  
> wrote:
> 
> +1 for the side car being the right location.
> 
> -Jeremiah
> 
> On Jul 25, 2023 at 1:16:14 PM, Chris Lohfink wrote:
>> I think a CEP is the next step. Considering the number of companies 
>> involved, this might necessitate several drafts and rounds of discussions. I 
>> appreciate your initiative in starting this process, and I'm eager to 
>> contribute to the ensuing discussions. Maybe in a google docs or something 
>> initially for more interactive feedback?
>> 
>> In regards to https://issues.apache.org/jira/browse/CASSANDRA-14346 we at 
>> Netflix are actually putting effort currently to move this into the sidecar 
>> as the idea was to start moving non-read/write path things into different 
>> process and jvms to not impact each other.
>> 
>> I think the sidecar/in process discussion might be a bit contentious as I 
>> know even things like compaction some feel should be moved out of process in 
>> future. On a personal note, my primary interest lies in seeing the 
>> implementation realized, so I am willing to support whatever consensus 
>> emerges. Whichever direction these go we will help with the implementation.
>> 
>> Chris
>> 
>> On Tue, Jul 25, 2023 at 1:09 PM Jaydeep Chovatia wrote:
>>> Sounds good, German. Feel free to let me know if you need my help in filing 
>>> CEP, adding supporting content to the CEP, etc.
>>> As I mentioned previously, I have already been working (going through an 
>>> internal review) on creating a one-pager doc, code, etc., that has been 
>>> working for us for the last six years at an immense scale, and I will share 
>>> it soon on a private fork.
>>> 
>>> Thanks,
>>> Jaydeep
>>> 
>>> On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev wrote:
>>>> In [2] we suggested that the next step should be a CEP.
>>>> 
>>>> I am happy to lend a hand to this effort as well.
>>>> 
>>>> Thanks Jaydeep and David - really appreciated.
>>>> 
>>>> German
>>>> 
>>>> From: David Capwell
>>>> Sent: Tuesday, July 25, 2023 8:32 AM
>>>> To: dev
>>>> Cc: German Eichberger
>>>> Subject: [EXTERNAL] Re: [Discuss] Repair inside C*
>>>>  
>>>> As someone who has done a lot of work trying to make repair stable, I 
>>>> approve of this message ^_^
>>>> 
>>>> More than glad to help mentor this work
>>>> 
>>>> On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia wrote:
>>>> 
>>>> To clarify the repair solution timing, the one we have listed in the 
>>>> article is not the recently developed one. We were hitting some 
>>>> high-priority production challenges back in early 2018, and to address 
>>>> that, we developed and rolled out the solution in production in just a few 
>>>> months. The timing-wise, the solution was developed and productized by Q3 
>>>> 2018, of course, continued to evolve thereafter. Usually, we explore the 
>>>> existing solutions we can leverage, but when we started our journey in 
>>>> early 2018, most of the solutions were based on sidecar solutions. There 
>>>> is nothing against the sidecar solution; it was just a pure business 
>>>> decision, and in that, we wanted to avoid the sidecar to avoid a 
>>>> dependency on the control plane. Every solution developed has its deep 
>>>> context, merits, and pros and cons; they are all great solutions! 
>>>> 
>>>> An appeal to the community members is to think one more time about having 

Re: [Discuss] Repair inside C*

2023-07-25 Thread Jeremiah Jordan
 +1 for the side car being the right location.

-Jeremiah

On Jul 25, 2023 at 1:16:14 PM, Chris Lohfink  wrote:

> I think a CEP is the next step. Considering the number of companies
> involved, this might necessitate several drafts and rounds of discussions.
> I appreciate your initiative in starting this process, and I'm eager to
> contribute to the ensuing discussions. Maybe in a google docs or something
> initially for more interactive feedback?
>
> In regards to https://issues.apache.org/jira/browse/CASSANDRA-14346 we at
> Netflix are actually putting effort currently to move this into the sidecar
> as the idea was to start moving non-read/write path things into different
> process and jvms to not impact each other.
>
> I think the sidecar/in process discussion might be a bit contentious as I
> know even things like compaction some feel should be moved out of process
> in future. On a personal note, my primary interest lies in seeing the
> implementation realized, so I am willing to support whatever consensus
> emerges. Whichever direction these go we will help with the implementation.
>
> Chris
>
> On Tue, Jul 25, 2023 at 1:09 PM Jaydeep Chovatia <
> [email protected]> wrote:
>
>> Sounds good, German. Feel free to let me know if you need my help
>> in filing CEP, adding supporting content to the CEP, etc.
>> As I mentioned previously, I have already been working (going through an
>> internal review) on creating a one-pager doc, code, etc., that has been
>> working for us for the last six years at an immense scale, and I will share
>> it soon on a private fork.
>>
>> Thanks,
>> Jaydeep
>>
>> On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev <
>> [email protected]> wrote:
>>
>>> In [2] we suggested that the next step should be a CEP.
>>>
>>> I am happy to lend a hand to this effort as well.
>>>
>>> Thanks Jaydeep and David - really appreciated.
>>>
>>> German
>>>
>>> --
>>> *From:* David Capwell 
>>> *Sent:* Tuesday, July 25, 2023 8:32 AM
>>> *To:* dev 
>>> *Cc:* German Eichberger 
>>> *Subject:* [EXTERNAL] Re: [Discuss] Repair inside C*
>>>
>>> As someone who has done a lot of work trying to make repair stable, I
>>> approve of this message ^_^
>>>
>>> More than glad to help mentor this work
>>>
>>> On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia <
>>> [email protected]> wrote:
>>>
>>> To clarify the repair solution timing, the one we have listed in the
>>> article is not the recently developed one. We were hitting some
>>> high-priority production challenges back in early 2018, and to address
>>> that, we developed and rolled out the solution in production in just a few
>>> months. The timing-wise, the solution was developed and productized by Q3
>>> 2018, of course, continued to evolve thereafter. Usually, we explore the
>>> existing solutions we can leverage, but when we started our journey in
>>> early 2018, most of the solutions were based on sidecar solutions. There is
>>> nothing against the sidecar solution; it was just a pure business decision,
>>> and in that, we wanted to avoid the sidecar to avoid a dependency on the
>>> control plane. Every solution developed has its deep context, merits, and
>>> pros and cons; they are all great solutions!
>>>
>>> An appeal to the community members is to think one more time about
>>> having repairs in the Open Source Cassandra itself. As mentioned in my
>>> previous email, any solution getting adopted is fine; the important aspect
>>> is to have a repair solution in the OSS Cassandra itself!
>>>
>>> Yours Faithfully,
>>> Jaydeep
>>>
>>> On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia <
>>> [email protected]> wrote:
>>>
>>> Hi German,
>>>
>>> The goal is always to backport our learnings back to the community. For
>>> example, I have already successfully backported the following two
>>> enhancements/bug fixes back to the Open Source Cassandra, which are
>>> described in the article. I am already currently working on open-source a
>>> few more enhancements mentioned in the article back to the open-source.
>>>
>>>1. https://issues.apache.org/jira/browse/CASSANDRA-18555
>>>2. https://issues.apache.org/jira/browse/CASSANDRA-13740
>>>
>>> There is definitely heavy interest in having the 

Re: [Discuss] Repair inside C*

2023-07-25 Thread Chris Lohfink
I think a CEP is the next step. Considering the number of companies
involved, this might necessitate several drafts and rounds of discussions.
I appreciate your initiative in starting this process, and I'm eager to
contribute to the ensuing discussions. Maybe in a google docs or something
initially for more interactive feedback?

In regards to https://issues.apache.org/jira/browse/CASSANDRA-14346 we at
Netflix are actually putting effort currently to move this into the sidecar
as the idea was to start moving non-read/write path things into different
process and jvms to not impact each other.

I think the sidecar/in process discussion might be a bit contentious as I
know even things like compaction some feel should be moved out of process
in future. On a personal note, my primary interest lies in seeing the
implementation realized, so I am willing to support whatever consensus
emerges. Whichever direction these go we will help with the implementation.

Chris

On Tue, Jul 25, 2023 at 1:09 PM Jaydeep Chovatia 
wrote:

> Sounds good, German. Feel free to let me know if you need my help
> in filing CEP, adding supporting content to the CEP, etc.
> As I mentioned previously, I have already been working (going through an
> internal review) on creating a one-pager doc, code, etc., that has been
> working for us for the last six years at an immense scale, and I will share
> it soon on a private fork.
>
> Thanks,
> Jaydeep
>
> On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev <
> [email protected]> wrote:
>
>> In [2] we suggested that the next step should be a CEP.
>>
>> I am happy to lend a hand to this effort as well.
>>
>> Thanks Jaydeep and David - really appreciated.
>>
>> German
>>
>> --
>> *From:* David Capwell 
>> *Sent:* Tuesday, July 25, 2023 8:32 AM
>> *To:* dev 
>> *Cc:* German Eichberger 
>> *Subject:* [EXTERNAL] Re: [Discuss] Repair inside C*
>>
>> As someone who has done a lot of work trying to make repair stable, I
>> approve of this message ^_^
>>
>> More than glad to help mentor this work
>>
>> On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia 
>> wrote:
>>
>> To clarify the repair solution timing, the one we have listed in the
>> article is not the recently developed one. We were hitting some
>> high-priority production challenges back in early 2018, and to address
>> that, we developed and rolled out the solution in production in just a few
>> months. The timing-wise, the solution was developed and productized by Q3
>> 2018, of course, continued to evolve thereafter. Usually, we explore the
>> existing solutions we can leverage, but when we started our journey in
>> early 2018, most of the solutions were based on sidecar solutions. There is
>> nothing against the sidecar solution; it was just a pure business decision,
>> and in that, we wanted to avoid the sidecar to avoid a dependency on the
>> control plane. Every solution developed has its deep context, merits, and
>> pros and cons; they are all great solutions!
>>
>> An appeal to the community members is to think one more time about having
>> repairs in the Open Source Cassandra itself. As mentioned in my previous
>> email, any solution getting adopted is fine; the important aspect is to
>> have a repair solution in the OSS Cassandra itself!
>>
>> Yours Faithfully,
>> Jaydeep
>>
>> On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia <
>> [email protected]> wrote:
>>
>> Hi German,
>>
>> The goal is always to backport our learnings back to the community. For
>> example, I have already successfully backported the following two
>> enhancements/bug fixes back to the Open Source Cassandra, which are
>> described in the article. I am already currently working on open-source a
>> few more enhancements mentioned in the article back to the open-source.
>>
>>1. https://issues.apache.org/jira/browse/CASSANDRA-18555
>>2. https://issues.apache.org/jira/browse/CASSANDRA-13740
>>
>> There is definitely heavy interest in having the repair solution inside
>> the Open Source Cassandra itself, very much like Compaction. As I write
>> this email, we are internally working on a one-pager proposal doc to all
>> the community members on having a repair inside the OSS Apache Cassandra
>> along with our private fork - I will share it soon.
>>
>> Generally, we are ok with any solution getting adopted (either Joey's
>> solution or our repair solution or any other solution). The primary
>> motivation is to have the repair embedded inside the open-source Cassandra

Re: [Discuss] Repair inside C*

2023-07-25 Thread Jaydeep Chovatia
Sounds good, German. Feel free to let me know if you need my help in filing
the CEP, adding supporting content to it, etc.
As I mentioned previously, I have already been working (going through an
internal review) on creating a one-pager doc, code, etc., that has been
working for us for the last six years at an immense scale, and I will share
it soon on a private fork.

Thanks,
Jaydeep

On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev <
[email protected]> wrote:

> In [2] we suggested that the next step should be a CEP.
>
> I am happy to lend a hand to this effort as well.
>
> Thanks Jaydeep and David - really appreciated.
>
> German
>
> --
> *From:* David Capwell 
> *Sent:* Tuesday, July 25, 2023 8:32 AM
> *To:* dev 
> *Cc:* German Eichberger 
> *Subject:* [EXTERNAL] Re: [Discuss] Repair inside C*
>
> As someone who has done a lot of work trying to make repair stable, I
> approve of this message ^_^
>
> More than glad to help mentor this work
>
> On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia 
> wrote:
>
> To clarify the repair solution timing, the one we have listed in the
> article is not the recently developed one. We were hitting some
> high-priority production challenges back in early 2018, and to address
> that, we developed and rolled out the solution in production in just a few
> months. The timing-wise, the solution was developed and productized by Q3
> 2018, of course, continued to evolve thereafter. Usually, we explore the
> existing solutions we can leverage, but when we started our journey in
> early 2018, most of the solutions were based on sidecar solutions. There is
> nothing against the sidecar solution; it was just a pure business decision,
> and in that, we wanted to avoid the sidecar to avoid a dependency on the
> control plane. Every solution developed has its deep context, merits, and
> pros and cons; they are all great solutions!
>
> An appeal to the community members is to think one more time about having
> repairs in the Open Source Cassandra itself. As mentioned in my previous
> email, any solution getting adopted is fine; the important aspect is to
> have a repair solution in the OSS Cassandra itself!
>
> Yours Faithfully,
> Jaydeep
>
> On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia <
> [email protected]> wrote:
>
> Hi German,
>
> The goal is always to backport our learnings back to the community. For
> example, I have already successfully backported the following two
> enhancements/bug fixes back to the Open Source Cassandra, which are
> described in the article. I am already currently working on open-source a
> few more enhancements mentioned in the article back to the open-source.
>
>1. https://issues.apache.org/jira/browse/CASSANDRA-18555
>2. https://issues.apache.org/jira/browse/CASSANDRA-13740
>
> There is definitely heavy interest in having the repair solution inside
> the Open Source Cassandra itself, very much like Compaction. As I write
> this email, we are internally working on a one-pager proposal doc to all
> the community members on having a repair inside the OSS Apache Cassandra
> along with our private fork - I will share it soon.
>
> Generally, we are ok with any solution getting adopted (either Joey's
> solution or our repair solution or any other solution). The primary
> motivation is to have the repair embedded inside the open-source Cassandra
> itself, so we can retire all various privately developed solutions
> eventually :)
>
> I am also happy to help (drive conversation, discussion, etc.) in any way
> to have a repair solution adopted inside Cassandra itself, please let me
> know. Happy to help!
>
> Yours Faithfully,
> Jaydeep
>
> On Mon, Jul 24, 2023 at 1:44 PM German Eichberger via dev <
> [email protected]> wrote:
>
> All,
>
> We had a brief discussion in [2] about the Uber article [1] where they
> talk about having integrated repair into Cassandra and how great that is. I
> expressed my disappointment that they didn't work with the community on
> that (Uber, if you are listening time to make amends 🙂) and it turns out
> Joey already had the idea and wrote the code [3] - so I wanted to start a
> discussion to gauge interest and maybe how to revive that effort.
>
> Thanks,
> German
>
> [1]
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
>
>
>


Re: [Discuss] Repair inside C*

2023-07-25 Thread German Eichberger via dev
In [2] we suggested that the next step should be a CEP.

I am happy to lend a hand to this effort as well.

Thanks Jaydeep and David - really appreciated.

German


From: David Capwell 
Sent: Tuesday, July 25, 2023 8:32 AM
To: dev 
Cc: German Eichberger 
Subject: [EXTERNAL] Re: [Discuss] Repair inside C*

As someone who has done a lot of work trying to make repair stable, I approve 
of this message ^_^

More than glad to help mentor this work

On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia  
wrote:

To clarify the repair solution timing, the one we have listed in the article is 
not the recently developed one. We were hitting some high-priority production 
challenges back in early 2018, and to address that, we developed and rolled out 
the solution in production in just a few months. The timing-wise, the solution 
was developed and productized by Q3 2018, of course, continued to evolve 
thereafter. Usually, we explore the existing solutions we can leverage, but 
when we started our journey in early 2018, most of the solutions were based on 
sidecar solutions. There is nothing against the sidecar solution; it was just a 
pure business decision, and in that, we wanted to avoid the sidecar to avoid a 
dependency on the control plane. Every solution developed has its deep context, 
merits, and pros and cons; they are all great solutions!

An appeal to the community members is to think one more time about having 
repairs in the Open Source Cassandra itself. As mentioned in my previous email, 
any solution getting adopted is fine; the important aspect is to have a repair 
solution in the OSS Cassandra itself!

Yours Faithfully,
Jaydeep

On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia wrote:
Hi German,

The goal is always to backport our learnings back to the community. For 
example, I have already successfully backported the following two 
enhancements/bug fixes back to the Open Source Cassandra, which are described 
in the article. I am already currently working on open-source a few more 
enhancements mentioned in the article back to the open-source.

  1.  https://issues.apache.org/jira/browse/CASSANDRA-18555
  2.  https://issues.apache.org/jira/browse/CASSANDRA-13740

There is definitely heavy interest in having the repair solution inside the 
Open Source Cassandra itself, very much like Compaction. As I write this email, 
we are internally working on a one-pager proposal doc to all the community 
members on having a repair inside the OSS Apache Cassandra along with our 
private fork - I will share it soon.

Generally, we are ok with any solution getting adopted (either Joey's solution 
or our repair solution or any other solution). The primary motivation is to 
have the repair embedded inside the open-source Cassandra itself, so we can 
retire all various privately developed solutions eventually :)

I am also happy to help (drive conversation, discussion, etc.) in any way to 
have a repair solution adopted inside Cassandra itself, please let me know. 
Happy to help!

Yours Faithfully,
Jaydeep

On Mon, Jul 24, 2023 at 1:44 PM German Eichberger via dev 
<[email protected]> wrote:
All,

We had a brief discussion in [2] about the Uber article [1], where they talk 
about having integrated repair into Cassandra and how great that is. I 
expressed my disappointment that they didn't work with the community on that 
(Uber, if you are listening, time to make amends 🙂), and it turns out Joey 
already had the idea and wrote the code [3] - so I wanted to start a discussion 
to gauge interest and maybe figure out how to revive that effort.

Thanks,
German

[1] https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
[2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
[3] https://issues.apache.org/jira/browse/CASSANDRA-14346



Re: [Discuss] Repair inside C*

2023-07-25 Thread David Capwell
As someone who has done a lot of work trying to make repair stable, I approve 
of this message ^_^

More than glad to help mentor this work




Re: [Discuss] Repair inside C*

2023-07-24 Thread Jaydeep Chovatia
To clarify the timing of the repair solution: the one we have listed in the
article is not recently developed. We were hitting some high-priority
production challenges back in early 2018, and to address them, we developed
and rolled out the solution in production in just a few months. Timing-wise,
the solution was developed and productized by Q3 2018 and, of course,
continued to evolve thereafter. Usually, we explore existing solutions we can
leverage, but when we started our journey in early 2018, most of the
available options were sidecar-based. There is nothing against the sidecar
approach; it was purely a business decision: we wanted to avoid a sidecar so
as not to take a dependency on the control plane. Every solution developed
has its own deep context, merits, and trade-offs; they are all great
solutions!

An appeal to the community members: please think one more time about having
repair in Open Source Cassandra itself. As mentioned in my previous email,
any solution getting adopted is fine; the important aspect is that OSS
Cassandra itself ships a repair solution!

Yours Faithfully,
Jaydeep



Re: [Discuss] Repair inside C*

2023-07-24 Thread Jaydeep Chovatia
Hi German,

The goal is always to contribute our learnings back to the community. For
example, I have already successfully backported the following two
enhancements/bug fixes, both described in the article, to Open Source
Cassandra. I am currently working on open-sourcing a few more enhancements
mentioned in the article.

   1. https://issues.apache.org/jira/browse/CASSANDRA-18555
   2. https://issues.apache.org/jira/browse/CASSANDRA-13740

There is definitely heavy interest in having the repair solution inside Open
Source Cassandra itself, very much like Compaction. As I write this email, we
are internally working on a one-pager proposal for the community on having
repair inside OSS Apache Cassandra, along with our private fork - I will
share it soon.
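To make the comparison with compaction concrete, here is a minimal, purely
illustrative sketch of how an in-process scheduler might walk a node's token
ownership in small subranges, much as compaction picks up work incrementally.
None of these class or method names are Cassandra APIs; they are assumptions
for illustration only.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (not Cassandra's actual API): split a node's owned
// token range into a fixed number of contiguous subranges, so an embedded
// repair scheduler could process one bounded slice at a time.
public class SubrangeSplitter {
    // Split [start, end) into `parts` contiguous subranges; the last subrange
    // absorbs any division remainder so the union exactly covers the input.
    static List<BigInteger[]> split(BigInteger start, BigInteger end, int parts) {
        List<BigInteger[]> out = new ArrayList<>();
        BigInteger step = end.subtract(start).divide(BigInteger.valueOf(parts));
        BigInteger cur = start;
        for (int i = 0; i < parts; i++) {
            BigInteger next = (i == parts - 1) ? end : cur.add(step);
            out.add(new BigInteger[] { cur, next });
            cur = next;
        }
        return out;
    }

    public static void main(String[] args) {
        // Walk a toy token range in 4 slices, as a scheduler loop might.
        for (BigInteger[] r : split(BigInteger.ZERO, BigInteger.valueOf(1000), 4)) {
            System.out.println("repair subrange [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Keeping each repair session bounded to a small subrange is the same property
operators get today from external subrange-repair tooling; an in-process
scheduler would simply own that loop itself.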

Generally, we are OK with any solution getting adopted (Joey's solution, our
repair solution, or any other). The primary motivation is to have repair
embedded inside open-source Cassandra itself, so we can eventually retire the
various privately developed solutions :)

I am also happy to help (drive conversations, discussions, etc.) in any way
to get a repair solution adopted inside Cassandra itself; please let me know.
Happy to help!

Yours Faithfully,
Jaydeep
