Hi Kristijonas,
It is not possible to run two repairs, regardless of whether they are
incremental or full, for the same token range and on the same table
concurrently. You have two options:
1. create schedules that don't overlap, e.g. run incremental repair
daily except on the 1st of each month, and run full repair on the 1st of
each month. If you choose to do this, make sure you set up a monitoring
and alerting system for it and have someone respond to the alerts on
weekends and public holidays. If a repair takes longer than usual and is
at risk of overlapping with the next repair, timely human intervention
is required to prevent that - either kill the currently running repair
or skip the next one.
2. use an orchestration tool, such as Cassandra Reaper, to take care of
that for you. You will still need monitoring and alerting to ensure the
repairs run successfully, but fixing a stuck or failed repair is not
very time sensitive. You can usually leave it until Monday morning if it
happens on Friday night.
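
For illustration, the non-overlapping schedule in option 1 could be
sketched roughly as below. This is only a minimal sketch, not a tested
scheduler: the function names and the keyspace argument are hypothetical,
and how you detect that the previous repair is still running is left to
the caller as an assumption. It relies on incremental being the default
mode of "nodetool repair" in 4.x, with "--full" requesting a full repair.

```python
from datetime import date
from typing import List


def repair_type_for(day: date) -> str:
    """Full repair on the 1st of the month, incremental on every other day."""
    return "full" if day.day == 1 else "incremental"


def build_repair_command(day: date, keyspace: str) -> List[str]:
    """Build the nodetool command for the repair type scheduled on `day`.

    `keyspace` is a hypothetical placeholder for your own keyspace name.
    """
    cmd = ["nodetool", "repair"]
    if repair_type_for(day) == "full":
        cmd.append("--full")  # incremental is the default, so no flag otherwise
    cmd.append(keyspace)
    return cmd


def next_action(day: date, previous_still_running: bool) -> str:
    """Skip today's repair if the previous one has not finished yet.

    How `previous_still_running` is determined is an assumption left to
    the caller (e.g. your own monitoring); this function only decides.
    """
    if previous_still_running:
        return "skip"
    return repair_type_for(day)
```

A cron job or similar could call next_action() once a day and only run
the command from build_repair_command() when the result is not "skip".
Someone still has to be alerted when a repair is skipped or overruns.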
Personally I would recommend the 2nd option, because getting back to
your laptop at 10 pm on Friday night after you have had a few beers is
not fun.
Cheers,
Bowen
On 03/02/2024 01:59, Kristijonas Zalys wrote:
Hi Bowen,
Thank you for your help!
So given that we would need to run both incremental and full repair
for a given cluster, is it safe to have both types of repair running
for the same token ranges at the same time? Would it not create a race
condition?
Thanks,
Kristijonas
On Fri, Feb 2, 2024 at 3:36 PM Bowen Song via user
<user@cassandra.apache.org> wrote:
Hi Kristijonas,
To answer your questions:
1. It's still necessary to run full repair on a cluster on which
incremental repair is run periodically. The frequency of full
repair is more of an art than a science. Generally speaking, the
less reliable the storage media, the more frequently full repair
should be run. The documentation on this topic is available here
<https://cassandra.apache.org/doc/stable/cassandra/operating/repair.html#incremental-and-full-repairs>
2. Running incremental repair for the first time on an existing
cluster does cause Cassandra to re-compact all SSTables, and can
lead to disk usage spikes. This can be avoided by following the
steps mentioned here
<https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsRepairNodesMigration.html>
I hope that helps.
Cheers,
Bowen
On 02/02/2024 20:57, Kristijonas Zalys wrote:
Hi folks,
I am working on switching from full to incremental repair in
Cassandra v4.0.6 (soon to be v4.1.3) and I have a few questions.
1. Is it necessary to run regular full repair on a cluster if I
already run incremental repair? If yes, what frequency would
you recommend for full repair?
2. Has anyone experienced disk usage spikes while using
incremental repair? I have noticed temporary disk footprint
increases of up to 2x (from ~15 GiB to ~30 GiB) caused by
anti-compaction while testing and am wondering how likely
that is to happen in bigger real world use cases?
Thank you all in advance!
Kristijonas