Kinda. It isn’t that you have to repair twice per se; it’s that having enough time to run repairs at least twice before gc_grace_seconds elapses guarantees that every tombstone is covered by at least one repair before the grace period expires.
Imagine a tombstone created on the very first node that Reaper looked at in a repair cycle, one second after Reaper completed repair of that particular token range. The repair run completes, but that tombstone just missed being part of the effort.

Now your next repair run happens. What if Reaper doesn’t look at that same node first? That is easy to have happen, because there is a bunch of logic around detecting existing repairs and things taking too long. So the box that was “the first node” in the first run gets, through bad luck, kicked down to later in the second run. I’ve seen nodes get skipped multiple times (you can tune to reduce that, but bottom line: it happens). So, bad luck you’ve got.

Eventually the node does get repaired, and the aging tombstone finally gets removed. All fine and dandy… provided the second repair run got to that point BEFORE you hit gc_grace_seconds. That’s why you need enough time to run it twice: you need enough time to catch the oldest possible tombstone, even if it is dealt with at the very end of a repair run. Yes, it sounds like a bit of a degenerate case, but if you are writing a lot of data, the probability of the degenerate case never becoming a real case becomes vanishingly small.

R

From: Sergio <lapostadiser...@gmail.com>
Date: Wednesday, January 22, 2020 at 1:41 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Reid Pinchback <rpinchb...@tripadvisor.com>
Subject: Re: Is there any concern about increasing gc_grace_seconds from 5 days to 8 days?

I was wondering if I should always complete two repair cycles with Reaper even if one repair cycle finishes in 7 hours. Currently I have around 200GB of column family data to be repaired; I was scheduling one repair a week and was not seeing too much stress on my 8-node cluster of i3.xlarge nodes.
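The worst case described above can be sketched numerically. This is a hypothetical illustration (the run time and units are made up, not from the thread): a tombstone created just after its token range is repaired misses the current run entirely, and with back-to-back runs it may not be touched until the very end of the next run, so its age at removal approaches twice the repair run time.

```python
# Worst-case tombstone lifetime under periodic Reaper repairs.
# Hypothetical numbers; units are days.
repair_run_time = 3.5   # assumed duration of one full repair run
gc_grace = 8.0          # gc_grace_seconds expressed in days

# The tombstone is created immediately AFTER its range was repaired,
# so it misses the current run. It is then handled at the very END
# of the next run, which (with back-to-back runs) finishes roughly
# two run-times after the tombstone was written.
worst_case_age_at_removal = 2 * repair_run_time

# The tombstone is removed in time only if two full runs fit
# inside the grace period.
safe = worst_case_age_at_removal <= gc_grace
print(worst_case_age_at_removal, safe)  # 7.0 True
```

With a run time over half the grace period, `safe` flips to False, which is exactly the failure mode the paragraph above warns about.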
Thanks, Sergio

On Wed, Jan 22, 2020 at 08:28 Sergio <lapostadiser...@gmail.com> wrote:

Thank you very much! Yes, I am using Reaper.

Best, Sergio

On Wed, Jan 22, 2020, 8:00 AM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:

Sergio, if you’re looking for a new repair frequency because of the change, and you are using Reaper, I’d go for repair_freq <= gc_grace / 2. Just serendipity with a conversation I was having at work this morning: when you actually watch the Reaper logs, you can see situations where unlucky timing with skipped nodes can make the time to remove a tombstone be up to 2 x repair_run_time. If you aren’t using Reaper, your mileage will vary, particularly if your repairs are consistent in their ordering across nodes. Reaper can be moderately non-deterministic, hence the need to be sure you can complete at least two repair runs.

R

From: Sergio <lapostadiser...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, January 21, 2020 at 7:13 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Is there any concern about increasing gc_grace_seconds from 5 days to 8 days?

Thank you very much for your response. The considerations mentioned are the ones I was expecting, so I believe I am good to go. I just wanted to make sure there was no need to run any extra command besides that one.
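Reid's rule of thumb, repair_freq <= gc_grace / 2, works out as follows for the 8-day grace period under discussion (a simple arithmetic sketch, not Reaper configuration):

```python
# repair_freq <= gc_grace / 2, using the thread's 8-day example.
gc_grace_seconds = 8 * 24 * 3600             # 8 days = 691200 seconds
max_repair_interval = gc_grace_seconds // 2  # 345600 seconds

print(max_repair_interval / 86400)  # 4.0 -> start a run at least every 4 days
```

By this rule, a once-a-week schedule (7-day interval) leaves room for only one full run inside an 8-day grace window, even when each run itself finishes in hours.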
Best, Sergio

On Tue, Jan 21, 2020, 3:55 PM Jeff Jirsa <jji...@gmail.com> wrote:

Note that if you're actually running repairs within 5 days and you adjust this to 8, you may stream a bunch of tombstones across in that 5-8 day window, which can increase disk usage / compaction (because as you pass 5 days, one replica may gc away the tombstones while the others may not, because the tombstones shadow data, so you'll re-stream the tombstones to the other replicas).

On Tue, Jan 21, 2020 at 3:28 PM Elliott Sims <elli...@backblaze.com> wrote:

In addition to extra space, queries can potentially be more expensive because more dead rows and tombstones will need to be scanned. How much of a difference this makes will depend drastically on the schema and access pattern, but I wouldn't expect going from 5 days to 8 to be very noticeable.

On Tue, Jan 21, 2020 at 2:14 PM Sergio <lapostadiser...@gmail.com> wrote:

https://stackoverflow.com/a/22030790

For CQLSH:

alter table <table_name> with GC_GRACE_SECONDS = <seconds>;

On Tue, Jan 21, 2020 at 1:12 PM Sergio <lapostadiser...@gmail.com> wrote:

Hi guys! I just wanted to confirm with you before doing such an operation. I expect the space usage to increase but nothing more than that. I need to perform just:

UPDATE COLUMN FAMILY cf with GC_GRACE = 691,200; //8 days

Is it correct?

Thanks, Sergio