Thanks for the explanation. It deserves a blog post.

Sergio
On Wed, Jan 22, 2020, 1:22 PM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:

The reaper logs will say if nodes are being skipped. The web UI isn't that good at making it apparent. You can sometimes tell it is likely happening when you see time gaps between parts of the repair. That happens when nodes are skipped because of a timeout, but not only then. The gaps are mostly controlled by the combined results of segmentCountPerNode, repairIntensity, and hangingRepairTimeoutMins. The last of those three is the most obvious influence on timeouts, but the other two have some impact on the work attempted and the size of the time gaps. The C* version also has some bearing, as it influences how hard it is to process the data needed for repairs.

The more subtle aspect of node skipping isn't the hanging repairs. When repair of a token range is first attempted, Reaper uses JMX to ask C* if a repair is already underway. The way it asks is very simplistic, so a "yes" doesn't mean a repair is underway for that particular token range; it just means something that looks like a repair is going on. Basically it asks "hey, is there a thread with the right magic naming pattern?" The problem, I think, is that when repair activity is triggered by reads and writes of inconsistent data, I believe it shows up as these kinds of threads too. If you have a C*-unfriendly usage pattern (where you write and then very soon read back), then logically you'd expect this to happen quite a lot.

I'm not an expert on the internals since I'm not one of the C* contributors, but having stared at that part of the source quite a bit this year, that's my take on what can happen. And if I'm correct, that's not a thing you can tune for. It is a consequence of C*-unfriendly usage patterns.

Bottom line though is that tuning repairs is only something you do if you find that repairs are taking longer than makes sense to you. It's totally separate from the notion that you should be able to run reaper-controlled repairs at least 2x per gc_grace_seconds. That's just a case of making some observations on the arithmetic of time intervals.
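A minimal sketch of acting on "the reaper logs will say if nodes are being skipped": the snippet below just scans a Reaper log for messages that suggest segments were postponed, skipped, or timed out. The log path is a placeholder and the exact message wording differs between Reaper versions, so treat the keywords as assumptions to adjust.

    # Sketch: scan a Reaper log for hints that segments were postponed or skipped.
    # The path is a placeholder; the keywords are assumptions, since the exact
    # wording of these messages differs between Reaper versions.
    import re

    LOG_PATH = "/var/log/cassandra-reaper/reaper.log"
    PATTERN = re.compile(r"postpon|skip|timed out", re.IGNORECASE)

    with open(LOG_PATH) as log:
        for line in log:
            if PATTERN.search(line):
                print(line.rstrip())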
From: Sergio <lapostadiser...@gmail.com>
Date: Wednesday, January 22, 2020 at 4:08 PM
To: Reid Pinchback <rpinchb...@tripadvisor.com>
Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Is there any concern about increasing gc_grace_seconds from 5 days to 8 days?

Thank you very much for your extended response.

Should I look for some particular message in the log to detect such behavior?

How do you tune it?

Thanks,

Sergio

On Wed, Jan 22, 2020, 12:59 PM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:

Kinda. It isn't that you have to repair twice per se, just that being able to run repairs at least twice before gc_grace_seconds elapses means every tombstone is sure to be covered by at least one repair before you hit your GC grace seconds.

Imagine a tombstone being created on the very first node that Reaper looked at in a repair cycle, but one second after Reaper completed repair of that particular token range. Repairs will be complete, but that particular tombstone just missed being part of the effort.

Now your next repair run happens. What if Reaper doesn't look at that same node first? It is easy for that to happen, as there is a bunch of logic related to detecting existing repairs or things taking too long. So the box that was "the first node" in that first repair run, through bad luck, gets kicked down to later in the second run. I've seen nodes get skipped multiple times (you can tune to reduce that, but bottom line… it happens). So, bad luck you've got. Eventually the node does get repaired, and the aging tombstone finally gets removed. All fine and dandy…

Provided that the second repair run got to that point BEFORE you hit your GC grace seconds.

That's why you need enough time to run it twice. You need enough time to catch the oldest possible tombstone, even if it is dealt with at the very end of a repair run. Yes, it sounds like a bit of a degenerate case, but if you are writing a lot of data, the probability of these degenerate cases never becoming real cases is vanishingly small.

R

From: Sergio <lapostadiser...@gmail.com>
Date: Wednesday, January 22, 2020 at 1:41 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Reid Pinchback <rpinchb...@tripadvisor.com>
Subject: Re: Is there any concern about increasing gc_grace_seconds from 5 days to 8 days?

I was wondering if I should always complete 2 repair cycles with Reaper even if one repair cycle finishes in 7 hours.

Currently, I have around 200GB in column family data size to be repaired. I was scheduling one repair a week and was not seeing too much stress on my 8-node cluster of i3.xlarge nodes.

Thanks,

Sergio

On Wed, Jan 22, 2020 at 08:28, Sergio <lapostadiser...@gmail.com> wrote:

Thank you very much! Yes, I am using Reaper!

Best,

Sergio

On Wed, Jan 22, 2020, 8:00 AM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:

Sergio, if you're looking for a new frequency for your repairs because of the change, and you are using Reaper, then I'd go for repair_freq <= gc_grace / 2.

Just serendipity with a conversation I was having at work this morning: when you actually watch the Reaper logs, you can see situations where unlucky timing with skipped nodes can make the time to remove a tombstone be up to 2 x repair_run_time.

If you aren't using Reaper, your mileage will vary, particularly if your repairs are consistent in their ordering across nodes. Reaper can be moderately non-deterministic, hence the need to be sure you can complete at least two repair runs.

R

From: Sergio <lapostadiser...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, January 21, 2020 at 7:13 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Is there any concern about increasing gc_grace_seconds from 5 days to 8 days?

Thank you very much for your response.

The considerations mentioned are the ones that I was expecting.

I believe that I am good to go.

I just wanted to make sure that there was no need to run any other extra command besides that one.

Best,

Sergio
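As a rough worked example of Reid's repair_freq <= gc_grace / 2 rule of thumb, a small sketch (the 7-hour figure is Sergio's observed run time from above; everything else follows from an 8-day gc_grace_seconds):

    # Sketch of the scheduling arithmetic for gc_grace_seconds = 8 days.
    DAY = 24 * 3600

    gc_grace_seconds = 8 * DAY                        # 691200 seconds
    max_repair_interval = gc_grace_seconds / 2        # repair at least twice per gc_grace
    repair_run_seconds = 7 * 3600                     # Sergio's observed full-run duration

    print(max_repair_interval / DAY)                  # 4.0 -> start a run at least every 4 days
    print(repair_run_seconds < max_repair_interval)   # True: a 7-hour run fits comfortably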
On Tue, Jan 21, 2020, 3:55 PM Jeff Jirsa <jji...@gmail.com> wrote:

Note that if you're actually running repairs within 5 days and you adjust this to 8, you may stream a bunch of tombstones across in that 5-8 day window, which can increase disk usage / compaction (because as you pass 5 days, one replica may gc away the tombstones while the others may not, because the tombstones shadow data, so you'll re-stream the tombstone to the other replicas).

On Tue, Jan 21, 2020 at 3:28 PM Elliott Sims <elli...@backblaze.com> wrote:

In addition to extra space, queries can potentially be more expensive because more dead rows and tombstones will need to be scanned. How much of a difference this makes will depend drastically on the schema and access pattern, but I wouldn't expect going from 5 days to 8 to be very noticeable.

On Tue, Jan 21, 2020 at 2:14 PM Sergio <lapostadiser...@gmail.com> wrote:

https://stackoverflow.com/a/22030790

For cqlsh:

alter table <table_name> with GC_GRACE_SECONDS = <seconds>;

On Tue, Jan 21, 2020 at 13:12, Sergio <lapostadiser...@gmail.com> wrote:

Hi guys!

I just wanted to confirm with you before doing such an operation. I expect the space to increase, but nothing more than that. I need to perform just:

UPDATE COLUMN FAMILY cf with GC_GRACE = 691,200; // 8 days

Is it correct?

Thanks,

Sergio
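For completeness, a sketch of applying and verifying the change end to end. The UPDATE COLUMN FAMILY form above is old cassandra-cli syntax; in CQL it is the ALTER TABLE statement already quoted in the thread, with the value written without the comma (691200). Host, keyspace, and table names below are placeholders, and the verification query assumes Cassandra 3.x+ (system_schema) and the DataStax Python driver.

    # Sketch: set gc_grace_seconds to 8 days and read it back.
    # Placeholders: contact point, keyspace, and table names.
    from cassandra.cluster import Cluster

    KEYSPACE, TABLE = "my_keyspace", "my_table"

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # 8 days expressed in seconds (691200); CQL does not accept "691,200"
    session.execute(
        f"ALTER TABLE {KEYSPACE}.{TABLE} WITH gc_grace_seconds = {8 * 24 * 3600}"
    )

    # Read the setting back from the schema tables (Cassandra 3.x+)
    row = session.execute(
        "SELECT gc_grace_seconds FROM system_schema.tables "
        "WHERE keyspace_name = %s AND table_name = %s",
        (KEYSPACE, TABLE),
    ).one()
    print(row.gc_grace_seconds)  # expect 691200

    cluster.shutdown()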