Thanks - I'm getting a better picture of things. So "entropy" is the tendency of a C* datastore to become inconsistent because writes/updates do not reach ALL nodes that carry a replica of a row (which can happen if nodes are down for maintenance). It can also happen due to node crashes/restarts, which can result in the loss of uncommitted data. The result is either stale data or ghost data (a column/row re-appearing after a delete). So there are "anti-entropy" processes in place to help with this:
- hinted handoff
- read repair (can happen while performing a consistent read, OR asynchronously as driven/configured by *_read_repair_chance AFTER a consistent read)
- commit logs
- explicit/manual repair via command
- compaction (an indirect mechanism to purge tombstones, thereby ensuring that stale data will NOT resurrect)
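To keep those knobs concrete, here is a minimal sketch of where they live as table options (in the pre-4.0 era), using the DataStax Python driver. The contact point, keyspace/table names, and all option values are illustrative assumptions only, not something from this thread:

from cassandra.cluster import Cluster

# Contact point, keyspace, table and option values below are illustrative only.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS metrics
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# default_time_to_live: rows written to this table expire (TTL) after 7 days.
# gc_grace_seconds: how long tombstones are kept before compaction may purge them.
# dclocal_read_repair_chance: probability of the async read repair mentioned above
# (a table option that existed before Cassandra 4.0).
session.execute("""
    CREATE TABLE IF NOT EXISTS metrics.readings (
        sensor_id text,
        ts        timestamp,
        value     double,
        PRIMARY KEY (sensor_id, ts)
    ) WITH default_time_to_live = 604800
      AND gc_grace_seconds = 864000
      AND dclocal_read_repair_chance = 0.1
""")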
So for an application where the data is purely time-series, or where data is only ever inserted, I would like to understand the need for manual repair. I see/hear advice that there should always be a periodic (mostly weekly) manual/explicit repair in a C* system, and that is what I am trying to understand. Repair is a really expensive process, and I would like to be able to justify the need to expend resources on it (when and how much). Among other things, this advice also gives people not familiar with C* (e.g. me) the impression that it is too fragile and needs substantial manual intervention.

Appreciate all the feedback and details that you have been sharing.

From: Edward Capriolo <edlinuxg...@gmail.com>
Date: Monday, February 27, 2017 at 8:00 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Benjamin Roth <benjamin.r...@jaumo.com>
Subject: Re: Is periodic manual repair necessary?

There are 4 anti-entropy systems in Cassandra:

Hinted handoff
Read repair
Commit logs
Repair command

All are basically best effort. Commit logs get corrupt and only flush periodically. Bits rot on disk and while crossing networks. Read repair is async and only happens randomly. Hinted handoff stops after some time and is not guaranteed.

On Monday, February 27, 2017, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:

Thanks Roth and Oskar for your quick responses.

This is a single-datacenter, multi-rack setup.

> A TTL is technically similar to a delete - in the end both create tombstones.
> If you want to eliminate the possibility of resurrected deleted data, you should run repairs.

So why do I need to worry about data resurrection? Because the TTL for the data is specified at the row level (at least in this case), i.e. across ALL columns across ALL replicas. So the replicas will all have the same data or won't have the data at all (i.e. it would have been tombstoned).

> If you can guarantee 100% that data is read-repaired before gc_grace_seconds after the data has been TTL'ed, you won't need an extra repair.

Why read-repaired before gc_grace_seconds? Isn't gc_grace_seconds the grace period for compaction to occur? So if the data was not consistent and read repair happens before that, then well and good. Does read repair not happen after gc/compaction? If this table has data being constantly/periodically inserted, then compaction will also happen accordingly, right?

Thanks,
Jayesh

From: Benjamin Roth <benjamin.r...@jaumo.com>
Date: Monday, February 27, 2017 at 11:53 AM
To: <user@cassandra.apache.org>
Subject: Re: Is periodic manual repair necessary?

A TTL is technically similar to a delete - in the end both create tombstones. If you want to eliminate the possibility of resurrected deleted data, you should run repairs. If you can guarantee 100% that data is read-repaired before gc_grace_seconds after the data has been TTL'ed, you won't need an extra repair.

2017-02-27 18:29 GMT+01:00 Oskar Kjellin <oskar.kjel...@gmail.com>:

Are you running multi-DC?

Sent from my iPad

On 27 Feb 2017, at 16:08, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:

Suppose I have an application where there are no deletes, only 5-10% of rows are occasionally updated (and that too only once), and there are a lot of reads.
Furthermore, I have replication = 3 and both reads and writes are configured for LOCAL_QUORUM. Occasionally, servers do go into maintenance. I understand that when the maintenance lasts longer than the period for which hinted handoffs are preserved, the hints are lost and servers may have stale data. But I do expect that to be rectified on reads. If the stale data is not read again, I don't care for it to be corrected, as the data will then be automatically purged because of the TTL. In such a situation, do I need to have a periodic (weekly?) manual/batch repair process?

Thanks,
Jayesh Thakrar

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

--
Sorry, this was sent from mobile. Will do less grammar and spell check than usual.
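As an aside to the gc_grace_seconds discussion above: if you do schedule repairs, the usual rule of thumb is that a full repair cycle (plus some slack for the hinted-handoff window) should complete within gc_grace_seconds, so that every replica learns about a tombstone before any replica is allowed to purge it. A rough back-of-the-envelope sketch in Python, using illustrative default values only (not a recommendation):

# Illustrative values: the default gc_grace_seconds (10 days), the default
# max_hint_window_in_ms (3 hours), and the commonly suggested weekly repair.
GC_GRACE_SECONDS = 10 * 24 * 3600
HINT_WINDOW_SECONDS = 3 * 3600
REPAIR_INTERVAL_SECONDS = 7 * 24 * 3600   # e.g. a weekly "nodetool repair -pr" per node

def repair_fits_in_gc_grace(repair_interval, hint_window, gc_grace):
    """A repair cycle plus hint slack should finish inside gc_grace so that
    no replica purges a tombstone another replica has never received."""
    return repair_interval + hint_window < gc_grace

print(repair_fits_in_gc_grace(REPAIR_INTERVAL_SECONDS,
                              HINT_WINDOW_SECONDS,
                              GC_GRACE_SECONDS))   # True: ~7 days + 3 hours < 10 days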