This statement is significant: “BTW if you never delete and only ttl your 
values at a constant value, you can set gc=0 and forget about periodic repair 
of the table, saving some space, IO, CPU, and an operational step.”

Setting gc_grace_seconds to zero also has the effect of disabling hinted 
handoffs for that table, I believe: hints older than gc_grace_seconds are 
dropped rather than replayed, because replaying a stale hint could make 
deleted data reappear. “Periodic repair” refers to running “nodetool repair” 
(aka anti-entropy repair).

I too have wondered whether setting gc_grace_seconds to zero and skipping 
“nodetool repair” is safe.

We’re using C* 2.0.6. In the 2.0.x versions, with vnodes, “nodetool repair …” 
is very slow (see https://issues.apache.org/jira/browse/CASSANDRA-5220 and 
https://issues.apache.org/jira/browse/CASSANDRA-6611). We found anti-entropy 
repairs via “nodetool repair” unacceptably slow, even when we restricted them 
to a single table, and the repairs often hung or failed. We also tried 
subrange repairs and the other options.

Our app does no deletes and only rarely updates a row (when bad data needs to 
be replaced). So it’s very tempting to set gc_grace_seconds = 0 in the table 
definitions and skip anti-entropy repairs.
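For concreteness, that would just be a per-table setting; a minimal sketch, 
using the metrics_5min table that appears later in this thread:

```
ALTER TABLE metrics_5min WITH gc_grace_seconds = 0;
```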

But there is Cassandra documentation warning that repairs are necessary even 
if you don’t do deletes. For example, 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html 
says:

     Note: If deletions never occur, you should still schedule regular 
repairs. Be aware that setting a column to null is a delete.

The apache wiki 
https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair says:

     Unless your application performs no deletes, it is strongly recommended 
that production clusters run nodetool repair periodically on all nodes in the 
cluster.

     *IF* your operations team is sufficiently on the ball, you can get by 
without repair as long as you do not have hardware failure -- in that case, 
HintedHandoff<https://wiki.apache.org/cassandra/HintedHandoff> is adequate to 
repair successful updates that some replicas have missed. Hinted handoff is 
active for max_hint_window_in_ms after a replica fails.

     Full repair or re-bootstrap is necessary to re-replicate data lost to 
hardware failure (see below).

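For context, the hint window referenced above is a node-level setting in 
cassandra.yaml (the value below is the 2.0-era default of three hours):

```
# cassandra.yaml: hints stop being collected for a dead replica once it
# has been down longer than this window (default: 3 hours)
max_hint_window_in_ms: 10800000
```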
So, if there are hardware failures, “nodetool repair” is needed.  And 
http://planetcassandra.org/general-faq/ says:

     Anti-Entropy Node Repair – For data that is not read frequently, or to 
update data on a node that has been down for an extended period, the node 
repair process (also referred to as anti-entropy repair) ensures that all 
data on a replica is made consistent. Node repair (using the nodetool 
utility) should be run routinely as part of regular cluster maintenance 
operations.

If RF=2, the read consistency level is ONE, and a write failed to get 
replicated to the second node, might a read served by the replica that missed 
the write incorrectly report the data as missing?
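The scenario can be sketched in a hypothetical cqlsh session (CONSISTENCY is 
a cqlsh command, not CQL; the table and values are illustrative only):

```
CONSISTENCY ONE;
-- the coordinator consults a single replica; if that replica missed the
-- write and no hint or repair has fixed it, this read can return no rows
SELECT val FROM metrics_5min
 WHERE object_id = 'server-1' AND metric = 'cpu_idle'
   AND ts >= '2014-04-06 00:00:00' AND ts <= '2014-04-06 23:59:59';

CONSISTENCY ALL;
-- with RF=2 this consults both replicas, so a copy that did arrive is
-- found, at the cost of failing the read if either replica is down
```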

It seems to me that the need to run “nodetool repair” reflects a design bug; it 
should be automated.

Don

From: Laing, Michael [mailto:[email protected]]
Sent: Sunday, April 06, 2014 11:31 AM
To: [email protected]
Subject: Re: Timeseries with TTL

Since you are using LeveledCompactionStrategy there is no major/minor 
compaction - just compaction.

Leveled compaction does more work - your logs don't look unreasonable to me - 
the real question is whether your nodes can keep up with the IO. SSDs work 
best.

BTW if you never delete and only ttl your values at a constant value, you can 
set gc=0 and forget about periodic repair of the table, saving some space, IO, 
CPU, and an operational step.

If your nodes cannot keep up with the IO, switch to 
SizeTieredCompactionStrategy and monitor read response times. Or add SSDs.
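Switching is a per-table change; a sketch against the metrics_5min table 
discussed below:

```
ALTER TABLE metrics_5min
  WITH compaction = {'class': 'SizeTieredCompactionStrategy'};
```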

In my experience, for smallish nodes running C* 2 without SSDs, 
LeveledCompactionStrategy can cause the disk cache to churn, reducing read 
performance substantially. So watch out for that.

Good luck,

Michael

On Sun, Apr 6, 2014 at 10:25 AM, Vicent Llongo 
<[email protected]> wrote:
Hi,

Most of the queries to that table are just getting a range of values for a 
metric:
SELECT val FROM metrics_5min WHERE object_id = ? AND metric = ? 
AND ts >= ? AND ts <= ?

I'm not sure from the logs what kind of compactions they are. This is what I 
see in system.log (grepping for that specific table):

...
INFO [CompactionExecutor:742] 2014-04-06 13:30:11,223 CompactionTask.java (line 105) Compacting [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14991-Data.db'), SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14990-Data.db')]
INFO [CompactionExecutor:753] 2014-04-06 13:35:22,495 CompactionTask.java (line 105) Compacting [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14992-Data.db'), SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14993-Data.db')]
INFO [CompactionExecutor:770] 2014-04-06 13:41:09,146 CompactionTask.java (line 105) Compacting [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14995-Data.db'), SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14994-Data.db')]
INFO [CompactionExecutor:783] 2014-04-06 13:46:21,250 CompactionTask.java (line 105) Compacting [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14996-Data.db'), SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14997-Data.db')]
INFO [CompactionExecutor:798] 2014-04-06 13:51:28,369 CompactionTask.java (line 105) Compacting [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14998-Data.db'), SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14999-Data.db')]
INFO [CompactionExecutor:816] 2014-04-06 13:57:17,585 CompactionTask.java (line 105) Compacting [SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-15000-Data.db'), SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-15001-Data.db')]
...

As you can see, there's a compaction going on every ~5 minutes.


On Sun, Apr 6, 2014 at 4:33 PM, Sergey Murylev 
<[email protected]> wrote:
Hi Vincent,



> Is that a good pattern for Cassandra? Are there some compaction tunings I 
> should take into account?

Actually it depends on how you use Cassandra :). If you use it as key-value 
storage, TTL works fine. But if you run rather complex CQL queries against 
this table, I'm not sure it would work as well.



> With this structure it's obvious that after one week of inserting data, 
> there are going to be new expired columns every 5 minutes in that table. 
> Because of that I've noticed that this table is being compacted every 5 
> minutes.

Compaction isn't triggered when a column expires. It is triggered according 
to the compaction strategy, and expired data is only purged once it is older 
than the gc_grace_seconds timeout. You can see a more detailed description of 
LeveledCompactionStrategy in the following article: Leveled compaction in 
Cassandra<http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra>.

There are 2 types of compaction: minor and major. Which kind of compaction do 
you see, and how did you come to the conclusion that compaction is triggered 
every 5 minutes? If you see major compactions, the situation is very bad; 
otherwise it is the normal case.

--
Thanks,
Sergey


On 06/04/14 15:48, Vicent Llongo wrote:
Hi there,
I have this table where I'm inserting timeseries values with a TTL of 86400*7 
(1 week):

CREATE TABLE metrics_5min (
  object_id varchar,
  metric varchar,
  ts timestamp,
  val double,
  PRIMARY KEY ((object_id, metric), ts)
)
WITH gc_grace_seconds = 86400
AND compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 
100};
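Note the TTL itself is applied per write rather than in the table definition; 
an illustrative insert (the values are made up):

```
-- 86400 * 7 = 604800 seconds = 1 week
INSERT INTO metrics_5min (object_id, metric, ts, val)
VALUES ('server-1', 'cpu_idle', '2014-04-06 13:30:00', 87.5)
USING TTL 604800;
```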

With this structure it's obvious that after one week of inserting data, there 
are going to be new expired columns every 5 minutes in that table. Because of 
that I've noticed that this table is being compacted every 5 minutes.

Is that a good pattern for Cassandra? Is there some compaction tunings I should 
take into account?
Thanks!



