Re: Catastrophy Recovery.

Jean Tremblay Mon, 15 Jun 2015 08:59:51 -0700

That is really wonderful. Thank you very much Alain. You gave me a lot of 
trails to investigate. Thanks again for your help.


On 15 Jun 2015, at 17:49, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

Hi, it looks like you're starting to use Cassandra.

Welcome.

I invite you to read from here as much as you can 
http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html.

When a node loses some data, you have various anti-entropy mechanisms:

Hinted Handoff --> For writes that occurred while the node was down and known 
as such by other nodes (exclusively).
Read repair --> On each read, you can set a chance to check other nodes for 
auto correction.
Repair (called either manual / anti-entropy / full / ...) --> Takes care of 
giving a node back its missing data, either only for the range this node 
handles (-pr) or for all its data (its range plus its replicas). This is 
something you generally want to perform on all nodes on a regular basis (at an 
interval shorter than the lowest gc_grace_seconds set on any of your tables).
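
To make the repair-interval rule concrete, here is a minimal sketch in Python. 
It only does the arithmetic and does not talk to Cassandra. The table names and 
the function names are hypothetical, and it assumes the grace period in 
question is the per-table gc_grace_seconds option (864000 seconds, i.e. 10 
days, by default).

```python
from typing import Mapping

SECONDS_PER_DAY = 24 * 60 * 60


def lowest_gc_grace_seconds(gc_grace_by_table: Mapping[str, int]) -> int:
    """Return the smallest gc_grace_seconds across all tables."""
    return min(gc_grace_by_table.values())


def repair_interval_is_safe(interval_seconds: int,
                            gc_grace_by_table: Mapping[str, int]) -> bool:
    """True if every node is repaired more often than the lowest grace period."""
    return interval_seconds < lowest_gc_grace_seconds(gc_grace_by_table)


if __name__ == "__main__":
    # Hypothetical tables: one on the default of 10 days, one lowered to 2 days.
    tables = {"ks.events": 864000, "ks.sessions": 2 * SECONDS_PER_DAY}
    weekly = 7 * SECONDS_PER_DAY
    daily = SECONDS_PER_DAY
    print("weekly repair safe:", repair_interval_is_safe(weekly, tables))
    print("daily repair safe:", repair_interval_is_safe(daily, tables))
```

With these hypothetical values a weekly repair would be too slow for the table 
whose grace period was lowered to 2 days, while a daily one would fit.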

Also, you are getting wrong values because your Consistency Level (CL) is 
probably too low. If you want this to never happen, you have to set the Read 
(R) / Write (W) consistency levels as follows: R + W > RF (Replication Factor). 
If not, you can see what you are currently seeing. I advise you to set your 
consistency to "local_quorum", or "quorum" on a single DC environment. Also, 
with 3 nodes, you should set RF to 3; if not, you won't be able to reach strong 
consistency with a node down, due to the formula I just gave you (with RF 2, a 
quorum is both replicas).
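
The R + W > RF rule can be checked with a few lines of Python. This is only a 
sketch of the arithmetic, not driver code: the function names are made up for 
illustration, and it assumes a single DC, so that "local_quorum" and "quorum" 
count the same replicas.

```python
def quorum(rf: int) -> int:
    """Replicas needed for a quorum: floor(RF / 2) + 1."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must answer for a given consistency level (single DC)."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "TWO":
        return 2
    if level in ("QUORUM", "LOCAL_QUORUM"):
        return quorum(rf)
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when R + W > RF, i.e. read and write replica sets must overlap."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


def nodes_down_tolerated_at_quorum(rf: int) -> int:
    """Replicas that may be down while a quorum can still be reached."""
    return rf - quorum(rf)


if __name__ == "__main__":
    for rf in (2, 3):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"ONE/ONE strong={is_strongly_consistent('ONE', 'ONE', rf)}, "
              f"QUORUM/QUORUM strong="
              f"{is_strongly_consistent('QUORUM', 'QUORUM', rf)}, "
              f"replicas that may be down="
              f"{nodes_down_tolerated_at_quorum(rf)}")
```

With RF 2 a quorum needs both replicas, so quorum reads and writes stop working 
as soon as one replica is down. With RF 3 a quorum is 2, so one replica can be 
down.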

There is a lot more to know, and you should read about all of this. Using 
Cassandra without knowing about its internals will lead you to very poor and 
unexpected results.

To answer your questions:

"For what I understand, if you have a fixed node with no data it will 
automatically bootstrap and recover all its old data from its neighbour while 
doing the joining phase. Is this correct?"

--> Not at all, unless it joins the ring for the first time, which is not your 
case. Though it will (by default) slowly recover while you read, through read 
repair.

"After such catastrophe, and after the joining phase is done should the cluster 
not be ready to deliver always consistent data if there was no inserts or 
delete during the catastrophe?"

No, we can't ensure that, except by dropping the node and bootstrapping a new 
one. What we can make sure of is that there are enough replicas remaining to 
serve consistent data (search for RF and CL).

"After the bootstrap of a broken node is finish, i.e. after the joining phase, 
is there not simply a repair to be done on that node using “node repair"?"

This sentence is false: the bootstrap / joining phase is not the same as a 
broken node coming back. You are right about repair: if a broken node (or one 
that was down for too long, 3 hours by default) comes back, you have to repair 
it. But repair is slow, so make sure you can afford to lose a node; see my 
previous answer.
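
As a rough illustration of the "down for too long" rule, here is a small 
Python sketch. It assumes the 3 hours correspond to the default of 
max_hint_window_in_ms in cassandra.yaml (10800000 ms); the function name and 
the data_lost flag are hypothetical. It is a simplification, since hints are 
best effort and a regular repair is still needed anyway.

```python
from datetime import timedelta

# Default of max_hint_window_in_ms (10800000 ms), i.e. 3 hours.
DEFAULT_HINT_WINDOW = timedelta(milliseconds=10_800_000)


def needs_repair(downtime: timedelta,
                 data_lost: bool = False,
                 hint_window: timedelta = DEFAULT_HINT_WINDOW) -> bool:
    """Decide whether a returning node should be repaired.

    Other nodes stop storing hints once the node has been down for longer
    than the hint window, so writes after that point are missing. A node
    that lost its data files needs a repair whatever the downtime was.
    """
    return data_lost or downtime > hint_window


if __name__ == "__main__":
    print(needs_repair(timedelta(minutes=20)))                  # short outage
    print(needs_repair(timedelta(hours=5)))                     # past the window
    print(needs_repair(timedelta(minutes=20), data_lost=True))  # wiped disk
```

The third call is the scenario tested in this thread: the outage was short, but 
the data directory was wiped, so hints alone cannot bring the node back in sync.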

Testing is a really good idea but you also have to read a lot imho.

Good luck,

C*heers,

Alain


2015-06-15 11:13 GMT+02:00 Jean Tremblay <jean.tremb...@zen-innovations.com>:

Hi,

I have a cluster of 3 nodes RF: 2.
There are about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml; only the IP address for seeds and the 
throughput have been changed.

I have tested a scenario where one node crashes and loses all its data.
I deleted all data on this node after having stopped Cassandra.
At this point I noticed that the cluster was giving proper results, which is 
what I was expecting from a clustered DB.

I then restarted that node and I observed that the node was joining the cluster.
After an hour or so the old “defect” node was up and normal.
I noticed that its hard disk was loaded with much less data than its neighbours'.

When I was querying the DB, the cluster was giving me different results for 
successive identical queries.
I guess the old “defect” node was giving me fewer rows than it should have.

1) From what I understand, if you have a fixed node with no data it will 
automatically bootstrap and recover all its old data from its neighbours while 
doing the joining phase. Is this correct?
2) After such a catastrophe, and after the joining phase is done, should the 
cluster not be ready to always deliver consistent data if there were no inserts 
or deletes during the catastrophe?
3) After the bootstrap of a broken node is finished, i.e. after the joining 
phase, is there not simply a repair to be done on that node using “node repair”?


Thanks for your comments.

Kind regards

Jean
