Thanks for the explanation, Sylvain!
2013/1/16 Sylvain Lebresne <[email protected]>:
>> I mean if a node is down, then we get that node up and running again,
>> wouldn't it be synchronized automatically?
>
> It will, thanks to hinted handoff (not gossip; gossip only handles the
> ring topology and a bunch of metadata, it doesn't deal with data
> synchronization at all). But hinted handoffs are not bulletproof (if
> only because hints expire after some time if they are not delivered).
> And you're right, that's probably why Carlos' example worked as he
> observed it, especially since he didn't mention reads between his
> stop/erase/restart steps. Anyway, my description of read_repair_chance
> is still correct if someone wonders about that :)
>
> --
> Sylvain
>
>> Thanks!
>>
>> Renato M.
>>
>> 2013/1/16 Carlos Pérez Miguel <[email protected]>:
>> > ahhhh, ok. Now I understand where the data came from. When using
>> > CL.ALL, read_repair always repairs inconsistent data.
>> >
>> > Thanks a lot, Sylvain.
>> >
>> > Carlos Pérez Miguel
>> >
>> > 2013/1/17 Sylvain Lebresne <[email protected]>
>> >>
>> >> You're missing the correct definition of read_repair_chance.
>> >>
>> >> When you do a read at CL.ALL, all replicas are waited upon and the
>> >> results from all those replicas are compared. From that, we can
>> >> extract which nodes are not up to date, i.e. which ones can be
>> >> read repaired. And if some node needs to be repaired, we do it.
>> >> Always, whatever the value of read_repair_chance is.
>> >>
>> >> Now if you do a read at CL.ONE and you only end up querying 1
>> >> replica, you will never be able to do read repair. That's where
>> >> read_repair_chance comes into play. What it really controls is how
>> >> often we query *more* replicas than strictly required by the
>> >> consistency level. And it happens that the reason you would want
>> >> to do that is read repair, hence the option name. But read repair
>> >> potentially kicks in any time more than one replica answers a
>> >> query. One corollary is that read_repair_chance has no impact
>> >> whatsoever at CL.ALL.
>> >>
>> >> --
>> >> Sylvain
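Both knobs Sylvain describes are visible from cassandra-cli:
read_repair_chance is a per column family attribute, and the
consistency level is set per cli session. A minimal sketch using the
KS1/CF1 names from Carlos' mail below (1.x-era cli syntax, so
double-check it against your version):

    use KS1;
    update column family CF1 with read_repair_chance=0.1;
    consistencylevel as ONE;
    get CF1['data1'];
    consistencylevel as ALL;
    get CF1['data1'];

With the CL.ONE read, the 0.1 means roughly 1 read in 10 queries extra
replicas and may repair them as a side effect; the CL.ALL read compares
all replicas and repairs any stale ones regardless of the setting.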
>> >> On Wed, Jan 16, 2013 at 1:55 PM, Carlos Pérez Miguel
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am trying to understand the read path in Cassandra. I've read
>> >>> Cassandra's documentation and it seems that the read path is like
>> >>> this:
>> >>>
>> >>> - The client contacts a proxy node, which performs the operation
>> >>> on a certain object
>> >>> - The proxy node sends requests to every replica of that object
>> >>> - Replica nodes answer eventually, if they are up
>> >>> - After the first R replicas answer, the proxy node returns the
>> >>> value to the client
>> >>> - If some of the replicas are not up to date and read repair is
>> >>> active, the proxy node updates those replicas
>> >>>
>> >>> Ok, so far so good.
>> >>>
>> >>> But now I found some inconsistencies that I don't understand:
>> >>>
>> >>> Let's suppose that we have a 5 node cluster, x1, x2, x3, x4 and
>> >>> x5, with replication factor 3, read_repair_chance=0.0,
>> >>> autobootstrap=false and caching=NONE.
>> >>> We have keyspace KS1 and column family CF1.
>> >>>
>> >>> With this configuration, we know that if any node crashes and
>> >>> erases its data directories, it will be necessary to run nodetool
>> >>> repair on that node in order to repair it and gather information
>> >>> from its replica companions.
>> >>>
>> >>> So, let's suppose that x1, x2 and x3 are the endpoints which
>> >>> store the data KS1.CF1['data1'].
>> >>> If x1 crashes (losing all its data) and we execute get
>> >>> KS1.CF1['data1'] with consistency level ALL, the operation will
>> >>> fail. That is ok to my understanding.
>> >>>
>> >>> If we restart node x1 without executing nodetool repair and
>> >>> repeat the operation get KS1.CF1['data1'] using consistency ALL,
>> >>> we will obtain the original data! Why? One of the nodes doesn't
>> >>> have any data about KS1.CF1['data1']. Ok, let's suppose that, as
>> >>> all the required nodes answer, even if one doesn't have data, the
>> >>> operation ends correctly.
>> >>>
>> >>> Now let's repeat the same procedure with the rest of the nodes,
>> >>> that is:
>> >>>
>> >>> 1- stop x1, erase data, logs, caches and commitlog from x1
>> >>> 2- restart x1 and don't repair it
>> >>> 3- stop x2, erase data, logs, caches and commitlog from x2
>> >>> 4- restart x2 and don't repair it
>> >>> 5- stop x3, erase data, logs, caches and commitlog from x3
>> >>> 6- restart x3 and don't repair it
>> >>> 7- execute get KS1.CF1['data1'] with consistency level ALL -> it
>> >>> still returns the correct data!
>> >>>
>> >>> Where did that data come from? The endpoint is supposed to be
>> >>> empty of data. I tried this using cassandra-cli and Cassandra's
>> >>> ruby client and the result is always the same. What did I miss?
>> >>>
>> >>> Thank you for reading until the end ;)
>> >>>
>> >>> Bye
>> >>>
>> >>> Carlos Pérez Miguel
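For completeness, steps 1-7 above map to something like the following
shell session, run on each of x1, x2 and x3 in turn. The paths are the
defaults of a packaged install (data_file_directories,
commitlog_directory and saved_caches_directory in cassandra.yaml);
adjust them and the service command to your setup:

    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data/KS1 \
                /var/lib/cassandra/commitlog \
                /var/lib/cassandra/saved_caches
    sudo service cassandra start   # deliberately no "nodetool repair" here
    # step 7, from any node:
    #   cassandra-cli -h x1
    #   use KS1; consistencylevel as ALL; get CF1['data1'];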

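And on the hints expiring that Sylvain mentions at the top of the
thread: how long a down node keeps accumulating hints is a per-node
setting in cassandra.yaml. The stock defaults from that era look like
this (check your own file before relying on them):

    hinted_handoff_enabled: true
    # stop creating hints for a node once it has been down longer than
    # this window (3 hours by default); this is why hints alone are not
    # bulletproof after a long outage
    max_hint_window_in_ms: 10800000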