Never mind, I see that if the leader/owner dies, the other replicas can simply use whoever has the highest count for the leader's bucket, even though that is not the authoritative number.
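
To make that concrete, here is a rough sketch of what I mean, not the actual Cassandra code. It assumes each replica keeps one count per leader bucket and that buckets are increment-only, so the highest count is also the most recent one (with decrements you would need a per-bucket logical clock instead). The names `ReplicaState`, `reconcile` and `counter_value` are made up for illustration.

```python
from typing import Dict, Iterable

# Hypothetical model: each replica maps leader id -> the count it has seen
# for that leader's bucket. These names are illustrative, not Cassandra's.
ReplicaState = Dict[str, int]


def reconcile(replicas: Iterable[ReplicaState]) -> ReplicaState:
    """Merge replica views by keeping the highest count per leader bucket."""
    merged: ReplicaState = {}
    for state in replicas:
        for leader, count in state.items():
            # Assumes increment-only buckets, so counts are never negative.
            merged[leader] = max(merged.get(leader, 0), count)
    return merged


def counter_value(state: ReplicaState) -> int:
    """The counter value is the sum over all leader buckets."""
    return sum(state.values())


# Leader A accepted 5 increments and then died; B and C survive.
replica_b = {"A": 5, "B": 2}           # B received all of A's bucket
replica_c = {"A": 3, "B": 2, "C": 4}   # C lagged behind on A's bucket

merged = reconcile([replica_b, replica_c])
print(merged, counter_value(merged))
```

If A accepted an increment that never reached any other replica before it died, that increment is still lost, which is why the merged number is the best available one rather than the authoritative one.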
On Tue, May 31, 2011 at 1:21 AM, Yang <teddyyyy...@gmail.com> wrote:
> thanks Sylvain, I agree with what you said for the first few paragraphs
> ---- Jeremy corrected me just now.
>
> regarding the last point, you are right in using the term "by operation",
> but you should also note that it's a leader
> "data ownership", in the meaning that the leader has the authoritative
> power when it comes to reconciliation on that
> bucket of count owned by the leader ----- yes you've convinced me that we
> DO need to use CL > ONE, but for the sake of
> argument, if CL = ONE is used, the leader's data loss causes the other
> replicas to not being able to reconcile, that's what I mean.
> but anyway it's not relevant now since CL can be > ONE
>
> but I'd really appreciate if you could give some review to my newer post on
> FIFO, I think that could be an interesting approach
>
> yang
>
> On Tue, May 31, 2011 at 12:59 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>
>> >apart from the questions, some thoughts on Counters:
>> >the idea of distributed counters can be seen, in distributed algorithms
>> terms, as a state machine (see Fred Schneider 93'), where ideally we send
>> the messages (delta increments) to each node, and the final state (sum of
>> deltas, or the counter value) is deduced independently at each node. in the
>> current implementation, it's really not a distributed state machine, since
>> state is deduced only at the leader, and what is replicated is just the
>> final state. in fact, the data from different leaders are orthogonal, and
>> within the data flow from one leader, it's really just a master-slave
>> system. then we realize that this system is prone to single master failure.
>>
>> Don't get fooled by the term 'leader': there is one leader *by
>> operation*, not one global leader. Again, the leader of an operation
>> is really just 'the first of the replica we're replicating to'.
>>
>> It's not more a master-slave design than regular writes are because
>> they use a distinguished coordinator node for each operation. And it's
>> not prone to single node failure because if you do counter increments
>> at CL.QUORUM against say a cluster with RF=3, then you will still be
>> able to write and read even if one node is down and which node exactly
>> doesn't matter at all.
>>
>> --
>> Sylvain
>>
>
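
For what it's worth, the availability point quoted above (CL.QUORUM against RF=3) can be checked with a few lines of arithmetic. This is only a sketch under the usual assumption that a quorum is a strict majority of the replicas; `quorum` and `available` are names I made up, and the replica names are placeholders.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Strict majority of rf replicas."""
    return rf // 2 + 1


def available(live_replicas: int, required: int) -> bool:
    """An operation succeeds if enough replicas are alive to answer."""
    return live_replicas >= required


rf = 3
replicas = ["n1", "n2", "n3"]  # placeholder node names
needed = quorum(rf)

# Take each node down in turn: the two survivors always form a quorum,
# so it does not matter which node failed.
one_down_ok = all(
    available(len(survivors), needed)
    for survivors in combinations(replicas, rf - 1)
)

# With two nodes down, a single survivor is no longer enough.
two_down_ok = available(rf - 2, needed)

# Any read quorum overlaps any write quorum, since needed + needed > rf.
quorums_overlap = needed + needed > rf

print(needed, one_down_ok, two_down_ok, quorums_overlap)
```

So with RF=3 a quorum is 2 replicas, any single node can be down, and a quorum read always touches at least one replica that saw the last quorum write.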