Thanks Sylvain, I agree with what you said in the first few paragraphs -- Jeremy corrected me on this just now.
Regarding the last point: you are right to use the term "per operation", but note that there is also leader "data ownership", in the sense that the leader has the authoritative power when it comes to reconciliation of the bucket of counts it owns. Yes, you've convinced me that we DO need to use CL > ONE; but for the sake of argument, if CL = ONE were used, the loss of the leader's data would leave the other replicas unable to reconcile. That's what I meant. Anyway, it's not relevant now, since CL can be > ONE.

I'd really appreciate it if you could review my newer post on FIFO; I think that could be an interesting approach.

yang

On Tue, May 31, 2011 at 12:59 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>
> > apart from the questions, some thoughts on Counters:
> >
> > the idea of distributed counters can be seen, in distributed-algorithms
> > terms, as a state machine (see Fred Schneider '93): ideally we send the
> > messages (delta increments) to each node, and the final state (the sum
> > of the deltas, i.e. the counter value) is deduced independently at each
> > node. In the current implementation it's really not a distributed state
> > machine, since state is deduced only at the leader, and what is
> > replicated is just the final state. In fact, the data from different
> > leaders are orthogonal, and within the data flow from one leader it's
> > really just a master-slave system. Then we realize that this system is
> > prone to single-master failure.
>
> Don't get fooled by the term 'leader': there is one leader *per
> operation*, not one global leader. Again, the leader of an operation is
> really just the first of the replicas we're replicating to.
>
> It's no more a master-slave design than regular writes are, because
> they use a distinguished coordinator node for each operation.
> And it's not prone to single-node failure, because if you do counter
> increments at CL.QUORUM against, say, a cluster with RF=3, then you
> will still be able to write and read even if one node is down, and
> exactly which node it is doesn't matter at all.
>
> --
> Sylvain
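To make the "data from different leaders are orthogonal" point concrete, here is a minimal sketch (my own illustration, not Cassandra's actual implementation): each leader owns an independent shard of the counter, the total is the sum over shards, and reconciliation only ever replaces a shard with its leader's authoritative value. For simplicity it assumes increments only, so the per-leader maximum is the authoritative value.

```python
from collections import defaultdict

class ShardedCounter:
    """Hypothetical sketch: one shard per leader; shards are orthogonal."""

    def __init__(self):
        # leader id -> that leader's partial sum of deltas
        self.shards = defaultdict(int)

    def increment(self, leader, delta):
        # Only the leader writes to its own shard.
        self.shards[leader] += delta

    def merge(self, other):
        # Reconciliation: per shard, keep the leader's most advanced value
        # (valid here because shards only grow under increments).
        for leader, value in other.shards.items():
            self.shards[leader] = max(self.shards[leader], value)

    def value(self):
        # The counter value is the sum over all leader shards.
        return sum(self.shards.values())
```

For example, if replica A's shard has advanced further on another node, merging takes A's larger shard value while B's shard is untouched; this is why losing a leader's shard data (at CL = ONE) would leave the others with nothing authoritative to reconcile against.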
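Sylvain's RF=3 / CL.QUORUM argument can be checked mechanically: with a quorum size of floor(RF/2)+1 = 2, any write quorum and any read quorum must share at least one replica, and a quorum can still be formed with one node down. A small self-contained check (replica names n1..n3 are made up for illustration):

```python
from itertools import combinations

RF = 3
QUORUM = RF // 2 + 1          # 2 when RF = 3
replicas = {"n1", "n2", "n3"}

# Any two quorums of size 2 out of 3 replicas must intersect, so a read
# at QUORUM always sees the latest write made at QUORUM.
for w in combinations(replicas, QUORUM):
    for r in combinations(replicas, QUORUM):
        assert set(w) & set(r), "quorums must intersect"

# With any single node down, the remaining nodes still form a quorum,
# so both reads and writes stay available.
for down in replicas:
    live = replicas - {down}
    assert len(live) >= QUORUM
```

This is why exactly which node is down doesn't matter: every 2-of-3 subset works equally well as a quorum.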