> using CL=ONE (read) and CL=ALL (write).

With this setting you are saying the application should fail in the case of a network partition: when a partition occurs you are choosing Consistency over Availability. Mixing the CL levels in response to a partition will make it difficult to reason about the consistency of the data. Consider other approaches, such as CL QUORUM.
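As a rough sketch of the quorum arithmetic (plain Python, illustrative only — not driver code; the segment sizes are taken from the setup quoted below):

```python
# Illustrative quorum arithmetic for Cassandra consistency levels.
# QUORUM needs a majority of the replicas: RF // 2 + 1.

def replicas_required(cl, rf):
    """Live replicas needed to satisfy a consistency level (simplified)."""
    return {"ONE": 1, "TWO": 2, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def can_serve(cl, rf, live_replicas):
    """Can a request at this CL succeed with this many reachable replicas?"""
    return live_replicas >= replicas_required(cl, rf)

rf = 6                                    # RF=6 as in the setup below
print(replicas_required("QUORUM", rf))    # QUORUM on RF=6 needs 4 replicas
print(can_serve("QUORUM", rf, 4))         # S1 = {A,B,C,D}: True
print(can_serve("QUORUM", rf, 2))         # S2 = {E,F}: False
print(can_serve("ALL", rf, 4))            # CL=ALL fails in any partition: False
```

So with RF=6, a 4/2 split leaves QUORUM usable in the larger segment only, while CL=ALL is unusable in both.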
While using CL ONE for reads looks good, if you require strong consistency you have to use ALL for writes. QUORUM for reads and writes may be a better choice: using RF 3 and 6 nodes would give you pretty good availability in the face of node failures (for background see http://thelastpickle.com/2011/06/13/Down-For-Me/ ). Or relax consistency and use CL ONE for reads and CL QUORUM for writes. The writes are still sent to RF nodes, but we can no longer guarantee reads will see them. If the high RF is for

> Suppose that connectivity breaks down (for whatever reason) causing two
> isolated segments:
> S1 = {A,B,C,D} and S2 = {E,F}.

Do clients still have access to the entire cluster? Normally we would expect clients to try different nodes until they either fail or find a partition with enough UP nodes to service the request.

> to be able to write at all, the CL strategy definitely needs to be changed.
> In S1, for instance change to CL=QUORUM for both reads/writes
> In S2, CL(write) change to TWO/ONE/ANY. CL(read) may be changed to TWO

Whatever the choice, you can imagine a partition where the only thing that works for writes is CL ONE; e.g. if the cluster splits 3/3, QUORUM would not work.

> So now to the interesting question, what happens when S1 and S2 reestablish
> full connectivity again ?

If you are using CL ALL for writes the easiest thing to do is to stop writing when the cluster partitions, and resume when it comes back. If you drop the CL during writes, reads will be inconsistent until either Hinted Handoff has finished or you run repairs.

> It is extremely important that reads will continue to operate in both S1 and
> S2

If it's important that reads continue and are Consistent, I would look at RF 3 with QUORUM / QUORUM. If it's important that reads continue and consistency can be relaxed, I would look at RF 3 (or 6) with read ONE / write QUORUM.

Hope that helps.
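To make the read/write trade-off above concrete, here is a minimal sketch (plain Python, my own illustration) of the overlap rule: a read is guaranteed to see the latest write only when the replicas read plus the replicas written exceed RF.

```python
# Overlap rule sketch: reads see the latest write only if
# replicas_read + replicas_written > RF (read and write sets must intersect).

def cl_replicas(cl, rf):
    """Replicas that must acknowledge a request at this CL (simplified)."""
    return {"ONE": 1, "TWO": 2, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def strongly_consistent(read_cl, write_cl, rf):
    return cl_replicas(read_cl, rf) + cl_replicas(write_cl, rf) > rf

print(strongly_consistent("ONE", "ALL", 6))        # 1 + 6 > 6 -> True
print(strongly_consistent("QUORUM", "QUORUM", 3))  # 2 + 2 > 3 -> True
print(strongly_consistent("ONE", "QUORUM", 3))     # 1 + 2 > 3 -> False
```

This is why ONE/ALL and QUORUM/QUORUM give strong consistency, while ONE reads with QUORUM writes do not: the read may land on a replica the write quorum missed.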
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 11:00 PM, Robert Hellmans <robert.hellm...@aastra.com> wrote:

> Hi !
>
> I'm preparing the test below. I've found a lot of information about dead-node
> replacements and adding extra nodes to increase capacity, but didn't find
> anything about this segmentation issue. Anyone that can share
> experience/ideas ?
>
> Setup:
> Cluster with 6 nodes {A,B,C,D,E,F}, RF=6, using CL=ONE (read) and
> CL=ALL (write).
>
> Suppose that connectivity breaks down (for whatever reason) causing two
> isolated segments:
> S1 = {A,B,C,D} and S2 = {E,F}.
>
> Cluster connectivity anomalies will be detected by all nodes in this setup,
> so clients in S1 and S2 can be advised to change their CL strategy. It is
> extremely important that reads will continue to operate in both S1 and S2,
> and I don't see any reason why they shouldn't. It is almost that important
> that writes in each segment can continue, but to be able to write at all,
> the CL strategy definitely needs to be changed.
> In S1, for instance change to CL=QUORUM for both reads/writes.
> In S2, CL(write) change to TWO/ONE/ANY. CL(read) may be changed to TWO.
>
> During the connectivity breakdown, clients in both S1 and S2 simultaneously
> change/add/delete data.
>
> So now to the interesting question, what happens when S1 and S2 reestablish
> full connectivity again ?
> Again, the re-connectivity event will be detected, so should I trigger some
> special repair sequence ?
> Or should I've been doing some actions already when the connectivity broke ?
> What about connectivity dropout time, longer/shorter than max_hint_window ?
>
> Rds /Robert