There is a talk from Cassandra Summit 2016 about coordinator nodes by Eric Lubow from SimpleReach. He explains how you can use join_ring=false for that.
On Tue, Dec 20, 2016 at 10:23 PM, kurt Greaves <[email protected]> wrote:

> It seems that you're correct in saying that writes don't propagate to a
> node that has join_ring set to false, so I'd say this is a flaw. In reality
> I can't see many actual use cases in regards to node outages with the
> current implementation. The main usage I'd think would be to have
> additional coordinators for CPU heavy workloads.
>
> It seems that to make it actually useful for repairs/outages we'd need to
> have another option to turn on writes so that it behaved similarly to write
> survey mode (but on already bootstrapped nodes).
>
> Is there a reason we don't have this already? Or does it exist somewhere
> I'm not aware of?
>
> On 20 December 2016 at 17:40, Anuj Wadehra <[email protected]> wrote:
>
>> No responses yet :)
>>
>> Any C* expert who could help on join_ring use case and the concern raised?
>>
>> Thanks
>> Anuj
>>
>> On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra
>> <[email protected]> wrote:
>> Hi,
>>
>> I need to understand the use case of join_ring=false in case of node
>> outages. As per https://issues.apache.org/jira/browse/CASSANDRA-6961,
>> you would want join_ring=false when you have to repair a node before
>> bringing it back after some considerable outage. The problem I see with
>> join_ring=false is that, unlike autobootstrap, the node will NOT accept
>> writes while you are running repair on it. If a node was down for 5 hours
>> and you bring it back with join_ring=false, repair the node for 7 hours and
>> then make it join the ring, it will STILL have missed writes, because
>> during the time repair was running (7 hrs), writes only went to the other
>> nodes.
>> So, if you want to make sure that reads served by the restored node at CL
>> ONE will return consistent data after the node has joined, you won't get
>> that, as writes have been missed while the node was being repaired. And if
>> you work with Read/Write CL=QUORUM, even if you bring back the node without
>> join_ring=false, you would get the desired consistency anyway. So how would
>> join_ring provide any additional consistency in this case?
>>
>> I can see join_ring=false being useful only when I am restoring from a
>> snapshot or bootstrapping and there are dropped mutations in my cluster
>> which are not fixed by hinted handoff.
>>
>> For example: 3 nodes A, B, C working at Read/Write CL QUORUM. Hinted
>> Handoff = 3 hrs.
>> 10 AM: Snapshot taken on all 3 nodes.
>> 11 AM: Node B goes down for 4 hours.
>> 3 PM: Node B comes up but data is not repaired. So, 1 hr of dropped
>> mutations (2-3 PM) is not fixed via Hinted Handoff.
>> 5 PM: Node A crashes.
>> 6 PM: Node A restored from 10 AM snapshot, Node A started with
>> join_ring=false, repaired and then joined the cluster.
>>
>> In the above restore-from-snapshot example, updates from 2-3 PM were
>> outside the hinted handoff window of 3 hours. Thus, node B won't get those
>> updates. Node A's data for 2-3 PM is already lost. So, the 2-3 PM updates
>> are on only one replica, i.e. node C, and the minimum consistency needed is
>> QUORUM, so join_ring=false would help. But this is a very specific use
>> case.
>>
>> Thanks
>> Anuj
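
To make the snapshot timeline in Anuj's example easier to follow, here is a rough Python sketch that works out which replicas hold each hour's writes at the moment node A has been restored from the 10 AM snapshot and before any repair has run. It is only a toy model of the scenario as described in the mail, not of Cassandra itself. The assumptions are mine: RF=3 with every node replicating all data, writes bucketed by the hour, and hints kept only for the first 3 hours of node B's downtime. The function and constant names are made up for illustration.

```python
# Toy model of the A/B/C example; hours are on a 24-hour clock and each
# write is labelled by the hour in which it starts (14 means 2-3 PM).
HINT_WINDOW_HOURS = 3   # "Hinted Handoff = 3 hrs" from the example
SNAPSHOT_HOUR = 10      # snapshot taken at 10 AM
B_DOWN = (11, 15)       # node B is down from 11 AM to 3 PM
A_CRASH_HOUR = 17       # node A crashes at 5 PM
QUORUM = 2              # quorum for RF=3 (assumed replication factor)


def replicas_holding(write_hour):
    """Nodes holding a write made in [write_hour, write_hour + 1), evaluated
    right after node A is restored from the snapshot and before repair."""
    holders = {"C"}  # node C never goes down in the example

    b_down_start, b_down_end = B_DOWN
    b_was_up = not (b_down_start <= write_hour < b_down_end)
    # Assumption: hints are stored only for the first HINT_WINDOW_HOURS of
    # the outage, so later writes are never replayed to node B.
    hinted = write_hour < b_down_start + HINT_WINDOW_HOURS
    if b_was_up or hinted:
        holders.add("B")

    # Node A was restored from the snapshot, so it only has older writes.
    if write_hour < SNAPSHOT_HOUR:
        holders.add("A")
    return holders


# Writes from 9 AM up to the 5 PM crash of node A.
below_quorum = [h for h in range(9, A_CRASH_HOUR)
                if len(replicas_holding(h)) < QUORUM]

for hour in range(9, A_CRASH_HOUR):
    print(hour, sorted(replicas_holding(hour)))
print("hours held by fewer than QUORUM replicas:", below_quorum)
```

In this model only the 2-3 PM bucket ends up on a single replica (node C), which is the case where repairing node A before it joins the ring matters for QUORUM reads.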
