Re: Join_ring=false Use Cases

Anuj Wadehra Wed, 21 Dec 2016 19:17:19 -0800

Thanks, all!
I think the intent of the JIRA 
https://issues.apache.org/jira/browse/CASSANDRA-6961 was primarily to deal with 
stale information after outages and to give an opportunity to repair the data 
before a node joins the cluster. If a node started with join_ring=false doesn't 
accept writes while the repair is happening, the purpose of the JIRA is 
defeated, as the node will end up with stale information anyway. This seems to 
be a defect.


Thanks,
Anuj


    On Wednesday, 21 December 2016 2:53 AM, kurt Greaves <[email protected]> 
wrote:
 

 It seems that you're correct in saying that writes don't propagate to a node 
that has join_ring set to false, so I'd say this is a flaw. In reality I can't 
see many actual use cases with regard to node outages with the current 
implementation. The main usage I'd think would be to have additional 
coordinators for CPU-heavy workloads.

It seems that to make it actually useful for repairs/outages we'd need to have 
another option to turn on writes so that it behaved similarly to write survey 
mode (but on already bootstrapped nodes).

Is there a reason we don't have this already? Or does it exist somewhere I'm 
not aware of? 

On 20 December 2016 at 17:40, Anuj Wadehra <[email protected]> wrote:

No responses yet :)
Could any C* expert help with the join_ring use case and the concern raised?
Thanks,
Anuj 
 
 On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra<[email protected]> wrote:  
 Hi,
I need to understand the use case of join_ring=false in case of node outages. 
As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you would want 
join_ring=false when you have to repair a node before bringing it back after 
a considerable outage. The problem I see with join_ring=false is that, unlike 
autobootstrap, the node will NOT accept writes while you are running repair on 
it. If a node was down for 5 hours and you bring it back with join_ring=false, 
repair the node for 7 hours and then make it join the ring, it will STILL have 
missed writes, because while the repair was running (7 hrs), writes only went 
to the other nodes. So, if you want to make sure that reads served by the 
restored node at CL ONE will return consistent data after the node has joined, 
you won't get that, as writes have been missed while the node was being 
repaired. And if you work with Read/Write CL=QUORUM, even if you bring back the 
node without join_ring=false, you would get the desired consistency anyway. So, 
how would join_ring=false provide any additional consistency in this case?
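
To make the arithmetic in this scenario concrete, here is a minimal sketch in Python. The function name `write_coverage` and its parameters are hypothetical, and the model rests on a simplifying assumption: repair only brings in data written before the repair started, and the node receives no writes until it joins the ring.

```python
from datetime import timedelta

def write_coverage(outage, repair, accepts_writes_during_repair):
    """Split the node's time away from the ring into repaired and still-missing spans.

    Simplifying assumption (not from Cassandra itself): repair only brings in
    data that was written before the repair started.
    """
    fixed_by_repair = outage
    # If the node took writes during repair, nothing new would be missed.
    still_missing = timedelta(0) if accepts_writes_during_repair else repair
    return fixed_by_repair, still_missing

# The scenario from the message: 5 hours down, 7 hours of repair, no writes accepted.
fixed, missing = write_coverage(timedelta(hours=5), timedelta(hours=7), False)
print("fixed by repair:", fixed)
print("still missing at join:", missing)
```

Under these assumptions the node joins with a 7-hour gap even though the original 5-hour gap has been repaired.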
I can see join_ring=false being useful only when I am restoring from a Snapshot 
or bootstrapping and there are dropped mutations in my cluster which are not 
fixed by hinted handoff.
For example: 3 nodes A, B, C working at Read/Write CL QUORUM. Hinted Handoff = 3 
hrs.
10 AM: Snapshot taken on all 3 nodes.
11 AM: Node B goes down for 4 hours.
3 PM: Node B comes up but data is not repaired. So, 1 hr of dropped mutations 
(2-3 PM) is not fixed via Hinted Handoff.
5 PM: Node A crashes.
6 PM: Node A is restored from the 10 AM Snapshot, started with join_ring=false, 
repaired and then joined to the cluster.
In the above restore-from-snapshot example, updates from 2-3 PM were outside the 
hinted handoff window of 3 hours. Thus, node B won't get those updates. Node A's 
data for 2-3 PM is already lost. So, the 2-3 PM updates are on only one replica, 
i.e. node C, and the minimum consistency needed is QUORUM, so join_ring=false 
would help. But this is a very specific use case.  
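
The timeline above can be checked with a small sketch in Python. All names here are hypothetical, time is modelled in whole hours on a 24-hour clock, and it assumes that hints stored within the window are fully replayed and that every write reaches each replica that is up.

```python
HINT_WINDOW_H = 3          # hinted handoff window, in hours
SNAPSHOT_A = 10            # node A is restored from the 10 AM snapshot
B_DOWN, B_UP = 11, 15      # node B is down from 11 AM to 3 PM
QUORUM = 2                 # quorum for 3 replicas

def replicas_holding(hour):
    """Replicas holding a write made in [hour, hour + 1) once A is restored, before repair."""
    holders = {"C"}  # C is up for the whole timeline
    b_was_up = not (B_DOWN <= hour < B_UP)
    hint_stored = hour < B_DOWN + HINT_WINDOW_H
    if b_was_up or hint_stored:
        holders.add("B")
    # A only has what was in its snapshot.
    if hour < SNAPSHOT_A:
        holders.add("A")
    return holders

# Hours between the snapshot and A's restore where fewer than QUORUM replicas hold the write.
at_risk = [h for h in range(10, 18) if len(replicas_holding(h)) < QUORUM]
print(at_risk)
```

In this model the only hour held by a single replica is the one starting at 14:00 (2-3 PM), which is the window where a QUORUM read served by A and B could miss data unless A is repaired before it joins.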
Thanks,
Anuj
