Re: Join_ring=false Use Cases

Matija Gobec Tue, 20 Dec 2016 13:27:42 -0800

There is a talk from Cassandra Summit 2016 about coordinator nodes by Eric
Lubow from SimpleReach. He explains how you can use join_ring=false for that.


On Tue, Dec 20, 2016 at 10:23 PM, kurt Greaves <[email protected]> wrote:

> It seems that you're correct in saying that writes don't propagate to a
> node that has join_ring set to false, so I'd say this is a flaw. In reality
> I can't see many actual use cases for node outages with the
> current implementation. The main usage I'd think would be to have
> additional coordinators for CPU heavy workloads.
>
> It seems that to make it actually useful for repairs/outages we'd need to
> have another option to turn on writes so that it behaves similarly to write
> survey mode (but on already bootstrapped nodes).
>
> Is there a reason we don't have this already? Or does it exist somewhere
> I'm not aware of?
>
> On 20 December 2016 at 17:40, Anuj Wadehra <[email protected]> wrote:
>
>> No responses yet :)
>>
>> Could any C* expert help with the join_ring use case and the concern raised?
>>
>> Thanks
>> Anuj
>>
>> On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra
>> <[email protected]> wrote:
>> Hi,
>>
>> I need to understand the use case of join_ring=false in case of node
>> outages. As per https://issues.apache.org/jira/browse/CASSANDRA-6961,
>> you would want join_ring=false when you have to repair a node before
>> bringing it back after a considerable outage. The problem I see with
>> join_ring=false is that, unlike autobootstrap, the node will NOT accept
>> writes while you are running repair on it. If a node was down for 5 hours
>> and you bring it back with join_ring=false, repair the node for 7 hours and
>> then make it join the ring, it will STILL have missed writes, because during
>> the 7 hours the repair was running, writes went only to the other nodes.
>> So, if you want to make sure that reads served by the restored node at CL
>> ONE will return consistent data after the node has joined, you won't get
>> that, as writes were missed while the node was being repaired. And if
>> you work with Read/Write CL=QUORUM, you would get the desired consistency
>> anyway, even if you bring back the node without join_ring=false. So, how
>> would join_ring=false provide any additional consistency in this case?
>>
>> I can see join_ring=false being useful only when I am restoring from a
>> snapshot or bootstrapping and there are dropped mutations in my cluster
>> which are not fixed by hinted handoff.
>>
>> For example: 3 nodes A, B, C working at Read/Write CL QUORUM. Hinted
>> handoff window = 3 hrs.
>> 10 AM: Snapshot taken on all 3 nodes.
>> 11 AM: Node B goes down for 4 hours.
>> 3 PM: Node B comes up but its data is not repaired. So, 1 hr of dropped
>> mutations (2-3 PM) is not fixed via hinted handoff.
>> 5 PM: Node A crashes.
>> 6 PM: Node A is restored from the 10 AM snapshot, started with
>> join_ring=false, repaired and then joined to the cluster.
>>
>> In the snapshot restore example above, updates from 2-3 PM were outside
>> the hinted handoff window of 3 hours. Thus, node B won't get those updates.
>> Node A's data for 2-3 PM is already lost. So, the 2-3 PM updates are on only
>> one replica, i.e. node C, and the minimum consistency needed is QUORUM, so
>> join_ring=false would help. But this is a very specific use case.
>>
>> Thanks
>> Anuj
>>
>>
>
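
The arithmetic behind the quoted snapshot example can be checked with a small sketch. This is only a model of the timeline as described in the thread, not Cassandra code: the function names `unhinted_window` and `quorum`, the calendar date, and the replica sets are illustrative assumptions, and it assumes hints are kept for exactly the hinted handoff window counted from the moment the node goes down.

```python
from datetime import datetime, timedelta


def unhinted_window(down_at, up_at, hint_window):
    """Return (start, end) of the outage not covered by hints, or None.

    Hypothetical helper: models hints as being stored only for
    `hint_window` after the node goes down.
    """
    hints_end = down_at + hint_window
    if up_at <= hints_end:
        return None  # the whole outage is covered by hints
    return hints_end, up_at


def quorum(replication_factor):
    """Number of replicas needed for QUORUM."""
    return replication_factor // 2 + 1


# Timeline from the quoted example (the date itself is arbitrary).
day = datetime(2016, 12, 13)
b_down = day.replace(hour=11)           # 11 AM: node B goes down
b_up = b_down + timedelta(hours=4)      # 3 PM: node B comes back
gap = unhinted_window(b_down, b_up, timedelta(hours=3))  # 2 PM to 3 PM

# Replicas still holding the writes made during the gap:
# B was down and got no hints for them, A was restored from the 10 AM snapshot.
replicas_with_gap_writes = {"A", "B", "C"} - {"B"} - {"A"}
needs_repair_before_join = len(replicas_with_gap_writes) < quorum(3)

print(gap)
print(replicas_with_gap_writes, needs_repair_before_join)
```

With a replication factor of 3, QUORUM needs 2 replicas, and only node C holds the 2-3 PM writes, which is why repairing node A before it serves reads matters in that scenario.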
