Re: Is there a way to add a new node to a cluster but not sync old data?

Kai Wang Thu, 22 Jan 2015 05:52:27 -0800

At last year's summit there was a presentation from Instaclustr:
https://www.instaclustr.com/meetups/presentation-by-ben-bromhead-at-cassandra-summit-2014-san-francisco/
It could be the solution you are looking for. However, I don't see that the
code has been checked in or that a JIRA has been created, so for now you'd
better plan your capacity carefully.


On Wed, Jan 21, 2015 at 11:21 PM, Yatong Zhang <bluefl...@gmail.com> wrote:

> Yes, my cluster is almost full and there are lots of pending tasks. You
> helped me a lot and thank you Eric~
>
> On Thu, Jan 22, 2015 at 11:59 AM, Eric Stevens <migh...@gmail.com> wrote:
>
>> Yes, bootstrapping a new node will cause read loads on your existing
>> nodes - it is becoming the owner and replica of a whole new set of existing
>> data.  To do that it needs to know what data it's now responsible for, and
>> that's what bootstrapping is for.
>>
>> If you're at the point where bootstrapping a new node is placing a
>> too-heavy burden on your existing nodes, you may be dangerously close to or
>> even past the tipping point where you ought to have already grown your
>> cluster.  You need to grow your cluster as soon as possible, and chances
>> are you're close to no longer being able to keep up with compaction (see
>> nodetool compactionstats, make sure pending tasks is <5, preferably 0 or
>> 1).  Once you're falling behind on compaction, it becomes difficult to
>> successfully bootstrap new nodes, and you're in a very tough spot.
>>
>>
>> On Wed, Jan 21, 2015 at 7:43 PM, Yatong Zhang <bluefl...@gmail.com>
>> wrote:
>>
>>> Thanks for the reply. Bootstrapping the new node put a heavy burden on
>>> the whole cluster and I don't know why. So that's the issue I actually
>>> want to fix.
>>>
>>> On Mon, Jan 12, 2015 at 6:08 AM, Eric Stevens <migh...@gmail.com> wrote:
>>>
>>>> Yes, but it won't do what I suspect you're hoping for.  If you disable
>>>> auto_bootstrap in cassandra.yaml the node will join the cluster and will
>>>> not stream any old data from existing nodes.
>>>>
>>>> The cluster will now be in an inconsistent state.  If you bring enough
>>>> nodes online this way to violate your read consistency level (e.g. RF=3,
>>>> CL=Quorum, if you bring on 2 nodes this way), some of your queries will be
>>>> missing data that they ought to have returned.
>>>>
>>>> There is no way to bring a new node online and have it be responsible
>>>> just for new data, and have no responsibility for old data.  It *will* be
>>>> responsible for old data, it just won't *know* about the old data it
>>>> should be responsible for.  Executing a repair will fix this, but only
>>>> because the existing nodes will stream all the missing data to the new
>>>> node.  This will create more pressure on your cluster than just normal
>>>> bootstrapping would have.
>>>>
>>>> I can't think of any reason you'd want to do that unless you needed to
>>>> grow your cluster really quickly, and were ok with corrupting your old 
>>>> data.
>>>>
>>>> On Sat, Jan 10, 2015 at 12:39 AM, Yatong Zhang <bluefl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I am using C* 2.0.10 and I was trying to add a new node to a
>>>>> cluster (actually replacing a dead node). But after adding the new node,
>>>>> some other nodes in the cluster had a very high workload, which affected
>>>>> the performance of the whole cluster.
>>>>> So I am wondering: is there a way to add a new node and have this node
>>>>> take only new data?
>>>>>
>>>>
>>>>
>>>
>>
>
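
Eric's rule of thumb in the thread (check nodetool compactionstats and make
sure pending tasks is <5) can be turned into a small check. The sketch below
is only an illustration: the function names are hypothetical, the default
threshold of 5 is taken from his message, and it assumes the command's output
contains a line of the form "pending tasks: N", which may differ between
Cassandra versions.

```python
import re


def pending_compactions(compactionstats_output: str) -> int:
    """Extract the pending task count from `nodetool compactionstats` text.

    Assumes the output has a line like "pending tasks: 12"; the exact
    layout is an assumption and may vary by Cassandra version.
    """
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def compaction_keeping_up(pending: int, threshold: int = 5) -> bool:
    """True if pending compactions are below the threshold from the thread (<5)."""
    return pending < threshold
```

In practice the text would come from running nodetool compactionstats on each
node; how the output is collected is left out here.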

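The RF=3, CL=Quorum example from Eric's first reply comes down to simple
arithmetic. The sketch below works it through under a simplifying assumption
that is not stated in the thread: the remaining old replicas all hold the
data, so a read only goes wrong when every replica that answers is one that
joined without bootstrapping. The function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Number of replicas that must answer a QUORUM request for replication factor rf."""
    return rf // 2 + 1


def read_can_hit_only_empty_replicas(rf: int, empty_replicas: int) -> bool:
    """True if a QUORUM read could be answered solely by replicas with no old data.

    empty_replicas is the number of replicas for a token range that joined
    with auto_bootstrap disabled and were never repaired.
    """
    return empty_replicas >= quorum(rf)
```

With RF=3 a quorum is 2, so two un-bootstrapped replicas for the same range
are enough for a QUORUM read to come back without the old data, while a single
one is not under the assumption above.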