We’ve successfully used the rsync method you outline quite a few times in
situations where we’ve had clusters that take forever to add new nodes
(mainly due to secondary indexes) and need to do a quick replacement for
one reason or another. As you mention, the main disadvantage we ran into is
that the node doesn’t get cleaned up through the replacement process like a
newly streamed node does (plus the extra operational complexity).
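
For anyone following along, the rsync steps quoted below can be sketched roughly as the script here. It is only an illustration, not a tested runbook: the hostname, the "service cassandra" commands, and the rsync flags (-a, and --delete on the final pass) are my assumptions, not something from the original message. It is meant to be run on the new node and only prints the commands unless RUN=1 is set. The IP swap and the final start are left as manual steps.

```shell
#!/bin/sh
# Dry-run sketch of the rsync-based node replacement described in the thread.
# OLD_HOST, the "service cassandra" commands, and the rsync flags are
# assumptions for illustration; adjust them for your environment.

OLD_HOST="${OLD_HOST:-old-node.example.com}"   # hypothetical hostname
RUN="${RUN:-0}"                                # set RUN=1 to actually execute

step() {
    # Print the command; execute it only when RUN=1.
    echo "+ $*"
    if [ "$RUN" = "1" ]; then
        "$@"
    fi
}

replace_node_plan() {
    # 1. C* is already installed on the new node: stop it and clear its data.
    step service cassandra stop
    step sh -c 'rm -rf /var/lib/cassandra/*'
    # 2. Copy the config, then do a first (slow) data sync while the old
    #    node is still serving traffic.
    step rsync -a "$OLD_HOST:/etc/cassandra/" /etc/cassandra/
    step rsync -a "$OLD_HOST:/var/lib/cassandra/" /var/lib/cassandra/
    # 3. Stop the old node, then sync again to pick up only the delta.
    step ssh "$OLD_HOST" service cassandra stop
    step rsync -a --delete "$OLD_HOST:/var/lib/cassandra/" /var/lib/cassandra/
    # 4. The IP swap is site-specific, so it is left as a manual step,
    #    and the new node is started by hand afterwards.
    echo "# manual: move the old node to a different IP"
    echo "# manual: move the new node to the old node's original IP"
    echo "# manual: service cassandra start"
}

replace_node_plan
```

The second data sync is the one that matters for downtime: the old node is only stopped for as long as the delta copy takes.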


On Thu, 15 Sep 2016 at 19:47 Vasileios Vlachos <vasileiosvlac...@gmail.com>
wrote:

> Hello and thanks for your responses,
> OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
> difference. Any ideas why streaming is limited to only two of the three
> nodes available?
> As an alternative to slow streaming I tried this:
>  - install C* on a new node, stop the service and delete
>    /var/lib/cassandra/*
>  - rsync /etc/cassandra from old node to new node
>  - rsync /var/lib/cassandra from old node to new node
>  - stop C* on the old node
>  - rsync /var/lib/cassandra from old node to new node
>  - move the old node to a different IP
>  - move the new node to the old node's original IP
>  - start C* on the new node (no need for the replace_node option in
>    cassandra-env.sh)
> This technique has been successful so far for a demo cluster with less
> data. The only disadvantage for us is that we were hoping that by streaming
> the SSTables to the new node, tombstones would be discarded (freeing a lot
> of disk space on our live cluster). This is exactly what happened for the
> one node we streamed so far; unfortunately, the slow streaming generates a
> lot of hints which makes recovery a very long process.
> Do you guys see any other problems with the rsync method that I've skipped?
> Regarding the tombstones issue (if we finally do what I described above),
> I'm thinking sstablesplit. Then compaction should deal with it (I think). I
> have not used sstablesplit in the past, so another thing I'd like to ask is
> if you guys find this a good/bad idea for what I'm trying to do.
> Many thanks,
> Vasilis
> On Mon, Sep 12, 2016 at 6:42 PM, Jeff Jirsa <jji...@apache.org> wrote:
>> On 2016-09-12 09:38 (-0700), daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>> > Re. throughput. That looks slow for jumbo with 10g. Check your networks.
>> >
>> >
>> It's extremely unlikely you'll be able to saturate a 10g link with a
>> single Cassandra instance.
>> Faster Cassandra streaming is a work in progress - being able to send
>> more than one file at a time is probably the most obvious area for
>> improvement, and being able to better deal with the CPU / garbage generated
>> on the receiving side is just behind that. You'll likely be able to stream
>> 10-15 MB/s per sending server or cpu core, whichever is less (in a vnode
>> setup, you'll be cpu bound - in a single-token setup, you'll be stream
>> bound).
> --
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
