We’ve successfully used the rsync method you outline quite a few times, in situations where we’ve had clusters that take forever to add new nodes (mainly due to secondary indexes) and needed to do a quick replacement for one reason or another. As you mention, the main disadvantage we ran into is that the node doesn’t get cleaned up through the replacement process the way a newly streamed node does (plus there is the extra operational complexity).
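For anyone following along, the rsync steps quoted below can be sketched as a dry-run plan. This is only an illustration, not a tested runbook: the function name `replacement_plan`, the host names, and the `sudo service cassandra stop/start` commands are assumptions for the example, and it only prints the steps rather than running them. The one detail worth calling out is `--delete` on the rsync passes: SSTables compacted away on the old node between the first and the final copy should not linger on the new node.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (host the step applies to, command or manual action)


def replacement_plan(old_host: str, new_host: str,
                     data_dir: str = "/var/lib/cassandra",
                     conf_dir: str = "/etc/cassandra") -> List[Step]:
    """Return the ordered steps of the rsync-based node replacement (dry run only)."""

    def pull(path: str) -> str:
        # Run on the new node. -a preserves ownership/permissions/timestamps;
        # --delete removes files that no longer exist on the old node.
        return f"rsync -a --delete {old_host}:{path}/ {path}/"

    # Assumed service commands; adjust for your init system.
    stop = "sudo service cassandra stop"
    start = "sudo service cassandra start"

    return [
        (new_host, stop),                          # fresh install, stopped
        (new_host, f"sudo rm -rf {data_dir}/*"),   # clear the empty node's data
        (new_host, pull(conf_dir)),                # copy configuration
        (new_host, pull(data_dir)),                # bulk copy while old node is live
        (old_host, stop),                          # freeze the data set
        (new_host, pull(data_dir)),                # short catch-up copy
        (old_host, "MANUAL: move to a different IP"),
        (new_host, "MANUAL: take over the old node's original IP"),
        (new_host, start),
    ]


if __name__ == "__main__":
    for host, action in replacement_plan("old-node", "new-node"):
        print(f"[{host}] {action}")
```

The second data copy is the only one taken while the old node is down, so the downtime is roughly the time needed to transfer whatever changed since the first pass.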
Cheers
Ben

On Thu, 15 Sep 2016 at 19:47 Vasileios Vlachos <vasileiosvlac...@gmail.com> wrote:

> Hello and thanks for your responses,
>
> OK, so increasing stream_throughput_outbound_megabits_per_sec makes no
> difference. Any ideas why streaming is limited to only two of the three
> nodes available?
>
> As an alternative to slow streaming I tried this:
>
> - install C* on a new node, stop the service and delete
>   /var/lib/cassandra/*
> - rsync /etc/cassandra from old node to new node
> - rsync /var/lib/cassandra from old node to new node
> - stop C* on the old node
> - rsync /var/lib/cassandra from old node to new node
> - move the old node to a different IP
> - move the new node to the old node's original IP
> - start C* on the new node (no need for the replace_node option in
>   cassandra-env.sh)
>
> This technique has been successful so far for a demo cluster with less
> data. The only disadvantage for us is that we were hoping that by streaming
> the SSTables to the new node, tombstones would be discarded (freeing a lot
> of disk space on our live cluster). This is exactly what happened for the
> one node we streamed so far; unfortunately, the slow streaming generates a
> lot of hints, which makes recovery a very long process.
>
> Do you guys see any other problems with the rsync method that I've skipped?
>
> Regarding the tombstones issue (if we finally do what I described above),
> I'm thinking sstablesplit. Then compaction should deal with it (I think). I
> have not used sstablesplit in the past, so another thing I'd like to ask is
> if you guys find this a good/bad idea for what I'm trying to do.
>
> Many thanks,
> Vasilis
>
> On Mon, Sep 12, 2016 at 6:42 PM, Jeff Jirsa <jji...@apache.org> wrote:
>
>> On 2016-09-12 09:38 (-0700), daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>> > Re. throughput. That looks slow for jumbo with 10g. Check your networks.
>>
>> It's extremely unlikely you'll be able to saturate a 10g link with a
>> single-instance Cassandra.
>>
>> Faster Cassandra streaming is a work in progress - being able to send
>> more than one file at a time is probably the most obvious area for
>> improvement, and being able to better deal with the CPU / garbage generated
>> on the receiving side is just behind that. You'll likely be able to stream
>> 10-15 MB/s per sending server or CPU core, whichever is less (in a vnode
>> setup, you'll be CPU bound - in a single-token setup, you'll be stream
>> bound).

--
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798