Over-streaming is fairly common, especially with vnodes in 2.x. When Cassandra streams data to a bootstrapping node, it sends the entire SSTable containing the data the new node requires, even if that SSTable holds only one relevant row out of thousands. This is exacerbated by STCS, which tends to produce large SSTables.
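One way to mitigate this during a bootstrap is to throttle inbound streaming while letting compaction run as fast as possible, so the node can compact away redundant data before the disk fills. A hedged sketch of the relevant cassandra.yaml settings for 2.1 (the values shown are illustrative, not recommendations):

```yaml
# cassandra.yaml -- illustrative values only, tune for your hardware
concurrent_compactors: 4                            # default derives from core/disk count; raise so
                                                    # compaction keeps pace with incoming SSTables
compaction_throughput_mb_per_sec: 0                 # 0 = unthrottled compaction
stream_throughput_outbound_megabits_per_sec: 50     # set on the *sending* peers to slow streaming
```

The two throughput settings can also be changed at runtime on the already-running peers with `nodetool setcompactionthroughput 0` and `nodetool setstreamthroughput 50`; `concurrent_compactors` requires a restart in 2.1.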
Generally, reducing the network streaming throughput and increasing concurrent_compactors (while un-throttling compaction throughput) is the way to go. If you are in the cloud (e.g. AWS) you can also attach large block store volumes (EBS) to the instance to act as overflow. There is also ongoing work in Cassandra 3.x to make streaming more efficient, allowing Cassandra to stream only the portions of an SSTable that are required.

On Thu, 13 Oct 2016 at 07:57 Anubhav Kale <anubhav.k...@microsoft.com> wrote:

> Hello,
>
> We run 2.1.13 and seeing an odd issue. A node went down, and stayed down
> for a while so it went out of gossip. When we try to bootstrap it again (as
> a new node), it overstreams from other nodes, eventually disk becomes full
> and crashes. This repeated 3 times.
>
> Does anyone have any insights on what to try next (both in terms of root
> causing, and working around) ? To work around, we tried increasing
> #compactors and reducing stream throughput so that at least incoming
> #SSTables would be controlled.
>
> This has happened to us few times in the past too, so I am wondering if
> this is a known problem (I couldn’t find any JIRAs).
>
> Thanks !

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer