The memtable really isn't involved here. Each data file is copied over
as-is and turned into a new data file; it is never read into the memtable
(it is deserialized and re-serialized, which temporarily holds it in
memory, but not in the memtable itself).

You can cut down on the number of data files copied in by using fewer
vnodes, or by changing your compaction parameters (e.g. if you're using
LCS, raise the sstable size from 160M to something higher), but there's no
magic to join / compact those data files on the sending side before sending.
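
To get a feel for how much the LCS sstable size matters, here is a rough
back-of-the-envelope sketch in Python. It is only an illustration under
loud assumptions: the function name is made up, it pretends every data file
is exactly the target size, it takes the ">2TB" from the original question
as 2 TiB, and it ignores vnodes, token ranges, multiple tables and
compression, all of which change the real file count.

```python
import math

def estimated_sstable_count(data_bytes: int, sstable_size_mb: int) -> int:
    """Hypothetical helper: idealized number of data files if every
    sstable were exactly sstable_size_mb in size (an assumption)."""
    sstable_bytes = sstable_size_mb * 1024 * 1024
    return math.ceil(data_bytes / sstable_bytes)

# Assumption: ">2TB" per node is treated as exactly 2 TiB here.
TWO_TIB = 2 * 1024**4

# 160 is the size mentioned above; 512 and 1024 are example alternatives.
for size_mb in (160, 512, 1024):
    print(f"{size_mb} MB -> ~{estimated_sstable_count(TWO_TIB, size_mb)} files")
```

Under those assumptions, 160 MB works out to about 13108 files, while
1024 MB works out to 2048, so a larger sstable size means far fewer files to
stream and compact. Real numbers will differ.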


On Mon, Aug 3, 2020 at 4:15 AM onmstester onmstester
<onmstes...@zoho.com.invalid> wrote:

> IMHO (from reading system.log), each streamed-in file from any node is
> written to disk as a separate sstable instead of waiting in the memtable
> until enough data has accumulated in memory, so there are more
> compactions because of the many small sstables. Is there any
> configuration in Cassandra to force streamed-in data through the
> memtable-to-sstable cycle, so that bigger sstables are produced in the
> first place?
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> ============ Forwarded message ============
> From: onmstester onmstester <onmstes...@zoho.com.INVALID>
> To: "user"<user@cassandra.apache.org>
> Date: Sun, 02 Aug 2020 08:35:30 +0430
> Subject: Re: streaming stuck on joining a node with TBs of data
> ============ Forwarded message ============
>
> Thanks Jeff,
>
> I already used netstats; it only shows that streaming from a single node
> remains and is stuck, plus a bunch of dropped messages. Next time I will
> check tpstats too.
> For now I stopped the stuck joining node, set auto_bootstrap to false,
> and started the node again; it is UN now. Is this OK too?
>
> What about streaming tables one by one, any idea?
>
>
> ---- On Sat, 01 Aug 2020 21:44:09 +0430 *Jeff Jirsa <jji...@gmail.com
> <jji...@gmail.com>>* wrote ----
>
>
> Nodetool tpstats and netstats should give you a hint why it’s not joining
>
> If you don’t care about consistency and you just want it joined in its
> current form (which is likely strictly incorrect but I get it), “nodetool
> disablegossip && nodetool enablegossip” in rapid succession (must be less
> than 30 seconds in between commands) will PROBABLY change it from joining
> to normal (unclean, unsafe, do this at your own risk).
>
>
> On Jul 31, 2020, at 11:46 PM, onmstester onmstester <
> onmstes...@zoho.com.invalid> wrote:
>
>
> No Secondary index, No SASI, No materialized view
>
>
> ---- On Sat, 01 Aug 2020 11:02:54 +0430 *Jeff Jirsa <jji...@gmail.com
> <jji...@gmail.com>>* wrote ----
>
> Are there secondary indices involved?
>
> On Jul 31, 2020, at 10:51 PM, onmstester onmstester <
> onmstes...@zoho.com.invalid> wrote:
>
>
> Hi,
>
> I'm going to join multiple new nodes to an existing, running cluster.
> Each node has to stream in >2TB of data, and it took a few days (with
> 500Mb streaming) to get almost finished. But it got stuck streaming in
> from one final node, and I cannot see any bottleneck on either side
> (source or destination node). The only problem is 400 pending compactions
> on the joining node; I disabled auto_compaction there, but saw no
> improvement.
>
> 1. How can I safely stop streaming/joining on the new node and make it
> UN, then run repair on the node?
> 2. When bootstrapping a new node, multiple tables are streamed in
> simultaneously, and I think this increases the number of compactions
> compared with a scenario in which "the joining node first streams in one
> table, then switches to the next one, and so on". Am I right that this
> would decrease compactions? If so, is there a config or hack in Cassandra
> to force that?
>
