Re: Large number of tiny sstables flushed constantly

2021-08-17 Thread Jiayong Sun
Thank you so much Bowen for your advice on this. Really appreciate it! Thanks, Jiayong Sun On Monday, August 16, 2021, 11:56:39 AM PDT, Bowen Song wrote: Hi Jiayong, You will need to reduce the num_tokens on all existing nodes in the cluster in order to "fix" the repair. Only

Re: Large number of tiny sstables flushed constantly

2021-08-16 Thread Jiayong Sun
Hi Bowen, > "how many tables are being written frequently" - there are normally about >less then 10 tables being written concurrently.Regarding the "num_token", >unfortunately we can change it now since it would require rebuilding all >rings/nodes of the cluster. Actually we used to add a ring

Re: Large number of tiny sstables flushed constantly

2021-08-16 Thread Bowen Song
Hi Jiayong, You will need to reduce the num_tokens on all existing nodes in the cluster in order to "fix" the repair. Only adding new DCs with a lower num_tokens value is not going to solve the problem for you. In practice, you have two ways to reduce it on all existing nodes. You can either
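As a minimal sketch (not from the thread) of the change Bowen describes, the cassandra.yaml on freshly built or replacement nodes would look roughly like this; the value 16 and the keyspace name are illustrative assumptions:

    # applies only to nodes being (re)built -- num_tokens cannot be changed
    # in place on a node that already owns data
    num_tokens: 16                        # lower vnode count (the 4.0 default); this cluster currently uses 64
    allocate_tokens_for_keyspace: my_ks   # hypothetical keyspace name; lets the RF-aware token
                                          # allocator keep the ring balanced with fewer vnodes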

Re: Large number of tiny sstables flushed constantly

2021-08-16 Thread Bowen Song
Hello Jiayong, > "There is only one major table taking 90% of writes." In your case, what matters isn't the 90% of writes, but the remaining 10%. If the remaining 10% of writes are spread over 100 different tables, your Cassandra node will end up flushing over 600 times per minute. Therefore,

Re: Large number of tiny sstables flushed constantly

2021-08-16 Thread Jiayong Sun
Hi Bowen, There is only one major table taking 90% of writes. I will try to increase the "commitlog_segment_size_in_mb" value to 1 GB and set "max_mutation_size_in_kb" to 16 MB. Currently we don't set the "commitlog_total_space_in_mb" value, so it should be using the default of 8192 MB (8 GB). What
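For clarity, a sketch of how those settings would look in cassandra.yaml; the numbers are the ones quoted above, and note that max_mutation_size_in_kb is expressed in KB, so 16 MB corresponds to 16384:

    commitlog_segment_size_in_mb: 1024    # proposed 1 GB segments (default is 32 MB)
    max_mutation_size_in_kb: 16384        # 16 MB; when unset it defaults to half the
                                          # segment size, which would be 512 MB here
    # commitlog_total_space_in_mb: 8192   # left unset -> default of 8192 MB, or 1/4 of
                                          # the commitlog volume, whichever is smaller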

Re: Large number of tiny sstables flushed constantly

2021-08-15 Thread Bowen Song
Hi Jiayong, Based on this statement: > "We see the commit logs switched about 10 times per minute" I'd also like to know, roughly speaking, how many tables in Cassandra are being written frequently? I'm asking this because the commit log segments are being created (and recycled) so

Re: Large number of tiny sstables flushed constantly

2021-08-14 Thread Jiayong Sun
Hi Bowen, Thanks for digging into the source code so deeply. Here are answers to your questions: - Does your application change the table schema frequently? Specifically: alter table, drop table and drop keyspace. - No, neither admins nor apps frequently alter/drop/create table schemas in

Re: Large number of tiny sstables flushed constantly

2021-08-13 Thread Jiayong Sun
Hi Jeff, The cluster was built with 2.1 and has been upgraded to 3.11. It's using "num_tokens: 64" and RF=3 for all keyspaces in all DCs. There are over 100 nodes in each of 6 rings. I don't see any messages about repair/streaming in system.log, so this shouldn't be related to the repair. I can

Re: Large number of tiny sstables flushed constantly

2021-08-13 Thread Jiayong Sun
Hi Bowen, There are many nodes having this issue, and some of them have it repeatedly. Replacing a node by wiping out everything and streaming in sstables in good shape would work, but if we don't know the root cause the node would end up in bad shape again. Yes, we know the reaper repair

Re: Large number of tiny sstables flushed constantly

2021-08-13 Thread Jiayong Sun
Hi Bowen, We do have a reaper repair job scheduled periodically, and it can take days or even weeks to complete one round of repair due to the large number of rings/nodes. However, we have paused the repair since we are facing this issue. We do not use MVs in this cluster. There is a major table taking

Re: Large number of tiny sstables flushed constantly

2021-08-13 Thread Bowen Song
Hi Jiayong, I'm sorry to hear that. I did not know many nodes were/are experiencing the same issue. A bit of digging in the source code indicates the log below comes from the ColumnFamilyStore.logFlush() method. DEBUG [NativePoolCleaner] ColumnFamilyStore.java:932 - Enqueuing flush of

Re: Large number of tiny sstables flushed constantly

2021-08-13 Thread Jeff Jirsa
A very large cluster using vnodes will cause lots of small sstables to stream in during repair if the cluster is out of sync. This is one of the reasons that the default number of vnodes was decreased in 4.0. How many nodes in the cluster, how many DCs, how many vnodes per node, and how many

Re: Large number of tiny sstables flushed constantly

2021-08-13 Thread Bowen Song
Hi Jiayong, That doesn't really match the situation described in the SO question. I suspected it was related to repairing a table with an MV and large partitions, but based on the information you've given, I was clearly wrong. A few hundred-MB partitions are not exactly unusual; I don't see

Re: Large number of tiny sstables flushed constantly

2021-08-13 Thread Bowen Song
Hi Jiayong, Sorry I didn't make it clear in my previous email. When I commented on the RAID0 setup, it was only a comment on the RAID0 setup vs JBOD, and that was not in relation to the SSTable flushing issue. The part of my previous email after the "On the frequent SSTable flush issue" line

Re: Large number of tiny sstables flushed constantly

2021-08-12 Thread Jiayong Sun
Hello Bowen, Thanks for your response. Yes, we are aware of the RAID0 vs individual JBOD trade-off, but all of our clusters are using this RAID0 configuration through Azure, and we only see this issue on this cluster, so it's hard to attribute the root cause to the disks. This is more like

Re: Large number of tiny sstables flushed constantly

2021-08-12 Thread Bowen Song
Hello Jiayong, Using multiple disks in a RAID0 for the Cassandra data directory is not recommended. You will get better fault tolerance, and often better performance too, with multiple data directories, one on each disk. If you stick with RAID0, it's not 4 disks, it's 1 from Cassandra's point of

Re: Large number of tiny sstables flushed constantly

2021-08-11 Thread Jiayong Sun
Hi Erick, The nodes have 4 SSDs (1 TB each, but we only use 2.4 TB of space; current disk usage is about 50%) in RAID0. Based on the number of disks we increased memtable_flush_writers to 4 instead of the default of 2. For the following we set: - max heap size - 31 GB - memtable_heap_space_in_mb (use
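A rough sketch of how those memtable settings would sit in cassandra.yaml; the memtable_heap_space_in_mb figure is cut off above, so the value below is only a placeholder assumption, and the 31 GB heap itself is set in jvm.options rather than cassandra.yaml:

    memtable_flush_writers: 4           # raised from the default of 2 for the 4-disk RAID0
    memtable_heap_space_in_mb: 8192     # placeholder -- the real value is truncated above;
                                        # defaults to 1/4 of the heap when left unset
    # memtable_offheap_space_in_mb: 8192  # the [NativePoolCleaner] thread in the log suggests
                                          # an offheap memtable_allocation_type is in use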

Re: Large number of tiny sstables flushed constantly

2021-08-11 Thread Jiayong Sun
Hi Erick, Thanks for your response. Actually we did not set memtable_cleanup_threshold in cassandra.yaml. However, we have memtable_flush_writers: 4 defined in the yaml, and the VM node has 16 cores. Any advice for this param's value? Thanks again. Jiayong Sun On Tuesday, August 10, 2021,

Re: Large number of tiny sstables flushed constantly

2021-08-11 Thread Erick Ramirez
4 flush writers isn't bad since the default is 2. It doesn't make a difference if you have fast disks (like NVMe SSDs) because only 1 thread gets used. But if flushes are slow, the work gets distributed to 4 flush writers so you end up with smaller flush sizes although it's difficult to tell how

Re: Large number of tiny sstables flushed constantly

2021-08-10 Thread Erick Ramirez
Is it possible that you've got memtable_cleanup_threshold set in cassandra.yaml with a low value? It's been deprecated in C* 3.10 (CASSANDRA-12228). If you do have it configured, I'd recommend removing it completely and restarting C* when you
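As a sketch of what is being suggested (the 0.11 figure is illustrative, not from the thread): leave the deprecated knob unset so Cassandra falls back to its computed default of 1 / (memtable_flush_writers + 1):

    # memtable_cleanup_threshold: 0.11   # deprecated (CASSANDRA-12228) -- remove if set;
                                         # a low value forces very frequent, tiny flushes
    memtable_flush_writers: 4            # with the threshold unset, it is computed as
                                         # 1 / (4 + 1) = 0.2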

Large number of tiny sstables flushed constantly

2021-08-10 Thread Jiayong Sun
Hello, everyone, We have a large cluster with the following info: Cassandra version 3.11.6, multi-DC, and 100 nodes per DC. We have recently seen many nodes with hundreds of thousands of tiny sstables flushed to disk constantly. We can see the following messages in debug.log: DEBUG [NativePoolCleaner]