Hello everyone,

I have configured a Cassandra cluster with 3 nodes, but I am not getting the write speed that I was expecting. I tested against a counter table because it is the bottleneck of the system. With the system idle, I ran the attached sample code (very simple async writes with a throttle) against a schema with RF=2 and a table using SizeTieredCompactionStrategy.
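
The attached class is not reproduced in this message. As a rough illustration only, a throttled async write loop of the kind described here could be sketched as follows. This is not the attached code: the class name, `run`, and `executeAsync` are hypothetical, `executeAsync` is a no-op stand-in for a real driver call so the sketch runs without a cluster, and the upper bound of `getRandom` is assumed to be exclusive. The value ranges are the ones given in this message, and the statement text reflects the fact that counter columns are modified with UPDATE.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ThrottledCounterWrites {
    // Counter columns are incremented with UPDATE ... SET c = c + ?
    static final String UPDATE_CQL =
        "UPDATE counttest SET count1_column = count1_column + ?, "
        + "count2_column = count2_column + ?, count3_column = count3_column + ?, "
        + "count4_column = count4_column + ?, count5_column = count5_column + ? "
        + "WHERE key_column = ? AND cluster_column = ?";

    // Assumption: uniform in [min, max), i.e. upper bound exclusive.
    static long getRandom(long min, long max) {
        return ThreadLocalRandom.current().nextLong(min, max);
    }

    // Hypothetical stand-in for the driver's async execute; it does no I/O.
    static CompletableFuture<Void> executeAsync(long key, int cluster, long[] deltas) {
        return CompletableFuture.runAsync(() -> { });
    }

    // Issues maxExecutions writes with at most `parallelism` in flight.
    static long run(long maxExecutions, int parallelism) throws InterruptedException {
        Semaphore slots = new Semaphore(parallelism);
        AtomicLong completed = new AtomicLong();
        for (long i = 0; i < maxExecutions; i++) {
            slots.acquire(); // block until a slot is free
            long key = getRandom(0, 5_000_000);
            int cluster = (int) getRandom(0, 4096);
            long[] deltas = new long[5];
            for (int c = 0; c < deltas.length; c++) {
                deltas[c] = getRandom(0, 10);
            }
            executeAsync(key, cluster, deltas).whenComplete((result, error) -> {
                completed.incrementAndGet();
                slots.release(); // free the slot whether the write succeeded or failed
            });
        }
        slots.acquire(parallelism); // wait for the remaining in-flight writes
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        long done = run(100_000, 128);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d updates in %.2f s (%.0f updates/s)%n", done, seconds, done / seconds);
    }
}
```

With a loop like this, the client never has more than `parallelism` requests outstanding, so raising that number only helps while the cluster, rather than the client, is the limiting factor.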

The speeds that I get are around 65k updates/second, and I was hoping for at least 150k updates/second. Even when I run the test on 2 machines in parallel, each one only reaches about 35k updates/second (roughly 70k combined). I ran the tests on the cluster nodes themselves (on 1 and on 2 of the 3 nodes).

The nodes are fairly powerful. Each has the following configuration, running Cassandra 3.11.1:

- RAM: 256GB
- HDD disks: 9 (7 configured for Cassandra data, 1 for the OS and 1 configured for the Cassandra commit log)
- CPU: 8 processors with hyperthreading => 16 logical processors

The RAM, CPU and HDDs are far from being maxed out when running the tests.

The test command-line class takes two parameters: max executions and parallelism. Parallelism is the maximum number of async executions running in parallel; any further execution has to wait for an available slot. I tried increasing the parallelism (64, 128, 256...) but the results are the same; 128 seems to be enough.

Table definition:

    CREATE TABLE counttest (
        key_column bigint,
        cluster_column int,
        count1_column counter,
        count2_column counter,
        count3_column counter,
        count4_column counter,
        count5_column counter,
        PRIMARY KEY ((key_column), cluster_column)
    );

Write test data generation (from the attached class). Each write is prepared with uniform random values as follows:

    long key_column = getRandom(0, 5000000);
    int cluster_column = getRandom(0, 4096);
    long count1_column = getRandom(0, 10);
    long count2_column = getRandom(0, 10);
    long count3_column = getRandom(0, 10);
    long count4_column = getRandom(0, 10);
    long count5_column = getRandom(0, 10);

*I suspect that we took the wrong approach when designing the hardware: should we have used more nodes and fewer drives per node? If that is the case, I am trying to understand why, and whether there is any change we could make to the configuration (other than getting more nodes) to improve things.*

Will an SSD dedicated to the commit log improve things dramatically?

Best Regards,
Javier