Hi again,

A few more thoughts regarding my question:
- I have configured all 3 nodes to act as seeds but I don't think this
affects write performance.
- The hints_directory and the saved_caches_directory use the same drive as
the commitlog_directory. The data is on the other 7 drives, as I explained
earlier. Could the saved caches, especially because of the counters, have a
meaningful impact on write performance?
- If more nodes are needed for whatever reason, would a layer of
virtualization on top of each machine help? Each virtual machine would be
assigned dedicated drives (there are plenty of them) and would share only
the CPU and RAM.

As far as I understand it, the only bottleneck in the writes is the commit
log. Shall I create RAID0 (for speed) or install an SSD just for the


On Fri, Mar 2, 2018 at 12:21 PM, Javier Pareja

> Hello everyone,
> I have configured a Cassandra cluster with 3 nodes, but I am not
> getting the write speed that I was expecting. I have tested against a
> counter table because it is the bottleneck of the system.
> With the system idle, I ran the attached sample code (very simple async
> writes with a throttle) against a schema with RF=2 and a table with
> SizeTieredCompactionStrategy.
> The speeds that I get are around 65k updates-writes/second and I was
> hoping for at least 150k updates-writes/second. Even if I run the test on
> 2 machines in parallel, each one reaches only 35k updates-writes/second.
> I have executed the test on the nodes themselves (1 and 2 of the 3
> nodes).
> The nodes are fairly powerful. Each runs Cassandra 3.11.1 with the
> following configuration:
> - RAM: 256GB
> - HDD Disks: 9 (7 configured for cassandra data, 1 for the OS and 1
> configured for cassandra commits)
> - CPU: 8 processors with hyperthreading => 16 processors
> The RAM, CPU and HDDs are far from being maxed out when running the tests.
> The test command line class uses two parameters: max executions and
> parallelism. Parallelism is the max number of AsyncExecutions running in
> parallel. Any other execution will have to wait for available slots.
> I tried increasing the parallelism (64, 128, 256...) but the results are
> the same; 128 seems to be enough.
> Table definition:
> CREATE TABLE counttest (
>    key_column bigint,
>    cluster_column int,
>    count1_column counter,
>    count2_column counter,
>    count3_column counter,
>    count4_column counter,
>    count5_column counter,
>    PRIMARY KEY ((key_column),cluster_column)
> );
> Write test data generation (from the class attached). Each update is
> prepared with uniformly distributed random values, as below:
>             long key_column = getRandom(0, 5000000);
>             int cluster_column = getRandom(0, 4096);
>             long count1_column = getRandom(0, 10);
>             long count2_column = getRandom(0, 10);
>             long count3_column = getRandom(0, 10);
>             long count4_column = getRandom(0, 10);
>             long count5_column = getRandom(0, 10);
> *I suspect that we took the wrong approach when designing the hardware:
> Should we have used more nodes and fewer drives per node? If this is the
> case, I am trying to understand why, or whether there is any change that we
> could make to the configuration (other than getting more nodes) to improve
> that.*
> Will an SSD dedicated for the commit log improve things dramatically?
> Best Regards,
> Javier
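
For readers without the attachment, the throttle described in the quoted
message (a cap on the number of async executions in flight, with further
executions waiting for a free slot) can be sketched roughly as below. This is
not the attached class: ThrottledWriteSketch, fakeAsyncWrite and run are
hypothetical names, the write is a no-op standing in for the driver's async
execute call, and getRandom is assumed to return a uniform value with an
exclusive upper bound.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

final class ThrottledWriteSketch {

    // Assumption: uniform random value in [min, max); the real helper may differ.
    static long getRandom(long min, long max) {
        return ThreadLocalRandom.current().nextLong(min, max);
    }

    // Hypothetical stand-in for an async driver call; it never talks to Cassandra.
    static CompletableFuture<Void> fakeAsyncWrite(ExecutorService pool, long key, int cluster) {
        return CompletableFuture.runAsync(() -> { /* pretend to send the counter update */ }, pool);
    }

    // Issues maxExecutions writes with at most `parallelism` in flight.
    // Returns {completed writes, peak number of writes in flight}.
    static int[] run(int maxExecutions, int parallelism) {
        Semaphore slots = new Semaphore(parallelism);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < maxExecutions; i++) {
                slots.acquireUninterruptibly();            // block until a slot is free
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                long key = getRandom(0, 5_000_000);
                int cluster = (int) getRandom(0, 4096);
                fakeAsyncWrite(pool, key, cluster).whenComplete((result, error) -> {
                    completed.incrementAndGet();
                    inFlight.decrementAndGet();
                    slots.release();                       // free the slot for the next write
                });
            }
            slots.acquireUninterruptibly(parallelism);     // wait for the outstanding writes
        } finally {
            pool.shutdown();
        }
        return new int[] { completed.get(), peak.get() };
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int[] result = run(100_000, 128);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("completed=%d peakInFlight=%d rate=%.0f/s%n",
                result[0], result[1], result[0] / seconds);
    }
}
```

With a throttle of this shape, the client never has more than `parallelism`
requests outstanding, so if raising it from 128 to 256 changes nothing, the
limit is probably somewhere other than the number of slots.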

