It's using the async API, so why would it need multiple threads? Using the
exact same approach I'm able to get 38k / second with periodic
commitlog_sync. For what it's worth, I do see 100% CPU utilization in every
single one of these tests.

On Tue, Apr 23, 2024 at 11:01 AM Bowen Song via user <> wrote:

> Have you checked the thread CPU utilisation of the client side? You likely
> will need more than one thread to do insertion in a loop to achieve tens of
> thousands of inserts per second.
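One way to check client-side thread CPU utilisation is to sample per-thread CPU time from inside the benchmark JVM with the standard java.lang.management API. This is a minimal sketch, not a profiler. The class and method names are hypothetical, and it assumes the JVM supports per-thread CPU-time measurement. If it does not, the map comes back empty.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientThreadCpu {
    // CPU time (ns) consumed so far by each live thread, keyed by "name#id".
    static Map<String, Long> cpuTimesByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has exited
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has exited
            if (info != null && cpuNanos >= 0) {
                result.put(info.getThreadName() + "#" + id, cpuNanos);
            }
        }
        return result;
    }

    // Samples twice, intervalMillis apart; 1.0 means the thread kept one core fully busy.
    static Map<String, Double> utilisation(long intervalMillis) {
        Map<String, Long> before = cpuTimesByThread();
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Map.of();
        }
        Map<String, Long> after = cpuTimesByThread();
        Map<String, Double> util = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long start = before.get(e.getKey());
            if (start != null) { // skip threads that started during the interval
                util.put(e.getKey(), (e.getValue() - start) / (intervalMillis * 1_000_000.0));
            }
        }
        return util;
    }

    public static void main(String[] args) {
        // In a real benchmark, call this from a monitoring thread while the insert loop runs.
        utilisation(1000).forEach((name, frac) ->
                System.out.printf("%-40s %6.1f%%%n", name, frac * 100));
    }
}
```

If the thread running the insert loop (or any other client thread) sits near 100% of one core, the client itself is likely the bottleneck rather than the server.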
> On 23/04/2024 21:55, Nathan Marz wrote:
> Thanks for the explanation.
> I tried again with commitlog_sync_group_window at 2ms, concurrent_writes
> at 512, and doing 1000 individual inserts at a time with the same loop +
> semaphore approach. This only nets 9k / second.
> I got much higher throughput for the other modes with BatchStatement of
> 100 inserts rather than 100x more individual inserts.
> On Tue, Apr 23, 2024 at 10:45 AM Bowen Song via user <> wrote:
>> I suspect you are abusing batch statements. Batch statements should only
>> be used where atomicity or isolation is needed. Using batch statements
>> won't make inserting multiple partitions faster. In fact, it often will
>> make that slower.
>> Also, the linear relationship between commitlog_sync_group_window and
>> write throughput (throughput roughly proportional to 1/window) is expected.
>> That's because the maximum number of uncompleted writes is limited by the
>> write concurrency, and a write is not considered "complete" until it has
>> been synced to disk when commitlog sync is in group or batch mode. That
>> means only a limited number of writes can complete within each interval.
>> Ways to increase that include: adding more nodes, syncing the commitlog at
>> shorter intervals, and allowing more concurrent writes.
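The reasoning above can be turned into a back-of-the-envelope bound. This is a sketch under a simplifying assumption: every write holds one of the concurrent_writes slots for about one full commitlog_sync_group_window before it completes. Under that assumption, Little's law gives throughput ≤ concurrency / window, and the same relation tells you how much concurrency a target rate needs. It is an upper bound, not a prediction, and real throughput can be far lower, as the figures reported in this thread show. The class and method names are hypothetical.

```java
public class GroupCommitBound {
    // Simplified model (assumption): each write occupies one concurrent_writes slot
    // for about one window, so each slot completes at most ~1 write per window.
    static double maxWritesPerSecond(int concurrentWrites, double windowMillis) {
        if (concurrentWrites <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return concurrentWrites * (1000.0 / windowMillis);
    }

    // Little's law rearranged: in-flight writes needed = target rate x time each write waits.
    static int requiredConcurrentWrites(double targetWritesPerSecond, double windowMillis) {
        return (int) Math.ceil(targetWritesPerSecond * windowMillis / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(maxWritesPerSecond(128, 20.0));          // 6400.0
        System.out.println(requiredConcurrentWrites(38_000, 10.0)); // 380
    }
}
```

The second method illustrates why a longer window needs more write concurrency: holding the target rate fixed, the required number of in-flight writes grows in proportion to the window.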
>> On 23/04/2024 20:43, Nathan Marz wrote:
>> Thanks. I raised concurrent_writes to 128 and
>> set commitlog_sync_group_window to 20ms. This causes a single execute of a
>> BatchStatement containing 100 inserts to succeed. However, the throughput
>> I'm seeing is atrocious.
>> With these settings, I'm executing 10 BatchStatements concurrently using
>> the semaphore + loop approach I showed in my first message. So
>> as requests complete, more are sent out such that there are 10 in-flight at
>> a time. Each BatchStatement has 100 individual inserts. I'm seeing only 730
>> inserts / second. Again, with periodic mode I see 38k / second and with
>> batch I see 14k / second. My expectation was that group commit mode
>> throughput would be somewhere between those two.
>> If I set commitlog_sync_group_window to 100ms, the throughput drops to 14
>> / second.
>> If I set commitlog_sync_group_window to 10ms, the throughput increases to
>> 1587 / second.
>> If I set commitlog_sync_group_window to 5ms, the throughput increases to
>> 3200 / second.
>> If I set commitlog_sync_group_window to 1ms, the throughput increases to
>> 13k / second, which is slightly less than batch commit mode.
>> Is group commit mode supposed to have better performance than batch mode?
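A quick arithmetic check of the numbers above: if throughput were inversely proportional to the window, window × throughput would stay roughly constant. The figures are the ones reported in this message, presumably all with concurrent_writes at 128. The class name is hypothetical.

```java
public class WindowThroughputCheck {
    // Reported figures: {commitlog_sync_group_window in ms, inserts per second}.
    static final double[][] REPORTED = {
        {100, 14}, {20, 730}, {10, 1587}, {5, 3200}, {1, 13_000}
    };

    // Constant across rows if throughput is inversely proportional to the window.
    static double throughputTimesWindow(double windowMillis, double insertsPerSecond) {
        return windowMillis * insertsPerSecond;
    }

    public static void main(String[] args) {
        for (double[] r : REPORTED) {
            System.out.printf("window=%5.0f ms  inserts/s=%6.0f  product=%8.0f%n",
                    r[0], r[1], throughputTimesWindow(r[0], r[1]));
        }
    }
}
```

The products for 20, 10, 5 and 1 ms land between roughly 13,000 and 16,000, which is consistent with an inverse relationship. The 100 ms point (1,400) falls far below that trend.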
>> On Tue, Apr 23, 2024 at 8:46 AM Bowen Song via user <> wrote:
>>> The default commitlog_sync_group_window is very long for SSDs. Try
>>> reducing it if you are using SSD-backed storage for the commit log. 10-15 ms
>>> is a good starting point. You may also want to increase the value of
>>> concurrent_writes; consider at least doubling or quadrupling it from the
>>> default. You'll need even higher write concurrency for a longer
>>> commitlog_sync_group_window.
>>> On 23/04/2024 19:26, Nathan Marz wrote:
>>> "batch" mode works fine. I'm having trouble with "group" mode. The only
>>> config for that is "commitlog_sync_group_window", and I have that set to
>>> the default 1000ms.
>>> On Tue, Apr 23, 2024 at 8:15 AM Bowen Song via user <> wrote:
>>>> Why would you want to set commitlog_sync_batch_window to 1 second long
>>>> when commitlog_sync is set to batch mode? The documentation
>>>> <>
>>>> on this says:
>>>> *This window should be kept short because the writer threads will be
>>>> unable to do extra work while waiting. You may need to increase
>>>> concurrent_writes for the same reason*
>>>> If you want to use batch mode, at least ensure
>>>> commitlog_sync_batch_window is reasonably short. The default is 2
>>>> milliseconds.
>>>> On 23/04/2024 18:32, Nathan Marz wrote:
>>>> I'm doing some benchmarking of Cassandra on a single m6gd.large
>>>> instance. It works fine with periodic or batch commitlog_sync options, but
>>>> I'm having tons of issues when I change it to "group". I have
>>>> "commitlog_sync_group_window" set to 1000ms.
>>>> My client is doing writes like this (pseudocode):
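>>>> Semaphore sem = new Semaphore(numTickets);
>>>> while (true) {
>>>>     // Block until one of the numTickets in-flight slots is free
>>>>     sem.acquire();
>>>>     session.executeAsync(insert.bind(genUUIDStr(), genUUIDStr(), genUUIDStr()))
>>>>             // Free the slot whether the write succeeded or failed
>>>>             .whenComplete((rs, err) -> sem.release());
>>>> }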
>>>> If I set numTickets higher than 20, I get tons of timeout errors.
>>>> I've also tried executing a single BatchStatement containing many
>>>> inserts at a time, and that times out when the batch size exceeds 20.
>>>> Increasing the write request timeout in cassandra.yaml makes it time
>>>> out at slightly higher numbers of concurrent requests.
>>>> With periodic I'm able to get about 38k writes / second, and with batch
>>>> I'm able to get about 14k / second.
>>>> Any tips on what I should be doing to get group commitlog_sync to work
>>>> properly? I didn't expect to have to do anything other than change the
>>>> config.
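The semaphore + loop client from the first message can be made self-contained so the concurrency limit is easy to check without a cluster. This is a sketch under assumptions. fakeExecuteAsync is a hypothetical stub standing in for the driver's session.executeAsync, and it simply completes after about 2 ms on a background thread. The loop also stops after totalWrites, rather than running forever, so that it can wait for the last requests to finish.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncWrites {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();
    static final AtomicInteger completed = new AtomicInteger();

    // Hypothetical stand-in for session.executeAsync(insert.bind(...)):
    // completes on a background thread after ~2 ms, like a write waiting on a sync.
    static CompletionStage<Void> fakeExecuteAsync() {
        return CompletableFuture.runAsync(() -> { },
                CompletableFuture.delayedExecutor(2, TimeUnit.MILLISECONDS));
    }

    // Keeps at most numTickets requests in flight until totalWrites have been sent.
    static void run(int numTickets, int totalWrites) {
        Semaphore sem = new Semaphore(numTickets);
        for (int i = 0; i < totalWrites; i++) {
            sem.acquireUninterruptibly(); // block until an in-flight slot is free
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            fakeExecuteAsync().whenComplete((result, error) -> {
                inFlight.decrementAndGet();
                completed.incrementAndGet();
                sem.release(); // free the slot whether the write succeeded or failed
            });
        }
        sem.acquireUninterruptibly(numTickets); // wait for the last requests to finish
        sem.release(numTickets);
    }

    public static void main(String[] args) {
        run(20, 1000);
        System.out.println("completed=" + completed.get() + ", maxInFlight=" + maxInFlight.get());
    }
}
```

Because the sends are asynchronous, concurrency here is bounded by numTickets rather than by the number of threads. A second submitting thread mainly helps if the single submitting thread becomes CPU-bound.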
