Okay, that proves I was wrong on the client side bottleneck.

On 24/04/2024 17:55, Nathan Marz wrote:
I tried running two client processes in parallel and the numbers were unchanged. The max throughput is still a single client doing 10 in-flight BatchStatements, each containing 100 inserts.

On Tue, Apr 23, 2024 at 10:24 PM Bowen Song via user <user@cassandra.apache.org> wrote:

    You might have run into the bottleneck of the driver's IO thread.
    Try increasing the driver's connections-per-server limit to 2 or 3
    if you've only got 1 server in the cluster. Or alternatively, run
    two client processes in parallel.


    On 24/04/2024 07:19, Nathan Marz wrote:
    Tried it again with one more client thread, and that had no
    effect on performance. This is unsurprising, as there are only
    2 CPUs on this node and they were already at 100%. These were
    good ideas, but I'm still unable to even match the performance
    of batch commit mode with group commit mode.

    On Tue, Apr 23, 2024 at 12:46 PM Bowen Song via user
    <user@cassandra.apache.org> wrote:

        To achieve 10k loop iterations per second, each iteration
        must take 0.1 milliseconds or less. Considering that each
        iteration needs to acquire and release the semaphore (two
        syscalls) and make network requests (more syscalls), that's
        a lot of context switches. It may be a bit too much to ask
        of a single thread. I would suggest trying multi-threading
        or multi-processing, and seeing if the combined insert rate
        is higher.
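A minimal sketch of the multi-threading suggestion above. The async insert is stubbed out with a CompletableFuture (doInsert below is a hypothetical stand-in for session.executeAsync against a real cluster); each thread runs its own semaphore-throttled loop, and the combined completion count is what you would compare against the single-threaded rate:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class MultiThreadedInserts {
    static final AtomicLong completed = new AtomicLong();

    // Hypothetical stand-in for an async insert; a real client would
    // call session.executeAsync(...) on a shared Session here.
    static CompletableFuture<Void> doInsert() {
        return CompletableFuture.runAsync(completed::incrementAndGet);
    }

    // One semaphore-throttled loop per thread, mirroring the
    // loop + semaphore approach from the original message.
    static void insertLoop(int inFlight, int total) throws InterruptedException {
        Semaphore sem = new Semaphore(inFlight);
        for (int i = 0; i < total; i++) {
            sem.acquire();
            doInsert().whenComplete((v, t) -> sem.release());
        }
        // Block until the last in-flight requests drain.
        sem.acquire(inFlight);
    }

    public static void main(String[] args) throws Exception {
        int threads = 4, perThread = 10_000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> { insertLoop(10, perThread); return null; });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(completed.get()); // prints 40000
    }
}
```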

        I should also note that executeAsync() also has implicit
        limits on the number of in-flight requests, which default to
        1024 requests per connection and 1 connection per server. See
        
https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/pooling/
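If you do want to raise those limits, the 4.x driver exposes them in application.conf; a sketch (values here are illustrative, not recommendations):

```hocon
datastax-java-driver {
  advanced.connection {
    # Connections opened per node (default 1).
    pool.local.size = 2
    # In-flight request cap per connection (default 1024).
    max-requests-per-connection = 1024
  }
}
```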


        On 23/04/2024 23:18, Nathan Marz wrote:
        It's using the async API, so why would it need multiple
        threads? Using the exact same approach I'm able to get 38k /
        second with periodic commitlog_sync. For what it's worth, I
        do see 100% CPU utilization in every single one of these tests.

        On Tue, Apr 23, 2024 at 11:01 AM Bowen Song via user
        <user@cassandra.apache.org> wrote:

            Have you checked the thread CPU utilisation of the
            client side? You likely will need more than one thread
            to do insertion in a loop to achieve tens of thousands
            of inserts per second.


            On 23/04/2024 21:55, Nathan Marz wrote:
            Thanks for the explanation.

            I tried again with commitlog_sync_group_window at 2ms,
            concurrent_writes at 512, and doing 1000 individual
            inserts at a time with the same loop + semaphore
            approach. This only nets 9k / second.

            I got much higher throughput for the other modes with
            BatchStatement of 100 inserts rather than 100x more
            individual inserts.

            On Tue, Apr 23, 2024 at 10:45 AM Bowen Song via user
            <user@cassandra.apache.org> wrote:

                I suspect you are abusing batch statements. Batch
                statements should only be used where atomicity or
                isolation is needed. Using batch statements won't
                make inserting multiple partitions faster. In fact,
                it often will make that slower.

                Also, the linear relationship between
                commitlog_sync_group_window and write throughput is
                expected. That's because the max number of
                outstanding writes is limited by the write
                concurrency, and a write is not considered
                "complete" until it has been synced to disk when
                commitlog sync is in group or batch mode. That
                means within each interval, only a limited number
                of writes can complete. The ways to increase that
                include: adding more nodes, syncing the commitlog
                at shorter intervals, and allowing more concurrent
                writes.
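That relationship implies a rough upper bound: with concurrency C and a sync every W milliseconds, at most C writes can complete per window. A back-of-the-envelope sketch (the numbers are illustrative; real throughput can fall well below this ceiling due to driver limits, CPU, etc.):

```java
public class GroupSyncCeiling {
    // At most `concurrentWrites` writes can complete per sync window,
    // so the ceiling is concurrentWrites * (syncs per second).
    static long maxWritesPerSec(int concurrentWrites, double windowMs) {
        double syncsPerSec = 1000.0 / windowMs;
        return (long) (concurrentWrites * syncsPerSec);
    }

    public static void main(String[] args) {
        // e.g. 128 concurrent writes with a 20 ms window:
        System.out.println(maxWritesPerSec(128, 20)); // prints 6400
        // Halving the window doubles the ceiling, hence the
        // linear relationship described above:
        System.out.println(maxWritesPerSec(128, 10)); // prints 12800
    }
}
```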


                On 23/04/2024 20:43, Nathan Marz wrote:
                Thanks. I raised concurrent_writes to 128 and
                set commitlog_sync_group_window to 20ms. This
                causes a single execute of a BatchStatement
                containing 100 inserts to succeed. However, the
                throughput I'm seeing is atrocious.

                With these settings, I'm executing 10
                BatchStatements concurrently using the
                semaphore + loop approach I showed in my first
                message. So as requests complete, more are sent
                out such that there are 10 in flight at a time.
                Each BatchStatement has 100 individual inserts.
                I'm seeing only 730 inserts / second. Again, with
                periodic mode I see 38k / second and with batch I
                see 14k / second. My expectation was that group
                commit mode throughput would be somewhere between
                those two.

                If I set commitlog_sync_group_window to 100ms, the
                throughput drops to 14 / second.

                If I set commitlog_sync_group_window to 10ms, the
                throughput increases to 1587 / second.

                If I set commitlog_sync_group_window to 5ms, the
                throughput increases to 3200 / second.

                If I set commitlog_sync_group_window to 1ms, the
                throughput increases to 13k / second, which is
                slightly less than batch commit mode.

                Is group commit mode supposed to have better
                performance than batch mode?


                On Tue, Apr 23, 2024 at 8:46 AM Bowen Song via
                user <user@cassandra.apache.org> wrote:

                    The default commitlog_sync_group_window is
                    very long for SSDs. Try reducing it if you are
                    using SSD-backed storage for the commit log.
                    10-15 ms is a good starting point. You may
                    also want to increase the value of
                    concurrent_writes; consider at least doubling
                    or quadrupling it from the default. You'll
                    need even higher write concurrency for a
                    longer commitlog_sync_group_window.
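Put together, the suggestions above would look something like this in cassandra.yaml (the values are the starting points suggested, not tuned recommendations):

```yaml
# Group commitlog sync: fsync at a fixed interval and acknowledge
# all writes that arrived during the window together.
commitlog_sync: group
commitlog_sync_group_window: 15ms   # 10-15 ms suggested for SSDs

# Default is 32; double or quadruple it so enough writes can be
# outstanding within each sync window.
concurrent_writes: 128
```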


                    On 23/04/2024 19:26, Nathan Marz wrote:
                    "batch" mode works fine. I'm having trouble
                    with "group" mode. The only config for that
                    is "commitlog_sync_group_window", and I have
                    that set to the default 1000ms.

                    On Tue, Apr 23, 2024 at 8:15 AM Bowen Song
                    via user <user@cassandra.apache.org> wrote:

                        Why would you want to set
                        commitlog_sync_batch_window as long as
                        1 second when commitlog_sync is set to
                        batch mode? The documentation
                        
<https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html>
                        on this says:

                            /This window should be kept short
                            because the writer threads will be
                            unable to do extra work while
                            waiting. You may need to increase
                            concurrent_writes for the same reason/

                        If you want to use batch mode, at least
                        ensure commitlog_sync_batch_window is
                        reasonably short. The default is 2
                        milliseconds.


                        On 23/04/2024 18:32, Nathan Marz wrote:
                        I'm doing some benchmarking of Cassandra
                        on a single m6gd.large instance. It
                        works fine with periodic or batch
                        commitlog_sync options, but I'm having
                        tons of issues when I change it to
                        "group". I have
                        "commitlog_sync_group_window" set to 1000ms.

                        My client is doing writes like this
                        (pseudocode):

                        Semaphore sem = new Semaphore(numTickets);
                        while (true) {
                            sem.acquire();
                            session.executeAsync(
                                insert.bind(genUUIDStr(),
                                    genUUIDStr(), genUUIDStr()))
                                .whenComplete((t, u) -> sem.release());
                        }

                        If I set numTickets higher than 20, I
                        get tons of timeout errors.

                        I've also tried doing single commands
                        with BatchStatement with many inserts at
                        a time, and that fails with timeout when
                        the batch size gets more than 20.

                        Increasing the write request timeout in
                        cassandra.yaml makes it time out at
                        slightly higher numbers of concurrent
                        requests.

                        With periodic I'm able to get about 38k
                        writes / second, and with batch I'm able
                        to get about 14k / second.

                        Any tips on what I should be doing to
                        get group commitlog_sync to work
                        properly? I didn't expect to have to do
                        anything other than change the config.
