[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846725#comment-17846725 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I think that because this is targetted primarily to 4.0 / 4.1 and all eyes are on 5.0 for now, I do not see this has high priority at the moment, especially as we are just releasing 4.1.5 / 4.0.13. I would however say that you can expect this to be eventually merged, I just do not know about when. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846721#comment-17846721 ] Dipietro Salvatore commented on CASSANDRA-19429: [~smiklosovic] by any chance, do you have any update regarding this? What are the following steps? Thanks > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844256#comment-17844256 ] Stefan Miklosovic commented on CASSANDRA-19429: --- To summarize the changes I did, I basically made sure that getCapacity() is not used and we read size of caches from DatabaseDescriptor instead. Doing that required a little bit more changes in DD, it is currently quite hairy to resolve sizes and use getters / setters, there is another set of variables in DatabaseDescriptor we do not need and everything goes through Config instead. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844205#comment-17844205 ] Stefan Miklosovic commented on CASSANDRA-19429: --- 4.0 looks reasonable OK too, failing test_pkey_requirement is the fault of recently merged CASSANDRA-19341 [CASSANDRA-19429-4.0|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.0] {noformat} java11_pre-commit_tests java11_separate_tests java8_pre-commit_tests ✓ j8_build 1m 33s ✓ j8_cqlsh-dtests-py2-no-vnodes5m 17s ✓ j8_cqlsh-dtests-py2-with-vnodes 6m 34s ✓ j8_cqlsh_dtests_py3 6m 55s ✓ j8_cqlsh_dtests_py3118m 10s ✓ j8_cqlsh_dtests_py311_vnode 6m 49s ✓ j8_cqlsh_dtests_py38 6m 12s ✓ j8_cqlsh_dtests_py38_vnode 5m 31s ✓ j8_cqlsh_dtests_py3_vnode5m 11s ✓ j8_cqlshlib_tests 9m 6s ✓ j8_jvm_dtests14m 9s ✓ j11_cqlsh_dtests_py3_vnode 5m 25s ✓ j11_cqlsh_dtests_py38_vnode 5m 33s ✓ j11_cqlsh_dtests_py385m 34s ✓ j11_cqlsh_dtests_py311_vnode 5m 33s ✓ j11_cqlsh_dtests_py311 5m 45s ✓ j11_cqlsh-dtests-py2-with-vnodes 5m 37s ✓ j11_cqlsh-dtests-py2-no-vnodes 5m 38s ✕ j8_dtests 31m 17s json_test.TestJsonFullRowInsertSelect test_pkey_requirement ✕ j8_dtests_vnode 36m 54s json_test.TestJsonFullRowInsertSelect test_pkey_requirement ✕ j8_unit_tests 10m 45s org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace ✕ j8_utests_system_keyspace_directory 9m 40s org.apache.cassandra.cql3.BatchTest testNonCounterInLoggedBatch org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace ✕ j11_unit_tests7m 8s org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace ✕ j11_dtests_vnode43m 44s bootstrap_test.TestBootstrap test_bootstrap_binary_disabled json_test.TestJsonFullRowInsertSelect test_pkey_requirement ✕ j11_dtests 31m 20s json_test.TestJsonFullRowInsertSelect test_pkey_requirement ✕ j11_cqlsh_dtests_py3 5m 21s compression_test.TestCompression test_compression_cql_options java8_separate_tests {noformat} [java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/62e98e6f-2687-4ba0-943e-17b990ad5737] [java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/946942f4-66a9-4aa2-aed7-581dda35f814] [java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/151e79a6-1494-40c6-9c22-3ac71cf6161b] [java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/3db92b5a-3faf-440d-9b31-c40e56fb0778] > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843813#comment-17843813 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I think that size and capacity are trully two different concepts as already explained by Brandon and myself and I believe that having capacity set to 0 practically disables a respective cache. CI for 4.1 looks good with the below patch. I identified couple more places where we could use reading the capacity from DatabaseDescriptor rather than calling getCapacity() on the cache to see if it is bigger than zero to proceed. One just needs to be cautious to not forget to update DatabaseDescriptor when capacity is set to zero, e.g. from JMX etc if we are going to use DD as the source of truth whether a cache is enabled or not. For other branches, this ticket, as I write this, specifies only 4.0 and 4.1 ax fix versions but that should be extended up to trunk. This is all being said if there is an appetite to do this across all versions instead of just fixing the trace messages as suggested by Jon. The bellow build tells me that we are on something here and it would be just a matter of applying this elsewhere. [CASSANDRA-19429-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.1] {noformat} java11_pre-commit_tests java11_separate_tests java8_pre-commit_tests ✓ j8_build 6m 53s ✓ j8_cqlsh_dtests_py3 6m 53s ✓ j8_cqlsh_dtests_py3118m 26s ✓ j8_cqlsh_dtests_py311_vnode 6m 53s ✓ j8_cqlsh_dtests_py38 8m 3s ✓ j8_cqlsh_dtests_py38_vnode 7m 40s ✓ j8_cqlsh_dtests_py3_vnode8m 16s ✓ j8_cqlshlib_cython_tests13m 32s ✓ j8_cqlshlib_tests 11m 44s ✓ j8_dtests 33m 23s ✓ j8_dtests_vnode 38m 26s ✓ j8_jvm_dtests 19m 30s ✓ j8_jvm_dtests_vnode 12m 23s ✓ j8_simulator_dtests 5m 51s ✓ j11_jvm_dtests_vnode11m 50s ✓ j11_jvm_dtests 19m 47s ✓ j11_dtests_vnode 35m 6s ✓ j11_dtests 34m 46s ✓ j11_cqlshlib_tests 6m 35s ✓ j11_cqlshlib_cython_tests9m 27s ✓ j11_cqlsh_dtests_py3_vnode 5m 47s ✓ j11_cqlsh_dtests_py38_vnode 6m 7s ✓ j11_cqlsh_dtests_py385m 41s ✓ j11_cqlsh_dtests_py311_vnode 5m 50s ✓ j11_cqlsh_dtests_py311 5m 56s ✓ j11_cqlsh_dtests_py3 5m 38s ✕ j8_unit_tests9m 53s org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist] org.apache.cassandra.net.ConnectionTest testMessageDeliveryOnReconnect ✕ j8_utests_system_keyspace_directory 11m 5s org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist] ✕ j11_unit_tests 8m 12s org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist] java8_separate_tests {noformat} [java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/7470a92f-1cb3-487d-9d0a-dd1f781a79e8] [java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/2116e61d-1e6a-4589-85e7-1980acfbdb05] [java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/30501a49-c2b0-4aaf-a504-f087f27e88f7] [java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/232134f1-5956-4346-afa3-bd556b6c5f60] > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843589#comment-17843589 ] Jon Haddad commented on CASSANDRA-19429: {quote}there is 63 instances in the production codebase where we do not check tracing level and just log. We should take more holistic approach here and just fix this everywhere which would be probably better suited for a new ticket. {quote} SGTM {quote}Anyway, I think we should still deliver this with the changes OP suggested (and myself improved on top of that). This seems like a fairly innocent change and I do not see where it might go wrong. We should double check that our understanding of getSize vs getCapacity is correct though. {quote} Yeah, I agree with this train of thought, I just don't know if they're equivalent. My point was more that if we're unsure, simply merging in the check on the trace would deliver the same exact performance benefit. If you want to verify the correctness aspect, that SGTM, but since I don't have time to do it myself I would look for someone else to approve the patch from a correctness perspective. A +1 on just a trace check is a no brainer and I'd be fine doing that. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843588#comment-17843588 ] Stefan Miklosovic commented on CASSANDRA-19429: --- [~rustyrazorblade] there is 63 instances in the production codebase where we do not check tracing level and just log. We should take more holistic approach here and just fix this everywhere which would be probably better suited of a new ticket. Anyway, I think we should still deliver this with the changes OP suggested (and myself improved on top of that). This seems like a fairy innocent change and I do not see where it might go wrong. We should double check that our understanding of getSize vs getCapacity is correct though. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843527#comment-17843527 ] Jon Haddad commented on CASSANDRA-19429: I took a quick look at the patch, and while I can't say offhand if size() and getCapacity() are equivalent, I agree with [~smiklosovic] about adding a check if trace logging is enabled before logging the trace. Regardless of the difference between size() and getCapacity(), adding the check alone avoids the lookup in almost all cases and will deliver a performance win if there's any to be had. I can't think of any reason we can't merge that. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839865#comment-17839865 ] Dipietro Salvatore commented on CASSANDRA-19429: [~smiklosovic] We have tested the patch also on smaller instances( > 4xl) using the same instructions I have shared above. Here the results: |*Instance type*| *4.1.3 (on JDK11)*|*4.1.3 with patch* {*}(on JDK11){*}{*}{*}| |r7i.4xlarge|91k/ops 4.9ms@P99|129k/ops 2.4ms@P99 (1.4x)| |r7i.8xlarge|139k/ops 2.5ms@P99|201k/ops 1.3ms@P99 (1.44x)| |r7i.16xlarge|152k/ops 2.2ms@P99|277k/ops 0.8ms@P99 (1.8x)| Starting from 4xl instances we saw a 40% increase in performance. Can you please try to test it on your side? Otherwise can you let us know how we can help to move forward with it (provide instances to test this, run some specific benchmarks, more validation tests, etc.)? Thanks > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837784#comment-17837784 ] Dipietro Salvatore commented on CASSANDRA-19429: [~smiklosovic] What if we provide the instance for few days? Would you be able to test this? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837621#comment-17837621 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I asked internally our techops to test this and we do not have such a big instances we run Cassandra on regularly so it is of a low interest to deal with this for now. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836522#comment-17836522 ] Stefan Miklosovic commented on CASSANDRA-19429: --- Sorry for the delay, I still plan to take a look. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835883#comment-17835883 ] Jon Haddad commented on CASSANDRA-19429: Unfortunately, I won't have time to look at this anytime soon, I'm pretty overwhelmed with work. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835873#comment-17835873 ] Dipietro Salvatore commented on CASSANDRA-19429: [~smiklosovic] did you got any chance to check more on this issue? [~rustyrazorblade] Will you have some time this or next week to test it on such instances? I can help you provisioning them > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830582#comment-17830582 ] Dipietro Salvatore commented on CASSANDRA-19429: [~rustyrazorblade] sure, let me know when you will be able to work on it so that I can provide that > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830297#comment-17830297 ] Jon Haddad commented on CASSANDRA-19429: Thanks, I appreciate the offer. I won't have time to look at this in the next several days, but can probably look in early April. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829257#comment-17829257 ] Dipietro Salvatore commented on CASSANDRA-19429: [~rustyrazorblade] I understand. Would it help if I can provide a r7g.16xlarge instance for a couple of days to you to run your test? If so, I would just need your public SSH key that I can add to the instance to allow you to access to it > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828548#comment-17828548 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I will try that sometimes next week. I am very curious to test it on more powerful machines. Thank you [~rustyrazorblade] for your willingness to test this, very much appreciated. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828547#comment-17828547 ] Jon Haddad commented on CASSANDRA-19429: Unfortunately I don't have any other information since I tore down the environment. I'm not backed by a company, I have to pay for these tests out of my own pocket, and on my own time, so my test was extremely limited. It certainly looks like the patch runs better non-patched based on the information you've shared. It's unfortunate that I did not see the same results you did. I'd love to figure out why we're seeing different results, but for the reasons I've mentioned, it's not something I'll be able to contribute to. It may be that the issue only reveals itself in only in very specific type of test, or it could have been related to the number of SSTables I had, or one of the OS flags I set. I honestly don't know. Unfortunately figuring this one out for me is a lower priority than a lot of other things on my plate. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828541#comment-17828541 ] Dipietro Salvatore commented on CASSANDRA-19429: [~rustyrazorblade] Can you provide more info on how you have test it? What commands did you run to load the DB and benchmark it? What throughput did you register? I have tried to reproduce it following your settings but I was not able to see the same results and, in my case, I still see 1.5x improvement on c5.18xl with the [~smiklosovic] patch. I have also tried to test it without all the OS and Cassandra optimizations you provided, and the results are similar. Full commands list used below: {code:java} # From fresh AWS Ubuntu 22.04 instance # Basic upgrade of packages and install few tools sudo apt update && sudo apt upgrade -y sudo apt install -y unzip tmux ## JAVA 11 OpenJDKs sudo apt install -y openjdk-11-jdk ant java --version # Download Cassandra cd wget https://github.com/apache/cassandra/archive/refs/tags/cassandra-4.1.3.zip && unzip cassandra-4.1.3.zip && cd ~/cassandra-cassandra-4.1.3/ # Patched # git clone https://github.com/instaclustr/cassandra.git cassandra-instaclustr && cd cassandra-instaclustr && git checkout CASSANDRA-19429-4.1 && git log -10 --oneline # Start DB CASSANDRA_USE_JDK11=true ant realclean && \ CASSANDRA_USE_JDK11=true ant jar && \ CASSANDRA_USE_JDK11=true ant stress-build && \ rm -rf data && \ bin/cassandra -f -R # Start workload bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ bin/nodetool clearsnapshot --all && \ tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \ bin/nodetool compact keyspace1 && \ sleep 30s && \ tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt {code} The difference in performance it is also clear on the graph.html report at the end of the execution. After around 300sec into the benchmark, the CPU utilization drops to 30% and so the performance while, with the patch, it remains much more constant through the entire experiment with a avg cpu utilization of 75% and 5x lower P99 latency. !Screenshot 2024-03-19 at 15.22.50.png! > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, > asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825755#comment-17825755 ] Dipietro Salvatore commented on CASSANDRA-19429: thanks, I will try to reproduce it this week > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825465#comment-17825465 ] Jon Haddad commented on CASSANDRA-19429: I'm using the latest 4.1 and the branch off 4.1 I found above. Cassandra 4.1 ships configured to use ParNew + CMS, but there's not really a good reason to use that. We changed to G1 in 5.0, it's far superior to ParNew / CMS for general usage. Both versions were set up using G1. I built my clusters using easy-cass-lab (formerly tlp-cluster). Here's the cassandra.patch.yaml I used along with the JMX options and a bunch of other settings. You can see the workloads I ran and my flame graphs in a previous comment. I ran these tests on the same node, with the same exact dataset. In fact, I ran many tests, switching back and forth between 4.1 and the patched branch in order to try to find some way of replicating what you found, however I was unable to. {noformat} --- cluster_name: "Test Cluster" num_tokens: 4 seed_provider: class_name: "org.apache.cassandra.locator.SimpleSeedProvider" parameters: seeds: "172.31.42.182" hints_directory: "/mnt/cassandra/hints" data_file_directories: - "/mnt/cassandra/data" commitlog_directory: "/mnt/cassandra/commitlog" concurrent_reads: 128 concurrent_writes: 64 trickle_fsync: true endpoint_snitch: "Ec2Snitch" memtable_heap_space: 16MiB{noformat} {noformat} ### G1 Settings ## Use the Hotspot garbage-first collector. -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxTenuringThreshold=4 -XX:G1HeapRegionSize=16m -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=50 -XX:G1MaxNewSizePercent=70-Xmx24G -Xms24G{noformat} {noformat} sudo sysctl kernel.perf_event_paranoid=1 sudo sysctl kernel.kptr_restrict=0 echo 0 > /proc/sys/vm/zone_reclaim_mode{noformat} {noformat} # /etc/security/limits.d/cassandra.conf cassandra soft memlock unlimited cassandra hard memlock unlimited cassandra soft nofile 10 cassandra hard nofile 10 cassandra soft nproc 32768 cassandra hard nproc 32768 cassandra - as unlimited{noformat} {noformat} # /etc/sysctl.d/60-cassandra.conf vm.max_map_count = 1048575{noformat} {noformat} sudo swapoff --allsudo sysctl -p{noformat} > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825442#comment-17825442 ] Dipietro Salvatore commented on CASSANDRA-19429: > I'm working with a single node, with compaction disabled, and I reduced my > memtable space to 16MB in order to constantly flush. I wrote 10m rows and > have 1928 SStables. These boxes have 72 CPU. I'm using G1GC with a 24GB > heap. I've tested concurrent_reads at 64 and 128 since there's enough cores > on here to handle and we don't need to bottleneck on reads. [~rustyrazorblade] Can you provided full detailed and step-by-step instructions on how you tested? So I will try to reproduce as well. BTW which Cassandra version did you use? AFAIK, Cassandra 4 uses UseConcMarkSweepGC and not G1GC. Am I right? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824889#comment-17824889 ] Jon Haddad commented on CASSANDRA-19429: Unfortunately I don't have the time or a budget to do anymore testing on this. I'm still not sure why [~dipiets] was unable to push Cassandra past 50% CPU. I'm seeing 72 CPUs at 90%+ on my tests. My CPU flame graphs are virtually identical between the two versions as well. It would be better if we could demonstrate the difference using JMH, as it's pretty expensive for me to fire up those big nodes. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824886#comment-17824886 ] Jon Haddad commented on CASSANDRA-19429: I've tried several different workloads and I am unable to replicate the performance results from above. 4.1 and the branch are behaving almost identically for both workloads. 4.1 !image-2024-03-08-15-51-30-439.png! branch: !image-2024-03-08-15-52-07-902.png! > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, > image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824885#comment-17824885 ] Jon Haddad commented on CASSANDRA-19429: When I try to spin up those instance types in us-west-2 I get an error that they're invalid, so I'm running a test with c5.18xlarge. I'm working with a single node, with compaction disabled, and I reduced my memtable space to 16MB in order to constantly flush. I wrote 10m rows and have 1928 SStables. These boxes have 72 CPU. I'm using G1GC with a 24GB heap. I've tested concurrent_reads at 64 and 128 since there's enough cores on here to handle and we don't need to bottleneck on reads. So, right off the bat, I'm not able to duplicate the original observation about CPU not going over 50% utilization. 4.1 has reached 90+% CPU utilization: {noformat} 23:29:57 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 23:29:58 all 90.11 0.00 1.07 0.00 0.00 0.24 0.01 0.00 0.00 8.57 23:29:59 all 90.12 0.00 0.84 0.00 0.00 0.27 0.00 0.00 0.00 8.77 23:30:00 all 89.82 0.00 0.83 0.03 0.00 0.38 0.01 0.00 0.00 8.93 23:30:01 all 90.13 0.03 1.08 0.00 0.00 0.30 0.00 0.00 0.00 8.47 23:30:02 all 89.95 0.00 0.89 0.00 0.00 0.34 0.01 0.00 0.00 8.82 23:30:03 all 89.86 0.00 1.08 0.00 0.00 0.24 0.00 0.00 0.00 8.83 23:30:04 all 87.90 0.00 0.97 0.00 0.00 0.24 0.01 0.00 0.00 10.88 {noformat} Using a variety of easy-cass-stress KeyValue workloads with different settings for --rate, I'm unable to see any meaningful difference, performance-wise. {noformat} easy-cass-stress run KeyValue -d 20m --rate 20k -p 10m -t 16 -r 1{noformat} For each workload I've run, I've seen virtually identical results. Both are pushing C* to use 90% CPU and achieve roughly 25K reads / second. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824809#comment-17824809 ] Jon Haddad commented on CASSANDRA-19429: Looking at this today. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824698#comment-17824698 ] Stefan Miklosovic commented on CASSANDRA-19429: --- hi [~rustyrazorblade] any progress here? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822669#comment-17822669 ] Dipietro Salvatore commented on CASSANDRA-19429: if you encounter problems to find r8g.24xl instances due to availability constrains, you can test it on r7i.24xl or r7g.16xl instance type. The gain results should be slightly less than r8g.24xl but anyway close to 2x based on our tests > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822178#comment-17822178 ] Stefan Miklosovic commented on CASSANDRA-19429: --- Yes, correct. When it comes to disabling compaction, that was an attempt to rule out any additional processing from Cassandra side, to just get "pure reads and writes" without any interference of Cassandra doing the compactions in the background. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822176#comment-17822176 ] Jon Haddad commented on CASSANDRA-19429: Yeah I'd definitely like to take a look at this. Scaling better with more cores would definitely be nice :) Please confirm I understand the situation. There's a bit of back and forth above and I want to make sure I'm clear on what to expect. * we're looking at the 4.1 patch here: [https://github.com/apache/cassandra/pull/3133] * Using r8g.24xlarge instances shows the difference * This test is single node * Performance is impacted by the number of SSTables, and you're disabling compaction to achieve that * High thread count * Mostly read workload (90%) I can take a look at this next week. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822061#comment-17822061 ] Stefan Miklosovic commented on CASSANDRA-19429: --- Hi [~rustyrazorblade], what do you think about this? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821460#comment-17821460 ] Dipietro Salvatore commented on CASSANDRA-19429: from ~500k to ~150k ops. I am using all default settings - no optimizations on Cassandra or OS. yes that instance has 96 cores CPU but the CPU utilization is around 25% when it reaches 150k ops. In addition, this is not something specific to Graviton CPU since it happens also on Intel. I have tested the 50/50 R/W settings with : {code:java} bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=50,read=50\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html {code} Results: - 4.1.3 released: {code:java} Results: Op rate : 142,571 op/s [READ: 71,293 op/s, WRITE: 71,278 op/s] Partition rate : 142,571 pk/s [READ: 71,293 pk/s, WRITE: 71,278 pk/s] Row rate : 142,571 row/s [READ: 71,293 row/s, WRITE: 71,278 row/s] Latency mean : 0.7 ms [READ: 1.3 ms, WRITE: 0.1 ms] Latency median : 0.2 ms [READ: 1.2 ms, WRITE: 0.1 ms] Latency 95th percentile : 2.0 ms [READ: 2.3 ms, WRITE: 0.2 ms] Latency 99th percentile : 2.6 ms [READ: 2.9 ms, WRITE: 0.2 ms] Latency 99.9th percentile : 8.4 ms [READ: 9.6 ms, WRITE: 0.4 ms] Latency max : 51.0 ms [READ: 51.0 ms, WRITE: 47.7 ms] Total partitions : 85,661,309 [READ: 42,835,266, WRITE: 42,826,043] Total errors : 0 [READ: 0, WRITE: 0] Total GC count : 1,310 Total GC memory : 2067.821 GiB Total GC time : 9.1 seconds Avg GC time : 7.0 ms StdDev GC time : 3.6 ms Total operation time : 00:10:00 {code} - 4.1.3 with patch: {code:java} Results: Op rate : 459,728 op/s [READ: 229,910 op/s, WRITE: 229,818 op/s] Partition rate : 459,728 pk/s [READ: 229,910 pk/s, WRITE: 229,818 pk/s] Row rate : 459,728 row/s [READ: 229,910 row/s, WRITE: 229,818 row/s] Latency mean : 0.2 ms [READ: 0.3 ms, WRITE: 0.2 ms] Latency median : 0.2 ms [READ: 0.2 ms, WRITE: 0.1 ms] Latency 95th percentile : 0.3 ms [READ: 0.3 ms, WRITE: 0.2 ms] Latency 99th percentile : 0.4 ms [READ: 0.6 ms, WRITE: 0.3 ms] Latency 99.9th percentile : 8.4 ms [READ: 8.9 ms, WRITE: 7.4 ms] Latency max : 1887.4 ms [READ: 1,887.4 ms, WRITE: 48.1 ms] Total partitions : 275,966,298 [READ: 138,010,917, WRITE: 137,955,381] Total errors : 0 [READ: 0, WRITE: 0] Total GC count : 4,438 Total GC memory : 6971.464 GiB Total GC time : 33.7 seconds Avg GC time : 7.6 ms StdDev GC time : 3.8 ms Total operation time : 00:10:00 {code} Increasing the percentage of writes in the workload, increase the difference in performance between the patch and without (3.2x) Did you have the chance to test it with big instances your end? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > >
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821448#comment-17821448 ] Stefan Miklosovic commented on CASSANDRA-19429: --- down to ~150k from what number exactly? I don't think flushes should have such an impact. Your machine is a beast so maybe the flushing is the bottleneck? What are memtable_flush_writers, memtable_cleanup_threshold, memtable_allocation_type, flush_compression and similar configuration parameters? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821439#comment-17821439 ] Dipietro Salvatore commented on CASSANDRA-19429: yeah, interesting... let me re-test it again on my side. One thing I have noticed is that at the beginning of the benchmark both versions has similar performance. After few minutes (and few flushes), the released version's performance starts to drop down to ~150k and then stay constant. Since it is read only workload, we do not have flushes and so not have a drop in performance. I will try to see if I can understand it more > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821437#comment-17821437 ] Stefan Miklosovic commented on CASSANDRA-19429: --- How is it so? Because we are making changes in SSTableReader which _reads_ right? So reading path should be affected. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821436#comment-17821436 ] Dipietro Salvatore commented on CASSANDRA-19429: Test without compaction after writes and with nodetool disableautocompaction and just READs Cmd: {code:java} bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool disableautocompaction && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=0,read=100\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html {code} Results using Ubuntu22.04 on r8g.24xlarge with stress test colocated on the same instance : * 4.1.3 released: {code:java} Results: Op rate : 637,636 op/s [READ: 637,636 op/s] Partition rate : 637,636 pk/s [READ: 637,636 pk/s] Row rate : 637,636 row/s [READ: 637,636 row/s] Latency mean : 0.1 ms [READ: 0.1 ms] Latency median : 0.1 ms [READ: 0.1 ms] Latency 95th percentile : 0.2 ms [READ: 0.2 ms] Latency 99th percentile : 0.2 ms [READ: 0.2 ms] Latency 99.9th percentile : 2.6 ms [READ: 2.6 ms] Latency max : 21.8 ms [READ: 21.8 ms] Total partitions : 382,809,565 [READ: 382,809,565] Total errors : 0 [READ: 0] Total GC count : 3,379 Total GC memory : 5406.356 GiB Total GC time : 5.5 seconds Avg GC time : 1.6 ms StdDev GC time : 0.8 ms Total operation time : 00:10:00 {code} * 4.1.3 with patch: {code:java} Results: Op rate : 636,043 op/s [READ: 636,043 op/s] Partition rate : 636,043 pk/s [READ: 636,043 pk/s] Row rate : 636,043 row/s [READ: 636,043 row/s] Latency mean : 0.1 ms [READ: 0.1 ms] Latency median : 0.1 ms [READ: 0.1 ms] Latency 95th percentile : 0.2 ms [READ: 0.2 ms] Latency 99th percentile : 0.2 ms [READ: 0.2 ms] Latency 99.9th percentile : 2.7 ms [READ: 2.7 ms] Latency max : 16.2 ms [READ: 16.2 msz] Total partitions : 381,776,733 [READ: 381,776,733] Total errors : 0 [READ: 0] Total GC count : 3,396 Total GC memory : 5433.623 GiB Total GC time : 5.6 seconds Avg GC time : 1.6 ms StdDev GC time : 0.5 ms Total operation time : 00:10:00 {code} The patch doesn't have any effect on the read only workload > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot > 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 &&
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821388#comment-17821388 ] Dipietro Salvatore commented on CASSANDRA-19429: 2. Test without compaction after writes and nodetool disableautocompaction Cmd: {code:java} bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool disableautocompaction && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html ## Compact and re-run it bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt {code} Results using Ubuntu22.04 on r8g.24xlarge with stress test colocated on the same instance : * 4.1.3 released: {code:java} Results: Op rate : 135,805 op/s [READ: 122,231 op/s, WRITE: 13,574 op/s] Partition rate : 135,805 pk/s [READ: 122,231 pk/s, WRITE: 13,574 pk/s] Row rate : 135,805 row/s [READ: 122,231 row/s, WRITE: 13,574 row/s] Latency mean : 0.7 ms [READ: 0.8 ms, WRITE: 0.2 ms] Latency median : 0.6 ms [READ: 0.7 ms, WRITE: 0.1 ms] Latency 95th percentile : 1.9 ms [READ: 2.0 ms, WRITE: 0.2 ms] Latency 99th percentile : 2.6 ms [READ: 2.6 ms, WRITE: 0.3 ms] Latency 99.9th percentile : 7.0 ms [READ: 7.2 ms, WRITE: 1.3 ms] Latency max : 51.3 ms [READ: 51.3 ms, WRITE: 48.8 ms] Total partitions : 81,488,855 [READ: 73,343,700, WRITE: 8,145,155] Total errors : 0 [READ: 0, WRITE: 0] Total GC count : 1,583 Total GC memory : 2522.153 GiB Total GC time : 7.4 seconds Avg GC time : 4.7 ms StdDev GC time : 2.2 ms Total operation time : 00:10:00 ## Compact and re-run it bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt ... Results: Op rate : 136,878 op/s [READ: 123,177 op/s, WRITE: 13,701 op/s] Partition rate : 136,878 pk/s [READ: 123,177 pk/s, WRITE: 13,701 pk/s] Row rate : 136,878 row/s [READ: 123,177 row/s, WRITE: 13,701 row/s] Latency mean : 0.7 ms [READ: 0.8 ms, WRITE: 0.2 ms] Latency median : 0.6 ms [READ: 0.7 ms, WRITE: 0.1 ms] Latency 95th percentile : 1.9 ms [READ: 2.0 ms, WRITE: 0.2 ms] Latency 99th percentile : 2.6 ms [READ: 2.6 ms, WRITE: 0.3 ms] Latency 99.9th percentile : 6.5 ms [READ: 6.7 ms, WRITE: 1.2 ms] Latency max : 52.6 ms [READ: 52.6 ms, WRITE: 50.2 ms] Total partitions : 82,197,489 [READ: 73,969,820, WRITE: 8,227,669] Total errors : 0 [READ: 0, WRITE: 0] Total GC count : 1,395 Total GC memory : 2225.329 GiB Total GC time : 6.6 seconds Avg GC time : 4.7 ms StdDev GC time : 2.2 ms Total operation time : 00:10:00{code} * 4.1.3 with patch: {code:java} Results: Op rate : 241,176 op/s [READ: 217,059 op/s, WRITE: 24,117 op/s] Partition rate : 241,176 pk/s [READ: 217,059 pk/s, WRITE: 24,117 pk/s] Row rate : 241,176 row/s [READ: 217,059 row/s, WRITE: 24,117 row/s] Latency mean : 0.4 ms [READ: 0.4 ms, WRITE: 0.2 ms] Latency median : 0.3 ms [READ: 0.3 ms, WRITE: 0.1 ms] Latency 95th percentile : 0.7 ms [READ: 0.7 ms, WRITE: 0.2 ms] Latency 99th percentile : 0.8 ms [READ: 0.8 ms, WRITE: 0.3 ms] Latency 99.9th percentile : 7.2 ms [READ: 7.3 ms, WRITE: 5.1 ms] Latency max : 5003.8 ms [READ: 5,003.8 ms, WRITE: 48.5 ms] Total partitions : 144,931,367 [READ: 130,438,344, WRITE: 14,493,023] Total errors : 0 [READ: 0, WRITE: 0] Total GC count : 4,186 Total GC memory : 6673.759 GiB Total GC time : 23.3 seconds Avg GC time : 5.6 ms StdDev GC time : 3.7 ms Total operation time : 00:10:00 ## Compact and re-run it bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt ... Results: Op rate : 232,130 op/s [READ: 208,904 op/s,
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821384#comment-17821384 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I will try to provision similar machine in our AWS infrastructure and I will let you know. I just need to see this increase on my own, don't take it personally. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821383#comment-17821383 ] Dipietro Salvatore commented on CASSANDRA-19429: 1. Test without compaction after writes. Cmd used: {code:java} sbin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html {code} Results using Ubuntu22.04 on r8g.24xlarge with stress test colocated on the same instance : - 4.1.3 released: {code:java} Results: Op rate : 167,478 op/s [READ: 150,727 op/s, WRITE: 16,750 op/s] Partition rate : 167,478 pk/s [READ: 150,727 pk/s, WRITE: 16,750 pk/s] Row rate : 167,478 row/s [READ: 150,727 row/s, WRITE: 16,750 row/s] Latency mean : 0.6 ms [READ: 0.6 ms, WRITE: 0.2 ms] Latency median : 0.4 ms [READ: 0.5 ms, WRITE: 0.1 ms] Latency 95th percentile : 1.6 ms [READ: 1.6 ms, WRITE: 0.2 ms] Latency 99th percentile : 2.2 ms [READ: 2.2 ms, WRITE: 0.3 ms] Latency 99.9th percentile : 7.1 ms [READ: 7.3 ms, WRITE: 1.2 ms] Latency max : 48.3 ms [READ: 48.3 ms, WRITE: 46.0 ms] Total partitions : 100,600,039 [READ: 90,538,533, WRITE: 10,061,506] Total errors : 0 [READ: 0, WRITE: 0] Total GC count : 1,515 Total GC memory : 2411.407 GiB Total GC time : 8.3 seconds Avg GC time : 5.5 ms StdDev GC time : 2.4 ms Total operation time : 00:10:00{code} - 4.1.3 with patch: {code:java} Results: Op rate : 435,180 op/s [READ: 391,655 op/s, WRITE: 43,525 op/s] Partition rate : 435,180 pk/s [READ: 391,655 pk/s, WRITE: 43,525 pk/s] Row rate : 435,180 row/s [READ: 391,655 row/s, WRITE: 43,525 row/s] Latency mean : 0.2 ms [READ: 0.2 ms, WRITE: 0.2 ms] Latency median : 0.2 ms [READ: 0.2 ms, WRITE: 0.2 ms] Latency 95th percentile : 0.3 ms [READ: 0.3 ms, WRITE: 0.2 ms] Latency 99th percentile : 0.4 ms [READ: 0.4 ms, WRITE: 0.3 ms] Latency 99.9th percentile : 6.7 ms [READ: 6.7 ms, WRITE: 6.2 ms] Latency max : 1057.5 ms [READ: 1,057.5 ms, WRITE: 47.7 ms] Total partitions : 261,410,615 [READ: 235,265,301, WRITE: 26,145,314] Total errors : 0 [READ: 0, WRITE: 0] Total GC count : 4,543 Total GC memory : 7225.988 GiB Total GC time : 25.7 seconds Avg GC time : 5.7 ms StdDev GC time : 3.1 ms Total operation time : 00:10:00{code} Patch seems to have huge benefit on this case as well (2.6x). Noticed also huge benefit in P99 latency (5.5x decrease). > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821377#comment-17821377 ] Dipietro Salvatore commented on CASSANDRA-19429: > But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge . Not sure what is your HW but, based on our tests, the benefit of removing this lock starts using 4xlarge instance and above with ~1.40x improvement (tested on r7g.4xl and r7i.4xl). On small instances like r7i.xlarge, not a significant different has been recorded. > [~dipiets] I think the mistake you do is that you do "nodetool compact" after >you write the data and then you run mixed workload against that. Not sure I have got this completely. Why the "nodetool compact" has such huge difference with the patch and not with the released version? I see that it uses 1 SSTable but why does not have similar performance if this is the problem? I am testing the patch as you suggesting > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821245#comment-17821245 ] Stefan Miklosovic commented on CASSANDRA-19429: --- [~dipiets] I think the mistake you do is that you do "nodetool compact" after you write the data and they you run mixed workload against that. nodetool compact will make just 1 SSTable instead of whatever number of them you have there. I tested this locally too and, indeed, if you have just one table, that will be way more performance-friendly than having multiple of them, because ... there is just one table. So Cassandra does not need to look into any other - which is more performant by definition. If you do not run nodetool compact after writes, try to also do nodetool disableautocompaction, then run the tests. After it is finished, compact it all into one SSTable and run it again. You will see that it is slower and I do not think that looking into capacity etc has anything to do with it. I mean ... sure, we see less locking etc and we might add that change there, but in general I do not think that the effect it has is 2x hardly. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821211#comment-17821211 ] Stefan Miklosovic commented on CASSANDRA-19429: --- Well ... third time is a charm. I think I suffer from what is called a confirmation bias. I turned off autocompaction (nodetool disableautocompaction) and run the test just for reads (0 writes, 100 reads) for longer time and the difference between before and after is really negligible, at least in my case. I am wondering where [~dipiets] can see 3x here. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821200#comment-17821200 ] Stefan Miklosovic commented on CASSANDRA-19429: --- Well, I do see some speedup, just not 2x / 3x, I think this change is amplified more powerful machine a node runs on without this patch {noformat} Op rate : 89,392 op/s [READ: 80,439 op/s, WRITE: 8,953 op/s] Partition rate: 89,392 pk/s [READ: 80,439 pk/s, WRITE: 8,953 pk/s] Row rate : 89,392 row/s [READ: 80,439 row/s, WRITE: 8,953 row/s] Latency mean :2.2 ms [READ: 2.2 ms, WRITE: 2.2 ms] Latency median:1.6 ms [READ: 1.6 ms, WRITE: 1.6 ms] Latency 95th percentile :5.8 ms [READ: 5.8 ms, WRITE: 5.9 ms] Latency 99th percentile : 10.6 ms [READ: 10.6 ms, WRITE: 10.7 ms] Latency 99.9th percentile : 23.5 ms [READ: 23.4 ms, WRITE: 23.8 ms] Latency max : 180.5 ms [READ: 180.5 ms, WRITE: 122.7 ms] Total partitions : 5,408,128 [READ: 4,866,473, WRITE: 541,655] Total errors : 0 [READ: 0, WRITE: 0] Total GC count: 0 Total GC memory : 0.000 KiB Total GC time :0.0 seconds Avg GC time :NaN ms StdDev GC time:0.0 ms Total operation time : 00:01:00 {noformat} with this patch, two independent runs: {noformat} Op rate : 119,782 op/s [READ: 107,849 op/s, WRITE: 11,933 op/s] Partition rate: 119,782 pk/s [READ: 107,849 pk/s, WRITE: 11,933 pk/s] Row rate : 119,782 row/s [READ: 107,849 row/s, WRITE: 11,933 row/s] Latency mean :1.7 ms [READ: 1.6 ms, WRITE: 1.7 ms] Latency median:1.3 ms [READ: 1.3 ms, WRITE: 1.4 ms] Latency 95th percentile :3.8 ms [READ: 3.8 ms, WRITE: 4.0 ms] Latency 99th percentile :7.7 ms [READ: 7.7 ms, WRITE: 8.0 ms] Latency 99.9th percentile : 13.7 ms [READ: 13.7 ms, WRITE: 14.1 ms] Latency max : 114.6 ms [READ: 61.5 ms, WRITE: 114.6 ms] Total partitions : 7,188,152 [READ: 6,472,051, WRITE: 716,101] Total errors : 0 [READ: 0, WRITE: 0] Total GC count: 0 Total GC memory : 0.000 KiB Total GC time :0.0 seconds Avg GC time :NaN ms StdDev GC time:0.0 ms Total operation time : 00:01:00 {noformat} {noformat} Results: Op rate : 104,456 op/s [READ: 94,016 op/s, WRITE: 10,440 op/s] Partition rate: 104,456 pk/s [READ: 94,016 pk/s, WRITE: 10,440 pk/s] Row rate : 104,456 row/s [READ: 94,016 row/s, WRITE: 10,440 row/s] Latency mean :1.9 ms [READ: 1.9 ms, WRITE: 2.0 ms] Latency median:1.5 ms [READ: 1.4 ms, WRITE: 1.5 ms] Latency 95th percentile :4.7 ms [READ: 4.6 ms, WRITE: 4.8 ms] Latency 99th percentile :8.6 ms [READ: 8.6 ms, WRITE: 8.8 ms] Latency 99.9th percentile : 13.9 ms [READ: 13.8 ms, WRITE: 14.1 ms] Latency max : 85.4 ms [READ: 77.2 ms, WRITE: 85.4 ms] Total partitions : 6,268,822 [READ: 5,642,258, WRITE: 626,564] Total errors : 0 [READ: 0, WRITE: 0] Total GC count: 0 Total GC memory : 0.000 KiB Total GC time :0.0 seconds Avg GC time :NaN ms StdDev GC time:0.0 ms Total operation time : 00:01:00 {noformat} so the speedup it like 20% which is quite nice already. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821196#comment-17821196 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I dont see the speedup, I was just testing same commands on my local PC, before / after numbers are basically more or less same, definitely not 2x or 3x. I used just 100 threads. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821178#comment-17821178 ] Stefan Miklosovic commented on CASSANDRA-19429: --- https://github.com/apache/cassandra/pull/3140 > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 20m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820913#comment-17820913 ] Dipietro Salvatore commented on CASSANDRA-19429: let me know when you have a PR ready for Cassandra 4 that I can test it > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820823#comment-17820823 ] Stefan Miklosovic commented on CASSANDRA-19429: --- yeah hard to believe ... I will prepare 4.0 patch and [~dipiets] can run it before / after too. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820822#comment-17820822 ] Brandon Williams commented on CASSANDRA-19429: -- A 2-3x gain sounds so good I'm having a hard time believing we've left it on the table for years, at least in the case of 4.0. Speaking of which, it would be good to check performance there too since it has the same code and everything should line up. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820816#comment-17820816 ] Dipietro Salvatore commented on CASSANDRA-19429: Got new profile with the patch and I don't see that locks anymore and the overall number dropped to 200. Preatty amazing! !Screenshot 2024-02-26 at 10.27.10.png! > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: Screenshot 2024-02-26 at 10.27.10.png, > asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820814#comment-17820814 ] Stefan Miklosovic commented on CASSANDRA-19429: --- Great, so what are the next steps? [~brandon.williams] If think that if we go to call DD in MBeans, this should not stay only in 4.0 and 4.1 but we should copy this behavior to newer branches too. That would make it a little bit more involved. I am probably fine with leaving out MBean bits from 4.0 / 4.1 but it is questionable if we all are OK with it ... > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820812#comment-17820812 ] Dipietro Salvatore commented on CASSANDRA-19429: My performance benchmarks results looks good to me: r8g.24xl: 458k op/s (2.72x) r7i.24xl: 292k op/s (1.9x) Tested with commands: {code:java} cd git clone https://github.com/instaclustr/cassandra.git cassandra-instaclustr cd cassandra-instaclustr git checkout CASSANDRA-19429-4.1git log -10 --oneline CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html |& tee stress.txt{code} > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820793#comment-17820793 ] Dipietro Salvatore commented on CASSANDRA-19429: sure, let me test your patch > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820696#comment-17820696 ] Stefan Miklosovic commented on CASSANDRA-19429: --- When one checks if (tracing) ..., there are more cases like that in the affected classes so we just align it to what is there and it definitely does not hurt. Also, I think that if we base that logic on DatabaseDescriptor, we should also cover MBean aspect of it. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820695#comment-17820695 ] Brandon Williams commented on CASSANDRA-19429: -- Not really because bq. neither of these are in the hot path, I haven't tested but I wouldn't expect a ton of performance gain here. but I guess we can check. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820628#comment-17820628 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I created this branch where I incorporated the idea above with more changes related to it (1). For example, we should no call getCapacity() in tracing log messages, we should firstly check if we are going to log on trace level and then construct log message. If we are not, then we call getCapacity() but we just throw it away. I think that in practice we are not logging on trace at all so this is just redundant. When we check DatabaseDescriptor.getKeyCacheSizeInMiB(), if we change capacity of these caches via CacheServiceMBean, then it would have non-zero capacity but we never set it in DatabaseDescriptor. I covered this too, DatabaseDescriptor values are updated on CacheServiceMBean method calls too. (1) https://github.com/apache/cassandra/pull/3133/files [~dipiets] [~brandon.williams] does this make sense to you? [~dipiets] Could you please run your tests again with the (1) patch and check the numbers? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820624#comment-17820624 ] Stefan Miklosovic commented on CASSANDRA-19429: --- [CASSANDRA-19429-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.1] {noformat} java11_pre-commit_tests ✓ j11_build1m 32s ✓ j11_cqlsh_dtests_py3 5m 24s ✓ j11_cqlsh_dtests_py311 5m 36s ✓ j11_cqlsh_dtests_py311_vnode 5m 31s ✓ j11_cqlsh_dtests_py385m 32s ✓ j11_cqlsh_dtests_py38_vnode 5m 29s ✓ j11_cqlsh_dtests_py3_vnode 5m 30s ✓ j11_cqlshlib_cython_tests 7m 8s ✓ j11_cqlshlib_tests 6m 12s ✓ j11_dtests 32m 23s ✓ j11_dtests_vnode34m 47s ✓ j11_jvm_dtests 15m 20s ✓ j11_jvm_dtests_vnode12m 21s ✕ j11_unit_tests 7m 38s org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist] java11_separate_tests java8_pre-commit_tests java8_separate_tests {noformat} [java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/1740b3e0-8813-4a29-89fe-7a0fb951231f] [java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/d1dd4431-caeb-4f55-985d-4720311c000f] [java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/11197ae4-9546-4751-a9f2-c4b05de555aa] [java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/47a3a7a0-6fdd-4b9f-8cdf-4c620c802b0f] > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > Time Spent: 10m > Remaining Estimate: 0h > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820442#comment-17820442 ] Stefan Miklosovic commented on CASSANDRA-19429: --- I think it is <= 0 when key_cache_size: 0MiB in cassandra.yaml Just set a breakpoint in CacheService constructor and follow the execution its constructor where keyCache = initKeyCache(); is called. If we do not want to check "if (maybeKeyCache.getCapacity() > 0)" I think the same effect would be to check "if (DatabaseDescriptor.getKeyCacheSizeInMiB() > 0)" and it would be same. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820232#comment-17820232 ] Dipietro Salvatore commented on CASSANDRA-19429: Under which conditions the `maybeKeyCache.getCapacity()` call can return value <= 0? Based on my tests, I can see that it is always set to a value > 0 [https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L720] I have also saw that Cassandra 5 is not calling it anymore > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820225#comment-17820225 ] Brandon Williams commented on CASSANDRA-19429: -- Capacity is the configured capacity for the cache while the size is how much of it has been consumed, so they aren't interchangeable. Also neither of these are in the hot path, I haven't tested but I wouldn't expect a ton of performance gain here. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820219#comment-17820219 ] Dipietro Salvatore commented on CASSANDRA-19429: My initial patch and test are based on this patch (https://github.com/salvatoredipietro/cassandra/commit/45f770dd8e06e490a4cd0f222e1d4a3060a45a38): {code:java} >From 45f770dd8e06e490a4cd0f222e1d4a3060a45a38 Mon Sep 17 00:00:00 2001 From: Salvatore Dipietro Date: Wed, 21 Feb 2024 16:29:15 -0800 Subject: [PATCH] Remove lock contention generated by getCapacity function in SSTableReader --- CHANGES.txt | 1 + .../org/apache/cassandra/io/sstable/format/SSTableReader.java | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index 21084b8c721f..4ab8ccf21b3e 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.1.5 + * Remove lock contention generated by getCapacity function in SSTableReader Merged from 4.0: * Remove bashisms for mx4j tool in cassandra-env.sh (CASSANDRA-19416) Merged from 3.11: diff --git a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java index f26cf65c93e0..3b703911984d 100644 --- a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java +++ b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java @@ -520,7 +520,7 @@ public static SSTableReader open(Descriptor descriptor, sstable.validate(); if (sstable.getKeyCache() != null) -logger.trace("key cache contains {}/{} keys", sstable.getKeyCache().size(), sstable.getKeyCache().getCapacity()); +logger.trace("key cache contains {} keys", sstable.getKeyCache().size()); return sstable; } @@ -717,7 +717,7 @@ public void setupOnline() // e.g. by BulkLoader, which does not initialize the cache. As a kludge, we set up the cache // here when we know we're being wired into the rest of the server infrastructure. InstrumentingCache maybeKeyCache = CacheService.instance.keyCache; -if (maybeKeyCache.getCapacity() > 0) +if (maybeKeyCache.size() > 0) keyCache = maybeKeyCache; final ColumnFamilyStore cfs = Schema.instance.getColumnFamilyStoreInstance(metadata().id); {code} However, it seem that the `keyCache` variable remains set to `null`. Based on my experiments, I can see that `maybeKeyCache.getCapacity()` always returns a fixed long number for all its calls and it doesn't change over time. Can someone please advice under which circumstances it is necessary to check that `maybeKeyCache.getCapacity() > 0` or if it is always possible to set it like ` keyCache = CacheService.instance.keyCache;` ? > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress
[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader
[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820218#comment-17820218 ] Brandon Williams commented on CASSANDRA-19429: -- Nice. It looks like 4.0 has the same SSTR code as well. > Remove lock contention generated by getCapacity function in SSTableReader > - > > Key: CASSANDRA-19429 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19429 > Project: Cassandra > Issue Type: Bug > Components: Local/SSTable >Reporter: Dipietro Salvatore >Assignee: Dipietro Salvatore >Priority: Normal > Fix For: 4.0.x, 4.1.x > > Attachments: asprof_cass4.1.3__lock_20240216052912lock.html > > > Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock > acquires is measured in the `getCapacity` function from > `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 > seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), > this limits the CPU utilization of the system to under 50% when testing at > full load and therefore limits the achieved throughput. > Removing the lock contention from the SSTableReader.java file by replacing > the call to `getCapacity` with `size` achieves up to 2.95x increase in > throughput on r8g.24xlarge and 2x on r7i.24xlarge: > |Instance type|Cass 4.1.3|Cass 4.1.3 patched| > |r8g.24xlarge|168k ops|496k ops (2.95x)| > |r7i.24xlarge|153k ops|304k ops (1.98x)| > > Instructions to reproduce: > {code:java} > ## Requirements for Ubuntu 22.04 > sudo apt install -y ant git openjdk-11-jdk > ## Build and run > CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && > CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f > -R > # Run > bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \ > bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \ > bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write > n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log > -graph file=cload.html && \ > bin/nodetool compact keyspace1 && sleep 30s && \ > tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m > cl=ONE -rate threads=406 -node localhost -log file=result.log -graph > file=graph.html > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org