[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-15 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846725#comment-17846725
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I think that because this is targetted primarily to 4.0 / 4.1 and all eyes are 
on 5.0 for now, I do not see this has high priority at the moment, especially 
as we are just releasing 4.1.5 / 4.0.13.

I would however say that you can expect this to be eventually merged, I just do 
not know about when.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-15 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846721#comment-17846721
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


[~smiklosovic] by any chance, do you have any update regarding this? What are 
the following steps? Thanks

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844256#comment-17844256
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

To summarize the changes I did, I basically made sure that getCapacity() is not 
used and we read size of caches from DatabaseDescriptor instead. Doing that 
required a little bit more changes in DD, it is currently quite hairy to 
resolve sizes and use getters / setters, there is another set of variables in 
DatabaseDescriptor we do not need and everything goes through Config instead.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-07 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844205#comment-17844205
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

4.0 looks reasonable OK too, failing test_pkey_requirement is the fault of 
recently merged CASSANDRA-19341

[CASSANDRA-19429-4.0|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.0]
{noformat}
java11_pre-commit_tests 
java11_separate_tests
java8_pre-commit_tests  
  ✓ j8_build 1m 33s
  ✓ j8_cqlsh-dtests-py2-no-vnodes5m 17s
  ✓ j8_cqlsh-dtests-py2-with-vnodes  6m 34s
  ✓ j8_cqlsh_dtests_py3  6m 55s
  ✓ j8_cqlsh_dtests_py3118m 10s
  ✓ j8_cqlsh_dtests_py311_vnode  6m 49s
  ✓ j8_cqlsh_dtests_py38 6m 12s
  ✓ j8_cqlsh_dtests_py38_vnode   5m 31s
  ✓ j8_cqlsh_dtests_py3_vnode5m 11s
  ✓ j8_cqlshlib_tests 9m 6s
  ✓ j8_jvm_dtests14m 9s
  ✓ j11_cqlsh_dtests_py3_vnode   5m 25s
  ✓ j11_cqlsh_dtests_py38_vnode  5m 33s
  ✓ j11_cqlsh_dtests_py385m 34s
  ✓ j11_cqlsh_dtests_py311_vnode 5m 33s
  ✓ j11_cqlsh_dtests_py311   5m 45s
  ✓ j11_cqlsh-dtests-py2-with-vnodes 5m 37s
  ✓ j11_cqlsh-dtests-py2-no-vnodes   5m 38s
  ✕ j8_dtests   31m 17s
  json_test.TestJsonFullRowInsertSelect test_pkey_requirement
  ✕ j8_dtests_vnode 36m 54s
  json_test.TestJsonFullRowInsertSelect test_pkey_requirement
  ✕ j8_unit_tests   10m 45s
  org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace
  ✕ j8_utests_system_keyspace_directory  9m 40s
  org.apache.cassandra.cql3.BatchTest testNonCounterInLoggedBatch
  org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace
  ✕ j11_unit_tests7m 8s
  org.apache.cassandra.cql3.MemtableSizeTest testTruncationReleasesLogSpace
  ✕ j11_dtests_vnode43m 44s
  bootstrap_test.TestBootstrap test_bootstrap_binary_disabled
  json_test.TestJsonFullRowInsertSelect test_pkey_requirement
  ✕ j11_dtests  31m 20s
  json_test.TestJsonFullRowInsertSelect test_pkey_requirement
  ✕ j11_cqlsh_dtests_py3 5m 21s
  compression_test.TestCompression test_compression_cql_options
java8_separate_tests 
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/62e98e6f-2687-4ba0-943e-17b990ad5737]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/946942f4-66a9-4aa2-aed7-581dda35f814]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/151e79a6-1494-40c6-9c22-3ac71cf6161b]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4287/workflows/3db92b5a-3faf-440d-9b31-c40e56fb0778]


> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843813#comment-17843813
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I think that size and capacity are trully two different concepts as already 
explained by Brandon and myself and I believe that having capacity set to 0 
practically disables a respective cache. 

CI for 4.1 looks good with the below patch. I identified couple more places 
where we could use reading the capacity from DatabaseDescriptor rather than 
calling getCapacity() on the cache to see if it is bigger than zero to proceed.

One just needs to be cautious to not forget to update DatabaseDescriptor when 
capacity is set to zero, e.g. from JMX etc if we are going to use DD as the 
source of truth whether a cache is enabled or not.

For other branches, this ticket, as I write this, specifies only 4.0 and 4.1 ax 
fix versions but that should be extended up to trunk.

This is all being said if there is an appetite to do this across all versions 
instead of just fixing the trace messages as suggested by Jon. The bellow build 
tells me that we are on something here and it would be just a matter of 
applying this elsewhere.

[CASSANDRA-19429-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.1]
{noformat}
java11_pre-commit_tests 
java11_separate_tests
java8_pre-commit_tests  
  ✓ j8_build 6m 53s
  ✓ j8_cqlsh_dtests_py3  6m 53s
  ✓ j8_cqlsh_dtests_py3118m 26s
  ✓ j8_cqlsh_dtests_py311_vnode  6m 53s
  ✓ j8_cqlsh_dtests_py38  8m 3s
  ✓ j8_cqlsh_dtests_py38_vnode   7m 40s
  ✓ j8_cqlsh_dtests_py3_vnode8m 16s
  ✓ j8_cqlshlib_cython_tests13m 32s
  ✓ j8_cqlshlib_tests   11m 44s
  ✓ j8_dtests   33m 23s
  ✓ j8_dtests_vnode 38m 26s
  ✓ j8_jvm_dtests   19m 30s
  ✓ j8_jvm_dtests_vnode 12m 23s
  ✓ j8_simulator_dtests  5m 51s
  ✓ j11_jvm_dtests_vnode11m 50s
  ✓ j11_jvm_dtests  19m 47s
  ✓ j11_dtests_vnode 35m 6s
  ✓ j11_dtests  34m 46s
  ✓ j11_cqlshlib_tests   6m 35s
  ✓ j11_cqlshlib_cython_tests9m 27s
  ✓ j11_cqlsh_dtests_py3_vnode   5m 47s
  ✓ j11_cqlsh_dtests_py38_vnode   6m 7s
  ✓ j11_cqlsh_dtests_py385m 41s
  ✓ j11_cqlsh_dtests_py311_vnode 5m 50s
  ✓ j11_cqlsh_dtests_py311   5m 56s
  ✓ j11_cqlsh_dtests_py3 5m 38s
  ✕ j8_unit_tests9m 53s
  org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  org.apache.cassandra.net.ConnectionTest testMessageDeliveryOnReconnect
  ✕ j8_utests_system_keyspace_directory  11m 5s
  org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j11_unit_tests   8m 12s
  org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
java8_separate_tests 
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/7470a92f-1cb3-487d-9d0a-dd1f781a79e8]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/2116e61d-1e6a-4589-85e7-1980acfbdb05]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/30501a49-c2b0-4aaf-a504-f087f27e88f7]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/232134f1-5956-4346-afa3-bd556b6c5f60]


> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-05 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843589#comment-17843589
 ] 

Jon Haddad commented on CASSANDRA-19429:


{quote}there is 63 instances in the production codebase where we do not check 
tracing level and just log. We should take more holistic approach here and just 
fix this everywhere which would be probably better suited for a new ticket.
{quote}
 

SGTM
{quote}Anyway, I think we should still deliver this with the changes OP 
suggested (and myself improved on top of that). This seems like a fairly 
innocent change and I do not see where it might go wrong. We should double 
check that our understanding of getSize vs getCapacity is correct though.
{quote}
Yeah, I agree with this train of thought, I just don't know if they're 
equivalent.  My point was more that if we're unsure, simply merging in the 
check on the trace would deliver the same exact performance benefit.  If you 
want to verify the correctness aspect, that SGTM, but since I don't have time 
to do it myself I would look for someone else to approve the patch from a 
correctness perspective.  A +1 on just a trace check is a no brainer and I'd be 
fine doing that.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-05 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843588#comment-17843588
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

[~rustyrazorblade] there is 63 instances in the production codebase where we do 
not check tracing level and just log. We should take more holistic approach 
here and just fix this everywhere which would be probably better suited of a 
new ticket.

Anyway, I think we should still deliver this with the changes OP suggested (and 
myself improved on top of that). This seems like a fairy innocent change and I 
do not see where it might go wrong. We should double check that our 
understanding of getSize vs getCapacity is correct though. 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-05-05 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843527#comment-17843527
 ] 

Jon Haddad commented on CASSANDRA-19429:


I took a quick look at the patch, and while I can't say offhand if size() and 
getCapacity() are equivalent, I agree with [~smiklosovic] about adding a check 
if trace logging is enabled before logging the trace.  

Regardless of the difference between size() and getCapacity(), adding the check 
alone avoids the lookup in almost all cases and will deliver a performance win 
if there's any to be had.  I can't think of any reason we can't merge that.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-04-22 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839865#comment-17839865
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


[~smiklosovic] We have tested the patch also on smaller instances( > 4xl) using 
the same instructions I have shared above. 
Here the results:
 
|*Instance type*| *4.1.3 (on JDK11)*|*4.1.3 with patch* {*}(on JDK11){*}{*}{*}|
|r7i.4xlarge|91k/ops 4.9ms@P99|129k/ops 2.4ms@P99 (1.4x)|
|r7i.8xlarge|139k/ops 2.5ms@P99|201k/ops 1.3ms@P99 (1.44x)|
|r7i.16xlarge|152k/ops 2.2ms@P99|277k/ops 0.8ms@P99 (1.8x)|

Starting from 4xl instances we saw a 40% increase in performance. 
Can you please try to test it on your side? 

Otherwise can you let us know how we can help to move forward with it (provide 
instances to test this, run some specific benchmarks, more validation tests, 
etc.)? Thanks

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-04-16 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837784#comment-17837784
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


[~smiklosovic] What if we provide the instance for few days? Would you be able 
to test this?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-04-16 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837621#comment-17837621
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I asked internally our techops to test this and we do not have such a big 
instances we run Cassandra on regularly so it is of a low interest to deal with 
this for now.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-04-12 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836522#comment-17836522
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Sorry for the delay, I still plan to take a look. 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-04-10 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835883#comment-17835883
 ] 

Jon Haddad commented on CASSANDRA-19429:


Unfortunately, I won't have time to look at this anytime soon, I'm pretty 
overwhelmed with work.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-04-10 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835873#comment-17835873
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


[~smiklosovic] did you got any chance to check more on this issue?
[~rustyrazorblade] Will you have some time this or next week to test it on such 
instances? I can help you provisioning them

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-25 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830582#comment-17830582
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


[~rustyrazorblade]  sure, let me know when you will be able to work on it so 
that I can provide that

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-24 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830297#comment-17830297
 ] 

Jon Haddad commented on CASSANDRA-19429:


Thanks, I appreciate the offer.  I won't have time to look at this in the next 
several days, but can probably look in early April.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-20 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829257#comment-17829257
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


[~rustyrazorblade] I understand. Would it help if I can provide a r7g.16xlarge 
instance for a couple of days to you to run your test? 
If so, I would just need your public SSH key that I can add to the instance to 
allow you to access to it

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-19 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828548#comment-17828548
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I will try that sometimes next week. I am very curious to test it on more 
powerful machines. Thank you [~rustyrazorblade]  for your willingness to test 
this, very much appreciated. 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-19 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828547#comment-17828547
 ] 

Jon Haddad commented on CASSANDRA-19429:


Unfortunately I don't have any other information since I tore down the 
environment.  I'm not backed by a company, I have to pay for these tests out of 
my own pocket, and on my own time, so my test was extremely limited.  

It certainly looks like the patch runs better non-patched based on the 
information you've shared.  It's unfortunate that I did not see the same 
results you did.  I'd love to figure out why we're seeing different results, 
but for the reasons I've mentioned, it's not something I'll be able to 
contribute to.  It may be that the issue only reveals itself in only in very 
specific type of test, or it could have been related to the number of SSTables 
I had, or one of the OS flags I set.  I honestly don't know.  Unfortunately 
figuring this one out for me is a lower priority than a lot of other things on 
my plate.  

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-19 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828541#comment-17828541
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


[~rustyrazorblade] Can you provide more info on how you have test it? What 
commands did you run to load the DB and benchmark it? What throughput did you 
register?

I have tried to reproduce it following your settings but I was not able to see 
the same results and, in my case, I still see 1.5x improvement on c5.18xl with 
the [~smiklosovic]  patch. I have also tried to test it without all the OS and 
Cassandra optimizations you provided, and the results are similar. Full 
commands list used below:


{code:java}
# From fresh AWS Ubuntu 22.04 instance
# Basic upgrade of packages and install few tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y unzip tmux 

## JAVA 11 OpenJDKs
sudo apt install -y openjdk-11-jdk ant
java --version

# Download Cassandra
cd 
wget https://github.com/apache/cassandra/archive/refs/tags/cassandra-4.1.3.zip 
&& unzip cassandra-4.1.3.zip && cd ~/cassandra-cassandra-4.1.3/ 
# Patched
# git clone https://github.com/instaclustr/cassandra.git cassandra-instaclustr 
&& cd cassandra-instaclustr && git checkout CASSANDRA-19429-4.1 && git log -10 
--oneline

# Start DB
CASSANDRA_USE_JDK11=true ant realclean && \
CASSANDRA_USE_JDK11=true ant jar && \
CASSANDRA_USE_JDK11=true ant stress-build  && \
rm -rf data && \
bin/cassandra -f -R

# Start workload
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
bin/nodetool clearsnapshot --all && \
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && \
bin/nodetool compact keyspace1   && \
sleep 30s && \
tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE 
-rate threads=100 -node localhost -log file=result.log -graph file=graph.html 
|& tee stress.txt
 {code}

The difference in performance it is also clear on the graph.html report at the 
end of the execution. After around 300sec into the benchmark, the CPU 
utilization drops to 30% and so the performance while, with the patch, it 
remains much more constant through the entire experiment with a avg cpu 
utilization of 75% and 5x lower P99 latency.
!Screenshot 2024-03-19 at 15.22.50.png!

 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-12 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825755#comment-17825755
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


thanks, I will try to reproduce it this week

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-11 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825465#comment-17825465
 ] 

Jon Haddad commented on CASSANDRA-19429:


I'm using the latest 4.1 and the branch off 4.1 I found above.

Cassandra 4.1 ships configured to use ParNew + CMS, but there's not really a 
good reason to use that.  We changed to G1 in 5.0, it's far superior to ParNew 
/ CMS for general usage.  Both versions were set up using G1.

I built my clusters using easy-cass-lab (formerly tlp-cluster).  Here's the 
cassandra.patch.yaml I used along with the JMX options and a bunch of other 
settings.  You can see the workloads I ran and my flame graphs in a previous 
comment.

I ran these tests on the same node, with the same exact dataset.  In fact, I 
ran many tests, switching back and forth between 4.1 and the patched branch in 
order to try to find some way of replicating what you found, however I was 
unable to.  

 
{noformat}
---
cluster_name: "Test Cluster"
num_tokens: 4
seed_provider:
  class_name: "org.apache.cassandra.locator.SimpleSeedProvider"
  parameters:
    seeds: "172.31.42.182"
hints_directory: "/mnt/cassandra/hints"
data_file_directories:
- "/mnt/cassandra/data"
commitlog_directory: "/mnt/cassandra/commitlog"
concurrent_reads: 128
concurrent_writes: 64
trickle_fsync: true
endpoint_snitch: "Ec2Snitch"
memtable_heap_space: 16MiB{noformat}
 

 
{noformat}
### G1 Settings
## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:MaxTenuringThreshold=4
-XX:G1HeapRegionSize=16m
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=50
-XX:G1MaxNewSizePercent=70-Xmx24G
-Xms24G{noformat}
 

 

 
{noformat}
sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0
echo 0 > /proc/sys/vm/zone_reclaim_mode{noformat}
 

 

 
{noformat}
# /etc/security/limits.d/cassandra.conf
cassandra soft memlock unlimited
cassandra hard memlock unlimited
cassandra soft nofile 10
cassandra hard nofile 10
cassandra soft nproc 32768
cassandra hard nproc 32768
cassandra - as unlimited{noformat}
 
{noformat}
# /etc/sysctl.d/60-cassandra.conf
vm.max_map_count = 1048575{noformat}
{noformat}
sudo swapoff --allsudo sysctl -p{noformat}
 

 

 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-11 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825442#comment-17825442
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


> I'm working with a single node, with compaction disabled, and I reduced my 
> memtable space to 16MB in order to constantly flush.  I wrote 10m rows and 
> have 1928 SStables.  These boxes have 72 CPU.  I'm using G1GC with a 24GB 
> heap.  I've tested concurrent_reads at 64 and 128 since there's enough cores 
> on here to handle and we don't need to bottleneck on reads.

[~rustyrazorblade] Can you provided full detailed and step-by-step instructions 
on how you tested? So I will try to reproduce as well.
BTW which Cassandra version did you use?  AFAIK, Cassandra 4 uses 
UseConcMarkSweepGC and not G1GC. Am I right?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-08 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824889#comment-17824889
 ] 

Jon Haddad commented on CASSANDRA-19429:


Unfortunately I don't have the time or a budget to do anymore testing on this.  
I'm still not sure why [~dipiets] was unable to push Cassandra past 50% CPU. 
I'm seeing 72 CPUs at 90%+ on my tests.  My CPU flame graphs are virtually 
identical between the two versions as well.  

It would be better if we could demonstrate the difference using JMH, as it's 
pretty expensive for me to fire up those big nodes.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-08 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824886#comment-17824886
 ] 

Jon Haddad commented on CASSANDRA-19429:


I've tried several different workloads and I am unable to replicate the 
performance results from above.  4.1 and the branch are behaving almost 
identically for both workloads.  

4.1

!image-2024-03-08-15-51-30-439.png!

 

branch:

 

!image-2024-03-08-15-52-07-902.png!

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-08 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824885#comment-17824885
 ] 

Jon Haddad commented on CASSANDRA-19429:


When I try to spin up those instance types in us-west-2 I get an error that 
they're invalid, so I'm running a test with c5.18xlarge.

I'm working with a single node, with compaction disabled, and I reduced my 
memtable space to 16MB in order to constantly flush.  I wrote 10m rows and have 
1928 SStables.  These boxes have 72 CPU.  I'm using G1GC with a 24GB heap.  
I've tested concurrent_reads at 64 and 128 since there's enough cores on here 
to handle and we don't need to bottleneck on reads.

So, right off the bat, I'm not able to duplicate the original observation about 
CPU not going over 50% utilization.  4.1 has reached 90+% CPU utilization:
{noformat}
23:29:57     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  
%guest  %gnice   %idle
23:29:58     all   90.11    0.00    1.07    0.00    0.00    0.24    0.01    
0.00    0.00    8.57
23:29:59     all   90.12    0.00    0.84    0.00    0.00    0.27    0.00    
0.00    0.00    8.77
23:30:00     all   89.82    0.00    0.83    0.03    0.00    0.38    0.01    
0.00    0.00    8.93
23:30:01     all   90.13    0.03    1.08    0.00    0.00    0.30    0.00    
0.00    0.00    8.47
23:30:02     all   89.95    0.00    0.89    0.00    0.00    0.34    0.01    
0.00    0.00    8.82
23:30:03     all   89.86    0.00    1.08    0.00    0.00    0.24    0.00    
0.00    0.00    8.83
23:30:04     all   87.90    0.00    0.97    0.00    0.00    0.24    0.01    
0.00    0.00   10.88 {noformat}
Using a variety of easy-cass-stress KeyValue workloads with different settings 
for --rate, I'm unable to see any meaningful difference, performance-wise.
{noformat}
easy-cass-stress run KeyValue -d 20m --rate 20k -p 10m -t 16 -r 1{noformat}
For each workload I've run, I've seen virtually identical results.  Both are 
pushing C* to use 90% CPU and achieve roughly 25K reads / second.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-08 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824809#comment-17824809
 ] 

Jon Haddad commented on CASSANDRA-19429:


Looking at this today.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-08 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824698#comment-17824698
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

hi [~rustyrazorblade] any progress here?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-03-01 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822669#comment-17822669
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


if you encounter problems to find r8g.24xl instances due to availability 
constrains, you can test it on r7i.24xl or r7g.16xl instance type. 
The gain results should be slightly less than r8g.24xl but anyway close to 2x 
based on our tests

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822178#comment-17822178
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Yes, correct.

When it comes to disabling compaction, that was an attempt to rule out any 
additional processing from Cassandra side, to just get "pure reads and writes" 
without any interference of Cassandra doing the compactions in the background. 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-29 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822176#comment-17822176
 ] 

Jon Haddad commented on CASSANDRA-19429:


Yeah I'd definitely like to take a look at this.  Scaling better with more 
cores would definitely be nice :)

Please confirm I understand the situation.  There's a bit of back and forth 
above and I want to make sure I'm clear on what to expect.
 * we're looking at the 4.1 patch here: 
[https://github.com/apache/cassandra/pull/3133]
 * Using r8g.24xlarge instances shows the difference
 * This test is single node
 * Performance is impacted by the number of SSTables, and you're disabling 
compaction to achieve that
 * High thread count
 * Mostly read workload (90%)

I can take a look at this next week.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822061#comment-17822061
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Hi [~rustyrazorblade], what do you think about this? 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821460#comment-17821460
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


from ~500k to ~150k ops. I am using all default settings - no optimizations on 
Cassandra or OS.

yes that instance has 96 cores CPU but the CPU utilization is around 25% when 
it reaches 150k ops. In addition, this is not something specific to Graviton 
CPU since it happens also on Intel.

I have tested the 50/50 R/W settings with :

 
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact 
keyspace1   && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=50,read=50\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html {code}

Results:

 

- 4.1.3 released:
{code:java}
Results:
Op rate                   :  142,571 op/s  [READ: 71,293 op/s, WRITE: 71,278 
op/s]
Partition rate            :  142,571 pk/s  [READ: 71,293 pk/s, WRITE: 71,278 
pk/s]
Row rate                  :  142,571 row/s [READ: 71,293 row/s, WRITE: 71,278 
row/s]
Latency mean              :    0.7 ms [READ: 1.3 ms, WRITE: 0.1 ms]
Latency median            :    0.2 ms [READ: 1.2 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    2.0 ms [READ: 2.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.9 ms, WRITE: 0.2 ms]
Latency 99.9th percentile :    8.4 ms [READ: 9.6 ms, WRITE: 0.4 ms]
Latency max               :   51.0 ms [READ: 51.0 ms, WRITE: 47.7 ms]
Total partitions          : 85,661,309 [READ: 42,835,266, WRITE: 42,826,043]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,310
Total GC memory           : 2067.821 GiB
Total GC time             :    9.1 seconds
Avg GC time               :    7.0 ms
StdDev GC time            :    3.6 ms
Total operation time      : 00:10:00 {code}

- 4.1.3 with patch:
{code:java}
Results:
Op rate                   :  459,728 op/s  [READ: 229,910 op/s, WRITE: 229,818 
op/s]
Partition rate            :  459,728 pk/s  [READ: 229,910 pk/s, WRITE: 229,818 
pk/s]
Row rate                  :  459,728 row/s [READ: 229,910 row/s, WRITE: 229,818 
row/s]
Latency mean              :    0.2 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency median            :    0.2 ms [READ: 0.2 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    0.3 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    0.4 ms [READ: 0.6 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    8.4 ms [READ: 8.9 ms, WRITE: 7.4 ms]
Latency max               : 1887.4 ms [READ: 1,887.4 ms, WRITE: 48.1 ms]
Total partitions          : 275,966,298 [READ: 138,010,917, WRITE: 137,955,381]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 4,438
Total GC memory           : 6971.464 GiB
Total GC time             :   33.7 seconds
Avg GC time               :    7.6 ms
StdDev GC time            :    3.8 ms
Total operation time      : 00:10:00 {code}

Increasing the percentage of writes in the workload, increase the difference in 
performance between the patch and without (3.2x)

 


Did you have the chance to test it with big instances your end?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821448#comment-17821448
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

down to ~150k from what number exactly? I don't think flushes should have such 
an impact. Your machine is a beast so maybe the flushing is the bottleneck? 
What are memtable_flush_writers, memtable_cleanup_threshold, 
memtable_allocation_type, flush_compression and similar configuration 
parameters?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821439#comment-17821439
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


yeah, interesting... let me re-test it again on my side.

One thing I have noticed is that at the beginning of the benchmark both 
versions has similar performance. After few minutes (and few flushes), the 
released version's performance starts to drop down to ~150k and then stay 
constant. Since it is read only workload, we do not have flushes and so not 
have a drop in performance. I will try to see if I can understand it more

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821437#comment-17821437
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

How is it so? Because we are making changes in SSTableReader which _reads_ 
right? So reading path should be affected.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821436#comment-17821436
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


Test without compaction after writes and with nodetool disableautocompaction 
and just READs

Cmd:
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool 
disableautocompaction && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=0,read=100\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html  {code}
 

 

Results using Ubuntu22.04 on r8g.24xlarge with stress test colocated on the 
same instance :
 * 4.1.3 released:

 
{code:java}
Results:
Op rate                   :  637,636 op/s  [READ: 637,636 op/s]
Partition rate            :  637,636 pk/s  [READ: 637,636 pk/s]
Row rate                  :  637,636 row/s [READ: 637,636 row/s]
Latency mean              :    0.1 ms [READ: 0.1 ms]
Latency median            :    0.1 ms [READ: 0.1 ms]
Latency 95th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99.9th percentile :    2.6 ms [READ: 2.6 ms]
Latency max               :   21.8 ms [READ: 21.8 ms]
Total partitions          : 382,809,565 [READ: 382,809,565]
Total errors              :          0 [READ: 0]
Total GC count            : 3,379
Total GC memory           : 5406.356 GiB
Total GC time             :    5.5 seconds
Avg GC time               :    1.6 ms
StdDev GC time            :    0.8 ms
Total operation time      : 00:10:00 {code}
 
 * 4.1.3 with patch:

{code:java}
Results:
Op rate                   :  636,043 op/s  [READ: 636,043 op/s]
Partition rate            :  636,043 pk/s  [READ: 636,043 pk/s]
Row rate                  :  636,043 row/s [READ: 636,043 row/s]
Latency mean              :    0.1 ms [READ: 0.1 ms]
Latency median            :    0.1 ms [READ: 0.1 ms]
Latency 95th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99th percentile   :    0.2 ms [READ: 0.2 ms]
Latency 99.9th percentile :    2.7 ms [READ: 2.7 ms]
Latency max               :   16.2 ms [READ: 16.2 msz]
Total partitions          : 381,776,733 [READ: 381,776,733]
Total errors              :          0 [READ: 0]
Total GC count            : 3,396
Total GC memory           : 5433.623 GiB
Total GC time             :    5.6 seconds
Avg GC time               :    1.6 ms
StdDev GC time            :    0.5 ms
Total operation time      : 00:10:00 {code}

The patch doesn't have any effect on the read only workload

 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821388#comment-17821388
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


2. Test without compaction after writes and nodetool disableautocompaction
Cmd:

 
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool 
disableautocompaction && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html

## Compact and re-run it
bin/nodetool compact keyspace1 && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html |& tee stress.txt
 {code}
 

Results using Ubuntu22.04 on r8g.24xlarge with stress test colocated on the 
same instance :
 * 4.1.3 released:

{code:java}
Results:
Op rate                   :  135,805 op/s  [READ: 122,231 op/s, WRITE: 13,574 
op/s]
Partition rate            :  135,805 pk/s  [READ: 122,231 pk/s, WRITE: 13,574 
pk/s]
Row rate                  :  135,805 row/s [READ: 122,231 row/s, WRITE: 13,574 
row/s]
Latency mean              :    0.7 ms [READ: 0.8 ms, WRITE: 0.2 ms]
Latency median            :    0.6 ms [READ: 0.7 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    1.9 ms [READ: 2.0 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.6 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    7.0 ms [READ: 7.2 ms, WRITE: 1.3 ms]
Latency max               :   51.3 ms [READ: 51.3 ms, WRITE: 48.8 ms]
Total partitions          : 81,488,855 [READ: 73,343,700, WRITE: 8,145,155]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,583
Total GC memory           : 2522.153 GiB
Total GC time             :    7.4 seconds
Avg GC time               :    4.7 ms
StdDev GC time            :    2.2 ms
Total operation time      : 00:10:00

## Compact and re-run it
bin/nodetool compact keyspace1   && sleep 30s && tools/bin/cassandra-stress 
mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node 
localhost -log file=result.log -graph file=graph.html |& tee stress.txt
...
Results:
Op rate                   :  136,878 op/s  [READ: 123,177 op/s, WRITE: 13,701 
op/s]
Partition rate            :  136,878 pk/s  [READ: 123,177 pk/s, WRITE: 13,701 
pk/s]
Row rate                  :  136,878 row/s [READ: 123,177 row/s, WRITE: 13,701 
row/s]
Latency mean              :    0.7 ms [READ: 0.8 ms, WRITE: 0.2 ms]
Latency median            :    0.6 ms [READ: 0.7 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    1.9 ms [READ: 2.0 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.6 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    6.5 ms [READ: 6.7 ms, WRITE: 1.2 ms]
Latency max               :   52.6 ms [READ: 52.6 ms, WRITE: 50.2 ms]
Total partitions          : 82,197,489 [READ: 73,969,820, WRITE: 8,227,669]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,395
Total GC memory           : 2225.329 GiB
Total GC time             :    6.6 seconds
Avg GC time               :    4.7 ms
StdDev GC time            :    2.2 ms
Total operation time      : 00:10:00{code}
 
 * 4.1.3 with patch:

{code:java}
Results:
Op rate                   :  241,176 op/s  [READ: 217,059 op/s, WRITE: 24,117 
op/s]
Partition rate            :  241,176 pk/s  [READ: 217,059 pk/s, WRITE: 24,117 
pk/s]
Row rate                  :  241,176 row/s [READ: 217,059 row/s, WRITE: 24,117 
row/s]
Latency mean              :    0.4 ms [READ: 0.4 ms, WRITE: 0.2 ms]
Latency median            :    0.3 ms [READ: 0.3 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    0.7 ms [READ: 0.7 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    0.8 ms [READ: 0.8 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    7.2 ms [READ: 7.3 ms, WRITE: 5.1 ms]
Latency max               : 5003.8 ms [READ: 5,003.8 ms, WRITE: 48.5 ms]
Total partitions          : 144,931,367 [READ: 130,438,344, WRITE: 14,493,023]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 4,186
Total GC memory           : 6673.759 GiB
Total GC time             :   23.3 seconds
Avg GC time               :    5.6 ms
StdDev GC time            :    3.7 ms
Total operation time      : 00:10:00

## Compact and re-run it
bin/nodetool compact keyspace1   && sleep 30s && tools/bin/cassandra-stress 
mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node 
localhost -log file=result.log -graph file=graph.html |& tee stress.txt
...
Results:
Op rate                   :  232,130 op/s  [READ: 208,904 op/s, 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821384#comment-17821384
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I will try to provision similar machine in our AWS infrastructure and I will 
let you know. I just need to see this increase on my own, don't take it 
personally. 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821383#comment-17821383
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


1. Test without compaction after writes.
Cmd used:
{code:java}
sbin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && sleep 30s && 
tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE 
-rate threads=100 -node localhost -log file=result.log -graph file=graph.html  
{code}
 

Results using Ubuntu22.04 on r8g.24xlarge with stress test colocated on the 
same instance :

- 4.1.3 released:
{code:java}
Results:
Op rate                   :  167,478 op/s  [READ: 150,727 op/s, WRITE: 16,750 
op/s]
Partition rate            :  167,478 pk/s  [READ: 150,727 pk/s, WRITE: 16,750 
pk/s]
Row rate                  :  167,478 row/s [READ: 150,727 row/s, WRITE: 16,750 
row/s]
Latency mean              :    0.6 ms [READ: 0.6 ms, WRITE: 0.2 ms]
Latency median            :    0.4 ms [READ: 0.5 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    1.6 ms [READ: 1.6 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.2 ms [READ: 2.2 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    7.1 ms [READ: 7.3 ms, WRITE: 1.2 ms]
Latency max               :   48.3 ms [READ: 48.3 ms, WRITE: 46.0 ms]
Total partitions          : 100,600,039 [READ: 90,538,533, WRITE: 10,061,506]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,515
Total GC memory           : 2411.407 GiB
Total GC time             :    8.3 seconds
Avg GC time               :    5.5 ms
StdDev GC time            :    2.4 ms
Total operation time      : 00:10:00{code}
 

- 4.1.3 with patch:
{code:java}
Results:
Op rate                   :  435,180 op/s  [READ: 391,655 op/s, WRITE: 43,525 
op/s]
Partition rate            :  435,180 pk/s  [READ: 391,655 pk/s, WRITE: 43,525 
pk/s]
Row rate                  :  435,180 row/s [READ: 391,655 row/s, WRITE: 43,525 
row/s]
Latency mean              :    0.2 ms [READ: 0.2 ms, WRITE: 0.2 ms]
Latency median            :    0.2 ms [READ: 0.2 ms, WRITE: 0.2 ms]
Latency 95th percentile   :    0.3 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    0.4 ms [READ: 0.4 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    6.7 ms [READ: 6.7 ms, WRITE: 6.2 ms]
Latency max               : 1057.5 ms [READ: 1,057.5 ms, WRITE: 47.7 ms]
Total partitions          : 261,410,615 [READ: 235,265,301, WRITE: 26,145,314]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 4,543
Total GC memory           : 7225.988 GiB
Total GC time             :   25.7 seconds
Avg GC time               :    5.7 ms
StdDev GC time            :    3.1 ms
Total operation time      : 00:10:00{code}

Patch seems to have huge benefit on this case as well (2.6x).
Noticed also huge benefit in P99 latency (5.5x decrease).

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821377#comment-17821377
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


> But yeah ... I am not running this on r8g.24xlarge or r7i.24xlarge .
Not sure what is your HW but, based on our tests, the benefit of removing this 
lock starts using 4xlarge instance and above with ~1.40x improvement (tested on 
r7g.4xl and r7i.4xl). On small instances like r7i.xlarge, not a significant 
different has been recorded.


> [~dipiets] I think the mistake you do is that you do "nodetool compact" after 
>you write the data and then you run mixed workload against that.

Not sure I have got this completely. Why the "nodetool compact" has such huge 
difference with the patch and not with the released version? I see that it uses 
1 SSTable but why does not have similar performance if this is the problem?

I am testing the patch as you suggesting

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821245#comment-17821245
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

[~dipiets] I think the mistake you do is that you do "nodetool compact" after 
you write the data and they you run mixed workload against that.

nodetool compact will make just 1 SSTable instead of whatever number of them 
you have there.

I tested this locally too and, indeed, if you have just one table, that will be 
way more performance-friendly than having multiple of them, because ... there 
is just one table. So Cassandra does not need to look into any other - which is 
more performant by definition. 

If you do not run nodetool compact after writes, try to also do nodetool 
disableautocompaction, then run the tests. After it is finished, compact it all 
into one SSTable and run it again. You will see that it is slower and I do not 
think that looking into capacity etc has anything to do with it.

I mean ... sure, we see less locking etc and we might add that change there, 
but in general I do not think that the effect it has is 2x  hardly.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821211#comment-17821211
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Well ... third time is a charm.

I think I suffer from what is called a confirmation bias. I turned off 
autocompaction (nodetool disableautocompaction) and run the test just for reads 
(0 writes, 100 reads) for longer time and the difference between before and 
after is really negligible, at least in my case. I am wondering where 
[~dipiets] can see 3x here.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821200#comment-17821200
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Well, I do see some speedup, just not 2x / 3x, I think this change is amplified 
more powerful machine a node runs on

without this patch

{noformat}
Op rate   :   89,392 op/s  [READ: 80,439 op/s, WRITE: 8,953 
op/s]
Partition rate:   89,392 pk/s  [READ: 80,439 pk/s, WRITE: 8,953 
pk/s]
Row rate  :   89,392 row/s [READ: 80,439 row/s, WRITE: 8,953 
row/s]
Latency mean  :2.2 ms [READ: 2.2 ms, WRITE: 2.2 ms]
Latency median:1.6 ms [READ: 1.6 ms, WRITE: 1.6 ms]
Latency 95th percentile   :5.8 ms [READ: 5.8 ms, WRITE: 5.9 ms]
Latency 99th percentile   :   10.6 ms [READ: 10.6 ms, WRITE: 10.7 ms]
Latency 99.9th percentile :   23.5 ms [READ: 23.4 ms, WRITE: 23.8 ms]
Latency max   :  180.5 ms [READ: 180.5 ms, WRITE: 122.7 ms]
Total partitions  :  5,408,128 [READ: 4,866,473, WRITE: 541,655]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

with this patch, two independent runs:

{noformat}
Op rate   :  119,782 op/s  [READ: 107,849 op/s, WRITE: 11,933 
op/s]
Partition rate:  119,782 pk/s  [READ: 107,849 pk/s, WRITE: 11,933 
pk/s]
Row rate  :  119,782 row/s [READ: 107,849 row/s, WRITE: 11,933 
row/s]
Latency mean  :1.7 ms [READ: 1.6 ms, WRITE: 1.7 ms]
Latency median:1.3 ms [READ: 1.3 ms, WRITE: 1.4 ms]
Latency 95th percentile   :3.8 ms [READ: 3.8 ms, WRITE: 4.0 ms]
Latency 99th percentile   :7.7 ms [READ: 7.7 ms, WRITE: 8.0 ms]
Latency 99.9th percentile :   13.7 ms [READ: 13.7 ms, WRITE: 14.1 ms]
Latency max   :  114.6 ms [READ: 61.5 ms, WRITE: 114.6 ms]
Total partitions  :  7,188,152 [READ: 6,472,051, WRITE: 716,101]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

{noformat}
Results:
Op rate   :  104,456 op/s  [READ: 94,016 op/s, WRITE: 10,440 
op/s]
Partition rate:  104,456 pk/s  [READ: 94,016 pk/s, WRITE: 10,440 
pk/s]
Row rate  :  104,456 row/s [READ: 94,016 row/s, WRITE: 10,440 
row/s]
Latency mean  :1.9 ms [READ: 1.9 ms, WRITE: 2.0 ms]
Latency median:1.5 ms [READ: 1.4 ms, WRITE: 1.5 ms]
Latency 95th percentile   :4.7 ms [READ: 4.6 ms, WRITE: 4.8 ms]
Latency 99th percentile   :8.6 ms [READ: 8.6 ms, WRITE: 8.8 ms]
Latency 99.9th percentile :   13.9 ms [READ: 13.8 ms, WRITE: 14.1 ms]
Latency max   :   85.4 ms [READ: 77.2 ms, WRITE: 85.4 ms]
Total partitions  :  6,268,822 [READ: 5,642,258, WRITE: 626,564]
Total errors  :  0 [READ: 0, WRITE: 0]
Total GC count: 0
Total GC memory   : 0.000 KiB
Total GC time :0.0 seconds
Avg GC time   :NaN ms
StdDev GC time:0.0 ms
Total operation time  : 00:01:00
{noformat}

so the speedup it like 20% which is quite nice already.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821196#comment-17821196
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I dont see the speedup, I was just testing same commands on my local PC, before 
/ after numbers are basically more or less same, definitely not 2x or 3x. 

I used just 100 threads.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-27 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821178#comment-17821178
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

https://github.com/apache/cassandra/pull/3140

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820913#comment-17820913
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


let me know when you have a PR ready for Cassandra 4 that I can test it

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820823#comment-17820823
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

yeah hard to believe ... I will prepare 4.0 patch and [~dipiets] can run it 
before / after too.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820822#comment-17820822
 ] 

Brandon Williams commented on CASSANDRA-19429:
--

A 2-3x gain sounds so good I'm having a hard time believing we've left it on 
the table for years, at least in the case of 4.0.  Speaking of which, it would 
be good to check performance there too since it has the same code and 
everything should line up.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820816#comment-17820816
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


Got new profile with the patch and I don't see that locks anymore and the 
overall number dropped to 200. Preatty amazing!

 

!Screenshot 2024-02-26 at 10.27.10.png!

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820814#comment-17820814
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

Great, so what are the next steps? [~brandon.williams]

If think that if we go to call DD in MBeans, this should not stay only in 4.0 
and 4.1 but we should copy this behavior to newer branches too. That would make 
it a little bit more involved. 

I am probably fine with leaving out MBean bits from 4.0 / 4.1 but it is 
questionable if we all are OK with it ... 

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820812#comment-17820812
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


My performance benchmarks results looks good to me:
r8g.24xl:  458k op/s  (2.72x)
r7i.24xl:  292k op/s    (1.9x)

Tested with commands:
{code:java}
cd 
git clone https://github.com/instaclustr/cassandra.git cassandra-instaclustr
cd cassandra-instaclustr
git checkout CASSANDRA-19429-4.1git log -10 --oneline

CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f -R

bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && bin/cqlsh -e 'drop 
keyspace if exists keyspace1;' && bin/nodetool clearsnapshot --all && 
tools/bin/cassandra-stress write n=1000 cl=ONE -rate threads=384 -node 
127.0.0.1 -log file=cload.log -graph file=cload.html && bin/nodetool compact 
keyspace1   && sleep 30s && tools/bin/cassandra-stress mixed 
ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=100 -node localhost 
-log file=result.log -graph file=graph.html |& tee stress.txt{code}

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820793#comment-17820793
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


sure, let me test your patch

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820696#comment-17820696
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

When one checks if (tracing) ..., there are more cases like that in the 
affected classes so we just align it to what is there and it definitely does 
not hurt.

Also, I think that if we base that logic on DatabaseDescriptor, we should also 
cover MBean aspect of it.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820695#comment-17820695
 ] 

Brandon Williams commented on CASSANDRA-19429:
--

Not really because

bq. neither of these are in the hot path, I haven't tested but I wouldn't 
expect a ton of performance gain here.

but I guess we can check.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820628#comment-17820628
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I created this branch where I incorporated the idea above with more changes 
related to it (1). For example, we should no call getCapacity() in tracing log 
messages, we should firstly check if we are going to log on trace level and 
then construct log message. If we are not, then we call getCapacity() but we 
just throw it away. I think that in practice we are not logging on trace at all 
so this is just redundant.

When we check DatabaseDescriptor.getKeyCacheSizeInMiB(), if we change capacity 
of these caches via CacheServiceMBean, then it would have non-zero capacity but 
we never set it in DatabaseDescriptor. I covered this too, DatabaseDescriptor 
values are updated on CacheServiceMBean method calls too.

(1) https://github.com/apache/cassandra/pull/3133/files

[~dipiets] [~brandon.williams] does this make sense to you? [~dipiets] Could 
you please run your tests again with the (1) patch and check the numbers?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-26 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820624#comment-17820624
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

[CASSANDRA-19429-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.1]
{noformat}
java11_pre-commit_tests 
  ✓ j11_build1m 32s
  ✓ j11_cqlsh_dtests_py3 5m 24s
  ✓ j11_cqlsh_dtests_py311   5m 36s
  ✓ j11_cqlsh_dtests_py311_vnode 5m 31s
  ✓ j11_cqlsh_dtests_py385m 32s
  ✓ j11_cqlsh_dtests_py38_vnode  5m 29s
  ✓ j11_cqlsh_dtests_py3_vnode   5m 30s
  ✓ j11_cqlshlib_cython_tests 7m 8s
  ✓ j11_cqlshlib_tests   6m 12s
  ✓ j11_dtests  32m 23s
  ✓ j11_dtests_vnode34m 47s
  ✓ j11_jvm_dtests  15m 20s
  ✓ j11_jvm_dtests_vnode12m 21s
  ✕ j11_unit_tests   7m 38s
  org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
java11_separate_tests
java8_pre-commit_tests  
java8_separate_tests 
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/1740b3e0-8813-4a29-89fe-7a0fb951231f]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/d1dd4431-caeb-4f55-985d-4720311c000f]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/11197ae4-9546-4751-a9f2-c4b05de555aa]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3927/workflows/47a3a7a0-6fdd-4b9f-8cdf-4c620c802b0f]


> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-25 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820442#comment-17820442
 ] 

Stefan Miklosovic commented on CASSANDRA-19429:
---

I think it is <= 0 when key_cache_size: 0MiB in cassandra.yaml

Just set a breakpoint in CacheService constructor and follow the execution its 
constructor where keyCache = initKeyCache(); is called.

If we do not want to check "if (maybeKeyCache.getCapacity() > 0)" I think the 
same effect would be to check "if (DatabaseDescriptor.getKeyCacheSizeInMiB() > 
0)" and it would be same.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-23 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820232#comment-17820232
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


Under which conditions the `maybeKeyCache.getCapacity()` call can return value 
<= 0?
Based on my tests, I can see that it is always set to a value > 0 
[https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L720]

I have also saw that Cassandra 5 is not calling it anymore

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820225#comment-17820225
 ] 

Brandon Williams commented on CASSANDRA-19429:
--

Capacity is the configured capacity for the cache while the size is how much of 
it has been consumed, so they aren't interchangeable. Also neither of these are 
in the hot path, I haven't tested but I wouldn't expect a ton of performance 
gain here.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-23 Thread Dipietro Salvatore (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820219#comment-17820219
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:


My initial patch and test are based on this patch 
(https://github.com/salvatoredipietro/cassandra/commit/45f770dd8e06e490a4cd0f222e1d4a3060a45a38):


{code:java}
>From 45f770dd8e06e490a4cd0f222e1d4a3060a45a38 Mon Sep 17 00:00:00 2001
From: Salvatore Dipietro 
Date: Wed, 21 Feb 2024 16:29:15 -0800
Subject: [PATCH] Remove lock contention generated by getCapacity function in
 SSTableReader

---
 CHANGES.txt   | 1 +
 .../org/apache/cassandra/io/sstable/format/SSTableReader.java | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 21084b8c721f..4ab8ccf21b3e 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.1.5
+ * Remove lock contention generated by getCapacity function in SSTableReader
 Merged from 4.0:
  * Remove bashisms for mx4j tool in cassandra-env.sh (CASSANDRA-19416)
 Merged from 3.11:
diff --git a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java 
b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
index f26cf65c93e0..3b703911984d 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
@@ -520,7 +520,7 @@ public static SSTableReader open(Descriptor descriptor,
 sstable.validate();
 
 if (sstable.getKeyCache() != null)
-logger.trace("key cache contains {}/{} keys", 
sstable.getKeyCache().size(), sstable.getKeyCache().getCapacity());
+logger.trace("key cache contains {} keys", 
sstable.getKeyCache().size());
 
 return sstable;
 }
@@ -717,7 +717,7 @@ public void setupOnline()
 // e.g. by BulkLoader, which does not initialize the cache.  As a 
kludge, we set up the cache
 // here when we know we're being wired into the rest of the server 
infrastructure.
 InstrumentingCache maybeKeyCache = 
CacheService.instance.keyCache;
-if (maybeKeyCache.getCapacity() > 0)
+if (maybeKeyCache.size() > 0)
 keyCache = maybeKeyCache;
 
 final ColumnFamilyStore cfs = 
Schema.instance.getColumnFamilyStoreInstance(metadata().id);
 {code}

However, it seem that the `keyCache` variable remains set to `null`. 

Based on my experiments, I can see that `maybeKeyCache.getCapacity()` always 
returns a fixed long number for all its calls and it doesn't change over time.
Can someone please advice under which circumstances it is necessary to check 
that `maybeKeyCache.getCapacity() > 0` or if it is always possible to set it 
like ` keyCache = CacheService.instance.keyCache;` ?

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress 

[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

2024-02-23 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820218#comment-17820218
 ] 

Brandon Williams commented on CASSANDRA-19429:
--

Nice.  It looks like 4.0 has the same SSTR code as well.

> Remove lock contention generated by getCapacity function in SSTableReader
> -
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/SSTable
>Reporter: Dipietro Salvatore
>Assignee: Dipietro Salvatore
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html
>
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=1000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org