[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-25 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830401#comment-17830401
 ] 

Stefan Miklosovic commented on CASSANDRA-19477:
---

[CASSANDRA-19477-trunk|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-trunk]
{noformat}
java17_pre-commit_tests
  ✓ j17_build                                     3m 57s
  ✓ j17_cqlsh_dtests_py311                         7m 2s
  ✓ j17_cqlsh_dtests_py311_vnode                  7m 32s
  ✓ j17_cqlsh_dtests_py38                         6m 50s
  ✓ j17_cqlsh_dtests_py38_vnode                   7m 16s
  ✓ j17_cqlshlib_cython_tests                     7m 39s
  ✓ j17_cqlshlib_tests                            6m 31s
  ✓ j17_dtests                                   34m 33s
  ✓ j17_dtests_vnode                             35m 10s
  ✓ j17_jvm_dtests_latest_vnode_repeat           26m 31s
  ✓ j17_jvm_dtests_repeat                         28m 7s
  ✓ j17_unit_tests                               16m 26s
  ✓ j17_unit_tests_repeat                         0m 18s
  ✓ j17_utests_latest                            13m 59s
  ✓ j17_utests_latest_repeat                      0m 13s
  ✓ j17_utests_oa_repeat                          0m 29s
  ✕ j17_dtests_latest                            34m 36s
      offline_tools_test.TestOfflineTools test_sstablelevelreset
      offline_tools_test.TestOfflineTools test_sstableofflinerelevel
      configuration_test.TestConfiguration test_change_durable_writes
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_jvm_dtests                               27m 59s
      org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testEndpointVerificationEnabledIpNotInSAN TIMEOUTED
  ✕ j17_jvm_dtests_latest_vnode                  22m 44s
      junit.framework.TestSuite org.apache.cassandra.fuzz.harry.integration.model.InJVMTokenAwareExecutorTest TIMEOUTED
  ✕ j17_utests_oa                                13m 58s
      org.apache.cassandra.db.compaction.CompactionsBytemanTest testSSTableNotEnoughDiskSpaceForCompactionGetsDropped
java17_separate_tests
java11_pre-commit_tests
  ✓ j11_build                                     7m 57s
  ✓ j11_cqlsh_dtests_py311                         7m 7s
  ✓ j11_cqlsh_dtests_py311_vnode                 10m 13s
  ✓ j11_cqlsh_dtests_py38                          8m 1s
  ✓ j11_cqlsh_dtests_py38_vnode                  10m 25s
  ✓ j11_cqlshlib_cython_tests                     7m 28s
  ✓ j11_cqlshlib_tests                            9m 40s
  ✓ j11_dtests_vnode                             36m 58s
  ✓ j11_jvm_dtests_latest_vnode                  25m 28s
  ✓ j11_jvm_dtests_latest_vnode_repeat           29m 22s
  ✓ j11_jvm_dtests_repeat                         28m 7s
  ✓ j11_unit_tests                               15m 17s
  ✓ j11_unit_tests_repeat                         0m 30s
  ✓ j11_utests_latest                            16m 56s
  ✓ j11_utests_latest_repeat                      0m 34s
  ✓ j11_utests_oa                                13m 58s
  ✓ j11_utests_oa_repeat                           1m 0s
  ✓ j11_utests_system_keyspace_directory          18m 1s
  ✓ j11_utests_system_keyspace_directory_repeat   3m 39s
  ✓ j17_cqlsh_dtests_py311                         7m 6s
  ✓ j17_cqlsh_dtests_py311_vnode                  7m 27s
  ✓ j17_cqlsh_dtests_py38                         6m 51s
  ✓ j17_cqlsh_dtests_py38_vnode                   7m 14s
  ✓ j17_cqlshlib_cython_tests                     7m 38s
  ✓ j17_cqlshlib_tests                            6m 57s
  ✓ j17_dtests                                   32m 21s
  ✓ j17_dtests_vnode                             34m 24s
  ✓ j17_jvm_dtests_latest_vnode                  22m 45s
  ✓ j17_jvm_dtests_latest_vnode_repeat           26m 32s
  ✓ j17_jvm_dtests_repeat                        28m 21s
  ✓ j17_unit_tests_repeat                         0m 16s
  ✓ j17_utests_latest                            15m 34s
  ✓ j17_utests_latest_repeat                      0m 36s
  ✓ j17_utests_oa                                13m 43s
  ✓ j17_utests_oa_repeat                          0m 17s
  ✕ j11_dtests                                   37m 26s
      pushed_notifications_test.TestPushedNotifications test_move_single_node_localhost
  ✕ j11_dtests_latest                            40m 40s
      bootstrap_test.TestBootstrap test_bootstrap_with_reset_bootstrap_state
      offline_tools_test.TestOfflineTools test_sstablelevelreset
      offline_tools_test.TestOfflineTools test_sstableofflinerelevel
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j11_jvm_dtests

[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830308#comment-17830308
 ] 

Jon Haddad commented on CASSANDRA-19477:


Here are some more fun graphs.  Read latency, write latency, and load average 
are all significantly improved.

!image-2024-03-24-18-16-50-370.png|width=645,height=205!

 

!image-2024-03-24-18-20-07-734.png|width=723,height=229!

!image-2024-03-24-18-17-48-334.png|width=653,height=210!

> Do not go to disk to get HintsStore.getTotalFileSize
> 
>
> Key: CASSANDRA-19477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Hints
> Reporter: Jon Haddad
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 4.1.x, 5.0-rc, 5.x
>
> Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed
> significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what
> I'm seeing from profiling:
>
> 10% of CPU time spent in HintsDescriptor.fileName which only does this:
>
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);
> {noformat}
>
> At a bare minimum here we should create this string up front with the host
> and version and eliminate 2 of the 3 substitutions, but I think it's probably
> faster to use a StringBuilder and avoid the underlying regular expression
> altogether.
>
> 12% of the time is spent in org.apache.cassandra.io.util.File.length.  It
> looks like this is called once for each hint file on disk for each host we're
> hinting to.  In the case of an overloaded cluster, this is significant.  It
> would be better if we were to track the file size in memory for each hint
> file and reference that rather than go to the filesystem.
>
> These fairly small changes should make Cassandra more reliable when under
> load spikes.
>
> CPU Flame graph attached.
>
> I only tested this in 4.1 but it looks like this is present up to trunk.
>  
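
As an illustration of the first suggestion in the description above, here is a minimal Java sketch that builds the hint file name once, with plain concatenation, instead of calling String.format on every lookup. The class HintFileName and its members are hypothetical stand-ins, not Cassandra's HintsDescriptor, and this is not the code from the linked patch branches.

```java
import java.util.UUID;

// Hypothetical stand-in for a hints descriptor; not Cassandra's HintsDescriptor.
final class HintFileName {
    private final UUID hostId;
    private final long timestamp;
    private final int version;
    // Assumption: all three parts are immutable, so the name can be built once.
    private final String cachedName;

    HintFileName(UUID hostId, long timestamp, int version) {
        this.hostId = hostId;
        this.timestamp = timestamp;
        this.version = version;
        this.cachedName = hostId.toString() + "-" + timestamp + "-" + version + ".hints";
    }

    // Shape quoted in the description: the format string is parsed on every call.
    String formattedName() {
        return String.format("%s-%s-%s.hints", hostId, timestamp, version);
    }

    // Cached variant: a plain field read, no formatting work per call.
    String fileName() {
        return cachedName;
    }
}
```

Both methods return the same string; the cached variant only moves the work to construction time, which helps when the name is requested far more often than descriptors are created.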



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830307#comment-17830307
 ] 

Stefan Miklosovic commented on CASSANDRA-19477:
---

Nice! It was a joint effort really; [~aleksey] helped me improve and polish 
the idea, so big kudos to him!




[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-24 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830306#comment-17830306
 ] 

Jon Haddad commented on CASSANDRA-19477:


I've set up a 3-node cluster, loaded 15GB of data, then took down a node and 
let hints accumulate.  I switched one node to the 4.1 patch branch above, left 
the other node on release 4.1, then ran this:
{noformat}
easy-cass-stress run RandomPartitionAccess --workload.rows=1000 --rate 5k -d 2h -t 4
{noformat}
Here's the 4.1 release flame graph.  
[^flame-cassandra0-release-2024-03-25_00-16-44.html]

StorageProxy.mutate is taking up 17% of CPU time, with shouldHint taking up 
almost 7% of CPU time.

Here's the 4.1 + patch flame graph: 
[^flame-cassandra0-patched-2024-03-25_00-40-47.html]

StorageProxy.mutate now takes up only 10% of CPU time, with shouldHint down to 
0.26% of CPU time.
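
For readers who have not opened the branch: the issue description proposes tracking hint file sizes in memory rather than asking the filesystem for each file. A minimal Java sketch of that idea follows. The class InMemoryHintSizes and its methods are hypothetical, and the code is an illustration under those assumptions, not the patch measured here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-host size tracker; not taken from the actual patch.
final class InMemoryHintSizes {
    // File name -> last recorded size in bytes.
    private final Map<String, Long> sizeByFile = new ConcurrentHashMap<>();
    // Running total, so reads never touch the filesystem.
    private final AtomicLong totalBytes = new AtomicLong();

    // Record the size of a hint file after it is written or discovered.
    void recordSize(String fileName, long sizeBytes) {
        Long previous = sizeByFile.put(fileName, sizeBytes);
        totalBytes.addAndGet(sizeBytes - (previous == null ? 0L : previous));
    }

    // Forget a file once it has been deleted; unknown names are ignored.
    void remove(String fileName) {
        Long previous = sizeByFile.remove(fileName);
        if (previous != null) {
            totalBytes.addAndGet(-previous);
        }
    }

    // Constant-time read of the total; may briefly lag a concurrent update.
    long totalFileSize() {
        return totalBytes.get();
    }
}
```

The trade-off in a design like this is that every code path that creates, grows, or deletes a hint file has to update the tracker, otherwise the in-memory total drifts from what is on disk.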

You can see in the graph below that 172.31.36.176 is using less CPU overall.

!image-2024-03-24-17-57-32-560.png|width=857,height=270!

 

Here's the same setup with additional load.
{noformat}
easy-cass-stress run RandomPartitionAccess --workload.rows=1000 --rate 30k -d 2h -t 4
{noformat}
!image-2024-03-24-18-08-36-918.png|width=749,height=302!

 

The improvement in this patch is fantastic, really nice work [~smiklosovic].  
I'm +1 with regard to performance, but deferring to [~aleksey] to judge 
correctness.




[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-23 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830036#comment-17830036
 ] 

Stefan Miklosovic commented on CASSANDRA-19477:
---

[CASSANDRA-19477-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-4.1]
{noformat}
java8_pre-commit_tests
  ✓ j8_build                                      3m 56s
  ✓ j8_cqlsh_dtests_py3                           5m 12s
  ✓ j8_cqlsh_dtests_py311                         8m 35s
  ✓ j8_cqlsh_dtests_py311_vnode                    9m 4s
  ✓ j8_cqlsh_dtests_py38                          6m 47s
  ✓ j8_cqlsh_dtests_py38_vnode                     6m 8s
  ✓ j8_cqlsh_dtests_py3_vnode                     9m 18s
  ✓ j8_cqlshlib_cython_tests                     11m 55s
  ✓ j8_cqlshlib_tests                              8m 6s
  ✓ j8_dtests                                    32m 16s
  ✓ j8_dtests_vnode                              36m 21s
  ✓ j8_jvm_dtests                                16m 18s
  ✓ j8_jvm_dtests_repeat                         41m 32s
  ✓ j8_jvm_dtests_vnode_repeat                   41m 13s
  ✓ j8_simulator_dtests                           2m 49s
  ✓ j8_unit_tests_repeat                          3m 59s
  ✓ j8_utests_system_keyspace_directory_repeat    3m 45s
  ✓ j11_unit_tests_repeat                         0m 31s
  ✓ j11_jvm_dtests_vnode_repeat                  38m 49s
  ✓ j11_jvm_dtests_vnode                         12m 37s
  ✓ j11_jvm_dtests_repeat                         39m 0s
  ✓ j11_jvm_dtests                                16m 8s
  ✓ j11_dtests_vnode                             35m 43s
  ✓ j11_dtests                                   33m 55s
  ✓ j11_cqlshlib_tests                            6m 15s
  ✓ j11_cqlshlib_cython_tests                     7m 10s
  ✓ j11_cqlsh_dtests_py3_vnode                    5m 37s
  ✓ j11_cqlsh_dtests_py38_vnode                   6m 10s
  ✓ j11_cqlsh_dtests_py38                         5m 26s
  ✓ j11_cqlsh_dtests_py311_vnode                  5m 46s
  ✓ j11_cqlsh_dtests_py311                        5m 29s
  ✓ j11_cqlsh_dtests_py3                          5m 20s
  ✕ j8_jvm_dtests_vnode                          16m 54s
      org.apache.cassandra.distributed.test.GossipTest nodeDownDuringMove
  ✕ j8_unit_tests                                11m 19s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j8_utests_system_keyspace_directory           9m 35s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j11_unit_tests                                 8m 6s
      org.apache.cassandra.db.compaction.DateTieredCompactionStrategyTest testDropExpiredSSTables
      org.apache.cassandra.db.compaction.DateTieredCompactionStrategyTest testFilterOldSSTables
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
{noformat}

[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4065/workflows/3c368b8e-2cc7-4c78-afe3-62b45253e416]



[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-22 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829991#comment-17829991
 ] 

Jon Haddad commented on CASSANDRA-19477:


Awesome.  I'll fire up a test with the 4.1 branch, since that's what I tested 
before, and post my findings.




[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-22 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829945#comment-17829945
 ] 

Stefan Miklosovic commented on CASSANDRA-19477:
---

[CASSANDRA-19477-5.0|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-5.0]
{noformat}
java17_pre-commit_tests
  ✓ j17_build                                     3m 49s
  ✓ j17_cqlsh_dtests_py311                         6m 4s
  ✓ j17_cqlsh_dtests_py311_vnode                  6m 13s
  ✓ j17_cqlsh_dtests_py38                          6m 3s
  ✓ j17_cqlsh_dtests_py38_vnode                   6m 20s
  ✓ j17_cqlshlib_cython_tests                     7m 25s
  ✓ j17_cqlshlib_tests                            6m 27s
  ✓ j17_dtests                                    32m 3s
  ✓ j17_jvm_dtests                               23m 21s
  ✓ j17_jvm_dtests_latest_vnode                   14m 2s
  ✓ j17_jvm_dtests_latest_vnode_repeat           40m 42s
  ✓ j17_jvm_dtests_repeat                        39m 52s
  ✓ j17_unit_tests                               17m 44s
  ✓ j17_unit_tests_repeat                         2m 28s
  ✓ j17_utests_latest                            15m 45s
  ✓ j17_utests_latest_repeat                      2m 34s
  ✓ j17_utests_oa_repeat                          0m 13s
  ✕ j17_dtests_latest                            34m 34s
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_dtests_vnode                              32m 7s
  ✕ j17_utests_oa                                15m 57s
      org.apache.cassandra.net.ConnectionTest testTimeout
java17_separate_tests
java11_pre-commit_tests
  ✓ j11_build                                     6m 59s
  ✓ j11_cqlsh_dtests_py311                        9m 42s
  ✓ j11_cqlsh_dtests_py311_vnode                  7m 41s
  ✓ j11_cqlsh_dtests_py38                         8m 20s
  ✓ j11_cqlsh_dtests_py38_vnode                   7m 59s
  ✓ j11_cqlshlib_cython_tests                    11m 40s
  ✓ j11_cqlshlib_tests                            9m 16s
  ✓ j11_dtests                                   38m 39s
  ✓ j11_dtests_vnode                             35m 20s
  ✓ j11_jvm_dtests                               23m 29s
  ✓ j11_jvm_dtests_latest_vnode                  14m 40s
  ✓ j11_jvm_dtests_latest_vnode_repeat           47m 40s
  ✓ j11_jvm_dtests_repeat                        41m 59s
  ✓ j11_simulator_dtests                          5m 54s
  ✓ j11_unit_tests                               19m 38s
  ✓ j11_unit_tests_repeat                         3m 26s
  ✓ j11_utests_latest                            21m 24s
  ✓ j11_utests_latest_repeat                      3m 46s
  ✓ j11_utests_oa                                21m 15s
  ✓ j11_utests_oa_repeat                          8m 28s
  ✓ j11_utests_system_keyspace_directory         16m 17s
  ✓ j11_utests_system_keyspace_directory_repeat   3m 58s
  ✓ j17_cqlsh_dtests_py311                        5m 56s
  ✓ j17_cqlsh_dtests_py311_vnode                  6m 46s
  ✓ j17_cqlsh_dtests_py38                          6m 9s
  ✓ j17_cqlsh_dtests_py38_vnode                   6m 55s
  ✓ j17_cqlshlib_cython_tests                     7m 32s
  ✓ j17_cqlshlib_tests                            6m 27s
  ✓ j17_dtests                                   33m 37s
  ✓ j17_dtests_vnode                             32m 31s
  ✓ j17_jvm_dtests                               23m 16s
  ✓ j17_jvm_dtests_latest_vnode                  13m 28s
  ✓ j17_jvm_dtests_latest_vnode_repeat           40m 42s
  ✓ j17_jvm_dtests_repeat                        41m 28s
  ✓ j17_unit_tests                               14m 40s
  ✓ j17_unit_tests_repeat                         0m 16s
  ✓ j17_utests_latest_repeat                      0m 14s
  ✓ j17_utests_oa                                15m 51s
  ✓ j17_utests_oa_repeat                          7m 58s
  ✕ j11_dtests_latest                             35m 2s
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_dtests_latest                            33m 55s
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_utests_latest                            16m 38s
      org.apache.cassandra.cql3.validation.operations.SelectTest testCreatingUDFWithSameNameAsBuiltin_PrefersCompatibleArgs
      org.apache.cassandra.cql3.validation.operations.SelectTest testCreatingUDFWithSameNameAsBuiltin_FullyQualifiedFunctionNameWorks
java11_separate_tests
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4063/workflows/15b9eab1-70d5-4490-836a-49cc9169c2aa]

[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-22 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829791#comment-17829791
 ] 

Stefan Miklosovic commented on CASSANDRA-19477:
---

[CASSANDRA-19477-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-4.1]
{noformat}
java11_pre-commit_tests
  ✓ j11_build                                     1m 30s
  ✓ j11_cqlsh_dtests_py3                          5m 28s
  ✓ j11_cqlsh_dtests_py311                        5m 47s
  ✓ j11_cqlsh_dtests_py311_vnode                   6m 4s
  ✓ j11_cqlsh_dtests_py38                         5m 53s
  ✓ j11_cqlsh_dtests_py38_vnode                    6m 2s
  ✓ j11_cqlsh_dtests_py3_vnode                    5m 53s
  ✓ j11_cqlshlib_cython_tests                      9m 1s
  ✓ j11_cqlshlib_tests                             8m 6s
  ✓ j11_dtests                                   33m 38s
  ✓ j11_dtests_vnode                              36m 1s
  ✓ j11_jvm_dtests                               15m 44s
  ✓ j11_jvm_dtests_repeat                        38m 12s
  ✓ j11_jvm_dtests_vnode                         12m 39s
  ✓ j11_jvm_dtests_vnode_repeat                  38m 46s
  ✓ j11_unit_tests_repeat                         0m 32s
  ✕ j11_unit_tests                                8m 33s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4064/workflows/88c6d9da-308c-43a7-849b-a2b1a6b30307]





[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize

2024-03-21 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829583#comment-17829583
 ] 

Stefan Miklosovic commented on CASSANDRA-19477:
---

The work and reviews are done; I just need to test this, and Jon should 
perf-test it. Ideally this should be committed very early next week.
