[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830401#comment-17830401 ]

Stefan Miklosovic commented on CASSANDRA-19477:

[CASSANDRA-19477-trunk|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-trunk]

{noformat}
java17_pre-commit_tests
  ✓ j17_build  3m 57s
  ✓ j17_cqlsh_dtests_py311  7m 2s
  ✓ j17_cqlsh_dtests_py311_vnode  7m 32s
  ✓ j17_cqlsh_dtests_py38  6m 50s
  ✓ j17_cqlsh_dtests_py38_vnode  7m 16s
  ✓ j17_cqlshlib_cython_tests  7m 39s
  ✓ j17_cqlshlib_tests  6m 31s
  ✓ j17_dtests  34m 33s
  ✓ j17_dtests_vnode  35m 10s
  ✓ j17_jvm_dtests_latest_vnode_repeat  26m 31s
  ✓ j17_jvm_dtests_repeat  28m 7s
  ✓ j17_unit_tests  16m 26s
  ✓ j17_unit_tests_repeat  0m 18s
  ✓ j17_utests_latest  13m 59s
  ✓ j17_utests_latest_repeat  0m 13s
  ✓ j17_utests_oa_repeat  0m 29s
  ✕ j17_dtests_latest  34m 36s
      offline_tools_test.TestOfflineTools test_sstablelevelreset
      offline_tools_test.TestOfflineTools test_sstableofflinerelevel
      configuration_test.TestConfiguration test_change_durable_writes
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_jvm_dtests  27m 59s
      org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testEndpointVerificationEnabledIpNotInSAN TIMEOUTED
  ✕ j17_jvm_dtests_latest_vnode  22m 44s
      junit.framework.TestSuite org.apache.cassandra.fuzz.harry.integration.model.InJVMTokenAwareExecutorTest TIMEOUTED
  ✕ j17_utests_oa  13m 58s
      org.apache.cassandra.db.compaction.CompactionsBytemanTest testSSTableNotEnoughDiskSpaceForCompactionGetsDropped
java17_separate_tests
java11_pre-commit_tests
  ✓ j11_build  7m 57s
  ✓ j11_cqlsh_dtests_py311  7m 7s
  ✓ j11_cqlsh_dtests_py311_vnode  10m 13s
  ✓ j11_cqlsh_dtests_py38  8m 1s
  ✓ j11_cqlsh_dtests_py38_vnode  10m 25s
  ✓ j11_cqlshlib_cython_tests  7m 28s
  ✓ j11_cqlshlib_tests  9m 40s
  ✓ j11_dtests_vnode  36m 58s
  ✓ j11_jvm_dtests_latest_vnode  25m 28s
  ✓ j11_jvm_dtests_latest_vnode_repeat  29m 22s
  ✓ j11_jvm_dtests_repeat  28m 7s
  ✓ j11_unit_tests  15m 17s
  ✓ j11_unit_tests_repeat  0m 30s
  ✓ j11_utests_latest  16m 56s
  ✓ j11_utests_latest_repeat  0m 34s
  ✓ j11_utests_oa  13m 58s
  ✓ j11_utests_oa_repeat  1m 0s
  ✓ j11_utests_system_keyspace_directory  18m 1s
  ✓ j11_utests_system_keyspace_directory_repeat  3m 39s
  ✓ j17_cqlsh_dtests_py311  7m 6s
  ✓ j17_cqlsh_dtests_py311_vnode  7m 27s
  ✓ j17_cqlsh_dtests_py38  6m 51s
  ✓ j17_cqlsh_dtests_py38_vnode  7m 14s
  ✓ j17_cqlshlib_cython_tests  7m 38s
  ✓ j17_cqlshlib_tests  6m 57s
  ✓ j17_dtests  32m 21s
  ✓ j17_dtests_vnode  34m 24s
  ✓ j17_jvm_dtests_latest_vnode  22m 45s
  ✓ j17_jvm_dtests_latest_vnode_repeat  26m 32s
  ✓ j17_jvm_dtests_repeat  28m 21s
  ✓ j17_unit_tests_repeat  0m 16s
  ✓ j17_utests_latest  15m 34s
  ✓ j17_utests_latest_repeat  0m 36s
  ✓ j17_utests_oa  13m 43s
  ✓ j17_utests_oa_repeat  0m 17s
  ✕ j11_dtests  37m 26s
      pushed_notifications_test.TestPushedNotifications test_move_single_node_localhost
  ✕ j11_dtests_latest  40m 40s
      bootstrap_test.TestBootstrap test_bootstrap_with_reset_bootstrap_state
      offline_tools_test.TestOfflineTools test_sstablelevelreset
      offline_tools_test.TestOfflineTools test_sstableofflinerelevel
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j11_jvm_dtests
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830308#comment-17830308 ]

Jon Haddad commented on CASSANDRA-19477:

Here are some more fun graphs. Read latency, write latency, and load average are all significantly improved.

!image-2024-03-24-18-16-50-370.png|width=645,height=205!

!image-2024-03-24-18-20-07-734.png|width=723,height=229!

!image-2024-03-24-18-17-48-334.png|width=653,height=210!

> Do not go to disk to get HintsStore.getTotalFileSize
>
>                 Key: CASSANDRA-19477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Jon Haddad
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.1.x, 5.0-rc, 5.x
>
>         Attachments: flame-cassandra0-patched-2024-03-25_00-40-47.html,
> flame-cassandra0-release-2024-03-25_00-16-44.html, flamegraph.cpu.html,
> image-2024-03-24-17-57-32-560.png, image-2024-03-24-18-08-36-918.png,
> image-2024-03-24-18-16-50-370.png, image-2024-03-24-18-17-48-334.png,
> image-2024-03-24-18-20-07-734.png
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When testing a cluster with more requests than it could handle, I noticed
> significant CPU time (25%) spent in HintsStore.getTotalFileSize. Here's what
> I'm seeing from profiling:
>
> 10% of CPU time is spent in HintsDescriptor.fileName, which only does this:
>
> {noformat}
> return String.format("%s-%s-%s.hints", hostId, timestamp, version);{noformat}
>
> At a bare minimum here we should create this string up front with the host
> and version and eliminate 2 of the 3 substitutions, but I think it's probably
> faster to use a StringBuilder and avoid the underlying regular expression
> altogether.
>
> 12% of the time is spent in org.apache.cassandra.io.util.File.length. It
> looks like this is called once for each hint file on disk for each host we're
> hinting to. In the case of an overloaded cluster, this is significant. It
> would be better if we were to track the file size in memory for each hint
> file and reference that rather than go to the filesystem.
>
> These fairly small changes should make Cassandra more reliable when under
> load spikes.
>
> CPU Flame graph attached.
>
> I only tested this in 4.1 but it looks like this is present up to trunk.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
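The first suggestion in the description (drop `String.format` and build the name up front) could look roughly like the sketch below. `HintsFileName` is a hypothetical stand-in written for illustration, not Cassandra's `HintsDescriptor`, and the field types (`UUID`, `long`, `int`) are assumptions. It shows the name being assembled once with a `StringBuilder` and then returned from a field on every call.

```java
import java.util.UUID;

/**
 * Hypothetical stand-in for a hints descriptor; not Cassandra's actual class.
 * The file name is assembled once, without String.format, and then reused.
 */
final class HintsFileName
{
    private final String fileName;

    HintsFileName(UUID hostId, long timestamp, int version)
    {
        // All three parts are immutable here, so the name can be built up front.
        // Plain appends skip the format-string parsing String.format does per call.
        this.fileName = new StringBuilder(64)
                        .append(hostId).append('-')
                        .append(timestamp).append('-')
                        .append(version).append(".hints")
                        .toString();
    }

    /** Returns the cached name; nothing is formatted or allocated per call. */
    String fileName()
    {
        return fileName;
    }

    public static void main(String[] args)
    {
        HintsFileName name = new HintsFileName(UUID.randomUUID(), System.currentTimeMillis(), 2);
        System.out.println(name.fileName());
    }
}
```

Caching the string trades a few dozen bytes per descriptor for removing the formatting work from every lookup, which only pays off when the name is requested repeatedly, as the profile in the description suggests.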
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830307#comment-17830307 ]

Stefan Miklosovic commented on CASSANDRA-19477:

Nice! It was a joint effort really; [~aleksey] helped me improve and polish the idea, so big kudos to him!
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830306#comment-17830306 ]

Jon Haddad commented on CASSANDRA-19477:

I've set up a 3-node cluster, loaded 15GB of data, then took down a node and let hints accumulate. I switched one node to the 4.1 patch branch above, left the other node on release 4.1, then ran this:

{noformat}
easy-cass-stress run RandomPartitionAccess --workload.rows=1000 --rate 5k -d 2h -t 4{noformat}

Here's the 4.1 release flame graph: [^flame-cassandra0-release-2024-03-25_00-16-44.html]

StorageProxy.mutate is taking up 17% of CPU time, with shouldHint taking up almost 7% of CPU time.

Here's the 4.1 + patch flame graph: [^flame-cassandra0-patched-2024-03-25_00-40-47.html]

StorageProxy.mutate is only taking up 10% of CPU time now, with shouldHint taking up 0.26% of CPU time.

You can see in the graph below that 172.31.36.176 is using less CPU overall.

!image-2024-03-24-17-57-32-560.png|width=857,height=270!

Here's the same setup with additional load.

{noformat}
easy-cass-stress run RandomPartitionAccess --workload.rows=1000 --rate 30k -d 2h -t 4{noformat}

!image-2024-03-24-18-08-36-918.png|width=749,height=302!

The improvement in this patch is fantastic, really nice work [~smiklosovic]. I'm +1 with regard to performance, but deferring to [~aleksey] to judge correctness.
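The second suggestion in the issue description, tracking each hint file's size in memory instead of calling `File.length` on the write path, can be sketched as follows. This is an illustration only and not the code from the patch branches: `HintsSizeTracker` and its methods `update`, `remove`, and `totalFileSize` are hypothetical names, and the sketch assumes the writer reports a file's size whenever it changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical in-memory tracker for hint file sizes.
 * Not the implementation from the CASSANDRA-19477 patch.
 */
final class HintsSizeTracker
{
    // Last known size in bytes of each hint file, keyed by file name.
    private final Map<String, Long> sizes = new ConcurrentHashMap<>();
    // Running sum of the values in the map; may briefly lag it under concurrent updates.
    private final AtomicLong total = new AtomicLong();

    /** Record a file's current size, for example after bytes are flushed or the file is closed. */
    void update(String fileName, long newSize)
    {
        Long previous = sizes.put(fileName, newSize);
        total.addAndGet(newSize - (previous == null ? 0L : previous));
    }

    /** Forget a file, for example after its hints were dispatched and it was deleted. */
    void remove(String fileName)
    {
        Long previous = sizes.remove(fileName);
        if (previous != null)
            total.addAndGet(-previous);
    }

    /** A single atomic read; no file system call on the hot path. */
    long totalFileSize()
    {
        return total.get();
    }
}
```

A caller that decides whether to hint would then read `totalFileSize()` instead of summing file lengths from disk. The cost is that the tracker is only as accurate as the `update` and `remove` calls feeding it, so files changed outside those calls would not be reflected.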
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830036#comment-17830036 ]

Stefan Miklosovic commented on CASSANDRA-19477:

[CASSANDRA-19477-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-4.1]

{noformat}
java8_pre-commit_tests
  ✓ j8_build  3m 56s
  ✓ j8_cqlsh_dtests_py3  5m 12s
  ✓ j8_cqlsh_dtests_py311  8m 35s
  ✓ j8_cqlsh_dtests_py311_vnode  9m 4s
  ✓ j8_cqlsh_dtests_py38  6m 47s
  ✓ j8_cqlsh_dtests_py38_vnode  6m 8s
  ✓ j8_cqlsh_dtests_py3_vnode  9m 18s
  ✓ j8_cqlshlib_cython_tests  11m 55s
  ✓ j8_cqlshlib_tests  8m 6s
  ✓ j8_dtests  32m 16s
  ✓ j8_dtests_vnode  36m 21s
  ✓ j8_jvm_dtests  16m 18s
  ✓ j8_jvm_dtests_repeat  41m 32s
  ✓ j8_jvm_dtests_vnode_repeat  41m 13s
  ✓ j8_simulator_dtests  2m 49s
  ✓ j8_unit_tests_repeat  3m 59s
  ✓ j8_utests_system_keyspace_directory_repeat  3m 45s
  ✓ j11_unit_tests_repeat  0m 31s
  ✓ j11_jvm_dtests_vnode_repeat  38m 49s
  ✓ j11_jvm_dtests_vnode  12m 37s
  ✓ j11_jvm_dtests_repeat  39m 0s
  ✓ j11_jvm_dtests  16m 8s
  ✓ j11_dtests_vnode  35m 43s
  ✓ j11_dtests  33m 55s
  ✓ j11_cqlshlib_tests  6m 15s
  ✓ j11_cqlshlib_cython_tests  7m 10s
  ✓ j11_cqlsh_dtests_py3_vnode  5m 37s
  ✓ j11_cqlsh_dtests_py38_vnode  6m 10s
  ✓ j11_cqlsh_dtests_py38  5m 26s
  ✓ j11_cqlsh_dtests_py311_vnode  5m 46s
  ✓ j11_cqlsh_dtests_py311  5m 29s
  ✓ j11_cqlsh_dtests_py3  5m 20s
  ✕ j8_jvm_dtests_vnode  16m 54s
      org.apache.cassandra.distributed.test.GossipTest nodeDownDuringMove
  ✕ j8_unit_tests  11m 19s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j8_utests_system_keyspace_directory  9m 35s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j11_unit_tests  8m 6s
      org.apache.cassandra.db.compaction.DateTieredCompactionStrategyTest testDropExpiredSSTables
      org.apache.cassandra.db.compaction.DateTieredCompactionStrategyTest testFilterOldSSTables
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
{noformat}

[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4065/workflows/3c368b8e-2cc7-4c78-afe3-62b45253e416]
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829991#comment-17829991 ]

Jon Haddad commented on CASSANDRA-19477:

Awesome. I'll fire up a test with the 4.1 branch, since that's what I tested before, and post my findings.
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829945#comment-17829945 ]

Stefan Miklosovic commented on CASSANDRA-19477:

[CASSANDRA-19477-5.0|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-5.0]

{noformat}
java17_pre-commit_tests
  ✓ j17_build  3m 49s
  ✓ j17_cqlsh_dtests_py311  6m 4s
  ✓ j17_cqlsh_dtests_py311_vnode  6m 13s
  ✓ j17_cqlsh_dtests_py38  6m 3s
  ✓ j17_cqlsh_dtests_py38_vnode  6m 20s
  ✓ j17_cqlshlib_cython_tests  7m 25s
  ✓ j17_cqlshlib_tests  6m 27s
  ✓ j17_dtests  32m 3s
  ✓ j17_jvm_dtests  23m 21s
  ✓ j17_jvm_dtests_latest_vnode  14m 2s
  ✓ j17_jvm_dtests_latest_vnode_repeat  40m 42s
  ✓ j17_jvm_dtests_repeat  39m 52s
  ✓ j17_unit_tests  17m 44s
  ✓ j17_unit_tests_repeat  2m 28s
  ✓ j17_utests_latest  15m 45s
  ✓ j17_utests_latest_repeat  2m 34s
  ✓ j17_utests_oa_repeat  0m 13s
  ✕ j17_dtests_latest  34m 34s
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_dtests_vnode  32m 7s
  ✕ j17_utests_oa  15m 57s
      org.apache.cassandra.net.ConnectionTest testTimeout
java17_separate_tests
java11_pre-commit_tests
  ✓ j11_build  6m 59s
  ✓ j11_cqlsh_dtests_py311  9m 42s
  ✓ j11_cqlsh_dtests_py311_vnode  7m 41s
  ✓ j11_cqlsh_dtests_py38  8m 20s
  ✓ j11_cqlsh_dtests_py38_vnode  7m 59s
  ✓ j11_cqlshlib_cython_tests  11m 40s
  ✓ j11_cqlshlib_tests  9m 16s
  ✓ j11_dtests  38m 39s
  ✓ j11_dtests_vnode  35m 20s
  ✓ j11_jvm_dtests  23m 29s
  ✓ j11_jvm_dtests_latest_vnode  14m 40s
  ✓ j11_jvm_dtests_latest_vnode_repeat  47m 40s
  ✓ j11_jvm_dtests_repeat  41m 59s
  ✓ j11_simulator_dtests  5m 54s
  ✓ j11_unit_tests  19m 38s
  ✓ j11_unit_tests_repeat  3m 26s
  ✓ j11_utests_latest  21m 24s
  ✓ j11_utests_latest_repeat  3m 46s
  ✓ j11_utests_oa  21m 15s
  ✓ j11_utests_oa_repeat  8m 28s
  ✓ j11_utests_system_keyspace_directory  16m 17s
  ✓ j11_utests_system_keyspace_directory_repeat  3m 58s
  ✓ j17_cqlsh_dtests_py311  5m 56s
  ✓ j17_cqlsh_dtests_py311_vnode  6m 46s
  ✓ j17_cqlsh_dtests_py38  6m 9s
  ✓ j17_cqlsh_dtests_py38_vnode  6m 55s
  ✓ j17_cqlshlib_cython_tests  7m 32s
  ✓ j17_cqlshlib_tests  6m 27s
  ✓ j17_dtests  33m 37s
  ✓ j17_dtests_vnode  32m 31s
  ✓ j17_jvm_dtests  23m 16s
  ✓ j17_jvm_dtests_latest_vnode  13m 28s
  ✓ j17_jvm_dtests_latest_vnode_repeat  40m 42s
  ✓ j17_jvm_dtests_repeat  41m 28s
  ✓ j17_unit_tests  14m 40s
  ✓ j17_unit_tests_repeat  0m 16s
  ✓ j17_utests_latest_repeat  0m 14s
  ✓ j17_utests_oa  15m 51s
  ✓ j17_utests_oa_repeat  7m 58s
  ✕ j11_dtests_latest  35m 2s
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_dtests_latest  33m 55s
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_utests_latest  16m 38s
      org.apache.cassandra.cql3.validation.operations.SelectTest testCreatingUDFWithSameNameAsBuiltin_PrefersCompatibleArgs
      org.apache.cassandra.cql3.validation.operations.SelectTest testCreatingUDFWithSameNameAsBuiltin_FullyQualifiedFunctionNameWorks
java11_separate_tests
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4063/workflows/15b9eab1-70d5-4490-836a-49cc9169c2aa]
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829791#comment-17829791 ]

Stefan Miklosovic commented on CASSANDRA-19477:

[CASSANDRA-19477-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19477-4.1]

{noformat}
java11_pre-commit_tests
  ✓ j11_build  1m 30s
  ✓ j11_cqlsh_dtests_py3  5m 28s
  ✓ j11_cqlsh_dtests_py311  5m 47s
  ✓ j11_cqlsh_dtests_py311_vnode  6m 4s
  ✓ j11_cqlsh_dtests_py38  5m 53s
  ✓ j11_cqlsh_dtests_py38_vnode  6m 2s
  ✓ j11_cqlsh_dtests_py3_vnode  5m 53s
  ✓ j11_cqlshlib_cython_tests  9m 1s
  ✓ j11_cqlshlib_tests  8m 6s
  ✓ j11_dtests  33m 38s
  ✓ j11_dtests_vnode  36m 1s
  ✓ j11_jvm_dtests  15m 44s
  ✓ j11_jvm_dtests_repeat  38m 12s
  ✓ j11_jvm_dtests_vnode  12m 39s
  ✓ j11_jvm_dtests_vnode_repeat  38m 46s
  ✓ j11_unit_tests_repeat  0m 32s
  ✕ j11_unit_tests  8m 33s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4064/workflows/88c6d9da-308c-43a7-849b-a2b1a6b30307]
[jira] [Commented] (CASSANDRA-19477) Do not go to disk to get HintsStore.getTotalFileSize
[ https://issues.apache.org/jira/browse/CASSANDRA-19477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829583#comment-17829583 ]

Stefan Miklosovic commented on CASSANDRA-19477:

the work and reviews are done, I just need to test this and Jon should perf-test it. Ideally this should be committed very early next week.