RE: Questions about high read latency and related metrics
Hello C* community, I have been experimenting a bit with my lab node. Based on observing how these metrics progress over time, I am assuming the following:
1. The EstimatedPartitionSizeHistogram metric derives from READ operations. Cassandra reports values to this metric as it serves read queries.
2. The PartitionSize metrics derive from compaction activity. Cassandra reports values to these metrics as it performs the compaction of SSTables.
I am not sure whether those assumptions are valid, but they at least provide a good explanation for the progression of the stats observed. Thanks a lot and CU on the next topic. BR MK

From: Michail Kotsiouros via user
Sent: Thursday, May 11, 2023 14:08
To: user@cassandra.apache.org
Subject: RE: Questions about high read latency and related metrics

Hello Erick, No, the Max/Min/Mean vs Histogram difference is clear. What confuses me is the description of those metrics: “Size of the compacted partition (in bytes)” vs “estimated partition size”. I am after what is measured by each metric. To be more specific: Which metric should be considered when we want to see the partition size over time? Does “compacted partition” mean that only the partitions which have undergone a compaction in the respective SSTables are taken into account for the PartitionSize metrics? What does “estimated” mean in the EstimatedPartitionSizeHistogram metric? Excuse me if those questions sound trivial. BR MK

From: Erick Ramirez <erickramire...@apache.org>
Sent: Thursday, May 11, 2023 13:16
To: user@cassandra.apache.org; Michail Kotsiouros <michail.kotsiou...@ericsson.com>
Subject: Re: Questions about high read latency and related metrics

Is it the concept of histograms that's not clear? Something else?
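A minimal sketch of the kind of polling loop behind this lab observation, in Java over JMX. It assumes the default JMX port 7199 with no authentication, placeholder keyspace/table names (my_ks / my_table), the usual org.apache.cassandra.metrics:type=Table,... MBean naming, and that the histogram gauge exposes its buckets as a long[]; please verify these details against your Cassandra version.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import java.util.Arrays;

public class PartitionSizeWatcher {
    public static void main(String[] args) throws Exception {
        // Assumes the default local JMX port 7199 and no JMX authentication.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // my_ks / my_table are placeholders for the keyspace and table under test.
            ObjectName mean = new ObjectName(
                "org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=MeanPartitionSize");
            ObjectName histogram = new ObjectName(
                "org.apache.cassandra.metrics:type=Table,keyspace=my_ks,scope=my_table,name=EstimatedPartitionSizeHistogram");

            // Sample every 30 seconds; run reads or trigger a compaction in between
            // and note which of the two gauges actually moves.
            for (int i = 0; i < 20; i++) {
                Object meanBytes = mbs.getAttribute(mean, "Value");
                Object buckets = mbs.getAttribute(histogram, "Value"); // expected to be a long[] of bucket counts
                System.out.println("MeanPartitionSize (bytes): " + meanBytes);
                System.out.println("EstimatedPartitionSizeHistogram buckets: " + Arrays.toString((long[]) buckets));
                Thread.sleep(30_000);
            }
        }
    }
}

Running a batch of reads, then a manual compaction (nodetool compact), between samples should make it visible which gauge reacts to which activity.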
RE: Questions about high read latency and related metrics
Hello Erick, No, the Max/Min/Mean vs Histogram difference is clear. What confuses me is the description of those metrics: “Size of the compacted partition (in bytes)” vs “estimated partition size”. I am after what is measured by each metric. To be more specific: Which metric should be considered when we want to see the partition size over time? Does “compacted partition” mean that only the partitions which have undergone a compaction in the respective SSTables are taken into account for the PartitionSize metrics? What does “estimated” mean in the EstimatedPartitionSizeHistogram metric? Excuse me if those questions sound trivial. BR MK

From: Erick Ramirez
Sent: Thursday, May 11, 2023 13:16
To: user@cassandra.apache.org; Michail Kotsiouros
Subject: Re: Questions about high read latency and related metrics

Is it the concept of histograms that's not clear? Something else?
RE: Questions about high read latency and related metrics
Hello Erick, Thanks a lot for the immediate reply, but the difference between those 2 metrics is still not clear to me. BR MK

From: Erick Ramirez
Sent: Thursday, May 11, 2023 13:04
To: user@cassandra.apache.org
Subject: Re: Questions about high read latency and related metrics

The min/max/mean partition sizes are the sizes in bytes, which are the same statistics reported by nodetool tablestats. EstimatedPartitionSizeHistogram is the distribution of partition sizes within specified ranges (percentiles) and is the same histogram reported by nodetool tablehistograms (in the Partition Size column). Cheers!
Questions about high read latency and related metrics
Hello Cassandra community, I see the following metrics in JMX under org.apache.cassandra.metrics.Table...:
MinPartitionSize (Gauge): Size of the smallest compacted partition (in bytes).
MaxPartitionSize (Gauge): Size of the largest compacted partition (in bytes).
MeanPartitionSize (Gauge): Size of the average compacted partition (in bytes).
and
EstimatedPartitionSizeHistogram (Gauge): Histogram of estimated partition size (in bytes).
Could you, please, help me clarify the difference between those 2 metrics? We suspect that the increasing partition size produced by the application data model has an impact on read latency. Which would be the appropriate metric to monitor: PartitionSize or EstimatedPartitionSizeHistogram? BR Michail Kotsiouros
RE: Cassandra GC tuning
Hello everyone, Sorry for not responding earlier. The GC observed was indeed a symptom. The CPU spike and the slow Cassandra node responses were due to a massive number of client processes connecting at once. Most probably, this caused the GC as well. The guides shared have a lot of interesting points, though, that are useful for optimizing Cassandra performance in general. Thanks a lot once more for your comments and suggestions. BR MK

From: Jeff Jirsa
Sent: Tuesday, September 20, 2022 16:04
To: user@cassandra.apache.org
Subject: Re: Cassandra GC tuning

Beyond this there are two decent tuning sets, but relatively dated at this point. CASSANDRA-8150 proposed a number of changes to defaults based on how it had been tuned at a specific large (competent) user: https://issues.apache.org/jira/browse/CASSANDRA-8150
Amy Tobey wrote this guide around the 2.0/2.1 era, so it assumes things like JDK 8 / CMS, but it still has more rigor than most other guides you'll find elsewhere and may help identify what's going on even if the specific tuning isn't super relevant in all cases: Amy's Cassandra 2.1 tuning guide: https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

On Sep 20, 2022, at 5:27 AM, Michail Kotsiouros via user <user@cassandra.apache.org> wrote:
Hello community, BTW I am using Cassandra 3.11.4. From your comments, I understand that a CPU spike and maybe a long GC may be expected at snapshot creation under specific circumstances. I will monitor the resources during snapshot creation. I will come back with more news. Thanks a lot for your valuable input. BR MK

From: Jeff Jirsa <jji...@gmail.com>
Sent: Monday, September 19, 2022 20:06
To: user@cassandra.apache.org; Michail Kotsiouros <michail.kotsiou...@ericsson.com>
Subject: Re: Cassandra GC tuning

https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0; you may find that tuning those thresholds helps.

On Mon, Sep 19, 2022 at 9:50 AM Jeff Jirsa <jji...@gmail.com> wrote:
The pauses during snapshots are probably actually caused by a spike in disk IO and disk latency, not GC (you'll see longer STW pauses as you get to a safepoint if that disk is hanging). This is especially problematic on SATA SSDs, or NVMe SSDs with poor IO scheduler tuning. There's a patch somewhere to throttle hardlinks to try to mitigate this.

On Mon, Sep 19, 2022 at 3:45 AM Michail Kotsiouros via user <user@cassandra.apache.org> wrote:
Hello community, I observe some GC pauses while trying to create snapshots of a keyspace. The GC pauses as such are not long, even though they are reported in the logs. The problem is the CPU utilization, which affects other applications deployed on my server. Do you have any articles or recommendations about tuning GC in Cassandra? Thank you in advance. BR MK
RE: Log messages "MUTATION messages were dropped" in system.log
Hello community, I came across the following post: https://lists.apache.org/thread/t94jjc4j8dfd9sg5of9283mqcy3y3bgq
This gives me a good explanation of the log. I will try to check the storage of the problematic datacenters. I will come back if I have any good findings. BR MK

From: Michail Kotsiouros via user
Sent: Tuesday, September 27, 2022 17:36
To: user@cassandra.apache.org
Subject: Log messages "MUTATION messages were dropped" in system.log

Hello Cassandra community, I see the following log lines in the Cassandra nodes of 2 datacenters that participate in a cluster of 6 datacenters.
INFO [ScheduledTasks:1] 2022-09-12 17:19:44,422 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 0 internal and 2 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 9140 ms
From the information I have found online, I see that those logs indicate a resource issue on the Cassandra nodes. I am trying to troubleshoot what could be the bottleneck. Can you, please, help me interpret the log line? So far I know that MUTATION means write operations for already existing keys. What does internal mean? Does it indicate that the write operation originated from a coordinator node residing in the same datacenter? What does cross node mean? Does it indicate that the write operation originated from a coordinator node residing in another datacenter? What does dropped latency mean? Is it the time that the message spent in some queue before being dropped? I would like to know this so as to understand the impact observed on the client application. Thank you in advance!!! I find really useful all the information found in different posts!!! BR Michail
Log messages "MUTATION messages were dropped" in system.log
Hello Cassandra community, I see the following log lines in the Cassandra nodes of 2 datacenters that participate in a cluster of 6 datacenters.
INFO [ScheduledTasks:1] 2022-09-12 17:19:44,422 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 0 internal and 2 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 9140 ms
From the information I have found online, I see that those logs indicate a resource issue on the Cassandra nodes. I am trying to troubleshoot what could be the bottleneck. Can you, please, help me interpret the log line? So far I know that MUTATION means write operations for already existing keys. What does internal mean? Does it indicate that the write operation originated from a coordinator node residing in the same datacenter? What does cross node mean? Does it indicate that the write operation originated from a coordinator node residing in another datacenter? What does dropped latency mean? Is it the time that the message spent in some queue before being dropped? I would like to know this so as to understand the impact observed on the client application. Thank you in advance!!! I find really useful all the information found in different posts!!! BR Michail
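For cross-checking the log line against metrics, a small JMX read of the DroppedMessage group can help. This is only a sketch: it assumes the default JMX port 7199 without authentication, and that the MUTATION scope exposes a Dropped meter plus InternalDroppedLatency / CrossNodeDroppedLatency timers (the names suggested by the wording of the log); confirm the exact attribute names and units (Cassandra timers usually report microseconds) with a JMX browser on your node.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DroppedMutationCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the default JMX port 7199 on localhost and no authentication.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Cumulative count of dropped MUTATION messages since the node started.
            ObjectName dropped = new ObjectName(
                "org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped");
            // Latency of dropped mutations that originated on this node vs. on other nodes.
            ObjectName internal = new ObjectName(
                "org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=InternalDroppedLatency");
            ObjectName crossNode = new ObjectName(
                "org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=CrossNodeDroppedLatency");

            System.out.println("Dropped MUTATIONs: " + mbs.getAttribute(dropped, "Count"));
            System.out.println("Internal dropped latency (mean): " + mbs.getAttribute(internal, "Mean"));
            System.out.println("Cross-node dropped latency (mean): " + mbs.getAttribute(crossNode, "Mean"));
        }
    }
}

nodetool tpstats also prints a per-message-type summary of dropped messages and can be an easier first stop.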
RE: Cassandra GC tuning
Hello community, BTW I am using Cassandra 3.11.4. From your comments, I understand that a CPU spike and maybe a long GC may be expected at snapshot creation under specific circumstances. I will monitor the resources during snapshot creation. I will come back with more news. Thanks a lot for your valuable input. BR MK

From: Jeff Jirsa
Sent: Monday, September 19, 2022 20:06
To: user@cassandra.apache.org; Michail Kotsiouros
Subject: Re: Cassandra GC tuning

https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0; you may find that tuning those thresholds helps.

On Mon, Sep 19, 2022 at 9:50 AM Jeff Jirsa <jji...@gmail.com> wrote:
The pauses during snapshots are probably actually caused by a spike in disk IO and disk latency, not GC (you'll see longer STW pauses as you get to a safepoint if that disk is hanging). This is especially problematic on SATA SSDs, or NVMe SSDs with poor IO scheduler tuning. There's a patch somewhere to throttle hardlinks to try to mitigate this.

On Mon, Sep 19, 2022 at 3:45 AM Michail Kotsiouros via user <user@cassandra.apache.org> wrote:
Hello community, I observe some GC pauses while trying to create snapshots of a keyspace. The GC pauses as such are not long, even though they are reported in the logs. The problem is the CPU utilization, which affects other applications deployed on my server. Do you have any articles or recommendations about tuning GC in Cassandra? Thank you in advance. BR MK
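One way to monitor GC activity during snapshot creation, without touching the Cassandra configuration, is to sample the standard JVM GarbageCollector MBeans over JMX. A rough sketch, assuming the default JMX port 7199 and no authentication: take a few samples, run nodetool snapshot in between, and compare the collection-count and collection-time deltas.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import java.util.Set;

public class GcDuringSnapshot {
    public static void main(String[] args) throws Exception {
        // Assumes Cassandra's default JMX port 7199 and no authentication.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // One MBean per collector (e.g. ParNew / ConcurrentMarkSweep under CMS).
            Set<ObjectName> collectors =
                mbs.queryNames(new ObjectName("java.lang:type=GarbageCollector,*"), null);

            // Sample periodically; trigger the snapshot in between and compare the deltas.
            for (int i = 0; i < 10; i++) {
                for (ObjectName gc : collectors) {
                    System.out.printf("%s count=%s totalTimeMs=%s%n",
                        gc.getKeyProperty("name"),
                        mbs.getAttribute(gc, "CollectionCount"),
                        mbs.getAttribute(gc, "CollectionTime"));
                }
                Thread.sleep(5_000);
            }
        }
    }
}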
Cassandra GC tuning
Hello community, I observe some GC pauses while trying to create snapshots of a keyspace. The GC pauses as such are not long, even though they are reported in the logs. The problem is the CPU utilization, which affects other applications deployed on my server. Do you have any articles or recommendations about tuning GC in Cassandra? Thank you in advance. BR MK
RE: Strange Garbage Collections results in service impact
Hello Cassandra community, I have restarted one of the Cassandra nodes on which the issue was observed, and GC has stopped being invoked so frequently and causing the service impact. The rest of the Cassandra nodes in the same datacenter that were not restarted still have this issue. What is more, the max heap memory reported by nodetool info has stabilized on the restarted Cassandra node. It now reports the same value as the Cassandra nodes in the rest of the operating datacenters. Again, the Cassandra nodes in the problematic datacenter that have not been restarted still report a different max heap memory in the nodetool info printout, as monitored every 5 minutes for an hour.
From this observation, I get that restarting the Cassandra nodes is the way forward to resolve the frequent GC issue. What is more, the changing max heap memory reported by nodetool info on the affected Cassandra nodes seems to be related to the root cause of the problem. Does this ring a bell? Has anyone observed such behavior before? What could have caused it? Could it be something in the JVM? BR MK

From: Michail Kotsiouros via user
Sent: Friday, July 1, 2022 11:01
To: user@cassandra.apache.org
Subject: Strange Garbage Collections results in service impact

Hello Cassandra community, I am new to the list, so I have no experience in posting questions. We have Cassandra deployed in our product. In one of the production nodes I see the following. The Cassandra nodes belonging to a specific datacenter have frequent garbage collections impacting the system performance. A lot of queries to the Cassandra nodes time out during the garbage collection. I know that this is expected. However, I cannot figure out why we have such frequent garbage collections. The production node is now isolated and does not serve any load, and the frequent GCs are still visible. The other strange thing is that the cluster consists of 3 datacenters. The other 2 datacenters deployed do not show the same behavior, and they serve traffic as well. The deployment of the Cassandra nodes is uniform for all datacenters. What is more, the OS and HW are also the same. The logs that indicate the GC are the following (processed with awk to give more human-readable output):
2022-06-28 08:21:29,538 481ms. Old Gen: 5.30356GB -> 417.736MB Survivor Space: 63.6879MB -> 0
2022-06-28 09:28:04,229 627ms. Old Gen: 5.27757GB -> 548.777MB Survivor Space: 55.0002MB -> 0
2022-06-28 10:29:55,439 419ms. Old Gen: 5.29753GB -> 363.493MB Survivor Space: 79.1411MB -> 0
2022-06-28 11:28:35,382 530ms. Old Gen: 5.28328GB -> 473.46MB Survivor Space: 60.6253MB -> 0
2022-06-28 12:22:57,251 658ms. Old Gen: 5.26409GB -> 570.007MB Survivor Space: 63.3597MB -> 0
2022-06-28 13:20:46,893 491ms. Old Gen: 5.26584GB -> 414.906MB Survivor Space: 64.5628MB -> 0
2022-06-28 14:21:17,507 511ms. Old Gen: 5.26839GB -> 449.541MB Survivor Space: 66.2503MB -> 0
2022-06-28 15:21:54,272 562ms. Old Gen: 5.30167GB -> 493.219MB Survivor Space: 69.8441MB -> 0
2022-06-28 16:17:44,517 632ms. Old Gen: 5.23687GB -> 569.884MB Survivor Space: 65.2972MB -> 0
2022-06-28 17:08:39,581 593ms. Old Gen: 5.31225GB -> 503.452MB Survivor Space: 9.15627MB -> 0
2022-06-28 18:00:14,865 474ms. Old Gen: 5.25287GB -> 405.377MB Survivor Space: 66.719MB -> 0
2022-06-28 18:52:39,165 643ms. Old Gen: 5.28023GB -> 559.397MB Survivor Space: 61.1409MB -> 0
2022-06-28 19:52:31,017 598ms. Old Gen: 5.30456GB -> 501.958MB Survivor Space: 60.0472MB -> 0
2022-06-28 21:06:59,596 505ms. Old Gen: 5.29862GB -> 445.448MB Survivor Space: 51.5159MB -> 0
2022-06-28 22:43:05,232 478ms. Old Gen: 5.29055GB -> 425.308MB Survivor Space: 39.1877MB -> 0
One more strange observation is that the maximum heap as printed by the nodetool info command keeps changing on the problematic Cassandra nodes even though the Cassandra process does not restart. Here is some monitoring of the output of the nodetool info command on one of the problematic Cassandra nodes every 5 min:
Thu Jun 23 14:27:16 CEST 2022 Heap Memory (MB) : 3581.64 / 8078.50
Thu Jun 23 14:32:18 CEST 2022 Heap Memory (MB) : 4308.25 / 8089.50
Thu Jun 23 14:37:21 CEST 2022 Heap Memory (MB) : 5024.90 / 8095.00
Thu Jun 23 14:42:24 CEST 2022 Heap Memory (MB) : 6270.35 / 8085.00
Thu Jun 23 14:47:26 CEST 2022 Heap Memory (MB) : 6843.82 / 8091.00
Thu Jun 23 14:52:29 CEST 2022 Heap Memory (MB) : 881.51 / 8086.00
Thu Jun 23 14:57:32 CEST 2022 Heap Memory (MB) : 1699.78 / 8093.50
Thu Jun 23 15:02:34 CEST 2022 Heap Memory (MB) : 2868.04 / 8089.00
Thu Jun 23 15:07:37 CEST 2022 Heap Memory (MB) : 4719.43 / 8067.00
This observation does not hold for the normally operating Cassandra nodes. Could this be related? Can you, please, help me with the investigation of this issue? I feel that there is something special in the system hosting Cassandra in the problematic datacenter and it is not a matter of calibrating the heap memory configuration. BR Michail Kotsiouros
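To watch the same used/max heap figures outside of nodetool, the JVM's java.lang:type=Memory MBean can be polled directly (as far as I know, that is where nodetool info takes its Heap Memory line from). A rough sketch, assuming the default JMX port 7199 and no authentication; the host and sampling interval are placeholders.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapWatcher {
    public static void main(String[] args) throws Exception {
        // Assumes the default JMX port 7199 and no authentication; adjust the host as needed.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName memory = new ObjectName("java.lang:type=Memory");

            // Sample the heap usage that nodetool info prints as "Heap Memory (MB): used / max".
            for (int i = 0; i < 12; i++) {
                CompositeData heap = (CompositeData) mbs.getAttribute(memory, "HeapMemoryUsage");
                long used = (Long) heap.get("used");
                long max = (Long) heap.get("max");
                System.out.printf("used=%.2f MB max=%.2f MB%n", used / 1048576.0, max / 1048576.0);
                Thread.sleep(5 * 60 * 1000); // every 5 minutes, matching the observation window
            }
        }
    }
}

For what it's worth, with generational collectors the JVM typically excludes one survivor space from the reported max heap, so small fluctuations in the max figure can come from survivor-space resizing rather than from Cassandra itself; that is only a possible explanation, not a confirmed one.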
Strange Garbage Collections results in service impact
Hello Cassandra community, I am new to the list, so I have no experience in posting questions. We have Cassandra deployed in our product. In one of the production nodes I see the following. The Cassandra nodes belonging to a specific datacenter have frequent garbage collections impacting the system performance. A lot of queries to the Cassandra nodes time out during the garbage collection. I know that this is expected. However, I cannot figure out why we have such frequent garbage collections. The production node is now isolated and does not serve any load, and the frequent GCs are still visible. The other strange thing is that the cluster consists of 3 datacenters. The other 2 datacenters deployed do not show the same behavior, and they serve traffic as well. The deployment of the Cassandra nodes is uniform for all datacenters. What is more, the OS and HW are also the same. The logs that indicate the GC are the following (processed with awk to give more human-readable output):
2022-06-28 08:21:29,538 481ms. Old Gen: 5.30356GB -> 417.736MB Survivor Space: 63.6879MB -> 0
2022-06-28 09:28:04,229 627ms. Old Gen: 5.27757GB -> 548.777MB Survivor Space: 55.0002MB -> 0
2022-06-28 10:29:55,439 419ms. Old Gen: 5.29753GB -> 363.493MB Survivor Space: 79.1411MB -> 0
2022-06-28 11:28:35,382 530ms. Old Gen: 5.28328GB -> 473.46MB Survivor Space: 60.6253MB -> 0
2022-06-28 12:22:57,251 658ms. Old Gen: 5.26409GB -> 570.007MB Survivor Space: 63.3597MB -> 0
2022-06-28 13:20:46,893 491ms. Old Gen: 5.26584GB -> 414.906MB Survivor Space: 64.5628MB -> 0
2022-06-28 14:21:17,507 511ms. Old Gen: 5.26839GB -> 449.541MB Survivor Space: 66.2503MB -> 0
2022-06-28 15:21:54,272 562ms. Old Gen: 5.30167GB -> 493.219MB Survivor Space: 69.8441MB -> 0
2022-06-28 16:17:44,517 632ms. Old Gen: 5.23687GB -> 569.884MB Survivor Space: 65.2972MB -> 0
2022-06-28 17:08:39,581 593ms. Old Gen: 5.31225GB -> 503.452MB Survivor Space: 9.15627MB -> 0
2022-06-28 18:00:14,865 474ms. Old Gen: 5.25287GB -> 405.377MB Survivor Space: 66.719MB -> 0
2022-06-28 18:52:39,165 643ms. Old Gen: 5.28023GB -> 559.397MB Survivor Space: 61.1409MB -> 0
2022-06-28 19:52:31,017 598ms. Old Gen: 5.30456GB -> 501.958MB Survivor Space: 60.0472MB -> 0
2022-06-28 21:06:59,596 505ms. Old Gen: 5.29862GB -> 445.448MB Survivor Space: 51.5159MB -> 0
2022-06-28 22:43:05,232 478ms. Old Gen: 5.29055GB -> 425.308MB Survivor Space: 39.1877MB -> 0
One more strange observation is that the maximum heap as printed by the nodetool info command keeps changing on the problematic Cassandra nodes even though the Cassandra process does not restart. Here is some monitoring of the output of the nodetool info command on one of the problematic Cassandra nodes every 5 min:
Thu Jun 23 14:27:16 CEST 2022 Heap Memory (MB) : 3581.64 / 8078.50
Thu Jun 23 14:32:18 CEST 2022 Heap Memory (MB) : 4308.25 / 8089.50
Thu Jun 23 14:37:21 CEST 2022 Heap Memory (MB) : 5024.90 / 8095.00
Thu Jun 23 14:42:24 CEST 2022 Heap Memory (MB) : 6270.35 / 8085.00
Thu Jun 23 14:47:26 CEST 2022 Heap Memory (MB) : 6843.82 / 8091.00
Thu Jun 23 14:52:29 CEST 2022 Heap Memory (MB) : 881.51 / 8086.00
Thu Jun 23 14:57:32 CEST 2022 Heap Memory (MB) : 1699.78 / 8093.50
Thu Jun 23 15:02:34 CEST 2022 Heap Memory (MB) : 2868.04 / 8089.00
Thu Jun 23 15:07:37 CEST 2022 Heap Memory (MB) : 4719.43 / 8067.00
This observation does not hold for the normally operating Cassandra nodes. Could this be related? Can you, please, help me with the investigation of this issue? I feel that there is something special in the system hosting Cassandra in the problematic datacenter and it is not a matter of calibrating the heap memory configuration. BR Michail Kotsiouros