Re: High frequency garbage collection leading to High load average

2022-03-09 Thread Bowen Song
It sounds like you either have hot partition(s) or a hardware issue on 
that node. I'm mentioning hardware because I once had a server with a 
faulty CPU fan: the CPU overheated and throttled its frequency, which 
left a single server with a much higher load than the rest of the nodes 
in the cluster, and the symptoms looked very similar to hot partitions.


To answer your questions:

1. Slow queries can be a cause of GC pressure, a result of GC pressure, 
or both. In my experience, it is more often GC pressure that leads to 
slow queries than the other way around.


2. This is very suspicious. I can think of a few causes, such as hot 
partitions + token aware load balancing, a client that only connects to 
a single node in the cluster, heavy streaming activity within some token 
ranges, bad retry policies, etc., and it's pretty hard to know what 
exactly has happened without digging deeper into it. If this happens 
frequently, you should properly investigate and fix it. This certainly 
can lead to higher GC pressure on the affected node.


3. Just more cache hits, while the overall number of queries did not 
change much or even went down? That would be an indicator of bad retry 
policies. More cache hits alone won't cause much GC pressure, but the 
underlying issue that led to the higher cache hits may.
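
For illustration only, here is a minimal Python sketch of what a bounded 
retry looks like, as opposed to a "bad" policy that retries immediately 
and without limit and so multiplies the load on an already struggling 
node. It does not use any Cassandra driver API; call_with_retries, op 
and all parameter names are hypothetical, and a real driver's retry 
policy would be configured through that driver's own interface.

```python
import random
import time


def call_with_retries(op, max_attempts=3, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Run op(), retrying a bounded number of times with jittered backoff.

    All names here are hypothetical; this is not a driver API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception:
            if attempt == max_attempts:
                # Give up and surface the error instead of retrying forever.
                raise
            # Exponential backoff capped at max_delay, scaled by a random
            # factor so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * rng())
```

The cap on attempts and the growing delay are what keep a slow node 
from receiving several times its normal request volume.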



On 08/03/2022 21:44, Inquistive allen wrote:

Hello team,

On a given day, a node in a 27-node cluster showed a higher frequency 
of garbage collection, mostly young GC.


I have found the issues below:
1. A higher number of slow queries was observed on that particular 
node on that particular day compared to other days


2. Higher outgoing traffic was observed from the node, 10 times the 
average outbound traffic, on that particular day


3. A higher number of cache requests hit the key cache and chunk 
cache than on other days on that particular node


The cluster has large partition warnings as well.

My question is: which of the above is a likely cause of the higher 
frequency of GC leading to a high load average on the system?


Re: High frequency garbage collection leading to High load average

2022-03-08 Thread Paulo Motta
All these symptoms indicate a potential hotspot in this replica, which can
be caused by one or, more likely, multiple "hot" partitions. Finding out
which particular partition(s) are responsible for this is tricky, but good
candidates are the ones mentioned in the log warning.

Ideally you should fix your data model to avoid large partitions and
hotspots, and keep your partitions under 100MB. There are some bucketing
techniques available to reduce partition sizes.
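
As a rough sketch of one such bucketing technique (time bucketing), the
Python below derives a composite partition key from an entity id plus a
time bucket, so that a single entity's rows are spread over many bounded
partitions instead of one ever-growing partition. The names
bucketed_partition_key, sensor_id and bucket_hours are hypothetical, and
the right bucket width depends on your write rate and row size.

```python
from datetime import datetime, timezone
from typing import Tuple


def bucketed_partition_key(sensor_id: str, ts: datetime,
                           bucket_hours: int = 24) -> Tuple[str, int]:
    """Return (sensor_id, bucket) for use as a composite partition key.

    Hypothetical helper: rows for one sensor land in a new partition
    every bucket_hours hours, which bounds the size of each partition.
    """
    epoch_hours = int(ts.timestamp()) // 3600
    bucket = epoch_hours // bucket_hours
    return (sensor_id, bucket)


# Two readings on the same UTC day share a partition; the next day's
# reading goes to a different one.
same_day_a = bucketed_partition_key(
    "s1", datetime(2022, 3, 8, 1, 0, tzinfo=timezone.utc))
same_day_b = bucketed_partition_key(
    "s1", datetime(2022, 3, 8, 23, 0, tzinfo=timezone.utc))
next_day = bucketed_partition_key(
    "s1", datetime(2022, 3, 9, 0, 30, tzinfo=timezone.utc))
```

Under this assumption the table's partition key would be (sensor_id,
bucket) with the timestamp as a clustering column, and reads for a time
range have to query each bucket that the range covers.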

On Tue, 8 Mar 2022 at 18:44, Inquistive allen 
wrote:

> Hello team,
>
> On a given day, a node in a 27-node cluster showed a higher frequency of
> garbage collection, mostly young GC.
>
> I have found the issues below:
> 1. A higher number of slow queries was observed on that particular node
> on that particular day compared to other days
>
> 2. Higher outgoing traffic was observed from the node, 10 times the
> average outbound traffic, on that particular day
>
> 3. A higher number of cache requests hit the key cache and chunk cache
> than on other days on that particular node
>
> The cluster has large partition warnings as well.
>
> My question is: which of the above is a likely cause of the higher
> frequency of GC leading to a high load average on the system?
>


High frequency garbage collection leading to High load average

2022-03-08 Thread Inquistive allen
Hello team,

On a given day, a node in a 27-node cluster showed a higher frequency of
garbage collection, mostly young GC.

I have found the issues below:
1. A higher number of slow queries was observed on that particular node on
that particular day compared to other days

2. Higher outgoing traffic was observed from the node, 10 times the average
outbound traffic, on that particular day

3. A higher number of cache requests hit the key cache and chunk cache
than on other days on that particular node

The cluster has large partition warnings as well.

My question is: which of the above is a likely cause of the higher
frequency of GC leading to a high load average on the system?