Hello.

I had the same issues with full repair. I tried various GC settings;
the most performant was ZGC on Java 11, but I had some stability
issues with it. With the G1GC settings carried over from 3.11.x I got
the same issues as yours: CPU load over 90% and a growing count of
open file descriptors (up to the allowed maximum). It looks like the
repair job doesn't finish repairing individual segments and instead
waits for all segments to get repaired, so the job keeps taking more
and more resources until either the node becomes unresponsive or all
segments are finally repaired.
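
When it gets into that state, this is roughly what I watch (just the
stock tools, nothing repair-specific; pgrep -f CassandraDaemon assumes
the default process name, so adjust to your setup):

nodetool compactionstats    # validation compactions triggered by repair
nodetool netstats           # active streaming sessions
ls /proc/$(pgrep -f CassandraDaemon)/fd | wc -l    # open fd count

The fd count keeps climbing while the repair sessions just sit there.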

Here are my GC settings (Cassandra 4.0.2):

G1GC:

-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=16
-XX:ConcGCThreads=16

ZGC:

-XX:+UnlockExperimentalVMOptions
-XX:+UseZGC
-XX:ConcGCThreads=8
-XX:ParallelGCThreads=8
-XX:+UseTransparentHugePages


ZGC handles repair tasks more effectively (repair runs much faster and
doesn't hog system resources), but I get random crashes on various
nodes, so I can't consider it production ready.
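
If anyone wants to compare the two collectors on their own workload:
on 4.0 I put the flags above into conf/jvm11-server.options (file name
may differ per install; make sure any conflicting G1 lines there are
commented out). After a restart you can confirm which collector is
active and watch it during a repair with the standard JDK tools, e.g.:

jcmd $(pgrep -f CassandraDaemon) VM.flags | grep -E 'UseG1GC|UseZGC'
jstat -gcutil $(pgrep -f CassandraDaemon) 5000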


Mon, 13 Jun 2022 at 15:26, onmstester onmstester <onmstes...@zoho.com>:

>
> Hi
>
> I've been testing Cassandra 4.0.3, and when I run a full repair (on a single 
> table), all the bandwidth of my 1G link gets saturated (CPU also goes above 
> 80% and disk util is 100%). stream_throughput is set to 200 Mb but doesn't 
> affect repair; all other configs are default, and I could not find any other 
> configuration related to limiting the throughput of repair.
>
> IMHO, having a node with saturated resources slows down the whole cluster's 
> response time.
> Any workaround for this? Is this some sort of bug?
>
> Best Regards
>
>


--
From Siberia with Love!
