Mutation dropped and Read-Repair performance issue

sunil pawar Sat, 19 Dec 2020 18:59:50 -0800

Hi All,

We are seeing failures in the Read-Repair stage with Digest Mismatch
errors, at a rate of 300+ per day per node.
At the same time, nodes become overloaded for a few seconds at a time due
to long GC pauses (around 7-8 seconds). We do not run repair regularly as a
maintenance activity, because a node goes down whenever we run repair on
the tables: after the repair runs, the node goes down again due to long GC
pauses. For all tables except one, we can run repair with the
--in-local-dc option. Below is the cluster configuration:


   1. 15-node cluster.
   2. RF=3
   3. Xmx and Xms 31GB.
   4. G1GC algorithm is in use.
   5. Version 3.11.2
   6. Load on each node is roughly 500GB.
   7. One table has a much larger load than the other tables.

Please suggest any configuration-level changes we can make to avoid the
problems above. Also, is a high number of digest mismatch messages a sign
that the node is doing more read and write operations than it would
without them, and could that be why the node gets overloaded at those
moments?

-- 
Thanks,
S.R.
