Hi all,

We are seeing read repair failures with a "Digest Mismatch" error, at a rate of 300+ per day per node. At the same time, nodes become overloaded for a few seconds at a time because of long GC pauses (around 7-8 seconds).

We do not run repair regularly as a maintenance activity, because a node goes down whenever we run repair on the tables: during repair, the node goes down due to long GC pauses again. For all tables except one, we are able to run repair with the --in-local-dc option.

Below is the cluster configuration:
1. 15-node cluster
2. RF=3
3. Xmx and Xms set to 31GB
4. G1GC in use
5. Version 3.11.2
6. Load on each node roughly 500GB
7. One table carries much more load than the other tables

Please suggest any configuration-level changes we could make to avoid the above problems.

Also, is a high number of digest mismatch messages a sign that the node is doing more read and write operations than it would without them, and could that be the cause of the node being overloaded at those moments?

--
Thanks,
S.R.
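A rough back-of-the-envelope sketch of what the reported mismatch count means as an average rate. It assumes the 300 figure applies to each of the 15 nodes and that mismatches are spread evenly over 24 hours; real traffic may be bursty, so an average can understate peak load. The function name `mismatch_rates` is hypothetical, used only for illustration.

```python
# Back-of-the-envelope rates for the digest mismatch counts reported above.
# Assumption: ~300 mismatches per node per day, spread evenly over 24 hours.
MISMATCHES_PER_NODE_PER_DAY = 300
NODES = 15
SECONDS_PER_DAY = 24 * 60 * 60


def mismatch_rates(per_node_per_day: int, nodes: int) -> dict:
    """Return average mismatch rates per node and for the whole cluster."""
    return {
        # Total mismatches per day across all nodes.
        "cluster_per_day": per_node_per_day * nodes,
        # Average mismatches per second on a single node.
        "per_node_per_sec": per_node_per_day / SECONDS_PER_DAY,
        # Average gap between two mismatches on a single node, in seconds.
        "seconds_between_mismatches_per_node": SECONDS_PER_DAY / per_node_per_day,
    }


if __name__ == "__main__":
    rates = mismatch_rates(MISMATCHES_PER_NODE_PER_DAY, NODES)
    for name, value in rates.items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the average works out to about one mismatch every 288 seconds per node, or about 4,500 per day cluster-wide. Whether the mismatches cluster around the GC pauses, rather than being evenly spread, would need to be checked separately.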