Caution: with the method you described, the amount of data streamed at
the end by the full repair is not simply the amount of data written
between stopping the first node and stopping the last node. It also
depends on the table size, the number of partitions written, their
distribution in the ring and the 'repair_session_space' value, because
repair compares Merkle trees whose total size is capped by
'repair_session_space', so a single mismatching partition causes the
entire token range covered by its tree leaf to be streamed. If the
table is large, the writes touch a large number of partitions scattered
across the token ring, and 'repair_session_space' is small, you may end
up with very expensive over-streaming.
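For illustration, a rough sketch of checking and raising that setting,
assuming a Cassandra 4.1-style cassandra.yaml at the default package
path (in 4.0 the setting is called repair_session_space_in_mb):

    # Path is an assumption; adjust for your install.
    grep repair_session_space /etc/cassandra/cassandra.yaml
    # A larger value (set in cassandra.yaml, takes effect after a node
    # restart) gives finer-grained Merkle trees and therefore less
    # over-streaming, at the cost of more heap per repair session, e.g.:
    #     repair_session_space: 512MiB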
On 07/02/2024 12:33, Sebastian Marsching wrote:
Full repair running for an entire week sounds excessively long. Even
if you've got 1 TB of data per node, 1 week means the repair speed is
less than 2 MB/s (1 TB over 7 days is roughly 1.7 MB/s), which is very
slow. Perhaps you should focus on finding the bottleneck of the full
repair speed and work on that instead.
We store about 3–3.5 TB per node on spinning disks (time-series data),
so I don’t think the slow repair speed is too surprising.
Not disabling auto-compaction may result in repaired SSTables getting
compacted together with unrepaired SSTables before the repaired state
is set on them, which leads to a mismatch in the repaired data between
nodes, and potentially very expensive over-streaming in a future full
repair. You should follow the documented and tested steps and not
improvise or get creative if you value your data and time.
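For reference, the documented per-node steps look roughly like this
sketch (the keyspace name, data paths and service name are
placeholders; check the migration procedure for your Cassandra
version):

    nodetool disableautocompaction my_keyspace   # keep repaired/unrepaired SSTables apart
    nodetool repair --full my_keyspace           # full repair while auto-compaction is off
    nodetool drain && systemctl stop cassandra   # flush memtables and stop the node
    sstablerepairedset --really-set --is-repaired \
        /var/lib/cassandra/data/my_keyspace/*/*-Data.db
    systemctl start cassandra                    # auto-compaction is re-enabled on restart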
There is a different method that we successfully used on three
clusters, but I agree that anti-entropy repair is a tricky business
and one should be cautious about trying less tested methods.
Due to the long time a full repair takes for us (see my earlier
explanation), disabling auto-compaction while running the full repair
wasn’t an option. It was previously suggested that one could run the
repair per node instead of for the full cluster, but I don’t think
this will work, because marking the SSTables as repaired on only a
single node would lead to massive over-streaming when running the full
repair for the next node that shares data with the first one.
So, I want to describe the method that we used, just in case someone
is in the same situation:

Going around the ring, we temporarily stopped each node and marked all
of its SSTables as repaired. Then we immediately ran a full repair, so
that any inconsistencies in data that was now marked as repaired, but
had not actually been repaired, were fixed. (A sketch of these steps
follows below.)
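A minimal sketch of the procedure, assuming a systemd-managed install
and default data paths (both are assumptions; adjust for your setup):

    # On each node in turn, going around the ring:
    nodetool drain && systemctl stop cassandra   # flush memtables and stop the node
    find /var/lib/cassandra/data/my_keyspace -name '*-Data.db' -print0 \
        | xargs -0 sstablerepairedset --really-set --is-repaired
    systemctl start cassandra                    # rejoin the ring
    # After the whole ring has been walked, run one full repair:
    nodetool repair --full my_keyspace

Note that the ordering is reversed compared to the documented
procedure: the SSTables are marked as repaired first, and the
immediate full repair afterwards fixes anything that was wrongly
marked.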
Using this approach, the amount of over-streaming is minimal (at
least for not too large clusters, where the rolling restart can be
done in less than an hour or so), because the only difference between
the “unrepaired” SSTables on the different nodes will be the data that
was written between stopping the first node and stopping the last node.
Any inconsistencies that might exist in the SSTables that were marked
as repaired should be caught in the full repair, so I do not think it
is too dangerous either. However, I agree that for clusters where a
full repair is quick (e.g. finishes in a few hours), using the
well-tested and frequently used approach is probably better.