> Full repair running for an entire week sounds excessively long. Even if 
> you've got 1 TB of data per node, 1 week means the repair speed is less than 
> 2 MB/s; that's very slow. Perhaps you should focus on finding the bottleneck 
> of the full repair speed and work on that instead.

We store about 3–3.5 TB per node on spinning disks (time-series data), so I 
don’t think the duration is too surprising.

> Not disabling auto-compaction may result in repaired SSTables getting 
> compacted together with unrepaired SSTables before the repair state is set on 
> them, which leads to mismatch in the repaired data between nodes, and 
> potentially very expensive over-streaming in a future full repair. You should 
> follow the documented and tested steps and not improvise or get creative 
> if you value your data and time.
> 
We successfully used a different method on three clusters, but I agree that 
anti-entropy repair is a tricky business and one should be cautious about 
trying less tested methods.

Due to the long duration of a full repair (see my earlier explanation), 
disabling auto-compaction while running the full repair wasn’t an option for 
us. It was previously suggested that one could run the repair per node instead 
of for the whole cluster, but I don’t think this will work: marking the 
SSTables as repaired on only a single node would lead to massive over-streaming 
when the full repair runs on the next node that shares data with the first one.

So, I want to describe the method that we used, just in case someone is in the 
same situation:

Going around the ring, we temporarily stopped each node and marked all of its 
SSTables as repaired. Then we immediately ran a full repair, so that any 
inconsistencies in data that was now marked as repaired, but had never actually 
been repaired, were fixed.
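
In case it helps, here is a rough sketch of the per-node step as a small 
Python wrapper around the standard tooling (nodetool drain and the 
sstablerepairedset tool that ships with Cassandra). It is only an 
illustration of the idea, not the exact procedure we ran; the data directory, 
the service command and the glob pattern are assumptions and need to be 
adapted to your environment:

#!/usr/bin/env python3
# Sketch of the per-node step: flush and stop the node, mark every live
# SSTable on it as repaired, then start it again.  DATA_DIR, SERVICE and
# the glob pattern are placeholders and must match your installation.
import glob
import subprocess

DATA_DIR = "/var/lib/cassandra/data"        # data_file_directories
SERVICE = ["sudo", "service", "cassandra"]  # or systemctl, depending on OS

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Flush memtables and shut the node down cleanly.
run(["nodetool", "drain"])
run(SERVICE + ["stop"])

# 2. Mark all live SSTables as repaired while the node is down.
#    (Snapshots/backups are not matched by this depth-limited glob; with a
#    very large number of files, the tool's file-list option may be needed.)
sstables = glob.glob(DATA_DIR + "/*/*/*Data.db")
run(["sstablerepairedset", "--really-set", "--is-repaired"] + sstables)

# 3. Bring the node back before moving on to the next one in the ring.
run(SERVICE + ["start"])

The full repair afterwards is just the normal one (nodetool repair -full).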

Using this approach, the amount of over-streaming is minimal (at least for 
clusters that are not too large, where the rolling restart can be done within 
an hour or so), because the only difference between the “unrepaired” SSTables 
on the different nodes will be the data that was written between stopping the 
first node and stopping the last node.

Any inconsistencies that might exist in the SSTables that were marked as 
repaired should be caught by the full repair, so I do not think it is too 
dangerous either. However, I agree that for clusters where a full repair is 
quick (e.g. finishes within a few hours), sticking to the well-tested and 
documented approach is probably better.
