Re: easy repair questions on -pr

Sylvain Lebresne Tue, 02 Oct 2012 08:35:07 -0700

The short version is: there are 2 use cases for nodetool repair:
  1) For periodic repair of the whole cluster
(http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair).
In that case, you should run repair *with* -pr and you should run it
on *every* node.
  2) When a node has been down for a long time (for instance long
enough that hints may have been dropped), and you want to repair that
node specifically. In that case, you should run repair on that node
only and you should use it *without* -pr.
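
The two cases above can be sketched as command lines. This is only an
illustration: it assumes nodetool is on the PATH and that each node is
reachable with -h. The host names and the repair_command helper are
hypothetical and not part of Cassandra.

```python
# Hypothetical helper: builds (but does not run) the nodetool command
# line for each of the two use cases described above.

def repair_command(host, primary_range_only):
    """Return the argv list for a nodetool repair against `host`."""
    cmd = ["nodetool", "-h", host, "repair"]
    if primary_range_only:
        cmd.append("-pr")  # repair only this node's primary range
    return cmd

# Case 1: periodic repair of the whole cluster -> -pr on *every* node.
cluster = ["node1", "node2", "node3"]  # placeholder host names
periodic = [repair_command(h, primary_range_only=True) for h in cluster]

# Case 2: one node was down for a long time -> that node only, no -pr.
recovery = repair_command("node2", primary_range_only=False)

for cmd in periodic + [recovery]:
    print(" ".join(cmd))
```

The commands are only printed here; passing each list to
subprocess.run would be one way to actually execute them.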


As for the gory details, nodetool repair without -pr will repair all
the ranges held by the node on which the repair is run. But when a
range is repaired, it is repaired on *all* replicas. In other words, a
repair on node A will also repair parts of other nodes that share a
range with A. That's why, in case 1) above, where you want to repair
the whole cluster, a repair without -pr is inefficient: if you repair
A and B and both are replicas for the same range, you will duplicate
the work. Hence repair -pr: on a given node it repairs only that node's
primary range (but on all replicas for said range), and so if you do
that on every node, you will have effectively repaired the whole
cluster without having repaired the same range twice.
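
To make the duplication concrete, here is a small toy model, not
Cassandra code. It assumes a simple ring where range i has node i as
its primary and is replicated on the next RF-1 nodes; the function
names and the cluster size are made up for the illustration.

```python
# Toy model of a ring. Assumption: range i is owned (primary) by node i
# and replicated on the following RF-1 nodes, wrapping around the ring.
from collections import Counter

def replicas(range_id, num_nodes, rf):
    """Nodes holding a copy of the given range."""
    return [(range_id + k) % num_nodes for k in range(rf)]

def ranges_repaired(node, num_nodes, rf, primary_only):
    """Ranges touched when repair is run on `node`."""
    if primary_only:  # models repair -pr
        return [node]
    # Without -pr: every range for which this node is a replica.
    return [r for r in range(num_nodes)
            if node in replicas(r, num_nodes, rf)]

def repair_counts(nodes_run, num_nodes, rf, primary_only):
    """How many times each range is repaired across all runs."""
    counts = Counter()
    for node in nodes_run:
        counts.update(ranges_repaired(node, num_nodes, rf, primary_only))
    return counts

NUM_NODES, RF = 6, 3
full = repair_counts(range(NUM_NODES), NUM_NODES, RF, primary_only=False)
pr = repair_counts(range(NUM_NODES), NUM_NODES, RF, primary_only=True)

print("without -pr:", sorted(full.items()))  # each range repaired RF times
print("with -pr:   ", sorted(pr.items()))    # each range repaired once
```

In this model, running repair on every node without -pr repairs each
range RF times, while -pr on every node repairs each range exactly once.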

> Can I run node tool –pr repair on just 1/RF of my nodes if I do the correct 
> nodes?

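As is hopefully clear from the description above, no: with -pr each
node repairs only its own primary range, so the primary ranges of the
nodes you skip would never be repaired.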
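
A quick way to see this, again as a toy sketch rather than anything
Cassandra does internally: assume a 6-node ring with RF=3 where range i
has node i as its primary (both numbers and the function name are
invented for the example), and run repair -pr on every RF-th node only.

```python
# Toy ring. Assumption: range i has node i as its primary, so
# repair -pr on node i repairs range i and nothing else.

def ranges_covered_with_pr(nodes_run):
    """Ranges repaired when repair -pr is run on the given nodes."""
    return set(nodes_run)

NUM_NODES, RF = 6, 3
every_rf_th = list(range(0, NUM_NODES, RF))  # nodes 0 and 3, i.e. 1/RF
covered = ranges_covered_with_pr(every_rf_th)
missed = sorted(set(range(NUM_NODES)) - covered)

print("covered:", sorted(covered))   # -> covered: [0, 3]
print("never repaired:", missed)     # -> never repaired: [1, 2, 4, 5]
```

Only the primary ranges of the chosen nodes get repaired; the other
ranges are left untouched.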

>  Why are the row keys still there though?

http://wiki.apache.org/cassandra/FAQ#range_ghosts


--
Sylvain
