> Running nodetool repair causes Cassandra to execute a major compaction

This is not what I would call factually accurate. Repair does not run a major compaction. Major compaction is when all SSTables for a CF are compacted down to one SSTable.
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12 Jul 2011, at 10:09, cbert...@libero.it wrote:

>> The book is wrong, at least by current versions of Cassandra (I'm
>> basing that on the quote you pasted, I don't know the context).
>
> To be sure that I didn't misunderstand (English is not my mother tongue) here
> is what the entire "repair paragraph" says ...
>
> Basic Maintenance
> There are a few tasks that you’ll need to perform before or after more
> impactful tasks. For example, it makes sense to take a snapshot only after
> you’ve performed a flush. So in this section we look at some of these basic
> maintenance tasks: repair, snapshot, and cleanup.
>
> Repair
> Running nodetool repair causes Cassandra to execute a major compaction. A
> Merkle tree of the data on the target node is computed, and the Merkle tree
> is compared with those of other replicas. This step makes sure that any data
> that might be out of sync with other nodes isn’t forgotten.
> During a major compaction (see “Compaction” in the Glossary), the server
> initiates a TreeRequest/TreeResponse conversation to exchange Merkle trees
> with neighboring nodes. The Merkle tree is a hash representing the data in
> that column family. If the trees from the different nodes don’t match, they
> have to be reconciled (or “repaired”) in order to determine the latest data
> values they should all be set to. This tree comparison validation is the
> responsibility of the org.apache.cassandra.service.AntiEntropyService class.
> AntiEntropyService implements the Singleton pattern and defines the static
> Differencer class as well, which is used to compare two trees. If it finds
> any differences, it launches a repair for the ranges that don’t agree.
> So although Cassandra takes care of such matters automatically on occasion,
> you can run it yourself as well.
>
>> nodetool repair must be scheduled by the operator to run regularly.
>> The name "repair" is a bit unfortunate; it is not meant to imply that
>> it only needs to run when something is "wrong".
>>
>> --
>> / Peter Schuller
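For anyone curious how the Merkle tree comparison in the quoted excerpt works, here is a minimal Python sketch of the idea. It is a simplified illustration, not Cassandra's AntiEntropyService or Differencer code: the function names build_tree and differing_ranges, the integer token space, the fixed tree depth and the use of SHA-256 are all assumptions made for the example. Each replica hashes its rows into equal token ranges (the leaves), parents hash their two children, and the comparison walks down from the root only where hashes disagree.

```python
import hashlib
from typing import Dict, List

# Hypothetical sketch: NOT Cassandra's AntiEntropyService/Differencer code.
# A replica's data is modelled as {token: serialized_row}; tokens are ints
# in [0, token_space).


def build_tree(rows: Dict[int, bytes], depth: int, token_space: int) -> List[List[bytes]]:
    """Build a Merkle tree over 2**depth equal token ranges.

    Returns the levels bottom-up: levels[0] holds one hash per leaf range,
    levels[-1] holds the single root hash.
    """
    n_leaves = 2 ** depth
    width = token_space // n_leaves
    leaf_hashers = [hashlib.sha256() for _ in range(n_leaves)]
    # Hash rows in token order so identical data always hashes identically.
    for token in sorted(rows):
        leaf = min(token // width, n_leaves - 1)
        value = rows[token]
        leaf_hashers[leaf].update(token.to_bytes(8, "big"))
        leaf_hashers[leaf].update(len(value).to_bytes(4, "big"))
        leaf_hashers[leaf].update(value)
    levels = [[h.digest() for h in leaf_hashers]]
    # Each parent is the hash of its two children concatenated.
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([hashlib.sha256(below[i] + below[i + 1]).digest()
                       for i in range(0, len(below), 2)])
    return levels


def differing_ranges(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Return indices of leaf ranges whose hashes disagree.

    Only subtrees whose hashes differ are descended into, so matching
    data is skipped without comparing every leaf.
    """
    out: List[int] = []

    def walk(level: int, idx: int) -> None:
        if a[level][idx] == b[level][idx]:
            return  # identical subtree: nothing to reconcile here
        if level == 0:
            out.append(idx)  # this token range is out of sync
            return
        walk(level - 1, 2 * idx)
        walk(level - 1, 2 * idx + 1)

    walk(len(a) - 1, 0)
    return out


if __name__ == "__main__":
    replica_a = {5: b"x", 70: b"y", 200: b"z"}
    replica_b = {5: b"x", 70: b"y-stale", 200: b"z"}
    tree_a = build_tree(replica_a, depth=2, token_space=256)
    tree_b = build_tree(replica_b, depth=2, token_space=256)
    print(differing_ranges(tree_a, tree_b))  # [1]: tokens 64..127 disagree
```

Note that building the trees only requires reading the data; nothing in the comparison merges SSTables, which is the distinction being drawn above between repair and a major compaction.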