The default repair process doesn't usually work at scale, unfortunately. Depending on your data size, you have the following options.
Netflix Tickler: https://github.com/ckalantzis/cassTickler (reads continuously at CL.ALL via CQL :: Python)
Spotify Reaper: https://github.com/spotify/cassandra-reaper (subrange repair; provides a REST endpoint and calls APIs through JMX :: Java)
List subranges: https://github.com/pauloricardomg/cassandra-list-subranges (tool to get the subranges for a given node :: Java)
Subrange Repair: https://github.com/BrianGallew/cassandra_range_repair (tool for subrange repair :: Python)
Mutation Based Repair (not ready yet): https://issues.apache.org/jira/browse/CASSANDRA-8911 (C* is thinking of doing this - hot off the press)

If you have Spark in your system, you could use that to do what Netflix Tickler does. We're experimenting with it, and it seems to be a better fit for our datasets than any of the other options.

From: Leena Ghatpande [mailto:lghatpa...@hotmail.com]
Sent: Wednesday, October 12, 2016 7:16 AM
To: email@example.com
Subject: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

Please advise. I cannot find any clear documentation on the best strategy for repairing nodes on a regular basis when multiple datacenters are involved.

We are running Cassandra 3.7 in multiple datacenters, with 4 nodes in each datacenter. We are trying to run repairs every other night to keep the nodes in a good state. We currently run repair with the -pr option, but the repair process hangs and does not complete gracefully. We don't see any errors in the logs either.

What is the best way to perform repairs across multiple datacenters on large tables?

1. Can we run a datacenter repair using the -dc option for each datacenter? Do we need to run repair on each node in that case, or will it repair all nodes within the datacenter?
2. Is running repair with -pr across all nodes required if we perform step 1 every night?
3. Is cross-datacenter repair required, and if so, what's the best option?

Thanks
Leena
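
For readers unfamiliar with the subrange repair approach used by the tools listed in the reply, here is a minimal sketch of the idea: split a token range into smaller pieces and repair each piece separately with nodetool's -st/-et options. It is not taken from any of the linked tools. The helper names, the keyspace name "my_keyspace", the choice of 4 pieces, and the Murmur3Partitioner token bounds are all assumptions made for illustration. In practice you would pass the ranges owned by each node rather than the whole ring.

```python
# Token bounds for Murmur3Partitioner (assumption: the cluster uses it).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split the token range (start, end] into up to `parts` contiguous subranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if end <= start:
        raise ValueError("wrapping ranges are not handled in this sketch")
    width = end - start
    # Integer arithmetic keeps the bounds exact even for 64-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    # Skip empty pieces, which appear when the range is narrower than `parts`.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if hi > lo]


def repair_commands(keyspace, start, end, parts):
    """Build one 'nodetool repair -st ... -et ...' command per subrange."""
    return [
        ["nodetool", "repair", "-st", str(lo), "-et", str(hi), keyspace]
        for lo, hi in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    # "my_keyspace" is a hypothetical keyspace name.
    for cmd in repair_commands("my_keyspace", MIN_TOKEN, MAX_TOKEN, 4):
        print(" ".join(cmd))
```

The script only prints the commands. Running them one at a time (for example with subprocess.run) and retrying a failed piece keeps each repair session small, which is the main reason subrange repair tends to cope better with large tables than a single long-running repair.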