Hi Anuj,

Thanks for the reply.

1) We are using Cassandra 2.2.8, and the two repair commands we are comparing are "nodetool repair --in-local-dc --partitioner-range" and "nodetool repair --in-local-dc". Since 2.2 I believe incremental repairs are the default; that seems to be confirmed by the logs, which list the repair options when a repair starts.

2) From looking at a few runs, on average:
   With -pr repairs, each node takes approx 6.5 - 8 hours, so a total over the 7 nodes of about 53 hours.
   With just inc repairs, each node takes ~26 - 29 hours, so a total of about 193 hours.

3) We currently have two DCs in total: the 'production' ring with 7 nodes and RF=3, and a testing ring with a single node and RF=1, for the single keyspace we currently use.

4) Yes, that number came from the Cassandra repair logs from an inc repair. I can share the number reported when using a pr repair later this evening, once the currently running repair has completed.

Many thanks for the reply again,

Chris

> On 6 Jun 2017, at 17:50, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
> Hi Chris,
>
> Can you share the following info:
>
> 1. Exact repair commands you use for inc repair and pr repair
>
> 2. Repair time should be measured at cluster level for inc repair. So, what's
> the total time it takes to run repair on all nodes for incremental vs pr
> repairs?
>
> 3. You are repairing one DC, DC3. How many DCs are there in total and what's
> the RF for keyspaces? Running pr on a specific DC would not repair the entire
> data.
>
> 4. 885 ranges? Where did you get this number from? Logs? Can you share the
> number of ranges printed in the logs for both the inc and pr cases?
>
> Thanks
> Anuj
>
> On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore
> <chris.elsm...@demandlogic.co> wrote:
> Thank you for the excellent and clear description of the different versions
> of repair Anuj, that has cleared up what I expect to be happening.
>
> The problem now is in our cluster, we are running repairs with options
> (parallelism: parallel, primary range: false, incremental: true, job threads:
> 1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 885) and
> when we do, our repairs are taking over a day to complete, whereas previously,
> running with the partitioner range option, they were taking more like 8-9 hours.
>
> As I understand it, using incremental should have sped this process up, as all
> three sets of data on each repair job should be marked as repaired; however,
> this does not seem to be the case. Any ideas?
>
> Chris
>
>> On 6 Jun 2017, at 16:08, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID> wrote:
>>
>> Hi Chris,
>>
>> Using pr with incremental repairs does not make sense. Primary range repair
>> is an optimization over full repair. If you run full repair on an n-node
>> cluster with RF=3, you would be repairing each piece of data three times.
>> E.g. in a 5 node cluster with RF=3, a range may exist on nodes A, B and C.
>> When full repair is run on node A, the entire data in that range gets synced
>> with replicas on nodes B and C. Now, when you run full repair on nodes B and
>> C, you are wasting resources on repairing data which is already repaired.
>>
>> Primary range repair ensures that when you run repair on a node, it ONLY
>> repairs the data which is owned by the node. Thus, no node repairs data
>> which is not owned by it and must be repaired by another node. Redundant work
>> is eliminated.
>>
>> Even in pr, each time you run pr on all nodes, you repair 100% of data. Why
>> repair the complete data in each cycle, even data which has not
>> changed since the last repair cycle?
>>
>> This is where incremental repair comes in as an improvement. Once repaired,
>> data would be marked repaired so that the next repair cycle could just focus
>> on repairing the delta.
>> Now, let's go back to the example of a 5 node cluster
>> with RF=3. This time we run incremental repair on all nodes. When you repair
>> the entire data on node A, all 3 replicas are marked as repaired. Even if you
>> run inc repair on all ranges on the second node, you would not re-repair the
>> already repaired data. Thus, there is no advantage in repairing only the
>> data owned by the node (primary range of the node). You can run inc repair
>> on all the data present on a node and Cassandra would make sure that when
>> you repair data on other nodes, you only repair unrepaired data.
>>
>> Thanks
>> Anuj
>>
>> On Tue, Jun 6, 2017 at 4:27 PM, Chris Stokesmore
>> <chris.elsm...@demandlogic.co> wrote:
>> Hi all,
>>
>> Wondering if anyone had any thoughts on this? At the moment the long running
>> repairs cause us to be running them on two nodes at once for a bit of time,
>> which obviously increases the cluster load.
>>
>> On 2017-05-25 16:18 (+0100), Chris Stokesmore <c...@demandlogic.co> wrote:
>> > Hi,
>> >
>> > We are running a 7 node Cassandra 2.2.8 cluster, RF=3, and had been
>> > running repairs with the -pr option, via a cron job that runs on each node
>> > once per week.
>> >
>> > We changed that as some advice on the Cassandra IRC channel said it would
>> > cause more anticompaction and
>> > http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
>> > says 'Performing partitioner range repairs by using the -pr option is
>> > generally considered a good choice for doing manual repairs.
>> > However, this
>> > option cannot be used with incremental repairs (default for Cassandra 2.2
>> > and later)'
>> >
>> > The only problem is our -pr repairs were taking about 8 hours, and now the
>> > non-pr repairs are taking 24+ hours. I guess this makes sense, repairing 1/7 of
>> > the data increased to 3/7, except I was hoping to see a speed-up after the
>> > first loop through the cluster as each repair will be marking much more
>> > data as repaired, right?
>> >
>> > Is running -pr with incremental repairs really that bad?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
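
P.S. As a rough sanity check on the cluster-level totals in point 2 at the top of this mail, here is a minimal sketch of the arithmetic. It assumes the repairs run one node at a time, so the cluster total is simply the per-node time multiplied by the 7 nodes; the names (NODES, RF, cluster_total) are just illustrative and not anything from Cassandra itself. It also works out the 1/7 vs 3/7 data fractions mentioned in the original question.

```python
from fractions import Fraction

NODES = 7  # nodes in the production ring (from this thread)
RF = 3     # replication factor of the keyspace (from this thread)


def cluster_total(per_node_hours, nodes=NODES):
    """Scale a (low, high) per-node repair time to a whole-cluster range,
    assuming repairs run sequentially, one node at a time."""
    low, high = per_node_hours
    return (low * nodes, high * nodes)


# Per-node ranges reported in point 2.
pr_total = cluster_total((6.5, 8))    # -pr repairs
inc_total = cluster_total((26, 29))   # inc repairs without -pr

# Share of the cluster's data touched by one node's repair run:
primary_fraction = Fraction(1, NODES)    # -pr: only the node's primary range
replica_fraction = Fraction(RF, NODES)   # no -pr: every range the node holds

print("pr total hours:", pr_total)
print("inc total hours:", inc_total)
print("data per run, non-pr vs pr:", replica_fraction / primary_fraction)
```

The reported averages of about 53 and 193 hours both fall inside the ranges this produces (45.5 to 56 hours and 182 to 203 hours). The per-node times grew by more than the factor of 3 that the 1/7 to 3/7 change alone would suggest.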