In fact, it did eventually finish in ~20 minutes. Is this duration expected/normal?
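For scale, here is the back-of-the-envelope arithmetic that would make ~20 minutes plausible. It assumes repair runs one AntiEntropySession per vnode token range, which matches the per-range sessions in the quoted log below but is not something I have verified against the 2.0.6 source:

    6 nodes x 256 vnodes                  = 1536 token ranges in the ring
    ranges this node replicates (RF 2+2)  = roughly 1000 of those
    quoted log: a few sessions complete per second, counters already in the 800s
    ~1000 sessions at that rate           = on the order of 10-25 minutes, even with tiny data

If that reading is right, a full repair is simply slow with vnodes rather than looping, and repairing each node with "nodetool repair -pr" (primary ranges only) should cut the per-node session count to ~256.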
--Kevin

On Wed, Apr 9, 2014 at 9:32 AM, Kevin McLaughlin <kmcla...@gmail.com> wrote:
> Have a test cluster with three nodes each in two datacenters. The
> following causes nodetool repair to go into an (apparent) infinite
> loop. This is with 2.0.6.
>
> On node 10.140.140.101:
>
> cqlsh> CREATE KEYSPACE looptest WITH replication = {
>   ...     'class': 'NetworkTopologyStrategy',
>   ...     '140': '2',
>   ...     '141': '2'
>   ... };
> cqlsh> use looptest;
> cqlsh:looptest> CREATE TABLE a_table (
>   ...     id uuid,
>   ...     description text,
>   ...     PRIMARY KEY (id)
>   ... );
> cqlsh:looptest>
>
> On node 10.140.140.102:
>
> [default@unknown] describe cluster;
> Cluster Information:
>    Name: Dev Cluster
>    Snitch: org.apache.cassandra.locator.RackInferringSnitch
>    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>    Schema versions:
>       e7c46d59-fceb-38b5-947c-dcbd14950a4c: [10.141.140.101, 10.140.140.101, 10.140.140.102, 10.141.140.103, 10.141.140.102, 10.140.140.103]
>
> nodetool status:
>
> Datacenter: 141
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load      Tokens  Owns   Host ID                               Rack
> UN  10.141.140.101  25.09 MB  256     15.6%  3f0d60bf-dfcd-42a9-9cff-8b76146359e3  140
> UN  10.141.140.102  27.83 MB  256     16.7%  bbdcc640-278e-4d3d-ac12-fcb4d837d0e1  140
> UN  10.141.140.103  23.78 MB  256     16.5%  b030e290-b8da-4883-a13d-b2529fab37fe  140
>
> Datacenter: 140
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load      Tokens  Owns   Host ID                               Rack
> UN  10.140.140.103  65.26 MB  256     18.1%  52a9a718-2bed-4972-ab11-bd97a8d8539c  140
> UN  10.140.140.101  69.46 MB  256     17.6%  d59300db-6179-484e-9ca1-8d1eada0701a  140
> UN  10.140.140.102  68.08 MB  256     15.4%  22e504c9-1cc6-4744-b302-32bb5116d409  140
>
> Back on 10.140.140.101:
>
> "nodetool repair looptest" never returns.
> Looking in the system.log, it is continuously looping with:
>
> INFO [AntiEntropySessions:818] 2014-04-09 13:23:31,889 RepairSession.java (line 282) [repair #24b2b1b0-bfea-11e3-85a3-911072ba5322] session completed successfully
> INFO [AntiEntropySessions:816] 2014-04-09 13:23:31,916 RepairSession.java (line 244) [repair #253687b0-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.103, /10.141.140.102 on range (-4377479664111251829,-4360027703686042340] for looptest.[a_table]
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:31,949 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,002 RepairJob.java (line 134) [repair #253687b0-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.103, /10.141.140.102, /10.140.140.101])
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,007 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,012 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.103 are consistent for a_table
> INFO [RepairJobTask:2] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:1] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:4] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:5] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:6] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,018 RepairSession.java (line 221) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
> INFO [AntiEntropySessions:817] 2014-04-09 13:23:32,019 RepairSession.java (line 282) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] session completed successfully
> INFO [AntiEntropySessions:818] 2014-04-09 13:23:32,043 RepairSession.java (line 244) [repair #2549c190-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.102, /10.141.140.102 on range (-3457228189350977014,-3443426249422196914] for looptest.[a_table]
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,169 RepairJob.java (line 134) [repair #2549c190-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.102, /10.141.140.102, /10.140.140.101])
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,197 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,247 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.103
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,454 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,516 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.102
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,522 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,581 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,586 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.103 are consistent for a_table
> INFO [RepairJobTask:2] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:1] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:5] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:4] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:6] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,592 RepairSession.java (line 221) [repair #253687b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
> INFO [AntiEntropySessions:816] 2014-04-09 13:23:32,592 RepairSession.java (line 282) [repair #253687b0-bfea-11e3-85a3-911072ba5322] session completed successfully
>
> Any ideas? Could the fact that the rack name is the same in both
> datacenters have something to do with it?
>
> Thanks,
>
> --Kevin
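Regarding the rack question at the end: if I understand RackInferringSnitch correctly, it derives the datacenter name from the second octet of each node's IP and the rack name from the third octet, so the topology here works out to:

    10.140.140.x  ->  datacenter "140", rack "140"
    10.141.140.x  ->  datacenter "141", rack "140"

That matches both the '140'/'141' datacenter names in the CREATE KEYSPACE statement and the Rack column in nodetool status. As far as I know racks are only compared within a datacenter, so reusing rack "140" in both DCs should not by itself confuse repair, but that is my reading of the snitch rather than something I have confirmed in the code.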