In fact, it did eventually finish in ~20 minutes.  Is this duration
expected/normal?
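
For what it's worth, a rough back-of-the-envelope count (assuming each
vnode range this node replicates gets its own repair session, which I'm
not certain is exactly how it works):

    6 nodes x 256 vnodes            = 1536 ranges in the ring
    4 replicas per range / 6 nodes  = ~2/3 of ranges involve this node
    1536 x 2/3                      = ~1024 sessions
    ~20 minutes / ~1024 sessions    = roughly a second per session

which seems roughly in line with the AntiEntropySessions:816-818 numbers
in the log below, so maybe it is less a loop than a very large number of
small sessions.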

--Kevin


On Wed, Apr 9, 2014 at 9:32 AM, Kevin McLaughlin <kmcla...@gmail.com> wrote:
> I have a test cluster with three nodes in each of two datacenters.  The
> following causes nodetool repair to go into an (apparent) infinite
> loop.  This is with Cassandra 2.0.6.
>
> On node 10.140.140.101:
>
> cqlsh> CREATE KEYSPACE looptest WITH replication = {
>    ...     'class': 'NetworkTopologyStrategy',
>    ...     '140': '2',
>    ...     '141': '2'
>    ... };
>
> cqlsh> use looptest;
> cqlsh:looptest> CREATE TABLE a_table (
>             ...     id uuid,
>             ...     description text,
>             ...     PRIMARY KEY (id)
>             ... );
> cqlsh:looptest>
>
> On node 10.140.140.102:
>
> [default@unknown] describe cluster;
> Cluster Information:
>    Name: Dev Cluster
>    Snitch: org.apache.cassandra.locator.RackInferringSnitch
>    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>    Schema versions:
>       e7c46d59-fceb-38b5-947c-dcbd14950a4c: [10.141.140.101, 10.140.140.101, 10.140.140.102, 10.141.140.103, 10.141.140.102, 10.140.140.103]
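>
> (The '140' and '141' datacenter names line up with the node IPs: as far
> as I understand RackInferringSnitch, it takes the second octet as the
> datacenter and the third octet as the rack, e.g.
>
>     10.140.140.101  ->  DC "140", rack "140"
>     10.141.140.101  ->  DC "141", rack "140"
>
> which would be why every node ends up in rack "140".)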
>
> nodetool status:
>
> Datacenter: 141
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens  Owns   Host ID                               Rack
> UN  10.141.140.101  25.09 MB   256     15.6%  3f0d60bf-dfcd-42a9-9cff-8b76146359e3  140
> UN  10.141.140.102  27.83 MB   256     16.7%  bbdcc640-278e-4d3d-ac12-fcb4d837d0e1  140
> UN  10.141.140.103  23.78 MB   256     16.5%  b030e290-b8da-4883-a13d-b2529fab37fe  140
>
> Datacenter: 140
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens  Owns   Host ID                               Rack
> UN  10.140.140.103  65.26 MB   256     18.1%  52a9a718-2bed-4972-ab11-bd97a8d8539c  140
> UN  10.140.140.101  69.46 MB   256     17.6%  d59300db-6179-484e-9ca1-8d1eada0701a  140
> UN  10.140.140.102  68.08 MB   256     15.4%  22e504c9-1cc6-4744-b302-32bb5116d409  140
>
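> (Side note: if I'm remembering nodetool right, passing the keyspace name,
> e.g.
>
>     nodetool status looptest
>
> reports ownership effective for that keyspace's replication settings; the
> percentages above are just raw token ownership.)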
>
> Back on 10.140.140.101:
>
> "nodetool repair looptest" never returns.  Looking in the system.log,
> it is continuously looping with:
>
> INFO [AntiEntropySessions:818] 2014-04-09 13:23:31,889 RepairSession.java (line 282) [repair #24b2b1b0-bfea-11e3-85a3-911072ba5322] session completed successfully
> INFO [AntiEntropySessions:816] 2014-04-09 13:23:31,916 RepairSession.java (line 244) [repair #253687b0-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.103, /10.141.140.102 on range (-4377479664111251829,-4360027703686042340] for looptest.[a_table]
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:31,949 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,002 RepairJob.java (line 134) [repair #253687b0-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.103, /10.141.140.102, /10.140.140.101])
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,007 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,012 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.103 are consistent for a_table
> INFO [RepairJobTask:2] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:1] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:4] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:5] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:6] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,018 RepairSession.java (line 221) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
> INFO [AntiEntropySessions:817] 2014-04-09 13:23:32,019 RepairSession.java (line 282) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] session completed successfully
> INFO [AntiEntropySessions:818] 2014-04-09 13:23:32,043 RepairSession.java (line 244) [repair #2549c190-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.102, /10.141.140.102 on range (-3457228189350977014,-3443426249422196914] for looptest.[a_table]
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,169 RepairJob.java (line 134) [repair #2549c190-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.102, /10.141.140.102, /10.140.140.101])
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,197 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,247 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.103
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,454 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,516 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.102
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,522 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,581 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
> INFO [RepairJobTask:3] 2014-04-09 13:23:32,586 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.103 are consistent for a_table
> INFO [RepairJobTask:2] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:1] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:5] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
> INFO [RepairJobTask:4] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
> INFO [RepairJobTask:6] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
> INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,592 RepairSession.java (line 221) [repair #253687b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
> INFO [AntiEntropySessions:816] 2014-04-09 13:23:32,592 RepairSession.java (line 282) [repair #253687b0-bfea-11e3-85a3-911072ba5322] session completed successfully
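>
> (One way to sanity-check whether this is really a loop or just a lot of
> sessions -- the log path may differ depending on the install:
>
>     grep -c 'new session' /var/log/cassandra/system.log
>
> If that count keeps climbing well past the number of vnode ranges the
> node replicates, it is genuinely looping; if it levels off, it was just
> one session per range.)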
>
> Any ideas?  Could the fact that the rack name is the same in both
> datacenters have something to do with it?
>
> Thanks,
>
> --Kevin
