Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Check the logs for messages about nodes going up and down, and also look at the MessagingService MBean for timeouts. If the node in DC2 times out replying to DC1, the DC1 node will store a hint.

Also, when hints are stored they are TTL'd to the gc_grace_seconds of the CF (IIRC). If that is low, the hints may not have been delivered. I am not aware of any specific tracking for failed hints other than log messages.

A

Aaron Morton
New Zealand
@aaronmorton

Co-Founder, Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 28/09/2013, at 12:01 AM, Oleg Dulin <oleg.du...@gmail.com> wrote:

> Here is some more information. I am running a full repair on one of the nodes and I am observing strange behavior. Both DCs were up during the data load, but repair is reporting a lot of out-of-sync data. Why would that be? Is there a way for me to tell whether the WAN may be dropping hinted handoff traffic?
>
> Regards,
> Oleg
>
> On 2013-09-27 10:35:34, Oleg Dulin said:
>> Wanted to add one more thing: I can also tell that the numbers are not consistent across DCs [...]
>
> On 2013-09-27 10:23:45, Oleg Dulin said:
>> Consider this output from nodetool ring: [...]
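Aaron's point about hint TTLs can be sketched as follows. This is an illustrative simulation only, not Cassandra code; the low `GC_GRACE_SECONDS` value and the helper function are hypothetical, and the TTL-equals-gc_grace_seconds behavior is as Aaron recalls it ("IIRC"):

```python
# Illustrative sketch: a hint stored for an unreachable replica carries a TTL
# tied to the column family's gc_grace_seconds. If the target stays
# unreachable past that TTL, the hint expires silently and the write is never
# replayed -- only a repair or read repair can fix the gap afterwards.

GC_GRACE_SECONDS = 3600  # hypothetical low value for demonstration

def hint_is_replayable(stored_at, target_back_up_at, ttl=GC_GRACE_SECONDS):
    """Return True if the hint still exists when the target comes back."""
    return (target_back_up_at - stored_at) <= ttl

# Target recovers within the TTL: hint is replayed, the write reaches DC2.
print(hint_is_replayable(stored_at=0, target_back_up_at=1800))   # True

# WAN outage outlasts the TTL: hint expired, the write is silently dropped.
print(hint_is_replayable(stored_at=0, target_back_up_at=7200))   # False
```

The practical consequence matches the symptoms in this thread: expired hints leave no error, only log messages at storage time, so the first visible sign is out-of-sync data during repair.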
Unbalanced ring mystery multi-DC issue with 1.1.11
Consider this output from nodetool ring:

Address   DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                    127605887595351923798765477786913079396
dc1.5     DC1  RAC1  Up      Normal  32.07 GB  50.00%               0
dc2.100   DC2  RAC1  Up      Normal  8.21 GB   50.00%               100
dc1.6     DC1  RAC1  Up      Normal  32.82 GB  50.00%               42535295865117307932921825928971026432
dc2.101   DC2  RAC1  Up      Normal  12.41 GB  50.00%               42535295865117307932921825928971026532
dc1.7     DC1  RAC1  Up      Normal  28.37 GB  50.00%               85070591730234615865843651857942052864
dc2.102   DC2  RAC1  Up      Normal  12.27 GB  50.00%               85070591730234615865843651857942052964
dc1.8     DC1  RAC1  Up      Normal  27.34 GB  50.00%               127605887595351923798765477786913079296
dc2.103   DC2  RAC1  Up      Normal  13.46 GB  50.00%               127605887595351923798765477786913079396

I concealed the IPs and DC names for confidentiality.

All of the data loading was happening against DC1, at a pretty brisk rate of, say, 200K writes per minute.

Note how my tokens are offset by 100. Shouldn't that mean the load on each node should be roughly identical? In DC1 it is roughly 30 GB on each node. In DC2 it is almost 1/3rd of the nearest DC1 node by token range.

To verify that the nodes are in sync, I ran "nodetool -h localhost repair MyKeySpace --partitioner-range" on each node in DC2. Watching the logs, I see that the repair went really quickly and that all column families are in sync!

I need help making sense of this. Is it because DC1 is not fully compacted? Is it because DC2 is not fully synced and I am not checking correctly? How can I tell whether replication is still in progress (note: I started my load yesterday at 9:50am)?

--
Regards,
Oleg Dulin
http://www.olegdulin.com
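The "almost 1/3rd" observation can be made precise from the Load column above. A quick sketch, with the load figures copied from the ring output and RandomPartitioner's 2**127 token space assumed for the spacing check:

```python
# Per-node Load figures copied from the nodetool ring output (GB).
dc1_load = [32.07, 32.82, 28.37, 27.34]
dc2_load = [8.21, 12.41, 12.27, 13.46]

print(round(sum(dc1_load), 2))                  # 120.6
print(round(sum(dc2_load), 2))                  # 46.35
print(round(sum(dc2_load) / sum(dc1_load), 2))  # 0.38 -- the "almost 1/3rd"

# The tokens themselves are evenly spaced within each DC (DC2 offset by 100),
# so the imbalance is not a token-assignment problem: each DC's four nodes
# split the ring into equal quarters.
RING = 2 ** 127  # RandomPartitioner token space
dc1_tokens = [i * RING // 4 for i in range(4)]
dc2_tokens = [t + 100 for t in dc1_tokens]
assert all(b - a == RING // 4 for a, b in zip(dc1_tokens, dc1_tokens[1:]))
assert all(b - a == 100 for a, b in zip(dc1_tokens, dc2_tokens))
```

Since token placement is balanced, the per-DC difference has to come from the data itself (compaction state, undelivered replication, or both), which is exactly what the rest of the thread is probing.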
Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Wanted to add one more thing: I can also tell that the numbers are not consistent across DCs this way -- I have a column family with really wide rows (a couple of million columns). DC1 reports higher column counts than DC2. DC2 only becomes consistent after I run the command a couple of times and trigger a read repair. But why would the nodetool repair logs show that everything is in sync?

Regards,
Oleg

On 2013-09-27 10:23:45, Oleg Dulin said:

> Consider this output from nodetool ring: [...]
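The counts converging only after repeated reads is the signature of read repair: when replicas return mismatched digests, the coordinator reconciles column versions by timestamp and writes the winner back to the stale replicas. A toy model of that reconciliation, with plain dicts standing in for replicas (not real Cassandra internals):

```python
def read_with_repair(replicas):
    """Merge column versions by timestamp across replicas and push the
    merged row back to every replica -- a toy model of read repair."""
    merged = {}
    for replica in replicas:
        for name, (value, ts) in replica.items():
            # Keep the version with the highest timestamp.
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    for replica in replicas:  # write the repaired row back
        replica.clear()
        replica.update(merged)
    return merged

# DC2's copy of a wide row is missing columns (e.g. expired hints).
dc1 = {"col%d" % i: ("v", i) for i in range(5)}  # 5 columns
dc2 = {"col%d" % i: ("v", i) for i in range(3)}  # only 3 columns

read_with_repair([dc1, dc2])
print(len(dc2))  # 5 -- a second count against DC2 now matches DC1
```

This also explains why a count has to be run "a couple of times": the first cross-DC read detects the mismatch and repairs it, and only subsequent reads see the consistent counts.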
Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Here is some more information. I am running a full repair on one of the nodes and I am observing strange behavior. Both DCs were up during the data load, but repair is reporting a lot of out-of-sync data. Why would that be? Is there a way for me to tell whether the WAN may be dropping hinted handoff traffic?

Regards,
Oleg

On 2013-09-27 10:35:34, Oleg Dulin said:

> Wanted to add one more thing: I can also tell that the numbers are not consistent across DCs [...]

On 2013-09-27 10:23:45, Oleg Dulin said:

> Consider this output from nodetool ring: [...]

--
Regards,
Oleg Dulin
http://www.olegdulin.com
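One possible reason the earlier `--partitioner-range` repairs on the DC2 nodes finished so quickly, while a full repair now finds lots of out-of-sync data: a node's primary range is the span from its predecessor's token to its own token, regardless of datacenter. With the DC2 tokens offset by only 100, each DC2 node's primary range is just 100 tokens wide, so `-pr` on a DC2 node validates a vanishingly small slice of the ring. A sketch of that arithmetic (token values reconstructed from the ring output above):

```python
RING = 2 ** 127  # RandomPartitioner token space

# Tokens from the ring output, sorted: DC1 nodes at i*RING//4,
# DC2 nodes at the same positions offset by 100.
tokens = sorted([i * RING // 4 for i in range(4)] +
                [i * RING // 4 + 100 for i in range(4)])

def primary_range_size(i):
    """Size of node i's primary range (predecessor's token, own token]."""
    return (tokens[i] - tokens[i - 1]) % RING

sizes = [primary_range_size(i) for i in range(len(tokens))]

# DC2 nodes (odd positions in the sorted ring) own exactly 100 tokens each;
# DC1 nodes own essentially a quarter of the ring each.
print(sizes[1])                      # 100
print(round(sizes[2] / RING, 6))     # 0.25
```

If this reading is right, running `repair -pr` only on the DC2 nodes checked roughly 400 tokens out of 2**127, which would make the "everything is in sync" log output meaningless as a consistency check; a non-`-pr` repair (or `-pr` run on every node in both DCs) covers the whole ring.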