Re: Unbalanced ring in Cassandra 0.8.4
> Does cleanup only cleanup keys that no longer belong to that node.
Yes.
I guess it could be an artefact of the bulk load. It's not been reported previously though. Try the cleanup and see how it goes.
Cheers
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/06/2012, at 1:34 AM, Raj N wrote:
Nick, thanks for the response. Does cleanup only clean up keys that no longer belong to that node? Just to add more color: when I bulk loaded all my data into these 6 nodes, all of them had the same amount of data. After the first nodetool repair, the first node started having more data than the rest of the cluster, and since then it has never come back down. When I run cfstats on that node, the amount of data for every column family is almost 2 times the amount on the other nodes. This is true for the estimated number of keys as well. For one CF I see more than double the number of keys, and that's the largest CF as well, with 34 GB of data.
Thanks
-Rajesh
Re: Unbalanced ring in Cassandra 0.8.4
But won't that also run a major compaction, which is not recommended anymore?
-Raj

On Sun, Jun 17, 2012 at 11:58 PM, aaron morton aa...@thelastpickle.com wrote:
Assuming you have been running repair, it can't hurt.
Cheers
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
Re: Unbalanced ring in Cassandra 0.8.4
No. Cleanup will scan each sstable to remove data that is no longer owned by that specific node. It won't compact the sstables together, however.

On Tue, Jun 19, 2012 at 11:11 PM, Raj N raj.cassan...@gmail.com wrote:
But won't that also run a major compaction, which is not recommended anymore?
-Raj
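The distinction drawn in this message, between cleanup and a major compaction, can be illustrated with a toy model. This is only a sketch: sstables are modelled as plain dicts mapping a token to a row, and `owns` and `cleanup` are hypothetical helper names, not Cassandra APIs. The point it shows is that each sstable is filtered on its own, so the number of sstables stays the same.

```python
def owns(token, ranges):
    """True if token falls in any (start, end] range; a range may wrap around the ring."""
    for start, end in ranges:
        if start < end:
            if start < token <= end:
                return True
        elif token > start or token <= end:  # range wraps past the end of the ring
            return True
    return False


def cleanup(sstables, owned_ranges):
    """Rewrite each sstable separately, keeping only rows this node still owns."""
    return [
        {token: row for token, row in sstable.items() if owns(token, owned_ranges)}
        for sstable in sstables
    ]


# Two toy sstables; the node is assumed to own only the range (0, 10].
sstables = [{5: "a", 50: "b"}, {7: "c", 90: "d"}]
cleaned = cleanup(sstables, owned_ranges=[(0, 10)])
print(cleaned)
```

A major compaction, by contrast, would merge the inputs into a single sstable; here the two inputs stay separate and only shrink.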
Re: Unbalanced ring in Cassandra 0.8.4
Assuming you have been running repair, it can't hurt.
Cheers
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/06/2012, at 4:06 AM, Raj N wrote:
Nick, do you think I should still run cleanup on the first node?
-Rajesh
Re: Unbalanced ring in Cassandra 0.8.4
Nick, do you think I should still run cleanup on the first node?
-Rajesh

On Fri, Jun 15, 2012 at 3:47 PM, Raj N raj.cassan...@gmail.com wrote:
I did run nodetool move. But that was when I was setting up the cluster, which means I didn't have any data at that time.
-Raj
Re: Unbalanced ring in Cassandra 0.8.4
This is just a known problem with the nodetool output and multiple DCs. Your configuration is correct. The problem with nodetool is fixed in 1.1.1:
https://issues.apache.org/jira/browse/CASSANDRA-3412

On Fri, Jun 15, 2012 at 9:59 AM, Raj N raj.cassan...@gmail.com wrote:
Hi experts,
I have a 6 node cluster across 2 DCs (DC1:3, DC2:3). I have assigned tokens using the first strategy (adding 1) mentioned here -
http://wiki.apache.org/cassandra/Operations?#Token_selection
But when I run nodetool ring on my cluster, this is the result I get -

Address         DC   Rack   Status  State   Load       Owns    Token
                                                               113427455640312814857969558651062452225
172.17.72.91    DC1  RAC13  Up      Normal  102.07 GB  33.33%  0
45.10.80.144    DC2  RAC5   Up      Normal  59.1 GB    0.00%   1
172.17.72.93    DC1  RAC18  Up      Normal  59.57 GB   33.33%  56713727820156407428984779325531226112
45.10.80.146    DC2  RAC7   Up      Normal  59.64 GB   0.00%   56713727820156407428984779325531226113
172.17.72.95    DC1  RAC19  Up      Normal  69.58 GB   33.33%  113427455640312814857969558651062452224
45.10.80.148    DC2  RAC9   Up      Normal  59.31 GB   0.00%   113427455640312814857969558651062452225

As you can see, the first node has considerably more load than the others (almost double), which is surprising since all these are replicas of each other. I am running Cassandra 0.8.4. Is there an explanation for this behaviour? Could https://issues.apache.org/jira/browse/CASSANDRA-2433 be the cause for this?
Thanks
-Raj
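As a rough illustration of why the Owns column alternates between 33.33% and 0.00%, the sketch below rebuilds the token layout from the ring output (DC2 tokens are the DC1 tokens plus 1) and computes ownership purely from the width of each node's token range. It assumes that this is how the affected nodetool versions derived the figure, and that the token space is that of RandomPartitioner (0 to 2**127); `token_range_ownership` is a hypothetical helper name.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed)

# DC1 tokens as shown in the ring output in this thread.
DC1 = {
    "172.17.72.91": 0,
    "172.17.72.93": 56713727820156407428984779325531226112,
    "172.17.72.95": 113427455640312814857969558651062452224,
}
# "Adding 1": each DC2 node takes the matching DC1 token plus 1.
DC2_ADDRESSES = ["45.10.80.144", "45.10.80.146", "45.10.80.148"]
DC2 = {address: token + 1 for address, token in zip(DC2_ADDRESSES, DC1.values())}


def token_range_ownership(tokens_by_node):
    """Percent of the ring between each node's token and the previous token."""
    ordered = sorted(tokens_by_node.items(), key=lambda item: item[1])
    ownership = {}
    for i, (address, token) in enumerate(ordered):
        previous_token = ordered[i - 1][1]  # index -1 wraps to the last node
        width = (token - previous_token) % RING_SIZE
        ownership[address] = 100 * width / RING_SIZE
    return ownership


ownership = token_range_ownership({**DC1, **DC2})
for address, percent in ownership.items():
    print(f"{address:<14} {percent:6.2f}%")
```

Under that assumption each DC2 node sits one token after a DC1 node, so its range has width 1 and rounds to 0.00%. Since the nodes are replicas of each other, as the original post notes, these figures say nothing about how much data each node stores.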
Re: Unbalanced ring in Cassandra 0.8.4
Actually I am not worried about the percentage. It's the data I am concerned about. Look at the first node. It has 102.07 GB of data, and the other nodes have around 60 GB (one has 69, but let's ignore that one). I don't understand why the first node has almost double the data.
Thanks
-Raj

On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey n...@datastax.com wrote:
This is just a known problem with the nodetool output and multiple DCs. Your configuration is correct. The problem with nodetool is fixed in 1.1.1:
https://issues.apache.org/jira/browse/CASSANDRA-3412
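To put a number on "almost double", here is a small sketch that compares each node's load with the median load, using the Load column from the nodetool ring output in this thread. The choice of the median as the baseline is an assumption made for illustration; it simply keeps the one large node from skewing the comparison.

```python
from statistics import median

# Load per node in GB, copied from the ring output in this thread.
loads_gb = {
    "172.17.72.91": 102.07,
    "45.10.80.144": 59.1,
    "172.17.72.93": 59.57,
    "45.10.80.146": 59.64,
    "172.17.72.95": 69.58,
    "45.10.80.148": 59.31,
}

typical = median(loads_gb.values())  # baseline: median load across the six nodes

for address, load in loads_gb.items():
    print(f"{address:<14} {load:7.2f} GB  {load / typical:.2f}x median")
```

On these figures the first node comes out at roughly 1.7 times the median, while the 69.58 GB node is only mildly above it.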
Re: Unbalanced ring in Cassandra 0.8.4
Did you start all your nodes at the correct tokens or did you balance by moving them? Moving nodes around won't delete unneeded data after the move is done. Try running 'nodetool cleanup' on all of your nodes.

On Fri, Jun 15, 2012 at 12:24 PM, Raj N raj.cassan...@gmail.com wrote:
Actually I am not worried about the percentage. It's the data I am concerned about. Look at the first node. It has 102.07 GB of data, and the other nodes have around 60 GB (one has 69, but let's ignore that one). I don't understand why the first node has almost double the data.
Thanks
-Raj
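The suggestion to run 'nodetool cleanup' on all nodes could be scripted along these lines. This is a minimal sketch, assuming `nodetool` is on the PATH and each node is reachable on its default JMX port; `cleanup_commands` and `run_cleanup` are hypothetical helper names. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Node addresses taken from the ring output in this thread.
HOSTS = [
    "172.17.72.91",
    "45.10.80.144",
    "172.17.72.93",
    "45.10.80.146",
    "172.17.72.95",
    "45.10.80.148",
]


def cleanup_commands(hosts):
    """Build one `nodetool cleanup` invocation per node."""
    return [["nodetool", "-h", host, "cleanup"] for host in hosts]


def run_cleanup(hosts, dry_run=True):
    """Run cleanup one node at a time; with dry_run=True only print the commands."""
    for command in cleanup_commands(hosts):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


run_cleanup(HOSTS)  # dry run: prints the six commands without executing them
```

Running the nodes one after another rather than in parallel is a cautious choice here, since cleanup rewrites sstables and is likely to add disk load on each node while it runs.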
Re: Unbalanced ring in Cassandra 0.8.4
I did run nodetool move. But that was when I was setting up the cluster, which means I didn't have any data at that time.
-Raj

On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey n...@datastax.com wrote:
Did you start all your nodes at the correct tokens or did you balance by moving them? Moving nodes around won't delete unneeded data after the move is done. Try running 'nodetool cleanup' on all of your nodes.