Re: Clarification on how multi-DC replication works
So is that picture incorrect, or just incomplete, missing the piece on how the nodes reply to the coordinator node?

On Tue, Feb 11, 2014 at 9:38 AM, sankalp kohli kohlisank...@gmail.com wrote:
> @Mullen, I think your diagram does not answer the question on responses.
>
> @Sameer All nodes in DC2 will reply back to the coordinator in DC1. So if you have replication of DC1:3, DC2:3, a coordinator node will get 6 responses back if it is not in the replica set. Hope that answers your question.

On Tue, Feb 11, 2014 at 8:16 AM, Mullen, Robert robert.mul...@pearson.com wrote:
> I had the same question a while back and put together this picture to help me understand the flow of data for multi-region deployments. Hope that it helps.

On Mon, Feb 10, 2014 at 7:52 PM, Sameer Farooqui sam...@blueplastic.com wrote:
> Hi, I was hoping someone could clarify a point about multi-DC replication.
>
> Let's say I have 2 data centers configured with replication factor = 3 in each DC. My client app is sitting in DC1 and is able to intelligently pick a coordinator that will also be a replica partner. So the client app sends a write with consistency for DC1 = Q and consistency for DC2 = Q to a coordinator node in DC1. That coordinator in DC1 forwards the write to 2 other nodes in DC1 and a coordinator in DC2.
>
> Is it correct that all 3 nodes in DC2 will respond back to the original coordinator in DC1? Or will the DC2 nodes respond back to the DC2 coordinator?
>
> Let's say one of the replica nodes in DC2 is down. Who will hold the hint for that node? The original coordinator in DC1 or the coordinator in DC2?
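To make the response counting concrete, here is a small Python sketch (a toy model, not Cassandra's actual code) of sankalp's description: with replication DC1:3, DC2:3, every replica replies directly to the DC1 coordinator, so it eventually receives 6 acks (one fewer if the coordinator is itself a replica, since its own write is local), while only a quorum per DC is needed to satisfy the consistency level.

```python
# Toy model (not Cassandra source) of replica acknowledgements for a
# write coordinated in DC1 with per-DC quorum consistency.

def quorum(rf):
    """Quorum for one DC's replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1

def expected_responses(rf_by_dc, coordinator_is_replica=False):
    """Total replica acks the DC1 coordinator eventually receives.

    If the coordinator is itself a replica, its own local write is not
    a network response, hence one fewer.
    """
    total = sum(rf_by_dc.values())
    return total - 1 if coordinator_is_replica else total

rf = {"DC1": 3, "DC2": 3}
print(expected_responses(rf))                   # 6 acks, coordinator not a replica
print(expected_responses(rf, True))             # 5 network acks when it is
print({dc: quorum(n) for dc, n in rf.items()})  # {'DC1': 2, 'DC2': 2} needed
```

The point of the sketch is that "6 responses" is about who eventually replies, not about how many acks the coordinator must wait for before answering the client.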
Re: Clarification on how multi-DC replication works
Thanks for the feedback. The picture shows a sample request, which is why the coordinator points to two specific nodes. What I was trying to convey was that the coordinator node would ensure that 2 of the 3 nodes were written to before reporting success to the client.

I found the article here; it says that the non-blocking writes to the 2nd data center are asynchronous. Is this blog post incorrect as well? http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers

I'd like to get clarification on how this works and hope to clear up some of the misinformation about multi-DC replication that is out there. I like a lot of the features of Cassandra and enjoy working with it, but the amount of conflicting information out on the web is a little disconcerting sometimes.

thanks,
Rob

On Tue, Feb 11, 2014 at 11:02 AM, Andrey Ilinykh ailin...@gmail.com wrote:
> 1. The reply part is missing.
>
> 2. It is a little bit confusing. I would not use the term "synchronous"; everything is asynchronous here. The coordinator writes data to all local nodes and waits for a response from ANY two of them (in the case of quorum). In your picture it looks like the coordinator first decides which nodes should reply. That is not correct.

On Tue, Feb 11, 2014 at 9:36 AM, Mullen, Robert robert.mul...@pearson.com wrote:
> So is that picture incorrect, or just incomplete, missing the piece on how the nodes reply to the coordinator node?

On Tue, Feb 11, 2014 at 9:38 AM, sankalp kohli kohlisank...@gmail.com wrote:
> @Mullen, I think your diagram does not answer the question on responses.
>
> @Sameer All nodes in DC2 will reply back to the coordinator in DC1. So if you have replication of DC1:3, DC2:3, a coordinator node will get 6 responses back if it is not in the replica set. Hope that answers your question.
On Tue, Feb 11, 2014 at 8:16 AM, Mullen, Robert robert.mul...@pearson.com wrote:
> I had the same question a while back and put together this picture to help me understand the flow of data for multi-region deployments. Hope that it helps.

On Mon, Feb 10, 2014 at 7:52 PM, Sameer Farooqui sam...@blueplastic.com wrote:
> Hi, I was hoping someone could clarify a point about multi-DC replication.
>
> Let's say I have 2 data centers configured with replication factor = 3 in each DC. My client app is sitting in DC1 and is able to intelligently pick a coordinator that will also be a replica partner. So the client app sends a write with consistency for DC1 = Q and consistency for DC2 = Q to a coordinator node in DC1. That coordinator in DC1 forwards the write to 2 other nodes in DC1 and a coordinator in DC2.
>
> Is it correct that all 3 nodes in DC2 will respond back to the original coordinator in DC1? Or will the DC2 nodes respond back to the DC2 coordinator?
>
> Let's say one of the replica nodes in DC2 is down. Who will hold the hint for that node? The original coordinator in DC1 or the coordinator in DC2?
Re: Clarification on how multi-DC replication works
>> The picture shows a sample request, which is why the coordinator points to two specific nodes. What I was trying to convey was that the coordinator node would ensure that 2 of the 3 nodes were written to before reporting success to the client.
>
> This is my point: ANY 2 of 3. Your picture shows a specific 2 of 3.

True, I know that and I'm not debating it. I am showing a single request sequence in that picture, and during a single request it will in fact be a specific 2 of the 3 nodes.

>> I found the article here; it says that the non-blocking writes to the 2nd data center are asynchronous. Is this blog post incorrect as well? http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
>
> Why is it incorrect? Everything is asynchronous, both local and remote. The coordinator simply waits for responses from the local nodes. That doesn't make it synchronous, because it waits for a response from ANY 2 nodes.

I wasn't saying it was incorrect; I was asking whether you thought that blog post was misleading as well, since I've been sending people to that page for info on multi-DC replication. If it were erroneous I would have stopped sending them there. I was thinking of the response as synchronous more from the client's point of view, meaning that the app can't proceed until those specific operations have completed and a response is returned from Cassandra.

Thanks for the help in clarifying all of this; it is very much appreciated.

Regards,
Rob

On Tue, Feb 11, 2014 at 11:25 AM, Andrey Ilinykh ailin...@gmail.com wrote:
> On Tue, Feb 11, 2014 at 10:14 AM, Mullen, Robert robert.mul...@pearson.com wrote:
>> Thanks for the feedback. The picture shows a sample request, which is why the coordinator points to two specific nodes. What I was trying to convey was that the coordinator node would ensure that 2 of the 3 nodes were written to before reporting success to the client.
>
> This is my point: ANY 2 of 3. Your picture shows a specific 2 of 3.
> I found the article here; it says that the non-blocking writes to the 2nd data center are asynchronous. Is this blog post incorrect as well? http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers

Why is it incorrect? Everything is asynchronous, both local and remote. The coordinator simply waits for responses from the local nodes. That doesn't make it synchronous, because it waits for a response from ANY 2 nodes.
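Andrey's point, that the coordinator fires the write at all replicas at once and simply unblocks on the first quorum of acks rather than pre-selecting which nodes must answer, can be sketched like this (an illustrative Python toy, not Cassandra internals; the node names and latencies are made up):

```python
# Illustrative sketch of "wait for ANY quorum of acks": the write goes
# to ALL replicas up front; the coordinator returns success as soon as
# any 2 of 3 have acked. Which 2 varies per request.
import concurrent.futures
import random
import time

REPLICAS = ["node1", "node2", "node3"]
QUORUM = len(REPLICAS) // 2 + 1  # 2 of 3

def send_write(replica):
    time.sleep(random.uniform(0.01, 0.05))  # simulated network latency
    return replica                          # ack identifies the replica

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(send_write, r) for r in REPLICAS]
    acked = []
    for f in concurrent.futures.as_completed(futures):
        acked.append(f.result())
        if len(acked) >= QUORUM:
            break  # success reported to client; the 3rd ack arrives later

print("acked by:", acked)  # whichever 2 replied first
```

Run it a few times and the pair in `acked` changes, which is the difference between "a specific 2 of 3" in a single request diagram and "ANY 2 of 3" in the protocol.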
Re: question about secondary index or not
Thanks for that info, Ondřej. I've never tested secondary indexes, as I've avoided them because of all the uncertainty around them, and your statement just adds to it. Everything I had read said that secondary indexes were supposed to work well for columns with low cardinality, but I guess that's not always the case.

peace,
Rob

On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote:
> Hi, we had a similar use case. Just do the filtering client-side; the #2 example performs horribly. Secondary indexes on something that divides the set into two roughly equal-size subsets just don't work. Give it a try on localhost with just a couple of records (150.000) and you will see.
>
> regards, ondrej

On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:
> In my #2 example:
>
> select * from people where company_id='xxx' and gender='male'
>
> I already specify the first part of the primary key (row key) in my where clause, so how does the secondary indexed column gender='male' help determine which row to return? It is more like filtering a list of columns from a row (which is exactly what I can do in my #1 example). But if I don't create the index first, the cql statement runs into a syntax error.

On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote:
> I would do #2. Take a look at this blog, which talks about secondary indexes, cardinality, and what that means for Cassandra. Secondary indexes in Cassandra are a different beast, so old rules of thumb about indexes often don't apply. http://www.wentnet.com/blog/?p=77

On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
> Generally, indexes on binary fields (true/false, male/female) are not terribly effective.
On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
> I have a simple column family like the following:
>
> create table people(
>   company_id text,
>   employee_id text,
>   gender text,
>   primary key(company_id, employee_id)
> );
>
> If I want to find all the male employees given a company id, I can:
>
> 1/ select * from people where company_id=' and loop through the result efficiently to pick the employees whose gender column value equals male
>
> 2/ add a secondary index:
>
> create index gender_index on people(gender)
> select * from people where company_id='xxx' and gender='male'
>
> I thought #2 seems more appropriate, but I also thought the secondary index only helps locate the primary row key. With the select clause in #2, is it more efficient than #1, where the application is responsible for looping through the result and filtering out the right content? (It totally makes sense if I only need to find all the male employees, not within a company, by using select * from people where gender='male'.)
>
> thanks
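Ondřej's advice amounts to option #1: fetch the partition by its primary key and filter the low-cardinality column client-side. A minimal Python sketch of that approach (the rows and the 'acme' company id are made-up stand-ins for what a driver would return for the SELECT by company_id):

```python
# Client-side filtering sketch: rows below stand in for a driver's
# result set from: SELECT * FROM people WHERE company_id = 'acme';
# (hypothetical data; no real cluster involved)

rows = [
    {"company_id": "acme", "employee_id": "e1", "gender": "male"},
    {"company_id": "acme", "employee_id": "e2", "gender": "female"},
    {"company_id": "acme", "employee_id": "e3", "gender": "male"},
]

# The partition read is already efficient (single row key); the gender
# filter is a cheap in-memory pass over that partition's rows.
males = [r for r in rows if r["gender"] == "male"]
print([r["employee_id"] for r in males])  # ['e1', 'e3']
```

The design point in the thread: the primary-key lookup does the heavy lifting, and a binary column like gender splits the data roughly in half, so an index on it saves little over this loop.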
Re: question about secondary index or not
I would do #2. Take a look at this blog, which talks about secondary indexes, cardinality, and what that means for Cassandra. Secondary indexes in Cassandra are a different beast, so old rules of thumb about indexes often don't apply. http://www.wentnet.com/blog/?p=77

On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
> Generally, indexes on binary fields (true/false, male/female) are not terribly effective.

On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:
> I have a simple column family like the following:
>
> create table people(
>   company_id text,
>   employee_id text,
>   gender text,
>   primary key(company_id, employee_id)
> );
>
> If I want to find all the male employees given a company id, I can:
>
> 1/ select * from people where company_id=' and loop through the result efficiently to pick the employees whose gender column value equals male
>
> 2/ add a secondary index:
>
> create index gender_index on people(gender)
> select * from people where company_id='xxx' and gender='male'
>
> I thought #2 seems more appropriate, but I also thought the secondary index only helps locate the primary row key. With the select clause in #2, is it more efficient than #1, where the application is responsible for looping through the result and filtering out the right content? (It totally makes sense if I only need to find all the male employees, not within a company, by using select * from people where gender='male'.)
>
> thanks
Re: nodetool status owns % calculation after upgrade to 2.0.2
Oh man, you know what my problem was: I was not specifying the keyspace after nodetool status. After specifying the keyspace I get the 100% ownership like I would expect:

ubuntu@prd-usw2b-pr-01-dscsapi-cadb-0002:~$ nodetool status discussions
Datacenter: us-east-1
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
UN  10.198.4.80    2.02 MB   256     100.0%            e31aecd5-1eb1-4ddb-85ac-7a4135618b66  use1d
UN  10.198.2.201   32.34 MB  256     100.0%            3253080f-09b6-47a6-9b66-da3d174d1101  use1c
UN  10.198.0.249   1.77 MB   256     100.0%            22b30bea-5643-43b5-8d98-6e0eafe4af75  use1b
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
UN  10.198.20.51   1.2 MB    256     100.0%            6a40b500-cff4-4513-b26b-ea33048c1590  usw2c
UN  10.198.16.92   1.46 MB   256     100.0%            01989d0b-0f81-411b-a70e-f22f01189542  usw2a
UN  10.198.18.125  2.14 MB   256     100.0%            aa746ed1-288c-414f-8d97-65fc867a5bdd  usw2b

As for the counts being off: running nodetool repair discussions, which you're supposed to do after changing the replication factor, fixed that. After doing it on the 6 nodes in my cluster, that one column family is returning a count of 60 on each node. Thanks for all the help here; I've only been working with Cassandra for a couple of months and there is a lot to learn.

Thanks,
Rob

On Sun, Jan 5, 2014 at 11:55 PM, Or Sher or.sh...@gmail.com wrote:
> RandomPartitioner was the default at 1.2.*? It looks like since 1.2 the default is Murmur3. Not sure that's your problem if you say you've upgraded from 1.2.*.

On Mon, Jan 6, 2014 at 3:42 AM, Rob Mullen robert.mul...@pearson.com wrote:
> Do you know if the default changed? I'm pretty sure I never changed that setting in the config file.
>
> Sent from my iPhone

On Jan 4, 2014, at 11:22 PM, Or Sher or.sh...@gmail.com wrote:
> Robert, is it possible you've changed the partitioner during the upgrade? (e.g. from RandomPartitioner to Murmur3Partitioner?)

On Sat, Jan 4, 2014 at 9:32 PM, Mullen, Robert robert.mul...@pearson.com wrote:
> The nodetool repair command (which took about 8 hours) seems to have synced the data in us-east; all 3 nodes are returning 59 for the count now. I'm wondering if this has more to do with changing the replication factor from 2 to 3 and how 2.0.2 reports the % owned, rather than the upgrade itself. I still don't understand why it's reporting 16% for each node when 100% seems to reflect the state of the cluster better. I didn't find any info in those issues you posted that would relate to the % changing from 100% to 16%.

On Sat, Jan 4, 2014 at 12:26 PM, Mullen, Robert robert.mul...@pearson.com wrote:
> from cql:
>
> cqlsh> select count(*) from topics;

On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:
> On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert robert.mul...@pearson.com wrote:
>> I have a column family called topics which has a count of 47 on one node, 59 on another and 49 on another. It was my understanding that with a replication factor of 3 and 3 nodes in each ring, the nodes should be equal, so I could lose a node in the ring and have no loss of data. Based on that I would expect the counts across the nodes to all be 59 in this case.
>
> In what specific way are you counting rows?
>
> =Rob

--
Or Sher
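For what it's worth, the arithmetic behind the two displays can be reproduced by hand (a back-of-the-envelope sketch, not nodetool's actual code): without a keyspace argument there is no replication factor to apply, so each of the 6 nodes is credited only its raw share of the token ring; with RF 3 in a 3-node DC, every node effectively holds a full copy of that DC's data.

```python
# Back-of-the-envelope sketch of "owns %" with and without a keyspace.
# Assumes vnodes spread tokens roughly evenly across 6 nodes (3 per DC).

nodes_total = 6
raw_share = 100.0 / nodes_total  # raw token share -> the mysterious "~17%"

# With a keyspace using NetworkTopologyStrategy {us-east-1: 3, us-west-2: 3},
# effective ownership per node within a 3-node DC is RF / nodes_per_dc.
rf_per_dc = 3
nodes_per_dc = 3
effective = 100.0 * rf_per_dc / nodes_per_dc  # 100.0%

print(round(raw_share, 1), effective)  # 16.7 100.0
```

Which matches the thread: ~16-17% from plain `nodetool status`, 100% from `nodetool status discussions`.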
Re: nodetool status owns % calculation after upgrade to 2.0.2
Hey Rob, thanks for the reply.

> First, why would you upgrade to 2.0.2 when higher versions exist?

I upgraded a while ago when 2.0.2 was the latest version, and haven't upgraded since, as I'd like to figure out what's going on here before upgrading again. I was on vacation for a while too, so am just revisiting this after the holidays. I am running in production, but under very low usage with my API in alpha state, so I don't mind a bumpy road with 5 of the Z version; as the API matures to beta/GA I'll keep that info in mind.

> What do you mean by the counts are different across the nodes now?

I have a column family called topics which has a count of 47 on one node, 59 on another and 49 on another. It was my understanding that with a replication factor of 3 and 3 nodes in each ring, the nodes should be equal, so I could lose a node in the ring and have no loss of data. Based on that I would expect the counts across the nodes to all be 59 in this case.

thanks,
Rob

On Fri, Jan 3, 2014 at 5:14 PM, Robert Coli rc...@eventbrite.com wrote:
> On Fri, Jan 3, 2014 at 3:33 PM, Mullen, Robert robert.mul...@pearson.com wrote:
>> I have a multi-region cluster with 3 nodes in each data center, EC2 us-east and us-west. Prior to upgrading to 2.0.2 from 1.2.6, the owns % of each node was 100%, which made sense because I had a replication factor of 3 for each data center. After upgrading to 2.0.2, each node claims to own about 17% of the data.
>
> First, why would you upgrade to 2.0.2 when higher versions exist? Second, are you running in production? If so, read this: https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
>> So a couple of questions: 1. Any idea why the owns % would have changed from 100% to 17% per node after upgrade?
>
> Because the display of this information has changed repeatedly over the years, including for bugfixes.
> https://issues.apache.org/jira/browse/CASSANDRA-3412
> https://issues.apache.org/jira/browse/CASSANDRA-5076
> https://issues.apache.org/jira/browse/CASSANDRA-4598
> https://issues.apache.org/jira/browse/CASSANDRA-6168
> https://issues.apache.org/jira/browse/CASSANDRA-5954
> etc.
>
>> 2. Is there anything else I can do to get the data back in sync between the nodes other than nodetool repair?
>
> What do you mean by "the counts are different across the nodes now"? It is pretty unlikely that you have lost any data, from what you have described.
>
> =Rob
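A toy model of why count(*) could disagree per node in this situation (pure Python, no Cassandra involved; the numbers are illustrative, not the thread's actual 47/59/49): rows written while RF was 2 land on only 2 of the 3 nodes, so after raising RF to 3 each replica's local data differs until repair streams everyone the union.

```python
# Toy model: 59 rows written at RF=2 across a 3-node ring, then RF is
# raised to 3 and `nodetool repair` makes every node hold all rows.
import itertools

nodes = {"A": set(), "B": set(), "C": set()}
# Hypothetical placement: replica pairs rotate around the ring.
ring = itertools.cycle([("A", "B"), ("B", "C"), ("C", "A")])

for row_id, replicas in zip(range(59), ring):  # 59 rows at RF=2
    for node in replicas:
        nodes[node].add(row_id)

print(sorted(len(s) for s in nodes.values()))  # [39, 39, 40] -- unequal

# Repair (conceptually): each node ends up with the union of all replicas.
repaired = set().union(*nodes.values())
for n in nodes:
    nodes[n] = set(repaired)

print(sorted(len(s) for s in nodes.values()))  # [59, 59, 59] -- in sync
```

This is consistent with Rob Coli's point that no data was lost: every row exists somewhere, the replicas just disagree until repair runs.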
Re: nodetool status owns % calculation after upgrade to 2.0.2
from cql:

cqlsh> select count(*) from topics;

On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:
> On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert robert.mul...@pearson.com wrote:
>> I have a column family called topics which has a count of 47 on one node, 59 on another and 49 on another. It was my understanding that with a replication factor of 3 and 3 nodes in each ring, the nodes should be equal, so I could lose a node in the ring and have no loss of data. Based on that I would expect the counts across the nodes to all be 59 in this case.
>
> In what specific way are you counting rows?
>
> =Rob
Re: nodetool status owns % calculation after upgrade to 2.0.2
The nodetool repair command (which took about 8 hours) seems to have synced the data in us-east; all 3 nodes are returning 59 for the count now. I'm wondering if this has more to do with changing the replication factor from 2 to 3 and how 2.0.2 reports the % owned, rather than the upgrade itself. I still don't understand why it's reporting 16% for each node when 100% seems to reflect the state of the cluster better. I didn't find any info in those issues you posted that would relate to the % changing from 100% to 16%.

On Sat, Jan 4, 2014 at 12:26 PM, Mullen, Robert robert.mul...@pearson.com wrote:
> from cql:
>
> cqlsh> select count(*) from topics;

On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:
> On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert robert.mul...@pearson.com wrote:
>> I have a column family called topics which has a count of 47 on one node, 59 on another and 49 on another. It was my understanding that with a replication factor of 3 and 3 nodes in each ring, the nodes should be equal, so I could lose a node in the ring and have no loss of data. Based on that I would expect the counts across the nodes to all be 59 in this case.
>
> In what specific way are you counting rows?
>
> =Rob
nodetool status owns % calculation after upgrade to 2.0.2
Hello,

I have a multi-region cluster with 3 nodes in each data center, EC2 us-east and us-west. Prior to upgrading to 2.0.2 from 1.2.6, the owns % of each node was 100%, which made sense because I had a replication factor of 3 for each data center. After upgrading to 2.0.2, each node claims to own about 17% of the data:

:~$ nodetool status
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.198.20.51   958.16 KB  256     16.9%  6a40b500-cff4-4513-b26b-ea33048c1590  usw2c
UN  10.198.18.125  776 KB     256     17.0%  aa746ed1-288c-414f-8d97-65fc867a5bdd  usw2b
UN  10.198.16.92   1.39 MB    256     16.4%  01989d0b-0f81-411b-a70e-f22f01189542  usw2a
Datacenter: us-east-1
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.198.0.249   1.11 MB    256     16.3%  22b30bea-5643-43b5-8d98-6e0eafe4af75  use1b
UN  10.198.4.80    1.22 MB    256     16.4%  e31aecd5-1eb1-4ddb-85ac-7a4135618b66  use1d
UN  10.198.2.201   37.27 MB   256     17.0%  3253080f-09b6-47a6-9b66-da3d174d1101  use1c

I checked some of the data on each of the nodes in one column family, and the counts are different across the nodes now. I'm trying to run a nodetool repair, but it's been running for about 6 hours now. So a couple of questions:

1. Any idea why the owns % would have changed from 100% to 17% per node after the upgrade?
2. Is there anything else I can do to get the data back in sync between the nodes, other than nodetool repair?

thanks,
Rob