Re: Clarification on how multi-DC replication works

2014-02-11 Thread sankalp kohli
@Mullen,
I think your diagram does not answer the question on responses.
@Sameer
All nodes in DC2 will reply back to the coordinator in DC1. So if you
have replication of DC1:3,DC2:3, a coordinator node will get 6 responses
back if it is not in the replica set.
Hope that answers your question.
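
A rough sketch of the arithmetic above, in Python. It is not Cassandra's code; the function names and the `replication` dict are made up for illustration, and it assumes the usual quorum rule of floor(RF / 2) + 1 per data center.

```python
# Illustrative only: hypothetical helpers, not part of any Cassandra API.
replication = {"DC1": 3, "DC2": 3}


def quorum(rf):
    """Quorum size for one data center: floor(rf / 2) + 1."""
    return rf // 2 + 1


def responses_expected(replication):
    """Replies a coordinator outside the replica set gets back:
    one from every replica in every data center."""
    return sum(replication.values())


def acks_needed(replication, level, local_dc):
    """Acks the coordinator must see, per data center, before it
    reports success to the client."""
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in replication.items()}
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(replication[local_dc])}
    raise ValueError("level not covered by this sketch: " + level)


print(responses_expected(replication))
print(acks_needed(replication, "EACH_QUORUM", "DC1"))
print(acks_needed(replication, "LOCAL_QUORUM", "DC1"))
```

With DC1:3,DC2:3 the write in the original question (quorum in both DCs) needs 2 acks from each DC, even though all 6 replicas eventually reply.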


On Tue, Feb 11, 2014 at 8:16 AM, Mullen, Robert
robert.mul...@pearson.com wrote:

 I had the same question a while back and put together this picture to help
 me understand the flow of data for multi region deployments. Hope that it
 helps.


 On Mon, Feb 10, 2014 at 7:52 PM, Sameer Farooqui 
 sam...@blueplastic.com wrote:

 Hi,

 I was hoping someone could clarify a point about multi-DC replication.

 Let's say I have 2 data centers configured with replication factor = 3 in
 each DC.

 My client app is sitting in DC 1 and is able to intelligently pick a
 coordinator that will also be a replica partner.

 So the client app sends a write with consistency for DC1 = Q and
 consistency for DC2 = Q to a coordinator node in DC1.

 That coordinator in DC1 forwards the write to 2 other nodes in DC1 and a
 coordinator in DC2.

 Is it correct that all 3 nodes in DC2 will respond back to the original
 coordinator in DC1? Or will the DC2 nodes respond back to the DC2
 coordinator?

 Let's say one of the replica nodes in DC2 is down. Who will hold the hint
 for that node? The original coordinator in DC1 or the coordinator in DC2?





Re: Clarification on how multi-DC replication works

2014-02-11 Thread Mullen, Robert
So is that picture incorrect, or just incomplete, missing the piece on how
the nodes reply to the coordinator node?








Re: Clarification on how multi-DC replication works

2014-02-11 Thread Andrey Ilinykh
1. The reply part is missing.
2. It is a little confusing. I would not use the term synchronous.
Everything is asynchronous here. The coordinator writes data to all local
nodes and waits for a response from ANY two of them (in the case of quorum).
In your picture it looks like the coordinator first decides which nodes
should reply. That is not correct.
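
To make the "ANY two" point concrete, here is a small Python simulation. It is only a sketch under assumptions: the replica names and delays are invented, `asyncio.sleep` stands in for the network and disk, and nothing here is Cassandra code. The write goes to all three replicas at once, and the caller is released as soon as any two have acknowledged.

```python
import asyncio


async def replica_write(name, delay):
    """Pretend to apply the write on one replica, then acknowledge it."""
    await asyncio.sleep(delay)
    return name


async def write_at_quorum(delays, required):
    """Send the write to every replica; return once `required` acks arrive."""
    tasks = [asyncio.create_task(replica_write(n, d)) for n, d in delays.items()]
    acked = []
    for finished in asyncio.as_completed(tasks):
        acked.append(await finished)
        if len(acked) >= required:
            break
    # The slower write is still in flight at this point. In this toy it is
    # simply dropped when asyncio.run() shuts the event loop down.
    return acked


def run(delays, required=2):
    return asyncio.run(write_at_quorum(delays, required))


# Which two replicas satisfy the quorum depends only on who answers first.
print(run({"n1": 0.03, "n2": 0.01, "n3": 0.2}))
print(run({"n1": 0.2, "n2": 0.01, "n3": 0.03}))
```

The coordinator never picks the two nodes in advance; the pair is whichever acks happen to arrive first for that particular request.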









Re: Clarification on how multi-DC replication works

2014-02-11 Thread Mullen, Robert
Thanks for the feedback.

The picture shows a sample request, which is why the coordinator points to
two specific nodes.  What I was trying to convey was that the coordinator
node would ensure that 2 of the 3 nodes were written to before reporting
success to the client.

I found the article here, it says that the non-blocking writes to the 2nd
data center are asynchronous.  Is this blog post incorrect as well?
http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers

I'd like to get clarification on how this works and hope to clear up some
of the misinformation about multi-DC replication that is out there.  I like
a lot of the features of cassandra and enjoy working with it, but the
amount of conflicting information out on the web is a little disconcerting
sometimes.

thanks,
Rob









Re: Clarification on how multi-DC replication works

2014-02-11 Thread Andrey Ilinykh
On Tue, Feb 11, 2014 at 10:14 AM, Mullen, Robert
robert.mul...@pearson.com wrote:

 Thanks for the feedback.

 The picture shows a sample request, which is why the coordinator points to
 two specific nodes.  What I was trying to convey was that the coordinator
 node would ensure that 2 of the 3 nodes were written to before reporting
 success to the client.

This is my point. ANY 2 of 3. Your picture shows specific 2 of 3.




 I found the article here, it says that the non-blocking writes to the 2nd
 data center are asynchronous.  Is this blog post incorrect as well?

 http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers


Why is it incorrect? Everything is asynchronous, both local and remote. The
coordinator simply waits for responses from local nodes. But that doesn't
make it synchronous, because it waits for responses from ANY 2 nodes.


Re: Clarification on how multi-DC replication works

2014-02-11 Thread Mullen, Robert

 The picture shows a sample request, which is why the coordinator points to
 two specific nodes.  What I was trying to convey was that the coordinator
 node would ensure that 2 of the 3 nodes were written to before reporting
 success to the client.

This is my point. ANY 2 of 3. Your picture shows specific 2 of 3.

True that, I know that and I'm not debating that.  I am showing a single
request sequence in that picture, and during a single request it will
actually be a specific 2 of the 3 nodes.



 I found the article here, it says that the non-blocking writes to the 2nd
 data center are asynchronous.  Is this blog post incorrect as well?

 http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers


Why is it incorrect? Everything is asynchronous, both local and remote. The
coordinator simply waits for responses from local nodes. But that doesn't
make it synchronous, because it waits for responses from ANY 2 nodes.

I wasn't saying it was incorrect. I was just asking whether you thought
that blog post was misleading as well, as I've been sending people to that
page for info on multi-DC replication.  If it was erroneous then I would
have stopped sending them there.  I was thinking of the response as
synchronous more from the client's point of view, meaning that the app
can't proceed until those specific operations are completed and a response
is returned from cassandra.

Thanks for the help in clarifying all of this, it is very much appreciated.
Regards,
Rob





Re: Clarification on how multi-DC replication works

2014-02-11 Thread graham sanderson
slightly off topic, but does anyone know off the top of their head what happens 
if data is being written at LOCAL_QUORUM to a multi data center setup faster 
than the inter data center link can handle… something has to block, throw an 
exception, die, or have unbounded growth (memory, threads, on disk hints etc) 
somewhere along the line ;-)

I haven’t found any good info on this via searching the web… I have not studied 
the code in detail as we have not yet set up a multi-DC cluster. (Note we’re 
using 2.0.5).

Note we do not intend to do this in practice, but it might happen in some short 
bursts… obviously we can test this once we have such a setup, but any info 
would help us plan how to handle it, and/or throttle at either cassandra config 
level, or app level.






Clarification on how multi-DC replication works

2014-02-10 Thread Sameer Farooqui
Hi,

I was hoping someone could clarify a point about multi-DC replication.

Let's say I have 2 data centers configured with replication factor = 3 in
each DC.

My client app is sitting in DC 1 and is able to intelligently pick a
coordinator that will also be a replica partner.

So the client app sends a write with consistency for DC1 = Q and
consistency for DC2 = Q to a coordinator node in DC1.

That coordinator in DC1 forwards the write to 2 other nodes in DC1 and a
coordinator in DC2.

Is it correct that all 3 nodes in DC2 will respond back to the original
coordinator in DC1? Or will the DC2 nodes respond back to the DC2
coordinator?

Let's say one of the replica nodes in DC2 is down. Who will hold the hint
for that node? The original coordinator in DC1 or the coordinator in DC2?