Re: Clarification on how multi-DC replication works

2014-02-11 Thread Mullen, Robert
So is that picture incorrect, or just incomplete, missing the piece on how
the nodes reply to the coordinator node?


On Tue, Feb 11, 2014 at 9:38 AM, sankalp kohli kohlisank...@gmail.com wrote:

 @Mullen,
 I think your diagram does not answer the question on responses.
 @Sameer
 All nodes in DC2 will reply back to the co-ordinator in DC1. So if you
 have replication of DC1:3, DC2:3, a co-ordinator node will get 6 responses
 back if it is not in the replica set.
 Hope that answers your question.
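 For reference, a minimal sketch of the setup being described (the keyspace
 name is hypothetical, DC names as in the example):

 CREATE KEYSPACE multi_dc_demo
   WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

 -- in cqlsh, the per-DC quorum the question describes corresponds to:
 CONSISTENCY EACH_QUORUM;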


 On Tue, Feb 11, 2014 at 8:16 AM, Mullen, Robert robert.mul...@pearson.com
  wrote:

 I had the same question a while back and put together this picture to
 help me understand the flow of data for multi region deployments. Hope that
 it helps.


 On Mon, Feb 10, 2014 at 7:52 PM, Sameer Farooqui 
 sam...@blueplastic.com wrote:

 Hi,

 I was hoping someone could clarify a point about multi-DC replication.

 Let's say I have 2 data centers configured with replication factor = 3
 in each DC.

 My client app is sitting in DC 1 and is able to intelligently pick a
 coordinator that will also be a replica partner.

 So the client app sends a write with consistency for DC1 = Q and
 consistency for DC2 = Q to a coordinator node in DC1.

 That coordinator in DC1 forwards the write to 2 other nodes in DC1 and a
 coordinator in DC2.

 Is it correct that all 3 nodes in DC2 will respond back to the original
 coordinator in DC1? Or will the DC2 nodes respond back to the DC2
 coordinator?

 Let's say one of the replica nodes in DC2 is down. Who will hold the
 hint for that node? The original coordinator in DC1 or the coordinator in
 DC2?






Re: Clarification on how multi-DC replication works

2014-02-11 Thread Mullen, Robert
Thanks for the feedback.

The picture shows a sample request, which is why the coordinator points to
two specific nodes.  What I was trying to convey is that the coordinator node
would ensure that 2 of the 3 nodes were written to before reporting success
to the client.
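
Just to spell the arithmetic out (assuming RF = 3 per data center, as in
Sameer's example):

-- quorum per data center = floor(RF / 2) + 1
--                        = floor(3 / 2) + 1
--                        = 2 replicas must acknowledge before success is reported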

I found the article here; it says that the non-blocking writes to the 2nd
data center are asynchronous.  Is this blog post incorrect as well?
http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers

I'd like to get clarification on how this works and hope to clear up some
of the misinformation about multi-DC replication that is out there.  I like
a lot of the features of cassandra and enjoy working with it, but the
amount of conflicting information out on the web is a little disconcerting
sometimes.

thanks,
Rob

On Tue, Feb 11, 2014 at 11:02 AM, Andrey Ilinykh ailin...@gmail.com wrote:

 1. The reply part is missing.
 2. It is a little bit confusing. I would not use the term synchronous.
 Everything is asynchronous here. The coordinator writes data to all local
 nodes and waits for a response from ANY two of them (in the case of quorum).
 In your picture it looks like the coordinator first decides which nodes
 should reply. That is not correct.


 On Tue, Feb 11, 2014 at 9:36 AM, Mullen, Robert robert.mul...@pearson.com
  wrote:

 So is that picture incorrect, or just incomplete, missing the piece on how
 the nodes reply to the coordinator node?


 On Tue, Feb 11, 2014 at 9:38 AM, sankalp kohli kohlisank...@gmail.com wrote:

 @Mullen,
 I think your diagram does not answer the question on responses.
 @Sameer
 All nodes in DC2 will reply back to the co-ordinator in DC1. So if you
 have replication of DC1:3, DC2:3, a co-ordinator node will get 6 responses
 back if it is not in the replica set.
 Hope that answers your question.


 On Tue, Feb 11, 2014 at 8:16 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I had the same question a while back and put together this picture to
 help me understand the flow of data for multi region deployments. Hope that
 it helps.


 On Mon, Feb 10, 2014 at 7:52 PM, Sameer Farooqui 
 sam...@blueplastic.com wrote:

 Hi,

 I was hoping someone could clarify a point about multi-DC replication.

 Let's say I have 2 data centers configured with replication factor = 3
 in each DC.

 My client app is sitting in DC 1 and is able to intelligently pick a
 coordinator that will also be a replica partner.

 So the client app sends a write with consistency for DC1 = Q and
 consistency for DC2 = Q to a coordinator node in DC1.

 That coordinator in DC1 forwards the write to 2 other nodes in DC1 and
 a coordinator in DC2.

 Is it correct that all 3 nodes in DC2 will respond back to the
 original coordinator in DC1? Or will the DC2 nodes respond back to the DC2
 coordinator?

 Let's say one of the replica nodes in DC2 is down. Who will hold the
 hint for that node? The original coordinator in DC1 or the coordinator in
 DC2?








Re: Clarification on how multi-DC replication works

2014-02-11 Thread Mullen, Robert

 The picture shows a sample request, which is why the coordinator points to
 two specific nodes.  What I was trying to convey is that the coordinator node
 would ensure that 2 of the 3 nodes were written to before reporting success
 to the client.

This is my point. ANY 2 of 3. Your picture shows a specific 2 of 3.

True that, I know that and I'm not debating that.  I am showing a single
request sequence in that picture, and during a single request it will
actually be a specific 2 of the 3 nodes.



 I found the article here; it says that the non-blocking writes to the 2nd
 data center are asynchronous.  Is this blog post incorrect as well?

 http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers


Why is it incorrect? Everything is asynchronous, both local and remote. The
coordinator simply waits for a response from the local nodes. But that doesn't
make it synchronous, because it waits for a response from ANY 2 nodes.

I wasn't saying it was incorrect, I was just looking for clarification on
whether you thought that blog post was misleading as well, as I've been
sending people to that page for info on multi-DC replication.  If it was
erroneous then I would have stopped sending them there.   I was thinking
that the response was synchronous more from the client's point of view,
meaning that the app can't proceed until those specific operations were
completed and a response was returned from cassandra.
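
To put the client's point of view in concrete terms, a rough cqlsh sketch
(keyspace, table and values are made up; the same levels can be set in a driver):

CONSISTENCY LOCAL_QUORUM;
-- blocks only until 2 replicas in the local DC acknowledge; the remote DC is
-- written to asynchronously in the background
INSERT INTO multi_dc_demo.events (id, body) VALUES (1, 'hello');

CONSISTENCY EACH_QUORUM;
-- blocks until a quorum in each data center has acknowledged
INSERT INTO multi_dc_demo.events (id, body) VALUES (2, 'world');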

Thanks for the help in clarifying all of this, it is very much appreciated.
Regards,
Rob


On Tue, Feb 11, 2014 at 11:25 AM, Andrey Ilinykh ailin...@gmail.com wrote:




 On Tue, Feb 11, 2014 at 10:14 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 Thanks for the feedback.

 The picture shows a sample request, which is why the coordinator points
 to two specific nodes.  What I was trying to convey is that the coordinator
 node would ensure that 2 of the 3 nodes were written to before reporting
 success to the client.

 This is my point. ANY 2 of 3. Your picture shows a specific 2 of 3.




 I found the article here; it says that the non-blocking writes to the 2nd
 data center are asynchronous.  Is this blog post incorrect as well?

 http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers


 Why is it incorrect? Everything is asynchronous, both local and remote.
 The coordinator simply waits for a response from the local nodes. But that
 doesn't make it synchronous, because it waits for a response from ANY 2 nodes.



Re: question about secondary index or not

2014-01-29 Thread Mullen, Robert
Thanks for that info, Ondrej. I've never tested out secondary indexes, as
I've avoided them because of all the uncertainty around them, and your
statement just adds to that uncertainty.  Everything I had read said that
secondary indexes were supposed to work well for columns with low
cardinality, but I guess that's not always the case.
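
For what it's worth, one alternative I've seen suggested is to fold the
low-cardinality column into the primary key instead of indexing it; a rough
sketch, borrowing the column names from Jimmy's example:

create table people_by_gender(
company_id text,
gender text,
employee_id text,
primary key(company_id, gender, employee_id)
);

-- then the query needs no secondary index:
select * from people_by_gender where company_id='xxx' and gender='male';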

peace,
Rob


On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote:

 Hi,

 we had a similar use case. Just do the filtering client-side; the #2
 example performs horribly. Secondary indexes on something that divides the
 set into two subsets of roughly the same size just don't work.

 Give it a try on localhost with just a couple of records (150,000); you
 will see.
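 If you do try it locally, cqlsh's request tracing makes the comparison easy
 to see. A rough sketch, using Jimmy's schema:

 TRACING ON;

 -- #1: fetch the partition and filter client-side
 select * from people where company_id='xxx';

 -- #2: secondary-index path
 create index gender_index on people(gender);
 select * from people where company_id='xxx' and gender='male';

 TRACING OFF;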

 regards,

 ondrej


 On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 in my #2 example:
 select * from people where company_id='xxx' and gender='male'

 I already specify the first part of the primary key (row key) in my where
 clause, so how does the secondary indexed column gender='male' help
 determine which rows to return? It is more like filtering a list of columns
 from a row (which is exactly what I can do in the #1 example).
 But then if I don't create the index first, the cql statement will run into
 a syntax error.




 On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I would do #2.   Take a look at this blog which talks about secondary
 indexes, cardinality, and what it means for cassandra.   Secondary indexes
 in cassandra are a different beast, so often old rules of thumb about
 indexes don't apply.   http://www.wentnet.com/blog/?p=77


 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com
  wrote:

 Generally indexes on binary fields (true/false, male/female) are not
 terribly effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employees given a company id, I
 can do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employees whose
 gender column value equals male

 2/
 add a secondary index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I thought #2 seemed more appropriate, but I also thought the secondary
 index only helps locate the primary row key. With the select clause
 in #2, is it more efficient than #1, where the application is responsible
 for looping through the result and filtering the right content?

 (
 It totally makes sense if I only need to find out all the male
 employees (and not within a company) by using
 select * from people where gender='male'
 )

 thanks








Re: question about secondary index or not

2014-01-28 Thread Mullen, Robert
I would do #2.   Take a look at this blog which talks about secondary
indexes, cardinality, and what it means for cassandra.   Secondary indexes
in cassandra are a different beast, so often old rules of thumb about
indexes don't apply.   http://www.wentnet.com/blog/?p=77


On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Generally indexes on binary fields (true/false, male/female) are not terribly
 effective.


 On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 I have a simple column family like the following

 create table people(
 company_id text,
 employee_id text,
 gender text,
 primary key(company_id, employee_id)
 );

 if I want to find out all the male employees given a company id, I can do

 1/
 select * from people where company_id='
 and loop through the result efficiently to pick the employees whose
 gender column value equals male

 2/
 add a secondary index
 create index gender_index on people(gender)
 select * from people where company_id='xxx' and gender='male'


 I thought #2 seemed more appropriate, but I also thought the secondary
 index only helps locate the primary row key. With the select clause
 in #2, is it more efficient than #1, where the application is responsible
 for looping through the result and filtering the right content?

 (
 It totally makes sense if I only need to find out all the male
 employees (and not within a company) by using
 select * from people where gender='male'
 )

 thanks





Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-06 Thread Mullen, Robert
Oh man, you know what my problem was? I was not specifying the keyspace
after nodetool status. After specifying the keyspace I get the 100%
ownership like I would expect.

nodetool status discussions
ubuntu@prd-usw2b-pr-01-dscsapi-cadb-0002:~$ nodetool status discussions
Datacenter: us-east-1
=
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.198.4.80    2.02 MB    256     100.0%            e31aecd5-1eb1-4ddb-85ac-7a4135618b66  use1d
UN  10.198.2.20132.34 MB     256     100.0%            3253080f-09b6-47a6-9b66-da3d174d1101  use1c
UN  10.198.0.249   1.77 MB    256     100.0%            22b30bea-5643-43b5-8d98-6e0eafe4af75  use1b
Datacenter: us-west-2
=
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.198.20.51   1.2 MB     256     100.0%            6a40b500-cff4-4513-b26b-ea33048c1590  usw2c
UN  10.198.16.92   1.46 MB    256     100.0%            01989d0b-0f81-411b-a70e-f22f01189542  usw2a
UN  10.198.18.125  2.14 MB    256     100.0%            aa746ed1-288c-414f-8d97-65fc867a5bdd  usw2b
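
The 100% effective ownership just reflects the keyspace's replication
settings; a quick way to double-check them from cqlsh, as a sketch:

cqlsh> describe keyspace discussions;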


As for the counts being off, running nodetool repair discussions, which
you're supposed to do after changing the replication factor, fixed that.
After doing that on the 6 nodes in my cluster, that one column family is
returning a count of 60 on each node.
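
Roughly, the sequence that got things consistent again (a sketch; the
datacenter names are as shown by nodetool above, and NetworkTopologyStrategy
is assumed):

ALTER KEYSPACE discussions
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'us-east-1': 3, 'us-west-2': 3};

-- then, on each node (from a shell, not cqlsh):
-- nodetool repair discussions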

Thanks for all the help here, I've only been working with cassandra for a
couple of months now and there is a lot to learn.

Thanks,
Rob


On Sun, Jan 5, 2014 at 11:55 PM, Or Sher or.sh...@gmail.com wrote:

 RandomPartitioner was the default before 1.2.
 It looks like since 1.2 the default is Murmur3.
 Not sure that's your problem if you say you've upgraded from 1.2.*.


 On Mon, Jan 6, 2014 at 3:42 AM, Rob Mullen robert.mul...@pearson.com wrote:

 Do you know if the default changed?  I'm pretty sure I never changed
 that setting in the config file.

 Sent from my iPhone

 On Jan 4, 2014, at 11:22 PM, Or Sher or.sh...@gmail.com wrote:

 Robert, is it possible you've changed the partitioner during the upgrade?
 (e.g. from RandomPartitioner to Murmur3Partitioner ?)


 On Sat, Jan 4, 2014 at 9:32 PM, Mullen, Robert robert.mul...@pearson.com
  wrote:

 The nodetool repair command (which took about 8 hours) seems to have
 sync'd the data in us-east, all 3 nodes returning 59 for the count now.
  I'm wondering if this has more to do with changing the replication factor
 from 2 to 3 and how 2.0.2 reports the % owned rather than the upgrade
 itself.  I still don't understand why it's reporting 16% for each node when
 100% seems to reflect the state of the cluster better.  I didn't find any
 info in those issues you posted that would relate to the % changing from
 100% to 16%.


 On Sat, Jan 4, 2014 at 12:26 PM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 from cql
 cqlsh> select count(*) from topics;



 On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I have a column family called topics which has a count of 47 on one
 node, 59 on another and 49 on another node. It was my understanding with a
 replication factor of 3 and 3 nodes in each ring that the nodes should be
 equal so I could lose a node in the ring and have no loss of data.  Based
 upon that I would expect the counts across the nodes to all be 59 in this
 case.


 In what specific way are you counting rows?

 =Rob






 --
 Or Sher




 --
 Or Sher



Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Mullen, Robert
Hey Rob,
Thanks for the reply.

First, why would you upgrade to 2.0.2 when higher versions exist?
I upgraded a while ago when 2.0.2 was the latest version, haven't upgraded
since then as I'd like to figure out what's going on here before upgrading
again.  I was on vacation for a while too, so am just revisiting this after
the holidays now.

I am running in production but under very low usage with my API in alpha
state, so I don't mind a bumpy road with < 5 of the Z version.  As the API
matures to beta-GA I'll keep that info in mind.

What do you mean by the counts are different across the nodes now?
I have a column family called topics which has a count of 47 on one node,
59 on another and 49 on another node. It was my understanding with a
replication factor of 3 and 3 nodes in each ring that the nodes should be
equal so I could lose a node in the ring and have no loss of data.  Based
upon that I would expect the counts across the nodes to all be 59 in this
case.
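
Worth noting for this kind of check: a plain count in cqlsh runs at the
default consistency level (ONE), so it only reflects whichever replica
happens to answer. A rough way to compare, as a sketch:

CONSISTENCY ONE;
select count(*) from topics;   -- may differ node to node if replicas are out of sync

CONSISTENCY ALL;
select count(*) from topics;   -- forces every replica to be consulted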

thanks,
Rob



On Fri, Jan 3, 2014 at 5:14 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Jan 3, 2014 at 3:33 PM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I have a multi region cluster with 3 nodes in each data center, ec2
 us-east and west.  Prior to upgrading to 2.0.2 from 1.2.6, the owns %
 of each node was 100%, which made sense because I had a replication factor
 of 3 for each data center.  After upgrading to 2.0.2 each node claims to
 own about 17% of the data now.


 First, why would you upgrade to 2.0.2 when higher versions exist?

 Second, are you running in production? If so, read this :
 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/


 So a couple of questions:
 1.  Any idea why the owns % would have changed from 100% to 17% per node
 after upgrade?


 Because the display of this information has changed repeatedly over the
 years, including for bugfixes.

 https://issues.apache.org/jira/browse/CASSANDRA-3412
 https://issues.apache.org/jira/browse/CASSANDRA-5076
 https://issues.apache.org/jira/browse/CASSANDRA-4598
 https://issues.apache.org/jira/browse/CASSANDRA-6168
 https://issues.apache.org/jira/browse/CASSANDRA-5954

 etc.

 2. Is there anything else I can do to get the data back in sync between
 the nodes other than nodetool repair?


 What do you mean by the counts are different across the nodes now?

 It is pretty unlikely that you have lost any data, from what you have
 described.

 =Rob




Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Mullen, Robert
from cql
cqlsh> select count(*) from topics;



On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert robert.mul...@pearson.com
  wrote:

 I have a column family called topics which has a count of 47 on one
 node, 59 on another and 49 on another node. It was my understanding with a
 replication factor of 3 and 3 nodes in each ring that the nodes should be
 equal so I could lose a node in the ring and have no loss of data.  Based
 upon that I would expect the counts across the nodes to all be 59 in this
 case.


 In what specific way are you counting rows?

 =Rob



Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-04 Thread Mullen, Robert
The nodetool repair command (which took about 8 hours) seems to have sync'd
the data in us-east, all 3 nodes returning 59 for the count now.  I'm
wondering if this has more to do with changing the replication factor from
2 to 3 and how 2.0.2 reports the % owned rather than the upgrade itself.  I
still don't understand why it's reporting 16% for each node when 100% seems
to reflect the state of the cluster better.  I didn't find any info in
those issues you posted that would relate to the % changing from 100%
to 16%.


On Sat, Jan 4, 2014 at 12:26 PM, Mullen, Robert
robert.mul...@pearson.com wrote:

 from cql
 cqlsh> select count(*) from topics;



 On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert 
 robert.mul...@pearson.com wrote:

 I have a column family called topics which has a count of 47 on one
 node, 59 on another and 49 on another node. It was my understanding with a
 replication factor of 3 and 3 nodes in each ring that the nodes should be
 equal so I could lose a node in the ring and have no loss of data.  Based
 upon that I would expect the counts across the nodes to all be 59 in this
 case.


 In what specific way are you counting rows?

 =Rob





nodetool status owns % calculation after upgrade to 2.0.2

2014-01-03 Thread Mullen, Robert
Hello,
I have a multi region cluster with 3 nodes in each data center, ec2 us-east
and west.  Prior to upgrading to 2.0.2 from 1.2.6, the owns % of each
node was 100%, which made sense because I had a replication factor of 3 for
each data center.  After upgrading to 2.0.2 each node claims to own about
17% of the data now.


:~$ nodetool status
Datacenter: us-west-2
=
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.198.20.51   958.16 KB  256     16.9%  6a40b500-cff4-4513-b26b-ea33048c1590  usw2c
UN  10.198.18.125  776 KB     256     17.0%  aa746ed1-288c-414f-8d97-65fc867a5bdd  usw2b
UN  10.198.16.92   1.39 MB    256     16.4%  01989d0b-0f81-411b-a70e-f22f01189542  usw2a
Datacenter: us-east-1
=
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.198.0.249   1.11 MB    256     16.3%  22b30bea-5643-43b5-8d98-6e0eafe4af75  use1b
UN  10.198.4.80    1.22 MB    256     16.4%  e31aecd5-1eb1-4ddb-85ac-7a4135618b66  use1d
UN  10.198.2.20137.27 MB     256     17.0%  3253080f-09b6-47a6-9b66-da3d174d1101  use1c
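
In hindsight (and as the replies above work out), the arithmetic behind those
percentages is roughly:

-- without a keyspace argument, Owns is just each node's share of the token ring:
--   1 / 6 nodes ~ 16.7%   (which matches the 16.3%-17.0% shown)
-- with a keyspace argument, Owns (effective) factors in replication:
--   RF 3 per DC across 3 nodes per DC = 100% per node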

I checked some of the data on each of the nodes in one column family and
the counts are different across the nodes now.   I'm trying to run a
nodetool repair but it's been running for about 6 hours now.

So a couple of questions:
1.  Any idea why the owns % would have changed from 100% to 17% per node
after upgrade?
2. Is there anything else I can do to get the data back in sync between the
nodes other than nodetool repair?

thanks,
Rob