Re: Docs: Token Selection

2011-06-17 Thread Jonathan Ellis
Replication location is determined by the row key, not the location of the client that inserted it. (Otherwise, without knowing what DC a row was inserted in, you couldn't look it up to read it!) On Fri, Jun 17, 2011 at 12:20 AM, AJ a...@dude.podzone.net wrote: On 6/16/2011 9:45 PM, aaron

Re: Docs: Token Selection

2011-06-17 Thread William Oberman
I haven't done it yet, but when I researched how to make geo-diverse/failover DCs, I figured I'd have to do something like RF=6, strategy = {DC1=3, DC2=3}, and LOCAL_QUORUM for reads/writes. This gives you an ack after 2 local nodes do the read/write, but the data eventually gets distributed to
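A minimal sketch of the arithmetic behind the setup described above, assuming a quorum is computed as floor(n/2) + 1 and that LOCAL_QUORUM counts only the replicas in the coordinator's own data center. The names `quorum` and `replicas_per_dc` are illustrative, not part of any Cassandra API.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set (assumed formula: floor(n/2) + 1)."""
    return replicas // 2 + 1


# Strategy options from the message above: three replicas in each data center.
replicas_per_dc = {"DC1": 3, "DC2": 3}

# The total replication factor is the sum over all data centers.
total_rf = sum(replicas_per_dc.values())

# LOCAL_QUORUM only waits for a majority of the replicas in the local DC.
local_quorum = {dc: quorum(n) for dc, n in replicas_per_dc.items()}

print("total RF:", total_rf)
print("acks needed per DC at LOCAL_QUORUM:", local_quorum)
```

With three local replicas the local majority is two, which matches the "ack after 2 local nodes" in the message.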

Re: Docs: Token Selection

2011-06-17 Thread AJ
Thanks Jonathan. I assumed since each data center owned the full key space that the first replica would be stored in the dc of the coordinating node, the 2nd in another dc, and the 3rd+ back in the 1st dc. But, are you saying that the first endpoint is selected regardless of the location of

Re: Docs: Token Selection

2011-06-17 Thread AJ
On 6/17/2011 7:26 AM, William Oberman wrote: I haven't done it yet, but when I researched how to make geo-diverse/failover DCs, I figured I'd have to do something like RF=6, strategy = {DC1=3, DC2=3}, and LOCAL_QUORUM for reads/writes. This gives you an ack after 2 local nodes do the

Re: Docs: Token Selection

2011-06-17 Thread Eric tamme
On Fri, Jun 17, 2011 at 12:07 PM, AJ a...@dude.podzone.net wrote: Thanks Jonathan.  I assumed since each data center owned the full key space that the first replica would be stored in the dc of the coordinating node, the 2nd in another dc, and the 3rd+ back in the 1st dc.  But, are you saying

Re: Docs: Token Selection

2011-06-17 Thread Eric tamme
What I don't like about NTS is I would have to have more replicas than I need.  {DC1=2, DC2=2}, RF=4 would be the minimum.  If I felt that 2 local replicas was insufficient, I'd have to move up to RF=6 which seems like a waste... I'm predicting data in the TB range so I'm trying to keep

Re: Docs: Token Selection

2011-06-17 Thread Sasha Dolgy
+1 for this if it is possible... On Fri, Jun 17, 2011 at 6:31 PM, Eric tamme eta...@gmail.com wrote: What I don't like about NTS is I would have to have more replicas than I need.  {DC1=2, DC2=2}, RF=4 would be the minimum.  If I felt that 2 local replicas was insufficient, I'd have to move up

RE: Docs: Token Selection

2011-06-17 Thread Jeremiah Jordan
Run two Cassandra clusters... -Original Message- From: Eric tamme [mailto:eta...@gmail.com] Sent: Friday, June 17, 2011 11:31 AM To: user@cassandra.apache.org Subject: Re: Docs: Token Selection What I don't like about NTS is I would have to have more replicas than I need.  {DC1=2

Re: Docs: Token Selection

2011-06-17 Thread AJ
+1 Yes, that is what I'm talking about, Eric. Maybe I could write my own strategy, I dunno. I'll have to understand more first. On 6/17/2011 10:37 AM, Sasha Dolgy wrote: +1 for this if it is possible... On Fri, Jun 17, 2011 at 6:31 PM, Eric tamme eta...@gmail.com wrote: What I don't like

Re: Docs: Token Selection

2011-06-17 Thread AJ
Hi Jeremiah, can you give more details? Thanks On 6/17/2011 10:49 AM, Jeremiah Jordan wrote: Run two Cassandra clusters... -Original Message- From: Eric tamme [mailto:eta...@gmail.com] Sent: Friday, June 17, 2011 11:31 AM To: user@cassandra.apache.org Subject: Re: Docs: Token

RE: Docs: Token Selection

2011-06-17 Thread Jeremiah Jordan
] Sent: Friday, June 17, 2011 1:02 PM To: user@cassandra.apache.org Subject: Re: Docs: Token Selection Hi Jeremiah, can you give more details? Thanks On 6/17/2011 10:49 AM, Jeremiah Jordan wrote: Run two Cassandra clusters... -Original Message- From: Eric tamme [mailto:eta...@gmail.com

Re: Docs: Token Selection

2011-06-17 Thread Eric tamme
Yes.  But, the more I think about it, the more I see issues.  Here is what I envision (Issues marked with *): Three or more dc's, each serving as fail-overs for the others with 1 maximum unavailable dc supported at a time. Each dc is a production dc serving users that I choose. Each dc also

Re: Docs: Token Selection

2011-06-17 Thread AJ
On 6/17/2011 12:33 PM, Eric tamme wrote: As I said previously, trying to make Cassandra treat things differently based on some kind of persistent locality set it maintains in memory, or whatever, sounds like you will be absolutely undermining the core principles of how Cassandra

Re: Docs: Token Selection

2011-06-17 Thread AJ
On 6/17/2011 12:32 PM, Jeremiah Jordan wrote: Run two clusters, one which has {DC1:2, DC2:1} and one which is {DC1:1,DC2:2}. You can't have both in the same cluster, otherwise it isn't possible to tell where the data got written when you want to read it. For a given key XYZ you must be

Re: Docs: Token Selection

2011-06-17 Thread Sasha Dolgy
Replication factor is defined per keyspace, if I'm not mistaken. Can't remember if NTS is per keyspace or per cluster ... if it's per keyspace, that would be a way around it ... without having to maintain multiple clusters, just have multiple keyspaces ... On Fri, Jun 17, 2011 at 9:23 PM, AJ

Re: Docs: Token Selection

2011-06-17 Thread AJ
On 6/17/2011 1:27 PM, Sasha Dolgy wrote: Replication factor is defined per keyspace, if I'm not mistaken. Can't remember if NTS is per keyspace or per cluster ... if it's per keyspace, that would be a way around it ... without having to maintain multiple clusters, just have multiple

Re: Docs: Token Selection

2011-06-16 Thread aaron morton
See this thread for background: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replica-data-distributing-between-racks-td6324819.html In a multi-DC environment, if you calculate the initial tokens for the entire cluster, data will not be evenly distributed. Cheers

Re: Docs: Token Selection

2011-06-16 Thread Eric tamme
AJ, sorry I seemed to miss the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data
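The per-data-center calculation can be sketched as follows. This assumes the RandomPartitioner token space of 2**127 and an evenly spaced layout, with each additional data center shifted by a small offset so no two nodes share a token (the wiki example quoted in this thread uses 0 for DC1 and 1 for DC2). The function name `dc_tokens` and its `offset` parameter are hypothetical.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def dc_tokens(node_count: int, offset: int = 0) -> list[int]:
    """Evenly spaced tokens for one data center, as if it were its own ring.

    `offset` shifts every token so that nodes in different data centers
    never end up with exactly the same token.
    """
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE
            for i in range(node_count)]


# Two nodes per data center, computed independently of each other.
dc1 = dc_tokens(2, offset=0)
dc2 = dc_tokens(2, offset=1)

for name, tokens in (("DC1", dc1), ("DC2", dc2)):
    for node, token in enumerate(tokens, start=1):
        print(f"{name} node {node}: {token}")
```

Each data center covers the full ring on its own, so the ranges of DC1 and DC2 overlap by design.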

Re: Docs: Token Selection

2011-06-16 Thread AJ
LOL, I feel Eric's pain. This double-ring thing can throw you for a loop since, like I said, there is only one place it is documented and it is only *implied*, so one is not sure he is interpreting it correctly. Even the source for NTS doesn't mention this. Thanks for everyone's help on

Re: Docs: Token Selection

2011-06-16 Thread AJ
Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat. Do you, or anyone else, know if the same rules for token assignment apply to ONTS? On 6/16/2011 7:21 AM, Eric tamme

Re: Docs: Token Selection

2011-06-16 Thread Sasha Dolgy
So, with ec2 ... 3 regions (DC's), each one is +1 from another? On Jun 16, 2011 3:40 PM, AJ a...@dude.podzone.net wrote: Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat.

Re: Docs: Token Selection

2011-06-16 Thread Eric tamme
On Thu, Jun 16, 2011 at 11:11 AM, Sasha Dolgy sdo...@gmail.com wrote: So, with ec2 ... 3 regions (DC's), each one is +1 from another? I don't use EC2, so I am not familiar with the specifics of deployment there. That said, if you have 3 data centers with equal nodes in each (so that you

Re: Docs: Token Selection

2011-06-16 Thread aaron morton
But, I'm thinking about using OldNetworkTopStrat. NetworkTopologyStrategy is where it's at. A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 17 Jun 2011, at 01:39, AJ wrote: Thanks Eric! I've finally got it! I feel like I've just

Re: Docs: Token Selection

2011-06-16 Thread AJ
On 6/16/2011 9:45 PM, aaron morton wrote: But, I'm thinking about using OldNetworkTopStrat. NetworkTopologyStrategy is where it's at. Oh yeah? It didn't look like it would serve my requirements. I want 2 full production geo-diverse data centers with each serving as a failover for the

Re: Docs: Token Selection

2011-06-15 Thread Vijay
The problem in the above approach is you have 2 nodes between 12 to 4 in DC1 but from 4 to 12 you just have 1 (which will cause uneven distribution of data across the nodes). It is easier to think of each DC as a ring, split it equally, and interleave the rings together: DC1 Node 1 : token 0 DC1 Node 2 :

Re: Docs: Token Selection

2011-06-15 Thread Vijay
Correction: "The problem in the above approach is you have 2 nodes between 12 to 4 in DC1 but from 4 to 12 you just have 1" should be "The problem in the above approach is you have 1 node between 0-4 (25%) and one node covering the rest, which is 4-16, 0-0 (75%)". Regards, /VJ On Wed, Jun
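The 25%/75% split in the correction above can be checked with a small sketch. It assumes the toy 16-token ring used in this thread and that the two DC1 nodes sit at tokens 0 and 4, which is inferred from the description rather than stated in full. A node is taken to own the range from the previous token in its own data center (exclusive) up to its own token (inclusive). The function name `ownership` is hypothetical.

```python
def ownership(tokens: list[int], ring_size: int) -> dict[int, float]:
    """Fraction of the ring owned by each token within one data center.

    Each node owns (previous_token, own_token], wrapping around the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        width = (token - previous) % ring_size or ring_size
        shares[token] = width / ring_size
    return shares


# Assumed uneven DC1 layout on a 16-token ring: nodes at tokens 0 and 4.
print(ownership([0, 4], 16))

# Evenly split DC1 layout for comparison: nodes at tokens 0 and 8.
print(ownership([0, 8], 16))
```

The node at token 4 owns only (0, 4], while the node at token 0 owns everything from 4 around to 0.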

Re: Docs: Token Selection

2011-06-15 Thread Vijay
All you heard is right... You are not overriding Cassandra's token assignment by saying here is your token... Logic is: Calculate a token for the given key... find the node in each region independently (If you use NTS and if you set the strategy options which says you want to replicate to the
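The logic described above (compute a token for the key, then find the owning node in each data center independently) can be sketched as below. This is a simplified illustration, not Cassandra's implementation: the MD5-based `token_for` is only a stand-in for the partitioner's hash, the ring size of 2**127 is assumed, and only the first replica per data center is located. All function names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space


def token_for(key: bytes) -> int:
    """Simplified stand-in for the partitioner: MD5 of the key, folded onto the ring."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def first_replica(token: int, dc_tokens: list[int]) -> int:
    """Token of the node that owns `token` within a single data center.

    A node owns (previous_token, own_token], so the owner is the first node
    whose token is >= the key's token, wrapping to the lowest token.
    """
    ordered = sorted(dc_tokens)
    index = bisect.bisect_left(ordered, token)
    return ordered[index % len(ordered)]


def replicas_by_dc(key: bytes, rings: dict[str, list[int]]) -> dict[str, int]:
    """Locate the first replica in every data center from the row key alone."""
    token = token_for(key)
    return {dc: first_replica(token, tokens) for dc, tokens in rings.items()}


rings = {
    "DC1": [0, 2 ** 126],
    "DC2": [1, 2 ** 126 + 1],
}
print(replicas_by_dc(b"XYZ", rings))
```

Because the result depends only on the key and the ring layout, any coordinator in any data center arrives at the same set of nodes for a given key.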

Re: Docs: Token Selection

2011-06-15 Thread AJ
Vijay, thank you for your thoughtful reply. Will Cass complain if I don't setup my tokens like in the examples? On 6/15/2011 2:41 PM, Vijay wrote: All you heard is right... You are not overriding Cassandra's token assignment by saying here is your token... Logic is: Calculate a token for

Re: Docs: Token Selection

2011-06-15 Thread Vijay
No, it won't; it will assume you are doing the right thing... Regards, /VJ On Wed, Jun 15, 2011 at 2:34 PM, AJ a...@dude.podzone.net wrote: Vijay, thank you for your thoughtful reply. Will Cass complain if I don't setup my tokens like in the examples? On 6/15/2011 2:41 PM, Vijay

Re: Docs: Token Selection

2011-06-15 Thread AJ
Ok. I understand the reasoning you laid out. But, I think it should be documented more thoroughly. I was trying to get an idea as to how flexible Cass lets you be with the various combinations of strategies, snitches, token ranges, etc. It would be instructional to see what a graphical

Re: Docs: Token Selection

2011-06-15 Thread Vijay
+1 for more documentation (I guess contributions are always welcomed) I will try to write it down sometime when we have a bit more time... 0.8 nodetool ring command adds the DC and RAC information http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers

Docs: Token Selection

2011-06-14 Thread AJ
This http://wiki.apache.org/cassandra/Operations#Token_selection says: With NetworkTopologyStrategy, you should calculate the tokens the nodes in each DC independantly. and gives the example: DC1 node 1 = 0 node 2 = 85070591730234615865843651857942052864 DC2 node 3 = 1 node 4 =

Re: Docs: Token Selection

2011-06-14 Thread Vijay
Yes... That's right... If you are trying to say the below... DC1: Node1 Owns 50% (Ranges 8..4 - 8..5, 8..5 - 0), Node2 Owns 50% (Ranges 0 - 1, 1 - 8..4). DC2: Node1 Owns 50% (Ranges 8..5 - 0, 0 - 1), Node2 Owns 50% (Ranges 1 - 8..4, 8..4 - 8..5). Regards, /VJ On Tue, Jun 14, 2011 at 3:47

Re: Docs: Token Selection

2011-06-14 Thread AJ
Yes, which means that the ranges overlap each other. Is this just a convention, or is it technically required when using NetworkTopologyStrategy? Would it be acceptable to split the ranges into quarters by ignoring the data centers, such as: DC1 node 1 = 0 Range: (12, 16], (0, 0] node