R: Re: AW: How to control location of data?

cbert...@libero.it Tue, 10 Jan 2012 07:08:39 -0800

Each node of the ring has a unique token, which represents the node's logical position in the cluster.
When you perform an operation on a row, a token is calculated from that row's key. The node whose token is "closest" to the row token (the first node token equal to or greater than it, going clockwise around the ring and wrapping past the highest token) stores the data, and so do RF-1 further nodes (RF being the replication factor). This technique should keep data balanced across the cluster if you use the RandomPartitioner.
Regards,
Carlo
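A minimal Python sketch of the placement Carlo describes. It assumes a RandomPartitioner-style token (the MD5 digest of the row key read as a non-negative integer) and SimpleStrategy-style placement, where the remaining RF-1 replicas are simply the next distinct nodes clockwise. The ring contents and function names are hypothetical; this models the idea and is not Cassandra's own code.

```python
import bisect
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Model of a RandomPartitioner-style token: the MD5 digest of the key,
    read as a signed big-endian integer and made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replicas_for_token(ring: dict[int, str], token: int, rf: int) -> list[str]:
    """Walk the ring clockwise from `token`.

    ring maps node token -> node name (hypothetical data). The first node
    whose token is >= the row token holds the first replica; the next rf-1
    distinct nodes clockwise hold the others (SimpleStrategy-style).
    """
    tokens = sorted(ring)
    # bisect_left finds the first node token >= token; % wraps past the highest one
    start = bisect.bisect_left(tokens, token) % len(tokens)
    replicas = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas
```

For example, on the hypothetical ring `{0: "A", 2**125: "B", 2**126: "C", 3 * 2**125: "D"}`, a row token of `2**125 + 1` falls just past B's token, so `replicas_for_token(ring, 2**125 + 1, 3)` returns `["C", "D", "A"]`. None of the three is an "original"; they are all replicas.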




----Original message----
From: andreas.rudo...@spontech-spine.com
Date: 10/01/2012 15:05
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: AW: How to control location of data?



Hi!
Thank you for your last reply. I'm still wondering if I got you right...

... A partitioner decides into which partition a piece of data belongs

Does your statement imply that the partitioner makes no decisions at all about the (physical) storage location? Or, put another way: what do you mean by "partition"?
To quote http://wiki.apache.org/cassandra/ArchitectureInternals: "... AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc. replicas of each key range. Primary replica is always determined by the token ring (...)"
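To make "partition" concrete: in a token-ring model, each node owns the range of tokens between the previous node's token (exclusive) and its own token (inclusive), and the lowest-token node's range wraps around the end of the ring. The following Python sketch, with hypothetical node names and tokens, computes those primary ranges and checks which range a row token falls in. In this model the partitioner's only job is to turn a key into a token; which node owns that token is decided by the ring.

```python
def primary_ranges(node_tokens: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each node to its primary range (left, right], i.e. tokens t with
    left < t <= right; the node with the lowest token wraps around the ring."""
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    ranges = {}
    for i, (node, token) in enumerate(ordered):
        # For i == 0, ordered[-1] is the highest token, so the range wraps
        previous_token = ordered[i - 1][1]
        ranges[node] = (previous_token, token)
    return ranges


def in_range(token: int, left: int, right: int) -> bool:
    """True if token lies in the ring range (left, right]."""
    if left < right:
        return left < token <= right
    # Wrapping range; when left == right the range covers the whole ring
    return token > left or token <= right


node_tokens = {"A": 0, "B": 100, "C": 200}  # hypothetical tokens
ranges = primary_ranges(node_tokens)
owner = next(node for node, (left, right) in ranges.items() if in_range(150, left, right))
print(ranges)  # {'A': (200, 0), 'B': (0, 100), 'C': (100, 200)}
print(owner)   # C
```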
... You can select different placement strategies and partitioners for different keyspaces, thereby choosing known data to be stored on known hosts. This is however discouraged for various reasons, e.g. you need a lot of knowledge about your data to keep the cluster balanced. What is your use case for this requirement? There is probably a more suitable solution.

What we want is to partition the cluster with respect to keyspaces. That is, we want to establish an association between nodes and keyspaces so that a node of the cluster holds data from a keyspace if and only if that node is a *member* of that keyspace.
To our knowledge, Cassandra has no built-in way to specify such a membership relation. Therefore we thought of implementing our own replica placement strategy, until we began to suspect that the partitioner would have to be replaced too in order to accomplish the task.
Do you have any ideas?
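The membership relation described above can be modeled as a placement rule that walks the ring clockwise from the row's token, as usual, but skips every node that is not a member of the keyspace. This is a hypothetical Python sketch: the `member_replicas` function, the ring, and the membership sets are illustrative assumptions, and it is not Cassandra's AbstractReplicationStrategy API. It only shows that, in this model, restricting nodes per keyspace happens in the placement step while the token computation stays unchanged.

```python
import bisect


def member_replicas(ring: dict[int, str], members: set[str], token: int, rf: int) -> list[str]:
    """Hypothetical keyspace-membership placement.

    ring maps node token -> node name. Starting at the first node token >= the
    row token (wrapping around), collect up to rf distinct nodes, skipping any
    node that is not a member of the keyspace. Returns fewer than rf nodes if
    the keyspace has fewer members.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token)
    chosen = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node in members and node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen


ring = {0: "A", 100: "B", 200: "C", 300: "D"}          # hypothetical ring
keyspace_members = {"sales": {"B", "D"}, "hr": {"A"}}  # hypothetical membership
print(member_replicas(ring, keyspace_members["sales"], 150, 2))  # ['D', 'B']
print(member_replicas(ring, keyspace_members["hr"], 150, 3))     # ['A']
```

In this model a member node's share of a keyspace depends on the gap back to the previous *member* token, so tokens that are evenly spaced across the whole ring do not guarantee an even spread within one keyspace.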

From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
Sent: Tuesday, 10 January 2012 09:53
To: user@cassandra.apache.org
Subject: How to control location of data?

Hi!

We're evaluating Cassandra for our storage needs. One of the key benefits we see is the online replication of the data, i.e. an easy way to share data across nodes. However, we need to control precisely on which group of nodes specific parts of a keyspace (columns/column families) are stored. We're having trouble understanding the documentation. Could anyone help us find answers to our questions?

·  What does the term "replica" mean? If a key is stored on exactly three nodes in a cluster, is it correct to say that there are three replicas of that key, or are there just two replicas (copies) and one original?
·  What is the relation between the Cassandra concepts "Partitioner" and "Replica Placement Strategy"? According to documentation found on the DataStax web site and the architecture internals page of the Cassandra wiki, the first storage location of a key (and its associated data) is determined by the "Partitioner", whereas additional storage locations are defined by the "Replica Placement Strategy". I'm wondering whether I could completely redefine how nodes are selected to store a key just by implementing my own subclass of AbstractReplicationStrategy and configuring that subclass for the keyspace.
·  How can I prevent the "Partitioner" from being consulted at all to determine which node stores a key first?
·  Is a keyspace always distributed across the whole cluster? Is it possible to configure Cassandra in such a way that more or less freely chosen parts of a keyspace (columns) are stored on arbitrarily chosen nodes?

Any tips would be much appreciated :-)
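On the last question: the earlier reply mentions choosing known data to be stored on known hosts. In a token-ring model, that is what an order-preserving partitioner allows, because the token is the row key itself and you pick node tokens to divide the range of row keys between nodes. The Python sketch below is hypothetical (names and ring invented for illustration; it models the idea rather than Cassandra's ByteOrderedPartitioner class). Rows are placed by key, so in this model all columns of a row land on the same node.

```python
import bisect


def order_preserving_token(key: bytes) -> bytes:
    """Order-preserving model: the token is the row key itself, so keys keep
    their lexicographic order on the ring."""
    return key


def primary_node(ring: dict[bytes, str], token: bytes) -> str:
    """First node whose token is >= the row token, wrapping to the lowest token."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token) % len(tokens)
    return ring[tokens[i]]


# Hypothetical ring: node1 takes keys up to b"g", node2 up to b"n", node3 up to b"zzzz"
ring = {b"g": "node1", b"n": "node2", b"zzzz": "node3"}
for key in (b"apple", b"kiwi", b"pear"):
    print(key, primary_node(ring, order_preserving_token(key)))
# b'apple' node1, b'kiwi' node2, b'pear' node3
```

Keys sorting after `b"zzzz"` wrap around to node1. The trade-off is the one the earlier reply warns about: if most keys start with, say, "a" to "f", node1 receives most of the data.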
