Awesome! Thank you guys for the really quick answers and the links to the presentations.
On Thu, Dec 9, 2010 at 12:06 PM, Sylvain Lebresne <sylv...@yakaz.com> wrote:
>> This helps a little but unfortunately it's still a bit fuzzy for me. So is it
>> not true that each node contains all the data in the cluster?
>
> Not at all. Basically each node is responsible for only a part of the data (a
> range, really). But for each piece of data you can choose how many nodes it
> is stored on; this is the Replication Factor.
>
> For instance, if you choose to have RF=1, then each piece of data will be on
> exactly one node (this is usually a bad idea since it offers very weak
> durability guarantees but nevertheless, it can be done).
>
> If you choose RF=3, each piece of data is on 3 nodes (independently of the
> number of nodes your cluster has). You can have all data on all nodes, but for
> that you'll have to choose RF=#{nodes in the cluster}. But this is a very
> degenerate case.
>
>> how does my query get directed to the right node?
>
> Each node in the cluster knows the ranges of data every other node holds. I
> suggest you watch the first video linked on this page:
> http://wiki.apache.org/cassandra/ArticlesAndPresentations
> It explains this and more.
>
> --
> Sylvain
>
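
For anyone reading this thread later, here is a rough sketch of the idea described above: each node owns a range, and RF decides how many nodes hold each piece of data. It is a toy model, not Cassandra's actual code. The `Ring` class, the `token_for` helper, the MD5-based token and the "next nodes clockwise" replica placement are all assumptions made for illustration.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a position on the ring (hypothetical helper, MD5-based)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: every node knows all tokens, so any node can route a key."""

    def __init__(self, node_tokens: dict, replication_factor: int):
        if not 1 <= replication_factor <= len(node_tokens):
            raise ValueError("RF must be between 1 and the number of nodes")
        # Sort nodes by token; in this model a node owns the range that
        # ends at its own token.
        self._ring = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]
        self.rf = replication_factor

    def replicas_for(self, key: str) -> list:
        # First node whose token is >= the key's token, wrapping around
        # to the start of the ring if the key is past the last token.
        start = bisect.bisect_left(self._tokens, token_for(key)) % len(self._ring)
        # Assumed placement: the owner plus the next RF-1 nodes on the ring.
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.rf)]


if __name__ == "__main__":
    max_token = 2 ** 128  # MD5 produces 128-bit values
    # Four hypothetical nodes with evenly spaced tokens.
    nodes = {f"node{i}": i * max_token // 4 for i in range(4)}
    print(Ring(nodes, replication_factor=1).replicas_for("user:42"))  # one node
    print(Ring(nodes, replication_factor=3).replicas_for("user:42"))  # three nodes
    print(Ring(nodes, replication_factor=4).replicas_for("user:42"))  # every node
```

With RF=4 on a 4-node ring, every key lands on every node, which is the degenerate "all data on all nodes" case mentioned above. Since the token list is the same everywhere, whichever node receives a query can compute the same replica list and forward the request.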