Thanks, Ran. This helps a little, but unfortunately it's still a bit
fuzzy for me. So is it not true that each node contains all the data
in the cluster? I haven't come across any information on how clustered
data is coordinated in Cassandra. How does my query get directed to
the right node?
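To make the "directed to the right node" part concrete, here is a minimal Python sketch of token-ring placement. It assumes an MD5-based partitioner (in the spirit of Cassandra's RandomPartitioner) and a simple clockwise replica strategy (like SimpleStrategy). The node names, the evenly spaced tokens, and the helpers `token_for` and `replicas_for` are hypothetical, for illustration only. In a real cluster each node learns the ring layout via gossip, and whichever node a client contacts acts as coordinator and forwards the request to the replicas.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed by this sketch


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5-based, like RandomPartitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(key: str, ring: dict[int, str], rf: int) -> list[str]:
    """Return the rf nodes holding `key`; `ring` maps token -> node name."""
    tokens = sorted(ring)
    if not 1 <= rf <= len(tokens):
        raise ValueError("rf must be between 1 and the number of nodes")
    # The first node whose token is >= the key's token owns the key;
    # keys hashing past the highest token wrap around to the lowest one.
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    # The remaining rf - 1 replicas go to the next nodes clockwise.
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]


if __name__ == "__main__":
    n = 6
    # Hypothetical cluster: six nodes with evenly spaced tokens.
    ring = {i * RING_SIZE // n: f"node{i}" for i in range(n)}
    for key in ["user:42", "user:43", "order:7"]:
        print(key, "->", replicas_for(key, ring, rf=3))
```

Each key lands on R consecutive nodes rather than on every node, so with N >> R each node holds only a slice of the data.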

On Thu, Dec 9, 2010 at 11:35 AM, Ran Tavory <ran...@gmail.com> wrote:
> There are two numbers to look at: N, the number of hosts in the ring
> (cluster), and R, the number of replicas for each data item. R is configurable
> per keyspace.
> Typically for large clusters N >> R. For very small clusters it makes sense
> for R to be close to N, in which case Cassandra is useful because the database
> doesn't have a single point of failure, not so much because of the
> size of the data. But for large clusters it rarely makes sense to have N = R;
> usually N >> R.
>
> On Thu, Dec 9, 2010 at 12:28 PM, Jonathan Colby <jonathan.co...@gmail.com>
> wrote:
>>
>> I have a very basic question that I have been unable to answer from the
>> online documentation on Cassandra.
>>
>> It seems like every node in a Cassandra cluster contains all the data
>> ever stored in the cluster (i.e., all nodes are identical). I don't
>> understand how you can scale this on commodity servers with merely
>> internal hard disks. In other words, if I want to store 5 TB of
>> data, does that mean each node needs a hard disk capacity of 5 TB?
>>
>> With HBase, memcached and other NoSQL solutions it is clearer how
>> data is split up in the cluster and replicated for fault tolerance.
>> Again, please excuse the rather basic question.
>
>
>
> --
> /Ran
>
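On the 5 TB question: assuming tokens are balanced so that each node owns roughly 1/N of the ring, each node stores about (data size × R) / N. The helper below (hypothetical name `per_node_storage_tb`) is a back-of-the-envelope sketch only; it ignores overhead such as compaction headroom.

```python
def per_node_storage_tb(data_tb: float, n_nodes: int, rf: int) -> float:
    """Rough disk need per node, assuming data is spread evenly over the ring."""
    if not 1 <= rf <= n_nodes:
        raise ValueError("rf must be between 1 and n_nodes")
    # Every item is stored rf times, and those copies are split across n_nodes.
    return data_tb * rf / n_nodes


print(per_node_storage_tb(5, 10, 3))  # 1.5
print(per_node_storage_tb(5, 3, 3))   # 5.0 (N = R: every node holds everything)
```

Only in the N = R case does every node need room for the full 5 TB; adding nodes while keeping R fixed shrinks the per-node share.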
