I have a very basic question whose answer I have been unable to find in the online documentation on Cassandra.
It seems like every node in a Cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how this can scale on commodity servers with only internal hard disks. In other words, if I want to store 5 TB of data, does each node need 5 TB of disk capacity? With HBase, memcached, and other NoSQL solutions, it is clearer how data is split up across the cluster and replicated for fault tolerance. Again, please excuse the rather basic question.