Hi there, I am currently in the planning stage of a new web application that should use Cassandra because of its scalability.
I would like to ask a few questions to make sure I have fully understood how Cassandra handles certain cases. If there is documentation I missed, or a place where more details are available, please point me in that direction :-)

Removal of data:
If I delete data from my cluster, will some nodes end up over time with more or less data than the average node? Will this lead to an imbalanced distribution of data, or will Cassandra move data between nodes to keep them evenly used?

-----

Server-Load:
Suppose a small portion of data is read very often and unfortunately all sits on the same node. Will this lead to an unbalanced server load, or does Cassandra also distribute data based on how often it is accessed?

There is this comment in the auto_bootstrap documentation:
(If no InitialToken is specified, they will pick one such that they will get half the range of the most-loaded node.)
Does "most-loaded" refer to CPU load or to data load/storage?

-----

Node down:
If a node goes down and takes all its data with it, will a new node with auto_bootstrap set to true replace it, or do I need to specify the token of the lost node?

Thank you in advance for your help,
Mario