You can also scale not "horizontally" but "diagonally", i.e. RAID your SSDs and use multicore CPUs. That way you get the same performance with fewer nodes, which makes the cluster far easier to manage.
SSDs by themselves will give you an order-of-magnitude improvement in I/O.

On 1/19/2012 9:17 PM, Thorsten von Eicken wrote:
We're embarking on a project where we estimate we will need on the order of 100 Cassandra nodes. The data set is perfectly partitionable, meaning we have no queries that need access to all the data at once. We expect to run with RF=2 or RF=3. Is there some notion of an ideal cluster size? Or, asked differently, would it be easier to run one large cluster, or a bunch of, say, 16-node clusters? Everything we've done to date has fit into 4-5 node clusters.
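
To make the "bunch of 16-node clusters" option concrete, here is a rough sketch of what it implies for the application: about 100 nodes at 16 per cluster works out to 7 clusters, and since the data is perfectly partitionable, each partition key can be routed to exactly one of them. The function names (`cluster_count`, `cluster_for`) and the MD5-modulo routing are assumptions made up for illustration; they are not part of Cassandra or of any client driver.

```python
import hashlib
import math

TOTAL_NODES = 100   # rough estimate from the question
CLUSTER_SIZE = 16   # example cluster size from the question


def cluster_count(total_nodes: int, cluster_size: int) -> int:
    """Number of independent clusters needed to hold total_nodes."""
    return math.ceil(total_nodes / cluster_size)


def cluster_for(partition_key: str, num_clusters: int) -> int:
    """Map a partition key to a cluster index (hypothetical routing scheme).

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_clusters


if __name__ == "__main__":
    n = cluster_count(TOTAL_NODES, CLUSTER_SIZE)
    print(f"{TOTAL_NODES} nodes at {CLUSTER_SIZE} per cluster: {n} clusters")
    for key in ("customer-1", "customer-2", "customer-3"):
        print(key, "-> cluster", cluster_for(key, n))
```

Note that with plain modulo routing, changing the number of clusters remaps most keys, so the application has to own that migration. A single large cluster leaves data placement and rebalancing to Cassandra itself.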