Distribution Factor: part of the solution to many-CF problem?

David Boxenhorn Mon, 21 Feb 2011 04:28:56 -0800

Cassandra is both distributed and replicated. We have Replication Factor but
no Distribution Factor!


Distribution Factor would define how many nodes a CF is distributed
over.

Say you want to support millions of multi-tenant users in clusters with
thousands of nodes, where you don't know the user's schema in advance, so
you can't have users share CFs.

In this case you wouldn't want to spread out each user's Column Families
over thousands of nodes! You would want something like RF=3, DF=10, i.e.
distribute each CF over 10 nodes and, within those 10 nodes, replicate 3
times.

One implementation of DF would be to hash the CF name, and use the same
strategies defined for RF to choose the N nodes in DF=N.
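
To make the idea concrete, here is a rough Python sketch of that scheme. It
is only an illustration, not Cassandra code: the function names (token,
walk_ring, cf_nodes, replicas) are made up for this example, MD5 is an
arbitrary choice of hash, each node is assumed to own a single token, and
the "strategy" is a simple clockwise walk around the ring.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def build_ring(node_names):
    """Return (token, node) pairs sorted by token; one token per node (assumption)."""
    return sorted((token(name), name) for name in node_names)


def walk_ring(ring, start_token, count):
    """Return up to `count` nodes walking clockwise from `start_token`."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, start_token)
    count = min(count, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def cf_nodes(ring, cf_name, df):
    """Hash the CF name and pick the DF nodes the CF is confined to."""
    return walk_ring(ring, token(cf_name), df)


def replicas(ring, cf_name, row_key, df, rf):
    """Pick the RF replicas of a row, restricted to the CF's DF nodes."""
    subset = set(cf_nodes(ring, cf_name, df))
    sub_ring = [(t, n) for t, n in ring if n in subset]
    return walk_ring(sub_ring, token(row_key), rf)


if __name__ == "__main__":
    ring = build_ring(f"node{i}" for i in range(1000))
    print(cf_nodes(ring, "tenant42_users", df=10))
    print(replicas(ring, "tenant42_users", "row1", df=10, rf=3))
```

With 1000 nodes, RF=3 and DF=10, every row of a given CF lands on 3 of the
same 10 nodes, so a query against that CF never has to touch the rest of
the cluster.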
