Many keyspaces pattern

Jonathan Ballet Tue, 24 Nov 2015 02:06:07 -0800

Hi,

We are running an application which produces a batch of several hundred gigabytes of data every night. Once a batch has been computed, it is never modified (neither updates nor deletes); we just keep producing new batches every day.

Now, we are *sometimes* interested in removing a specific batch altogether. At the moment, we accumulate all this data in a single keyspace, where every table has a batch ID column that is also part of the primary key. A sample table looks similar to this:


  CREATE TABLE computation_results (
      batch_id int,
      id1 int,
      id2 int,
      value double,
      PRIMARY KEY ((batch_id, id1), id2)
  ) WITH CLUSTERING ORDER BY (id2 ASC);

But we found out it is very difficult to remove a specific batch, as we need to know all the IDs to delete the entries, and it is both time and resource consuming (i.e. it takes a long time and I'm not sure it's going to scale at all).
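
To illustrate the problem, here is a rough sketch of what removing one batch from the sample table above involves. The batch ID 42 and the id1 values are made up for the example; the point is that every (batch_id, id1) partition has to be named explicitly:

  -- One delete per partition: the full partition key (batch_id, id1)
  -- must be given, so every id1 value of the batch has to be known.
  DELETE FROM computation_results WHERE batch_id = 42 AND id1 = 1;
  DELETE FROM computation_results WHERE batch_id = 42 AND id1 = 2;
  -- ... repeated for every id1 in the batch

  -- This is rejected, because id1 is part of the partition key
  -- and is missing from the WHERE clause:
  -- DELETE FROM computation_results WHERE batch_id = 42;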

So we are currently looking into putting each batch in a keyspace of its own, so that removing a batch is merely equivalent to dropping a keyspace. Potentially, this means we will end up with several hundred keyspaces in one cluster, although most of the time only the very last one will be used (we might still want to access the older ones, but that would be a very rare use case). At the moment, the keyspace has about 14 tables and is probably not going to evolve much.
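
As a sketch of what this could look like for one batch: the keyspace name batch_20151124 and the replication settings below are only placeholders for the example, not what we actually run. The batch_id column would no longer be needed, since the keyspace itself identifies the batch:

  -- Hypothetical keyspace for a single nightly batch.
  CREATE KEYSPACE batch_20151124
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

  -- Same sample table, without batch_id in the primary key.
  CREATE TABLE batch_20151124.computation_results (
      id1 int,
      id2 int,
      value double,
      PRIMARY KEY (id1, id2)
  ) WITH CLUSTERING ORDER BY (id2 ASC);

  -- Removing the whole batch later becomes a single statement.
  DROP KEYSPACE batch_20151124;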

Are there any downsides to using a lot of keyspaces (300+) in one Cassandra cluster?

Are there any good practices that we should follow?

After reading the "Anti-patterns in Cassandra > Too many keyspaces or tables", does it mean we should plan ahead and already split our keyspaces among several clusters?


Finally, would there be any other way to achieve what we want to do?

Thanks for your help!

 Jonathan
