Re: Keyspace and table/cf limits
Has there been any recent discussion on multi-tenancy namespaces? I think this would effectively solve the scenario: a formalized partition key that's enforced at the storage layer, similar to Oracle's Virtual Private Database. It was on the wiki from ~Aug 2010: http://wiki.apache.org/cassandra/MultiTenant

Namespaces - in a multi-tenant use case, each user might like to have a keyspace XYZ for whatever reason. So it might be nice to have namespaces so that keyspace XYZ could be specific to their user. Ideally this would be an option that would not affect those that don't use namespaces.

The distinction from keyspaces is that a namespace would be completely transparent to the user: the existence of namespaces would not be exposed. It might be returned by the authentication backend on login and prefixed to keyspaces transparently.

Thanks!

On Sat, Dec 6, 2014 at 11:25 PM, Jason Wee peich...@gmail.com wrote:

+1 well said Jack!

On Sun, Dec 7, 2014 at 6:13 AM, Jack Krupansky j...@basetechnology.com wrote:

Generally, limit a Cassandra cluster to low hundreds of tables, regardless of the number of keyspaces. Beyond low hundreds is certainly an “expert” feature and requires great care. Sure, maybe you can have 500 or 750 or maybe even 1,000 tables in a cluster, but don’t be surprised if you start running into memory and performance issues. There is an undocumented method to reduce the table overhead to support more tables, but... if you are not expert enough to find it on your own, then you are definitely not expert enough to be using it.

-- Jack Krupansky

*From:* Raj N raj.cassan...@gmail.com
*Sent:* Tuesday, November 25, 2014 12:07 PM
*To:* user@cassandra.apache.org
*Subject:* Keyspace and table/cf limits

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

-Raj

-- Frank Hsueh | frank.hs...@gmail.com
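The transparent-prefixing idea could be sketched in a few lines. This is not an existing Cassandra feature; the function name and the `__` separator are hypothetical, chosen only so the result stays a valid keyspace identifier:

```python
def physical_keyspace(namespace: str, keyspace: str) -> str:
    """Map a tenant's logical keyspace name to a physical one.

    The namespace would come from the authentication backend on login,
    so two tenants can each ask for a keyspace named 'xyz' without
    colliding. Cassandra keyspace names allow alphanumerics and
    underscores, so '__' keeps the prefixed name a valid identifier.
    """
    return f"{namespace}__{keyspace}"


# Example: two tenants, same logical keyspace, distinct physical ones.
a = physical_keyspace("tenant_a", "xyz")  # "tenant_a__xyz"
b = physical_keyspace("tenant_b", "xyz")  # "tenant_b__xyz"
```

The point of the proposal is that this mapping would happen inside Cassandra, invisible to the client; the sketch just shows the mechanics.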
Re: Keyspace and table/cf limits
Based on recent conversations with DataStax engineers, the recommendation is definitely still to run a finite and reasonable set of column families. The best way I know of to support multi-tenancy is to include a tenant ID in all of your partition keys.

On Fri Dec 05 2014 at 7:39:47 PM Kai Wang dep...@gmail.com wrote:

On Fri, Dec 5, 2014 at 4:32 PM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Dec 3, 2014 at 1:54 PM, Raj N raj.cassan...@gmail.com wrote:

The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do you think that's reasonable to support? I know that key cache capacity is still reserved in heap. Any plans to move it off-heap?

That's an order of magnitude more CFs than I would want to try to operate. But then, I wouldn't want to operate Cassandra multi-tenant AT ALL, so grain of salt.

=Rob
http://twitter.com/rcolidba

I don't know if it's still true, but Jonathan Ellis wrote in an old post that there's a fixed overhead per CF. Here is the link: http://dba.stackexchange.com/a/12413. Even if it's improved since C* 1.0, I still don't feel comfortable scaling my system by creating CFs.
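The tenant-ID-in-partition-key approach might look like the following sketch. The table, columns, and the `day` bucketing component are invented for illustration, not taken from any real schema:

```python
# Hypothetical schema: one shared table whose partition key starts with
# tenant_id, so each tenant's rows live on separate partitions and every
# query is tenant-scoped. The 'day' component bounds partition size so a
# large tenant does not become a single huge partition.
CREATE_ORDERS = """
CREATE TABLE IF NOT EXISTS shared.orders (
    tenant_id text,
    day       date,
    order_id  timeuuid,
    total     decimal,
    PRIMARY KEY ((tenant_id, day), order_id)
)
"""


def orders_query(tenant_id: str, day: str) -> tuple:
    """Build a statement plus bind values for one tenant's orders.

    tenant_id is always supplied by the application layer, so a tenant
    can never read another tenant's partition by accident.
    """
    return (
        "SELECT order_id, total FROM shared.orders "
        "WHERE tenant_id = %s AND day = %s",
        (tenant_id, day),
    )
```

With this layout, 200 clients share the same ~50 tables instead of needing 10,000 of them; isolation is enforced by the application always binding the tenant ID, which is exactly the limitation Jack raises below for clients with direct cluster access.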
Re: Keyspace and table/cf limits
Generally, limit a Cassandra cluster to low hundreds of tables, regardless of the number of keyspaces. Beyond low hundreds is certainly an “expert” feature and requires great care. Sure, maybe you can have 500 or 750 or maybe even 1,000 tables in a cluster, but don’t be surprised if you start running into memory and performance issues. There is an undocumented method to reduce the table overhead to support more tables, but... if you are not expert enough to find it on your own, then you are definitely not expert enough to be using it.

-- Jack Krupansky

From: Raj N
Sent: Tuesday, November 25, 2014 12:07 PM
To: user@cassandra.apache.org
Subject: Keyspace and table/cf limits

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

-Raj
Re: Keyspace and table/cf limits
There are two categorically distinct forms of multi-tenancy: 1) you control the apps and simply want client data isolation, and 2) the clients have their own apps, access the cluster directly, and use access control at the table level to isolate the client data.

Using a tenant ID in the partition key is the preferred approach and works well for the first use case, but it doesn’t provide the strict isolation of data needed for the second use case. Still, try to use that first approach if you can. You should also consider an application layer that intermediates between the tenant clients and the cluster, supplying the tenant ID in the partition key. That does add an extra hop for data access, but it is a cleaner design.

If you really do need to maintain separate tables and keyspaces, use what I call “sharded clusters” – multiple, independent clusters with a hash on the user/tenant ID to select which cluster to use – but limit each cluster to low hundreds of tables. It is worth noting that if each tenant needs to be isolated anyway, there is clearly no need to store independent tenants on the same cluster. You will have to do your own proof-of-concept implementation to determine what table limit works best for your use case.

-- Jack Krupansky

From: Raj N
Sent: Wednesday, December 3, 2014 4:54 PM
To: user@cassandra.apache.org
Subject: Re: Keyspace and table/cf limits

The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do you think that's reasonable to support? I know that key cache capacity is still reserved in heap. Any plans to move it off-heap?
-Raj

On Tue, Nov 25, 2014 at 3:10 PM, Robert Coli rc...@eventbrite.com wrote:

On Tue, Nov 25, 2014 at 9:07 AM, Raj N raj.cassan...@gmail.com wrote:

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

The most relevant recent changes would be:

https://issues.apache.org/jira/browse/CASSANDRA-6689
https://issues.apache.org/jira/browse/CASSANDRA-6694

These should meaningfully reduce the amount of heap that memtables consume. That heap can then be used to support more of the heap-persistent structures associated with many CFs, though I have no idea how to estimate the scale of the improvement.

As a general/meta statement, Cassandra is very multi-threaded and consumes file handles like crazy. How many different query cases do you really want to put on one cluster/node? ;D

=Rob
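The "sharded clusters" routing Jack describes above could be sketched as follows. The contact points are made-up hostnames, and the choice of MD5 is an assumption; the only requirement is a hash that is stable across processes and restarts:

```python
import hashlib

# Several independent Cassandra clusters, each kept under a few hundred
# tables. Contact points below are hypothetical.
CLUSTERS = [
    ["cass-a-1.example.com", "cass-a-2.example.com"],
    ["cass-b-1.example.com", "cass-b-2.example.com"],
    ["cass-c-1.example.com", "cass-c-2.example.com"],
]


def cluster_for_tenant(tenant_id: str) -> list:
    """Pick the cluster that owns a given tenant.

    Uses md5 rather than Python's built-in hash() so the mapping is
    deterministic across processes and language runtimes - a tenant must
    always route to the same cluster.
    """
    digest = hashlib.md5(tenant_id.encode("utf-8")).hexdigest()
    return CLUSTERS[int(digest, 16) % len(CLUSTERS)]
```

One design caveat with plain modulo sharding: adding a cluster remaps most tenants, so growing the fleet means migrating data (or switching to a scheme like consistent hashing, or a static tenant-to-cluster table).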
Re: Keyspace and table/cf limits
+1 well said Jack!

On Sun, Dec 7, 2014 at 6:13 AM, Jack Krupansky j...@basetechnology.com wrote:

Generally, limit a Cassandra cluster to low hundreds of tables, regardless of the number of keyspaces. Beyond low hundreds is certainly an “expert” feature and requires great care. Sure, maybe you can have 500 or 750 or maybe even 1,000 tables in a cluster, but don’t be surprised if you start running into memory and performance issues. There is an undocumented method to reduce the table overhead to support more tables, but... if you are not expert enough to find it on your own, then you are definitely not expert enough to be using it.

-- Jack Krupansky

*From:* Raj N raj.cassan...@gmail.com
*Sent:* Tuesday, November 25, 2014 12:07 PM
*To:* user@cassandra.apache.org
*Subject:* Keyspace and table/cf limits

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

-Raj
Re: Keyspace and table/cf limits
On Wed, Dec 3, 2014 at 1:54 PM, Raj N raj.cassan...@gmail.com wrote:

The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do you think that's reasonable to support? I know that key cache capacity is still reserved in heap. Any plans to move it off-heap?

That's an order of magnitude more CFs than I would want to try to operate. But then, I wouldn't want to operate Cassandra multi-tenant AT ALL, so grain of salt.

=Rob
http://twitter.com/rcolidba
Re: Keyspace and table/cf limits
On Fri, Dec 5, 2014 at 4:32 PM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Dec 3, 2014 at 1:54 PM, Raj N raj.cassan...@gmail.com wrote:

The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do you think that's reasonable to support? I know that key cache capacity is still reserved in heap. Any plans to move it off-heap?

That's an order of magnitude more CFs than I would want to try to operate. But then, I wouldn't want to operate Cassandra multi-tenant AT ALL, so grain of salt.

=Rob
http://twitter.com/rcolidba

I don't know if it's still true, but Jonathan Ellis wrote in an old post that there's a fixed overhead per CF. Here is the link: http://dba.stackexchange.com/a/12413. Even if it's improved since C* 1.0, I still don't feel comfortable scaling my system by creating CFs.
Re: Keyspace and table/cf limits
The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do you think that's reasonable to support? I know that key cache capacity is still reserved in heap. Any plans to move it off-heap?

-Raj

On Tue, Nov 25, 2014 at 3:10 PM, Robert Coli rc...@eventbrite.com wrote:

On Tue, Nov 25, 2014 at 9:07 AM, Raj N raj.cassan...@gmail.com wrote:

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

The most relevant recent changes would be:

https://issues.apache.org/jira/browse/CASSANDRA-6689
https://issues.apache.org/jira/browse/CASSANDRA-6694

These should meaningfully reduce the amount of heap that memtables consume. That heap can then be used to support more of the heap-persistent structures associated with many CFs, though I have no idea how to estimate the scale of the improvement.

As a general/meta statement, Cassandra is very multi-threaded and consumes file handles like crazy. How many different query cases do you really want to put on one cluster/node? ;D

=Rob
Re: Keyspace and table/cf limits
We had a similar problem: multi-tenancy plus multiple-DC support. But we did not really have a strict requirement of one keyspace per tenant; our row keys allow us to put any number of tenants per keyspace. So, on one side, we could put all data in a single keyspace for all tenants and size the cluster for it - at the end the total amount of data would be the same :)

However, we wanted different replication strategies for different customers, and the replication strategy is a keyspace setting, so it would be simpler to have one keyspace per customer. The cost, as was mentioned, is per CF: the more keyspaces we have, the more CFs we have, and we did not want that number to get too high.

The decision we've made was to have something in between. We'd define a number of keyspaces with different replication strategies (possibly even duplicate ones) and map tenants to these keyspaces. Thus, there would be a couple of tenants in one keyspace, all sharing the same properties (replication strategy in our case). We could even create a keyspace that groups some tenants that currently share the same replication requirements and that may be moved/replicated to a specific DC in the future.

On Wed, Dec 3, 2014 at 4:54 PM, Raj N raj.cassan...@gmail.com wrote:

The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do you think that's reasonable to support? I know that key cache capacity is still reserved in heap. Any plans to move it off-heap?

-Raj

On Tue, Nov 25, 2014 at 3:10 PM, Robert Coli rc...@eventbrite.com wrote:

On Tue, Nov 25, 2014 at 9:07 AM, Raj N raj.cassan...@gmail.com wrote:

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

The most relevant recent changes would be:

https://issues.apache.org/jira/browse/CASSANDRA-6689
https://issues.apache.org/jira/browse/CASSANDRA-6694

These should meaningfully reduce the amount of heap that memtables consume. That heap can then be used to support more of the heap-persistent structures associated with many CFs, though I have no idea how to estimate the scale of the improvement.

As a general/meta statement, Cassandra is very multi-threaded and consumes file handles like crazy. How many different query cases do you really want to put on one cluster/node? ;D

=Rob

-- Nikolai Grigoriev (514) 772-5178
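The tenant-to-keyspace grouping Nikolai describes could be sketched like this. Keyspace names, tenant names, and DC names are invented for illustration; the replication options mirror Cassandra's `NetworkTopologyStrategy` settings:

```python
# A small, fixed set of keyspaces, each with its own replication
# profile, and a mapping that assigns tenants to one of them. Per-CF
# overhead then grows with the number of replication profiles, not with
# the number of tenants. All names below are hypothetical.
KEYSPACE_REPLICATION = {
    "ks_dc1_rf3": {"class": "NetworkTopologyStrategy", "DC1": 3},
    "ks_dc1_dc2_rf3": {"class": "NetworkTopologyStrategy", "DC1": 3, "DC2": 3},
}

TENANT_KEYSPACE = {
    "acme": "ks_dc1_rf3",
    "globex": "ks_dc1_dc2_rf3",
    "initech": "ks_dc1_rf3",  # tenants with the same needs share a keyspace
}


def keyspace_for(tenant_id: str) -> str:
    """Resolve which keyspace holds a tenant's data.

    Row keys still carry the tenant, so tenants can later be regrouped
    (e.g. moved to a DC-specific keyspace) by updating this mapping and
    migrating their rows.
    """
    return TENANT_KEYSPACE[tenant_id]
```

With 200 clients and only a handful of replication profiles, this keeps the CF count at roughly 50 tables per profile rather than 50 per client.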
Keyspace and table/cf limits
What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x? -Raj
Re: Keyspace and table/cf limits
On Tue, Nov 25, 2014 at 9:07 AM, Raj N raj.cassan...@gmail.com wrote:

What's the latest on the maximum number of keyspaces and/or tables that one can have in Cassandra 2.1.x?

The most relevant recent changes would be:

https://issues.apache.org/jira/browse/CASSANDRA-6689
https://issues.apache.org/jira/browse/CASSANDRA-6694

These should meaningfully reduce the amount of heap that memtables consume. That heap can then be used to support more of the heap-persistent structures associated with many CFs, though I have no idea how to estimate the scale of the improvement.

As a general/meta statement, Cassandra is very multi-threaded and consumes file handles like crazy. How many different query cases do you really want to put on one cluster/node? ;D

=Rob