Re: Keyspace and table/cf limits

2014-12-08 Thread Frank Hsueh
Has there been any recent discussion of multi-tenancy namespaces? I think
this would effectively solve the scenario: a formalized partition key
that's enforced at the storage layer, similar to Oracle's Virtual Private
Database.

It was on the wiki as far back as ~Aug 2010:

http://wiki.apache.org/cassandra/MultiTenant

Namespaces - in a multi-tenant use case, each user might like to have a
keyspace XYZ for whatever reason, so it might be nice to have namespaces so
that keyspace XYZ could be specific to that user. Ideally this would be an
option that would not affect those who don't use namespaces.

   - The distinction from keyspaces is that a namespace would be completely
   transparent to the user: the existence of namespaces would not be exposed.
   The namespace might be returned by the authentication backend on login,
   and prefixed to keyspaces transparently.
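A minimal sketch of how that transparent prefixing could work. The naming scheme below is made up for illustration; nothing like this exists in Cassandra:

```python
# Hypothetical sketch of the namespace proposal: the authentication
# backend hands back a namespace at login, and it is prefixed to every
# keyspace name behind the scenes, so two tenants can each have a
# keyspace named "XYZ" without colliding.
def physical_keyspace(namespace: str, logical_keyspace: str) -> str:
    # The user only ever sees the logical name; the storage layer
    # works with the prefixed form.
    return f"{namespace}__{logical_keyspace}"

print(physical_keyspace("tenant_a", "XYZ"))  # tenant_a__XYZ
print(physical_keyspace("tenant_b", "XYZ"))  # tenant_b__XYZ
```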



thanks !!!




-- 
Frank Hsueh | frank.hs...@gmail.com


Re: Keyspace and table/cf limits

2014-12-06 Thread Eric Stevens
Based on recent conversations with Datastax engineers, the recommendation
is definitely still to run a finite and reasonable set of column families.

The best way I know of to support multitenancy is to include tenant id in
all of your partition keys.
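As a sketch of what "tenant id in all of your partition keys" looks like in practice (the table and column names below are invented, not from the thread): every table is shared, and its partition key leads with the tenant, so reads and writes are scoped to one tenant by construction.

```python
# Hypothetical shared-table schema: tenant_id leads the partition key.
CREATE_EVENTS = """
CREATE TABLE shared.events (
    tenant_id text,
    event_id  timeuuid,
    payload   text,
    PRIMARY KEY ((tenant_id), event_id)
)
"""

def events_query(tenant_id: str) -> str:
    # Every query must name a tenant; in real code this would be a
    # prepared statement with a bind variable, not string formatting.
    return ("SELECT event_id, payload FROM shared.events "
            f"WHERE tenant_id = '{tenant_id}'")

print(events_query("acme"))
```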





Re: Keyspace and table/cf limits

2014-12-06 Thread Jack Krupansky
Generally, limit a Cassandra cluster to the low hundreds of tables, regardless
of the number of keyspaces. Going beyond the low hundreds is certainly an
“expert” feature and requires great care. Sure, maybe you can have 500 or 750
or maybe even 1,000 tables in a cluster, but don’t be surprised if you start
running into memory and performance issues.

There is an undocumented method to reduce the table overhead to support more 
tables, but... if you are not expert enough to find it on your own, then you 
are definitely not expert enough to be using it.

-- Jack Krupansky

From: Raj N 
Sent: Tuesday, November 25, 2014 12:07 PM
To: user@cassandra.apache.org 
Subject: Keyspace and table/cf limits

What's the latest on the maximum number of keyspaces and/or tables that one can 
have in Cassandra 2.1.x? 

-Raj

Re: Keyspace and table/cf limits

2014-12-06 Thread Jack Krupansky
There are two categorically distinct forms of multi-tenancy: 1) you control
the apps and simply want client data isolation, and 2) the clients have their
own apps, access the cluster directly, and rely on access control at the
table level to isolate their data.

Using a tenant ID in the partition key is the preferred approach and works well 
for the first use case, but it doesn’t provide the strict isolation of data 
needed for the second use case. Still, try to use that first approach if you 
can.

You should also consider an application layer which would intermediate between 
the tenant clients and the cluster, supplying the tenant ID in the partition 
key. That does add an extra hop for data access, but is a cleaner design.
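One way to picture that intermediating layer (all names here are hypothetical): the service resolves the tenant from the authenticated session and stamps it into every partition key itself, so a client can never name another tenant's data.

```python
class TenantGateway:
    """Thin data-access layer between tenant clients and the cluster.
    A sketch of the idea, not a real API."""

    def __init__(self, session_tenant: str):
        # Supplied by the authentication layer, never by the client.
        self._tenant = session_tenant

    def partition_key(self, entity_id: str) -> tuple:
        # The gateway, not the caller, contributes the tenant component.
        return (self._tenant, entity_id)

gw = TenantGateway("acme")
print(gw.partition_key("order-42"))  # ('acme', 'order-42')
```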

If you really do need to maintain separate tables and keyspaces, use what I 
call “sharded clusters” – multiple, independent clusters with a hash on the 
user/tenant ID to select which cluster to use – but limit each cluster to the 
low hundreds of tables. It is worth noting that if each tenant needs to be 
isolated anyway, there is clearly no need to store independent tenants on the 
same cluster.
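The cluster-selection step of that "sharded clusters" idea can be sketched as a stable hash of the tenant ID. The cluster names are placeholders, and md5 is used only because Python's built-in hash() is salted per process:

```python
import hashlib

CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]  # placeholder names

def cluster_for_tenant(tenant_id: str) -> str:
    # A stable digest keeps the tenant-to-cluster mapping identical
    # across processes and restarts.
    digest = hashlib.md5(tenant_id.encode("utf-8")).digest()
    return CLUSTERS[int.from_bytes(digest[:8], "big") % len(CLUSTERS)]

print(cluster_for_tenant("tenant-1"))
```

Note that growing the cluster list remaps most tenants; a production version would want consistent hashing or an explicit tenant-to-cluster directory.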

You will have to do your own proof of concept implementation to determine what 
table limit works best for your use case.

-- Jack Krupansky

From: Raj N 
Sent: Wednesday, December 3, 2014 4:54 PM
To: user@cassandra.apache.org 
Subject: Re: Keyspace and table/cf limits

The question is more from a multi-tenancy point of view. We wanted to see if we 
can have a keyspace per client. Each keyspace may have 50 column families, but 
if we have 200 clients, that would be 10,000 column families. Do you think 
that's reasonable to support? I know that key cache capacity is reserved in 
heap still. Any plans to move it off-heap? 

-Raj


Re: Keyspace and table/cf limits

2014-12-06 Thread Jason Wee
+1 well said Jack!



Re: Keyspace and table/cf limits

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 1:54 PM, Raj N raj.cassan...@gmail.com wrote:

 The question is more from a multi-tenancy point of view. We wanted to see
 if we can have a keyspace per client. Each keyspace may have 50 column
 families, but if we have 200 clients, that would be 10,000 column families.
 Do you think that's reasonable to support? I know that key cache capacity
 is reserved in heap still. Any plans to move it off-heap?


That's an order of magnitude more CFs than I would want to try to operate.

But then, I wouldn't want to operate Cassandra multi-tenant AT ALL, so
grain of salt.

=Rob
http://twitter.com/rcolidba


Re: Keyspace and table/cf limits

2014-12-05 Thread Kai Wang


I don't know if it's still true, but Jonathan Ellis wrote in an old post that
there's a fixed overhead per CF. Here is the link:
http://dba.stackexchange.com/a/12413. Even if it has improved since C* 1.0, I
still don't feel comfortable scaling my system by creating CFs.
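The linked answer cites on the order of 1 MB of minimum memtable overhead per CF in the Cassandra 1.0 era. Taking that figure as an assumption (it may well differ in later versions), the arithmetic for the 200-client scenario discussed in this thread makes the concern concrete:

```python
# Back-of-the-envelope heap cost; the ~1 MB/CF figure is taken from the
# linked stackexchange answer (C* 1.0 era) and is an assumption here.
PER_CF_OVERHEAD_MB = 1
clients, cfs_per_client = 200, 50

total_cfs = clients * cfs_per_client
print(total_cfs)                       # 10000
print(total_cfs * PER_CF_OVERHEAD_MB)  # 10000 (MB), i.e. ~10 GB of heap
```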


Re: Keyspace and table/cf limits

2014-12-03 Thread Raj N
The question is more from a multi-tenancy point of view. We wanted to see
if we can have a keyspace per client. Each keyspace may have 50 column
families, but if we have 200 clients, that would be 10,000 column families.
Do you think that's reasonable to support? I know that key cache capacity
is reserved in heap still. Any plans to move it off-heap?

-Raj



Re: Keyspace and table/cf limits

2014-12-03 Thread Nikolai Grigoriev
We had a similar problem: multi-tenancy and multiple-DC support. But we did
not really have a strict requirement of one keyspace per tenant; our row keys
allow us to put any number of tenants in a keyspace.

So, on one hand, we could put all data in a single keyspace for all tenants
and size the cluster for it; in the end the total amount of data would be the
same :)

However, we wanted a different replication strategy for different customers,
and replication strategy is a keyspace setting. Thus, it would be simpler to
have one keyspace per customer.

The cost, as was mentioned, is per CF: the more keyspaces we have, the more
CFs we have, so we did not want this number to be too high.

The decision we made was to have something in between. We define a number of
keyspaces with different replication strategies (possibly even duplicate
ones) and map tenants to these keyspaces. Thus, there would be a couple of
tenants in one keyspace, all sharing the same properties (replication
strategy in our case). We could even create a keyspace that groups tenants
that currently share the same replication requirements and that may be
moved/replicated to a specific DC in the future.
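A sketch of that tenant-to-keyspace pooling (the profile and keyspace names are invented): a fixed set of keyspaces per replication profile, with a stable hash choosing within the pool.

```python
import zlib

# Hypothetical pools: each profile's keyspaces share replication settings.
KEYSPACE_POOLS = {
    "rf3_dc1":     ["ks_rf3_dc1_a", "ks_rf3_dc1_b"],
    "rf3_dc1_dc2": ["ks_multi_dc_a"],
}

def keyspace_for(tenant_id: str, profile: str) -> str:
    pool = KEYSPACE_POOLS[profile]
    # crc32 is stable across runs, unlike Python's salted hash().
    return pool[zlib.crc32(tenant_id.encode("utf-8")) % len(pool)]

print(keyspace_for("tenant-7", "rf3_dc1"))
```

This keeps the total CF count bounded by the pool sizes rather than by the number of tenants.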




-- 
Nikolai Grigoriev
(514) 772-5178


Keyspace and table/cf limits

2014-11-25 Thread Raj N
What's the latest on the maximum number of keyspaces and/or tables that one
can have in Cassandra 2.1.x?

-Raj


Re: Keyspace and table/cf limits

2014-11-25 Thread Robert Coli
On Tue, Nov 25, 2014 at 9:07 AM, Raj N raj.cassan...@gmail.com wrote:

 What's the latest on the maximum number of keyspaces and/or tables that
 one can have in Cassandra 2.1.x?


Most relevant changes lately would be :

https://issues.apache.org/jira/browse/CASSANDRA-6689
and
https://issues.apache.org/jira/browse/CASSANDRA-6694

These should meaningfully reduce the amount of heap that memtables consume.
That heap can then be used to support more heap-persistent structures
associated with many CFs. I have no idea how to estimate the scale of the
improvement.

As a general/meta statement, Cassandra is very multi-threaded, and consumes
file handles like crazy. How many different query cases do you really want
to put on one cluster/node? ;D

=Rob