Re: 10000+ CF support from Cassandra

Jonathan Haddad Mon, 01 Jun 2015 19:46:56 -0700

> Sorry for this naive question but how important is this tuning? Can this
> have a huge impact in production?


Massive.  Here's a graph of when we did some JVM tuning at my previous
company:

http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png

About an order of magnitude difference in performance.

Jon

On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya <chaitan64a...@gmail.com>
wrote:

> Thanks Jon and Jack,
>
> > I strongly advise against this approach.
> Jon, I think so too. But do you actually foresee any problems with this
> approach?
> I can think of a few. [I want to evaluate whether we can live with them.]
>
>    - No more CQL.
>    - No data types, everything needs to be a blob.
>    - Limited clustering keys and default clustering order.
>
> > First off, different workloads need different tuning.
> Sorry for this naive question but how important is this tuning? Can this
> have a huge impact in production?
>
> > You might want to consider a model where you have an application layer
> > that maps logical tenant tables into partition keys within a single large
> > Cassandra table, or at least a relatively small number of Cassandra tables.
> > It will depend on the typical size of your tenant tables - very small ones
> > would make sense within a single partition, while larger ones should have
> > separate partitions for a tenant's data. The key here is that tables are
> > expensive, but partitions are cheap and scale very well with Cassandra.
> We are actually trying a similar approach. But we don't want to expose this
> to the application layer. We are attempting to hide this and provide an API.
>
> > Finally, you said "10 clusters", but did you mean 10 nodes? You might
> > want to consider a model where you do indeed have multiple clusters, where
> > each handles a fraction of the tenants, since there is no need for separate
> > tenants to be on the same cluster.
> I meant 10 clusters. We want to split our tables across multiple clusters
> if the above approach is not possible. [But it seems to be very costly]
>
> Thanks,
>
> On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>> How big is each of the tables - are they all fairly small or fairly
>> large? Small as in no more than thousands of rows or large as in tens of
>> millions or hundreds of millions of rows?
>>
>> Small tables are not ideal for a Cassandra cluster since the rows
>> would be spread out across the nodes, even though it might make more sense
>> for each small table to be on a single node.
>>
>> You might want to consider a model where you have an application layer
>> that maps logical tenant tables into partition keys within a single large
>> Cassandra table, or at least a relatively small number of Cassandra tables.
>> It will depend on the typical size of your tenant tables - very small ones
>> would make sense within a single partition, while larger ones should have
>> separate partitions for a tenant's data. The key here is that tables are
>> expensive, but partitions are cheap and scale very well with Cassandra.
>>
>> Finally, you said "10 clusters", but did you mean 10 nodes? You might
>> want to consider a model where you do indeed have multiple clusters, where
>> each handles a fraction of the tenants, since there is no need for separate
>> tenants to be on the same cluster.
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64a...@gmail.com
>> > wrote:
>>
>>> Good Day Everyone,
>>>
>>> I am very happy with the (almost) linear scalability offered by C*. We
>>> had a lot of problems with RDBMS.
>>>
>>> But I heard that C* has a limit on the number of column families that
>>> can be created in a single cluster.
>>> The reason is that each CF stores 1-2 MB on the JVM heap.
>>>
>>> In our use case, we have about 10000+ CFs and we want to support
>>> multi-tenancy
>>> (i.e., 10000 * number of tenants).
>>>
>>> We are new to C* and, coming from an RDBMS background, I would like
>>> your advice on how to tackle this scenario.
>>>
>>> Our plan is to use the off-heap memtable approach.
>>> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>>>
>>> Each node in the cluster has the following configuration:
>>> 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
>>> IMO, this should be able to support 1000 CFs with no (or very little)
>>> impact on performance and startup time.
>>>
>>> We tackle multi-tenancy using different keyspaces. (A solution I found
>>> on the web.)
>>>
>>> Using this approach we can have 10 clusters doing the job. (We actually
>>> are worried about the cost)
>>>
>>> Can you please help us evaluate this strategy? I want to hear the
>>> community's opinion on this.
>>>
>>> My major concerns are:
>>>
>>> 1. Is the off-heap strategy safe, and is my assumption that a 16 GB
>>> machine can support 1000 CFs right?
>>>
>>> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
>>> of column families still increases even when we use multiple keyspaces.
>>>
>>> 3. I understand the complexity of using multiple clusters for a single
>>> application. The code base will become tightly coupled with the
>>> infrastructure. Is this the right approach?
>>>
>>> Any suggestion is appreciated.
>>>
>>> Thanks,
>>> Arun
>>>
>>
>>
>
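
As a rough check on the heap numbers in the original question, here is a minimal sketch of the arithmetic. It takes the "1-2 MB per CF" figure and the 8 GB JVM heap from the thread as given (neither is verified here). The helper names and the 25% heap fraction are illustrative assumptions, not Cassandra settings.

```python
# Back-of-the-envelope heap budget. Figures quoted from the thread are marked;
# everything else is an illustrative assumption.
MB_PER_CF_LOW = 1          # thread: "each CF stores 1-2 MB on the JVM heap"
MB_PER_CF_HIGH = 2
JVM_HEAP_MB = 8 * 1024     # thread: 8 GB Cassandra JVM per node


def cf_heap_overhead_mb(num_cfs):
    """Return (low, high) estimated per-CF heap overhead in MB."""
    return num_cfs * MB_PER_CF_LOW, num_cfs * MB_PER_CF_HIGH


def max_cfs_within(heap_mb, fraction=0.25):
    """CFs that fit if only `fraction` of the heap goes to per-CF overhead.

    The 0.25 default is an assumption for illustration only.
    """
    return int(heap_mb * fraction // MB_PER_CF_HIGH)


if __name__ == "__main__":
    for n in (1000, 10000):
        low, high = cf_heap_overhead_mb(n)
        print(f"{n} CFs: {low}-{high} MB of a {JVM_HEAP_MB} MB heap")
    print("CFs within 25% of heap:", max_cfs_within(JVM_HEAP_MB))
```

If the per-CF figure holds, 10000 CFs would need roughly 10000-20000 MB, which is more than the whole 8192 MB heap before any tenant multiplier, while 1000 CFs would take 1000-2000 MB of it.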

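The "logical tenant tables mapped into partition keys" model Jack describes can be sketched as follows. This is only an in-memory stand-in, not driver code: the class, method, table, and column names are all hypothetical, and it covers only the small-table case where one logical table fits in a single partition.

```python
from collections import defaultdict

# Hypothetical CQL for the single shared table (names are illustrative only):
#   CREATE TABLE tenant_data (
#       tenant_id text, table_name text, row_key text, payload blob,
#       PRIMARY KEY ((tenant_id, table_name), row_key));


class TenantTableMapper:
    """In-memory model of one shared table keyed by (tenant, logical table)."""

    def __init__(self):
        # partition key -> {clustering key -> payload}
        self._partitions = defaultdict(dict)

    @staticmethod
    def partition_key(tenant_id, table_name):
        """Each logical tenant table becomes one partition of the shared table."""
        return (tenant_id, table_name)

    def put(self, tenant_id, table_name, row_key, payload):
        """Upsert one row; payload is opaque bytes, as with a blob column."""
        key = self.partition_key(tenant_id, table_name)
        self._partitions[key][row_key] = payload

    def scan(self, tenant_id, table_name):
        """Return one logical table's rows in clustering (sorted key) order."""
        key = self.partition_key(tenant_id, table_name)
        return sorted(self._partitions.get(key, {}).items())


if __name__ == "__main__":
    store = TenantTableMapper()
    store.put("acme", "orders", "002", b"second")
    store.put("acme", "orders", "001", b"first")
    store.put("globex", "orders", "001", b"other tenant")
    print(store.scan("acme", "orders"))
```

The sketch also shows the trade-offs Arun lists: payloads are untyped blobs and there is one fixed clustering order. Larger tenant tables would need an extra component in the partition key (a bucket, for example) so that they spread across several partitions.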