> Sorry for this naive question but how important is this tuning? Can this have a huge impact in production?
Massive. Here's a graph of when we did some JVM tuning at my previous company:

http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png

About an order of magnitude difference in performance.

Jon

On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya <chaitan64a...@gmail.com> wrote:

> Thanks Jon and Jack,
>
> > I strongly advise against this approach.
> Jon, I think so too. But do you actually foresee any problems with this
> approach?
> I can think of a few. [I want to evaluate if we can live with this problem]
>
> - No more CQL.
> - No data types, everything needs to be a blob.
> - Limited clustering keys and default clustering order.
>
> > First off, different workloads need different tuning.
> Sorry for this naive question but how important is this tuning? Can this
> have a huge impact in production?
>
> > You might want to consider a model where you have an application layer
> > that maps logical tenant tables into partition keys within a single large
> > Cassandra table, or at least a relatively small number of Cassandra tables.
> > It will depend on the typical size of your tenant tables - very small ones
> > would make sense within a single partition, while larger ones should have
> > separate partitions for a tenant's data. The key here is that tables are
> > expensive, but partitions are cheap and scale very well with Cassandra.
> We are actually trying a similar approach. But we don't want to expose this
> to the application layer. We are attempting to hide this and provide an API.
>
> > Finally, you said "10 clusters", but did you mean 10 nodes? You might
> > want to consider a model where you do indeed have multiple clusters, where
> > each handles a fraction of the tenants, since there is no need for separate
> > tenants to be on the same cluster.
> I meant 10 clusters. We want to split our tables across multiple clusters
> if the above approach is not possible.
> [But it seems to be very costly]
>
> Thanks,
>
> On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>> How big is each of the tables - are they all fairly small or fairly
>> large? Small as in no more than thousands of rows or large as in tens of
>> millions or hundreds of millions of rows?
>>
>> Small tables are not ideal for a Cassandra cluster since the rows
>> would be spread out across the nodes, even though it might make more sense
>> for each small table to be on a single node.
>>
>> You might want to consider a model where you have an application layer
>> that maps logical tenant tables into partition keys within a single large
>> Cassandra table, or at least a relatively small number of Cassandra tables.
>> It will depend on the typical size of your tenant tables - very small ones
>> would make sense within a single partition, while larger ones should have
>> separate partitions for a tenant's data. The key here is that tables are
>> expensive, but partitions are cheap and scale very well with Cassandra.
>>
>> Finally, you said "10 clusters", but did you mean 10 nodes? You might
>> want to consider a model where you do indeed have multiple clusters, where
>> each handles a fraction of the tenants, since there is no need for separate
>> tenants to be on the same cluster.
>>
>> -- Jack Krupansky
>>
>> On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya <chaitan64a...@gmail.com>
>> wrote:
>>
>>> Good Day Everyone,
>>>
>>> I am very happy with the (almost) linear scalability offered by C*. We
>>> had a lot of problems with RDBMS.
>>>
>>> But, I heard that C* has a limit on the number of column families that can
>>> be created in a single cluster.
>>> The reason being each CF stores 1-2 MB on the JVM heap.
>>>
>>> In our use case, we have about 10000+ CF and we want to support
>>> multi-tenancy.
>>> (i.e. 10000 * number of tenants)
>>>
>>> We are new to C* and, coming from an RDBMS background, I would like to
>>> understand how to tackle this scenario with your advice.
>>>
>>> Our plan is to use the off-heap memtable approach.
>>> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
>>>
>>> Each node in the cluster has the following configuration:
>>> 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
>>> IMO, this should be able to support 1000 CF with no (or very little) impact
>>> on performance and startup time.
>>>
>>> We tackle multi-tenancy using different keyspaces. (Solution I found on
>>> the web)
>>>
>>> Using this approach we can have 10 clusters doing the job. (We actually
>>> are worried about the cost)
>>>
>>> Can you please help us evaluate this strategy? I want to hear the
>>> community's opinion on this.
>>>
>>> My major concerns being,
>>>
>>> 1. Is the off-heap strategy safe, and is my assumption of 16 GB supporting
>>> 1000 CF right?
>>>
>>> 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
>>> of column families increases even when we use multiple keyspaces.
>>>
>>> 3. I understand the complexity of using multiple clusters for a single
>>> application. The code base will get tightly coupled with infrastructure. Is
>>> this the right approach?
>>>
>>> Any suggestion is appreciated.
>>>
>>> Thanks,
>>> Arun
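
For illustration, the model Jack describes in this thread (logical tenant tables mapped into partition keys of one shared Cassandra table) might look roughly like the sketch below. Nothing here comes from the thread itself: the table name tenant_data, its columns, the partition_key helper, and the bucketing scheme for large tables are all hypothetical, and the CQL is kept as a string so the Python runs without a cluster or driver.

```python
import hashlib
from typing import NamedTuple

# Hypothetical CQL for the single shared table (all names are illustrative).
# The composite partition key is (tenant_id, table_name, bucket); row_key is
# the clustering column, and the row itself is stored as an opaque blob.
SHARED_TABLE_CQL = """
CREATE TABLE tenant_data (
    tenant_id  text,
    table_name text,
    bucket     int,
    row_key    text,
    payload    blob,
    PRIMARY KEY ((tenant_id, table_name, bucket), row_key)
);
"""


class PartitionKey(NamedTuple):
    tenant_id: str
    table_name: str
    bucket: int


def partition_key(tenant_id: str, table_name: str, row_key: str,
                  buckets: int = 1) -> PartitionKey:
    """Map a row of a logical tenant table to a partition of the shared table.

    buckets == 1: the whole logical table lives in one partition (small tables).
    buckets > 1:  rows are spread over that many partitions (larger tables).
    """
    if buckets < 1:
        raise ValueError("buckets must be >= 1")
    if buckets == 1:
        return PartitionKey(tenant_id, table_name, 0)
    # Use a stable hash so the same row_key always lands in the same bucket,
    # independent of Python's per-process hash randomization.
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return PartitionKey(tenant_id, table_name, bucket)


if __name__ == "__main__":
    # Small logical table: every row shares one partition.
    print(partition_key("acme", "settings", "theme"))
    # Larger logical table: rows are spread across 16 partitions.
    print(partition_key("acme", "events", "row-1", buckets=16))
```

The bucket count per logical table is an assumption of this sketch; in practice it would have to be chosen and recorded per table, since changing it later moves rows to different partitions.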