Eric, is using keyspaces as a multitenancy solution as bad as the many-tables pattern? Is the memory overhead of keyspaces as heavy as that of tables?

Cheers,
Brian

On Tuesday, March 1, 2016, Eric Stevens <migh...@gmail.com> wrote:

> It's definitely not true for every use case of a large number of tables, but for many uses where you'd be tempted to do that, adding whatever would have driven your table naming instead as a column in your partition key on a smaller number of tables will meet your needs. This is especially true if you're looking to solve multi-tenancy, unless you let your tenants dynamically drive your schema (which is a separate can of worms).
>
> On Tue, Mar 1, 2016 at 9:08 AM Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> I don't think Cassandra was "purposefully developed" for some target number of tables - there is no evidence of any such explicit intent. Instead, it would be fair to say that Cassandra was "not purposefully developed" with a goal of supporting "large numbers of tables." Sometimes features and capabilities come for free or as a side effect of the technologies used, but usually specific features and specific capabilities (such as large numbers of tables) require explicit intent and explicit effort.
>>
>> One could indeed endeavor to design a data store (I'm not even sure it would still be considered a database per se) that supported either large numbers of tables or an additional level of storage model in between table and row (call it "group" maybe or "sub-table".) But obviously Cassandra was not designed with that goal in mind.
>>
>> Traditionally, a "table" is a defined relation over a set of data. Relation and data are distinct concepts. And a relation name is not simply a Java-style "object".
>> A relation (table) name is supposed to represent an abstraction or entity type, while essentially all of the cases I have heard of for wanting thousands (or even hundreds) of tables are trying to use a table as more of a container for a group of rows for a specific entity instance rather than a distinct entity type. Granted, Cassandra is not obligated to be limited to the relational model, but Cassandra, especially CQL, is intentionally modeled reasonably closely on the relational model in terms of the data modeling abstractions even though the storage engine is designed to scale across nodes.
>>
>> You could file a Jira requesting such a feature improvement. And then we would see if sentiment has shifted over the years.
>>
>> The key thing is to offer up a use case that warrants support for large numbers of tables. So far, it has usually been the case that the perceived need for separate tables could easily be met using clustering columns of a single table.
>>
>> Seriously, if you guys can define a legitimate use case that can't easily be handled by a single table, that could get the discussion started.
>>
>> -- Jack Krupansky
>>
>> On Tue, Mar 1, 2016 at 9:11 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:
>>
>>> Hi Jack
>>>
>>> Being purposefully developed to only handle up to “a few hundred” tables is reason enough. I accept that, and likely a use case with many tables was never really considered. But I would still like to understand the design choices made so perhaps we gain some confidence level in this upper limit in the number of tables. The best estimate we have so far is “a few hundred” which is a bit vague.
>>>
>>> Regarding scaling, I’m not talking about scaling in terms of data volume, but in terms of how the data is structured.
>>> One thousand tables with one row each is the same data volume as one table with one thousand rows, excluding any data structures required to maintain the extra tables. But whereas the first seems likely to bring a Cassandra cluster to its knees, the second will run happily on a single-node cluster on a low-end machine.
>>>
>>> We will design our code to use a single table to avoid having nightmares with this issue. But if there is any authoritative documentation on this characteristic of Cassandra, I would love to know more.
>>>
>>> FJ
>>>
>>> On 01 Mar 2016, at 14:23, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>
>>> I don't think there are any "reasons behind it." It is simply empirical experience - as reported here.
>>>
>>> Cassandra scales in two dimensions - number of rows per node and number of nodes. If some source of information led you to believe otherwise, please point out the source so that we can endeavor to correct it.
>>>
>>> The exact number of rows per node and tables per node will always have to be evaluated empirically, with a proof-of-concept implementation, since it all depends on the mix of capabilities of your hardware combined with your specific data model, your specific data values, your specific access patterns, and your specific load. And it also depends on your own personal tolerance for degradation of latency and throughput - some people might find a given set of performance metrics acceptable while others might not.
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:
>>>
>>>> Hi Tommaso
>>>>
>>>> It’s not that I _need_ a large number of tables.
>>>> This approach maps easily to the problem we are trying to solve, but it’s becoming clear it’s not the right approach.
>>>>
>>>> At the moment I’m trying to understand the limitations in Cassandra regarding number of tables and the reasons behind it. I’ve come to the email list as my Google-fu is not giving me what I’m looking for :(
>>>>
>>>> FJ
>>>>
>>>> On 01 Mar 2016, at 09:36, tommaso barbugli <tbarbu...@gmail.com> wrote:
>>>>
>>>> Hi Fernando,
>>>>
>>>> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, and it was a real pain in terms of operations. Repairs were terribly slow, boot of C* slowed down, and in general tracking table metrics became a bit more work. Why do you need this high number of tables?
>>>>
>>>> Tommaso
>>>>
>>>> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:
>>>>
>>>>> Hi Jack
>>>>>
>>>>> By entry I mean row
>>>>>
>>>>> Apologies for the “obsolete terminology”. When I first looked at Cassandra it was still on CQL2, and now that I’m looking at it again I’ve defaulted to the terms I already knew. I will bear it in mind and call them tables from now on.
>>>>>
>>>>> Is there any documentation about this limit? For example, I’d be keen to know how much memory is consumed per table, and I’m also curious about the reasons for keeping this in memory. I’m trying to understand the limitations here, rather than challenge them.
>>>>>
>>>>> So far I found nothing in my search, hence why I had to resort to some “load testing” to see what happens when you push the table count high.
>>>>>
>>>>> Thanks
>>>>> FJ
>>>>>
>>>>> On 01 Mar 2016, at 06:23, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>>>>>
>>>>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>>>>
>>>>> You are using the obsolete terminology of CQL2 and Thrift - column family. With CQL3 you should be creating "tables". The practical recommendation of an upper limit of a few hundred tables across all keyspaces remains.
>>>>>
>>>>> Technically you can go higher and technically you can reduce the overhead per table (an undocumented Jira - intentionally undocumented since it is strongly not recommended), but... it is unlikely that you will be happy with the results.
>>>>>
>>>>> What is the nature of the use case?
>>>>>
>>>>> You basically have two choices: an additional clustering column to distinguish categories of table, or a separate cluster for every few hundred tables.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote:
>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> I have a use case for Cassandra that would require creating a large number of column families. I have found references to early versions of Cassandra where each column family would require a fixed amount of memory on all nodes, effectively imposing an upper limit on the total number of CFs. I have also seen rumblings that this may have been fixed in later versions.
>>>>>>
>>>>>> To put the question to rest, I have set up a DSE sandbox and created some code to generate column families populated with 3,000 entries each.
>>>>>>
>>>>>> Unfortunately I have now hit this issue:
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>>>>>>
>>>>>> So I will have to retest against Cassandra 3.0 instead
>>>>>>
>>>>>> However, I would like to understand the limitations regarding creation of column families.
>>>>>>
>>>>>> * Is there a practical upper limit?
>>>>>> * Is this a fixed limit, or does it scale as more nodes are added into the cluster?
>>>>>> * Is there a difference between one keyspace with thousands of column families, vs thousands of keyspaces with only a few column families each?
>>>>>>
>>>>>> I haven’t found any hard evidence/documentation to help me here, but if you can point me in the right direction, I will oblige and RTFM away.
>>>>>>
>>>>>> Many thanks for your help!
>>>>>>
>>>>>> Cheers
>>>>>> FJ

--
Cheers,
Brian
http://www.integrallis.com
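
For anyone skimming the thread, Eric's suggestion (fold whatever would have driven the table name into the partition key of a shared table) might look roughly like the sketch below. It only builds CQL statement strings and counts them; nothing is run against a cluster. The keyspace `app`, the table names, and the columns `tenant_id`, `item_id` and `payload` are all hypothetical and do not come from anyone's actual schema.

```python
# Sketch only: every identifier here (app, items_<tenant>, tenant_items,
# tenant_id, item_id, payload) is a made-up name for illustration.

def per_tenant_ddl(tenant_ids):
    """Many-tables pattern: one CREATE TABLE statement per tenant."""
    return [
        f"CREATE TABLE app.items_{t} (item_id text PRIMARY KEY, payload text)"
        for t in tenant_ids
    ]

def shared_table_ddl():
    """Single-table pattern: tenant_id is the partition key and
    item_id is a clustering column inside each tenant's partition."""
    return (
        "CREATE TABLE app.tenant_items ("
        "tenant_id text, item_id text, payload text, "
        "PRIMARY KEY ((tenant_id), item_id))"
    )

def select_for_tenant():
    """One parameterized query serves every tenant."""
    return "SELECT item_id, payload FROM app.tenant_items WHERE tenant_id = ?"

tenants = [f"t{i}" for i in range(1000)]
many = per_tenant_ddl(tenants)   # schema grows with the tenant count
one = [shared_table_ddl()]       # schema stays fixed as tenants are added

print(len(many), "tables vs", len(one), "table")
print(select_for_tenant())
```

With the shared table, adding a tenant is just a new `tenant_id` value rather than a schema change. If a single tenant's data would be too large for one partition, a composite partition key (for example, tenant plus some bucket column) is a common variation, but that depends on the access pattern.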