Yeah, I don't anticipate more than 1000 properties; well under, in fact. I guess the trade-off of using clustering columns is that I'd have a tall, skinny table, which also has its challenges w/r/t memory.
I'll look into your suggestion a bit more and consider some others around a hybrid of CQL and Thrift (where necessary). But from a newb's perspective, I sense the community is unsettled around this concept of truly dynamic columns. Coming from an HBase background, it's a consideration I didn't anticipate having to evaluate.

--
about.me <http://about.me/markgreene>

On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hi Mark
>
> I believe that in your table you want to have some "common" fields that
> will be there whatever the customer is, and other fields that are entirely
> customer-dependent, isn't it ?
>
> In this case, creating a table with static columns for the common fields
> and a clustering column representing all custom fields defined by a
> customer could be a solution (see here for static column:
> https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>
> CREATE TABLE user_data (
>     user_id bigint,
>     user_firstname text static,
>     user_lastname text static,
>     ...
>     custom_property_name text,
>     custom_property_value text,
>     PRIMARY KEY(user_id, custom_property_name, custom_property_value));
>
> Please note that with this solution you need to have "at least one"
> custom property per customer to make it work
>
> The only thing to take care of is the type of custom_property_value. You
> need to define it once and for all. To accommodate dynamic types, you can
> either save the value as blob or text (as JSON) and take care of the
> serialization/deserialization yourself at the client side
>
> As an alternative you can save custom properties in a map, provided that
> their number is not too large. But considering the business case of CRM, I
> believe that it's quite rare that a user has more than 1000 custom
> properties, isn't it ?
>
> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <green...@gmail.com> wrote:
>
>> My use case requires the support of arbitrary columns much like a CRM. My
>> users can define 'custom' fields within the application. Ideally I wouldn't
>> have to change the schema at all, which is why I like the old Thrift
>> approach rather than the CQL approach.
>>
>> Having said all that, I'd be willing to adapt my API to make explicit
>> schema changes to Cassandra whenever my user makes a change to their custom
>> fields if that's an accepted practice.
>>
>> Ultimately, I'm trying to figure out if the Cassandra community intends
>> to support true schemaless use cases in the future.
>>
>> --
>> about.me <http://about.me/markgreene>
>>
>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> This strikes me as bad practice in the world of multi tenant systems. I
>>> don't want to create a table per customer. So I'm wondering if dynamically
>>> modifying the table is an accepted practice? --> Can you give some details
>>> about your use case ? How would you "alter" a table structure to adapt it
>>> to a new customer ?
>>>
>>> Wouldn't it be better to model your table so that it supports
>>> addition/removal of customers ?
>>>
>>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com> wrote:
>>>
>>>> Thanks DuyHai,
>>>>
>>>> I have a follow up question to #2. You mentioned ideally I would create
>>>> a new table instead of mutating an existing one.
>>>>
>>>> This strikes me as bad practice in the world of multi tenant systems. I
>>>> don't want to create a table per customer. So I'm wondering if dynamically
>>>> modifying the table is an accepted practice?
>>>>
>>>> --
>>>> about.me <http://about.me/markgreene>
>>>>
>>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Mark
>>>>>
>>>>> Dynamic columns, as you said, are perfectly supported by CQL3 via
>>>>> clustering columns. And no, using collections for storing dynamic data
>>>>> is a very bad idea if the cardinality is very high (>> 1000 elements)
>>>>>
>>>>> 1) Is using Thrift a valid approach in the era of CQL? --> Less and
>>>>> less. Unless you are looking for extreme performance, you'd be better
>>>>> off choosing CQL3. The ease of programming and querying with CQL3 is
>>>>> worth the small overhead in CPU
>>>>>
>>>>> 2) If CQL is the best practice, should I alter the schema at runtime
>>>>> when I detect I need to do a schema mutation? --> Ideally you should not
>>>>> alter schema but create a new table to adapt to your changing
>>>>> requirements.
>>>>>
>>>>> 3) If I utilize CQL collections, will Cassandra page the entire thing
>>>>> into the heap? --> Of course. All collections and maps in Cassandra are
>>>>> eagerly loaded entirely in memory on server side. That's why it is
>>>>> recommended to limit their cardinality to ~ 1000 elements
>>>>>
>>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm looking for some best practices w/r/t supporting arbitrary
>>>>>> columns. It seems from the docs I've read around CQL that they are
>>>>>> supported in some capacity via collections but you can't exceed 64K in
>>>>>> size. For my requirements that would cause problems.
>>>>>>
>>>>>> So my questions are:
>>>>>>
>>>>>> 1) Is using Thrift a valid approach in the era of CQL?
>>>>>>
>>>>>> 2) If CQL is the best practice, should I alter the schema at runtime
>>>>>> when I detect I need to do a schema mutation?
>>>>>>
>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire
>>>>>> thing into the heap?
>>>>>>
>>>>>> My data model is akin to a CRM, arbitrary column definitions per
>>>>>> customer.
>>>>>>
>>>>>> Cheers,
>>>>>> Mark
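
For reference, the client-side serialization DuyHai suggests for custom_property_value could look roughly like the following. This is only a sketch in Python, assuming the text (JSON) variant of the column; the helper names encode_property, decode_property and to_rows are made up for illustration, and the actual driver call that writes each row to user_data is left out.

```python
import json


def encode_property(value):
    """Serialize a custom property value to JSON text for a text column."""
    return json.dumps(value, sort_keys=True)


def decode_property(raw):
    """Parse the JSON text read back from the column into a Python value."""
    return json.loads(raw)


def to_rows(user_id, custom_properties):
    """Turn one user's custom properties into (user_id, name, value) rows.

    One row per property, mirroring the tall-and-skinny layout where
    custom_property_name is a clustering column (hypothetical helper).
    """
    return [
        (user_id, name, encode_property(value))
        for name, value in sorted(custom_properties.items())
    ]


# Example: three custom fields of different types for one user.
rows = to_rows(42, {"industry": "retail", "employees": 250, "vip": True})
for row in rows:
    print(row)
```

Each tuple would map onto one row of the table, so the types of the custom fields stay a client concern and the schema never has to change when a customer defines a new field.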