When I say dynamic column, I mean non-static columns of different types within the same row. Some could hold an object, others one of the defined data types.
With Thrift, I use the appropriate serializer to handle these dynamic columns.

On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Well, before talking and discussing about "dynamic columns", we should
> first define it clearly. What do people mean by "dynamic columns" exactly?
> Is it the ability to add many columns "of same type" to an existing
> physical row? If yes then CQL3 does support it with clustering columns.
>
>
> On Fri, Jun 13, 2014 at 10:36 PM, Mark Greene <green...@gmail.com> wrote:
>
>> Yeah I don't anticipate more than 1000 properties, well under in fact. I
>> guess the trade off of using the clustered columns is that I'd have a table
>> that would be tall and skinny which also has its challenges w/r/t memory.
>>
>> I'll look into your suggestion a bit more and consider some others around
>> a hybrid of CQL and Thrift (where necessary). But from a newb's perspective,
>> I sense the community is unsettled around this concept of truly dynamic
>> columns. Coming from an HBase background, it's a consideration I didn't
>> anticipate having to evaluate.
>>
>>
>> --
>> about.me <http://about.me/markgreene>
>>
>>
>> On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> Hi Mark
>>>
>>> I believe that in your table you want to have some "common" fields that
>>> will be there whatever the customer is, and other fields that are entirely
>>> customer-dependent, isn't it?
>>>
>>> In this case, creating a table with static columns for the common
>>> fields and a clustering column representing all custom fields defined by a
>>> customer could be a solution (see here for static column:
>>> https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>>>
>>> CREATE TABLE user_data (
>>>    user_id bigint,
>>>    user_firstname text static,
>>>    user_lastname text static,
>>>    ...
>>>    custom_property_name text,
>>>    custom_property_value text,
>>>    PRIMARY KEY(user_id, custom_property_name, custom_property_value));
>>>
>>> Please note that with this solution you need to have "at least one"
>>> custom property per customer to make it work
>>>
>>> The only thing to take care of is the type of custom_property_value.
>>> You need to define it once for all. To accommodate dynamic types, you
>>> can either save the value as blob or text (as JSON) and take care of the
>>> serialization/deserialization yourself at the client side
>>>
>>> As an alternative you can save custom properties in a map, provided
>>> that their number is not too large. But considering the business case of
>>> CRM, I believe that it's quite rare that a user has more than 1000 custom
>>> properties, isn't it?
>>>
>>>
>>>
>>> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <green...@gmail.com>
>>> wrote:
>>>
>>>> My use case requires the support of arbitrary columns much like a CRM.
>>>> My users can define 'custom' fields within the application. Ideally I
>>>> wouldn't have to change the schema at all, which is why I like the old
>>>> thrift approach rather than the CQL approach.
>>>>
>>>> Having said all that, I'd be willing to adapt my API to make explicit
>>>> schema changes to Cassandra whenever my user makes a change to their custom
>>>> fields if that's an accepted practice.
>>>>
>>>> Ultimately, I'm trying to figure out if the Cassandra community intends
>>>> to support true schemaless use cases in the future.
>>>>
>>>> --
>>>> about.me <http://about.me/markgreene>
>>>>
>>>>
>>>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> This strikes me as bad practice in the world of multi tenant systems.
>>>>> I don't want to create a table per customer. So I'm wondering if
>>>>> dynamically modifying the table is an accepted practice?  --> Can you give
>>>>> some details about your use case? How would you "alter" a table structure
>>>>> to adapt it to a new customer?
>>>>>
>>>>> Wouldn't it be better to model your table so that it supports
>>>>> addition/removal of customers?
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks DuyHai,
>>>>>>
>>>>>> I have a follow up question to #2. You mentioned ideally I would
>>>>>> create a new table instead of mutating an existing one.
>>>>>>
>>>>>> This strikes me as bad practice in the world of multi tenant systems.
>>>>>> I don't want to create a table per customer. So I'm wondering if
>>>>>> dynamically modifying the table is an accepted practice?
>>>>>>
>>>>>> --
>>>>>> about.me <http://about.me/markgreene>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Mark
>>>>>>>
>>>>>>> Dynamic columns, as you said, are perfectly supported by CQL3 via
>>>>>>> clustering columns. And no, using collections for storing dynamic data
>>>>>>> is a very bad idea if the cardinality is very high (>> 1000 elements)
>>>>>>>
>>>>>>> 1)  Is using Thrift a valid approach in the era of CQL?  --> Less
>>>>>>> and less. Unless you are looking for extreme performance, you'd be
>>>>>>> better off choosing CQL3. The ease of programming and querying with
>>>>>>> CQL3 is worth the small overhead in CPU
>>>>>>>
>>>>>>> 2) If CQL is the best practice,  should I alter the schema at
>>>>>>> runtime when I detect I need to do a schema mutation?  --> Ideally you
>>>>>>> should not alter schema but create a new table to adapt to your changing
>>>>>>> requirements.
>>>>>>>
>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire
>>>>>>> thing into the heap?  --> Of course. All collections and maps in
>>>>>>> Cassandra are eagerly loaded entirely in memory on server side.
>>>>>>> That's why it is recommended to limit their cardinality to ~ 1000
>>>>>>> elements
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm looking for some best practices w/r/t supporting arbitrary
>>>>>>>> columns. It seems from the docs I've read around CQL that they are
>>>>>>>> supported in some capacity via collections but you can't exceed 64K in
>>>>>>>> size. For my requirements that would cause problems.
>>>>>>>>
>>>>>>>> So my questions are:
>>>>>>>>
>>>>>>>> 1)  Is using Thrift a valid approach in the era of CQL?
>>>>>>>>
>>>>>>>> 2) If CQL is the best practice,  should I alter the schema at
>>>>>>>> runtime when I detect I need to do a schema mutation?
>>>>>>>>
>>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire
>>>>>>>> thing into the heap?
>>>>>>>>
>>>>>>>> My data model is akin to a CRM, arbitrary column definitions per
>>>>>>>> customer.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
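
For reference, the suggestion quoted above (store custom_property_value as text holding JSON and handle serialization on the client) might look roughly like the sketch below. This is only an illustration under assumptions: the helper names encode_property and decode_property and the {"type": ..., "value": ...} envelope are made up for this example and are not part of Cassandra or any driver. No driver call is shown; the encoded string is simply what would be bound to the text column.

```python
import json
from datetime import date


def encode_property(value):
    """Serialize one custom property value to JSON text.

    Hypothetical helper: the type tag lets the reader restore values
    that JSON cannot represent natively (here, only plain dates).
    """
    if type(value) is date:
        return json.dumps({"type": "date", "value": value.isoformat()})
    # Strings, numbers, booleans, None, lists and dicts pass through as-is.
    # Anything else makes json.dumps raise TypeError.
    return json.dumps({"type": "json", "value": value})


def decode_property(text):
    """Reverse encode_property: JSON text back to a Python value."""
    payload = json.loads(text)
    if payload["type"] == "date":
        return date.fromisoformat(payload["value"])
    return payload["value"]


# Values of different types for one user, all stored in the same text column.
properties = {
    "favorite_color": "blue",
    "employee_count": 42,
    "renewal_date": date(2014, 6, 13),
}
rows = [(1, name, encode_property(value)) for name, value in properties.items()]
```

Each tuple in rows corresponds to (user_id, custom_property_name, custom_property_value) from the user_data table in the quoted message. The trade-off of this approach is that Cassandra sees only opaque text, so it cannot validate or compare the values by their real type.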