Re: Dynamic Columns in Cassandra 2.X

Peter Lin Fri, 13 Jun 2014 14:07:28 -0700

When I say dynamic column, I mean non-static columns of different types
within the same row. Some could be an object, others one of the defined
datatypes.


With Thrift, I use the appropriate serializer to handle these dynamic
columns.
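
To make the serializer idea concrete, here is a minimal sketch in Python of
per-type serialization on the client side. It is an assumption-laden
illustration, not the API of Thrift or of any Cassandra client: the one-byte
type tags and the names encode_value / decode_value are hypothetical. Each
value is written as bytes with a leading tag, so columns of different types
can sit in the same row and still be decoded correctly.

```python
import json
import struct

# Hypothetical one-byte type tags (an assumption for this sketch; they are
# not defined by Thrift or by any Cassandra driver).
TAG_INT, TAG_FLOAT, TAG_TEXT, TAG_OBJECT = b"i", b"d", b"s", b"j"


def encode_value(value):
    """Serialize a value to bytes, prefixed with a tag naming its type."""
    # bool is a subclass of int in Python, so exclude it here and let it
    # fall through to the JSON branch.
    if isinstance(value, int) and not isinstance(value, bool):
        # Signed 64-bit big-endian; larger integers raise struct.error.
        return TAG_INT + struct.pack(">q", value)
    if isinstance(value, float):
        return TAG_FLOAT + struct.pack(">d", value)
    if isinstance(value, str):
        return TAG_TEXT + value.encode("utf-8")
    # Anything else ("an object") is stored as JSON text.
    return TAG_OBJECT + json.dumps(value, sort_keys=True).encode("utf-8")


def decode_value(data):
    """Inverse of encode_value: read the tag, then decode the payload."""
    tag, payload = data[:1], data[1:]
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_FLOAT:
        return struct.unpack(">d", payload)[0]
    if tag == TAG_TEXT:
        return payload.decode("utf-8")
    if tag == TAG_OBJECT:
        return json.loads(payload.decode("utf-8"))
    raise ValueError("unknown type tag: %r" % tag)


if __name__ == "__main__":
    for original in (42, 1.5, "hello", {"tags": ["a", "b"]}):
        stored = encode_value(original)
        print(repr(stored), "->", repr(decode_value(stored)))
```

The same approach would apply if the values were stored in a single blob
column from CQL, as suggested further down the thread; the trade-off is that
the server cannot validate or index the typed contents.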


On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Well, before discussing "dynamic columns", we should first define the
> term clearly. What do people mean by "dynamic columns" exactly?
> Is it the ability to add many columns "of same type" to an existing
> physical row? If yes, then CQL3 does support it with clustering columns.
>
>
> On Fri, Jun 13, 2014 at 10:36 PM, Mark Greene <green...@gmail.com> wrote:
>
>> Yeah I don't anticipate more than 1000 properties, well under in fact. I
>> guess the trade off of using the clustered columns is that I'd have a table
>> that would be tall and skinny which also has its challenges w/r/t memory.
>>
>> I'll look into your suggestion a bit more and consider some others around
>> a hybrid of CQL and Thrift (where necessary). But from a newb's perspective,
>> I sense the community is unsettled around this concept of truly dynamic
>> columns. Coming from an HBase background, it's a consideration I didn't
>> anticipate having to evaluate.
>>
>>
>> --
>> about.me <http://about.me/markgreene>
>>
>>
>> On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> Hi Mark
>>>
>>>  I believe that in your table you want to have some "common" fields that
>>> will be there whatever the customer is, and other fields that are entirely
>>> customer-dependent, is that right?
>>>
>>>  In this case, creating a table with static columns for the common
>>> fields and a clustering column representing all custom fields defined by a
>>> customer could be a solution (see here for static column:
>>> https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>>>
>>> CREATE TABLE user_data (
>>>    user_id bigint,
>>>    user_firstname text static,
>>>    user_lastname text static,
>>>    ...
>>>    custom_property_name text,
>>>    custom_property_value text,
>>>    PRIMARY KEY(user_id, custom_property_name, custom_property_value));
>>>
>>>  Please note that with this solution you need to have "at least one"
>>> custom property per customer to make it work
>>>
>>>  The only thing to take care of is the type of custom_property_value.
>>> You need to define it once and for all. To accommodate dynamic types, you
>>> can either save the value as blob or text (as JSON) and take care of the
>>> serialization/deserialization yourself on the client side.
>>>
>>>  As an alternative you can save custom properties in a map, provided
>>> that their number is not too large. But considering the business case of
>>> CRM, I believe that it's quite rare that a user has more than 1000 custom
>>> properties, isn't it?
>>>
>>>
>>>
>>> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <green...@gmail.com>
>>> wrote:
>>>
>>>> My use case requires the support of arbitrary columns much like a CRM.
>>>> My users can define 'custom' fields within the application. Ideally I
>>>> wouldn't have to change the schema at all, which is why I like the old
>>>> thrift approach rather than the CQL approach.
>>>>
>>>> Having said all that, I'd be willing to adapt my API to make explicit
>>>> schema changes to Cassandra whenever my user makes a change to their custom
>>>> fields if that's an accepted practice.
>>>>
>>>> Ultimately, I'm trying to figure out if the Cassandra community intends
>>>> to support true schemaless use cases in the future.
>>>>
>>>> --
>>>> about.me <http://about.me/markgreene>
>>>>
>>>>
>>>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> This strikes me as bad practice in the world of multi tenant systems.
>>>>> I don't want to create a table per customer. So I'm wondering if
>>>>> dynamically modifying the table is an accepted practice?  --> Can you give
>>>>> some details about your use case ? How would you "alter" a table structure
>>>>> to adapt it to a new customer ?
>>>>>
>>>>> Wouldn't it be better to model your table so that it supports
>>>>> addition/removal of customers?
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks DuyHai,
>>>>>>
>>>>>> I have a follow up question to #2. You mentioned ideally I would
>>>>>> create a new table instead of mutating an existing one.
>>>>>>
>>>>>> This strikes me as bad practice in the world of multi tenant systems.
>>>>>> I don't want to create a table per customer. So I'm wondering if
>>>>>> dynamically modifying the table is an accepted practice?
>>>>>>
>>>>>> --
>>>>>> about.me <http://about.me/markgreene>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Mark
>>>>>>>
>>>>>>>  Dynamic columns, as you said, are perfectly supported by CQL3 via
>>>>>>> clustering columns. And no, using collections for storing dynamic data
>>>>>>> is a very bad idea if the cardinality is very high (>> 1000 elements)
>>>>>>>
>>>>>>> 1)  Is using Thrift a valid approach in the era of CQL?  --> Less
>>>>>>> and less. Unless you are looking for extreme performance, you'd be
>>>>>>> better off choosing CQL3. The ease of programming and querying with
>>>>>>> CQL3 is worth the small overhead in CPU
>>>>>>>
>>>>>>> 2) If CQL is the best practice,  should I alter the schema at
>>>>>>> runtime when I detect I need to do a schema mutation?  --> Ideally you
>>>>>>> should not alter the schema but create a new table to adapt to your
>>>>>>> changing requirements.
>>>>>>>
>>>>>>> 3) If I utilize CQL collections, will Cassandra page the entire
>>>>>>> thing into the heap?  --> Of course. All collections and maps in
>>>>>>> Cassandra are eagerly loaded entirely in memory on the server side.
>>>>>>> That's why it is recommended to limit their cardinality to ~ 1000
>>>>>>> elements
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm looking for some best practices w/r/t supporting arbitrary
>>>>>>>> columns. It seems from the docs I've read around CQL that they are
>>>>>>>> supported in some capacity via collections but you can't exceed 64K in
>>>>>>>> size. For my requirements that would cause problems.
>>>>>>>>
>>>>>>>> So my questions are:
>>>>>>>>
>>>>>>>> 1)  Is using Thrift a valid approach in the era of CQL?
>>>>>>>>
>>>>>>>> 2) If CQL is the best practice,  should I alter the schema at
>>>>>>>> runtime when I detect I need to do a schema mutation?
>>>>>>>>
>>>>>>>>  3) If I utilize CQL collections, will Cassandra page the entire
>>>>>>>> thing into the heap?
>>>>>>>>
>>>>>>>> My data model is akin to a CRM, arbitrary column definitions per
>>>>>>>> customer.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Mark
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
