Re: Dynamic Columns in Cassandra 2.X

Mark Greene Fri, 13 Jun 2014 13:37:22 -0700

Yeah I don't anticipate more than 1000 properties, well under in fact. I
guess the trade off of using the clustered columns is that I'd have a table
that would be tall and skinny which also has its challenges w/r/t memory.


I'll look into your suggestion a bit more and consider some others around a
hybrid of CQL and Thrift (where necessary). But from a newb's perspective, I
sense the community is unsettled around this concept of truly dynamic
columns. Coming from an HBase background, it's a consideration I didn't
anticipate having to evaluate.


--
about.me <http://about.me/markgreene>


On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hi Mark
>
>  I believe that in your table you want to have some "common" fields that
> will be there whatever the customer is, and other fields that are entirely
> customer-dependent, is that right?
>
>  In this case, creating a table with static columns for the common fields
> and a clustering column representing all custom fields defined by a
> customer could be a solution (see here for static column:
> https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>
> CREATE TABLE user_data (
>    user_id bigint,
>    user_firstname text static,
>    user_lastname text static,
>    ...
>    custom_property_name text,
>    custom_property_value text,
>    PRIMARY KEY(user_id, custom_property_name, custom_property_value));
>
>  Please note that with this solution you need to have "at least one"
> custom property per customer to make it work
>
>  The only thing to take care of is the type of custom_property_value. You
> need to define it once and for all. To accommodate dynamic types, you can
> either save the value as blob or text (as JSON) and take care of the
> serialization/deserialization yourself on the client side
>
>  As an alternative you can save custom properties in a map, provided that
> their number is not too large. But considering the business case of CRM, I
> believe that it's quite rare that a user has more than 1000 custom
> properties, isn't it?
>
>
>
> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <green...@gmail.com> wrote:
>
>> My use case requires the support of arbitrary columns much like a CRM. My
>> users can define 'custom' fields within the application. Ideally I wouldn't
>> have to change the schema at all, which is why I like the old thrift
>> approach rather than the CQL approach.
>>
>> Having said all that, I'd be willing to adapt my API to make explicit
>> schema changes to Cassandra whenever my user makes a change to their custom
>> fields if that's an accepted practice.
>>
>> Ultimately, I'm trying to figure out if the Cassandra community intends
>> to support true schemaless use cases in the future.
>>
>> --
>> about.me <http://about.me/markgreene>
>>
>>
>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> This strikes me as bad practice in the world of multi tenant systems. I
>>> don't want to create a table per customer. So I'm wondering if dynamically
>>> modifying the table is an accepted practice?  --> Can you give some details
>>> about your use case ? How would you "alter" a table structure to adapt it
>>> to a new customer ?
>>>
>>> Wouldn't it be better to model your table so that it supports
>>> addition/removal of customers ?
>>>
>>>
>>>
>>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com> wrote:
>>>
>>>> Thanks DuyHai,
>>>>
>>>> I have a follow up question to #2. You mentioned ideally I would create
>>>> a new table instead of mutating an existing one.
>>>>
>>>> This strikes me as bad practice in the world of multi tenant systems. I
>>>> don't want to create a table per customer. So I'm wondering if dynamically
>>>> modifying the table is an accepted practice?
>>>>
>>>> --
>>>> about.me <http://about.me/markgreene>
>>>>
>>>>
>>>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Mark
>>>>>
>>>>>  Dynamic columns, as you said, are perfectly supported by CQL3 via
>>>>> clustering columns. And no, using collections for storing dynamic data
>>>>> is a very bad idea if the cardinality is very high (>> 1000 elements)
>>>>>
>>>>> 1)  Is using Thrift a valid approach in the era of CQL?  --> Less and
>>>>> less. Unless you are looking for extreme performance, you'd be better off
>>>>> choosing CQL3. The ease of programming and querying with CQL3 is worth
>>>>> the small overhead in CPU
>>>>>
>>>>> 2) If CQL is the best practice,  should I alter the schema at runtime
>>>>> when I detect I need to do a schema mutation?  --> Ideally you should not
>>>>> alter schema but create a new table to adapt to your changing
>>>>> requirements.
>>>>>
>>>>> 3) If I utilize CQL collections, will Cassandra page the entire thing
>>>>> into the heap?  --> Of course. All collections and maps in Cassandra are
>>>>> eagerly loaded entirely in memory on server side. That's why it is
>>>>> recommended to limit their cardinality to ~ 1000 elements
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm looking for some best practices w/r/t supporting arbitrary
>>>>>> columns. It seems from the docs I've read around CQL that they are
>>>>>> supported in some capacity via collections but you can't exceed 64K in
>>>>>> size. For my requirements that would cause problems.
>>>>>>
>>>>>> So my questions are:
>>>>>>
>>>>>> 1)  Is using Thrift a valid approach in the era of CQL?
>>>>>>
>>>>>> 2) If CQL is the best practice,  should I alter the schema at runtime
>>>>>> when I detect I need to do a schema mutation?
>>>>>>
>>>>>>  3) If I utilize CQL collections, will Cassandra page the entire
>>>>>> thing into the heap?
>>>>>>
>>>>>> My data model is akin to a CRM, arbitrary column definitions per
>>>>>> customer.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Mark
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
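
For reference, here is a minimal sketch of the client-side
serialization that DuyHai's clustering-column suggestion implies: each
custom property is one row of (custom_property_name,
custom_property_value), with the value stored as JSON text. The column
names come from the user_data table quoted above; the helper names
(encode_value, decode_value, rows_to_properties) and the sample
properties are hypothetical, and no Cassandra driver is involved, so
the rows are plain tuples standing in for a query result.

```python
import json


def encode_value(value):
    """Serialize a JSON-compatible value to text for custom_property_value."""
    return json.dumps(value)


def decode_value(text):
    """Restore the typed value from the stored JSON text."""
    return json.loads(text)


def rows_to_properties(rows):
    """Fold one user's (name, value) rows into a single dict of properties."""
    return {name: decode_value(raw) for name, raw in rows}


# Hypothetical rows for one user_id, as they might come back from a query
# on the "tall and skinny" table: one row per custom property.
rows = [
    ("age", encode_value(42)),
    ("newsletter", encode_value(True)),
    ("tags", encode_value(["vip", "beta"])),
]

props = rows_to_properties(rows)
```

The type of each value survives the round trip because it is carried
inside the JSON text rather than in the column type, which is the
trade-off of declaring custom_property_value as text.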
