Re: Help on Designing Cassandra table for my use case

Naresh Yadav Thu, 09 Jan 2014 21:57:15 -0800

@Thunder, thanks for suggesting a design, but my main problem is
indexing/querying the dynamic Tag on each row. The Tag is the main context
of each row, and most of my queries will include it.


As an alternative to Cassandra, I tried Apache Blur. In a Blur table I am
able to store exactly the same data, and all the queries worked, so Blur
allows dynamic indexing of the Tag column. But by moving away from
Cassandra I lose its strengths, so I am not confident in this decision,
as the data will be huge in my case.

Please guide me on this with better suggestions.

Thanks
Naresh

On Fri, Jan 10, 2014 at 2:33 AM, Thunder Stumpges <[email protected]> wrote:

> Well, I think you essentially have time-series data, which C* should handle
> well; however, I think your "Tag" column is going to cause trouble. C* does
> have collection columns, but they are neither indexable nor usable in a
> WHERE clause. Your example uses the "Tag" columns both for the uniqueness of
> the data (primary key) and for query filtering on potentially multiple
> tags. That is not supported in C* AFAIK. If it were a single Tag, that
> could possibly be an indexed column.
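
One way to read "if it were a single Tag" is to collapse each row's tag set into one canonical string. The Python sketch below is a hypothetical illustration only: the `tag_key`, `matches_exact` and `contains_tag` helpers and the `|` separator are assumptions, not Cassandra features. It shows that an exact tag-set match (like query d) becomes a single-column equality, while "any row tagged U.S.A" (query c) still cannot be answered by equality on that column.

```python
# Hypothetical helper (not a Cassandra feature): collapse a row's set of tags
# into one canonical string so it could be stored in a single text column.
def tag_key(tags):
    """De-duplicate and sort the tags, then join them with '|'.

    Assumes no individual tag contains the '|' separator.
    """
    return "|".join(sorted(set(tags)))


def matches_exact(row_key, wanted_tags):
    # Equality on the canonical string: the kind of lookup one indexed or
    # clustering column can serve (query d).
    return row_key == tag_key(wanted_tags)


def contains_tag(row_key, tag):
    # "Any row carrying this tag" (query c) has to look inside the string,
    # which a single-column equality lookup cannot do; this runs client-side.
    return tag in row_key.split("|")


key = tag_key(["U.S.A", "Pen"])
print(key)                                   # Pen|U.S.A
print(matches_exact(key, ["Pen", "U.S.A"]))  # True
print(matches_exact(key, ["U.S.A"]))         # False
print(contains_tag(key, "U.S.A"))            # True
```

Sorting makes the key independent of the order in which tags arrive, so "U.S.A, Pen" and "Pen, U.S.A" map to the same row.
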
>
> Ignoring that issue with the many different Tags, you could model the
> table as:
>
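> CREATE TABLE metric_data (
>   metric text,
>   time   text,   -- granularity, e.g. 'Month', 'Year'
>   period text,   -- e.g. 'Jan-10', '2010'
>   tag    text,   -- a single tag value (multiple tags not handled here)
>   value  int,
>   -- (metric, time) is the composite partition key; period, tag cluster
>   PRIMARY KEY ((metric, time), period, tag)
> );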
>
> That would make a composite partition key on metric and time, meaning
> you'd always have to pass those (or else page through all rows via TOKEN,
> in effectively random order). After specifying metric and time, you could
> optionally also restrict period, or period and tag (clustering columns can
> only be restricted in order), and results would be ordered (clustered) by
> period. This would satisfy your queries a, b, and d but not c (as you did
> not specify time). If Time is a granularity column, does it even make
> sense to return records across differing time values? What does it mean to
> return the 4 month rows and the 1 year row in your example? Could you issue
> N queries in this case (where N is the small number of time granularities
> you have)?
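
To make the partition and clustering rules above concrete, here is a hypothetical in-memory Python sketch of that table (not a Cassandra client; the function names and the `|`-joined tag string are assumptions). Queries a, b and d become a single partition lookup (d assumes Time=Month is known), and c is answered as suggested: one query per time granularity, with the tag match done client-side.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical in-memory stand-in for PRIMARY KEY ((metric, time), period, tag).
# Each partition is keyed by (metric, time) and keeps its rows sorted by the
# clustering columns (period, tag). Tags are assumed to be pre-joined into one
# string such as "Pen|U.S.A"; that encoding is an assumption, not the schema.
partitions = defaultdict(list)


def insert(metric, time, period, tag, value):
    rows = partitions[(metric, time)]
    # Same (period, tag) in the same partition replaces the old value (upsert).
    rows[:] = [r for r in rows if (r[0], r[1]) != (period, tag)]
    insort(rows, (period, tag, value))


def select(metric, time, period=None, tag=None):
    """The partition key is mandatory; clustering columns may only be
    restricted as a prefix: nothing, period, or period + tag."""
    if tag is not None and period is None:
        raise ValueError("cannot restrict tag without also restricting period")
    rows = partitions.get((metric, time), [])
    return [r for r in rows
            if (period is None or r[0] == period)
            and (tag is None or r[1] == tag)]


def select_by_tag(metric, tag, granularities=("Month", "Year", "Week")):
    """Query c: one query per time granularity, then filter tags client-side."""
    found = []
    for time in granularities:
        found += [r for r in select(metric, time) if tag in r[1].split("|")]
    return found


for row in [
    ("Sales", "Month", "Jan-10", "Pen|U.S.A", 10),
    ("Sales", "Month", "Jan-10", "Pencil|U.S.A", 20),
    ("Sales", "Month", "Feb-10", "Pen|U.S.A", 30),
    ("Sales", "Month", "Feb-10", "Pencil|U.S.A", 10),
    ("Sales", "Month", "Feb-10", "India", 90),
    ("Sales", "Year", "2010", "U.S.A", 70),
]:
    insert(*row)

print(len(select("Sales", "Month")))                         # a) 5
print(len(select("Sales", "Month", "Jan-10")))               # b) 2
print(len(select("Sales", "Month", "Jan-10", "Pen|U.S.A")))  # d) 1
print(len(select_by_tag("Sales", "U.S.A")))                  # c) 5, via 3 queries
```

Because period is a `text` clustering column, rows sort lexicographically, so "Feb-10" clusters before "Jan-10"; a sortable period encoding would be needed for chronological order.
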
>
> I'm not sure how close that gets you, or whether you can rework your
> concept of Tag at all.
> Good luck.
> Thunder
>
>
>
> On Thu, Jan 9, 2014 at 10:45 AM, Hannu Kröger <[email protected]> wrote:
>
>> To my eye that looks like something traditional analytics systems do.
>> You can check out, e.g., Acunu Analytics, which uses Cassandra as a backend.
>>
>> Cheers,
>> Hannu
>>
>>
>> 2014/1/9 Naresh Yadav <[email protected]>
>>
>>> Hi all,
>>>
>>> I have a use case with huge data that I am not able to model in
>>> Cassandra.
>>>
>>> Table name: MetricResult
>>>
>>> Sample data:
>>>
>>> Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pen,    Value=10
>>> Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pencil, Value=20
>>> Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pen,    Value=30
>>> Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pencil, Value=10
>>> Metric=Sales,    Time=Month, Period=Feb-10,     Tag=India,             Value=90
>>> Metric=Sales,    Time=Year,  Period=2010,       Tag=U.S.A,             Value=70
>>> Metric=Cost,     Time=Year,  Period=2010,       Tag=CPU,               Value=8000
>>> Metric=Cost,     Time=Year,  Period=2010,       Tag=RAM,               Value=4000
>>> Metric=Cost,     Time=Year,  Period=2011,       Tag=CPU,               Value=9000
>>> Metric=Resource, Time=Week,  Period=Week1-2013,                        Value=100
>>>
>>> So in the above case I have:
>>>          Time-series data, i.e. the Time and Period columns
>>>          Dynamic columns, i.e. the Tag column
>>>          Indexing on dynamic columns, i.e. the Tag column
>>>          Aggregations: SUM, AVERAGE
>>>          Overwrite semantics: if a value comes again for the same Metric,
>>>          Time, Period and Tag(s), it overwrites the existing one
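
The overwrite and aggregation requirements listed above can be pinned down with a small Python sketch, independent of any database. It assumes (hypothetically) that a row's identity is (Metric, Time, Period, set of Tags), so tag order does not matter, and that SUM/AVERAGE are computed per Period across all tag combinations; `put` and `aggregate` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical in-memory model: the identity of a row is
# (Metric, Time, Period, set of Tags), so writing the same key again
# replaces the old Value (last write wins, like a Cassandra upsert).
store = {}


def put(metric, time, period, tags, value):
    store[(metric, time, period, frozenset(tags))] = value


def aggregate(metric, time):
    """Return {period: (SUM, AVERAGE)} over all tag combinations."""
    groups = defaultdict(list)
    for (m, t, period, _tags), value in store.items():
        if m == metric and t == time:
            groups[period].append(value)
    return {p: (sum(v), sum(v) / len(v)) for p, v in groups.items()}


put("Sales", "Month", "Jan-10", ["U.S.A", "Pen"], 10)
put("Sales", "Month", "Jan-10", ["U.S.A", "Pencil"], 20)
put("Sales", "Month", "Feb-10", ["U.S.A", "Pen"], 30)
put("Sales", "Month", "Feb-10", ["U.S.A", "Pencil"], 10)
put("Sales", "Month", "Feb-10", ["India"], 90)
put("Sales", "Month", "Jan-10", ["Pen", "U.S.A"], 15)  # same key: overwrites 10

print(aggregate("Sales", "Month")["Jan-10"])  # (35, 17.5)
```

Using `frozenset` for the tags is what makes "U.S.A, Pen" and "Pen, U.S.A" the same row, so the last write replaces the first instead of adding a duplicate.
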
>>>
>>> Queries I need to support:
>>> --------------------------------------
>>> a) Give data for Metric=Sales AND Time=Month
>>>        O/P: 5 rows
>>> b) Give data for Metric=Sales AND Time=Month AND Period=Jan-10
>>>        O/P: 2 rows
>>> c) Give data for Metric=Sales AND Tag=U.S.A
>>>        O/P: 5 rows
>>> d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>>>        O/P: 1 row
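
A minimal Python reference for the four queries over the full sample data, useful as a fixture for checking any candidate schema against the expected row counts. It assumes that repeated `Tag=` conditions mean AND (a row must carry every listed tag) and that an omitted field means "any value"; `ROWS` and `query` are hypothetical names.

```python
# Sample rows as (metric, time, period, tags, value); tags kept as a set.
ROWS = [
    ("Sales", "Month", "Jan-10", {"U.S.A", "Pen"}, 10),
    ("Sales", "Month", "Jan-10", {"U.S.A", "Pencil"}, 20),
    ("Sales", "Month", "Feb-10", {"U.S.A", "Pen"}, 30),
    ("Sales", "Month", "Feb-10", {"U.S.A", "Pencil"}, 10),
    ("Sales", "Month", "Feb-10", {"India"}, 90),
    ("Sales", "Year", "2010", {"U.S.A"}, 70),
    ("Cost", "Year", "2010", {"CPU"}, 8000),
    ("Cost", "Year", "2010", {"RAM"}, 4000),
    ("Cost", "Year", "2011", {"CPU"}, 9000),
    ("Resource", "Week", "Week1-2013", set(), 100),
]


def query(metric, time=None, period=None, tags=()):
    """Rows for `metric`, optionally filtered by time/period, that carry ALL `tags`."""
    wanted = set(tags)
    return [r for r in ROWS
            if r[0] == metric
            and (time is None or r[1] == time)
            and (period is None or r[2] == period)
            and wanted <= r[3]]  # subset test: every wanted tag is present


print(len(query("Sales", time="Month")))                            # a) 5
print(len(query("Sales", time="Month", period="Jan-10")))           # b) 2
print(len(query("Sales", tags=["U.S.A"])))                          # c) 5
print(len(query("Sales", period="Jan-10", tags=["U.S.A", "Pen"])))  # d) 1
```

Query c is the one that crosses time granularities (four Month rows plus one Year row), which is exactly the case a `(metric, time)` partition key cannot serve in one lookup.
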
>>>
>>>
>>> This table can hold TBs of data, and a single Metric/Period can have
>>> millions of rows.
>>>
>>> Please suggest how to design/model this table in Cassandra. If Cassandra
>>> has some limitation here, then please suggest the best technology to
>>> handle this.
>>>
>>>
>>> Thanks
>>> Naresh
>>>
>>
>>
>
