This sort of work sounds much more like a Hadoop/Hive/Pig type of analysis. 

What are your latency requirements on queries? Are they ad hoc, or part of an
application? In what case would you need to change an existing value?
If the data is write-once, Hadoop/Hive is a great fit; if values change
randomly, not so much.
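To make the rewrite case concrete: the question asks that a repeated
Metric/Time/Period/Tag combination overwrite the stored value. Cassandra
treats an INSERT on an existing primary key as an overwrite, so this kind
of rewrite is cheap there. Below is a minimal in-memory sketch of that upsert
behaviour, assuming tags are an unordered set; MetricStore is a hypothetical
stand-in, not a Cassandra API.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical composite key: (metric, time, period, tags).
Key = Tuple[str, str, str, FrozenSet[str]]


class MetricStore:
    """Hypothetical stand-in for a table keyed on Metric/Time/Period/Tags.

    Writing the same key twice keeps only the latest value, mimicking
    upsert-on-primary-key semantics.
    """

    def __init__(self) -> None:
        self._data: Dict[Key, int] = {}

    @staticmethod
    def _key(metric: str, time: str, period: str, tags: Iterable[str]) -> Key:
        # frozenset makes tag order irrelevant: {U.S.A, Pen} == {Pen, U.S.A}
        return (metric, time, period, frozenset(tags))

    def upsert(self, metric: str, time: str, period: str,
               tags: Iterable[str], value: int) -> None:
        # Overwrites any existing value for the same key.
        self._data[self._key(metric, time, period, tags)] = value

    def get(self, metric: str, time: str, period: str,
            tags: Iterable[str]) -> Optional[int]:
        return self._data.get(self._key(metric, time, period, tags))

    def __len__(self) -> int:
        return len(self._data)
```

For example, writing Sales/Month/Jan-10/{U.S.A, Pen} with 10 and then
Sales/Month/Jan-10/{Pen, U.S.A} with 15 leaves a single entry holding 15.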

Cassandra's main limitation here is that it does not support aggregation; that
must be done by the client. In my experience it is best suited to quickly
writing lots of data and reading it back in a "random I/O" manner when you
already know the "key" you are looking for.
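As a rough illustration of what "aggregation in the client" and "already
knowing the key" mean for the Sales rows in the question: one lookup
structure per query pattern, each keyed by exactly what that query supplies
(the way a Cassandra partition key must be known up front), with SUM and
AVERAGE computed after the rows come back. The dicts are hypothetical
stand-ins for denormalized tables; no driver calls are shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Sales rows from the sample data: (metric, time, period, tags, value)
ROWS = [
    ("Sales", "Month", "Jan-10", ("U.S.A", "Pen"), 10),
    ("Sales", "Month", "Jan-10", ("U.S.A", "Pencil"), 20),
    ("Sales", "Month", "Feb-10", ("U.S.A", "Pen"), 30),
    ("Sales", "Month", "Feb-10", ("U.S.A", "Pencil"), 10),
    ("Sales", "Month", "Feb-10", ("India",), 90),
    ("Sales", "Year", "2010", ("U.S.A",), 70),
]

Row = Tuple[str, str, Tuple[str, ...], int]  # (time, period, tags, value)

# Hypothetical stand-ins for two denormalized tables, one per query
# pattern. Each dict key plays the role of a partition key that the
# query must already know.
by_metric_time: Dict[Tuple[str, str], List[Row]] = defaultdict(list)  # query a)
by_metric_tag: Dict[Tuple[str, str], List[Row]] = defaultdict(list)   # query c)

for metric, time, period, tags, value in ROWS:
    row = (time, period, tags, value)
    by_metric_time[(metric, time)].append(row)
    # A row with several tags is written once per tag.
    for tag in tags:
        by_metric_tag[(metric, tag)].append(row)


def aggregate(rows: List[Row]) -> Tuple[int, Optional[float]]:
    """Client-side SUM and AVERAGE over rows already fetched by key."""
    values = [value for _, _, _, value in rows]
    if not values:
        return 0, None
    total = sum(values)
    return total, total / len(values)
```

For example, aggregate(by_metric_time[("Sales", "Month")]) gives (160, 32.0)
over the five rows of query a). The catch is that every extra query pattern
means another copy of the data, and at TB scale the client has to pull every
row of a partition before summing, which is where Hive/Pig fit better.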

If you have high-speed write and rewrite needs but also "full data"
analytical requirements, there are plugins for using C* (Cassandra) as a
backing store for Pig/Hive. They are a little finicky to get working,
depending on all your versions, but they work fairly well in my limited
experience.

Perhaps with a better understanding of your workload, others can chime in
too. Good luck.

-Thunder


> On Jan 9, 2014, at 5:15 AM, Naresh Yadav <[email protected]> wrote:
> 
> Hi all,
> 
> I have a use case with huge data that I am not able to design in Cassandra.
> 
> Table name : MetricResult      
> 
> Sample Data :
> 
> Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pen,    Value=10
> Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pencil, Value=20
> Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pen,    Value=30
> Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pencil, Value=10
> Metric=Sales,    Time=Month, Period=Feb-10,     Tag=India,             Value=90
> Metric=Sales,    Time=Year,  Period=2010,       Tag=U.S.A,             Value=70
> Metric=Cost,     Time=Year,  Period=2010,       Tag=CPU,               Value=8000
> Metric=Cost,     Time=Year,  Period=2010,       Tag=RAM,               Value=4000
> Metric=Cost,     Time=Year,  Period=2011,       Tag=CPU,               Value=9000
> Metric=Resource, Time=Week,  Period=Week1-2013,                        Value=100
> 
> So in the above case I have:
>          Time-series data, i.e. the Time and Period columns
>          Dynamic columns, i.e. the Tag column
>          Indexing on dynamic columns, i.e. the Tag column
>          Aggregations: SUM, AVERAGE
>          Overwrites: if the same Metric, Time, Period, Tag comes again,
>          overwrite the value
> 
> Queries I need to support:
> --------------------------------------
> a) Give data for Metric=Sales AND Time=Month
>        O/P: 5 rows
> b) Give data for Metric=Sales AND Time=Month AND Period=Jan-10
>        O/P: 2 rows
> c) Give data for Metric=Sales AND Tag=U.S.A
>        O/P: 5 rows
> d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>        O/P: 1 row
> 
> 
> This table can hold TBs of data, and a single Metric/Period can have
> millions of rows.
> 
> Please suggest how to design/model this table in Cassandra. If Cassandra
> has some limitation here, please suggest the best technology to handle this.
> 
> 
> Thanks
> Naresh
