This sort of work sounds much more like a Hadoop/Hive/Pig type of analysis.
What are your latency requirements on queries? Are they ad hoc or part of an application? In what case would you need to change an existing value? If the data is write-once, then Hadoop/Hive is great; if it changes randomly, then not so much.

One limitation of Cassandra is that it does not support aggregation; that must be done by the client. In my experience it is really suited to quickly writing lots of data and looking it back up in a "random I/O" manner when you already know the "key" you are looking for.

If you have high-speed write and rewrite needs, but also "full data" analytical requirements, there are plugins for using C* as a backing store for Pig/Hive. They are a little finicky to get working, depending on all your versions, but they do work fairly well in my limited experience.

Perhaps with a better understanding of your workload needs, others can chime in too.

Good luck.

-Thunder

> On Jan 9, 2014, at 5:15 AM, Naresh Yadav <[email protected]> wrote:
>
> Hi all,
>
> I have a use case with huge data which i am not able to design in cassandra.
>
> Table name : MetricResult
>
> Sample Data :
>
> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pen, Value=10
> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pencil, Value=20
> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pen, Value=30
> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pencil, Value=10
> Metric=Sales, Time=Month, Period=Feb-10, Tag=India, Value=90
> Metric=Sales, Time=Year, Period=2010, Tag=U.S.A, Value=70
> Metric=Cost, Time=Year, Period=2010, Tag=CPU, Value=8000
> Metric=Cost, Time=Year, Period=2010, Tag=RAM, Value=4000
> Metric=Cost, Time=Year Period=2011, Tag=CPU, Value=9000
> Metric=Resource, Time=Week Period=Week1-2013, Value=100
>
> So in above case i have case of
>     TimeSeries data i.e Time,Period column
>     Dynamic columns i.e Tag column
>     Indexing on dynamic columns i.e Tag column
>     Aggregations SUM, AVERAGE
>     Same value comes again for a Metric, Time, Period, Tag then overwrite it
>
> Queries i need to support :
> --------------------------------------
> a)Give data for Metric=Sales AND Time=Month    O/P : 5 rows
> b)Give data for Metric=Sales AND Time=Month AND Period=Jan-10    O/P : 2 rows
> c)Give data for Metric=Sales AND Tag=U.S.A    O/P : 5 rows
> d)Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen    O/P : 1 row
>
> This table can have TB's of data and for a Metric,Period can have millions of rows.
>
> Please give suggestion to design/model this table in Cassandra. If some
> limitation in Cassandra then suggest best technology to handle this.
>
> Thanks
> Naresh
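
To make the "aggregation must be done by the client" point concrete, here is a rough sketch of client-side filtering and aggregation over the Sales rows from the sample data above. It is only an illustration: the rows are hard-coded rather than fetched from Cassandra, and the names `ROWS`, `select`, and `aggregate` are hypothetical, not part of any Cassandra client API.

```python
from statistics import mean

# Sales rows copied from the sample data in the question. In practice these
# would be fetched from the store by key; the fetch is not shown (assumption).
ROWS = [
    {"metric": "Sales", "time": "Month", "period": "Jan-10", "tags": {"U.S.A", "Pen"}, "value": 10},
    {"metric": "Sales", "time": "Month", "period": "Jan-10", "tags": {"U.S.A", "Pencil"}, "value": 20},
    {"metric": "Sales", "time": "Month", "period": "Feb-10", "tags": {"U.S.A", "Pen"}, "value": 30},
    {"metric": "Sales", "time": "Month", "period": "Feb-10", "tags": {"U.S.A", "Pencil"}, "value": 10},
    {"metric": "Sales", "time": "Month", "period": "Feb-10", "tags": {"India"}, "value": 90},
    {"metric": "Sales", "time": "Year", "period": "2010", "tags": {"U.S.A"}, "value": 70},
]


def select(rows, metric, time=None, period=None, tags=()):
    """Return rows matching the metric, the optional time/period, and all given tags."""
    wanted = set(tags)
    return [
        r for r in rows
        if r["metric"] == metric
        and (time is None or r["time"] == time)
        and (period is None or r["period"] == period)
        and wanted <= r["tags"]  # every requested tag must be present on the row
    ]


def aggregate(rows):
    """Compute count, SUM and AVERAGE on the client side."""
    values = [r["value"] for r in rows]
    return {
        "count": len(values),
        "sum": sum(values),
        "average": mean(values) if values else None,
    }


if __name__ == "__main__":
    # Query (a) from the question: Metric=Sales AND Time=Month
    print(aggregate(select(ROWS, "Sales", time="Month")))
```

With TBs of data this kind of in-client scan only makes sense after the key lookup has already narrowed the rows down to a small set, which is the trade-off the reply describes.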
