@Thunder, thanks for the guidance. Queries will be fired by the application on this table when users log in and browse the application, and also from mobile apps through a web service. Responses need to be quick, as users will be analyzing this data on the fly. Writes also need to be fast, as there is a daily deadline by which we must show this data to users.
We can build the aggregation in the application, outside Cassandra. But we are not clear what table(s) we should design in Cassandra for the queries we need. Please give guidance on a possible design that handles indexing on dynamic tags for these queries.

Thanks
Naresh

On Thu, Jan 9, 2014 at 9:41 PM, Thunder Stumpges <[email protected]> wrote:
> This sort of work sounds much more like a Hadoop/Hive/Pig type of analysis.
>
> What are your latency requirements on queries? Are they ad hoc or part of
> an application? What is the case where you would need to change an existing
> value? If it is write-once, then Hadoop/Hive is great; if it changes
> randomly, then not so much.
>
> Cassandra has the limitation that it does not support aggregation; that must
> be done by the client. In my experience it is really suited to quickly writing
> lots of data and looking it back up in a "random I/O" manner if you
> already know the "key" you are looking for.
>
> If you have the high-speed write and rewrite needs, but also the "full
> data" analytical requirements, there are plugins for using C* as a backing
> store for Pig/Hive. It is a little finicky to get working depending on all
> your versions, but it does work fairly well in my limited experience.
>
> Perhaps with a little better understanding of your workload needs, others
> can chime in too. Good luck.
>
> -Thunder
>
>
>
> On Jan 9, 2014, at 5:15 AM, Naresh Yadav <[email protected]> wrote:
>
> > Hi all,
> >
> > I have a use case with huge data which I am not able to design in
> > Cassandra.
> >
> > Table name: MetricResult
> >
> > Sample data:
> >
> > Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pen, Value=10
> > Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pencil, Value=20
> > Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pen, Value=30
> > Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pencil, Value=10
> > Metric=Sales, Time=Month, Period=Feb-10, Tag=India, Value=90
> > Metric=Sales, Time=Year, Period=2010, Tag=U.S.A, Value=70
> > Metric=Cost, Time=Year, Period=2010, Tag=CPU, Value=8000
> > Metric=Cost, Time=Year, Period=2010, Tag=RAM, Value=4000
> > Metric=Cost, Time=Year, Period=2011, Tag=CPU, Value=9000
> > Metric=Resource, Time=Week, Period=Week1-2013, Value=100
> >
> > So in the above case I have:
> > Time-series data, i.e. the Time and Period columns
> > Dynamic columns, i.e. the Tag column
> > Indexing on dynamic columns, i.e. the Tag column
> > Aggregations: SUM, AVERAGE
> > Overwrites: if a value arrives again for the same Metric, Time, Period and Tag(s), it replaces the old one
> >
> > Queries I need to support:
> > --------------------------------------
> > a) Give data for Metric=Sales AND Time=Month
> >    O/P: 5 rows
> > b) Give data for Metric=Sales AND Time=Month AND Period=Jan-10
> >    O/P: 2 rows
> > c) Give data for Metric=Sales AND Tag=U.S.A
> >    O/P: 5 rows
> > d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
> >    O/P: 1 row
> >
> > This table can hold terabytes of data, and a single Metric/Period combination can have millions of rows.
> >
> > Please suggest how to design/model this table in Cassandra. If Cassandra has some limitation here, please suggest the best technology to handle this.
> >
> > Thanks
> > Naresh
>
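A minimal sketch of the query-driven ("one table per query") modeling approach often used with Cassandra, written in Python with in-memory dicts standing in for Cassandra tables. Everything here is hypothetical: the class, table and method names are illustrative and are not a Cassandra or driver API. Each write is fanned out to one "table" per query (a) to (d). Tags are normalized into a sorted tuple, and every non-empty subset of a row's tags is indexed, so a query on any tag subset (U.S.A alone, or U.S.A plus Pen) becomes a single key lookup. Writing the same Metric/Time/Period/Tag combination again overwrites the old value, and SUM/AVERAGE are computed in the client, as discussed in the thread.

```python
from collections import defaultdict
from itertools import combinations


def tag_subsets(tags):
    """Every non-empty subset of a row's tags, each as a sorted tuple."""
    ordered = sorted(set(tags))
    return [c for r in range(1, len(ordered) + 1) for c in combinations(ordered, r)]


def tag_key(tags):
    """Canonical, order-independent form of a tag list."""
    return tuple(sorted(set(tags)))


class MetricResultStore:
    """Hypothetical stand-in for four denormalized Cassandra tables.

    Outer dict key ~ partition key; inner dict key ~ clustering key.
    Assigning to an existing inner key overwrites it, like a Cassandra upsert.
    """

    def __init__(self):
        self.by_metric_time = defaultdict(dict)         # query (a)
        self.by_metric_time_period = defaultdict(dict)  # query (b)
        self.by_metric_tags = defaultdict(dict)         # query (c)
        self.by_metric_period_tags = defaultdict(dict)  # query (d)

    def write(self, metric, time, period, tags, value):
        # A row is identified by Metric, Time, Period and its tag set.
        row_key = (time, period, tag_key(tags))
        self.by_metric_time[(metric, time)][row_key] = value
        self.by_metric_time_period[(metric, time, period)][row_key] = value
        # Fan out: one entry per tag subset, so any subset is a direct lookup.
        for subset in tag_subsets(tags):
            self.by_metric_tags[(metric, subset)][row_key] = value
            self.by_metric_period_tags[(metric, period, subset)][row_key] = value

    # .get() on a defaultdict does not create missing partitions.
    def query_a(self, metric, time):
        return dict(self.by_metric_time.get((metric, time), {}))

    def query_b(self, metric, time, period):
        return dict(self.by_metric_time_period.get((metric, time, period), {}))

    def query_c(self, metric, tags):
        return dict(self.by_metric_tags.get((metric, tag_key(tags)), {}))

    def query_d(self, metric, period, tags):
        return dict(self.by_metric_period_tags.get((metric, period, tag_key(tags)), {}))


def total(rows):
    """Client-side SUM over a query result."""
    return sum(rows.values())


def average(rows):
    """Client-side AVERAGE over a query result (None if empty)."""
    return sum(rows.values()) / len(rows) if rows else None


# The sample data from the thread: (metric, time, period, tags, value).
SAMPLE = [
    ("Sales", "Month", "Jan-10", ["U.S.A", "Pen"], 10),
    ("Sales", "Month", "Jan-10", ["U.S.A", "Pencil"], 20),
    ("Sales", "Month", "Feb-10", ["U.S.A", "Pen"], 30),
    ("Sales", "Month", "Feb-10", ["U.S.A", "Pencil"], 10),
    ("Sales", "Month", "Feb-10", ["India"], 90),
    ("Sales", "Year", "2010", ["U.S.A"], 70),
    ("Cost", "Year", "2010", ["CPU"], 8000),
    ("Cost", "Year", "2010", ["RAM"], 4000),
    ("Cost", "Year", "2011", ["CPU"], 9000),
    ("Resource", "Week", "Week1-2013", [], 100),
]

store = MetricResultStore()
for row in SAMPLE:
    store.write(*row)

print(len(store.query_a("Sales", "Month")))                     # 5
print(len(store.query_b("Sales", "Month", "Jan-10")))           # 2
print(len(store.query_c("Sales", ["U.S.A"])))                   # 5
print(len(store.query_d("Sales", "Jan-10", ["U.S.A", "Pen"])))  # 1
print(total(store.query_a("Sales", "Month")))                   # 160
```

The fan-out is what makes arbitrary tag combinations cheap to read: a row with k tags is written 2^k - 1 times into each tag table, which stays manageable only while rows carry a handful of tags. In real Cassandra each dict would correspond to its own table, with the tuple key as the partition key; since a single Metric/Period can hold millions of rows, partitions as coarse as (metric, time) would likely need an extra bucketing component to stay a reasonable size.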
