@thunder It will be write-once 80% of the time, but there can be cases where the client makes a correction to the data, and then we need to overwrite it.

Thanks
Naresh

On Thu, Jan 9, 2014 at 11:49 PM, Naresh Yadav <[email protected]> wrote:

> @thunder thanks for the guidance. Queries will be fired by the application
> on this table when users log in and browse the application, and also from
> mobile apps through a web service. Responses need to be quick, as users
> will be doing analysis over this data on the fly. Writes also need to be
> fast, as there is a time limit within which we need to show this data to
> users every day.
>
> Aggregation we can build in the application, outside Cassandra. But we are
> not clear what table we should design in Cassandra for the queries we
> need. Please give guidance on a possible design to handle dynamic tag
> indexing for queries.
>
> Thanks
> Naresh
>
> On Thu, Jan 9, 2014 at 9:41 PM, Thunder Stumpges <[email protected]> wrote:
>
>> This sort of work sounds much more like a Hadoop/Hive/Pig type of
>> analysis.
>>
>> What are your latency requirements on queries? Are they ad hoc or part of
>> an application? What is the case where you would need to change an
>> existing value? If it is write once, then Hadoop/Hive is great; if it
>> changes randomly, then not so much.
>>
>> Cassandra has the limitation that it does not support aggregation; that
>> must be done by a client. In my experience it is really suited to quickly
>> writing lots of data and looking it back up in a "random io" type manner
>> if you already know the "key" you are looking for.
>>
>> If you have the high speed write and rewrite needs, but also the "full
>> data" analytical requirements, there are plugins for using C* as a
>> backing store for Pig/Hive. It is a little finicky to get working
>> depending on all your versions, but does work fairly well in my limited
>> experience.
>>
>> Perhaps with a little better understanding of your workload needs others
>> can chime in too. Good luck.
>>
>> -Thunder
>>
>> > On Jan 9, 2014, at 5:15 AM, Naresh Yadav <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > I have a use case with huge data which I am not able to design in
>> > Cassandra.
>> >
>> > Table name : MetricResult
>> >
>> > Sample Data :
>> >
>> > Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pen,    Value=10
>> > Metric=Sales,    Time=Month, Period=Jan-10,     Tag=U.S.A, Tag=Pencil, Value=20
>> > Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pen,    Value=30
>> > Metric=Sales,    Time=Month, Period=Feb-10,     Tag=U.S.A, Tag=Pencil, Value=10
>> > Metric=Sales,    Time=Month, Period=Feb-10,     Tag=India,             Value=90
>> > Metric=Sales,    Time=Year,  Period=2010,       Tag=U.S.A,             Value=70
>> > Metric=Cost,     Time=Year,  Period=2010,       Tag=CPU,               Value=8000
>> > Metric=Cost,     Time=Year,  Period=2010,       Tag=RAM,               Value=4000
>> > Metric=Cost,     Time=Year,  Period=2011,       Tag=CPU,               Value=9000
>> > Metric=Resource, Time=Week,  Period=Week1-2013,                        Value=100
>> >
>> > So in the above case I have:
>> >   TimeSeries data, i.e. the Time, Period columns
>> >   Dynamic columns, i.e. the Tag column
>> >   Indexing on dynamic columns, i.e. the Tag column
>> >   Aggregations SUM, AVERAGE
>> >   If the same value comes again for a Metric, Time, Period, Tag, then
>> >   overwrite it
>> >
>> > Queries I need to support :
>> > --------------------------------------
>> > a) Give data for Metric=Sales AND Time=Month
>> >    O/P : 5 rows
>> > b) Give data for Metric=Sales AND Time=Month AND Period=Jan-10
>> >    O/P : 2 rows
>> > c) Give data for Metric=Sales AND Tag=U.S.A
>> >    O/P : 5 rows
>> > d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>> >    O/P : 1 row
>> >
>> > This table can have TBs of data, and a single Metric, Period can have
>> > millions of rows.
>> >
>> > Please suggest how to design/model this table in Cassandra. If there is
>> > some limitation in Cassandra, then suggest the best technology to
>> > handle this.
>> >
>> > Thanks
>> > Naresh
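
To make the access pattern in the original question concrete, here is a minimal in-memory sketch in Python. It is not a Cassandra schema and uses no Cassandra API; `MetricResultStore`, `upsert`, `query` and `aggregate` are hypothetical names made up for illustration. It assumes a row is identified by (Metric, Time, Period, set of Tags), so a repeated write overwrites the value, that a second lookup keyed by (Metric, Tag) stands in for the tag index, and that SUM/AVERAGE are computed on the client side, as suggested in the thread.

```python
from collections import defaultdict


class MetricResultStore:
    """In-memory stand-in for the MetricResult table (hypothetical, not a Cassandra API)."""

    def __init__(self):
        # Primary rows: (metric, time, period, frozenset(tags)) -> value.
        self.rows = {}
        # Tag lookup: (metric, tag) -> set of row keys carrying that tag.
        self.by_tag = defaultdict(set)

    def upsert(self, metric, time, period, tags, value):
        key = (metric, time, period, frozenset(tags))
        self.rows[key] = value  # writing the same key again overwrites the value
        for tag in key[3]:
            self.by_tag[(metric, tag)].add(key)

    def query(self, metric, time=None, period=None, tags=()):
        tags = frozenset(tags)
        if tags:
            # Intersect the per-tag key sets so that every requested tag must match.
            candidates = set.intersection(
                *(self.by_tag.get((metric, t), set()) for t in tags)
            )
        else:
            candidates = {k for k in self.rows if k[0] == metric}
        return sorted(
            (k[0], k[1], k[2], tuple(sorted(k[3])), self.rows[k])
            for k in candidates
            if k[0] == metric
            and (time is None or k[1] == time)
            and (period is None or k[2] == period)
        )


def aggregate(rows):
    """Client-side SUM and AVERAGE over the value column of query results."""
    values = [r[-1] for r in rows]
    total = sum(values)
    return total, (total / len(values) if values else None)


# Sample data taken from the original post.
SAMPLE = [
    ("Sales", "Month", "Jan-10", ["U.S.A", "Pen"], 10),
    ("Sales", "Month", "Jan-10", ["U.S.A", "Pencil"], 20),
    ("Sales", "Month", "Feb-10", ["U.S.A", "Pen"], 30),
    ("Sales", "Month", "Feb-10", ["U.S.A", "Pencil"], 10),
    ("Sales", "Month", "Feb-10", ["India"], 90),
    ("Sales", "Year", "2010", ["U.S.A"], 70),
    ("Cost", "Year", "2010", ["CPU"], 8000),
    ("Cost", "Year", "2010", ["RAM"], 4000),
    ("Cost", "Year", "2011", ["CPU"], 9000),
    ("Resource", "Week", "Week1-2013", [], 100),
]

store = MetricResultStore()
for metric, time, period, tags, value in SAMPLE:
    store.upsert(metric, time, period, tags, value)

a = store.query("Sales", time="Month")                          # query a)
b = store.query("Sales", time="Month", period="Jan-10")         # query b)
c = store.query("Sales", tags=["U.S.A"])                        # query c)
d = store.query("Sales", period="Jan-10", tags=["U.S.A", "Pen"])  # query d)
```

On the sample data, queries a) to d) return 5, 2, 5 and 1 rows, matching the expected outputs in the post. The point of the sketch is only that each query shape gets its own lookup path (by metric and time, or by metric and tag); whether that maps to separate denormalized tables or to an index in Cassandra is a design decision the thread leaves open.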
