@Thunder thanks for suggesting design but my main problem is indexing/quering dynamic Tag on each row that is main context of each row and most of queries will include that..
As an alternative to cassandra, i tried Apache Blur, in blur table i am able to store exact same data and all queries also worked..so blur allows dynamic indexing of tag column BUT moving away from cassandra, i am loosing its strength because of that i am not confident on this decision as data will be huge in my case. Please guide me on this with better suggestions. Thanks Naresh On Fri, Jan 10, 2014 at 2:33 AM, Thunder Stumpges < [email protected]> wrote: > Well I think you have essentially time-series data, which C* should handle > well, however I think your "Tag" column is going to cause troubles. C* does > have collection columns, but they are not indexable nor usable in WHERE > clause. Your example has both the uniqueness of the data (primary key) and > query filtering on potentially multiple "Tag" columns. That is not > supported in C* AFAIK.If it were a single Tag, that could be a column that > is Indexed possibly. > > Ignoring that issue with the many different Tags, You could model the > table as: > > CREATE TABLE metric_data ( > metric text, > time text, > period text, > tag text, > value int, > PRIMARY KEY( (metric,time), period, tag) > ) > > That would make a composite partitioning key on metric and time meaning > you'd always have to pass those (or else randomly page via TOKEN through > all rows). After specifying metric and time, you could optionally also > specify period and/or tag, and results would be ordered (clustered) by > period. This would satisfy your queries a,b, and d but not c (as you did > not specify time). If Time was a granularity column, does it even make > sense to return records across differing time values? What does it mean to > return the 4 month rows and 1 year row in your example? Could you issue N > queries in this case (where N is a small number of each of your time > granularities) ? > > I'm not sure how close that gets you, or if you can re-work your concept > of Tag at all. > Good luck. > Thunder > > > > On Thu, Jan 9, 2014 at 10:45 AM, Hannu Kröger <[email protected]> wrote: > >> To my eye that looks something what the traditional analytics systems do. >> You can check out e.g. Acunu Analytics which uses Cassandra as a backend. >> >> Cheers, >> Hannu >> >> >> 2014/1/9 Naresh Yadav <[email protected]> >> >>> Hi all, >>> >>> I have a use case with huge data which i am not able to design in >>> cassandra. >>> >>> Table name : MetricResult >>> >>> Sample Data : >>> >>> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pen, >>> Value=10 >>> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pencil, Value=20 >>> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pen, Value=30 >>> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pencil, Value=10 >>> Metric=Sales, Time=Month, Period=Feb-10, Tag=India, >>> Value=90 >>> Metric=Sales, Time=Year, Period=2010, Tag=U.S.A, >>> Value=70 >>> Metric=Cost, Time=Year, Period=2010, Tag=CPU, >>> Value=8000 >>> Metric=Cost, Time=Year, Period=2010, Tag=RAM, >>> Value=4000 >>> Metric=Cost, Time=Year Period=2011, Tag=CPU, >>> Value=9000 >>> Metric=Resource, Time=Week Period=Week1-2013, >>> Value=100 >>> >>> So in above case i have case of >>> TimeSeries data i.e Time,Period column >>> Dynamic columns i.e Tag column >>> Indexing on dynamic columns i.e Tag column >>> Aggregations SUM, AVERAGE >>> Same value comes again for a Metric, Time, Period, Tag then >>> overwrite it >>> >>> Queries i need to support : >>> -------------------------------------- >>> a)Give data for Metric=Sales AND Time=Month >>> O/P : 5 rows >>> b)Give data for Metric=Sales AND Time=Month AND Period=Jan-10 >>> O/P : 2 rows >>> c)Give data for Metric=Sales AND Tag=U.S.A >>> O/P : 5 rows >>> d)Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen >>> O/P :1 row >>> >>> >>> This table can have TB's of data and for a Metric,Period can have >>> millions of rows. >>> >>> Please give suggestion to design/model this table in Cassandra. If some >>> limitation in Cassandra then suggest best technology to handle this. >>> >>> >>> Thanks >>> Naresh >>> >> >> >
