Well I think you have essentially time-series data, which C* should handle well, however I think your "Tag" column is going to cause troubles. C* does have collection columns, but they are not indexable nor usable in WHERE clause. Your example has both the uniqueness of the data (primary key) and query filtering on potentially multiple "Tag" columns. That is not supported in C* AFAIK.If it were a single Tag, that could be a column that is Indexed possibly.
Ignoring that issue with the many different Tags, You could model the table as: CREATE TABLE metric_data ( metric text, time text, period text, tag text, value int, PRIMARY KEY( (metric,time), period, tag) ) That would make a composite partitioning key on metric and time meaning you'd always have to pass those (or else randomly page via TOKEN through all rows). After specifying metric and time, you could optionally also specify period and/or tag, and results would be ordered (clustered) by period. This would satisfy your queries a,b, and d but not c (as you did not specify time). If Time was a granularity column, does it even make sense to return records across differing time values? What does it mean to return the 4 month rows and 1 year row in your example? Could you issue N queries in this case (where N is a small number of each of your time granularities) ? I'm not sure how close that gets you, or if you can re-work your concept of Tag at all. Good luck. Thunder On Thu, Jan 9, 2014 at 10:45 AM, Hannu Kröger <[email protected]> wrote: > To my eye that looks something what the traditional analytics systems do. > You can check out e.g. Acunu Analytics which uses Cassandra as a backend. > > Cheers, > Hannu > > > 2014/1/9 Naresh Yadav <[email protected]> > >> Hi all, >> >> I have a use case with huge data which i am not able to design in >> cassandra. >> >> Table name : MetricResult >> >> Sample Data : >> >> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pen, Value=10 >> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pencil, Value=20 >> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pen, Value=30 >> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pencil, Value=10 >> Metric=Sales, Time=Month, Period=Feb-10, Tag=India, >> Value=90 >> Metric=Sales, Time=Year, Period=2010, Tag=U.S.A, >> Value=70 >> Metric=Cost, Time=Year, Period=2010, Tag=CPU, >> Value=8000 >> Metric=Cost, Time=Year, Period=2010, Tag=RAM, >> Value=4000 >> Metric=Cost, Time=Year Period=2011, Tag=CPU, >> Value=9000 >> Metric=Resource, Time=Week Period=Week1-2013, >> Value=100 >> >> So in above case i have case of >> TimeSeries data i.e Time,Period column >> Dynamic columns i.e Tag column >> Indexing on dynamic columns i.e Tag column >> Aggregations SUM, AVERAGE >> Same value comes again for a Metric, Time, Period, Tag then >> overwrite it >> >> Queries i need to support : >> -------------------------------------- >> a)Give data for Metric=Sales AND Time=Month >> O/P : 5 rows >> b)Give data for Metric=Sales AND Time=Month AND Period=Jan-10 >> O/P : 2 rows >> c)Give data for Metric=Sales AND Tag=U.S.A >> O/P : 5 rows >> d)Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen >> O/P :1 row >> >> >> This table can have TB's of data and for a Metric,Period can have >> millions of rows. >> >> Please give suggestion to design/model this table in Cassandra. If some >> limitation in Cassandra then suggest best technology to handle this. >> >> >> Thanks >> Naresh >> > >
