@Thunder I just learned about CASSANDRA-4511 <https://issues.apache.org/jira/browse/CASSANDRA-4511>, which allows indexes on collections and will be part of release 2.1. In that case I hope my problem can be solved by changing the tag column in your table design to set<text> and defining a secondary index on it. Given the huge data volume, is there any risk of performance problems with this design?

Naresh

On Fri, Jan 10, 2014 at 10:26 AM, Naresh Yadav <[email protected]> wrote:

> @Thunder thanks for suggesting a design, but my main problem is
> indexing/querying the dynamic Tag on each row; that is the main context of
> each row and most queries will include it.
>
> As an alternative to Cassandra, I tried Apache Blur. In a Blur table I am
> able to store the exact same data and all the queries also worked, so Blur
> allows dynamic indexing of the tag column. BUT by moving away from Cassandra
> I am losing its strengths, so I am not confident in this decision, as the
> data will be huge in my case.
>
> Please guide me on this with better suggestions.
>
> Thanks
> Naresh
>
> On Fri, Jan 10, 2014 at 2:33 AM, Thunder Stumpges <[email protected]> wrote:
>
>> Well I think you have essentially time-series data, which C* should
>> handle well, however I think your "Tag" column is going to cause troubles.
>> C* does have collection columns, but they are not indexable nor usable in
>> a WHERE clause. Your example has both the uniqueness of the data (primary
>> key) and query filtering on potentially multiple "Tag" columns. That is not
>> supported in C* AFAIK. If it were a single Tag, that could possibly be an
>> indexed column.
>>
>> Ignoring that issue with the many different Tags, you could model the
>> table as:
>>
>> CREATE TABLE metric_data (
>>   metric text,
>>   time text,
>>   period text,
>>   tag text,
>>   value int,
>>   PRIMARY KEY( (metric,time), period, tag)
>> )
>>
>> That would make a composite partitioning key on metric and time, meaning
>> you'd always have to pass those (or else randomly page via TOKEN through
>> all rows). After specifying metric and time, you could optionally also
>> specify period and/or tag, and results would be ordered (clustered) by
>> period. This would satisfy your queries a, b, and d but not c (as you did
>> not specify time). If Time was a granularity column, does it even make
>> sense to return records across differing time values? What does it mean to
>> return the 4 month rows and 1 year row in your example? Could you issue N
>> queries in this case (where N is the small number of your time
>> granularities)?
>>
>> I'm not sure how close that gets you, or if you can re-work your concept
>> of Tag at all.
>> Good luck.
>> Thunder
>>
>> On Thu, Jan 9, 2014 at 10:45 AM, Hannu Kröger <[email protected]> wrote:
>>
>>> To my eye that looks like something that traditional analytics systems
>>> do. You can check out e.g. Acunu Analytics, which uses Cassandra as a
>>> backend.
>>>
>>> Cheers,
>>> Hannu
>>>
>>> 2014/1/9 Naresh Yadav <[email protected]>
>>>
>>>> Hi all,
>>>>
>>>> I have a use case with huge data which I am not able to design in
>>>> Cassandra.
>>>>
>>>> Table name : MetricResult
>>>>
>>>> Sample Data :
>>>>
>>>> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pen, Value=10
>>>> Metric=Sales, Time=Month, Period=Jan-10, Tag=U.S.A, Tag=Pencil, Value=20
>>>> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pen, Value=30
>>>> Metric=Sales, Time=Month, Period=Feb-10, Tag=U.S.A, Tag=Pencil, Value=10
>>>> Metric=Sales, Time=Month, Period=Feb-10, Tag=India, Value=90
>>>> Metric=Sales, Time=Year, Period=2010, Tag=U.S.A, Value=70
>>>> Metric=Cost, Time=Year, Period=2010, Tag=CPU, Value=8000
>>>> Metric=Cost, Time=Year, Period=2010, Tag=RAM, Value=4000
>>>> Metric=Cost, Time=Year, Period=2011, Tag=CPU, Value=9000
>>>> Metric=Resource, Time=Week, Period=Week1-2013, Value=100
>>>>
>>>> So in the above case I have:
>>>> TimeSeries data, i.e. the Time and Period columns
>>>> Dynamic columns, i.e. the Tag column
>>>> Indexing on dynamic columns, i.e. the Tag column
>>>> Aggregations SUM, AVERAGE
>>>> If the same value comes again for a Metric, Time, Period, Tag then
>>>> overwrite it
>>>>
>>>> Queries I need to support :
>>>> --------------------------------------
>>>> a) Give data for Metric=Sales AND Time=Month
>>>>    O/P : 5 rows
>>>> b) Give data for Metric=Sales AND Time=Month AND Period=Jan-10
>>>>    O/P : 2 rows
>>>> c) Give data for Metric=Sales AND Tag=U.S.A
>>>>    O/P : 5 rows
>>>> d) Give data for Metric=Sales AND Period=Jan-10 AND Tag=U.S.A AND Tag=Pen
>>>>    O/P : 1 row
>>>>
>>>> This table can have TBs of data, and a given Metric and Period can have
>>>> millions of rows.
>>>>
>>>> Please suggest how to design/model this table in Cassandra. If there is
>>>> some limitation in Cassandra, then suggest the best technology to handle
>>>> this.
>>>>
>>>> Thanks
>>>> Naresh
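
For reference, queries (a) and (b) from the original question map onto the metric_data table proposed in the thread roughly as follows. This is a sketch only: the literal values come from the sample data, and the table is assumed to exist exactly as Thunder wrote it.

```cql
-- Query (a): all rows for Metric=Sales AND Time=Month.
-- Both partition key components (metric, time) are supplied.
SELECT metric, time, period, tag, value
  FROM metric_data
 WHERE metric = 'Sales' AND time = 'Month';

-- Query (b): additionally restrict the first clustering column, period.
SELECT metric, time, period, tag, value
  FROM metric_data
 WHERE metric = 'Sales' AND time = 'Month' AND period = 'Jan-10';
```

Both statements supply the full partition key (metric, time), which is what the proposed primary key requires. Query (c) omits time, so it would have to be issued once per time granularity, as suggested in the thread.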

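
The set<text> variant raised in the latest message might look like the sketch below, assuming Cassandra 2.1 or later (where CASSANDRA-4511 is available). The table name metric_data_tags, the index name, and the tag_key column are hypothetical and not from the thread. tag_key is an assumed clustering column holding a canonical string form of the tag set, added so that a repeated Metric, Time, Period, Tag combination still overwrites the same row, since a non-frozen collection cannot be part of the primary key.

```cql
-- Hypothetical variant of metric_data with the tags stored as a set.
CREATE TABLE metric_data_tags (
    metric  text,
    time    text,
    period  text,
    tag_key text,       -- assumption: sorted tags joined by the application, e.g. 'Pen|U.S.A'
    tags    set<text>,  -- the dynamic tags for this row
    value   int,
    PRIMARY KEY ((metric, time), period, tag_key)
);

-- Secondary index on the values of the set (requires Cassandra 2.1+).
CREATE INDEX metric_data_tags_idx ON metric_data_tags (tags);

-- Rows tagged U.S.A within one (metric, time) partition.
SELECT metric, time, period, tags, value
  FROM metric_data_tags
 WHERE metric = 'Sales' AND time = 'Month'
   AND tags CONTAINS 'U.S.A';
```

Because the partition key is fully specified, the CONTAINS lookup should stay within a single partition. How this behaves at TB scale is not something the sketch settles and would need testing against real data.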