Timeseries analysis using Cassandra and partition by date period

Serega Sheypak Sat, 04 Apr 2015 04:05:11 -0700

Hi, I switched from HBase to Cassandra and am trying to find a solution for
time-series analysis on top of Cassandra.
I have an entity named "Event".
"Event" has these attributes:
user_id - the user who triggered the event
event_ts - when the event happened
event_type - the type of event
some_other_attr - other attributes we don't care about right now.


The DDL for the Event entity looks like this:

CREATE TABLE user_plans (
  id timeuuid,
  user_id timeuuid,
  event_ts timestamp,
  event_type int,
  some_other_attr text,
  PRIMARY KEY (user_id, event_ts)
);

The table is "infinite": it will grow continuously over the application's lifetime.
I want to ask the question:
"Cassandra, give me all events where event_ts >= xxx and event_ts <= yyy."

Right now this would lead to a full table scan, since event_ts is not part of the partition key.
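To make the full-scan point concrete, here is a toy in-memory model in Python. It is only an illustration under stated assumptions, not how Cassandra's storage engine works: partitions are keyed by user_id, the sample data and the helper name events_in_range are made up, and the point is simply that a predicate naming only event_ts gives no way to skip any partition.

```python
from datetime import datetime

# Toy model (hypothetical data, NOT Cassandra internals): each partition key
# (user_id) maps to that user's rows as (event_ts, event_type) pairs.
partitions = {
    "user-a": [(datetime(2015, 1, 3), 1), (datetime(2015, 2, 9), 2)],
    "user-b": [(datetime(2014, 12, 30), 1)],
    "user-c": [(datetime(2015, 1, 15), 3), (datetime(2015, 4, 1), 1)],
}


def events_in_range(partitions, start, end):
    """Return (matching rows, partitions read) for start <= event_ts <= end."""
    matches = []
    partitions_read = 0
    # The predicate names no user_id, so no partition can be ruled out up
    # front: every partition has to be opened and checked.
    for user_id, rows in partitions.items():
        partitions_read += 1
        for event_ts, event_type in rows:
            if start <= event_ts <= end:
                matches.append((user_id, event_ts, event_type))
    return matches, partitions_read


rows, read = events_in_range(partitions, datetime(2015, 1, 1), datetime(2015, 1, 31))
print(len(rows), "matching rows;", read, "of", len(partitions), "partitions read")
```

However narrow the time range, the number of partitions read always equals the total number of users, which is why the cost grows with the whole table.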

There is a trick in HBase. HBase has a table abstraction and a column family
abstraction.
Column families must be declared in advance.
Physically, a column family is a set of HFiles (the equivalent of SSTables in C*).
So I can easily partition my HBase table by month:
alter 'hbase_events', NAME => '2015_01'
and store all January 2015 data in the column family named '2015_01'.

When I want January data, I can access the '2015_01' column family directly
and avoid touching all the data in the table, only that slice.
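The HBase trick boils down to two steps: on write, derive a monthly bucket name such as '2015_01' from event_ts; on read, list only the buckets that the requested range overlaps. A minimal Python sketch, assuming naive UTC datetimes; month_bucket and buckets_for_range are hypothetical helper names, not HBase APIs:

```python
from datetime import datetime
from typing import List


def month_bucket(ts: datetime) -> str:
    """Monthly bucket (column family) name for an event, e.g. '2015_01'."""
    return f"{ts.year}_{ts.month:02d}"


def buckets_for_range(start: datetime, end: datetime) -> List[str]:
    """Every monthly bucket a query for start <= event_ts <= end has to read."""
    if start > end:
        return []
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(month_bucket(datetime(year, month, 1)))
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


print(month_bucket(datetime(2015, 1, 20)))
print(buckets_for_range(datetime(2015, 1, 20), datetime(2015, 3, 5)))
```

A range spanning three months touches three buckets, no matter how many other months of data the table holds.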

What is the recommended approach in C* for this case?
One idea is to create several tables (event_2015_01, event_2015_02, etc.),
but based on my current understanding of how C* works, this looks rather ugly.
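With one table per month, the client would have to split a range query into one sub-query per monthly table, clipping the bounds to each month. A rough Python sketch of that routing step, assuming the event_YYYY_MM naming from the idea above and naive UTC datetimes; split_by_month_table is a hypothetical helper, and the per-table queries themselves are left out:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

ONE_MICROSECOND = timedelta(microseconds=1)


def split_by_month_table(start: datetime, end: datetime) -> List[Tuple[str, datetime, datetime]]:
    """Split the inclusive range [start, end] into (table, lo, hi) pieces,
    one per monthly table, with lo/hi as inclusive bounds clipped to that month."""
    pieces = []
    year, month = start.year, start.month
    while start <= end and (year, month) <= (end.year, end.month):
        first = datetime(year, month, 1)
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        # datetime has microsecond resolution, so this is the month's last instant.
        last = datetime(next_year, next_month, 1) - ONE_MICROSECOND
        pieces.append((f"event_{year}_{month:02d}", max(start, first), min(end, last)))
        year, month = next_year, next_month
    return pieces


for table, lo, hi in split_by_month_table(datetime(2015, 1, 20), datetime(2015, 3, 5)):
    print(table, lo, hi)
```

This only narrows which tables are read. If each monthly table kept the schema above, a range on event_ts alone would still touch every partition within that table.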
