HBase Data Model to purge old data.

Padmanaban Thu, 26 Jul 2012 06:31:43 -0700

We have the following use case:

- Store telecom CDR data on a per-subscriber basis.
- The data is a time series, and every record belongs to one subscriber.
- Data arrives around the clock.
- The expected volume is around 300 million records/day.
- The data is queried 24/7 by an online system, where the filters are
  subscriber id and date range.
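
For a rough sense of scale, the arithmetic below turns the stated daily volume into an average write rate and a total retained record count. It is only a back-of-the-envelope sketch: it assumes the load is spread evenly over the day (real CDR traffic will have peaks), and it borrows the 90-day retention figure from the example later in this mail.

```python
# Figures taken from the use case; an even load over the day is an assumption.
RECORDS_PER_DAY = 300_000_000
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 90  # example retention value, not a fixed requirement

# Average number of records written per second over a full day.
avg_records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

# Total number of records held online when the retention window is full.
retained_records = RECORDS_PER_DAY * RETENTION_DAYS

print(round(avg_records_per_second), retained_records)
```

That works out to roughly 3,500 records per second on average and about 27 billion records online at any time.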


Since the volume of data is huge, we have data retention policies to archive old
data on a daily basis. 
For example, if retention is set to 90 days, every day an offline process would
delete data older than 90 days from HBase and archive it on tape.
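
To make the retention rule concrete, here is a minimal sketch of the date arithmetic the daily offline process would need. The function name `expired_days` and the idea of tracking the stored days as a list of dates are assumptions for illustration; the archiving to tape and the actual HBase deletion are left out.

```python
from datetime import date, timedelta

def expired_days(stored_days, today, retention_days=90):
    """Return the stored days that are strictly older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    # A day exactly `retention_days` old is still kept; only older days expire.
    return sorted(d for d in stored_days if d < cutoff)

# Hypothetical set of days currently held online.
stored = [date(2012, 4, 26), date(2012, 4, 27), date(2012, 7, 26)]
print(expired_days(stored, today=date(2012, 7, 26)))
```

With today taken as 26 Jul 2012 and a 90-day retention, the cutoff is 27 Apr 2012, so only 26 Apr 2012 is selected for archive and delete.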

The current HBase data model design is as follows:
A separate table for every day's data, with subscriber id as the row key. The
reason for this is that a bulk delete of one day's data within a big table is
more expensive than dropping a one-day table.
In this per-day-separate-table model, the load balancer never gets triggered,
as the current day's table is always in memory and daughter regions are
continuously assigned to the same region server. This leads to region server
hotspots.
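
As an illustration of what the per-day-table model implies for the online queries, the sketch below maps a date-range filter to the list of daily tables that would have to be read. The `cdr_YYYYMMDD` naming scheme and both function names are hypothetical; they are not part of our actual setup and are only meant to show the shape of the lookup.

```python
from datetime import date, timedelta

def table_name(day, prefix="cdr_"):
    """Hypothetical naming scheme: one table per calendar day."""
    return prefix + day.strftime("%Y%m%d")

def tables_for_range(start, end):
    """List the daily tables covering the inclusive range [start, end]."""
    span = (end - start).days
    return [table_name(start + timedelta(days=i)) for i in range(span + 1)]

# A three-day query for one subscriber touches three separate tables.
print(tables_for_range(date(2012, 7, 24), date(2012, 7, 26)))
```

Under this scheme a query over the full 90-day window means one lookup by subscriber id in each of 90 tables, while purging a day is a single table drop.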

Please give feedback on whether the per-day-separate-table model is the best
practice for this use case, considering the data life cycle management
requirement. If yes, how do we solve the side effect of region server hotspots?
If no, please advise an alternate model.

Thanks in advance,
Padmanaban M
