Hi all!

Before I start, I'd like to have some feedback about TTL performance in HBase.

My use case is the following. I have data constantly coming into the database (i.e. a write-intensive application). This data should be kept for a certain amount of time, either 3, 6, 12... months, depending on some external conditions. I can live with some data being set to live only 3 months even if the conditions eventually change to 6 months.

I can see three options here:

opt. 1: indexing in a secondary table using a salted timestamp as the key (this is not a problem in my case)
opt. 2: creating different tables like 'to-be-destroyed-in-august-2012', 'to-be-destroyed-in-june-2012'... and then merely killing them with a cron job
opt. 3: creating tables like 'to-be-destroyed-in-3-months' (with a 3-month TTL), 'to-be-destroyed-in-6-months' (with a 6-month TTL)...
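
To make opt. 1 concrete, here is a minimal sketch of what I mean by a salted timestamp key. It uses plain Java with no HBase calls. The class name, the bucket count of 16 and the key layout (1-byte salt, 8-byte expiry timestamp, original row key) are assumptions for illustration only, not something HBase imposes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SaltedTimestampKey {
    // Hypothetical bucket count; chosen here only for illustration.
    static final int BUCKETS = 16;

    // Builds [1-byte salt][8-byte big-endian expiry timestamp][data row key].
    static byte[] indexKey(long expiryMillis, byte[] dataRowKey) {
        // The salt is derived from the data row key, so a given row always
        // lands in the same bucket while writes spread over all buckets.
        int salt = Math.floorMod(Arrays.hashCode(dataRowKey), BUCKETS);
        return ByteBuffer.allocate(1 + Long.BYTES + dataRowKey.length)
                .put((byte) salt)
                .putLong(expiryMillis)
                .put(dataRowKey)
                .array();
    }

    // Exclusive stop key for one bucket: index rows sorting before it
    // have an expiry timestamp earlier than nowMillis.
    static byte[] scanStop(int bucket, long nowMillis) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) bucket)
                .putLong(nowMillis)
                .array();
    }

    public static void main(String[] args) {
        byte[] rowKey = "sensor-42".getBytes(StandardCharsets.UTF_8);
        byte[] key = indexKey(1_343_779_200_000L, rowKey);
        System.out.println(key.length + " bytes, bucket " + key[0]);
    }
}
```

The cleanup job would then run one range scan per bucket, from the bucket's salt byte up to scanStop(bucket, now), and delete the rows it finds in the main table. This is the extra write per record that worries me.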

Which do you think is the most efficient?

opt. 1: adds a little more load to my already write-intensive context
opt. 2: looks nice (regarding deletion), but to read, I need to scan at least 12 different tables, and each month, my data will be buffered during table creation (and region splitting! I still don't really know how to choose split keys)
opt. 3: looks the nicest (only 3-4 tables to scan when reading), but won't my daily major compaction go crazy?
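
To show where the "at least 12 tables" for the second option comes from, here is a rough sketch of the table naming. It is plain Java with no HBase calls. The class and method names are hypothetical, and it assumes a table stays readable until the end of the month named in it.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExpiryTables {
    // Hypothetical naming scheme, e.g. "to-be-destroyed-in-august-2012".
    static String tableFor(YearMonth expiry) {
        return "to-be-destroyed-in-"
                + expiry.getMonth().name().toLowerCase(Locale.ROOT)
                + "-" + expiry.getYear();
    }

    // Table that a record written during `now` goes to,
    // given its retention in months (3, 6, 12...).
    static String tableForWrite(YearMonth now, int retentionMonths) {
        return tableFor(now.plusMonths(retentionMonths));
    }

    // Every table a read may have to scan: one per month from the
    // current month up to the longest retention, inclusive.
    static List<String> tablesToScan(YearMonth now, int maxRetentionMonths) {
        List<String> tables = new ArrayList<>();
        for (int i = 0; i <= maxRetentionMonths; i++) {
            tables.add(tableFor(now.plusMonths(i)));
        }
        return tables;
    }

    public static void main(String[] args) {
        YearMonth now = YearMonth.of(2012, 2);
        System.out.println(tableForWrite(now, 6));
        System.out.println(tablesToScan(now, 12).size() + " tables to scan");
    }
}
```

With a 12-month maximum retention, a read fans out over every monthly table that has not been dropped yet, and a new table has to be created (and pre-split) each month.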

It would be great to have some clues before doing the job :-)

Best regards,

Frédéric.
