Hi all!
Before I start, I'd like some feedback about TTL performance in
HBase.
My use case is the following. Data is constantly coming into the database
(i.e. a write-intensive application). This data should be kept for a
certain amount of time, either 3, 6, 12... months, depending on some
external conditions. I can live with some data registered to live only 3
months even if conditions eventually change to 6 months.
I can see three options here:
opt. 1: indexing in a secondary table using a salted timestamp as the
key (this is not a problem in my case)
opt. 2: creating different tables like
'to-be-destroyed-in-august-2012', 'to-be-destroyed-in-june-2012'... and
then merely dropping them with a cron job
opt. 3: creating tables like 'to-be-destroyed-in-3-months' (with a
3-month TTL), 'to-be-destroyed-in-6-months' (with a 6-month TTL)...
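
To make opt. 1 concrete, here is a minimal sketch of what I mean by a
salted timestamp key. The bucket count, the CRC32-based salt and the key
layout are assumptions for illustration only, not something I have
settled on:

```python
import struct
import zlib

NUM_BUCKETS = 16  # assumption: number of salt buckets, to be tuned


def salted_key(row_id, expiry_ts_ms, num_buckets=NUM_BUCKETS):
    """Build an index key: 1-byte salt + 8-byte big-endian expiry + row id.

    The salt is derived from the row id so that consecutive timestamps
    are spread over several regions instead of hitting a single one.
    """
    salt = zlib.crc32(row_id) % num_buckets
    return bytes([salt]) + struct.pack(">Q", expiry_ts_ms) + row_id


def expiry_of(key):
    """Read the expiry timestamp (ms) back from a salted key."""
    return struct.unpack(">Q", key[1:9])[0]
```

The cleanup job would then run one scan per bucket, each stopping at the
current time, and delete the rows the index points to.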
Which do you think is the most efficient?
opt. 1 adds a bit more load to my already write-intensive workload
opt. 2 looks nice (regarding deletion), but to read, I need to scan at
least 12 different tables, and each month, my data will be buffered
during table creation (and region splitting! I still don't really
know how to choose the split keys)
opt. 3 looks the nicest (only 3-4 tables to scan when reading), but won't
my daily major compaction go crazy?
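
For a rough idea of the numbers involved in opt. 2 and opt. 3, a small
sketch. HBase expresses a column family TTL in seconds; the 30-day month
and the helper names below are my own assumptions:

```python
from datetime import date

SECONDS_PER_DAY = 24 * 60 * 60
MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]


def ttl_seconds(months, days_per_month=30):
    """TTL value for opt. 3, assuming a fixed-length month."""
    return months * days_per_month * SECONDS_PER_DAY


def monthly_tables(start, count=12):
    """Names of the per-month tables a read has to scan in opt. 2."""
    names = []
    year, month = start.year, start.month
    for _ in range(count):
        names.append("to-be-destroyed-in-%s-%d" % (MONTH_NAMES[month - 1], year))
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return names
```

So a 3-month TTL is ttl_seconds(3) = 7776000 seconds, and a read in
opt. 2 has to go through every name returned by monthly_tables().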
It would be great to have some clues before doing the job :-)
Best regards,
Frédéric.