The choice among the three options you list depends on your read (i.e., scan)
pattern.  Is it limited to a time window?

If so, I like opt 2 the most, because once you roll into a new month, you
can compact the previous month and put it away. The data is no longer
scanned over and over again during compaction or reads.
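
To make opt 2 concrete, here is a rough Python sketch of the table naming and
of the check a cron job could run. It only computes names; it does not talk to
HBase. The helper names (expiry_table, is_due) are made up for illustration,
and I am assuming a table is dropped as soon as the month in its name begins,
which is only approximately the retention you asked for.

```python
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

def expiry_table(written: date, retention_months: int) -> str:
    """Hypothetical helper: table for data written on `written`
    that should live for about `retention_months` months."""
    # Count months since year 0 so that year rollover is handled by divmod.
    index = written.year * 12 + (written.month - 1) + retention_months
    year, month0 = divmod(index, 12)
    return f"to-be-destroyed-in-{MONTHS[month0]}-{year}"

def is_due(table_name: str, today: date) -> bool:
    """Hypothetical helper: True once the month named in the table has begun
    (assumed policy: drop at the start of the named month)."""
    _, month_name, year = table_name.rsplit("-", 2)
    named = (int(year), MONTHS.index(month_name) + 1)
    return (today.year, today.month) >= named
```

The cron job would list the tables, keep those for which is_due() is true, and
disable and drop them. Writers call expiry_table() to pick the target table.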

Between opt 1 and opt 3: as you point out, opt 1 will suffer from serious
write amplification during compaction, but so will opt 3.

Another approach is to split based on time and keep the HFile size small
(so that you have smaller regions, e.g., one for each day). Then you have
fewer tables, but also smaller compactions.  Can you split like that?
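
As an illustration of the per-day idea, here is a small Python sketch that
generates one split point per day boundary. It assumes row keys start with the
date as yyyyMMdd, which is my assumption and not something from your schema
(a salted key would need the salt handled as well). The function name
daily_split_keys is made up.

```python
from datetime import date, timedelta
from typing import List

def daily_split_keys(start: date, days: int) -> List[bytes]:
    """Hypothetical helper: split points so that `days` consecutive days,
    beginning at `start`, each fall into their own region.
    Assumes row keys begin with the date formatted as yyyyMMdd."""
    # N regions need N - 1 split points: one at the start of each later day.
    return [
        (start + timedelta(days=i)).strftime("%Y%m%d").encode("ascii")
        for i in range(1, days)
    ]
```

Keys like these could be given as pre-split points when the table is created,
so that you do not have to wait for regions to split on their own.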


On Thu, Jun 21, 2012 at 7:19 AM, Frédéric Fondement wrote:

> Hi all !
>
> Before I start, I'd like to have some feedback about TTL performance in
> HBase.
>
> My use case is the following. I have data constantly coming into the
> database (i.e. a write-intensive application). This data should be kept for
> a certain amount of time, either 3, 6, 12... months, depending on some
> external conditions. I can live with some data registered to live only 3
> months even if conditions eventually change to 6 months.
>
> I can see three options here:
>
>     opt. 1: indexing in a secondary table using a salted timestamp as the key
> (this is not a problem in my case)
>     opt. 2: creating different tables like 'to-be-destroyed-in-august-2012',
> 'to-be-destroyed-in-june-2012'... and then merely killing them with a cron
> job
>     opt. 3: creating tables like 'to-be-destroyed-in-3-monthes' (with a
> 3-month TTL), 'to-be-destroyed-in-6-monthes' (with a 6-month TTL)...
>
> What do you think is the most efficient?
> opt. 1 adds a bit more load to my already write-intensive workload
> opt. 2 looks nice (regarding deletion), but to read, I need to scan at
> least 12 different tables, and each month, my data will be buffered during
> table creation (and region splitting! I still don't really know how
> to choose split keys)
> opt. 3 looks the nicest (only 3-4 tables to scan when reading), but won't
> my daily major compaction go crazy?
>
> It would be great to have some clues before doing the job :-) !
>
> Best regards,
>
> Frédéric.
>
>
