If the users cannot each use their own column family (TTL is set per column
family, via HColumnDescriptor), you should try to figure out a single TTL
that is acceptable for all of these users.
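
To make the idea concrete, here is a rough sketch in plain Python (not the
HBase API) of how per-family TTLs could map onto per-user retention policies.
The family names, the TTL values and both helper functions are made up for
illustration. In HBase itself the TTL would be set on each family with
HColumnDescriptor.setTimeToLive(seconds). The sketch keeps timestamps in
seconds for simplicity, whereas HBase cell timestamps are in milliseconds.

```python
import time

# Hypothetical retention classes: one column family per policy, TTL in seconds.
FAMILY_TTL_SECONDS = {
    "d7": 7 * 24 * 3600,
    "d30": 30 * 24 * 3600,
    "d365": 365 * 24 * 3600,
}


def family_for_retention(retention_seconds):
    """Pick the family with the smallest TTL that still covers the retention."""
    candidates = [
        (ttl, name)
        for name, ttl in FAMILY_TTL_SECONDS.items()
        if ttl >= retention_seconds
    ]
    if not candidates:
        raise ValueError("no column family retains data that long")
    return min(candidates)[1]


def is_expired(family, cell_timestamp, now=None):
    """Model of the TTL check: expired once the cell's age exceeds the family TTL."""
    now = time.time() if now is None else now
    return now - cell_timestamp > FAMILY_TTL_SECONDS[family]
```

With this mapping a user who asks for 10 days of retention lands in the
"d30" family, so data may live somewhat longer than requested but never
shorter. Whether that is acceptable is the question to settle with the users.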

On Wed, Mar 9, 2011 at 8:04 PM, Otis Gospodnetic <[email protected]> wrote:

> Hi,
>
> For some reason there are suddenly lots of questions about purging old data.
> I'm looking at the same thing and was wondering:
>
> * In my case, the same table is shared by multiple users, each of which may
> have a different data retention policy.  Thus, I think I need to look at
> each and every row and check if it's considered "expired" and thus ready
> for deletion.
> Ideally, I'd associate a TTL when I Put a row and HBase would automagically
> remove it when its time is up, but I don't think TTLs per row are doable,
> and neither is automagical expiration, right?
>
> * Is the only option to have a column with the expiration timestamp, and
> have a nightly MR job that does a full table scan and purges all expired
> rows?  Wouldn't that be *super* costly because *all* data would have to be
> read from disk just for this one thing?  And this would evict all good
> stuff from the OS cache (and maybe block cache and memstore?)  Is there a
> better way?
>
> * Are there specific recommendations for how to define tables to be able to
> efficiently remove batches of rows on a regular basis?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
