Here are a few links on table cleanup and major compactions...

http://hbase.apache.org/book.html#schema.minversions   (TTL related)

http://hbase.apache.org/book.html#perf.deleting.queue

http://hbase.apache.org/book.html#compaction





On 9/29/11 2:29 PM, "Ted Yu" <[email protected]> wrote:

>Doug Meil may point you to related doc.
>
>Take a look at this as well:
>https://issues.apache.org/jira/browse/HBASE-4241
>
>On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp <[email protected]> wrote:
>
>> Hm, well I didn't mention a number of other requirements for the feature
>> I'm building, but long story short, I need to keep track of millions to
>> billions of these counters and need the lookup time to be as close to
>> constant time as possible, thus I was really hoping to avoid doing table
>> scans.
>>
>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>> article / documentation I could read about it? Google wasn't very
>> helpful.
>>
>>
>> --
>> Jameson Lopp
>> Software Engineer
>> Bronto Software, Inc
>>
>>
>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>
>>> My advice usually regarding timestamps is if it's part of your data
>>> model, it should appear somewhere in an HBase key. 99% of the time
>>> overloading the HBase timestamps is a bad idea, especially with
>>> counters since there's auto-pruning done in the Memstore!
>>>
>>> I would suggest you make time part of your row key, maybe one counter
>>> per day, and then set the TTL on your table to 30 days. Then all you
>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>> that refers to some event id.
>>>
>>> OpenTSDB is another way of doing it: http://opentsdb.net/
>>>
>>> J-D
>>>
>>> On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp<[email protected]>
>>>  wrote:
>>>
>>>> I wish to store a count of 30-day trailing event data (e.g. # of
>>>> clicks in past 30 days) and ended up reading the documentation for
>>>> setTimeRange in the Increment operation.
>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>>>>
>>>> I was hoping someone could clarify if it works as I'm imagining in
>>>> this example scenario.
>>>>
>>>> 1) Current click count is 0
>>>>
>>>> 2) I process a click and I perform an increment operation with the
>>>> time range set to minStamp = now and maxStamp = 30 days from now
>>>>
>>>> 3) I query for the value immediately and find it to be 1
>>>>
>>>> 4) Assuming no other clicks come in, if I query for the value in 31
>>>> days, it will be returned as 0
>>>>
>>>> In essence, I'm looking for a way to set a TTL on my increment
>>>> operation. Is this how it actually works? The documentation is a bit
>>>> vague and I could imagine several other scenarios.
>>>> --
>>>> Jameson Lopp
>>>> Software Engineer
>>>> Bronto Software, Inc
>>>>
>>>>
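
For illustration only, here is a minimal sketch of the row-key design J-D
suggests in the quoted reply (one counter per event per day, with the
30-day total read back from the daily rows). It is written in Python
against a plain in-memory dict, not a real HBase table. The key layout
`event_id:YYYYMMDD`, the function names, and the dict standing in for the
table are all assumptions made for this example. In HBase the old rows
would be expired by the table TTL, which this sketch does not simulate; it
simply never reads rows outside the window.

```python
from collections import defaultdict
from datetime import date, timedelta


def row_key(event_id: str, day: date) -> str:
    # Hypothetical key layout: event id first, then the day, so that all
    # days for one event sort next to each other.
    return f"{event_id}:{day:%Y%m%d}"


def record_click(table, event_id: str, day: date) -> None:
    # Stand-in for an increment on the (event, day) counter row.
    table[row_key(event_id, day)] += 1


def trailing_count(table, event_id: str, today: date, days: int = 30) -> int:
    # Sum the daily counters for the window [today - (days - 1), today].
    # This is a fixed number of point lookups per query, independent of
    # how many events are stored.
    return sum(
        table.get(row_key(event_id, today - timedelta(days=offset)), 0)
        for offset in range(days)
    )


if __name__ == "__main__":
    table = defaultdict(int)  # in-memory stand-in for the counter table
    click_day = date(2011, 9, 29)
    record_click(table, "evt42", click_day)
    print(trailing_count(table, "evt42", click_day))
    print(trailing_count(table, "evt42", click_day + timedelta(days=31)))
```

Because the day is part of the key, a click simply falls out of the total
once its day leaves the window; no timestamp tricks on the counter cell are
needed.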
