Re: setTimeRange for HBase Increment

Gary Helmling Tue, 04 Oct 2011 10:37:17 -0700

Jameson,

The TimeRange you set on the Increment is used in looking up the previous
value that you'll be incrementing.  It's not stored with the incremented
value as a data "lifetime" or anything.  If a previously stored value is
found within the given time range, it will be incremented.  If no value is
found within that range, a new value is stored using the value from
your Increment.


As others have already covered, if you're looking for auto-cleanup of data
you would set a TTL on the column family.

So let me tweak your scenario a bit to explain how it might work:

0) Say you have a previous value on column "c1" of 2, last incremented 31
days ago

1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
days, maxStamp = now

2) There is now a new version of "c1", with value=1, timestamp=now.  The
previous version, with value=2, timestamp=now - 31 days, still exists and
may be automatically cleaned up, subject to your settings for max versions
and TTL.  So you would have:

c1:
  - v2: ts=now, value=1
  - v1: ts=now-31days, value=2

3) Reading the current value of "c1" will return 1

4a) If you repeat step #1 31 days from now, you would wind up with a
third version of "c1", again with value=1:

c1:
  - v3: ts=now, value=1
  - v2: ts=now-31days, value=1
  - v1: ts=now-62days, value=2

4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
60 days, maxStamp=now, then you would be incrementing the existing "v2" of
"c1", since it falls within the time range:

c1:
  - v2: ts=now, value=2
  - v1: ts=now-62days, value=2
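
To make the walkthrough above concrete, here is a small in-memory model of
a single column in Java.  It is only a sketch of the behaviour described in
this message, not the HBase client API: the class name
TimeRangeIncrementSketch, its methods, and the use of day numbers as
timestamps are all made up for illustration.  It also assumes the time
range is min-inclusive and max-exclusive, and that an increment which finds
a previous value replaces that version (as in the 4b listing) rather than
keeping it.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one counter column; NOT the HBase client API. */
final class TimeRangeIncrementSketch {
    // timestamp (in days, for readability) -> stored counter value
    private final TreeMap<Long, Long> versions = new TreeMap<>();

    TimeRangeIncrementSketch() {}

    /** Copy constructor, so steps 4a and 4b can start from the same state. */
    TimeRangeIncrementSketch(TimeRangeIncrementSketch other) {
        versions.putAll(other.versions);
    }

    /** Stores a version directly, used here to set up step 0. */
    void put(long timestamp, long value) {
        versions.put(timestamp, value);
    }

    /**
     * Looks up the newest version with minStamp <= ts < maxStamp (assumed
     * bounds). If one exists it is replaced by a version at 'now' holding
     * old value + amount; otherwise a new version holding 'amount' is added.
     */
    long increment(long amount, long minStamp, long maxStamp, long now) {
        Map.Entry<Long, Long> previous =
                versions.subMap(minStamp, true, maxStamp, false).lastEntry();
        long newValue = amount;
        if (previous != null) {
            newValue += previous.getValue();
            versions.remove(previous.getKey());
        }
        versions.put(now, newValue);
        return newValue;
    }

    /** Reading the column returns the newest version, or 0 if there is none. */
    long read() {
        return versions.isEmpty() ? 0L : versions.lastEntry().getValue();
    }

    /** Snapshot of all versions, oldest first. */
    Map<Long, Long> versions() {
        return new TreeMap<>(versions);
    }

    public static void main(String[] args) {
        TimeRangeIncrementSketch c1 = new TimeRangeIncrementSketch();
        c1.put(0, 2);                   // step 0: value 2 written on day 0
        c1.increment(1, 1, 31, 31);     // step 1: day 31, range = last 30 days
        System.out.println(c1.versions() + " read=" + c1.read()); // {0=2, 31=1} read=1

        TimeRangeIncrementSketch a = new TimeRangeIncrementSketch(c1);
        a.increment(1, 32, 62, 62);     // step 4a: day 62, range = last 30 days
        System.out.println(a.versions());   // {0=2, 31=1, 62=1}

        TimeRangeIncrementSketch b = new TimeRangeIncrementSketch(c1);
        b.increment(1, 2, 62, 62);      // step 4b: day 62, range = last 60 days
        System.out.println(b.versions());   // {0=2, 62=2}
    }
}
```

Day 0 stands in for "now - 31 days" in step 0, day 31 for "now" in step 1,
and day 62 for "31 days from now" in steps 4a and 4b.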


I hope this clarifies things.

--gh


On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp <[email protected]> wrote:

> Thanks! Nevertheless, can anyone confirm / deny if the scenario I described
> would play out in that manner? Just want to make sure I understand the
> functionality.
>
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
>
> On 09/29/2011 03:32 PM, Doug Meil wrote:
>
>>
>> Here are a few links on table cleanup and major compactions...
>>
>> http://hbase.apache.org/book.html#schema.minversions
>>   (ttl related)
>>
>> http://hbase.apache.org/book.html#perf.deleting.queue
>>
>> http://hbase.apache.org/book.html#compaction
>>
>>
>>
>>
>>
>> On 9/29/11 2:29 PM, "Ted Yu"<[email protected]>  wrote:
>>
>>  Doug Meil may point you to related doc.
>>>
>>> Take a look at this as well:
>>> https://issues.apache.org/jira/browse/HBASE-4241
>>>
>>> On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp<[email protected]>
>>>  wrote:
>>>
>>>  Hm, well I didn't mention a number of other requirements for the feature
>>>> I'm building, but long story short, I need to keep track of millions to
>>>> billions of these counters and need the lookup time to be as close to
>>>> constant time as possible, thus I was really hoping to avoid doing table
>>>> scans.
>>>>
>>>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>>>> article / documentation I could read about it? Google wasn't very
>>>> helpful.
>>>>
>>>>
>>>> --
>>>> Jameson Lopp
>>>> Software Engineer
>>>> Bronto Software, Inc
>>>>
>>>>
>>>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>  My advice usually regarding timestamps is if it's part of your data
>>>>> model, it should appear somewhere in an HBase key. 99% of the time
>>>>> overloading the HBase timestamps is a bad idea, especially with
>>>>> counters since there's auto-pruning done in the Memstore!
>>>>>
>>>>> I would suggest you make time part of your row key, maybe one counter
>>>>> per day, and then set the TTL on your table to 30 days. Then all you
>>>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>>>> that refers to some event id.
>>>>>
>>>>> OpenTSDB is another way of doing it: http://opentsdb.net/
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp<[email protected]>
>>>>>  wrote:
>>>>>
>>>>>  I wish to store a count of 30-day trailing event data (e.g. # of
>>>>>> clicks in past 30 days) and ended up reading the documentation for
>>>>>> setTimeRange in the Increment operation.
>>>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>>>>>>
>>>>>> I was hoping someone could clarify if it works as I'm imagining in
>>>>>> this example scenario.
>>>>>>
>>>>>> 1) Current click count is 0
>>>>>>
>>>>>> 2) I process a click and I perform an increment operation with the
>>>>>> time range set to minStamp = now and maxStamp = 30 days from now
>>>>>>
>>>>>> 3) I query for the value immediately and find it to be 1
>>>>>>
>>>>>> 4) Assuming no other clicks come in, if I query for the value in 31
>>>>>> days, it will be returned as 0
>>>>>>
>>>>>> In essence, I'm looking for a way to set a TTL on my increment
>>>>>> operation. Is this how it actually works? The documentation is a bit
>>>>>> vague and I could imagine several other scenarios.
>>>>>> --
>>>>>> Jameson Lopp
>>>>>> Software Engineer
>>>>>> Bronto Software, Inc
>>>>>>
>>>>>>
>>>>>>
>>
