Jameson,

The TimeRange you set on the Increment is used in looking up the previous value that you'll be incrementing. It's not stored with the incremented value as a data "lifetime" or anything. If a previously stored value is found within the given time range, it will be incremented. If no value is found within that range, a new value is stored using the value from your Increment.
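
To make that lookup rule concrete, here is a minimal toy model in plain Java. It is only a sketch and does not use the HBase client API: the class name TimeRangeIncrementSketch and its methods are made up for illustration, timestamps are plain day numbers, the range is treated as half-open [minStamp, maxStamp), and restamping an incremented version at "now" is a modeling choice rather than a statement about HBase internals.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of the versions of ONE column. Hypothetical names,
// not the HBase client API; timestamps are day numbers for readability.
public class TimeRangeIncrementSketch {
    // timestamp -> stored counter value, ordered oldest to newest
    private final TreeMap<Long, Long> versions = new TreeMap<>();

    // Store a version directly (used to set up a starting state).
    public void put(long timestamp, long value) {
        versions.put(timestamp, value);
    }

    // Increment using only a previous value whose timestamp falls in
    // [minStamp, maxStamp). Assumption: the range is half-open.
    public long increment(long amount, long minStamp, long maxStamp, long now) {
        // Newest version strictly older than maxStamp, if any.
        Map.Entry<Long, Long> newest = versions.lowerEntry(maxStamp);
        long result = amount; // no value in range: start from the increment amount
        if (newest != null && newest.getKey() >= minStamp) {
            // A value was found in range: increment it and restamp it at "now".
            result = newest.getValue() + amount;
            versions.remove(newest.getKey());
        }
        versions.put(now, result);
        return result;
    }

    // Value of the newest version, or 0 if nothing is stored.
    public long current() {
        return versions.isEmpty() ? 0L : versions.lastEntry().getValue();
    }

    public int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        TimeRangeIncrementSketch c1 = new TimeRangeIncrementSketch();
        c1.put(0L, 2L); // value 2, written on day 0

        // Day 31, range = last 30 days: the old value is out of range,
        // so a new version is stored with the increment amount.
        System.out.println(c1.increment(1L, 1L, 31L, 31L));

        // Day 62, range = last 60 days: the day-31 version is in range,
        // so it is the one that gets incremented.
        System.out.println(c1.increment(1L, 2L, 62L, 62L));
    }
}
```

The point of the model is only that the time range selects which previous value (if any) the increment builds on. Nothing in it expires data; in this toy the day-0 version simply stays around until something else removes it.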
As others have already covered, if you're looking for auto-cleanup of data you would set a TTL on the column family.

So let me tweak your scenario a bit to explain how it might work:

0) Say you have a previous value on column "c1" of 2, last incremented 31 days ago

1) You perform an increment on "c1" with a value of 1, minStamp = now - 30 days, maxStamp = now

2) There is now a new version of "c1", with value=1, timestamp=now. The previous version, with value=2, timestamp=now - 31 days, still exists and may be automatically cleaned up, subject to your settings for max versions and TTL. So you would have:

c1:
  - v2: ts=now, value=1
  - v1: ts=now-31days, value=2

3) Reading the current value of "c1" will return 1

4a) If you repeat step #1 31 days from now, you would wind up with a third version of "c1", again with value=1:

c1:
  - v3: ts=now, value=1
  - v2: ts=now-31days, value=1
  - v1: ts=now-62days, value=2

4b) If you instead repeat step #1 31 days from now, but using minStamp = now - 60 days, maxStamp = now, then you would be incrementing the existing "v2" of "c1", since it falls within the time range:

c1:
  - v2: ts=now, value=2
  - v1: ts=now-62days, value=2

I hope this clarifies things.

--gh

On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp <[email protected]> wrote:

> Thanks! Nevertheless, can anyone confirm / deny if the scenario I described
> would play out in that manner? Just want to make sure I understand the
> functionality.
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
>
> On 09/29/2011 03:32 PM, Doug Meil wrote:
>
>> Here are a few links on table cleanup and major compactions...
>>
>> http://hbase.apache.org/book.html#schema.minversions
>> (ttl related)
>>
>> http://hbase.apache.org/book.html#perf.deleting.queue
>>
>> http://hbase.apache.org/book.html#compaction
>>
>> On 9/29/11 2:29 PM, "Ted Yu" <[email protected]> wrote:
>>
>>> Doug Meil may point you to related doc.
>>>
>>> Take a look at this as well:
>>> https://issues.apache.org/jira/browse/HBASE-4241
>>>
>>> On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp <[email protected]> wrote:
>>>
>>>> Hm, well I didn't mention a number of other requirements for the feature
>>>> I'm building, but long story short, I need to keep track of millions to
>>>> billions of these counters and need the lookup time to be as close to
>>>> constant time as possible, thus I was really hoping to avoid doing table
>>>> scans.
>>>>
>>>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>>>> article / documentation I could read about it? Google wasn't very
>>>> helpful.
>>>>
>>>> --
>>>> Jameson Lopp
>>>> Software Engineer
>>>> Bronto Software, Inc
>>>>
>>>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> My advice usually regarding timestamps is if it's part of your data
>>>>> model, it should appear somewhere in an HBase key. 99% of the time
>>>>> overloading the HBase timestamps is a bad idea, especially with
>>>>> counters since there's auto-pruning done in the Memstore!
>>>>>
>>>>> I would suggest you make time part of your row key, maybe one counter
>>>>> per day, and then set the TTL on your table to 30 days. Then all you
>>>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>>>> that refers to some event id.
>>>>>
>>>>> OpenTSDB is another way of doing it: http://opentsdb.net/
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp <[email protected]> wrote:
>>>>>
>>>>>> I wish to store a count of 30-day trailing event data (e.g. # of clicks
>>>>>> in past 30 days) and ended up reading the documentation for setTimeRange
>>>>>> in the Increment operation.
>>>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>>>>>>
>>>>>> I was hoping someone could clarify if it works as I'm imagining in this
>>>>>> example scenario.
>>>>>>
>>>>>> 1) Current click count is 0
>>>>>>
>>>>>> 2) I process a click and I perform an increment operation with the time
>>>>>> range set to minStamp = now and maxStamp = 30 days from now
>>>>>>
>>>>>> 3) I query for the value immediately and find it to be 1
>>>>>>
>>>>>> 4) Assuming no other clicks come in, if I query for the value in 31 days,
>>>>>> it will be returned as 0
>>>>>>
>>>>>> In essence, I'm looking for a way to set a TTL on my increment operation.
>>>>>> Is this how it actually works? The documentation is a bit vague and I
>>>>>> could imagine several other scenarios.
>>>>>> --
>>>>>> Jameson Lopp
>>>>>> Software Engineer
>>>>>> Bronto Software, Inc
