Thanks, that makes sense. Unfortunately, it sounds like this feature is unable to solve my particular problem...
--
Jameson Lopp
Software Engineer
Bronto Software, Inc

On 10/04/2011 01:36 PM, Gary Helmling wrote:
Jameson,

The TimeRange you set on the Increment is used in looking up the previous
value that you'll be incrementing.  It's not stored with the incremented
value as a data "lifetime" or anything.  If a previously stored value is
found within the given time range, it will be incremented.  If no value is
found within that range, a new value is stored using the value from
your Increment.

As others have already covered, if you're looking for auto-cleanup of data
you would set a TTL on the column family.

So let me tweak your scenario a bit to explain how it might work:

0) Say you have a previous value on column "c1" of 2, last incremented 31
days ago

1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
days, maxStamp = now

2) There is now a new version of "c1", with value=1, timestamp=now.  The
previous version, with value=2, timestamp=now - 31 days, still exists and
may be automatically cleaned up, subject to your settings for max versions
and TTL.  So you would have:

c1:
   - v2: ts=now, value=1
   - v1: ts=now-31days, value=2

3) Reading the current value of "c1" will return 1

4a) If you repeat step #1 31 days from now, you would wind up with a
third version of "c1", again with value=1:

c1:
   - v3: ts=now, value=1
   - v2: ts=now-31days, value=1
   - v1: ts=now-62days, value=2

4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
60 days, maxStamp=now, then you would be incrementing the existing "v2" of
"c1", since it falls within the time range:

c1:
   - v2: ts=now, value=2
   - v1: ts=now-62days, value=2
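
A minimal sketch of steps 0 through 4b, written as a toy in-memory model
rather than HBase code. The class name VersionedCounter and its methods are
hypothetical and exist only for illustration. The sketch assumes the time
range is half-open (minStamp inclusive, maxStamp exclusive) and that an
incremented version replaces the version it was read from, as in the 4b
listing above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of one counter column. This is NOT the HBase client API. */
class VersionedCounter {
    // timestamp in ms -> stored value; the highest timestamp is the current version
    private final TreeMap<Long, Long> versions = new TreeMap<>();

    /** Stores a version directly, to set up a starting state. */
    void put(long timestamp, long value) {
        versions.put(timestamp, value);
    }

    /**
     * Looks up the newest version with minStamp <= ts < maxStamp (assumed half-open).
     * If one exists, it is replaced by (old value + amount) written at 'now'.
     * Otherwise 'amount' is stored as a new version at 'now'.
     */
    long increment(long amount, long minStamp, long maxStamp, long now) {
        Map.Entry<Long, Long> found =
                versions.subMap(minStamp, true, maxStamp, false).lastEntry();
        long newValue = amount;
        if (found != null) {
            newValue += found.getValue();
            versions.remove(found.getKey());
        }
        versions.put(now, newValue);
        return newValue;
    }

    /** Value of the newest version, or 0 if nothing has been stored. */
    long current() {
        return versions.isEmpty() ? 0L : versions.lastEntry().getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        final long day = 24L * 60 * 60 * 1000;
        long now = 1000 * day;
        VersionedCounter c1 = new VersionedCounter();
        c1.put(now - 31 * day, 2);                        // step 0
        c1.increment(1, now - 30 * day, now, now);        // step 1
        System.out.println(c1.current() + " with " + c1.versionCount() + " versions");
        long later = now + 31 * day;
        c1.increment(1, later - 60 * day, later, later);  // step 4b
        System.out.println(c1.current() + " with " + c1.versionCount() + " versions");
    }
}
```

The model never expires anything on its own, which mirrors the point above:
the time range only controls which previous value is looked up, while
cleanup is left to max versions and TTL.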


I hope this clarifies things.

--gh


On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp<[email protected]>  wrote:

Thanks! Nevertheless, can anyone confirm / deny if the scenario I described
would play out in that manner? Just want to make sure I understand the
functionality.


--
Jameson Lopp
Software Engineer
Bronto Software, Inc

On 09/29/2011 03:32 PM, Doug Meil wrote:


Here are a few links on table cleanup and major compactions...

http://hbase.apache.org/book.html#schema.minversions
   (ttl related)

http://hbase.apache.org/book.html#perf.deleting.queue

http://hbase.apache.org/book.html#compaction





On 9/29/11 2:29 PM, "Ted Yu"<[email protected]>   wrote:

  Doug Meil may point you to related doc.

Take a look at this as well:
https://issues.apache.org/jira/browse/HBASE-4241

On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp<[email protected]> wrote:

  Hm, well I didn't mention a number of other requirements for the feature
I'm building, but long story short, I need to keep track of millions to
billions of these counters and need the lookup time to be as close to
constant time as possible, thus I was really hoping to avoid doing table
scans.

I'll admit I know nothing of the dangers of auto-pruning; is there an
article / documentation I could read about it? Google wasn't very
helpful.


--
Jameson Lopp
Software Engineer
Bronto Software, Inc


On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:

  My usual advice regarding timestamps is that if time is part of your data
model, it should appear somewhere in an HBase key. 99% of the time
overloading the HBase timestamps is a bad idea, especially with
counters since there's auto-pruning done in the Memstore!

I would suggest you make time part of your row key, maybe one counter
per day, and then set the TTL on your table to 30 days. Then all you
need to do is a sequential scan for those 30 days maybe with a prefix
that refers to some event id.
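
A rough sketch of that layout, using a sorted in-memory map as a stand-in
for an HBase table (this is not HBase client code). The class name
DailyCounters, the "eventId:yyyyMMdd" key format, and the method names are
all assumptions made for illustration; event ids are assumed not to contain
the ":" delimiter. In a real table the 30-day TTL would remove old rows,
whereas here the range read simply ignores them.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.TreeMap;

/** Toy model of one-counter-per-day rows, sorted by key like HBase rows. */
class DailyCounters {
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd, so keys sort by date
    private static final DateTimeFormatter DAY_FORMAT = DateTimeFormatter.BASIC_ISO_DATE;

    // row key -> counter value
    private final TreeMap<String, Long> rows = new TreeMap<>();

    /** Hypothetical key layout: event id prefix followed by the day. */
    static String rowKey(String eventId, LocalDate day) {
        return eventId + ":" + day.format(DAY_FORMAT);
    }

    /** Adds 'amount' to the counter for the given event and day. */
    void increment(String eventId, LocalDate day, long amount) {
        rows.merge(rowKey(eventId, day), amount, Long::sum);
    }

    /** Sums the last 'days' daily counters, today included, via one range read. */
    long trailingCount(String eventId, LocalDate today, int days) {
        String start = rowKey(eventId, today.minusDays(days - 1));
        String stop = rowKey(eventId, today);
        long total = 0;
        for (long value : rows.subMap(start, true, stop, true).values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        DailyCounters counters = new DailyCounters();
        LocalDate today = LocalDate.of(2011, 9, 29);
        counters.increment("click", today, 2);
        counters.increment("click", today.minusDays(30), 5); // outside the window
        System.out.println(counters.trailingCount("click", today, 30));
    }
}
```

The range read touches at most 30 keys per event id, so the cost per lookup
stays bounded no matter how many event ids the table holds.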

OpenTSDB is another way of doing it: http://opentsdb.net/

J-D

On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp<[email protected]> wrote:

  I wish to store a count of 30-day trailing event data (e.g. # of clicks in
past 30 days) and ended up reading the documentation for setTimeRange in the
Increment operation.
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29

I was hoping someone could clarify if it works as I'm imagining in this
example scenario.

1) Current click count is 0

2) I process a click and I perform an increment operation with the time
range set to minStamp = now and maxStamp = 30 days from now

3) I query for the value immediately and find it to be 1

4) Assuming no other clicks come in, if I query for the value in 31 days, it
will be returned as 0

In essence, I'm looking for a way to set a TTL on my increment operation. Is
this how it actually works? The documentation is a bit vague and I could
imagine several other scenarios.
--
Jameson Lopp
Software Engineer
Bronto Software, Inc




