Thanks! Nevertheless, can anyone confirm / deny if the scenario I
described would play out in that manner? Just want to make sure I
understand the functionality.
--
Jameson Lopp
Software Engineer
Bronto Software, Inc
On 09/29/2011 03:32 PM, Doug Meil wrote:
Here are a few links on table cleanup and major compactions...
http://hbase.apache.org/book.html#schema.minversions (ttl related)
http://hbase.apache.org/book.html#perf.deleting.queue
http://hbase.apache.org/book.html#compaction
On 9/29/11 2:29 PM, "Ted Yu"<[email protected]> wrote:
Doug Meil may point you to related doc.
Take a look at this as well:
https://issues.apache.org/jira/browse/HBASE-4241
On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp<[email protected]> wrote:
Hm, well I didn't mention a number of other requirements for the feature
I'm building, but long story short, I need to keep track of millions to
billions of these counters and need the lookup time to be as close to
constant time as possible, thus I was really hoping to avoid doing table
scans.
I'll admit I know nothing of the dangers of auto-pruning; is there an
article / documentation I could read about it? Google wasn't very
helpful.
--
Jameson Lopp
Software Engineer
Bronto Software, Inc
On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
My advice usually regarding timestamps is if it's part of your data
model, it should appear somewhere in an HBase key. 99% of the time
overloading the HBase timestamps is a bad idea, especially with
counters since there's auto-pruning done in the Memstore!
I would suggest you make time part of your row key, maybe one counter
per day, and then set the TTL on your table to 30 days. Then all you
need to do is a sequential scan for those 30 days maybe with a prefix
that refers to some event id.
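
The layout suggested above can be sketched roughly as follows. This is plain Python using an in-memory dict as a stand-in for the table, not HBase client code; the key format `<event_id>:<YYYYMMDD>`, the class `DailyCounters`, and the `expire` helper are all hypothetical names chosen for illustration. In particular, `expire` only imitates a 30-day table TTL by looking at the day in the key, whereas a real TTL would act on cell timestamps.

```python
from datetime import date, timedelta


def row_key(event_id: str, day: date) -> str:
    """Hypothetical key layout: event id as prefix, then the day.

    Zero-padded YYYYMMDD sorts lexicographically in date order, so all
    days for one event are adjacent under a prefix scan.
    """
    return f"{event_id}:{day:%Y%m%d}"


class DailyCounters:
    """In-memory stand-in for a table holding one counter per event per day."""

    def __init__(self, ttl_days: int = 30):
        self.ttl_days = ttl_days
        self.cells = {}  # row key -> counter value

    def increment(self, event_id: str, day: date, amount: int = 1) -> int:
        key = row_key(event_id, day)
        self.cells[key] = self.cells.get(key, 0) + amount
        return self.cells[key]

    def trailing_count(self, event_id: str, today: date) -> int:
        """Sum the counters for the last ttl_days days, today included."""
        total = 0
        for offset in range(self.ttl_days):
            day = today - timedelta(days=offset)
            total += self.cells.get(row_key(event_id, day), 0)
        return total

    def expire(self, today: date) -> None:
        """Imitate a table TTL: drop counters that fell out of the window."""
        oldest = f"{today - timedelta(days=self.ttl_days - 1):%Y%m%d}"
        self.cells = {
            key: value
            for key, value in self.cells.items()
            if key.rsplit(":", 1)[1] >= oldest
        }
```

Reading a trailing count touches at most 30 keys per event regardless of how many events exist, so the cost per lookup is bounded by the window length rather than by the table size.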
OpenTSDB is another way of doing it: http://opentsdb.net/
J-D
On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp<[email protected]> wrote:
I wish to store a count of 30-day trailing event data (e.g. # of clicks
in past 30 days) and ended up reading the documentation for setTimeRange
in the Increment operation.
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
I was hoping someone could clarify if it works as I'm imagining in this
example scenario.
1) Current click count is 0
2) I process a click and I perform an increment operation with the time
   range set to minStamp = now and maxStamp = 30 days from now
3) I query for the value immediately and find it to be 1
4) Assuming no other clicks come in, if I query for the value in 31 days,
   it will be returned as 0
In essence, I'm looking for a way to set a TTL on my increment operation.
Is this how it actually works? The documentation is a bit vague and I
could imagine several other scenarios.
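
To pin down the behaviour being asked about, the four-step scenario can be written as a small model. This is only a sketch of the desired semantics (each increment expiring 30 days after it was made); it says nothing about what Increment.setTimeRange actually does. The class `TrailingCounter` is a hypothetical name, timestamps are plain seconds, and calls are assumed to arrive in non-decreasing time order.

```python
from collections import deque

WINDOW_SECONDS = 30 * 24 * 60 * 60  # 30 days


class TrailingCounter:
    """Model of a counter whose individual increments expire after a window."""

    def __init__(self, window: int = WINDOW_SECONDS):
        self.window = window
        self.stamps = deque()  # timestamps of increments, oldest first

    def increment(self, now: int) -> None:
        self.stamps.append(now)

    def value(self, now: int) -> int:
        # Discard increments that are a full window old or older.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        return len(self.stamps)
```

Keeping one timestamp per click is fine for stating the semantics but would not be practical at the scale of millions to billions of counters.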
--
Jameson Lopp
Software Engineer
Bronto Software, Inc