Here are a few links on table cleanup and major compactions...

http://hbase.apache.org/book.html#schema.minversions (TTL related)
http://hbase.apache.org/book.html#perf.deleting.queue
http://hbase.apache.org/book.html#compaction

On 9/29/11 2:29 PM, "Ted Yu" <[email protected]> wrote:

>Doug Meil may point you to related doc.
>
>Take a look at this as well:
>https://issues.apache.org/jira/browse/HBASE-4241
>
>On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp <[email protected]> wrote:
>
>> Hm, well I didn't mention a number of other requirements for the feature
>> I'm building, but long story short, I need to keep track of millions to
>> billions of these counters and need the lookup time to be as close to
>> constant time as possible, thus I was really hoping to avoid doing table
>> scans.
>>
>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>> article / documentation I could read about it? Google wasn't very
>> helpful.
>>
>> --
>> Jameson Lopp
>> Software Engineer
>> Bronto Software, Inc
>>
>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>
>>> My advice usually regarding timestamps is if it's part of your data
>>> model, it should appear somewhere in an HBase key. 99% of the time
>>> overloading the HBase timestamps is a bad idea, especially with
>>> counters since there's auto-pruning done in the Memstore!
>>>
>>> I would suggest you make time part of your row key, maybe one counter
>>> per day, and then set the TTL on your table to 30 days. Then all you
>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>> that refers to some event id.
>>>
>>> OpenTSDB is another way of doing it: http://opentsdb.net/
>>>
>>> J-D
>>>
>>> On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp <[email protected]>
>>> wrote:
>>>
>>>> I wish to store a count of 30-day trailing event data (e.g. # of
>>>> clicks in past 30 days) and ended up reading the documentation for
>>>> setTimeRange in the Increment operation.
>>>>
>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>>>>
>>>> I was hoping someone could clarify if it works as I'm imagining in
>>>> this example scenario.
>>>>
>>>> 1) Current click count is 0
>>>>
>>>> 2) I process a click and I perform an increment operation with the
>>>> time range set to minStamp = now and maxStamp = 30 days from now
>>>>
>>>> 3) I query for the value immediately and find it to be 1
>>>>
>>>> 4) Assuming no other clicks come in, if I query for the value in 31
>>>> days, it will be returned as 0
>>>>
>>>> In essence, I'm looking for a way to set a TTL on my increment
>>>> operation. Is this how it actually works? The documentation is a bit
>>>> vague and I could imagine several other scenarios.
>>>> --
>>>> Jameson Lopp
>>>> Software Engineer
>>>> Bronto Software, Inc
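
J-D's suggestion quoted above (one counter per day with the day in the row key, then a range scan over the last 30 days for one event id) can be sketched roughly as below. This is only an in-memory illustration, not HBase client code: the table is a plain dict standing in for a sorted key space, and the names `row_key`, `increment`, and `trailing_count` as well as the `event_id:YYYY-MM-DD` key layout are assumptions made up for the example. In a real table the 30-day TTL would presumably remove older rows; here they are simply left outside the scanned range.

```python
from bisect import bisect_left
from datetime import date, timedelta


def row_key(event_id: str, day: date) -> str:
    # Hypothetical key layout: event id first, then the ISO date.
    # ISO dates sort lexicographically in chronological order.
    return f"{event_id}:{day.isoformat()}"


def increment(table: dict, event_id: str, day: date, amount: int = 1) -> int:
    # One counter per (event id, day) bucket.
    key = row_key(event_id, day)
    table[key] = table.get(key, 0) + amount
    return table[key]


def trailing_count(table: dict, event_id: str, today: date,
                   window_days: int = 30) -> int:
    # Range scan over [start, stop): the last `window_days` days, today included.
    start = row_key(event_id, today - timedelta(days=window_days - 1))
    stop = row_key(event_id, today + timedelta(days=1))  # exclusive bound
    keys = sorted(table)  # stands in for the store's sorted row order
    lo = bisect_left(keys, start)
    hi = bisect_left(keys, stop)
    return sum(table[k] for k in keys[lo:hi])
```

Because each event id touches at most 30 adjacent rows, the read stays bounded no matter how many counters exist, which may be close enough to the constant-time lookup asked about earlier in the thread.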
