I like this idea. St.Ack
On Thu, May 27, 2010 at 6:29 PM, Andrew Purtell <[email protected]> wrote:
> We could put a hook out of that iterator up into RegionObserver
> (HBASE-2001), for example.
>
> Currently the observer only gets notified that a compaction has happened.
>
>   - Andy
>
>> From: Jonathan Gray <[email protected]>
>> Subject: RE: Custom compaction
>> To: "[email protected]" <[email protected]>
>> Date: Thursday, May 27, 2010, 6:21 AM
>>
>> And of course, HBase is open source, so you can hack it up to do what
>> you want :)
>>
>> The compaction API basically takes an iterator of KeyValues as input
>> and returns KeyValues as well.
>>
>> > -----Original Message-----
>> > From: Friso van Vollenhoven [mailto:[email protected]]
>> > Sent: Thursday, May 27, 2010 1:34 AM
>> > To: [email protected]
>> > Subject: Re: Custom compaction
>> >
>> > Hi,
>> >
>> > Actually, it would be nice for us to be able to hook into the
>> > compaction, too.
>> >
>> > We store records that are basically events that occur at certain
>> > times. We store the record itself as the qualifier and a timeline as
>> > the column value (so multiple records+timelines per row key are
>> > possible). So when a new record comes in, we do a get for the
>> > timeline, merge the new timestamp with the existing timeline in
>> > memory, and do a put to update the column value with the new
>> > timeline.
>> >
>> > In our first version, we just wrote the individual timestamps as
>> > values and used versioning to keep all timestamps in the value. We
>> > then combined all the timelines and individual timestamps into a
>> > single timeline in memory on each read. We ran an MR job periodically
>> > to do the timeline combining in the table and delete the obsolete
>> > timestamps, in order to keep read performance OK (because otherwise
>> > the read operation would involve a lot of additional work to create a
>> > timeline, and lots of versions would accumulate). In the end, the
>> > deletes in the MR job were a bottleneck (as I understand it, but I
>> > was not on the project at that moment).
>> >
>> > Now, if we could hook into the compactions, we could just always
>> > insert individual timestamps as new versions and do the combining of
>> > versions into a single timeline during compaction (as compaction
>> > needs to go through the complete table anyway). This would also
>> > improve our insertion performance (no more gets in there, just puts,
>> > like in the first version), which is nice. We collect internet
>> > routing information, which arrives at 80 million records per day,
>> > with updates coming in in batches every 5 minutes
>> > (http://ris.ripe.net). We'd like to try to be efficient before just
>> > throwing more machines at the problem.
>> >
>> > Will there be anything like this on the roadmap?
>> >
>> > Cheers,
>> > Friso
>> >
>> > On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:
>> >
>> > > Invisible. What's your need?
>> > >
>> > > J-D
>> > >
>> > > On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
>> > > <[email protected]> wrote:
>> > >> Is there a way to customize the compaction function (like a hook
>> > >> provided by the API), or is it invisible to the user?
>> > >>
>> > >> Thank you,
>> > >> Vidhya
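[Editor's note: the thread describes the compaction interface as "an iterator of KeyValues in, KeyValues out", and Friso proposes folding all versioned timestamps of a column into one timeline at that point. The following is a minimal, self-contained sketch of that shape with a toy `Cell` type; the names `Cell` and `TimelineMergingIterator` are invented for illustration and are not part of the HBase API.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

public class TimelineCompaction {

    // Toy stand-in for a KeyValue: one column version holding event timestamps.
    record Cell(String row, String qualifier, List<Long> timeline) {}

    // Wraps a column-ordered cell iterator (as a compaction scan would supply)
    // and collapses consecutive versions of the same row/qualifier into a
    // single sorted, de-duplicated timeline cell.
    static class TimelineMergingIterator implements Iterator<Cell> {
        private final Iterator<Cell> in;
        private Cell pending; // first cell of the next column group, if any

        TimelineMergingIterator(Iterator<Cell> in) {
            this.in = in;
            this.pending = in.hasNext() ? in.next() : null;
        }

        @Override public boolean hasNext() { return pending != null; }

        @Override public Cell next() {
            if (pending == null) throw new NoSuchElementException();
            TreeSet<Long> merged = new TreeSet<>(pending.timeline());
            String row = pending.row(), qual = pending.qualifier();
            pending = null;
            while (in.hasNext()) {
                Cell c = in.next();
                if (c.row().equals(row) && c.qualifier().equals(qual)) {
                    merged.addAll(c.timeline()); // same column: fold into one timeline
                } else {
                    pending = c; // start of the next column group
                    break;
                }
            }
            return new Cell(row, qual, new ArrayList<>(merged));
        }
    }

    public static void main(String[] args) {
        // Two single-timestamp puts plus an older merged timeline for one
        // column, followed by a cell from a different column.
        List<Cell> versions = List.of(
                new Cell("prefix1", "announce", List.of(1700L)),
                new Cell("prefix1", "announce", List.of(1500L)),
                new Cell("prefix1", "announce", List.of(1000L, 1200L)),
                new Cell("prefix2", "withdraw", List.of(900L)));
        Iterator<Cell> it = new TimelineMergingIterator(versions.iterator());
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

In a real deployment the hook point under discussion is the RegionObserver coprocessor (HBASE-2001), where a scanner wrapper of roughly this shape could be substituted during compaction instead of running a separate MR job with its delete bottleneck.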
