>> And of course, HBase is open source so you can hack it up to do what you
>> want :)

:) That was what we had been thinking too. Thanks Friso! Your problem is
quite similar to the one we are having: we also want to perform a
content-based merge, i.e., merging a row only after performing a read
operation on one or more values in the row.

Vidhya

> -----Original Message-----
> From: Friso van Vollenhoven [mailto:[email protected]]
> Sent: Thursday, May 27, 2010 1:34 AM
> To: [email protected]
> Subject: Re: Custom compaction
>
> Hi,
>
> Actually, for us it would be nice to be able to hook into the
> compaction, too.
>
> We store records that are basically events that occur at certain times.
> We store the record itself as qualifier and a timeline as column value
> (so multiple records+timelines per row key is possible). So when a new
> record comes in, we do a get for the timeline, merge the new timestamp
> with the existing timeline in memory and do a put to update the column
> value with the new timeline.
>
> In our first version, we just wrote the individual timestamps as values
> and used versioning to keep all timestamps in the value. Then we
> combined all the timelines and individual timestamps into a single
> timeline in memory on each read. We ran a MR job periodically to do the
> timeline combining in the table and delete the obsolete timestamps in
> order to keep read performance OK (because otherwise the read operation
> would involve a lot of additional work to create a timeline and lots of
> versions would be created). In the end, the deletes in the MR job were
> a bottleneck (as I understand, but I was not on the project at that
> moment).
>
> Now, if we could hook into the compactions, then we could just always
> insert individual timestamps as new versions and do the combining of
> versions into a single timeline during compaction (as compaction needs
> to go through the complete table anyway). This would also improve our
> insertion performance (no more gets in there, just puts like in the
> first version), which is nice. We collect internet routing information,
> which is collected at 80 million records per day with updates coming in
> in batches every 5 minutes (http://ris.ripe.net). We'd like to try to
> be efficient before just throwing more machines at the problem.
>
> Will there be anything like this on the roadmap?
>
> Cheers,
> Friso
>
> On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:
>
> > Invisible. What's your need?
> >
> > J-D
> >
> > On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
> > <[email protected]> wrote:
> >> Is there a way to customize the compaction function (like a hook
> >> provided by the API) or is it invisible to the user?
> >>
> >> Thank you
> >> Vidhya
> >>
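
For illustration, the two write paths Friso describes can be sketched
roughly as below. This is plain Python standing in for the idea, not HBase
code: the dict used as a "table", the function names merge_timeline and
combine_versions, and the choice to model a timeline as a sorted list of
integer timestamps are all assumptions made for the example. In particular,
the thread says compaction is invisible to the user, so the "combine during
compaction" step is hypothetical here.

```python
from typing import Iterable, List, Union

# Assumption: a timeline is a sorted, de-duplicated list of event times.
Timeline = List[int]


def merge_timeline(timeline: Iterable[int], new_times: Iterable[int]) -> Timeline:
    """Return a sorted, de-duplicated timeline containing both inputs."""
    return sorted(set(timeline) | set(new_times))


def combine_versions(versions: Iterable[Union[int, Timeline]]) -> Timeline:
    """Collapse all stored versions of one cell into a single timeline.

    Each version is either a lone timestamp (a plain insert) or an
    already-combined timeline (the result of an earlier combining pass).
    """
    combined: Timeline = []
    for version in versions:
        times = [version] if isinstance(version, int) else version
        combined = merge_timeline(combined, times)
    return combined


# Current approach: get the timeline, merge in memory, put it back.
store = {("row1", "recordA"): [100, 200]}  # stand-in for the table
key = ("row1", "recordA")
store[key] = merge_timeline(store.get(key, []), [150])

# Hoped-for approach: blind puts of single timestamps as new versions,
# combined later in one pass (here by a hypothetical compaction hook).
versions = [[100, 200], 150, 300, 150]
compacted = combine_versions(versions)
```

The first path costs a read on every insert; the second defers that cost to
a pass that already has to scan the data, which is the trade-off the thread
is asking about.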
