And of course, HBase is open source so you can hack it up to do what you want :)

The compaction code basically takes an iterator of KeyValues as input and
returns KeyValues as output.
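
To make that shape concrete, here is a rough sketch of the kind of merge Friso describes below: an iterator of KeyValues goes in, and one combined KeyValue per cell comes out. This is not the real HBase API. The `KeyValue` tuple and `merge_timelines` function are hypothetical stand-ins, written in Python for brevity, and the sketch assumes the input arrives sorted by row and qualifier, as a compaction scan would deliver it.

```python
from itertools import groupby
from typing import Iterable, Iterator, NamedTuple, Tuple


class KeyValue(NamedTuple):
    """Hypothetical stand-in for an HBase KeyValue (not the real class)."""
    row: bytes
    qualifier: bytes
    timestamp: int            # version timestamp of the cell
    value: Tuple[int, ...]    # timeline: event timestamps held in this version


def merge_timelines(kvs: Iterable[KeyValue]) -> Iterator[KeyValue]:
    """Collapse all versions of each (row, qualifier) cell into one KeyValue.

    Assumes kvs is already sorted by (row, qualifier), so all versions of a
    cell are adjacent in the stream.
    """
    for (row, qualifier), group in groupby(kvs, key=lambda kv: (kv.row, kv.qualifier)):
        versions = list(group)
        # Union of every event timestamp seen in any version, deduplicated and sorted.
        merged = sorted({t for kv in versions for t in kv.value})
        # Keep the newest version timestamp for the combined cell.
        newest = max(kv.timestamp for kv in versions)
        yield KeyValue(row, qualifier, newest, tuple(merged))
```

With something like this running during compaction, writers could issue plain puts of single timestamps and leave the combining to the merge step.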

> -----Original Message-----
> From: Friso van Vollenhoven [mailto:[email protected]]
> Sent: Thursday, May 27, 2010 1:34 AM
> To: [email protected]
> Subject: Re: Custom compaction
> 
> Hi,
> 
> Actually, for us it would be nice to be able to hook into the
> compaction, too.
> 
> We store records that are basically events that occur at certain times.
> We store the record itself as the qualifier and a timeline as the column
> value (so multiple records+timelines per row key are possible). So when a
> new record comes in, we do a get for the timeline, merge the new
> timestamp with the existing timeline in memory and do a put to update
> the column value with the new timeline.
> 
> In our first version, we just wrote the individual timestamps as values
> and used versioning to keep all timestamps in the value. Then we
> combined all the timelines and individual timestamps into a single
> timeline in memory on each read. We ran an MR job periodically to do the
> timeline combining in the table and delete the obsolete timestamps in
> order to keep read performance OK (because otherwise the read operation
> would involve a lot of additional work to create a timeline and lots of
> versions would be created). In the end, the deletes in the MR job were
> a bottleneck (as I understand it; I was not on the project at that
> time).
> 
> Now, if we could hook into the compactions, then we could just always
> insert individual timestamps as new versions and do the combining of
> versions into a single timeline during compaction (as compaction needs
> to go through the complete table anyway). This would also improve our
> insertion performance (no more gets in there, just puts like in the
> first version), which is nice. We collect internet routing information,
> which arrives at 80 million records per day with updates coming in
> batches every 5 minutes (http://ris.ripe.net). We'd like to try to
> be efficient before just throwing more machines at the problem.
> 
> Will there be anything like this on the roadmap?
> 
> 
> Cheers,
> Friso
> 
> 
> 
> On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:
> 
> > Invisible. What's your need?
> >
> > J-D
> >
> > On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
> > <[email protected]> wrote:
> >> Is there a way to customize the compaction function (like a hook
> >> provided by the API) or is it invisible to the user?
> >>
> >> Thank you
> >> Vidhya
> >>
