We were having the exact same problem when we were doing our own load testing with HBase. We
found that a region would hit its hbase.hstore.blockingStoreFiles limit or its
hbase.hregion.memstore.block.multiplier limit. Hitting either of those limits blocks writes to
that region, and the client has to pause until a compaction can come through and clean things
up.
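For reference, these are the two settings in question. They are server-side settings, so in practice they go in the region servers' hbase-site.xml, but here is a minimal sketch of the relevant keys using the standard Configuration API (the fallback values passed to getInt are only for illustration, not a recommendation):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class BlockingLimits {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();

            // Writes to a region block once any of its stores has this many
            // store files waiting on compaction.
            int blockingStoreFiles =
                conf.getInt("hbase.hstore.blockingStoreFiles", 7);

            // Writes also block once a memstore grows past
            // multiplier * hbase.hregion.memstore.flush.size.
            int blockMultiplier =
                conf.getInt("hbase.hregion.memstore.block.multiplier", 2);

            System.out.println("blockingStoreFiles=" + blockingStoreFiles
                + ", memstore block multiplier=" + blockMultiplier);
        }
    }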
However, the biggest problem was that there would already be a decent-sized compaction queue;
we'd hit one of those limits, get put on the *back* of the queue, and then have to wait *minutes*
before the compaction we needed to stop the blocking finally ran. I created a JIRA to address the
issue: HBASE-2646. There is a patch in the JIRA for 0.20.4 that creates a priority compaction
queue, which greatly helped our problem. In fact we saw little to no pausing after applying the
patch. In the comments of the JIRA you can see some of the settings we used to mitigate the
problem without the patch.
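To make the idea concrete: the gist of the patch is just ordering the compaction queue by urgency instead of arrival order, so a region that is currently blocking writes gets compacted ahead of routine requests. Very roughly, and only as a sketch of the idea (the class and field names below are made up; this is not the actual HBASE-2646 code):

    import java.util.concurrent.PriorityBlockingQueue;

    // Sketch only: a compaction request that carries a priority, where
    // requests from regions that are blocking writes sort first.
    class CompactionRequest implements Comparable<CompactionRequest> {
        static final int PRIORITY_BLOCKING = 0;   // region is blocking client writes
        static final int PRIORITY_NORMAL   = 10;  // routine compaction

        final String regionName;  // hypothetical field, for illustration
        final int priority;

        CompactionRequest(String regionName, int priority) {
            this.regionName = regionName;
            this.priority = priority;
        }

        public int compareTo(CompactionRequest other) {
            return Integer.compare(this.priority, other.priority);
        }
    }

    class CompactionQueueSketch {
        // Lower priority value = compacted sooner, regardless of arrival order.
        private final PriorityBlockingQueue<CompactionRequest> queue =
            new PriorityBlockingQueue<CompactionRequest>();

        void requestCompaction(String regionName, boolean blockingWrites) {
            int p = blockingWrites ? CompactionRequest.PRIORITY_BLOCKING
                                   : CompactionRequest.PRIORITY_NORMAL;
            queue.offer(new CompactionRequest(regionName, p));
        }

        CompactionRequest takeNext() throws InterruptedException {
            return queue.take();  // compaction thread pulls the most urgent request
        }
    }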
Apparently there is some work going on to do concurrent priority compactions (Jonathan Gray has
been working on it), but I haven't seen anything in HBase yet and don't know the timeline. My
personal opinion is that we should integrate the patch into trunk and use it until the more
advanced compactions are implemented.
~Jeff
On 9/10/2010 2:27 AM, Jeff Hammerbacher wrote:
We've been brainstorming some ideas to "smooth out" these performance
lapses, so instead of getting a 10 second period of unavailability, you get
a 30 second period of slower performance, which is usually preferable.
Where is this brainstorming taking place? Could we open a JIRA issue to
capture the brainstorming in public and searchable fashion?
--
Jeff Whiting
Qualtrics Senior Software Engineer
[email protected]