Inline. J-D
> We're running a 33-regionserver hbase cluster on top of cdh3u0 suites. On
> average, we have 2400 regions hosted on each regionserver.
> (hbase.hregion.max.filesize is 1.5GB, and we have value size up to 4MB per
> object).

2400 regions is just too many. If you are importing data at a high rate (which might be the case with such fat values) and the writes are well distributed among those regions, then you will be force-flushing tons of small files every time a region server needs to clean a log file. Try to bring it down to 100 regions per node, then disable splitting, and also set a bigger flush size on your table.

> I check the log of regionserver, it seems like the compaction queue size is
> about 1700, and every the compaction action takes about 1 minute, and more
> over, most of the compaction are triggered to a major one.
>
> My question are,
>
> 1. Would this cause the performance degradation? It seems like "GET" action
> in the interval that two minutes before/after the compaction takes much
> longer time than usual. I thought the compaction is a asynchronous operation.

It's async, but it still uses IO resources, which may impact latency. Compacting also creates new blocks, so the block cache is churning through invalid blocks.

> 2. Any issue would cause long-term compaction?

It's more like a systemic issue: everything influences everything else.

> 3. It seems like HBASE-1476 is going to implement multi-threaded compaction,
> I guess it would help to reduce the size of compaction queue.

But it would also generate more IO; in your case 1476 would probably not help at all.
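
To put rough numbers on the forced-flush point above, here is a back-of-the-envelope sketch in Python. It assumes a region server starts force-flushing once it holds about 32 write-ahead log files of roughly 64MB each, and that writes are spread evenly over all regions. Both figures are assumptions for illustration, not values taken from this cluster, and the function name is hypothetical.

```python
def avg_flush_mb(wal_budget_mb: float, active_regions: int) -> float:
    """Average memstore size per region at the moment the WAL budget runs out,
    assuming writes are spread evenly over the active regions."""
    if active_regions <= 0:
        raise ValueError("active_regions must be positive")
    return wal_budget_mb / active_regions


# Assumed values (not from the thread): log-file cap and per-log size.
MAX_LOGS = 32
LOG_SIZE_MB = 64
WAL_BUDGET_MB = MAX_LOGS * LOG_SIZE_MB  # about 2GB of un-flushed edits

if __name__ == "__main__":
    for regions in (2400, 100):
        size = avg_flush_mb(WAL_BUDGET_MB, regions)
        print(f"{regions} regions -> about {size:.2f}MB per forced flush")
```

Under those assumptions, 2400 regions means each forced flush writes well under 1MB per region (smaller than a single 4MB value), while 100 regions gives flushes of about 20MB each. Far fewer, larger files also means far less work for the compaction queue.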
