bq. job A that read from HBase and it takes about 30 minutes

In your current practice, do you observe an increase in the duration of job A? That would indicate whether the minor compactions have reduced the number of HFiles to an acceptable level.
You should take a look at http://hbase.apache.org/book.html#hbase_metrics, especially 15.4.4.12. NumberOfStorefiles

Cheers

On Fri, Feb 21, 2014 at 10:55 AM, Chen Song <[email protected]> wrote:

> Below is a brief description of our use case.
>
> * We use Cloudera CDH4.
> * Our cluster has ~85 nodes and stores one big table called "imps".
> * The imps table is pre-split into 5000 regions, hence each node (region
>   server) has about 60 regions.
> * Each hour, we bulk load one hour's worth of data, about 40G in volume,
>   with partitions generated based on region splits. We set TTL to 33 days,
>   so the total amount of data stored for that table is
>   33 * 40 * 24 = ~31T, assuming major compaction works properly. Another
>   consequence of this process is that every hour, each region has its
>   HFile count increased by 1.
> * We have automatic major compaction disabled, as suggested.
> * There is an hourly job A that reads from HBase and takes about 30
>   minutes, so we can't really take HBase down for hours.
> * The loading time + job A running time is about 45 minutes, so
>   effectively there are 15 minutes each hour when HBase is not being used.
>
> The problem we have run into is with compaction.
>
> * The first thing we tried was to explicitly schedule major compaction
>   for the top 5 - 10 regions (those with the most HFiles) per region
>   server. This is done every hour, and the idea behind it is to use the 15
>   idle minutes to compact the heaviest regions, with the hope of cycling
>   through all regions in each RS in 6 - 12 hours. However, there were
>   some problems.
>
>     * On CDH4, HBase major compaction can only be scheduled
>       asynchronously and there is no way to trace it.
>     * It used to work fine, but as the data grew, the asynchronous
>       major compaction took more and more time.
>     * Because of the above 2 facts, we saw the compaction queue pile up
>       and compaction never caught up.
>
> * Then we disabled the first option and resorted to automatic minor
>   compaction. We let HBase manage itself and it works so far. However, our
>   concern is still there: as data grows, will it have the same problem as
>   the first option?
>
> Let me know if you need further clarification or have any questions.
> Thank you very much in advance.
>
> Best,
>
> --
> Chen Song
>
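For reference, the figures quoted above can be checked with a small back-of-the-envelope sketch. This is only an illustration of the arithmetic in the quoted message, not anything HBase runs. The function names are made up for this example. The per-region file count is an upper bound that assumes one new HFile per region per hourly load and no compaction merging files inside the TTL window.

```python
import math

# Figures taken from the quoted message.
NODES = 85            # region servers (approximate, "~85")
REGIONS = 5000        # pre-split regions of the "imps" table
HOURLY_LOAD_GB = 40   # bulk load volume per hour
TTL_DAYS = 33         # table TTL


def regions_per_server(regions=REGIONS, nodes=NODES):
    """Average number of regions hosted by one region server."""
    return regions / nodes


def retained_data_gb(ttl_days=TTL_DAYS, hourly_gb=HOURLY_LOAD_GB):
    """Data kept within the TTL window, assuming expired data is purged."""
    return ttl_days * 24 * hourly_gb


def max_hfiles_per_region(ttl_days=TTL_DAYS):
    """Upper bound on HFiles per region if nothing is ever compacted
    (assumption: exactly one new HFile per region per hourly load)."""
    return ttl_days * 24


def hours_to_cycle(regions_on_server, compacted_per_hour):
    """Hours needed to major-compact every region on one server when a
    fixed number of regions is compacted per hourly window."""
    return math.ceil(regions_on_server / compacted_per_hour)


if __name__ == "__main__":
    per_rs = regions_per_server()
    print(f"regions per server: {per_rs:.1f}")
    print(f"retained data: {retained_data_gb() / 1000:.2f} TB")
    print(f"max HFiles per region without compaction: {max_hfiles_per_region()}")
    print(f"hours to cycle at 5/hour: {hours_to_cycle(per_rs, 5)}")
    print(f"hours to cycle at 10/hour: {hours_to_cycle(per_rs, 10)}")
```

With 5 to 10 regions compacted per server per hour, the cycle time comes out at roughly 6 to 12 hours, which matches the estimate in the quoted message. That holds only if each batch actually finishes inside the 15-minute idle window.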
