Hello,

In my HBase cluster, I have consistently observed the following over several days:
- There is a spike in the compaction time avg time metric. At the same time, the swap bytes in and swap bytes out metrics are also elevated.
- Around the same time, the FS PRead and FS Read latencies, as well as the client latencies for random reads, increase.

My HBase cluster consists of 16 nodes and replicates to another 16-node cluster. It has the following workload:

- Around 4 tables with heavy write activity (around 500k writes per second on the m1/m15 moving average). 2 of these tables have atomic counter columns that track analytics data and are incremented on every write.
- 2 tables that receive bulk-uploaded data periodically (around once a day).
- We expect reads at around 100k per second, mainly from the tables with bulk-uploaded data and the one that has counter columns.

The p99 read latencies spike to around 1000-5000 ms when the compaction time avg time metric increases. At other times, they stay below 100 ms.

I have set hbase.hregion.majorcompaction to 0 on the region servers, and I plan to set it to 0 on the master nodes too, so that I can rule out time-triggered major compactions as the cause. However, I suspect that many minor compactions are running at the time of the spikes, and that some of them are being promoted to major compactions.

*Any suggestions on how to avoid these read latency spikes and get better read performance?*

Thanks,
Girish
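The setting mentioned above, hbase.hregion.majorcompaction = 0, is normally placed in each node's hbase-site.xml, and a value of 0 turns off time-triggered major compactions. Because the plan is to apply it on both the region servers and the masters, a quick check that each node's file actually carries the value can help rule out a missed node. This is a minimal sketch, assuming a standard hbase-site.xml layout; the XML excerpt and the helper names `get_property` and `periodic_major_compaction_disabled` are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical hbase-site.xml excerpt; only the property name comes from the post.
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>
</configuration>"""


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> text of the named <property>, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def periodic_major_compaction_disabled(xml_text: str) -> bool:
    """True if hbase.hregion.majorcompaction is explicitly set to 0."""
    value = get_property(xml_text, "hbase.hregion.majorcompaction")
    # A missing property means the HBase default interval applies, so treat it as "not disabled".
    return value is not None and int(value.strip()) == 0


print(periodic_major_compaction_disabled(HBASE_SITE))  # True
```

Running the same check against the hbase-site.xml collected from every region server and master would flag any node that still uses a periodic major compaction interval. Note that this only covers time-triggered major compactions; minor compactions that get promoted to major ones are governed by other settings.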
