Hi, Has anyone run into a case where a Region Server is hosting regions of which some get lots of write requests and the rest get maybe 1/1000 of that write rate?
This leads to a situation where the HLog queue reaches its maxlogs limit, because the HLogs containing puts from the slow-write regions are "stuck" until those regions flush. Since those regions barely make it to their 256MB flush limit (our configuration), they don't flush. The HLog queue keeps growing because of the fast-write regions until it reaches the "We have too many logs" stress mode. This in turn flushes out lots of regions, many of them (about 100) ultra small (10KB - 3MB). After 3 rounds like this the compaction queue gets very big... in the end the region server drops dead, and its load is somehow moved to another RS, ...

We are running 0.94.7 with 30 RS.

I was wondering how people have handled a mix of slow-write-rate and high-write-rate regions on one RS. I was thinking of writing a custom load balancer that keeps tabs on the write request count and memstore size and moves all the slow-write regions to 20% of the cluster's RS, dedicated to slow regions, thus leaving the fast-write regions to work freely.

Since this issue is hammering our production, we're about to try disabling the WAL and risk losing some data in those slow-write regions until we can come up with a better solution.

Any advice would be highly appreciated.

Oh - our rowkey is quite normal: <customerId><bucket><Timestamp><uniqueId>

Thanks!
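
A rough sketch of the placement logic such a balancer could use, written in Python for brevity. It is not an HBase LoadBalancer implementation and calls no HBase APIs. The names RegionStats and plan_assignment, and the thresholds slow_server_fraction and slow_ratio, are made up for illustration and would need tuning against real metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RegionStats:
    """Hypothetical per-region metrics; not an HBase class."""
    name: str
    write_requests: int   # write requests seen in the sampling window
    memstore_mb: float    # current memstore size in MB


def _spread(regions: List[RegionStats], pool: List[str],
            weight: Callable[[RegionStats], float],
            plan: Dict[str, List[str]]) -> None:
    """Greedy placement: heaviest region first, onto the lightest server."""
    load = {s: 0.0 for s in pool}
    for r in sorted(regions, key=weight, reverse=True):
        target = min(pool, key=lambda s: load[s])
        plan[target].append(r.name)
        load[target] += weight(r)


def plan_assignment(regions: List[RegionStats],
                    servers: List[str],
                    slow_server_fraction: float = 0.2,
                    slow_ratio: float = 0.01) -> Dict[str, List[str]]:
    """Return {server: [region names]} with slow-write regions isolated.

    A region counts as "slow" when its write count is below
    slow_ratio * (write count of the busiest region). Both thresholds
    are assumptions, not values taken from HBase.
    """
    if len(servers) < 2:
        raise ValueError("need at least two servers to separate the pools")

    # Reserve a share of the servers (at least one, never all) for slow regions.
    n_slow = min(max(1, round(len(servers) * slow_server_fraction)),
                 len(servers) - 1)
    slow_servers, fast_servers = servers[:n_slow], servers[n_slow:]

    busiest = max((r.write_requests for r in regions), default=0)
    cutoff = busiest * slow_ratio
    slow = [r for r in regions if r.write_requests < cutoff]
    fast = [r for r in regions if r.write_requests >= cutoff]

    plan: Dict[str, List[str]] = {s: [] for s in servers}
    # Slow regions are balanced by memstore size, fast ones by write rate.
    _spread(slow, slow_servers, lambda r: r.memstore_mb, plan)
    _spread(fast, fast_servers, lambda r: float(r.write_requests), plan)
    return plan
```

With 30 servers and the default fraction, 6 servers would be reserved for the slow-write regions and the other 24 would carry only fast-write regions, so a rarely flushed region would no longer share HLogs with a hot one.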