Hi,

Has anyone run into a case where a Region Server hosts a mix of regions:
some get lots of write requests, and the rest get maybe 1/1000 of that
write rate?

This leads to a situation where the number of HLogs reaches the maxlogs
limit, because the HLogs containing puts from the slow-write regions are
"stuck" until those regions flush. Since those regions barely ever reach
their 256MB flush limit (our configuration), they don't flush. The HLog
queue keeps growing due to the fast-write regions, until it hits the stress
mode of "We have too many logs".
This in turn forces flushes of lots of regions, many of which (about 100)
produce ultra small files (10KB - 3MB). After 3 rounds like this, the
compaction queue gets very big... in the end the Region Server drops dead,
and its load is somehow moved to another RS, ...
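
To make the mechanism concrete, here is a toy model in plain Java. It is
not HBase code: the class and method names are made up for illustration,
and it assumes a rolled log can only be archived once no region has
unflushed edits in it or in any older log. One "slow" region writes a
single edit and never flushes, while a "fast" region writes and flushes on
every roll.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, not HBase code): every log remembers which
// regions still have unflushed edits in it.
final class WalPinningModel {
    private final Deque<Set<String>> rolledLogs = new ArrayDeque<>(); // oldest first
    private Set<String> currentLog = new HashSet<>();

    // Record that a region wrote at least one edit to the current log.
    void append(String region) {
        currentLog.add(region);
    }

    // Close the current log and start a new one.
    void roll() {
        rolledLogs.addLast(currentLog);
        currentLog = new HashSet<>();
    }

    // A flush persists all of the region's edits, so no log needs them anymore.
    void flush(String region) {
        for (Set<String> log : rolledLogs) {
            log.remove(region);
        }
        currentLog.remove(region);
        // Assumption: only the oldest logs can be archived, and only when empty.
        while (!rolledLogs.isEmpty() && rolledLogs.peekFirst().isEmpty()) {
            rolledLogs.removeFirst();
        }
    }

    int retainedLogs() {
        return rolledLogs.size();
    }

    // One slow edit up front, then 'rolls' rounds of fast write + roll + flush.
    static int simulate(int rolls, boolean slowRegionFlushesAtEnd) {
        WalPinningModel wal = new WalPinningModel();
        wal.append("slow");
        for (int i = 0; i < rolls; i++) {
            wal.append("fast");
            wal.roll();
            wal.flush("fast");
        }
        if (slowRegionFlushesAtEnd) {
            wal.flush("slow");
        }
        return wal.retainedLogs();
    }
}
```

In this model, simulate(40, false) retains all 40 logs because the single
slow edit pins the oldest one, while simulate(40, true) retains none once
the slow region finally flushes.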

We are running 0.94.7 with 30 RS.

I was wondering how people have handled a mix of slow-write-rate and
high-write-rate regions on one RS. I was thinking of writing a custom
load balancer that keeps tabs on the write request count and memstore
size, and moves all the slow-write regions to the 20% of the cluster's RSs
dedicated to slow regions, thus freeing the fast-write regions to work
unhindered.
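
A rough sketch of the planning part of that idea, in plain Java. This is
only an illustration under assumptions: SlowRegionPlanner, slowRatio and
slowServerFraction are hypothetical names, the write counts are assumed to
be collected elsewhere, memstore size is ignored, and nothing here calls
the HBase balancer API or actually moves a region.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planning helper; it does not call any HBase API.
final class SlowRegionPlanner {

    // Returns region -> target server. A region is "slow" when its write
    // count is below (highest write count * slowRatio).
    static Map<String, String> plan(Map<String, Long> writesPerRegion,
                                    List<String> servers,
                                    double slowServerFraction,
                                    double slowRatio) {
        if (servers.size() < 2) {
            throw new IllegalArgumentException("need at least 2 servers");
        }
        // Reserve a share of the servers for slow regions: at least one,
        // and always leave at least one server for the fast regions.
        int slowCount = (int) Math.round(servers.size() * slowServerFraction);
        slowCount = Math.max(1, Math.min(slowCount, servers.size() - 1));
        List<String> slowServers = servers.subList(0, slowCount);
        List<String> fastServers = servers.subList(slowCount, servers.size());

        long maxWrites = 0;
        for (long writes : writesPerRegion.values()) {
            maxWrites = Math.max(maxWrites, writes);
        }
        double threshold = maxWrites * slowRatio;

        // Round-robin each group of regions over its own group of servers.
        Map<String, String> plan = new LinkedHashMap<>();
        int slowIdx = 0;
        int fastIdx = 0;
        for (Map.Entry<String, Long> e : writesPerRegion.entrySet()) {
            if (e.getValue() < threshold) {
                plan.put(e.getKey(), slowServers.get(slowIdx++ % slowServers.size()));
            } else {
                plan.put(e.getKey(), fastServers.get(fastIdx++ % fastServers.size()));
            }
        }
        return plan;
    }
}
```

With 5 servers, slowServerFraction = 0.2 and slowRatio = 0.01, the first
server would take every region that sees less than 1% of the busiest
region's writes, and the other four would share the rest.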

Since this issue is hammering our production, we're about to try
disabling the WAL, and risk losing some data in those slow-write
regions, until we can come up with a better solution.

Any advice would be highly appreciated.

Oh - our rowkey is quite normal:
<customerId><bucket><Timestamp><uniqueId>

Thanks!
