Hi everyone,

When I scan a table with about 700 rows (about 50 columns each), the 
RegionServers go offline one at a time until all of them are down. This is 
probably because the RegionServer processes do not have enough memory (our 
max heap size is under 1 GB on our test clusters at the moment).

Increasing the max heap size for the RegionServers alleviates this problem. 
However, my concern is that this kind of cascading failure could still occur in 
production with large datasets, even with a larger heap size.

What steps can I take to prevent this kind of cascading error? Is there a way 
to configure RegionServers to return an error instead of just failing (and 
causing HBase Master to hand the task to the next available RegionServer)?

Thanks,
Mark
