Hey all,
We recently completed a bulk load of data into a number of tables. Once
that finished, we restarted the cluster and migrated from EMR 5.19 to
5.20, which moved our HBase version from 1.4.7 to 1.4.8. Everything
seemed stable at the time, and there was no activity across the cluster
for a few days. Yesterday we activated our live ingest pipeline, which
is now pushing more data into those bulk-loaded tables.
What we're seeing now, after about 24hrs of live ingest (~450 requests/s
across 80 regionservers), is that HBase is splitting regions like crazy.
In the last 24hrs we've more than doubled our region count, going from
24k to 51k regions.
Looking at the tables, the regions being split are relatively small. One
example table had 136 regions yesterday at roughly 8-10GB per region. It
now has 1446 regions of 1-2GB each, yet the table has only grown by
~700GB.
The current configuration we're using has the following values set,
which I was under the impression would prevent exactly this situation.
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>21474836480</value>
</property>
<property>
  <name>hbase.regionserver.regionSplitLimit</name>
  <value>256</value>
</property>
<property>
  <name>hbase.regionserver.region.split.policy</name>
  <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
</property>
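(Side note: I still need to confirm that none of these tables carries a
table-level SPLIT_POLICY or MAX_FILESIZE override in its descriptor,
since a table-level setting would take precedence over hbase-site.xml. A
quick describe in the shell should show that; table name below is just
an example:)

hbase(main):001:0> describe 'example_table'
# checking the table attributes for SPLIT_POLICY / MAX_FILESIZE overrides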
For now I've disabled splits through the shell. Any insight into what
may be causing this would be really appreciated. Also, if anyone is
aware of a log4j config that would help surface what's driving these
splits, that would be very useful; the snippet below is roughly the kind
of thing I had in mind.
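(These logger names are just my guess at the relevant split-path classes
in 1.4, added to log4j.properties on the regionservers; I haven't
verified they're the right ones.)

# guessed regionserver log4j.properties overrides for split debugging
log4j.logger.org.apache.hadoop.hbase.regionserver.CompactSplitThread=DEBUG
log4j.logger.org.apache.hadoop.hbase.regionserver.SplitRequest=DEBUG
log4j.logger.org.apache.hadoop.hbase.regionserver.RegionSplitPolicy=DEBUG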
Thanks,
Austin