Re: Hbase Unusable after auto split to 1024 regions

Pere Kyle Thu, 06 Nov 2014 11:15:14 -0800

Thanks again for your help!

I do not see a single entry in my logs for memstore pressure/global heap. I do 
see tons of logs from the periodicFlusher:
http://pastebin.com/8ZyVz8AH


This seems odd to me. Today alone there are 1829 flushes from periodicFlusher. 
Is there some other log4j property I need to set?

Here are some logs from memstore flushes:
2014-11-06 19:11:42,000 INFO org.apache.hadoop.hbase.regionserver.StoreFile (regionserver60020.cacheFlusher): NO General Bloom and NO DeleteFamily was added to HFile (hdfs://10.227.42.38:9000/hbase/weaver_events/4bafc4f16d984b2cca905e149584df8e/.tmp/c42aacd7e6c047229bb12291510bff50)
2014-11-06 19:11:42,000 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Flushed , sequenceid=67387584, memsize=29.5 K, into tmp file hdfs://10.227.42.38:9000/hbase/weaver_events/4bafc4f16d984b2cca905e149584df8e/.tmp/c42aacd7e6c047229bb12291510bff50
2014-11-06 19:11:44,683 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Added hdfs://10.227.42.38:9000/hbase/weaver_events/4bafc4f16d984b2cca905e149584df8e/d/c42aacd7e6c047229bb12291510bff50, entries=150, sequenceid=67387584, filesize=3.2 K
2014-11-06 19:11:44,685 INFO org.apache.hadoop.hbase.regionserver.HRegion (regionserver60020.cacheFlusher): Finished memstore flush of ~29.5 K/30176, currentsize=0/0 for region weaver_events,21476b2c-7257-4787-9309-aaeab1e85392,1415157492044.4bafc4f16d984b2cca905e149584df8e. in 3880ms, sequenceid=67387584, compaction requested=false
2014-11-06 19:11:44,714 INFO org.apache.hadoop.hbase.regionserver.StoreFile (regionserver60020.cacheFlusher): Delete Family Bloom filter type for hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/.tmp/f10e53628784487290f788802808777a: CompoundBloomFilterWriter
2014-11-06 19:11:44,729 INFO org.apache.hadoop.hbase.regionserver.StoreFile (regionserver60020.cacheFlusher): NO General Bloom and NO DeleteFamily was added to HFile (hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/.tmp/f10e53628784487290f788802808777a)
2014-11-06 19:11:44,729 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Flushed , sequenceid=67387656, memsize=41.2 K, into tmp file hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/.tmp/f10e53628784487290f788802808777a
2014-11-06 19:11:44,806 INFO org.apache.hadoop.hbase.regionserver.Store (regionserver60020.cacheFlusher): Added hdfs://10.227.42.38:9000/hbase/weaver_events/9b4c4b73035749a9865103366c9a5a87/d/f10e53628784487290f788802808777a, entries=210, sequenceid=67387656, filesize=4.1 K
2014-11-06 19:11:44,807 INFO org.apache.hadoop.hbase.regionserver.HRegion (regionserver60020.cacheFlusher): Finished memstore flush of ~41.2 K/42232, currentsize=17.7 K/18080 for region weaver_events,30f6c923-8a37-4324-a404-377decd3ae06,1415154978597.9b4c4b73035749a9865103366c9a5a87. in 99ms, sequenceid=67387656, compaction requested=false
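
To put numbers on how small these flushes are, the "Finished memstore flush" entries can be tallied with a short script. This is only a rough sketch: the function names are made up for illustration, and the regular expression is based solely on the two HRegion entries shown in this message, so other HBase versions may log a different format.

```python
import re

# Matches e.g. "Finished memstore flush of ~29.5 K/30176" and captures the
# exact byte count after the slash. Pattern inferred from the log lines in
# this message only (an assumption, not a documented format).
FLUSH_RE = re.compile(r"Finished memstore flush of ~[\d.]+ ?[KMG]?/(\d+)")


def flush_sizes(log_text):
    """Return the flushed byte counts found in the given log text."""
    # Collapse all whitespace so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    return [int(m.group(1)) for m in FLUSH_RE.finditer(flat)]


def summarize(sizes):
    """Return count, min, max and mean of the flush sizes in bytes."""
    if not sizes:
        return {"count": 0}
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": sum(sizes) / len(sizes),
    }


# The two flush completions quoted in this message, as they appear when wrapped.
sample = """
(regionserver60020.cacheFlusher): Finished memstore flush of ~29.5 K/30176,
currentsize=0/0 for region weaver_events,... in 3880ms
(regionserver60020.cacheFlusher): Finished memstore flush of ~41.2 K/42232,
currentsize=17.7 K/18080 for region weaver_events,... in 99ms
"""

if __name__ == "__main__":
    print(summarize(flush_sizes(sample)))
```

Running the same functions over a full region server log (read with `open(path).read()`) would show whether most flushes are in the tens-of-kilobytes range like the two above.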

Thanks!
-Pere
On Nov 6, 2014, at 10:27 AM, Ted Yu <[email protected]> wrote:

> bq. Do I need to restart master for the memstore to take effect?
> No. memstore is used by region server.
> 
> Looks like debug logging was not turned on (judging from your previous
> pastebin).
> Some of the flush-related logs are at INFO level. For example, do you see
> any of the following log messages?
> 
>      LOG.info("Flush of region " + regionToFlush + " due to global heap pressure");
> 
> Take a look
> at ./src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
> and you will find all the logs.
> 
> Cheers
> 
> On Thu, Nov 6, 2014 at 10:05 AM, Pere Kyle <[email protected]> wrote:
> 
>> So I have set the heap to 12 GB and the memstore limits to upperLimit 0.5 and
>> lowerLimit 0.45. I am not seeing any change in behavior from the cluster so
>> far; I have restarted 4 of 17 region servers. Do I need to restart master for
>> the memstore to take effect? Also, how do I enable logging to show why a
>> region is being flushed? I never see the region flushes in my logs.
>> 
>> Thanks,
>> Pere
>> On Nov 6, 2014, at 7:12 AM, Ted Yu <[email protected]> wrote:
>> 
>>> bq. to increase heap and increase the memstore limit?
>>> 
>>> Yes. That would be an action that bears fruit.
>>> Long term, you should merge the small regions.
>>> 
>>> Cheers
>>> 
>>> On Wed, Nov 5, 2014 at 11:20 PM, Pere Kyle <[email protected]> wrote:
>>> 
>>>> Watching a region server closely in action, it seems that the memstores
>>>> are being flushed at around 2 MB on the regions. This would seem to
>>>> indicate that there is not enough heap for the memstore and I am hitting
>>>> the default upper limit. Would this be a fair assumption?
>>>> Should I look to increase heap and increase the memstore limit?
>>>> 
>>>> Thanks!
>>>> -Pere
>>>> 
>>>> On Nov 5, 2014, at 10:26 PM, Ted Yu <[email protected]> wrote:
>>>> 
>>>>> You can use ConstantSizeRegionSplitPolicy.
>>>>> Split policy can be specified per table. See the following example
>>>>> in create.rb :
>>>>> 
>>>>> hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
>>>>> 
>>>>> In 0.94.18, there isn't online merge, so you have to use another method to
>>>>> merge the small regions.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> On Wed, Nov 5, 2014 at 10:14 PM, Pere Kyle <[email protected]> wrote:
>>>>> 
>>>>>> Ted,
>>>>>> 
>>>>>> Thanks so much for that information. I now see why this split too often,
>>>>>> but what I am not sure of is how to fix this without blowing away the
>>>>>> cluster. Add more heap?
>>>>>> 
>>>>>> Another symptom I have noticed is that load on the Master instance HBase
>>>>>> daemon has been pretty high (load average 4.0, whereas it used to be 1.0).
>>>>>> 
>>>>>> Thanks,
>>>>>> Pere
>>>>>> 
>>>>>> On Nov 5, 2014, at 9:56 PM, Ted Yu <[email protected]> wrote:
>>>>>> 
>>>>>>> IncreasingToUpperBoundRegionSplitPolicy is the default split policy.
>>>>>>> 
>>>>>>> You can read the javadoc of this class to see how it works.
>>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> On Wed, Nov 5, 2014 at 9:39 PM, Ted Yu <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Can you provide a bit more information (such as HBase release) ?
>>>>>>>> 
>>>>>>>> If you pastebin one of the region servers' log, that would help us
>>>>>>>> determine the cause.
>>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Nov 5, 2014 at 9:29 PM, Pere Kyle <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hello,
>>>>>>>>> 
>>>>>>>>> Recently our cluster, which had been running fine for 2 weeks, split to
>>>>>>>>> 1024 regions at 1 GB per region. After this split the cluster is unusable.
>>>>>>>>> Using the performance benchmark I was getting a little better than 100 w/s,
>>>>>>>>> whereas before it was 5000 w/s. There are 15 nodes of m2.2xlarge with 8 GB
>>>>>>>>> heap reserved for HBase.
>>>>>>>>> 
>>>>>>>>> Any ideas? I am stumped.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Pere
>>>>>>>>> 
>>>>>>>>> Here is the current
>>>>>>>>> hbase-site.xml
>>>>>>>>> <?xml version="1.0"?>
>>>>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>>>> <configuration>
>>>>>>>>> <property>
>>>>>>>>> <name>hbase.snapshot.enabled</name>
>>>>>>>>> <value>true</value>
>>>>>>>>> </property>
>>>>>>>>> <property><name>fs.hdfs.impl</name><value>emr.hbase.fs.BlockableFileSystem</value></property>
>>>>>>>>> <property><name>hbase.regionserver.handler.count</name><value>50</value></property>
>>>>>>>>> <property><name>hbase.cluster.distributed</name><value>true</value></property>
>>>>>>>>> <property><name>hbase.tmp.dir</name><value>/mnt/var/lib/hbase/tmp-data</value></property>
>>>>>>>>> <property><name>hbase.master.wait.for.log.splitting</name><value>true</value></property>
>>>>>>>>> <property><name>hbase.hregion.memstore.flush.size</name><value>134217728</value></property>
>>>>>>>>> <property><name>hbase.hregion.max.filesize</name><value>5073741824</value></property>
>>>>>>>>> <property><name>zookeeper.session.timeout</name><value>60000</value></property>
>>>>>>>>> <property><name>hbase.thrift.maxQueuedRequests</name><value>0</value></property>
>>>>>>>>> <property><name>hbase.client.scanner.caching</name><value>1000</value></property>
>>>>>>>>> <property><name>hbase.hregion.memstore.block.multiplier</name><value>4</value></property>
>>>>>>>>> </configuration>
>>>>>>>>> 
>>>>>>>>> hbase-env.sh
>>>>>>>>> # The maximum amount of heap to use, in MB. Default is 1000.
>>>>>>>>> export HBASE_HEAPSIZE=8000
>>>>>>>>> 
>>>>>>>>> # Extra Java runtime options.
>>>>>>>>> # Below are what we set by default.  May only work with SUN JVM.
>>>>>>>>> # For more on why as well as other possible settings,
>>>>>>>>> # see http://wiki.apache.org/hadoop/PerformanceTuning
>>>>>>>>> export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
>>>>>>>>> 
>>>>>>>>> hbase-env.sh
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>
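
The heap and memstore-limit discussion in this thread can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the function name is hypothetical, it assumes the 1024 regions are spread evenly over the 15 nodes mentioned in the first message, it assumes the global memstore upper limit was at the 0.94 default of 0.4 before the change, and it treats "12 GB" as `HBASE_HEAPSIZE=12000`. None of these assumptions are confirmed by the thread.

```python
def per_region_memstore_mb(heap_mb, upper_limit, total_regions, region_servers):
    """Average memstore budget per region, in MB, before the global limit is hit.

    heap_mb        -- region server heap (HBASE_HEAPSIZE, in MB)
    upper_limit    -- fraction of heap allowed for all memstores combined
    total_regions  -- regions in the cluster (assumed evenly distributed)
    region_servers -- number of region servers
    """
    global_memstore_mb = heap_mb * upper_limit
    regions_per_server = total_regions / region_servers
    return global_memstore_mb / regions_per_server


# hbase.hregion.memstore.flush.size from the quoted hbase-site.xml, in MB.
flush_size_mb = 134217728 / (1024 * 1024)

# Original settings: 8000 MB heap, assumed default upper limit of 0.4.
before = per_region_memstore_mb(8000, 0.4, 1024, 15)

# Settings after the change described in the thread: 12 GB heap, upper limit 0.5.
after = per_region_memstore_mb(12000, 0.5, 1024, 15)

if __name__ == "__main__":
    print(f"configured flush size: {flush_size_mb:.0f} MB")
    print(f"per-region budget before: {before:.1f} MB")
    print(f"per-region budget after:  {after:.1f} MB")
```

Under these assumptions the average per-region budget stays below the configured flush size in both cases, which is consistent with the advice to merge the small regions in the long term. It does not by itself explain flushes of only a few tens of kilobytes.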
