Re: YCSB load failed because hbase region too busy

louis.hust Mon, 24 Nov 2014 23:43:49 -0800

Yes, the stack trace is below:

2014-11-25 13:35:40:946 4260 sec: 232700856 operations; 28173.18 current ops/sec; [INSERT AverageLatency(us)=637.59]
2014-11-25 13:35:50:946 4270 sec: 232700856 operations; 0 current ops/sec;
14/11/25 13:35:59 INFO client.AsyncProcess: #14, table=usertable2, attempt=10/35 failed 109 ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=usertable2,user8289,1416889268210.7e8fd83bb34b155bd0385aa63124a875., server=l-hbase10.dba.cn1.qunar.com,60020,1416889404151, memstoreSize=536886800, blockingMemStoreSize=536870912
        at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2822)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2234)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
        at java.lang.Thread.run(Thread.java:744)


Then I looked up the memstore size for user8289: it is 512M, and it is still
512M now (15:40).
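
For reference, the 512M figure matches the blockingMemStoreSize in the exception. As far as I understand it, the per-region blocking limit is hbase.hregion.memstore.flush.size multiplied by hbase.hregion.memstore.block.multiplier. Below is a small sketch of that arithmetic. The flush size of 128 MB is an assumption (the stock default); the multiplier and the two byte counts are taken from this thread. The function name is only for illustration.

```python
# Sketch of the per-region blocking limit arithmetic.
# ASSUMPTION: flush size is the stock 128 MB default; it is not read from the cluster.
FLUSH_SIZE = 128 * 1024 * 1024   # hbase.hregion.memstore.flush.size (assumed)
BLOCK_MULTIPLIER = 4             # hbase.hregion.memstore.block.multiplier (from this thread)
MEMSTORE_SIZE = 536886800        # memstoreSize reported in the exception
REPORTED_LIMIT = 536870912       # blockingMemStoreSize reported in the exception


def blocking_memstore_size(flush_size: int, multiplier: int) -> int:
    """Illustrative helper: size at which updates to a region are rejected."""
    return flush_size * multiplier


limit = blocking_memstore_size(FLUSH_SIZE, BLOCK_MULTIPLIER)
overshoot = MEMSTORE_SIZE - limit

print("computed limit:", limit)
print("matches reported limit:", limit == REPORTED_LIMIT)
print("bytes over the limit:", overshoot)
```

So the region sits just above its blocking limit, and writes to it keep being rejected until a flush brings the memstore back under that size.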

The region server log is attached, which may help.

log.tar.gz
Description: GNU Zip compressed data




On Nov 25, 2014, at 15:27, ramkrishna vasudevan 
<[email protected]> wrote:

> Are you getting any exceptions in the log?  Do you have a stack trace when
> it is blocked?
> 
> On Tue, Nov 25, 2014 at 12:30 PM, louis.hust <[email protected]> wrote:
> 
>> Hi, Ram
>> 
>> After I modified hbase.hstore.flusher.count, the load improved, but after
>> one hour the YCSB load program was still blocked! Then I changed
>> hbase.hstore.flusher.count to 40, but the result was the same as with 20.
>> 
>> On Nov 25, 2014, at 14:47, ramkrishna vasudevan <[email protected]> wrote:
>> 
>>>> hbase.hstore.flusher.count to 20 (default value is 2), and run the YCSB
>>>> to load data with 32 threads
>>> 
>>> Apologies for the late reply. Your change of configuration from 2 to 20 is
>>> right in this case because your data ingest rate is high, I suppose.
>>> 
>>> Thanks for the reply.
>>> 
>>> Regards
>>> Ram
>>> 
>>> On Tue, Nov 25, 2014 at 12:09 PM, louis.hust <[email protected]> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I retested the YCSB data load, and here is a situation which may explain
>>>> why the load blocked.
>>>> 
>>>> I used too many threads to insert values, so the flush threads could not
>>>> keep up with all the memstores. The user9099 memstore was queued last and
>>>> waited too long for its flush, which blocked the YCSB requests.
>>>> 
>>>> Then I modified the configuration, setting hbase.hstore.flusher.count to 20
>>>> (default value is 2), and ran YCSB to load data with 32 threads. It ran for
>>>> 1 hour (with 2 threads it ran for less than half an hour).
>>>> 
>>>> 
>>>> On Nov 20, 2014, at 23:20, louis.hust <[email protected]> wrote:
>>>> 
>>>>> Hi Ram,
>>>>> 
>>>>> Thanks for your reply!
>>>>> 
>>>>> I use YCSB workloadc to load data, and from the web request monitor I
>>>>> can see that the write requests are distributed among all regions, so I
>>>>> think the data is distributed.
>>>>> 
>>>>> There are 32 threads writing to the region server, so maybe the
>>>>> concurrency and write rate are too high. The writes are blocked but the
>>>>> memstore does not get flushed. I want to know why.
>>>>> 
>>>>> The JVM heap is 64G, so hbase.regionserver.global.memstore.size at its
>>>>> default (0.4) is about 25.6G, and hbase.hregion.memstore.flush.size is at
>>>>> its default (128M), but the blocked memstore user9099 reaches 512M and
>>>>> does not flush at all.
>>>>> 
>>>>> other memstore related options:
>>>>> 
>>>>> hbase.hregion.memstore.mslab.enabled=true
>>>>> hbase.regionserver.global.memstore.upperLimit=0.4
>>>>> hbase.regionserver.global.memstore.lowerLimit=0.38
>>>>> hbase.hregion.memstore.block.multiplier=4
>>>>> 
>>>>> 
>>>>> On Nov 20, 2014, at 20:38, ramkrishna vasudevan <[email protected]> wrote:
>>>>> 
>>>>>> Check if the writes are going to that particular region and whether its
>>>>>> rate is too high.  Ensure that the data gets distributed among all regions.
>>>>>> What is the memstore size?
>>>>>> 
>>>>>> If the rate of writes is very high then the flushing will get queued,
>>>>>> and writes will be blocked until the memstore gets flushed such that it
>>>>>> goes below the global upper limit.
>>>>>> 
>>>>>> I don't have the code now to see the exact config related to memstore.
>>>>>> 
>>>>>> Regards
>>>>>> Ram
>>>>>> 
>>>>>> On Thu, Nov 20, 2014 at 4:50 PM, louis.hust <[email protected]> wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> I built an HBase test environment with three PC servers, running CDH 5.1.0:
>>>>>> 
>>>>>> pc1 pc2 pc3
>>>>>> 
>>>>>> pc1 and pc2 as HMaster and Hadoop NameNode
>>>>>> pc3 as RegionServer and DataNode
>>>>>> 
>>>>>> Then I created the user table as follows:
>>>>>> create 'usertable', 'family', {SPLITS => (1..100).map {|i| "user#{1000+i*(9999-1000)/100}"} }
>>>>>> Using YCSB to load data as follows:
>>>>>> 
>>>>>> ./bin/ycsb  load  hbase   -P workloads/workloadc  -p columnfamily=family -p recordcount=1000000000   -p threadcount=32  -s  > result/workloadc
>>>>>> 
>>>>>> 
>>>>>> But after a while, YCSB returned the following error:
>>>>>> 
>>>>>> 14/11/20 12:23:44 INFO client.AsyncProcess: #15, table=usertable, attempt=35/35 failed 715 ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=usertable,user9099,1416453519676.2552d36eb407a8af12d2b58c973d68a9., server=l-hbase10.dba.cn1,60020,1416451280772, memstoreSize=536897120, blockingMemStoreSize=536870912
>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2822)
>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2234)
>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
>>>>>>       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
>>>>>>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>>>>>>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>>>>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>>>>>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>>>>>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>>>>>       at java.lang.Thread.run(Thread.java:744)
>>>>>> on l-hbase10.dba.cn1,60020,1416451280772, tracking started Thu Nov 20 12:15:07 CST 2014, retrying after 20051 ms, replay 715 ops.
>>>>>> 
>>>>>> 
>>>>>> It seems the user9099 region is too busy, so I looked up the memstore
>>>>>> metrics in the web UI:
>>>>>> 
>>>>>> 
>>>>>> As you see, the user9099 memstore is bigger than those of the other
>>>>>> regions. I thought it was flushing, but after a while it had not shrunk,
>>>>>> and YCSB finally quit.
>>>>>> 
>>>>>> But when I change the number of concurrent threads to 4, all is right. I
>>>>>> want to know why.
>>>>>> 
>>>>>> Any idea will be appreciated.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>>
