Hi Louis,

Sorry, I cannot help here; I am just very curious about your YCSB test results. It is not easy to find recent YCSB test results on the internet.
In my own HBase env, my test result is that update is always better than read, and scan is slightly better than update. I tried many times with various HBase configurations, on two different hardware setups, and got the same result; the absolute numbers differ, but "random single write is always better than random single read" and "a 100-length scan is better than a single write" is a very stable result.

I modified the workload to add a pure-update workload (by setting readproportion to 0 and updateproportion to 1), and it would be much appreciated if you could run that test too; a rough example command is sketched at the end of this mail, below the quoted thread. I also changed CoreWorkload.java doTransactionScan() to always do a 100-length scan instead of a random-length scan, so I can easily count how many rows were scanned and compare against the pure-write and pure-read results.

If it is not appropriate to show the absolute numbers of your test results, could you at least tell me whether in your test read is better than write or write is better than read, and by how much? I have asked a few times on this mailing list, and people explained to me that it is possible for writes to be faster than reads in HBase, but I still want to know whether this is common or just my environment.

I also thought you may have hit the 'stuck' issue mentioned in http://hbase.apache.org/book.html in section 9.7.7.7.1.1, but I am not sure. I would be happy to hear how you solve the issue later. And as Ram and Qiang Tian mentioned, you can only alleviate the issue by increasing the knob; if you give HBase too much pressure, it will stop working well sooner or later. Everything has its limits :-)

Thanks,
Ming

-----Original Message-----
From: louis hust [mailto:[email protected]]
Sent: Tuesday, November 25, 2014 9:44 PM
To: [email protected]
Subject: Re: YCSB load failed because hbase region too busy

Hi Ram, thanks for the help. I just did a test with the bucket cache; in the production env we will follow your suggestion.

Sent from my iPhone

> On Nov 25, 2014, at 20:36, ramkrishna vasudevan
> <[email protected]> wrote:
>
> Your write ingest is too high. You have to control that by first
> adding more nodes and ensuring that you have a more distributed load.
> Also try changing hbase.hstore.blockingStoreFiles.
>
> Even after changing the above value, if your write ingest is so high
> that it can reach the configured value again, you can still see blocking writes.
>
> Regards
> Ram
>
>
>> On Tue, Nov 25, 2014 at 2:20 PM, Qiang Tian <[email protected]> wrote:
>>
>> in your log:
>> 2014-11-25 13:31:35,048 WARN [MemStoreFlusher.13]
>> regionserver.MemStoreFlusher: Region
>> usertable2,user8289,1416889268210.7e8fd83bb34b155bd0385aa63124a875.
>> has too many store files; delaying flush up to 90000ms
>>
>> please see my original reply... you can try increasing
>> "hbase.hstore.blockingStoreFiles"; also, you have only 1 RS and you
>> split into 100 regions... you can try 2 RS with 20 regions.
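Side note for the archives: the knob Ram and Qiang mention is set in hbase-site.xml on the region servers. A minimal sketch follows; the value 20 is only an example, not a recommendation, and hbase.hstore.blockingWaitTime is the companion setting behind the "delaying flush up to 90000ms" message in the log line above:

  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.hstore.blockingWaitTime</name>
    <value>90000</value>
  </property>

As Ram says, raising the limit only buys headroom; if the write ingest stays above what the flusher and compactions can drain, the blocking will come back.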
>>
>>
>>> On Tue, Nov 25, 2014 at 3:42 PM, louis.hust <[email protected]> wrote:
>>>
>>> yes, the stack trace is like below:
>>>
>>> 2014-11-25 13:35:40:946 4260 sec: 232700856 operations; 28173.18 current ops/sec; [INSERT AverageLatency(us)=637.59]
>>> 2014-11-25 13:35:50:946 4270 sec: 232700856 operations; 0 current ops/sec;
>>> 14/11/25 13:35:59 INFO client.AsyncProcess: #14, table=usertable2, attempt=10/35 failed 109 ops, last exception:
>>> org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=usertable2,user8289,1416889268210.7e8fd83bb34b155bd0385aa63124a875., server=l-hbase10.dba.cn1.qunar.com,60020,1416889404151, memstoreSize=536886800, blockingMemStoreSize=536870912
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2822)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2234)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
>>>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
>>>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>>>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>>         at java.lang.Thread.run(Thread.java:744)
>>>
>>> Then I looked up the memstore size for user8289: it is 512M, and it is still 512M now (15:40).
>>>
>>> The region server log is attached, which may help.
>>>
>>>
>>> On Nov 25, 2014, at 15:27, ramkrishna vasudevan <[email protected]> wrote:
>>>
>>>> Are you getting any exceptions in the log? Do you have a stack trace when it is blocked?
>>>>
>>>>> On Tue, Nov 25, 2014 at 12:30 PM, louis.hust <[email protected]> wrote:
>>>>>
>>>>> hi, Ram
>>>>>
>>>>> After I modified hbase.hstore.flusher.count, it only improved the load a little; after one hour the YCSB load program was still blocked! Then I changed hbase.hstore.flusher.count to 40, but it is the same as with 20.
>>>>>
>>>>> On Nov 25, 2014, at 14:47, ramkrishna vasudevan <[email protected]> wrote:
>>>>>
>>>>>>>> hbase.hstore.flusher.count to 20 (default value is 2), and run the YCSB to load data with 32 threads
>>>>>>
>>>>>> Apologies for the late reply. Your change of configuration from 2 to 20 is right in this case, because your data ingest rate is high, I suppose.
>>>>>>
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> Regards
>>>>>> Ram
>>>>>>
>>>>>>> On Tue, Nov 25, 2014 at 12:09 PM, louis.hust <[email protected]> wrote:
>>>>>>>
>>>>>>> hi, all
>>>>>>>
>>>>>>> I retested the YCSB data load, and here is a situation which may explain why the load blocked.
>>>>>>>
>>>>>>> I used too many threads to insert values, so the flush threads could not handle all the memstores effectively; the user9099 memstore was queued last and waited too long for a flush, which blocked the YCSB requests.
>>>>>>>
>>>>>>> Then I modified the configuration, set hbase.hstore.flusher.count to 20 (default value is 2), and ran the YCSB load with 32 threads; it could run for 1 hour (with 2 flusher threads it ran for less than half an hour).
>>>>>>>
>>>>>>>
>>>>>>>> On Nov 20, 2014, at 23:20, louis.hust <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Ram,
>>>>>>>>
>>>>>>>> Thanks for your reply!
>>>>>>>>
>>>>>>>> I use YCSB workloadc to load data, and from the web request monitor I can see that the write requests are distributed among all regions, so I think the data does get distributed.
>>>>>>>>
>>>>>>>> And there are 32 threads writing to the region server; maybe the concurrency and write rate are too high. The writes are blocked but the memstore does not get flushed, and I want to know why.
>>>>>>>>
>>>>>>>> The JVM heap is 64G and hbase.regionserver.global.memstore.size is the default (0.4), about 25.6G, and hbase.hregion.memstore.flush.size is the default (128M), but the blocked memstore user9099 reached 512M and did not flush at all.
>>>>>>>>
>>>>>>>> Other memstore related options:
>>>>>>>>
>>>>>>>> hbase.hregion.memstore.mslab.enabled=true
>>>>>>>> hbase.regionserver.global.memstore.upperLimit=0.4
>>>>>>>> hbase.regionserver.global.memstore.lowerLimit=0.38
>>>>>>>> hbase.hregion.memstore.block.multiplier=4
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Nov 20, 2014, at 20:38, ramkrishna vasudevan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Check if the writes are going to that particular region and whether its rate is too high. Ensure that the data gets distributed among all regions. What is the memstore size?
>>>>>>>>>
>>>>>>>>> If the rate of writes is very high then the flushes will get queued, and writes will be blocked until the memstores get flushed and usage goes back down below the global upper limit.
>>>>>>>>>
>>>>>>>>> I don't have the code now to see the exact config related to memstore.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Ram
>>>>>>>>>
>>>>>>>>> On Thu, Nov 20, 2014 at 4:50 PM, louis.hust <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> hi all,
>>>>>>>>>
>>>>>>>>> I built an HBase test environment with three PC servers, on CDH 5.1.0:
>>>>>>>>>
>>>>>>>>> pc1 pc2 pc3
>>>>>>>>>
>>>>>>>>> pc1 and pc2 as HMaster and hadoop namenode
>>>>>>>>> pc3 as RegionServer and datanode
>>>>>>>>>
>>>>>>>>> Then I created the user table as follows:
>>>>>>>>>
>>>>>>>>> create 'usertable', 'family', {SPLITS => (1..100).map {|i| "user#{1000+i*(9999-1000)/100}"} }
>>>>>>>>>
>>>>>>>>> and used YCSB to load data as follows:
>>>>>>>>>
>>>>>>>>> ./bin/ycsb load hbase -P workloads/workloadc -p columnfamily=family -p recordcount=1000000000 -p threadcount=32 -s > result/workloadc
>>>>>>>>>
>>>>>>>>> But after a while, YCSB returned the following error:
>>>>>>>>>
>>>>>>>>> 14/11/20 12:23:44 INFO client.AsyncProcess: #15, table=usertable, attempt=35/35 failed 715 ops, last exception:
>>>>>>>>> org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=usertable,user9099,1416453519676.2552d36eb407a8af12d2b58c973d68a9., server=l-hbase10.dba.cn1,60020,1416451280772, memstoreSize=536897120, blockingMemStoreSize=536870912
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2822)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2234)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
>>>>>>>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
>>>>>>>>>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>>>>>>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>>>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>>>>> on l-hbase10.dba.cn1,60020,1416451280772, tracking started Thu Nov 20 12:15:07 CST 2014, retrying after 20051 ms, replay 715 ops.
>>>>>>>>>
>>>>>>>>> It seems the user9099 region is too busy, so I looked up the memstore metrics in the web UI.
>>>>>>>>>
>>>>>>>>> As you can see, user9099 is bigger than the other regions. I think it is flushing, but after a while it does not shrink to a smaller size, and YCSB finally quits.
>>>>>>>>>
>>>>>>>>> But when I change the concurrency to 4 threads, everything is fine. I want to know why.
>>>>>>>>>
>>>>>>>>> Any ideas will be appreciated.
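A side note on the numbers in the exception above, assuming the defaults Louis lists (hbase.hregion.memstore.flush.size = 128 MB and hbase.hregion.memstore.block.multiplier = 4): the per-region blocking threshold is the flush size times the multiplier,

  134217728 bytes (128 MB) x 4 = 536870912 bytes (512 MB)

which matches blockingMemStoreSize=536870912 exactly. The RegionTooBusyException just means that one region's memstore hit 512 MB before the flusher could drain it, so HBase refused further writes to that region.

Finally, the pure-update test I asked about at the top of this mail: a rough sketch of how I run it, assuming the same YCSB build and table as the load command quoted above. workloads/workloada is only a convenient base file, readproportion and updateproportion are overridden on the command line, and the operationcount value here is just an example:

  ./bin/ycsb run hbase -P workloads/workloada -p columnfamily=family -p recordcount=1000000000 -p operationcount=100000000 -p readproportion=0 -p updateproportion=1.0 -p threadcount=32 -s > result/pure-update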
