Re: Slow region moves

Randy Fox Tue, 20 Oct 2015 11:30:43 -0700

Hi Vlad,

I tried it on a table and on a RegionServer basis and it appears to have no 
effect.  
Are we sure it is supported for bucket cache?  From my charts, the bucket cache 
is getting cleared at the same time as the region moves occurred.  The regions 
that are slow to move are the ones with bucket cache.


I took a table with 102 regions and blockcache true and turned off block cache 
via alter while the table was enabled; it took 19 minutes.  Turning block cache 
back on took 4.3 seconds.
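
As a rough illustration of why the StoreFileCloserThread trace quoted further 
down this thread could be slow: it shows BucketAllocator$BucketSizeInfo.freeBlock 
calling LinkedList.contains, which walks the list from the head on every call.  
The sketch below is not HBase code; the class name, method names, and list size 
are made up for illustration.  It counts how many nodes get visited when each 
element of a LinkedList is looked up once, and times the same lookups against a 
HashSet for comparison.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

// Hypothetical stand-alone sketch; none of these names come from HBase.
class EvictionCostSketch {

    // Builds a list of ids 0..n-1, standing in for a list that is checked
    // with contains() each time a block is freed.
    static LinkedList<Integer> buildList(int n) {
        LinkedList<Integer> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Counts the nodes LinkedList.indexOf visits when every id is looked up
    // once. indexOf walks from the head, so finding index k visits k + 1 nodes.
    static long nodesVisited(LinkedList<Integer> list) {
        long visited = 0;
        int n = list.size();
        for (int id = 0; id < n; id++) {
            visited += list.indexOf(id) + 1;
        }
        return visited;
    }

    // Times one membership check per id against any collection.
    static long timeLookupsNanos(Collection<Integer> c, int n) {
        long start = System.nanoTime();
        int hits = 0;
        for (int id = 0; id < n; id++) {
            if (c.contains(id)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits != n) {
            throw new IllegalStateException("unexpected miss");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 20_000; // assumed size, chosen only to make the gap visible
        LinkedList<Integer> list = buildList(n);
        Set<Integer> set = new HashSet<>(list);
        System.out.println("nodes visited:  " + nodesVisited(list));
        System.out.println("LinkedList ns:  " + timeLookupsNanos(list, n));
        System.out.println("HashSet ns:     " + timeLookupsNanos(set, n));
    }
}
```

For a list of n entries the node count comes to n(n+1)/2, so the total cost 
grows quadratically with the list length if each free does a linear contains() 
check.  Whether that fully accounts for the 19 minutes above is an assumption, 
not something measured here.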

Let me know if there is anything else to try.  This issue is really hurting our 
day-to-day ops.

Thanks,

Randy



On 10/15/15, 3:55 PM, "Vladimir Rodionov" <[email protected]> wrote:

>Hey, Randy
>
>You can verify your hypothesis by setting hbase.rs.evictblocksonclose to
>false for your tables.
>
>-Vlad
>
>On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox <[email protected]> wrote:
>
>> Caveat - we are trying to tune the BucketCache (probably a new thread - as
>> we are not sure we are getting the most out of it)
>> 72G off heap
>>
>> <property>
>>    <name>hfile.block.cache.size</name>
>>    <value>0.58</value>
>> </property>
>>
>> <property>
>>    <name>hbase.bucketcache.ioengine</name>
>>    <value>offheap</value>
>> </property>
>>
>> <property>
>>    <name>hbase.bucketcache.size</name>
>>    <value>72800</value>
>> </property>
>>
>> <property>
>>    <name>hbase.bucketcache.bucket.sizes</name>
>>    <value>9216,17408,33792,66560</value>
>> </property>
>>
>>
>>
>>
>>
>>
>> On 10/15/15, 12:00 PM, "Ted Yu" <[email protected]> wrote:
>>
>> >I am a bit curious.
>> >0.94 doesn't have BucketCache.
>> >
>> >Can you share BucketCache related config parameters in your cluster ?
>> >
>> >Cheers
>> >
>> >On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <[email protected]> wrote:
>> >
>> >>
>> >> "StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800 nid=0xad84 runnable [0x00007fbcc0c65000]
>> >>    java.lang.Thread.State: RUNNABLE
>> >>         at java.util.LinkedList.indexOf(LinkedList.java:602)
>> >>         at java.util.LinkedList.contains(LinkedList.java:315)
>> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
>> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
>> >>         - locked <0x000000041b0887a8> (a org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
>> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
>> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
>> >>         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
>> >>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
>> >>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
>> >>         at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
>> >>         - locked <0x00000004944ff2d8> (a org.apache.hadoop.hbase.regionserver.StoreFile)
>> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
>> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
>> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>         at java.lang.Thread.run(Thread.java:745)
>> >>
>> >>
>> >> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1" prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition [0x00007fbcc5dcc000]
>> >>    java.lang.Thread.State: WAITING (parking)
>> >>         at sun.misc.Unsafe.park(Native Method)
>> >>         - parking to wait for  <0x0000000534e90a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
>> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
>> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
>> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
>> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>         at java.lang.Thread.run(Thread.java:745)
>> >>
>> >>
>> >> "RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000 nid=0x3056 waiting on condition [0x00007fbcc2d87000]
>> >>    java.lang.Thread.State: WAITING (parking)
>> >>         at sun.misc.Unsafe.park(Native Method)
>> >>         - parking to wait for  <0x0000000534e61360> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
>> >>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
>> >>         - locked <0x000000042230fa68> (a java.lang.Object)
>> >>         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>> >>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>         at java.lang.Thread.run(Thread.java:745)
>> >>
>> >>
>> >> I attached the whole thing as well.
>> >>
>> >> -r
>> >>
>> >>
>> >> On 10/15/15, 10:39 AM, "Ted Yu" <[email protected]> wrote:
>> >>
>> >> >Can you give a bit more detail on why block eviction was the cause of the
>> >> >slow region movement?
>> >> >
>> >> >Did you happen to take stack traces ?
>> >> >
>> >> >Thanks
>> >> >
>> >> >> On Oct 15, 2015, at 10:32 AM, Randy Fox <[email protected]> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> We just upgraded from 0.94 to 1.0.0 and have noticed that region moves
>> >> >> are super slow (order of minutes) whereas previously they were in the
>> >> >> seconds range.  After looking at the code, I think the time is spent
>> >> >> waiting for the blocks to be evicted from block cache.
>> >> >>
>> >> >> I wanted to verify that this theory is correct and see if there is
>> >> >> anything that can be done to speed up the moves.
>> >> >>
>> >> >> This is particularly painful as we are trying to get our configs tuned to
>> >> >> the new SW and need to do rolling restarts, which is taking almost 24 hours
>> >> >> on our cluster.  We also do our own manual rebalancing of regions across
>> >> >> RS’s and that task is also now painful.
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Randy
>> >>
>>
