Re: Slow region moves

Randy Fox Thu, 15 Oct 2015 13:07:13 -0700

Caveat: we are still trying to tune the BucketCache (probably worth a new
thread, as we are not sure we are getting the most out of it).
It is 72G, off heap:


<property>
   <name>hfile.block.cache.size</name>
   <value>0.58</value>
</property>

<property>
   <name>hbase.bucketcache.ioengine</name>
   <value>offheap</value>
</property>

<property>
   <name>hbase.bucketcache.size</name>
   <value>72800</value>
</property>

<property>
   <name>hbase.bucketcache.bucket.sizes</name>
   <value>9216,17408,33792,66560</value>
</property>
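
For a sense of scale, here is a rough sketch of how many blocks a cache of this size could hold. It assumes that an hbase.bucketcache.size value above 1.0 is read as megabytes and that the bucket sizes are in bytes (each is 1 KB larger than 8, 16, 32 and 64 KB). The class and method names are made up for illustration, and the result is only an upper bound because it ignores any per-bucket overhead.

```java
public class BucketCacheCapacity {
    // Assumption: hbase.bucketcache.size above 1.0 is a capacity in megabytes.
    static long cacheBytes(long sizeMb) {
        return sizeMb * 1024L * 1024L;
    }

    // Upper bound on cached blocks if the whole cache used one bucket size (in bytes).
    static long maxBlocks(long sizeMb, long bucketSizeBytes) {
        return cacheBytes(sizeMb) / bucketSizeBytes;
    }

    public static void main(String[] args) {
        long sizeMb = 72800L;                                  // value from the config above
        long[] bucketSizes = {9216L, 17408L, 33792L, 66560L};  // values from the config above
        for (long size : bucketSizes) {
            System.out.println(size + " bytes: at most " + maxBlocks(sizeMb, size) + " blocks");
        }
    }
}
```

With these values the cache could hold roughly 1.1 million blocks at the largest bucket size and over 8 million at the smallest, so any per-block work done at eviction time is repeated a very large number of times.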


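The StoreFileCloserThread trace quoted below is RUNNABLE inside LinkedList.indexOf, called from BucketSizeInfo.freeBlock. The following is a minimal sketch of why that kind of lookup can get expensive. It is not the HBase code: the class name, the list contents and the sizes are all hypothetical, and it only counts how many list nodes LinkedList.indexOf has to walk when a linear membership check is repeated once per freed block.

```java
import java.util.LinkedList;

public class LinearFreeListCost {
    // Counts the nodes walked by LinkedList.indexOf over `frees` lookups against
    // a list of `listSize` entries (a hypothetical stand-in for a free list).
    static long nodesVisited(int listSize, int frees) {
        if (listSize <= 0) {
            return 0L;
        }
        LinkedList<Integer> list = new LinkedList<>();
        for (int i = 0; i < listSize; i++) {
            list.add(i);
        }
        long visited = 0L;
        for (int i = 0; i < frees; i++) {
            int target = i % listSize;        // cycle through the entries
            int index = list.indexOf(target); // linear scan from the head
            visited += index + 1;             // nodes walked to reach the match
        }
        return visited;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000}) {
            System.out.println(n + " entries, " + n + " frees: "
                    + nodesVisited(n, n) + " nodes visited");
        }
    }
}
```

The count grows roughly with the square of the list size when the number of frees grows with it, which would fit region closes taking minutes if the scanned list is large. Whether that is what actually happens here depends on how long the list in BucketAllocator really gets.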
On 10/15/15, 12:00 PM, "Ted Yu" <[email protected]> wrote:

>I am a bit curious.
>0.94 doesn't have BucketCache.
>
>Can you share BucketCache related config parameters in your cluster ?
>
>Cheers
>
>On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <[email protected]> wrote:
>
>>
>> "StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800 nid=0xad84
>> runnable [0x00007fbcc0c65000]
>>    java.lang.Thread.State: RUNNABLE
>>         at java.util.LinkedList.indexOf(LinkedList.java:602)
>>         at java.util.LinkedList.contains(LinkedList.java:315)
>>         at
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
>>         at
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
>>         - locked <0x000000041b0887a8> (a
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
>>         at
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
>>         at
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
>>         at
>> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
>>         at
>> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
>>         at
>> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
>>         at
>> org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
>>         - locked <0x00000004944ff2d8> (a
>> org.apache.hadoop.hbase.regionserver.StoreFile)
>>         at
>> org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
>>         at
>> org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1"
>> prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition
>> [0x00007fbcc5dcc000]
>>    java.lang.Thread.State: WAITING (parking)
>>         at sun.misc.Unsafe.park(Native Method)
>>         - parking to wait for  <0x0000000534e90a80> (a
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>         at
>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>         at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>>         at
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>         at
>> java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>>         at
>> org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
>>         at
>> org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
>>         at
>> org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
>>         at
>> org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>>
>> "RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000 nid=0x3056
>> waiting on condition [0x00007fbcc2d87000]
>>    java.lang.Thread.State: WAITING (parking)
>>         at sun.misc.Unsafe.park(Native Method)
>>         - parking to wait for  <0x0000000534e61360> (a
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>         at
>> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>         at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>>         at
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>>         at
>> java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>>         at
>> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
>>         at
>> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
>>         - locked <0x000000042230fa68> (a java.lang.Object)
>>         at
>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>>         at
>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>>
>> I attached the whole thing as well.
>>
>> -r
>>
>>
>> On 10/15/15, 10:39 AM, "Ted Yu" <[email protected]> wrote:
>>
>> >Can you give a bit more detail on why block eviction was the cause of the
>> slow region movement?
>> >
>> >Did you happen to take stack traces ?
>> >
>> >Thanks
>> >
>> >> On Oct 15, 2015, at 10:32 AM, Randy Fox <[email protected]> wrote:
>> >>
>> >> Hi,
>> >>
>> >> We just upgraded from 0.94 to 1.0.0 and have noticed that region moves
>> are super slow (order of minutes) whereas previously they were in the
>> seconds range.  After looking at the code, I think the time is spent
>> waiting for the blocks to be evicted from block cache.
>> >>
>> >> I wanted to verify that this theory is correct and see if there is
>> anything that can be done to speed up the moves.
>> >>
>> >> This is particularly painful as we are trying to get our configs tuned to
>> the new SW and need to do rolling restarts, which are taking almost 24 hours
>> on our cluster.  We also do our own manual rebalancing of regions across
>> RS’s and that task is also now painful.
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Randy
>>
