It seems that BucketAllocator#freeBlock() is synchronized, and hence all the bulk closes it tries to do will be blocked on that synchronized block. Maybe something like the IdLock has to be tried here?
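The idea above — replacing the single allocator-wide monitor with a lock scoped to the hfile being closed, in the spirit of IdLock — can be sketched roughly as follows. This is a hypothetical illustration, not HBase's actual code; the class and method names are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: instead of the whole free path serializing on one
// BucketAllocator-wide synchronized block, each eviction takes a lock
// keyed by the hfile name. Store-file closers for DIFFERENT files then
// no longer queue up behind a single lock.
public class PerFileEviction {
    // One lock object per hfile name; created lazily and atomically.
    private final ConcurrentMap<String, Object> fileLocks = new ConcurrentHashMap<>();

    // Stand-in for the real "free all blocks of this file" work.
    public int evictBlocksByHfileName(String hfileName) {
        Object lock = fileLocks.computeIfAbsent(hfileName, k -> new Object());
        synchronized (lock) {
            // Only closers of the SAME file contend here; a closer for
            // file A does not block one for file B.
            return hfileName.length(); // pretend we freed this many blocks
        }
    }
}
```

Whether this is safe depends on what else the allocator's monitor protects (the free-block accounting itself would still need its own, much shorter, critical section), which is presumably what a JIRA discussion would have to settle.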
Regards
Ram

On Wed, Oct 21, 2015 at 4:20 PM, ramkrishna vasudevan <[email protected]> wrote:

> I think the forceful clearing of the blocks from the bucket cache is
> hurting in this case. I think it is worth opening a JIRA for this and
> working on a fix.
>
> Regards
> Ram
>
> On Wed, Oct 21, 2015 at 12:00 AM, Randy Fox <[email protected]> wrote:
>
>> Hi Vlad,
>>
>> I tried it on a table and on a RegionServer basis and it appears to have
>> no effect.
>> Are we sure it is supported for bucket cache? From my charts, the bucket
>> cache is getting cleared at the same time as the region moves occurred.
>> The regions slow to move are the ones with bucket cache.
>>
>> I took a table with 102 regions and blockcache true and turned off block
>> cache via alter while the table is enabled - it took 19 minutes. To turn
>> block cache back on took 4.3 seconds.
>>
>> Let me know if there is anything else to try. This issue is really
>> hurting our day-to-day ops.
>>
>> Thanks,
>>
>> Randy
>>
>> On 10/15/15, 3:55 PM, "Vladimir Rodionov" <[email protected]> wrote:
>>
>> >Hey, Randy
>> >
>> >You can verify your hypothesis by setting hbase.rs.evictblocksonclose to
>> >false for your tables.
>> >
>> >-Vlad
>> >
>> >On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox <[email protected]> wrote:
>> >
>> >> Caveat - we are trying to tune the BucketCache (probably a new thread -
>> >> as we are not sure we are getting the most out of it)
>> >> 72G off heap
>> >>
>> >> <property>
>> >>   <name>hfile.block.cache.size</name>
>> >>   <value>0.58</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>hbase.bucketcache.ioengine</name>
>> >>   <value>offheap</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>hbase.bucketcache.size</name>
>> >>   <value>72800</value>
>> >> </property>
>> >>
>> >> <property>
>> >>   <name>hbase.bucketcache.bucket.sizes</name>
>> >>   <value>9216,17408,33792,66560</value>
>> >> </property>
>> >>
>> >> On 10/15/15, 12:00 PM, "Ted Yu" <[email protected]> wrote:
>> >>
>> >> >I am a bit curious.
>> >> >0.94 doesn't have BucketCache.
>> >> >
>> >> >Can you share BucketCache-related config parameters in your cluster?
>> >> >
>> >> >Cheers
>> >> >
>> >> >On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <[email protected]> wrote:
>> >> >
>> >> >> "StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800 nid=0xad84 runnable [0x00007fbcc0c65000]
>> >> >>    java.lang.Thread.State: RUNNABLE
>> >> >>         at java.util.LinkedList.indexOf(LinkedList.java:602)
>> >> >>         at java.util.LinkedList.contains(LinkedList.java:315)
>> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
>> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
>> >> >>         - locked <0x000000041b0887a8> (a org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
>> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
>> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
>> >> >>         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
>> >> >>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
>> >> >>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
>> >> >>         at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
>> >> >>         - locked <0x00000004944ff2d8> (a org.apache.hadoop.hbase.regionserver.StoreFile)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >>
>> >> >> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1" prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition [0x00007fbcc5dcc000]
>> >> >>    java.lang.Thread.State: WAITING (parking)
>> >> >>         at sun.misc.Unsafe.park(Native Method)
>> >> >>         - parking to wait for <0x0000000534e90a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >>
>> >> >> "RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000 nid=0x3056 waiting on condition [0x00007fbcc2d87000]
>> >> >>    java.lang.Thread.State: WAITING (parking)
>> >> >>         at sun.misc.Unsafe.park(Native Method)
>> >> >>         - parking to wait for <0x0000000534e61360> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
>> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
>> >> >>         - locked <0x000000042230fa68> (a java.lang.Object)
>> >> >>         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>> >> >>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >>
>> >> >> I attached the whole thing as well.
>> >> >>
>> >> >> -r
>> >> >>
>> >> >> On 10/15/15, 10:39 AM, "Ted Yu" <[email protected]> wrote:
>> >> >>
>> >> >> >Can you give a bit more detail on why block eviction was the cause
>> >> >> >of the slow region movement?
>> >> >> >
>> >> >> >Did you happen to take stack traces?
>> >> >> >
>> >> >> >Thanks
>> >> >> >
>> >> >> >> On Oct 15, 2015, at 10:32 AM, Randy Fox <[email protected]> wrote:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> We just upgraded from 0.94 to 1.0.0 and have noticed that region
>> >> >> >> moves are super slow (on the order of minutes), whereas previously
>> >> >> >> they were in the seconds range. After looking at the code, I think
>> >> >> >> the time is spent waiting for the blocks to be evicted from the
>> >> >> >> block cache.
>> >> >> >>
>> >> >> >> I wanted to verify that this theory is correct and see if there is
>> >> >> >> anything that can be done to speed up the moves.
>> >> >> >>
>> >> >> >> This is particularly painful as we are trying to get our configs
>> >> >> >> tuned to the new SW and need to do rolling restarts, which is
>> >> >> >> taking almost 24 hours on our cluster. We also do our own manual
>> >> >> >> rebalancing of regions across RS's, and that task is also now
>> >> >> >> painful.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >>
>> >> >> >> Randy
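For reference, the experiment Vlad suggests earlier in the thread would be expressed as an hbase-site.xml fragment like the one below, in the same style as Randy's BucketCache settings. This only shows the knob he names; whether it actually applies to bucket-cache evictions is exactly what Randy's follow-up is questioning, so treat it as a diagnostic toggle rather than a fix.

```xml
<property>
  <name>hbase.rs.evictblocksonclose</name>
  <value>false</value>
</property>
```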
