I am a bit curious: 0.94 doesn't have BucketCache. Can you share the BucketCache-related config parameters for your cluster?
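For what it's worth, in the first trace quoted below, StoreFileCloserThread-L-1 is RUNNABLE inside LinkedList.indexOf, reached from BucketAllocator$BucketSizeInfo.freeBlock while the BucketAllocator lock is held. LinkedList.contains is a linear scan, so if that list is long, each freed block pays for a walk over it. The sketch below only illustrates that general cost difference; it is not the HBase code, and the `Bucket` class, the sizes, and the HashSet alternative are assumptions made up for the illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

public class FreeListLookupSketch {
    /** Hypothetical stand-in for a cache bucket; counts equals() invocations. */
    static final class Bucket {
        static long equalsCalls = 0;
        final int id;

        Bucket(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Bucket && ((Bucket) o).id == id;
        }

        @Override
        public int hashCode() { return Integer.hashCode(id); }
    }

    /** Returns how many equals() calls a single contains() lookup triggers. */
    static long countEqualsCalls(Collection<Bucket> buckets, Bucket probe) {
        Bucket.equalsCalls = 0;
        if (!buckets.contains(probe)) {
            throw new IllegalStateException("probe should be present");
        }
        return Bucket.equalsCalls;
    }

    public static void main(String[] args) {
        int n = 100_000; // assumed list length, chosen only for illustration
        List<Bucket> linked = new LinkedList<>();
        Set<Bucket> hashed = new HashSet<>();
        for (int i = 0; i < n; i++) {
            linked.add(new Bucket(i));
            hashed.add(new Bucket(i));
        }
        // Probe for the last element: the worst case for a linear scan.
        Bucket probe = new Bucket(n - 1);
        System.out.println("LinkedList equals() calls: " + countEqualsCalls(linked, probe));
        System.out.println("HashSet equals() calls:    " + countEqualsCalls(hashed, probe));
    }
}
```

If eviction calls such a lookup once per block, the linear scan repeats for every block of every store file being closed, which would be consistent with moves taking minutes. Whether that is what happens here depends on how large those lists get with your BucketCache settings.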
Cheers

On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <[email protected]> wrote:

>
> "StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800 nid=0xad84 runnable [0x00007fbcc0c65000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.LinkedList.indexOf(LinkedList.java:602)
>         at java.util.LinkedList.contains(LinkedList.java:315)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
>         - locked <0x000000041b0887a8> (a org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
>         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
>         - locked <0x00000004944ff2d8> (a org.apache.hadoop.hbase.regionserver.StoreFile)
>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1" prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition [0x00007fbcc5dcc000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x0000000534e90a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
>
> "RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000 nid=0x3056 waiting on condition [0x00007fbcc2d87000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x0000000534e61360> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
>         - locked <0x000000042230fa68> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
>
> I attached the whole thing as well.
>
> -r
>
>
> On 10/15/15, 10:39 AM, "Ted Yu" <[email protected]> wrote:
>
> >Can you give a bit more detail on why block eviction was the cause of the slow region movement?
> >
> >Did you happen to take stack traces?
> >
> >Thanks
> >
> >> On Oct 15, 2015, at 10:32 AM, Randy Fox <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> We just upgraded from 0.94 to 1.0.0 and have noticed that region moves are super slow (order of minutes) whereas previously they were in the seconds range. After looking at the code, I think the time is spent waiting for the blocks to be evicted from block cache.
> >>
> >> I wanted to verify that this theory is correct and see if there is anything that can be done to speed up the moves.
> >>
> >> This is particularly painful as we are trying to get our configs tuned to the new SW and need to do rolling restarts which is taking almost 24 hours on our cluster. We also do our own manual rebalancing of regions across RS’s and that task is also now painful.
> >>
> >>
> >> Thanks,
> >>
> >> Randy
>
