Good to know :) -Vlad
On Thu, Oct 22, 2015 at 9:40 AM, Randy Fox <[email protected]> wrote:

Hi Vlad,

So far the patch seems to work perfectly.

-randy

On 10/21/15, 12:52 PM, "Vladimir Rodionov" <[email protected]> wrote:

Randy,

You can try the patch I just submitted. It is for master, but I verified it on the 1.0 branch as well.

-Vlad

On Wed, Oct 21, 2015 at 11:40 AM, Randy Fox <[email protected]> wrote:

https://issues.apache.org/jira/browse/HBASE-14663

-r

On 10/21/15, 10:35 AM, "Vladimir Rodionov" <[email protected]> wrote:

You are right, Randy.

This is the bug. Will you open a JIRA?

-Vlad

On Wed, Oct 21, 2015 at 9:35 AM, Randy Fox <[email protected]> wrote:

Maybe I am looking in the wrong place, but HStore.close() has the evictOnClose parameter hard-coded to true:

    // close each store file in parallel
    CompletionService<Void> completionService =
      new ExecutorCompletionService<Void>(storeFileCloserThreadPool);
    for (final StoreFile f : result) {
      completionService.submit(new Callable<Void>() {
        @Override
        public Void call() throws IOException {
          f.closeReader(true);
          return null;
        }
      });
    }

Where does that setting come into play?

-r

On 10/21/15, 8:14 AM, "Vladimir Rodionov" <[email protected]> wrote:

I wonder why disabling cache eviction on close would not work in the case of a bucket cache. I checked the code and did not find anything suspicious. It has to work.
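The HStore.close() excerpt quoted above passes a literal `true` to `closeReader`, so a configured evict-on-close value is never consulted on that path. Below is a minimal sketch of the same parallel-close loop with the flag threaded through instead. `FakeStoreFile` and `closeAll` are hypothetical stand-ins, not HBase classes, and this is not the HBASE-14663 patch itself; it only shows the shape of the change being discussed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StoreCloseSketch {
    // Hypothetical stand-in for a store file; it only records the flag it was closed with.
    static final class FakeStoreFile {
        final String name;
        volatile Boolean closedWithEvict; // null until closeReader() runs

        FakeStoreFile(String name) {
            this.name = name;
        }

        void closeReader(boolean evictOnClose) {
            closedWithEvict = evictOnClose;
        }
    }

    // Closes all files in parallel, forwarding the configured flag instead of a literal true.
    static void closeAll(List<FakeStoreFile> files, boolean evictOnClose, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        CompletionService<Void> completionService = new ExecutorCompletionService<>(pool);
        for (final FakeStoreFile f : files) {
            completionService.submit(() -> {
                f.closeReader(evictOnClose);
                return null;
            });
        }
        // Wait for every close; get() rethrows the first failure it sees.
        for (int i = 0; i < files.size(); i++) {
            completionService.take().get();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean evictOnClose = false; // e.g. the value read from hbase.rs.evictblocksonclose
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<FakeStoreFile> files = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            files.add(new FakeStoreFile("hfile-" + i));
        }
        try {
            closeAll(files, evictOnClose, pool);
        } finally {
            pool.shutdown();
        }
        for (FakeStoreFile f : files) {
            System.out.println(f.name + " closed with evictOnClose=" + f.closedWithEvict);
        }
    }
}
```

With the flag forwarded, turning eviction off means a region close no longer walks the block cache for every store file.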
On Wed, Oct 21, 2015 at 3:52 AM, ramkrishna vasudevan <[email protected]> wrote:

It seems that BucketAllocator#freeBlock() is synchronized, so all the bulk closes it tries to do will be blocked in the synchronized block. Maybe something like IdLock should be tried here?

Regards
Ram

On Wed, Oct 21, 2015 at 4:20 PM, ramkrishna vasudevan <[email protected]> wrote:

I think the forceful clearing of the blocks from the bucket cache is hurting in this case. I think it is worth opening a JIRA for this and working on a fix.

Regards
Ram

On Wed, Oct 21, 2015 at 12:00 AM, Randy Fox <[email protected]> wrote:

Hi Vlad,

I tried it on a per-table and a per-RegionServer basis, and it appears to have no effect.
Are we sure it is supported for bucket cache? From my charts, the bucket cache is getting cleared at the same time as the region moves occurred. The regions that are slow to move are the ones with bucket cache.

I took a table with 102 regions and blockcache true and turned off block cache via alter while the table was enabled; it took 19 minutes. Turning block cache back on took 4.3 seconds.

Let me know if there is anything else to try. This issue is really hurting our day-to-day ops.
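Ram's point is that BucketAllocator#freeBlock() is synchronized, so parallel store-file closers all serialize on one monitor. IdLock is an HBase utility class and its API is not reproduced here; instead, this sketch shows the general idea of narrowing the lock with a generic lock-striping technique: one ReentrantLock and one free list per size class, so frees in different size classes do not contend. `StripedFreeLists` is hypothetical, and the real allocator tracks more per-bucket state than a plain queue of offsets.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical allocator: one free list and one lock per size class,
// instead of a single allocator-wide monitor.
public class StripedFreeLists {
    private final ReentrantLock[] locks;
    private final Deque<Long>[] freeLists;

    @SuppressWarnings("unchecked")
    public StripedFreeLists(int sizeClasses) {
        locks = new ReentrantLock[sizeClasses];
        freeLists = new Deque[sizeClasses];
        for (int i = 0; i < sizeClasses; i++) {
            locks[i] = new ReentrantLock();
            freeLists[i] = new ArrayDeque<>();
        }
    }

    // Returns an offset to its size class; only threads touching the same class contend.
    public void free(int sizeClass, long offset) {
        ReentrantLock lock = locks[sizeClass];
        lock.lock();
        try {
            freeLists[sizeClass].push(offset);
        } finally {
            lock.unlock();
        }
    }

    public int freeCount(int sizeClass) {
        ReentrantLock lock = locks[sizeClass];
        lock.lock();
        try {
            return freeLists[sizeClass].size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StripedFreeLists lists = new StripedFreeLists(4);
        Thread[] workers = new Thread[4];
        for (int c = 0; c < 4; c++) {
            final int sizeClass = c;
            // Each worker frees into its own size class, so the workers never share a lock.
            workers[c] = new Thread(() -> {
                for (long offset = 0; offset < 10_000; offset++) {
                    lists.free(sizeClass, offset);
                }
            });
            workers[c].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        for (int c = 0; c < 4; c++) {
            System.out.println("size class " + c + ": " + lists.freeCount(c) + " free offsets");
        }
    }
}
```

Striping only helps when frees spread across stripes; if most blocks share one size class, that stripe's lock is still the bottleneck.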
Thanks,

Randy

On 10/15/15, 3:55 PM, "Vladimir Rodionov" <[email protected]> wrote:

Hey, Randy

You can verify your hypothesis by setting hbase.rs.evictblocksonclose to false for your tables.

-Vlad

On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox <[email protected]> wrote:

Caveat: we are trying to tune the BucketCache (probably a topic for a new thread, as we are not sure we are getting the most out of it).
72G off heap:

    <property>
      <name>hfile.block.cache.size</name>
      <value>0.58</value>
    </property>

    <property>
      <name>hbase.bucketcache.ioengine</name>
      <value>offheap</value>
    </property>

    <property>
      <name>hbase.bucketcache.size</name>
      <value>72800</value>
    </property>

    <property>
      <name>hbase.bucketcache.bucket.sizes</name>
      <value>9216,17408,33792,66560</value>
    </property>

On 10/15/15, 12:00 PM, "Ted Yu" <[email protected]> wrote:

I am a bit curious.
0.94 doesn't have BucketCache.

Can you share the BucketCache-related config parameters in your cluster?
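A quick sanity check on the numbers in Randy's configuration. In HBase 1.x, an hbase.bucketcache.size value greater than 1.0 is read as a capacity in megabytes, so 72800 corresponds to roughly 71 GiB, consistent with the "72G off heap" note. Each entry in hbase.bucketcache.bucket.sizes is also exactly a power-of-two KiB size plus 1 KiB (for example, 9216 = 8192 + 1024). Reading that as headroom for blocks slightly larger than a nominal 8/16/32/64 KiB block is an assumption, not something stated in the thread. The sketch just does the arithmetic.

```java
public class BucketCacheSizing {
    // Assumption: hbase.bucketcache.size > 1.0 is interpreted as megabytes (HBase 1.x behaviour).
    static double megabytesToGiB(long megabytes) {
        return megabytes / 1024.0;
    }

    // Bytes by which a bucket size exceeds the largest power of two not above it.
    static long excessOverPowerOfTwo(long bucketSize) {
        return bucketSize - Long.highestOneBit(bucketSize);
    }

    public static void main(String[] args) {
        long cacheMegabytes = 72800; // value of hbase.bucketcache.size above
        System.out.printf("bucket cache: %d MB = %.2f GiB%n",
                cacheMegabytes, megabytesToGiB(cacheMegabytes));

        long[] bucketSizes = {9216, 17408, 33792, 66560}; // hbase.bucketcache.bucket.sizes
        for (long size : bucketSizes) {
            long base = Long.highestOneBit(size);
            System.out.printf("%d bytes = %d KiB + %d bytes%n",
                    size, base / 1024, excessOverPowerOfTwo(size));
        }
    }
}
```

For these values the program reports about 71.09 GiB and an excess of 1024 bytes for every bucket size.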
Cheers

On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <[email protected]> wrote:

"StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800 nid=0xad84 runnable [0x00007fbcc0c65000]
   java.lang.Thread.State: RUNNABLE
        at java.util.LinkedList.indexOf(LinkedList.java:602)
        at java.util.LinkedList.contains(LinkedList.java:315)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
        - locked <0x000000041b0887a8> (a org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
        at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
        at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
        at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
        at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
        - locked <0x00000004944ff2d8> (a org.apache.hadoop.hbase.regionserver.StoreFile)
        at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
        at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1" prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition [0x00007fbcc5dcc000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000534e90a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
        at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
        at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
        at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

"RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000 nid=0x3056 waiting on condition [0x00007fbcc2d87000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000534e61360> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
        at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
        at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
        - locked <0x000000042230fa68> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I attached the whole thing as well.

-r

On 10/15/15, 10:39 AM, "Ted Yu" <[email protected]> wrote:

Can you give a bit more detail on why block eviction was the cause of the slow region movement?

Did you happen to take stack traces?

Thanks

On Oct 15, 2015, at 10:32 AM, Randy Fox <[email protected]> wrote:

Hi,

We just upgraded from 0.94 to 1.0.0 and have noticed that region moves are super slow (on the order of minutes), whereas previously they were in the seconds range. After looking at the code, I think the time is spent waiting for the blocks to be evicted from the block cache.

I wanted to verify that this theory is correct and see if there is anything that can be done to speed up the moves.
This is particularly painful as we are trying to get our configs tuned to the new software and need to do rolling restarts, which are taking almost 24 hours on our cluster. We also do our own manual rebalancing of regions across RegionServers, and that task is now painful too.

Thanks,

Randy
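The first thread in Randy's dump is RUNNABLE inside LinkedList.indexOf, called from LinkedList.contains in BucketAllocator$BucketSizeInfo.freeBlock, while holding the BucketAllocator monitor (the `- locked` line). LinkedList.contains walks the list element by element, so its cost grows with the list length, and any other thread that needs the same monitor waits meanwhile. The sketch below makes that cost visible by counting equals() calls for a hypothetical contains-then-add free path on a LinkedList versus a LinkedHashSet. It does not reproduce HBase's actual freeBlock logic: `Offset` and `fillGuarded` are illustrative names, and a hash-based set is only one possible alternative.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.LinkedList;

public class FreeListScan {
    // Hypothetical key type that counts equals() calls so the scan cost is visible.
    // The counter is a plain static field: this demo is single-threaded.
    static final class Offset {
        static long equalsCalls = 0;
        final long value;

        Offset(long value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Offset && ((Offset) o).value == value;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(value);
        }
    }

    // Adds n distinct offsets, each guarded by a membership check
    // (a contains-then-add pattern); returns the equals() calls spent.
    static long fillGuarded(Collection<Offset> free, int n) {
        Offset.equalsCalls = 0;
        for (long i = 0; i < n; i++) {
            Offset o = new Offset(i);
            if (!free.contains(o)) {
                free.add(o);
            }
        }
        return Offset.equalsCalls;
    }

    public static void main(String[] args) {
        int n = 2_000;
        long linked = fillGuarded(new LinkedList<>(), n);
        long hashed = fillGuarded(new LinkedHashSet<>(), n);
        System.out.println("LinkedList equals() calls:    " + linked);
        System.out.println("LinkedHashSet equals() calls: " + hashed);
    }
}
```

For the LinkedList every check scans the whole list, so n guarded inserts cost n(n-1)/2 comparisons (1,999,000 for n = 2,000); the hash-based set compares hash codes first and needs far fewer equals() calls.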
