I assume BLOCKCACHE => 'false' would turn this off? We have turned off cache on all tables.
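[Ed. note: block caching is toggled per column family, not per table, so the usual shell incantation looks roughly like the sketch below. Table and family names ('mytable', 'cf') are placeholders; this is a config sketch, not taken from the thread.]

```
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', BLOCKCACHE => 'false'}
hbase> enable 'mytable'
```

Note that this only stops data blocks of that family from being cached; .META. blocks are still cached by the region servers, which is why a 98% .META. hit rate can coexist with BLOCKCACHE => 'false' on user tables.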
On Mon, Jan 31, 2011 at 4:54 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> The Regionserver caches blocks, so a second read would benefit from
> the caching of the first read. Over time blocks get evicted in an LRU
> manner, and things would get slow again.
>
> Does this make sense to you?
>
> On Mon, Jan 31, 2011 at 1:50 PM, Wayne <wav...@gmail.com> wrote:
> > We have heavy writes always going on, so there is always memory pressure.
> >
> > If the open scanner reads the first block, maybe that explains the 8ms
> > the second time a test is run, but why does the first run average 35ms
> > to open, while the same read requests sent again open in only 8ms?
> > There is a difference between read #1 and read #2 that I can only
> > explain by the region location search. Our writes are so heavy that I
> > assume this region location information is always flushed within 30-60
> > minutes.
> >
> > On Mon, Jan 31, 2011 at 4:44 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >> Hey,
> >>
> >> The region location cache is held by a soft reference, so as long as
> >> you don't have memory pressure, it will never get invalidated just
> >> because of time.
> >>
> >> Another thing to consider: in HBase, the open-scanner code also seeks
> >> and reads the first block of the scan. This may incur a read to disk
> >> and might explain the hot vs. cold behavior you are seeing below.
> >>
> >> -ryan
> >>
> >> On Mon, Jan 31, 2011 at 1:38 PM, Wayne <wav...@gmail.com> wrote:
> >> > After doing many tests (10k serialized scans) we see that, on
> >> > average, opening the scanner takes 2/3 of the read time if the read
> >> > is fresh (scannerOpenWithStop = ~35ms, scannerGetList = ~10ms). The
> >> > second time around (1 minute later) we assume the region cache is
> >> > "hot" and the open scanner is much faster (scannerOpenWithStop =
> >> > ~8ms, scannerGetList = ~10ms). After 1-2 hours the cache is no
> >> > longer "hot" and we are back to the initial numbers.
> >> > We assume this is due to finding where the data is located in the
> >> > cluster. We have caching turned off on our tables, but have 2% cache
> >> > for HBase, and the .META. table region server is showing a 98% hit
> >> > rate (.META. is served out of cache). How can we pre-warm the cache
> >> > to speed up our reads? It does not seem correct that 2/3 of our read
> >> > time is always spent finding where the data is located. We have
> >> > played with the prefetch.limit with various different settings
> >> > without much difference. How can we warm up the cache? Per the #2468
> >> > wording, "Clients could prewarm cache by doing a large scan of all
> >> > the meta for the table instead of random reads for each miss". We
> >> > definitely do not want to pay this price on each read, but would
> >> > like to set up a cron job to update once an hour for the tables that
> >> > need it. It would be great to have a way to pin the region locations
> >> > in memory, or at least a method to heat the cache up before a big
> >> > read process gets kicked off. For our usage pattern, a read's
> >> > latency should be driven primarily by disk I/O latency, not by
> >> > looking around for where the data is located in the cluster. Given
> >> > what we are seeing, adding SSD disks wouldn't help us much at all to
> >> > lower read latency.
> >> >
> >> > Any help or suggestions would be greatly appreciated.
> >> >
> >> > Thanks.
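[Ed. note: the hot/cold pattern Ryan describes is just access-ordered LRU eviction: a first read warms the cache, re-reads hit it, and under sustained pressure the entry is eventually pushed out. A toy illustration of that policy in miniature (not HBase's actual block cache code):]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy access-ordered LRU cache: shows why a block is fast to re-read
// shortly after first access, but slow again once enough other blocks
// have pushed it out.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruSketch(int capacity) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least-recently-used entry
    }
}

public class LruDemo {
    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("block-A", "data");
        cache.put("block-B", "data");
        cache.get("block-A");         // touch A: B is now least recently used
        cache.put("block-C", "data"); // over capacity: B is evicted
        System.out.println(cache.containsKey("block-A")); // true
        System.out.println(cache.containsKey("block-B")); // false
    }
}
```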
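[Ed. note: on the cron-job idea, one way to pre-warm a single client's region location cache with the 0.90-era API is to ask for the location of every region's start key, forcing one .META. lookup per region. A rough, untested sketch under those assumptions (needs a live cluster; class name and the table-name argument are placeholders):]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Sketch: warm this client's region location cache for one table by
// looking up the location of each region's start key. Each lookup that
// misses the cache goes to .META., so after one pass every region's
// location is cached locally.
public class RegionCacheWarmer {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, args[0]); // e.g. "mytable"
        for (byte[] startKey : table.getStartKeys()) {
            table.getRegionLocation(startKey);
        }
        table.close();
    }
}
```

Caveat: this only warms the location cache of the JVM it runs in; a cron job in a separate process would not help other client processes, so it would need to run inside (or just before) the reading application itself.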