terrible ...

org.apache.hadoop.hbase.client.ScannerTimeoutException: 338424ms passed
since the last invocation, timeout is currently set to 300000
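A ScannerTimeoutException like this means more than the configured scanner
lease (300000 ms here) elapsed between two next() calls on an open scanner.
A minimal client-side sketch of the two usual knobs, assuming the 0.90-era
property name hbase.regionserver.lease.period and a placeholder value of
600000 ms; the lease is enforced on the region servers, so their config
must use the same value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScannerTimeoutTuning {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Raise the scanner lease above the 300000 ms seen in the error.
        // (Assumed 0.90-era property name; the lease is enforced on the
        // region servers, so it must be raised there as well.)
        conf.setInt("hbase.regionserver.lease.period", 600000);

        Scan scan = new Scan();
        // Fetch fewer rows per next() RPC so the client reports back to
        // the server well inside the lease period.
        scan.setCaching(100);
      }
    }

Lowering scan caching trades more round trips for a smaller gap between
next() calls, which is usually the safer fix when per-row processing on the
client is slow.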
On Tue, Feb 1, 2011 at 6:45 AM, Wayne <[email protected]> wrote:
> The file system buffer cache explains what is going on. The open scanner
> reads the first block, and the subsequent read goes against the same
> block, thereby getting served out of the file buffer cache.
>
> Thanks.
>
> On Mon, Jan 31, 2011 at 5:22 PM, Ryan Rawson <[email protected]> wrote:
> > Even without block caching, the linux buffer cache is still a factor,
> > and your reads still go through it (via the datanode).
> >
> > When Stack talks about the StoreScanner, this is a particular class
> > inside of HBase that does the job of reading from one column family.
> > The first time you instantiate it, it reads the first block it needs.
> > This is done during the 'openScanner' call, and would explain the
> > latency you are seeing in openScanner.
> >
> > -ryan
> >
> > On Mon, Jan 31, 2011 at 2:17 PM, Wayne <[email protected]> wrote:
> > > I assume BLOCKCACHE => 'false' would turn this off? We have turned
> > > off caching on all tables.
> > >
> > > On Mon, Jan 31, 2011 at 4:54 PM, Ryan Rawson <[email protected]> wrote:
> > > > The Regionserver caches blocks, so a second read would benefit
> > > > from the caching done by the first read. Over time blocks get
> > > > evicted in an LRU manner, and things get slow again.
> > > >
> > > > Does this make sense to you?
> > > >
> > > > On Mon, Jan 31, 2011 at 1:50 PM, Wayne <[email protected]> wrote:
> > > > > We have heavy writes always going on, so there is always memory
> > > > > pressure.
> > > > >
> > > > > If the open scanner reads the first block, maybe that explains
> > > > > the 8ms the second time a test is run, but why does the first
> > > > > run average 35ms to open, while the open is only 8ms when the
> > > > > same read requests are sent again? There is a difference between
> > > > > read #1 and read #2 that I can only explain by the region
> > > > > location search. Our writes are so heavy that I assume this
> > > > > region location information is always flushed within 30-60
> > > > > minutes.
> > > > >
> > > > > On Mon, Jan 31, 2011 at 4:44 PM, Ryan Rawson <[email protected]> wrote:
> > > > > > Hey,
> > > > > >
> > > > > > The region location cache is held by a soft reference, so as
> > > > > > long as you don't have memory pressure, it will never get
> > > > > > invalidated just because of time.
> > > > > >
> > > > > > Another thing to consider: in HBase, the open scanner code
> > > > > > also seeks and reads the first block of the scan. This may
> > > > > > incur a read to disk and might explain the hot vs. cold
> > > > > > behavior you are seeing below.
> > > > > >
> > > > > > -ryan
> > > > > >
> > > > > > On Mon, Jan 31, 2011 at 1:38 PM, Wayne <[email protected]> wrote:
> > > > > > > After doing many tests (10k serialized scans) we see that on
> > > > > > > average opening the scanner takes 2/3 of the read time if
> > > > > > > the read is fresh (scannerOpenWithStop=~35ms,
> > > > > > > scannerGetList=~10ms). The second time around (1 minute
> > > > > > > later) we assume the region cache is "hot" and the open
> > > > > > > scanner is much faster (scannerOpenWithStop=~8ms,
> > > > > > > scannerGetList=~10ms). After 1-2 hours the cache is no
> > > > > > > longer "hot" and we are back to the initial numbers. We
> > > > > > > assume this is due to finding where the data is located in
> > > > > > > the cluster. We have caching turned off on our tables, but
> > > > > > > have 2% cache for hbase, and the .META. table region server
> > > > > > > is showing a 98% hit rate (.META. is served out of cache).
> > > > > > > How can we pre-warm the cache to speed up our reads? It does
> > > > > > > not seem correct that 2/3 of our read time is always spent
> > > > > > > finding where the data is located. We have played with the
> > > > > > > prefetch.limit with various different settings without much
> > > > > > > difference. How can we warm up the cache? Per the HBASE-2468
> > > > > > > wording, "Clients could prewarm cache by doing a large scan
> > > > > > > of all the meta for the table instead of random reads for
> > > > > > > each miss". We definitely do not want to pay this price on
> > > > > > > each read, but would like to set up a cron job to update the
> > > > > > > cache once an hour for the tables that need it. It would be
> > > > > > > great to have a way to pin the region locations in memory,
> > > > > > > or at least a method to heat them up before a big read
> > > > > > > process gets kicked off. A read's latency for our type of
> > > > > > > usage pattern should be based primarily on disk i/o latency,
> > > > > > > not on looking around for where the data is located in the
> > > > > > > cluster. Adding SSD disks wouldn't help us much at all in
> > > > > > > lowering read latency, given what we are seeing.
> > > > > > >
> > > > > > > Any help or suggestions would be greatly appreciated.
> > > > > > >
> > > > > > > Thanks.

--
Thanks & Best regards
jiajun
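For what it's worth, a minimal sketch of the hourly warm-up job described
above, assuming the 0.90-era client API (the table name "mytable" is a
placeholder). It forces one location lookup per region, so the .META.
round trips happen in a cron job instead of on each cold read:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class RegionCacheWarmer {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");
        // One lookup per region: each call locates the region holding
        // this key and caches the location in the client's connection,
        // instead of paying for the .META. lookup on the first cold read.
        for (byte[] startKey : table.getStartKeys()) {
          table.getRegionLocation(startKey);
        }
        table.close();
      }
    }

Note the location cache lives inside the client's HConnection, so this only
helps reads issued from the same process. Raising hbase.client.prefetch.limit
(the setting behind the prefetch.limit mentioned above) makes each miss pull
neighboring regions into the cache as well, which may be the simpler lever.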

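And since BLOCKCACHE => 'false' came up: a sketch of the Java equivalent of
that shell option at table-creation time ("mytable" and "cf" are
placeholders). It only disables HBase's own LRU block cache for the family;
the OS file system buffer cache discussed above still applies:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTableNoBlockCache {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("mytable");
        HColumnDescriptor cf = new HColumnDescriptor("cf");
        // Same effect as BLOCKCACHE => 'false' in the shell: HBase's LRU
        // block cache is bypassed for this family, but reads still pass
        // through the OS buffer cache via the datanode.
        cf.setBlockCacheEnabled(false);
        desc.addFamily(cf);
        admin.createTable(desc);
      }
    }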