Thanks Geoff! No apology required, that's good stuff. I'll update the book with that param.
On 1/25/12 2:17 PM, "Geoff Hendrey" <[email protected]> wrote:

>Sorry for jumping in late, and perhaps out of context, but I'm pasting
>in some findings (reported to this list by us a while back) that helped
>us to get scans to perform very fast. Adjusting
>hbase.client.prefetch.limit was critical for us.:
>========================
>It's even more mysterious than we think. There is lack of documentation
>(or perhaps lack of know how). Apparently there are 2 factors that
>decide the performance of scan.
>
>1. Scanner cache as we know - We always had scanner caching set to
>1, but this is different than pre fetch limit
>2. hbase.client.prefetch.limit - This is meta caching limit
>defaults to 10 to prefetch 10 region locations every time we scan that
>is not already been pre-warmed
>
>the "hbase.client.prefetch.limit" is passed along to the client code to
>prefetch the next 10 region locations.
>
>int rows = Math.min(rowLimit,
>configuration.getInt("hbase.meta.scanner.caching", 100));
>
>the "row" variable mins to 10 and always prefetch atmost 10 region
>boundaries. Hence every new region boundary that is not already been
>pre-warmed fetch the next 10 region locations resulting in 1st slow
>query followed by quick responses. This is basically pre-warming the
>meta not region cache.
>
>-----Original Message-----
>From: Jeff Whiting [mailto:[email protected]]
>Sent: Wednesday, January 25, 2012 10:09 AM
>To: [email protected]
>Subject: Re: Speeding up Scans
>
>Does it make sense to have better defaults so the performance out of the
>box is better?
>
>~Jeff
>
>On 1/25/2012 8:06 AM, Peter Wolf wrote:
>> Ah ha! I appear to be insane ;-)
>>
>> Adding the following speeded things up quite a bit
>>
>> scan.setCacheBlocks(true);
>> scan.setCaching(1000);
>>
>> Thank you, it was a duh!
>>
>> P
>>
>> On 1/25/12 8:13 AM, Doug Meil wrote:
>>> Hi there-
>>>
>>> Quick sanity check: what caching level are you using? (default is 1) I
>>> know this is basic, but it's always good to double-check.
>>>
>>> If "language" is already in the lead position of the rowkey, why use the
>>> filter?
>>>
>>> As for EC2, that's a wildcard.
>>>
>>> On 1/25/12 7:56 AM, "Peter Wolf"<[email protected]> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I am looking for advice on speeding up my Scanning.
>>>>
>>>> I want to iterate over all rows where a particular column (language)
>>>> equals a particular value ("JA").
>>>>
>>>> I am already creating my row keys using that column in the first bytes.
>>>> And I do my scans using partial row matching, like this...
>>>>
>>>> public static byte[] calculateStartRowKey(String language) {
>>>>     int languageHash = language.length() > 0 ? language.hashCode() : 0;
>>>>     byte[] language2 = Bytes.toBytes(languageHash);
>>>>     byte[] accountID2 = Bytes.toBytes(0);
>>>>     byte[] timestamp2 = Bytes.toBytes(0);
>>>>     return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>>>> }
>>>>
>>>> public static byte[] calculateEndRowKey(String language) {
>>>>     int languageHash = language.length() > 0 ? language.hashCode() : 0;
>>>>     byte[] language2 = Bytes.toBytes(languageHash + 1);
>>>>     byte[] accountID2 = Bytes.toBytes(0);
>>>>     byte[] timestamp2 = Bytes.toBytes(0);
>>>>     return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>>>> }
>>>>
>>>> Scan scan = new Scan(calculateStartRowKey(language),
>>>>     calculateEndRowKey(language));
>>>>
>>>> Since I am using a hash value for the string, I need to re-check the
>>>> column to make sure that some other string does not get the same hash
>>>> value
>>>>
>>>> Filter filter = new SingleColumnValueFilter(resultFamily,
>>>>     languageCol, CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
>>>> scan.setFilter(filter);
>>>>
>>>> I am using the Cloudera 0.09.4 release, and a cluster of 3 machines on
>>>> EC2.
>>>>
>>>> I think that this should be really fast, but it is not. Any advice on
>>>> how to debug/speed it up?
>>>>
>>>> Thanks
>>>> Peter
>>>>
>>>
>>
>
>--
>Jeff Whiting
>Qualtrics Senior Software Engineer
>[email protected]
>
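
For anyone reading this thread later, here is a rough, stdlib-only sketch of the start/end row key idea from Peter's original message. It is not HBase code: `ByteBuffer` stands in for `Bytes.toBytes(int)` and `Bytes.add` (both produce big-endian bytes), and the class name `RowKeyRangeSketch` and the helpers `rowKey`, `compareUnsigned`, and `inRange` are made up for illustration. It only shows why a key built from the language hash falls inside the [start, end) range under the unsigned byte-wise ordering HBase uses for row keys, given that the stop row of a Scan is exclusive.

```java
import java.nio.ByteBuffer;

public class RowKeyRangeSketch {

    // Hypothetical helper: 12-byte key made of three big-endian ints,
    // mirroring the layout (language hash, account ID, timestamp) in the thread.
    static byte[] rowKey(int languageHash, int accountId, int timestamp) {
        return ByteBuffer.allocate(12)
                .putInt(languageHash)
                .putInt(accountId)
                .putInt(timestamp)
                .array();
    }

    // Same hash rule as the quoted helpers: empty string maps to 0.
    static int languageHash(String language) {
        return language.length() > 0 ? language.hashCode() : 0;
    }

    // Smallest key for this language hash (inclusive start row).
    static byte[] startRowKey(String language) {
        return rowKey(languageHash(language), 0, 0);
    }

    // Smallest key for the next hash value (exclusive stop row).
    static byte[] endRowKey(String language) {
        return rowKey(languageHash(language) + 1, 0, 0);
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when start <= key < end for the given language.
    static boolean inRange(byte[] key, String language) {
        return compareUnsigned(startRowKey(language), key) <= 0
                && compareUnsigned(key, endRowKey(language)) < 0;
    }

    public static void main(String[] args) {
        // Account ID and timestamp values are arbitrary illustration data.
        byte[] key = rowKey("JA".hashCode(), 42, 1327500000);
        System.out.println("JA key in JA range: " + inRange(key, "JA"));
        System.out.println("JA key in EN range: " + inRange(key, "EN"));
    }
}
```

Because the range is keyed on the hash rather than the string itself, two languages with the same hash would share a range, which is the reason Peter gives for keeping the SingleColumnValueFilter.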
