On Mon, Dec 6, 2010 at 5:44 AM, Lior Schachter <[email protected]> wrote:
> Hi all,
>
> I would like to speed up my scans and noticed these two methods on
> org.apache.hadoop.hbase.client.Scan:
> 1. setCacheBlocks
This is whether we should add blocks to the server-side block cache as we scan (follow it in the code and you'll see how this flag makes it all the way down into the reader we use pulling from our files in HDFS).

> 2. setCaching

I presume you mean HTable#setScannerCaching? If so, it's like it says in the javadoc, https://hudson.apache.org/hudson/view/G-L/view/HBase/job/hbase-0.90/ws/trunk/target/site/apidocs/index.html: it's how many rows to fetch per RPC.

> Can you please specify how these parameters should be configured and how
> they relate to each other.

Leave the former alone. Play with the latter. The larger you can set it, the more improvement you will see (because, IIRC, the default is to do an RPC for each row), but don't set it so high that you pull too much per invocation and put pressure on server-side or even client-side heaps. You have an idea of the size of your rows, so you should have a notion of what to set it to. Start with a low value. Even a small change should make a difference.

St.Ack

> Thanks,
> Lior
>
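A rough sketch of the advice above: size the scanner-caching value from your estimated row size and a per-RPC memory budget, then pass it to the scan. The `Scan#setCaching` and `Scan#setCacheBlocks` calls shown in the comment are the real 0.90 client API; the helper, its name, and the 2 KB / 2 MB numbers are illustrative assumptions, not values from the thread.

```java
// Sketch: picking a scanner-caching value from an estimated average row
// size and a per-RPC memory budget, per the advice in this thread.
// With a client in hand the scan setup would look roughly like:
//
//   Scan scan = new Scan();
//   scan.setCaching(caching);     // rows fetched per RPC (the knob to tune)
//   scan.setCacheBlocks(true);    // leave server-side block caching alone
//
public class ScannerCachingEstimate {
    // budgetBytes: how much data you are willing to pull per RPC;
    // avgRowBytes: your estimate of a typical row's size.
    static int suggestCaching(long budgetBytes, long avgRowBytes) {
        if (avgRowBytes <= 0) {
            throw new IllegalArgumentException("row size must be positive");
        }
        long rows = budgetBytes / avgRowBytes;
        // Fetch at least one row; cap it so a tiny row estimate does not
        // produce a value that hammers server-side heaps.
        return (int) Math.max(1, Math.min(rows, 10000));
    }

    public static void main(String[] args) {
        // e.g. ~2 KB rows with a ~2 MB per-RPC budget
        System.out.println(suggestCaching(2L * 1024 * 1024, 2L * 1024));
    }
}
```

Starting low and raising the value while watching heap pressure, as suggested above, is safer than jumping straight to a large number.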
