On Mon, Dec 6, 2010 at 5:44 AM, Lior Schachter <[email protected]> wrote:
> Hi all,
>
> I would like to speed up my scans and noticed these two methods on
> org.apache.hadoop.hbase.client.Scan:
> 1. setCacheBlocks
This is whether we should add blocks to the server-side block cache as we scan (follow it in the code and you'll see how this flag makes it all the way down into the reader we use pulling from our files in HDFS).

> 2. setCaching

I presume you mean HTable#setScannerCaching? If so, it's like it says in the javadoc, https://hudson.apache.org/hudson/view/G-L/view/HBase/job/hbase-0.90/ws/trunk/target/site/apidocs/index.html: it's how many rows to fetch per RPC.

> Can you please specify how these parameters should be configured and how
> they relate to each other.

Leave the former alone. Play with the latter. The larger you can set it, the more improvement you will see (because, IIRC, the default is to do an RPC for each row), but don't set it so high that you pull too much per invocation and put pressure on server-side or even client-side heaps. You have an idea of the size of your rows, so you should have a notion of what to set it to. Start with a low value. Even a small change should make a difference.

St.Ack

> Thanks,
> Lior
>
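A rough sketch of the advice above: size the scanner-caching value from your estimated row size and a per-RPC memory budget, then pass it to the scan. The `Scan#setCaching` and `Scan#setCacheBlocks` calls shown in the comment are the real 0.90 client API; the helper, its name, and the 2 KB / 2 MB numbers are illustrative assumptions, not values from the thread.

```java
// Sketch: picking a scanner-caching value from an estimated average row
// size and a per-RPC memory budget, per the advice in this thread.
// With a client in hand the scan setup would look roughly like:
//
//   Scan scan = new Scan();
//   scan.setCaching(caching);     // rows fetched per RPC (the knob to tune)
//   scan.setCacheBlocks(true);    // leave server-side block caching alone
//
public class ScannerCachingEstimate {
    // budgetBytes: how much data you are willing to pull per RPC;
    // avgRowBytes: your estimate of a typical row's size.
    static int suggestCaching(long budgetBytes, long avgRowBytes) {
        if (avgRowBytes <= 0) {
            throw new IllegalArgumentException("row size must be positive");
        }
        long rows = budgetBytes / avgRowBytes;
        // Fetch at least one row; cap it so a tiny row estimate does not
        // produce a value that hammers server-side heaps.
        return (int) Math.max(1, Math.min(rows, 10000));
    }

    public static void main(String[] args) {
        // e.g. ~2 KB rows with a ~2 MB per-RPC budget
        System.out.println(suggestCaching(2L * 1024 * 1024, 2L * 1024));
    }
}
```

Starting low and raising the value while watching heap pressure, as suggested above, is safer than jumping straight to a large number.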
