I wrote myself a Scanner wrapper that uses a producer/consumer queue to
keep the client fed with a full buffer as much as possible.  When scanning
my table with scanner caching at 100 records, I see about a 24% uplift in
performance (~35k records/sec with the ClientScanner and ~44k records/sec
with my P/C scanner).  However, when I set scanner caching to 5000, it's
more of a wash compared to the standard ClientScanner: ~53k records/sec
with the ClientScanner and ~60k records/sec with the P/C scanner.
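
For anyone who wants to try the same idea, here is a minimal sketch of the producer/consumer pattern, not my actual wrapper. It is written against a plain java.util.Iterator so it runs without HBase on the classpath; the class name PrefetchingIterator and the capacity parameter are made up for illustration, and the source iterator is assumed to stand in for whatever a ResultScanner hands back. A background thread drains the source into a bounded queue, so the consumer usually finds records already buffered.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical prefetching wrapper (illustration only): a producer thread
 * drains `source` into a bounded queue while the caller consumes from it.
 * Null elements are not supported because BlockingQueue rejects them.
 */
class PrefetchingIterator<T> implements Iterator<T>, AutoCloseable {
    // Sentinel placed on the queue when the source is exhausted or has failed.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private volatile Throwable failure;
    private Object next; // look-ahead slot; null means "nothing fetched yet"

    PrefetchingIterator(Iterator<T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks while the buffer is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // closed early; nobody is waiting for the sentinel
            } catch (RuntimeException e) {
                failure = e; // surfaced to the consumer in hasNext()
            }
            try {
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-producer");
        this.producer.setDaemon(true);
        this.producer.start();
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take(); // blocks only when the buffer is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        if (next == END && failure != null) {
            throw new IllegalStateException("producer failed", failure);
        }
        return next != END;
    }

    @SuppressWarnings("unchecked")
    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) next;
        next = null;
        return value;
    }

    @Override
    public void close() {
        producer.interrupt(); // stops a producer that is blocked on a full queue
    }
}
```

The queue capacity plays the role of the client-side buffer: the producer only stalls when the consumer falls a full buffer behind, and the consumer only stalls when the buffer runs dry.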

I'm not sure what to make of those results.  I think next I'll shut down
HBase and read the HFiles directly, to see if there's a drop-off in
performance between reading them directly vs. via the RegionServer.

I still think that to really solve this there needs to be a sliding window
of records in flight between disk and RS, and between RS and client.  I'm
thinking there's probably a single batch of records in flight between RS
and client at the moment.

Sandy

On 5/23/13 8:45 AM, "Bryan Keller" <[email protected]> wrote:

>I am considering scanning a snapshot instead of the table. I believe this
>is what the ExportSnapshot class does. If I could use the scanning code
>from ExportSnapshot then I will be able to scan the HDFS files directly
>and bypass the regionservers. This could potentially give me a huge boost
>in performance for full table scans. However, it doesn't really address
>the poor scan performance against a table.
