Thanks for the update, Sandy.

If you can open a JIRA and attach your producer / consumer scanner there,
that would be great.

On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt <[email protected]> wrote:

> I wrote myself a Scanner wrapper that uses a producer/consumer queue to
> keep the client fed with a full buffer as much as possible.  When scanning
> my table with scanner caching at 100 records, I see about a 24% uplift in
> performance (~35k records/sec with the ClientScanner and ~44k records/sec
> with my P/C scanner).  However, when I set scanner caching to 5000, it's
> more of a wash compared to the standard ClientScanner: ~53k records/sec
> with the ClientScanner and ~60k records/sec with the P/C scanner.
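
For anyone following the thread, the producer/consumer idea described above
can be sketched roughly as below. This is not Sandy's code: the class name
PrefetchingIterator and the capacity parameter are made up for illustration,
and the wrapper is written against a plain java.util.Iterator so it runs
without HBase on the classpath. With HBase the source would presumably be
the iterator of a ResultScanner, and the scanner caching setting would still
control how many records each RPC fetches.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch: a background thread drains the source iterator into a
 * bounded queue so the consumer rarely has to wait on the source.
 * Null elements are not supported (ArrayBlockingQueue rejects them).
 */
final class PrefetchingIterator<T> implements Iterator<T>, AutoCloseable {
    // Sentinel placed on the queue when the source is exhausted or has failed.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private volatile Throwable failure;
    private Object next; // look-ahead slot; null means nothing fetched yet

    PrefetchingIterator(Iterator<? extends T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks while the buffer is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // the consumer closed the iterator early
            } catch (Throwable t) {
                failure = t; // reported to the consumer in hasNext()
            }
            try {
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-producer");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take(); // blocks only when the buffer is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        if (next == END) {
            if (failure != null) {
                throw new IllegalStateException("source iterator failed", failure);
            }
            return false;
        }
        return true;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T item = (T) next;
        next = null;
        return item;
    }

    /** Stops the producer thread if the consumer gives up early. */
    @Override
    public void close() {
        producer.interrupt();
    }
}
```

The bounded queue is what keeps memory in check: the producer stalls once
capacity records are buffered and resumes as the consumer drains them. The
caller would still need to close the underlying scanner itself, since this
sketch only stops its own thread.
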
>
> I'm not sure what to make of those results.  I think next I'll shut down
> HBase and read the HFiles directly, to see if there's a drop off in
> performance between reading them directly vs. via the RegionServer.
>
> I still think that to really solve this there needs to be a sliding window
> of records in flight between disk and RS, and between RS and client.  I'm
> thinking there's probably a single batch of records in flight between RS
> and client at the moment.
>
> Sandy
>
> On 5/23/13 8:45 AM, "Bryan Keller" <[email protected]> wrote:
>
> >I am considering scanning a snapshot instead of the table. I believe this
> >is what the ExportSnapshot class does. If I could use the scanning code
> >from ExportSnapshot then I will be able to scan the HDFS files directly
> >and bypass the regionservers. This could potentially give me a huge boost
> >in performance for full table scans. However, it doesn't really address
> >the poor scan performance against a table.
>
>
