Thanks for the update, Sandy. If you can open a JIRA and attach your producer / consumer scanner there, that would be great.
On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt <[email protected]> wrote:

> I wrote myself a Scanner wrapper that uses a producer/consumer queue to
> keep the client fed with a full buffer as much as possible. When scanning
> my table with scanner caching at 100 records, I see about a 24% uplift in
> performance (~35k records/sec with the ClientScanner and ~44k records/sec
> with my P/C scanner). However, when I set scanner caching to 5000, it's
> more of a wash compared to the standard ClientScanner: ~53k records/sec
> with the ClientScanner and ~60k records/sec with the P/C scanner.
>
> I'm not sure what to make of those results. I think next I'll shut down
> HBase and read the HFiles directly, to see if there's a drop-off in
> performance between reading them directly vs. via the RegionServer.
>
> I still think that to really solve this there needs to be a sliding window
> of records in flight between disk and RS, and between RS and client. I'm
> thinking there's probably a single batch of records in flight between RS
> and client at the moment.
>
> Sandy
>
> On 5/23/13 8:45 AM, "Bryan Keller" <[email protected]> wrote:
>
> >I am considering scanning a snapshot instead of the table. I believe this
> >is what the ExportSnapshot class does. If I could use the scanning code
> >from ExportSnapshot, then I would be able to scan the HDFS files directly
> >and bypass the regionservers. This could potentially give me a huge boost
> >in performance for full table scans. However, it doesn't really address
> >the poor scan performance against a table.
> >
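Sandy's wrapper itself isn't included in the thread, so what follows is only a minimal sketch of the general producer/consumer idea, not his code. The class name `PrefetchingScanner` and its `capacity` parameter are hypothetical. It assumes the wrapped source is a plain `Iterator`; with the HBase client one could presumably pass `resultScanner.iterator()`, since `ResultScanner` is `Iterable<Result>`. A background thread drains the source into a bounded queue while the caller consumes from the other end, so the queue capacity caps how many rows are buffered client-side.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical prefetching wrapper (a sketch, not an HBase API): a producer
// thread drains `source` into a bounded queue; the caller consumes from it.
final class PrefetchingScanner<T> implements Iterator<T>, AutoCloseable {
    // Sentinel that marks end of stream; the queue holds Object so it can mix with T.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private volatile Throwable failure; // set if the source throws
    private Object pending;             // element taken from the queue but not yet returned

    PrefetchingScanner(Iterator<T> source, int capacity) {
        BlockingQueue<Object> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        this.producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    q.put(source.next()); // blocks while the buffer is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // closed early by the consumer; no END marker needed
            } catch (Throwable t) {
                failure = t; // surfaced to the consumer after END is reached
            }
            try {
                q.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "scanner-prefetch");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            try {
                pending = queue.take(); // blocks until the producer supplies something
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for rows", e);
            }
        }
        if (pending == END) {
            if (failure != null) {
                throw new IllegalStateException("producer failed", failure);
            }
            return false; // pending stays END, so later calls return false immediately
        }
        return true;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = (T) pending;
        pending = null;
        return result;
    }

    @Override
    public void close() {
        producer.interrupt(); // stop prefetching if the caller abandons the scan early
    }
}
```

Only the producer thread ever touches the underlying source, which matters if that source (such as a client scanner) is not thread-safe. Note that `close()` is meant to be the last call; calling `hasNext()` after closing mid-stream could block. This buffers rows on the client side only; it does not add the RS-side sliding window Sandy describes.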
