Re: Poor HBase map-reduce scan performance

lars hofhansl Mon, 01 Jul 2013 04:00:34 -0700

Absolutely.



----- Original Message -----
From: Ted Yu <[email protected]>
To: [email protected]
Cc: 
Sent: Sunday, June 30, 2013 9:32 PM
Subject: Re: Poor HBase map-reduce scan performance

Looking at the tail of HBASE-8369, there were some comments which are yet
to be addressed.

I think trunk patch should be finalized before backporting.

Cheers

On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller <[email protected]> wrote:

> I'll attach my patch to HBASE-8369 tomorrow.
>
> On Jun 28, 2013, at 10:56 AM, lars hofhansl <[email protected]> wrote:
>
> > If we can make a clean patch with minimal impact to existing code I
> would be supportive of a backport to 0.94.
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Bryan Keller <[email protected]>
> > To: [email protected]; lars hofhansl <[email protected]>
> > Cc:
> > Sent: Tuesday, June 25, 2013 1:56 AM
> > Subject: Re: Poor HBase map-reduce scan performance
> >
> > I tweaked Enis's snapshot input format and backported it to 0.94.6 and
> have snapshot scanning functional on my system. Performance is dramatically
> better, as expected, I suppose. I'm seeing about 3.6x faster performance vs
> TableInputFormat. Also, HBase doesn't get bogged down during a scan as the
> regionserver is being bypassed. I'm very excited by this. There are some
> issues with file permissions and library dependencies but nothing that
> can't be worked out.
> >
> > On Jun 5, 2013, at 6:03 PM, lars hofhansl <[email protected]> wrote:
> >
> >> That's exactly the kind of pre-fetching I was investigating a bit ago
> (made a patch, but ran out of time).
> >> This pre-fetching is strictly client only, where the client keeps the
> server busy while it is processing the previous batch, by filling up a 2nd
> buffer.
> >>
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ________________________________
> >> From: Sandy Pratt <[email protected]>
> >> To: "[email protected]" <[email protected]>
> >> Sent: Wednesday, June 5, 2013 10:58 AM
> >> Subject: Re: Poor HBase map-reduce scan performance
> >>
> >>
> >> Yong,
> >>
> >> As a thought experiment, imagine how it impacts the throughput of TCP to
> >> keep the window size at 1.  That means there's only one packet in flight
> >> at a time, and total throughput is a fraction of what it could be.
> >>
> >> That's effectively what happens with RPC.  The server sends a batch,
> then
> >> does nothing while it waits for the client to ask for more.  During that
> >> time, the pipe between them is empty.  Increasing the batch size can
> help
> >> a bit, in essence creating a really huge packet, but the problem
> remains.
> >> There will always be stalls in the pipe.
> >>
> >> What you want is for the window size to be large enough that the pipe is
> >> saturated.  A streaming API accomplishes that by stuffing data down the
> >> network pipe as quickly as possible.
> >>
> >> Sandy
> >>
> >> On 6/5/13 7:55 AM, "yonghu" <[email protected]> wrote:
> >>
> >>> Can anyone explain why client + rpc + server will decrease the
> performance
> >>> of scanning? I mean the Regionserver and Tasktracker are the same node
> >>> when
> >>> you use MapReduce to scan the HBase table. So, in my understanding,
> there
> >>> will be no rpc cost.
> >>>
> >>> Thanks!
> >>>
> >>> Yong
> >>>
> >>>
> >>> On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt <[email protected]>
> wrote:
> >>>
> >>>> https://issues.apache.org/jira/browse/HBASE-8691
> >>>>
> >>>>
> >>>> On 6/4/13 6:11 PM, "Sandy Pratt" <[email protected]> wrote:
> >>>>
> >>>>> Haven't had a chance to write a JIRA yet, but I thought I'd pop in
> here
> >>>>> with an update in the meantime.
> >>>>>
> >>>>> I tried a number of different approaches to eliminate latency and
> >>>>> "bubbles" in the scan pipeline, and eventually arrived at adding a
> >>>>> streaming scan API to the region server, along with refactoring the
> >>>> scan
> >>>>> interface into an event-driven message receiver interface.  In so
> >>>> doing, I
> >>>>> was able to take scan speed on my cluster from 59,537 records/sec
> with
> >>>> the
> >>>>> classic scanner to 222,703 records per second with my new scan API.
> >>>>> Needless to say, I'm pleased ;)
> >>>>>
> >>>>> More details forthcoming when I get a chance.
> >>>>>
> >>>>> Thanks,
> >>>>> Sandy
> >>>>>
> >>>>> On 5/23/13 3:47 PM, "Ted Yu" <[email protected]> wrote:
> >>>>>
> >>>>>> Thanks for the update, Sandy.
> >>>>>>
> >>>>>> If you can open a JIRA and attach your producer / consumer scanner
> >>>> there,
> >>>>>> that would be great.
> >>>>>>
> >>>>>> On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt <[email protected]>
> >>>> wrote:
> >>>>>>
> >>>>>>> I wrote myself a Scanner wrapper that uses a producer/consumer
> >>>> queue to
> >>>>>>> keep the client fed with a full buffer as much as possible.  When
> >>>>>>> scanning
> >>>>>>> my table with scanner caching at 100 records, I see about a 24%
> >>>> uplift
> >>>>>>> in
> >>>>>>> performance (~35k records/sec with the ClientScanner and ~44k
> >>>>>>> records/sec
> >>>>>>> with my P/C scanner).  However, when I set scanner caching to 5000,
> >>>>>>> it's
> >>>>>>> more of a wash compared to the standard ClientScanner: ~53k
> >>>> records/sec
> >>>>>>> with the ClientScanner and ~60k records/sec with the P/C scanner.
> >>>>>>>
> >>>>>>> I'm not sure what to make of those results.  I think next I'll shut
> >>>>>>> down
> >>>>>>> HBase and read the HFiles directly, to see if there's a drop off in
> >>>>>>> performance between reading them directly vs. via the RegionServer.
> >>>>>>>
> >>>>>>> I still think that to really solve this there needs to be a sliding
> >>>>>>> window
> >>>>>>> of records in flight between disk and RS, and between RS and
> client.
> >>>>>>> I'm
> >>>>>>> thinking there's probably a single batch of records in flight
> >>>> between
> >>>>>>> RS
> >>>>>>> and client at the moment.
> >>>>>>>
> >>>>>>> Sandy
> >>>>>>>
> >>>>>>> On 5/23/13 8:45 AM, "Bryan Keller" <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> I am considering scanning a snapshot instead of the table. I
> >>>> believe
> >>>>>>> this
> >>>>>>>> is what the ExportSnapshot class does. If I could use the scanning
> >>>>>>> code
> >>>>>>>> from ExportSnapshot then I will be able to scan the HDFS files
> >>>>>>> directly
> >>>>>>>> and bypass the regionservers. This could potentially give me a
> huge
> >>>>>>> boost
> >>>>>>>> in performance for full table scans. However, it doesn't really
> >>>>>>> address
> >>>>>>>> the poor scan performance against a table.
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >
>
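
The TCP window analogy quoted above can be put into rough numbers. What follows is only a back-of-the-envelope sketch, not a measurement of HBase: the class name ScanPipeModel, the method recordsPerSecond, and every parameter are hypothetical, and the model assumes the server produces records at a fixed rate and that each batch costs one fixed round trip.

```java
/** Hypothetical toy model of a scan pipe; not part of HBase. */
class ScanPipeModel {
    /**
     * Estimated steady-state throughput in records/sec.
     *
     * @param batchSize           records returned per request (scanner caching)
     * @param roundTripSec        assumed fixed cost of one request/response exchange
     * @param serverRecordsPerSec assumed rate at which the server can produce records
     * @param batchesInFlight     how many batches may be outstanding at once (the "window")
     */
    static double recordsPerSecond(int batchSize, double roundTripSec,
                                   double serverRecordsPerSec, int batchesInFlight) {
        // Time the server needs to fill one batch.
        double produceSec = batchSize / serverRecordsPerSec;
        // One full cycle for a single batch: produce it, then wait out the round trip.
        double cycleSec = produceSec + roundTripSec;
        // Classic window bound: at most (window size / cycle time) records per second.
        double windowBound = (batchesInFlight * (double) batchSize) / cycleSec;
        // The server's own production rate is the other ceiling.
        return Math.min(serverRecordsPerSec, windowBound);
    }
}
```

With made-up inputs of 100 records per batch, a 1 ms round trip, and a server that can produce 100,000 records/sec, the model gives roughly 50,000 records/sec with one batch in flight and reaches the 100,000 records/sec ceiling with two. Raising batchSize shrinks the relative cost of the round trip but never removes it, which matches the "stalls in the pipe" point.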
>
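
For readers who want to picture the producer/consumer scanner wrapper and the client-side pre-fetching discussed in this thread, here is a minimal sketch. It is not the code from HBASE-8691 or from anyone's patch: PrefetchingIterator and BatchSource are hypothetical names, BatchSource stands in for whatever call fetches the next batch of rows, and an empty batch is assumed to mean end of scan.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical wrapper: a background thread fetches batches while the caller consumes. */
class PrefetchingIterator<T> implements Iterator<T> {

    /** Stand-in for the real batch fetch; an empty list signals the end of the scan. */
    public interface BatchSource<T> {
        List<T> nextBatch() throws Exception;
    }

    private final BlockingQueue<List<T>> queue;
    private Iterator<T> current = Collections.emptyIterator();
    private boolean finished = false;
    private volatile Exception failure;

    PrefetchingIterator(final BatchSource<T> source, int batchesInFlight) {
        // Bounded queue: the producer runs at most batchesInFlight batches ahead.
        queue = new ArrayBlockingQueue<>(batchesInFlight);
        Thread producer = new Thread(() -> {
            try {
                List<T> batch;
                do {
                    batch = source.nextBatch();
                    queue.put(batch); // blocks while the buffer is full
                } while (!batch.isEmpty());
            } catch (Exception e) {
                failure = e;
                try {
                    // Wake the consumer with an end marker so it can report the failure.
                    queue.put(Collections.<T>emptyList());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "prefetch-producer");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        while (!current.hasNext() && !finished) {
            try {
                List<T> batch = queue.take(); // waits only if the producer has fallen behind
                if (batch.isEmpty()) {
                    finished = true;
                } else {
                    current = batch.iterator();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                finished = true;
            }
        }
        if (!current.hasNext() && failure != null) {
            throw new IllegalStateException("batch source failed", failure);
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

The queue capacity plays the role of the window size: with a capacity of 1 the producer can stay only one batch ahead, while a larger capacity lets more batches accumulate on the client side. This only hides latency on the client; it does not change how many batches the server itself keeps in flight, which is the limitation the streaming scan API was meant to address.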
