Re: endpoint coprocessor performance

Andrew Purtell Thu, 07 Mar 2013 17:13:04 -0800

Thanks for reporting back!


On Fri, Mar 8, 2013 at 9:02 AM, Kim Hamilton <[email protected]> wrote:

> I profiled it and getStartKeysInRange is taking all the time. Recall I'm
> running 0.92.1. I think these factors are consistent with
> https://issues.apache.org/jira/browse/HBASE-5492, which was fixed in
> 0.92.3.
>
> We'll be upgrading soon, so I'll be able to verify the perf issue is gone.
>
> Thanks for the help everyone!
>
>
> On Tue, Mar 5, 2013 at 8:54 PM, Kimdhamilton <[email protected]> wrote:
>
> > Yes, definitely. I'm following up tomorrow with more testing and will
> > report back. I'm definitely seeing significant load on .META., but I want
> > to see what I can determine about the root cause.
> >
> >
> > -------- Original message --------
> > Subject: RE: endpoint coprocessor performance
> > From: Anoop Sam John <[email protected]>
> > To: "[email protected]" <[email protected]>
> > CC:
> >
> >
> > Yes, I agree with Andrew here. I checked the 0.94 code base yesterday, and
> > I also feel that the efficiency should be on the higher side. And there is
> > no whole table scan. The HBase client issues requests only to those regions
> > that fall within the start/stop keys the app specified. Yes, it contacts
> > .META. to find the regions within the start/stop rows, but IMHO that
> > should not be a big efficiency issue either.
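To illustrate the point that only regions overlapping the requested rows are involved, here is a minimal sketch of that selection step, assuming the region start keys are already known and sorted. It is not HBase's own getStartKeysInRange, whose exact boundary handling may differ; the class `RegionRangeSketch` is hypothetical, an empty key means "unbounded" as in HBase, and the stop row is treated as exclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks the regions whose key ranges overlap [startRow, stopRow).
// An empty array means "unbounded", mirroring HBase's empty start/end row convention.
final class RegionRangeSketch {

    // Unsigned lexicographic comparison of byte arrays (the order HBase uses for row keys).
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // startKeys must be sorted; region i spans [startKeys[i], startKeys[i + 1]),
    // and the last region extends to the end of the table.
    static List<byte[]> startKeysInRange(List<byte[]> startKeys, byte[] startRow, byte[] stopRow) {
        List<byte[]> selected = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            byte[] regionStart = startKeys.get(i);
            boolean lastRegion = i == startKeys.size() - 1;
            byte[] regionEnd = lastRegion ? new byte[0] : startKeys.get(i + 1);
            // The region ends after startRow (or has no end) ...
            boolean endsAfterStart = regionEnd.length == 0 || compare(regionEnd, startRow) > 0;
            // ... and begins before stopRow (or stopRow is unbounded).
            boolean beginsBeforeStop = stopRow.length == 0 || compare(regionStart, stopRow) < 0;
            if (endsAfterStart && beginsBeforeStop) {
                selected.add(regionStart);
            }
        }
        return selected;
    }
}
```

With regions starting at "", "g" and "p", a request for rows ["h", "q") touches only the regions starting at "g" and "p"; the region before "g" is never contacted.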
> >
> > @Kim - Can you do some profiling and let us know which area of code is
> > eating up time in your case?
> >
> > I am also seeing HBASE-6877.
> >
> > -Anoop-
> > ________________________________________
> > From: Andrew Purtell [[email protected]]
> > Sent: Wednesday, March 06, 2013 7:28 AM
> > To: [email protected]
> > Subject: Re: endpoint coprocessor performance
> >
> > > In current logic, HTable#coprocessorExec always scan the whole table,
> > > its efficiency is low
> >
> > No, I don't think that is correct.
> >
> > In its current logic, coprocessorExec always scans the META table for all
> > regions of the target table, to find the up-to-date locations, and then
> > dispatches the exec in parallel to all regions of the target table. The
> > efficiency of the exec is actually high because invocations happen in
> > parallel across the cluster, with results reassembled back at the client
> > as they come in.
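The fan-out described above (one invocation per region, run in parallel, results gathered as they arrive) can be sketched with a plain `ExecutorService`. This illustrates the pattern only and is not HBase's coprocessorExec code; the class name `ParallelExecSketch` is hypothetical, and the `perRegionCall` function stands in for the actual endpoint RPC to each region.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch of the fan-out pattern: one task per region, with results
// gathered on the client in completion order rather than submission order.
final class ParallelExecSketch {

    static <R> Map<String, R> execAll(List<String> regionStartKeys,
                                      Function<String, R> perRegionCall,
                                      int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<Map.Entry<String, R>> completions =
                    new ExecutorCompletionService<>(pool);
            // Submit every per-region call up front so they run concurrently.
            for (String key : regionStartKeys) {
                completions.submit(() ->
                        new AbstractMap.SimpleEntry<String, R>(key, perRegionCall.apply(key)));
            }
            Map<String, R> results = new HashMap<>();
            for (int i = 0; i < regionStartKeys.size(); i++) {
                // take() blocks until the next call finishes, whichever region it was.
                Map.Entry<String, R> done = completions.take().get();
                results.put(done.getKey(), done.getValue());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for region results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-region call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `take()` hands back whichever call finishes first, a slow region does not delay collecting results from faster ones; the client can then combine the per-region values (for example, summing row counts).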
> >
> > The increased setup latency relative to a Scan, and the load on META, are
> > caused by the initial scan on META to find the up-to-date locations of all
> > regions of the target table. For a Scan, the cached region locations are
> > used, and relocations are handled transparently by the client. Exec could
> > be updated to do this as well.
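The difference Andrew describes between the Scan path and the exec path comes down to whether region locations are cached. A minimal sketch of such a cache, under simplifying assumptions: `LocationCacheSketch` and its `catalogLookup` function are hypothetical stand-ins (the function plays the role of a .META. read), keys are ASCII strings rather than byte arrays, and a stale entry is refreshed only when the caller explicitly relocates.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch of a client-side region location cache. The catalog lookup
// stands in for a .META. read and is only called on a miss or an explicit relocation.
final class LocationCacheSketch {

    // Simplified location: rows in [startKey, endKey) are served by `server`.
    // An empty endKey means "to the end of the table". String order matches byte
    // order only for ASCII keys, which is an assumption of this sketch.
    static final class Location {
        final String startKey;
        final String endKey;
        final String server;

        Location(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, Location> cache = new TreeMap<>();
    private final Function<String, Location> catalogLookup;
    private int catalogLookups = 0;

    LocationCacheSketch(Function<String, Location> catalogLookup) {
        this.catalogLookup = catalogLookup;
    }

    // Serve from the cache when the closest cached region actually contains the row.
    Location locate(String row) {
        Map.Entry<String, Location> candidate = cache.floorEntry(row);
        if (candidate != null && candidate.getValue().contains(row)) {
            return candidate.getValue();
        }
        return relocate(row);
    }

    // Bypass the cache, e.g. after a call failed because the region moved.
    Location relocate(String row) {
        catalogLookups++;
        Location fresh = catalogLookup.apply(row);
        cache.put(fresh.startKey, fresh);
        return fresh;
    }

    int catalogLookups() {
        return catalogLookups;
    }
}
```

Repeated lookups for rows in an already-cached region cost no catalog read, which is the behavior the exec path lacked when it re-read .META. on every invocation.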
> >
> >
> >
> >
> > On Wed, Mar 6, 2013 at 5:13 AM, Kim Hamilton <[email protected]> wrote:
> >
> > > Thanks so much! This describes exactly what I'm seeing. I did notice
> > > extremely heavy load on the region server carrying .META., as described
> > > in HBASE-6870:
> > >
> > > In current logic, HTable#coprocessorExec always scan the whole table,
> > > its efficiency is low and will affect the Regionserver carrying .META.
> > > under large coprocessorExec requests
> > >
> > >
> > > Thanks again,
> > > Kim
> > > On Mon, Mar 4, 2013 at 8:08 PM, Stephen Boesch <[email protected]> wrote:
> > >
> > > > Great question from Kim, and great follow-ups and answers.
> > > >
> > > >
> > > > 2013/3/4 Gary Helmling <[email protected]>
> > > >
> > > > > I see this is HBASE-6870.  I thought that sounded familiar.
> > > > >
> > > > >
> > > > > On Mon, Mar 4, 2013 at 6:23 PM, Gary Helmling <[email protected]> wrote:
> > > > >
> > > > > >
> > > > > > >> Check your logs for whether your end-point coprocessor is hitting
> > > > > > >> ZooKeeper on every invocation to figure out the region start key.
> > > > > > >> Unfortunately (at least last time I checked), the default way of
> > > > > > >> invoking an end point coprocessor doesn't use the meta cache. You
> > > > > > >> can use a combination of the following instead:
> > > > > > >>     HRegionLocation regionLocation = retried ?
> > > > > > >>         connection.relocateRegion(tableName, tableKey) :
> > > > > > >>         connection.locateRegion(tableName, tableKey);
> > > > > > >>     ...
> > > > > > >> Then call HConnection.processExecs, passing in the regionKeys
> > > > > > >> from above.
> > > > > > >> You can trap the error case of the region being relocated and try
> > > > > > >> again with retried = true, and it'll update the meta data cache
> > > > > > >> when relocateRegion is called.
> > > > > >>
> > > > > >
> > > > > >
> > > > > > Any idea if we have an improvement logged in JIRA for this? This is
> > > > > > definitely something we should improve on.
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



