Ted,
On a table with 600K rows, issuing Gets for 100 rows seems to be faster than 
using the FuzzyRowFilter (mask on the whole length of the key). I thought the 
FuzzyRowFilter's SEEK_NEXT_USING_HINT would help. All of this is on the client 
side; given the client-side performance, I have not changed my coprocessor to 
use the FuzzyRowFilter (it still does multiple Gets inside the coprocessor). 
Also, I am seeing very bad concurrent query performance. Is there anything 
that would make coprocessors almost single-threaded across multiple 
invocations?
Again, all this is after upgrading to 0.94.10 (for HBASE-6870's sake), which 
seems to be very good at bringing the regions online fast and balanced. Thanks, 
much appreciated.
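
For reference, here is a minimal sketch of the fuzzy-mask idea in plain Java. It is a toy model, not the HBase FuzzyRowFilter code: the class and method names are made up, and it assumes the 0.94 convention in which a mask byte of 0 marks a fixed position and 1 marks a wildcard. It also assumes ASCII row keys and, as a simplification, rejects rows shorter than the pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of fuzzy row-key matching. This is NOT the HBase FuzzyRowFilter
// code; class and method names are hypothetical, for illustration only.
class FuzzyMatchSketch {

    // mask[i] == 0: byte i is fixed and must equal pattern[i].
    // mask[i] == 1: byte i is a wildcard and may hold any value.
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have the same length");
        }
        if (row.length < pattern.length) {
            return false; // simplification: rows shorter than the pattern never match
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    // Keeps the row keys that satisfy the pattern/mask pair (ASCII keys assumed).
    static List<String> filter(List<String> rowKeys, String pattern, byte[] mask) {
        byte[] p = pattern.getBytes(StandardCharsets.US_ASCII);
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (matches(key.getBytes(StandardCharsets.US_ASCII), p, mask)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("0001_a", "0001_b", "0002_a", "0003_b");
        byte[] suffixFixed = {1, 1, 1, 1, 0, 0}; // any 4-byte prefix, then "_a"
        byte[] allFixed = {0, 0, 0, 0, 0, 0};    // every byte fixed: an exact match
        System.out.println(filter(keys, "????_a", suffixFixed));
        System.out.println(filter(keys, "0002_a", allFixed));
    }
}
```

In this model, a mask with every position fixed accepts only the exact key, which is the same set of rows a Get for that key would return.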
 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com


________________________________
 From: Ted Yu <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Saturday, August 17, 2013 4:19 PM
Subject: Re: Client Get vs Coprocessor scan performance
 

HBASE-6870 targeted whole table scanning for each coprocessorService call
which exhibited itself through:

HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys ->
getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(),
getTableName(), false)

With the fix, the cached region locations in HConnectionImplementation are used instead.
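
To illustrate the difference, here is a toy model in plain Java. It is not the HBase client code: RegionLookupSketch and its methods are hypothetical names, the "meta" map merely stands in for the catalog table, and cache invalidation on region splits or moves is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: contrasts looking up region start keys on every call
// with reusing a cached copy. All names here are hypothetical.
class RegionLookupSketch {
    private final Map<String, List<String>> meta = new HashMap<>();  // stand-in for the catalog table
    private final Map<String, List<String>> cache = new HashMap<>(); // stand-in for the client-side cache
    int metaScans = 0; // number of simulated catalog scans

    void addTable(String table, List<String> startKeys) {
        meta.put(table, new ArrayList<>(startKeys));
    }

    // Pre-fix behaviour in this model: every call pays for a catalog scan.
    List<String> startKeysUncached(String table) {
        metaScans++;
        return new ArrayList<>(meta.get(table));
    }

    // Post-fix behaviour in this model: only the first call scans the catalog.
    List<String> startKeysCached(String table) {
        List<String> keys = cache.get(table);
        if (keys == null) {
            keys = startKeysUncached(table);
            cache.put(table, keys);
        }
        return keys;
    }

    public static void main(String[] args) {
        RegionLookupSketch uncached = new RegionLookupSketch();
        uncached.addTable("t1", List.of("", "g", "n", "t"));
        for (int i = 0; i < 3; i++) uncached.startKeysUncached("t1");

        RegionLookupSketch cached = new RegionLookupSketch();
        cached.addTable("t1", List.of("", "g", "n", "t"));
        for (int i = 0; i < 3; i++) cached.startKeysCached("t1");

        System.out.println("uncached scans: " + uncached.metaScans);
        System.out.println("cached scans:   " + cached.metaScans);
    }
}
```

In this model the lookup cost per endpoint call no longer grows with the number of calls, which is the kind of saving described above.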

Cheers


On Sat, Aug 17, 2013 at 2:21 PM, Asaf Mesika <[email protected]> wrote:

> Ted, can you elaborate a little bit why this issue boosts performance?
> I couldn't figure out from the issue comments whether execCoprocessor scans
> the entire .META. table or an entire table, to understand the actual
> improvement.
>
> Thanks!
>
>
>
>
> On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <[email protected]> wrote:
>
> > I think you need HBASE-6870 which went into 0.94.8
> >
> > Upgrading should boost coprocessor performance.
> >
> > Cheers
> >
> > On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[email protected]>
> > wrote:
> >
> > > Ted,
> > > Here is the method signature/protocol
> > > public Map<String, Double> getFoo(Map<String, Double> input,
> > > int topN) throws IOException;
> > >
> > > There are 31 regions on 4 nodes X 8 CPU.
> > > I am on 0.94.6 (from Hortonworks).
> > > I think it behaves like what linwukang says: it is almost a full
> > > table scan in the coprocessor.
> > > Actually, when I set more specific ColumnPrefixFilters, performance
> > > went down.
> > > I want to do things on the server side because I don't want to be
> > > sending 500K columns/values to the client.
> > > I cannot believe a single-threaded client which does some calculations
> > > and a group-by beats the coprocessor running in 31 regions.
> > >
> > > Regards,
> > > - kiru
> > >
> > >
> > > Kiru Pakkirisamy | webcloudtech.wordpress.com
> > >
> > >
> > > ________________________________
> > > From: Ted Yu <[email protected]>
> > > To: [email protected]; Kiru Pakkirisamy <[email protected]>
> > > Sent: Thursday, August 8, 2013 8:40 PM
> > > Subject: Re: Client Get vs Coprocessor scan performance
> > >
> > >
> > > Can you give us a bit more information ?
> > >
> > > How do you deliver the 55 rowkeys to your endpoint ?
> > > How many regions do you have for this table ?
> > >
> > > What HBase version are you using ?
> > >
> > > Thanks
> > >
> > > On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
> > > <[email protected]> wrote:
> > >
> > >> Hi,
> > >> I am finding an odd behavior, with the coprocessor performance lagging
> > >> a client-side Get.
> > >> I have a table with 500000 rows. Each has a variable number of columns
> > >> in one column family (in this case about 600000 columns in total are
> > >> processed).
> > >> When I try to get 55 specific rows, the client side completes in half
> > >> the time of the coprocessor endpoint.
> > >> I am using 55 RowFilters on the coprocessor scan side. The rows are
> > >> processed in exactly the same way in both cases.
> > >> Any pointers on how to debug this scenario?
> > >>
> > >> Regards,
> > >> - kiru
> > >>
> > >>
> > >> Kiru Pakkirisamy | webcloudtech.wordpress.com
> >
>
