Maybe I spoke too soon. HBASE-6870 fixes the table scan (as verified by metrics 
of read requests on the region).
But the performance with RowFilter is very bad (actually worse than a full 
table scan; I don't know how this can happen).
I hope my API usage is right. All I am doing is add RowFilters to FilterList 
and setFilter on the scan.
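
One possible explanation, sketched with plain Java collections rather than the HBase API: a RowFilter on its own does not narrow the range a scan covers, so, assuming the scan has no start or stop row set, every row in the region is still read and tested against the filters. The class name, method names, and the TreeMap standing in for a region are all hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Stand-alone model only: a sorted map stands in for one region's rows.
final class ScanVsGet {

    // Models a scan with row filters and no start/stop row:
    // every row is visited and tested, matching rows are copied to out.
    static int filteredScan(NavigableMap<String, Double> region,
                            Set<String> wanted,
                            Map<String, Double> out) {
        int visited = 0;
        for (Map.Entry<String, Double> e : region.entrySet()) {
            visited++;
            if (wanted.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return visited;
    }

    // Models a loop of point lookups: one lookup per requested key.
    static int pointGets(NavigableMap<String, Double> region,
                         Set<String> wanted,
                         Map<String, Double> out) {
        int lookups = 0;
        for (String key : wanted) {
            lookups++;
            Double value = region.get(key);
            if (value != null) {
                out.put(key, value);
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> region = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            region.put(String.format("row%04d", i), (double) i);
        }
        Set<String> wanted = new TreeSet<>();
        wanted.add("row0007");
        wanted.add("row0420");
        wanted.add("row0999");

        Map<String, Double> fromScan = new LinkedHashMap<>();
        Map<String, Double> fromGets = new LinkedHashMap<>();
        System.out.println("scan visited: " + filteredScan(region, wanted, fromScan));
        System.out.println("get lookups:  " + pointGets(region, wanted, fromGets));
        System.out.println("same result:  " + fromScan.equals(fromGets));
    }
}
```

Both paths return the same rows; only the number of rows touched differs. Separately, FilterList defaults to Operator.MUST_PASS_ALL, so matching any one of several row keys needs Operator.MUST_PASS_ONE.
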
I tried looking into the AggregateImplementation (which is mentioned as the unit 
test for this bug) but did not follow through because I am in a rush for a 
good workaround.
I have now replaced RowFilters with a Get on the Region (in a loop) after 
making sure my key is within startKey and endKey of the region.
I think this is getting my data right. Performance is very good, taking almost 
half the time of the full scan code we had in the coprocessor earlier.
Are there any gotchas/bad side-effects to using a Get on the Region?
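
A minimal sketch of the range check described above, written against plain byte arrays rather than the HBase classes. It assumes the usual region convention: the start key is inclusive, the end key is exclusive, an empty key means unbounded, and row keys are ordered as unsigned bytes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone model of "is this row key inside the region's key range?"
final class RegionKeyRange {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // startKey is inclusive, endKey is exclusive; an empty array means unbounded.
    static boolean contains(byte[] startKey, byte[] endKey, byte[] row) {
        boolean atOrAfterStart = startKey.length == 0 || compare(row, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || compare(row, endKey) < 0;
        return atOrAfterStart && beforeEnd;
    }

    // Keeps only the requested keys this region is responsible for.
    static List<byte[]> keysForRegion(byte[] startKey, byte[] endKey, List<byte[]> requested) {
        List<byte[]> mine = new ArrayList<>();
        for (byte[] key : requested) {
            if (contains(startKey, endKey, key)) {
                mine.add(key);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        byte[] start = "b".getBytes(StandardCharsets.UTF_8);
        byte[] end = "d".getBytes(StandardCharsets.UTF_8);
        List<byte[]> requested = Arrays.asList(
                "a1".getBytes(StandardCharsets.UTF_8),
                "b7".getBytes(StandardCharsets.UTF_8),
                "c3".getBytes(StandardCharsets.UTF_8),
                "d0".getBytes(StandardCharsets.UTF_8));
        for (byte[] key : keysForRegion(start, end, requested)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

In an endpoint, each region would run its Gets only for the keys this check keeps, so a key is never looked up in a region that cannot hold it.
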
 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com


________________________________
 From: Kiru Pakkirisamy <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, August 9, 2013 1:04 PM
Subject: Re: Client Get vs Coprocessor scan performance
 

I think this fixes my issues. On our dev cluster what used to take 1200 msec is 
now in the 700-800 msec range. Thanks again.
I will soon be deploying this to our Performance cluster, where our query is in 
the 15-second range.
 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com


________________________________
From: Ted Yu <[email protected]>
To: "[email protected]" <[email protected]> 
Cc: "[email protected]" <[email protected]> 
Sent: Thursday, August 8, 2013 10:44 PM
Subject: Re: Client Get vs Coprocessor scan performance


I think you need HBASE-6870 which went into 0.94.8

Upgrading should boost coprocessor performance. 

Cheers

On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[email protected]> wrote:

> Ted,
> Here is the method signature/protocol
> public Map<String, Double> getFoo(Map<String, Double> input,
> int topN) throws IOException;
> 
> There are 31 regions on 4 nodes X 8 CPU.
> I am on 0.94.6 (from Hortonworks).
> It seems to behave the way linwukang describes: it is almost a full 
> table scan in the coprocessor. 
> Actually, when I set more specific ColumnPrefixFilters, performance went down.
> I want to do things on the server side because I don't want to be sending 
> 500K column/values to the client.
> I cannot believe a single-threaded client that does some calculations and 
> a group-by beats the coprocessor running in 31 regions.
>  
> Regards,
> - kiru
> 
> 
> Kiru Pakkirisamy | webcloudtech.wordpress.com
> 
> 
> ________________________________
> From: Ted Yu <[email protected]>
> To: [email protected]; Kiru Pakkirisamy <[email protected]> 
> Sent: Thursday, August 8, 2013 8:40 PM
> Subject: Re: Client Get vs Coprocessor scan performance
> 
> 
> Can you give us a bit more information?
> 
> How do you deliver the 55 rowkeys to your endpoint?
> How many regions do you have for this table?
> 
> What HBase version are you using?
> 
> Thanks
> 
> On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
> <[email protected]> wrote:
> 
>> Hi,
>> I am seeing odd behavior: Coprocessor performance is lagging behind a
>> client-side Get.
>> I have a table with 500000 rows. Each has a variable number of columns in one
>> column family (in this case about 600000 columns in total are processed).
>> When I try to get 55 specific rows, the client side completes in half the
>> time of the coprocessor endpoint.
>> I am using 55 RowFilters on the Coprocessor scan side. The rows are
>> processed in exactly the same way in both cases.
>> Any pointers on how to debug this scenario?
>> 
>> Regards,
>> - kiru
>> 
>> 
>> Kiru Pakkirisamy | webcloudtech.wordpress.com
