Would be interesting to compare against Phoenix's Skip Scan
(http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html)
which does a scan through a coprocessor and is more than 2x faster
than multi Get (plus handles multi-range scans in addition to point
gets).
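
To make the "rows examined" point concrete, here is a toy in-memory
sketch. It is not Phoenix or HBase code: a TreeSet stands in for sorted
rowkeys, and the class and method names (SeekSketch, filterEveryRow,
seekToEachKey) are made up for illustration. It only counts how many
rows get touched when a filter is evaluated against every row versus
when the scanner seeks straight to each wanted key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model only: a sorted in-memory set plays the role of a table's rowkeys.
class SeekSketch {

    // Evaluate the "filter" against every row in key order; return rows examined.
    static int filterEveryRow(NavigableSet<String> table,
                              NavigableSet<String> wanted,
                              List<String> found) {
        int examined = 0;
        for (String row : table) {
            examined++;                      // every row is looked at once
            if (wanted.contains(row)) {
                found.add(row);
            }
        }
        return examined;
    }

    // Seek directly to the first row >= each wanted key; return rows examined.
    static int seekToEachKey(NavigableSet<String> table,
                             NavigableSet<String> wanted,
                             List<String> found) {
        int examined = 0;
        for (String target : wanted) {       // wanted keys are visited in sorted order
            String row = table.ceiling(target);
            if (row == null) {
                break;                       // past the end; later targets are too
            }
            examined++;
            if (row.equals(target)) {
                found.add(row);
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        // Sizes borrowed from the thread: 600K rows, 100 wanted keys.
        NavigableSet<String> table = new TreeSet<>();
        for (int i = 0; i < 600_000; i++) {
            table.add(String.format("row%06d", i));
        }
        NavigableSet<String> wanted = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            wanted.add(String.format("row%06d", i * 6_000));
        }
        System.out.println("filter on every row examined: "
                + filterEveryRow(table, wanted, new ArrayList<String>()));
        System.out.println("seek to each key examined:    "
                + seekToEachKey(table, wanted, new ArrayList<String>()));
    }
}
```

Both methods return the same matches; only the number of rows touched
differs. Real scan cost also depends on block reads, RPCs and region
layout, which this sketch ignores.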

James

On Aug 18, 2013, at 6:39 AM, Ted Yu <[email protected]> wrote:

> bq. Get'ting 100 rows seems to be faster than the FuzzyRowFilter (mask on
> the whole length of the key)
>
> In this case the Get's are very selective. The number of rows FuzzyRowFilter
> was evaluated against would be much higher.
> It would be nice if you remember the time each took.
>
> bq. Also, I am seeing very bad concurrent query performance
>
> Were the multi Get's performed by your coprocessor within region boundary
> of the respective coprocessor ? Just to confirm.
>
> bq. that would make Coprocessors almost single threaded across multiple
> invocations ?
>
> Let me dig into code some more.
>
> Cheers
>
>
> On Sat, Aug 17, 2013 at 10:34 PM, Kiru Pakkirisamy <
> [email protected]> wrote:
>
>> Ted,
>> On a table with 600K rows, Get'ting 100 rows seems to be faster than the
>> FuzzyRowFilter (mask on the whole length of the key). I thought the
>> FuzzyRowFilter's SEEK_NEXT_USING_HINT would help. All this is on the
>> client side; given the client-side performance, I have not changed my
>> coprocessor to use the FuzzyRowFilter (it is still doing multiple Gets
>> inside the coprocessor). Also, I am seeing very bad concurrent query
>> performance. Is there anything that would make Coprocessors almost single
>> threaded across multiple invocations ?
>> Again, all this is after moving to 0.94.10 (for HBASE-6870's sake), which
>> seems to be very good at bringing the regions online fast and balanced.
>> Thanks, much appreciated.
>>
>> Regards,
>> - kiru
>>
>>
>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>
>>
>> ________________________________
>> From: Ted Yu <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Sent: Saturday, August 17, 2013 4:19 PM
>> Subject: Re: Client Get vs Coprocessor scan performance
>>
>>
>> HBASE-6870 targeted whole table scanning for each coprocessorService call
>> which exhibited itself through:
>>
>> HTable#coprocessorService -> getStartKeysInRange -> getStartEndKeys ->
>> getRegionLocations -> MetaScanner.allTableRegions(getConfiguration(),
>> getTableName(), false)
>>
>> With the fix, the cached region locations in HConnectionImplementation
>> are used instead.
>>
>> Cheers
>>
>>
>> On Sat, Aug 17, 2013 at 2:21 PM, Asaf Mesika <[email protected]>
>> wrote:
>>
>>> Ted, can you elaborate a little on why this issue boosts performance?
>>> I couldn't figure out from the issue comments whether execCoprocessor
>>> scans the entire .META. table or an entire table, to understand the
>>> actual improvement.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>> On Fri, Aug 9, 2013 at 8:44 AM, Ted Yu <[email protected]> wrote:
>>>
>>>> I think you need HBASE-6870 which went into 0.94.8
>>>>
>>>> Upgrading should boost coprocessor performance.
>>>>
>>>> Cheers
>>>>
>>>> On Aug 8, 2013, at 10:21 PM, Kiru Pakkirisamy <[email protected]>
>>>> wrote:
>>>>
>>>>> Ted,
>>>>> Here is the method signature/protocol
>>>>> public Map<String, Double> getFoo(Map<String, Double> input,
>>>>> int topN) throws IOException;
>>>>>
>>>>> There are 31 regions on 4 nodes X 8 CPU.
>>>>> I am on 0.94.6 (from Hortonworks).
>>>>> I think it behaves like what linwukang says: it is almost a full table
>>>>> scan in the coprocessor.
>>>>> Actually, when I set more specific ColumnPrefixFilters, performance
>>>>> went down.
>>>>> I want to do things on the server side because I don't want to be
>>>>> sending 500K column/values to the client.
>>>>> I cannot believe a single-threaded client which does some calculations
>>>>> and group-by beats the coprocessor running in 31 regions.
>>>>>
>>>>> Regards,
>>>>> - kiru
>>>>>
>>>>>
>>>>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Ted Yu <[email protected]>
>>>>> To: [email protected]; Kiru Pakkirisamy <[email protected]>
>>>>> Sent: Thursday, August 8, 2013 8:40 PM
>>>>> Subject: Re: Client Get vs Coprocessor scan performance
>>>>>
>>>>>
>>>>> Can you give us a bit more information ?
>>>>>
>>>>> How do you deliver the 55 rowkeys to your endpoint ?
>>>>> How many regions do you have for this table ?
>>>>>
>>>>> What HBase version are you using ?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Aug 8, 2013 at 6:43 PM, Kiru Pakkirisamy
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am seeing odd behavior, with Coprocessor performance lagging a
>>>>>> client-side Get.
>>>>>> I have a table with 500000 rows. Each has a variable number of columns
>>>>>> in one column family (in this case about 600000 columns in total are
>>>>>> processed).
>>>>>> When I try to get 55 specific rows, the client side completes in half
>>>>>> the time of the coprocessor endpoint.
>>>>>> I am using 55 RowFilters on the Coprocessor scan side. The rows are
>>>>>> processed in exactly the same way in both cases.
>>>>>> Any pointers on how to debug this scenario ?
>>>>>>
>>>>>> Regards,
>>>>>> - kiru
>>>>>>
>>>>>>
>>>>>> Kiru Pakkirisamy | webcloudtech.wordpress.com
>>>>
>>>
>>
