No hard-coded limit that I'm aware of, but you should check the code in HTable.
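If you want to guard against huge batches on the client side, an untested
sketch of chunking the list yourself (assumes a client that has
HTable#get(List<Get>), i.e. 0.90.x as discussed below; BATCH_SIZE is a
made-up number you'd want to tune against your row sizes and RS heap):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class BatchedGet {
  // Made-up cap; tune for your row sizes and region server heap.
  private static final int BATCH_SIZE = 1000;

  public static List<Result> get(HTable table, List<Get> gets)
      throws IOException {
    List<Result> out = new ArrayList<Result>(gets.size());
    for (int i = 0; i < gets.size(); i += BATCH_SIZE) {
      List<Get> batch =
          gets.subList(i, Math.min(i + BATCH_SIZE, gets.size()));
      // get(List<Get>) groups the Gets by region server internally.
      for (Result r : table.get(batch)) {
        out.add(r);
      }
    }
    return out;
  }
}

Each table.get(batch) call is still grouped by region server under the
hood, so a modest cap mostly bounds client memory and per-call latency
rather than adding RPCs.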
The point of multi-get was to minimize client-RS RPC calls, so in general
the batching is a good thing. Like anything, there are practical limits.
Your mileage/performance may vary.

On 7/22/11 11:38 AM, "Nanheng Wu" <[email protected]> wrote:

>That makes sense. So is there a limit on how large the batch size can
>be? Or, say if I pass all of my queries in one batch of size 10K,
>would that cause problems?
>
>On Fri, Jul 22, 2011 at 7:47 AM, Doug Meil
><[email protected]> wrote:
>>
>> That method internally organizes the gets by RS, so it's pretty
>> efficient. I think it processes the RS-groups serially in 0.90.x, and
>> I thought I saw a ticket about multi-threaded processing, but you'll
>> have to check the code.
>>
>> On 7/22/11 9:46 AM, "Nanheng Wu" <[email protected]> wrote:
>>
>>>Hi,
>>>
>>> I have a use case for my data stored in HBase where I need to make
>>>a query for 20K-30K keys at once. I know that the HBase client API
>>>supports a get operation with a list of "gets", so a naive
>>>implementation would probably just make one or more batch get calls.
>>>First of all, I am wondering: if I choose this implementation, how
>>>should I choose the batch size? Can I put all the keys in a single
>>>batch? Secondly, is there a better implementation that's more
>>>efficient? For instance, I could sort the keys first and split them
>>>into groups of a certain size, then for each group do a scan using the
>>>first and last key of the group and filter out returned rows that are
>>>not in the group (kinda like a merge join). Would the second
>>>implementation be faster than the first? Are there better ways to go
>>>about it? I am using HBase 0.20.6.
>>>
>>>Thanks!
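For the scan-based alternative in the quoted question, a rough, untested
sketch (GROUP_SIZE is a made-up number; Bytes.BYTES_COMPARATOR matches
HBase's unsigned lexicographic row ordering):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class GroupedScan {
  // Made-up group size; tune against how dense your keys are.
  private static final int GROUP_SIZE = 1000;

  public static List<Result> get(HTable table, List<byte[]> keys)
      throws IOException {
    // Sort the keys so each group covers one contiguous key range.
    TreeSet<byte[]> sorted = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    sorted.addAll(keys);
    List<byte[]> ordered = new ArrayList<byte[]>(sorted);

    List<Result> out = new ArrayList<Result>();
    for (int i = 0; i < ordered.size(); i += GROUP_SIZE) {
      int end = Math.min(i + GROUP_SIZE, ordered.size());
      TreeSet<byte[]> wanted = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
      wanted.addAll(ordered.subList(i, end));
      // Scan's stop row is exclusive; append a zero byte so the last
      // key in the group is included.
      Scan scan = new Scan(wanted.first(),
          Bytes.add(wanted.last(), new byte[] { 0 }));
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // Client-side "merge join": drop rows that fall between
          // requested keys.
          if (wanted.contains(r.getRow())) {
            out.add(r);
          }
        }
      } finally {
        scanner.close();
      }
    }
    return out;
  }
}

Note that each scan ships every row in the group's key range back to the
client and the filtering happens client-side, so this only beats
multi-get when the requested keys are dense within each group.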
