That makes sense. So is there a limit on how large the batch size can be? Say, if I pass all of my queries in one batch of 10K, would that cause problems?
On Fri, Jul 22, 2011 at 7:47 AM, Doug Meil <[email protected]> wrote:
>
> That method internally organizes the gets by RS, so it's pretty efficient.
> I think it processes the RS-groups serially in 0.90.x, and I thought I saw
> a ticket about multi-threaded processing, but you'll have to check the
> code.
>
>
> On 7/22/11 9:46 AM, "Nanheng Wu" <[email protected]> wrote:
>
>> Hi,
>>
>> I have a use case for my data stored in HBase where I need to make a
>> query for 20K-30K keys at once. I know that the HBase client API
>> supports a get operation with a list of "gets", so a naive
>> implementation would probably just make one or more batch get calls.
>> First of all, if I choose this implementation, how should I choose the
>> batch size? Can I put all the keys in a single batch? Secondly, is
>> there a better implementation that's more efficient? For instance, I
>> could sort the keys first and split them into groups of a certain
>> size, then for each group do a scan using the first and last key of
>> the group and filter out returned rows that are not in the group
>> (kind of like a merge join). Would the second implementation be faster
>> than the first? Are there better ways to go about it? I am using
>> HBase 0.20.6.
>>
>> Thanks!
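For what it's worth, the "split into bounded batches" idea from the thread can be sketched roughly like this. This is a hypothetical illustration, not code from the thread: in real code each batch would be turned into a `List<Get>` and passed to `HTable.get(List<Get>)`; here a plain `List<String>` stands in for the row keys so the chunking logic runs standalone, and `BATCH_SIZE` is an assumed tuning knob, not a documented HBase limit.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedGets {
    // Assumed cap per multi-get call; tune empirically for your cluster.
    static final int BATCH_SIZE = 1000;

    // Split `keys` into consecutive sublists of at most BATCH_SIZE entries;
    // each sublist would become one batch-get RPC in real client code.
    static List<List<String>> chunk(List<String> keys) {
        List<List<String>> batches = new ArrayList<List<String>>();
        for (int i = 0; i < keys.size(); i += BATCH_SIZE) {
            batches.add(keys.subList(i, Math.min(i + BATCH_SIZE, keys.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<String>();
        for (int i = 0; i < 25000; i++) {
            keys.add("row-" + i);
        }
        List<List<String>> batches = chunk(keys);
        System.out.println(batches.size());         // 25 batches of 1000 keys
        System.out.println(batches.get(24).size()); // last batch is full: 1000
        // In real code: for each batch, build a List<Get> from the keys and
        // call table.get(batch), collecting the returned Result[] arrays.
    }
}
```

Bounding the batch size this way keeps any single multi-get RPC from tying up a region server (or the client's memory) on one enormous request, while still amortizing the per-call overhead.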
