That method internally groups the gets by region server (RS), so it's pretty efficient. I think it processes the RS-groups serially in 0.90.x, and I thought I saw a ticket about multi-threaded processing, but you'll have to check the code.
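
If you do want to cap the batch size rather than send everything in one call, here is a minimal sketch of chunking the keys on the client side. The class name `KeyBatcher`, the method `partition`, and any particular batch size are my own placeholders, not part of the HBase API, and the HBase call itself is only mentioned in a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the HBase client API.
class KeyBatcher {
    // Splits keys into consecutive chunks of at most batchSize elements,
    // preserving the original order.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the sublist view so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        // Each batch would then be turned into a List<Get> and passed to the
        // client's batch get (assumed here to be HTable.get(List<Get>) in 0.90.x).
        return batches;
    }
}
```

What batch size works best is something you would have to measure against your own cluster; I don't have a number to recommend.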
On 7/22/11 9:46 AM, "Nanheng Wu" <[email protected]> wrote:

>Hi,
>
>  I have a use case for my data stored in HBase where I need to make
>a query for 20K-30K keys at once. I know that the HBase client API
>supports get operation with a list of "gets", so a naive
>implementation would probably just make one or more batch get calls.
>First of all I am wondering if I choose this implementation how should
>I choose the batch size? Can I put all the keys in a single batch?
>Secondly, is there a better implementation that's more efficient? For
>instance I can sort the keys first and split them into groups of a
>certain size, for each group do a scan using the first and last key of
>the group and filter out returned rows that are not in the group (kinda
>like a merge join). Would the second implementation be faster than the
>first? Are there better ways to go about it? I am using HBase 0.20.6.
>
>Thanks!
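
For reference, the sort-and-scan idea in the quoted message could be sketched roughly as below. This is only an illustration of the merge-join-style filtering, not HBase code: a `NavigableMap` stands in for the sorted table, `subMap` stands in for a scan from the first to the last key of a group, and the names `ScanFilterSketch` and `fetchGroup` are made up. It also compares keys as Java strings, whereas HBase orders row keys as raw bytes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical illustration; a NavigableMap plays the role of the sorted table.
class ScanFilterSketch {
    // Walks the range [first key, last key] once and keeps only the rows
    // whose key appears in sortedGroup. sortedGroup must already be sorted.
    static List<String> fetchGroup(NavigableMap<String, String> table,
                                   List<String> sortedGroup) {
        List<String> hits = new ArrayList<>();
        if (sortedGroup.isEmpty()) {
            return hits;
        }
        String first = sortedGroup.get(0);
        String last = sortedGroup.get(sortedGroup.size() - 1);
        // Stand-in for a scan bounded by the group's first and last key (inclusive).
        Iterator<Map.Entry<String, String>> rows =
                table.subMap(first, true, last, true).entrySet().iterator();
        int i = 0; // position in the sorted list of wanted keys
        while (rows.hasNext() && i < sortedGroup.size()) {
            Map.Entry<String, String> row = rows.next();
            // Skip wanted keys that sort before this row: they are not in the table.
            while (i < sortedGroup.size()
                    && sortedGroup.get(i).compareTo(row.getKey()) < 0) {
                i++;
            }
            // Keep the row only if it matches the current wanted key.
            if (i < sortedGroup.size() && sortedGroup.get(i).equals(row.getKey())) {
                hits.add(row.getValue());
                i++;
            }
        }
        return hits;
    }
}
```

Note that the scan reads every row between the first and last key, wanted or not, so how this compares with batch gets presumably depends on how densely the wanted keys cover that range.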
