Hi,

  I have a use case for my data stored in HBase where I need to
query 20K-30K keys at once. I know that the HBase client API supports
a get operation that takes a list of Gets, so a naive implementation
would just make one or more batch get calls. First of all, if I
choose this implementation, how should I choose the batch size? Can I
put all the keys in a single batch?
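
Here is a minimal sketch of what I have in mind for the batch-get
version; BATCH_SIZE is just a placeholder number, and I am assuming a
client that exposes a multi-get, i.e. Result[] get(List<Get>):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class ChunkedMultiGet {
  // Placeholder batch size: big enough to amortize RPC overhead,
  // small enough to keep each request and its response modest.
  private static final int BATCH_SIZE = 1000;

  public static List<Result> fetchAll(HTable table, List<byte[]> keys)
      throws IOException {
    List<Result> results = new ArrayList<Result>(keys.size());
    List<Get> batch = new ArrayList<Get>(BATCH_SIZE);
    for (byte[] key : keys) {
      batch.add(new Get(key));
      if (batch.size() == BATCH_SIZE) {
        // Assumes the client exposes a multi-get taking a list of Gets.
        results.addAll(Arrays.asList(table.get(batch)));
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      results.addAll(Arrays.asList(table.get(batch)));
    }
    return results;
  }
}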
Secondly, is there a better implementation that's more efficient? For
instance, I could sort the keys first and split them into groups of a
certain size, then for each group do a scan bounded by the first and
last key of the group and filter out returned rows that are not in
the group (kind of like a merge join). Would the second
implementation be faster than the first? Are there better ways to go
about it? I am using HBase 0.20.6.
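
And a rough sketch of the scan-based version; GROUP_SIZE is again a
placeholder, and the trailing 0x00 byte is there because a Scan's
stop row is exclusive:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class GroupedScanFetch {
  private static final int GROUP_SIZE = 500; // placeholder group size

  public static List<Result> fetchAll(HTable table, List<byte[]> keys)
      throws IOException {
    // Sort the keys in HBase's byte order so that each group covers
    // a contiguous key range.
    NavigableSet<byte[]> wanted = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    wanted.addAll(keys);
    List<byte[]> ordered = new ArrayList<byte[]>(wanted);

    List<Result> results = new ArrayList<Result>(keys.size());
    for (int i = 0; i < ordered.size(); i += GROUP_SIZE) {
      int end = Math.min(i + GROUP_SIZE, ordered.size());
      // Stop row is exclusive, so append 0x00 to include the last key.
      byte[] stopRow = Bytes.add(ordered.get(end - 1), new byte[] { 0 });
      Scan scan = new Scan(ordered.get(i), stopRow);
      scan.setCaching(GROUP_SIZE); // fewer client/server round trips

      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // Keep only the rows that were actually requested.
          if (wanted.contains(r.getRow())) {
            results.add(r);
          }
        }
      } finally {
        scanner.close();
      }
    }
    return results;
  }
}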

Thanks!
