RE: How to efficiently join HBase tables?

Buttler, David Thu, 16 Jun 2011 17:04:09 -0700

Depends on a couple of things.  If your LIST is a permanent feature of your 
documents, then it might make sense to store list membership in the doc record 
(as a Boolean, or as the list index if the list has a particular sort order).  
Otherwise, a little simple programming can get you the results you want:
1) Sort the list by DOCID (if it is big, then a map reduce job with an identity 
map / single identity reducer would do the job).  If you require the order of 
the list to be maintained, then you need to add another field to the list 
indicating order, so that you can recover that after the join.
2) Output the DOCS table as a list of DOCID / UUID pairs sorted on DOCID.
3) Use a double iterator through your two sorted outputs (a merge join) to find 
the UUIDs for the DOCIDs in the list (and optionally each one's order in the 
list).
4) Optionally re-sort the UUID list by the list order index.


This will not be particularly fast, but it should be robust to large list sizes.
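
The four steps above can be sketched in plain Python, with in-memory sorts 
standing in for the map reduce sort jobs.  This is only an illustration, not 
HBase or Hadoop code: the function name, the tuple layouts, and the assumption 
that each DOCID appears at most once in DOCS are mine, and the sample data is 
taken from the original question.

```python
def sort_merge_join(docs, wanted):
    """docs: iterable of (uuid, docid); wanted: DOCIDs in the desired order."""
    # Step 1: sort the list by DOCID, keeping each entry's original position.
    sorted_list = sorted((docid, pos) for pos, docid in enumerate(wanted))
    # Step 2: DOCID / UUID pairs sorted on DOCID.
    sorted_docs = sorted((docid, uuid) for uuid, docid in docs)

    # Step 3: walk both sorted sequences with two indexes (the double iterator).
    matches = []
    i = j = 0
    while i < len(sorted_list) and j < len(sorted_docs):
        list_id, pos = sorted_list[i]
        doc_id, uuid = sorted_docs[j]
        if list_id < doc_id:
            i += 1          # DOCID in the list has no matching document
        elif list_id > doc_id:
            j += 1          # document is not in the list
        else:
            matches.append((pos, uuid))
            i += 1          # assumes DOCIDs are unique in DOCS

    # Step 4: restore the original list order.
    matches.sort()
    return [uuid for _, uuid in matches]


DOCS = [("sdsd", 1), ("hdhs", 3), ("gdhg", 7), ("shdg", 9)]
LIST = [3, 1, 7]
merge_result = sort_merge_join(DOCS, LIST)
```

With the sample data from the question, merge_result comes back in list order: 
hdhs, sdsd, gdhg.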

If your list can fit into the memory of a map task, then load it into a hash 
map in each map task.  While you iterate over your DOCS table, output the UUID 
and sort order only for DOCIDs found in the hash map, and let your reducer 
reorder them according to your list order.
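
A rough sketch of that in-memory variant, again in plain Python rather than 
actual map reduce code.  The names map_phase and reduce_phase are made up for 
illustration, and the sketch assumes the list contains no duplicate DOCIDs (a 
duplicate would keep only its last position).

```python
def map_phase(docs, wanted):
    """Emit (list position, uuid) for every document whose DOCID is in the list."""
    # The hash map each map task would build from the list: DOCID -> position.
    position = {docid: pos for pos, docid in enumerate(wanted)}
    for uuid, docid in docs:
        if docid in position:
            yield position[docid], uuid


def reduce_phase(pairs):
    """Reorder the emitted UUIDs according to their position in the list."""
    return [uuid for _, uuid in sorted(pairs)]


docs_table = [("sdsd", 1), ("hdhs", 3), ("gdhg", 7), ("shdg", 9)]
doc_list = [3, 1, 7]
hash_result = reduce_phase(map_phase(docs_table, doc_list))
```

For the sample data in the question this also yields hdhs, sdsd, gdhg, with a 
single pass over the DOCS table and no sort of the table itself.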

Dave

-----Original Message-----
From: Florin P [mailto:[email protected]] 
Sent: Thursday, June 16, 2011 5:44 AM
To: [email protected]
Subject: Re: How to efficiently join HBase tables?

 Hello!
   Regarding the same subject of joining, I have the following scenario:
1. I have a big table DOCS that contains the columns
      UUID DOCID
      sdsd  1
      hdhs  3
      gdhg  7
      shdg  9
   
and so on (hope you got the idea)
2. An external list of DOCIDs (LIST):
   3
   1
   7 

 against which I have to query ("join") the DOCS DOCID column, so that the
result is hdhs, sdsd, gdhg. How can I implement such a request? Could this be a
possible solution:
1. Add a new column LIST (in the same column family) to DOCS.
2. Add a new record to it that contains my LIST of DOCIDs.
3. "Join" the LIST column with the DOCID column? (Perhaps a weird idea.)

Thank you.
 Regards,
 Florin
