multiple tables join with only one huge table.

Daniel,Wu Thu, 11 Aug 2011 19:02:27 -0700

If the retailer fact table sale_fact has 10B rows and is joined with 3 small 
tables, stores (10K), products (10K) and period (1K), what is the best join 
solution?


In Oracle, it can first build a hash table for stores, one for products, and 
one for period. It then probes them using the fact table: if a row matches in 
stores, it goes on to be checked against the products hash, and if that passes, 
it goes on to be matched against period. This way sale_fact only needs to be 
scanned once, which saves a lot of disk IO. Is this doable in Hive? If so, 
what hint do I need to use?
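
To make the plan I have in mind concrete, here is a minimal Python sketch of the single-pass probe described above. It is not Hive or Oracle code, only an illustration of the technique. The column names (store_id, product_id, period_id, city, product_name, quarter) and the sample rows are assumptions made up for the example.

```python
# Illustrative sketch only: column names and sample rows are hypothetical.

def star_join(fact_rows, stores, products, periods):
    """Join fact rows to three small tables while reading the fact rows once."""
    # Build phase: one in-memory hash table per small table, keyed by its id.
    store_by_id = {s["store_id"]: s for s in stores}
    product_by_id = {p["product_id"]: p for p in products}
    period_by_id = {d["period_id"]: d for d in periods}

    # Probe phase: a single pass over the (huge) fact table.
    for row in fact_rows:
        store = store_by_id.get(row["store_id"])
        if store is None:
            continue  # no matching store, drop the row early
        product = product_by_id.get(row["product_id"])
        if product is None:
            continue  # matched a store but not a product
        period = period_by_id.get(row["period_id"])
        if period is None:
            continue  # matched store and product but not a period
        # The row survived all three probes: emit the joined record.
        yield {**row, **store, **product, **period}


stores = [{"store_id": 1, "city": "Austin"}, {"store_id": 2, "city": "Boston"}]
products = [{"product_id": 10, "product_name": "pen"}]
periods = [{"period_id": 100, "quarter": "2011Q3"}]
sale_fact = [
    {"store_id": 1, "product_id": 10, "period_id": 100, "amount": 5.0},
    {"store_id": 2, "product_id": 11, "period_id": 100, "amount": 7.0},  # unknown product
    {"store_id": 3, "product_id": 10, "period_id": 100, "amount": 9.0},  # unknown store
]

joined = list(star_join(sale_fact, stores, products, periods))
```

Only the three small tables are held in memory, and each fact row is touched once, which is the property I am hoping to get from Hive.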
