How about using Hive on Spark? So your A is your fact table and the rest of your tables are dimensions.
20 million rows is not that big. Is your fact table partitioned and, more importantly, clustered by your dimension keys? For example:

CLUSTERED BY (prod_id, cust_id, time_id, channel_id, promo_id) INTO 256 BUCKETS

These are the keys that you use to create bitmap indexes on the FACT table, something like below on the sales table:

+-------------------+----------+------------+-----------------------------------------+----------+---------+
| idx_name          | tab_name | col_names  | idx_tab_name                            | idx_type | comment |
+-------------------+----------+------------+-----------------------------------------+----------+---------+
| sales_cust_bix    | sales    | cust_id    | oraclehadoop__sales_sales_cust_bix__    | bitmap   |         |
| sales_channel_bix | sales    | channel_id | oraclehadoop__sales_sales_channel_bix__ | bitmap   |         |
| sales_prod_bix    | sales    | prod_id    | oraclehadoop__sales_sales_prod_bix__    | bitmap   |         |
| sales_promo_bix   | sales    | promo_id   | oraclehadoop__sales_sales_promo_bix__   | bitmap   |         |
| sales_time_bix    | sales    | time_id    | oraclehadoop__sales_sales_time_bix__    | bitmap   |         |
+-------------------+----------+------------+-----------------------------------------+----------+---------+

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 15 March 2016 at 19:16, Gopal Vijayaraghavan <[email protected]> wrote:

> > I have a query where I am joining with 10 other entities
>
> Are you using Tez?
>
> This looks like an obvious candidate for a broadcast join.
>
> Cheers,
> Gopal
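
For reference, a minimal HiveQL sketch of the bucketed fact table and one of the bitmap indexes mentioned above. Only the five key columns, the 256 buckets and the index name sales_cust_bix come from this message; the column types, the measure columns (quantity_sold, amount_sold), the partition column (sale_year) and the ORC storage format are assumptions made for illustration. Note also that CREATE INDEX is only available in Hive releases before 3.0.

```sql
-- Hypothetical fact table: column types, measure columns, the partition
-- column and the storage format are assumptions, not taken from the thread.
CREATE TABLE sales (
  prod_id       BIGINT,
  cust_id       BIGINT,
  time_id       TIMESTAMP,
  channel_id    BIGINT,
  promo_id      BIGINT,
  quantity_sold DECIMAL(10,0),
  amount_sold   DECIMAL(10,2)
)
PARTITIONED BY (sale_year INT)
CLUSTERED BY (prod_id, cust_id, time_id, channel_id, promo_id) INTO 256 BUCKETS
STORED AS ORC;

-- One bitmap index per dimension key; repeat for the other four keys.
CREATE INDEX sales_cust_bix ON TABLE sales (cust_id)
  AS 'BITMAP' WITH DEFERRED REBUILD;

-- A deferred index stays empty until it is rebuilt.
ALTER INDEX sales_cust_bix ON sales REBUILD;

-- Lists the indexes in the tabular form shown above.
SHOW FORMATTED INDEX ON sales;
```

The same CREATE INDEX / ALTER INDEX pair would be repeated for channel_id, prod_id, promo_id and time_id to arrive at the five indexes in the listing.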
