Hey,

We noticed that the current skewed join handles skew in only one table and assumes that the second table isn't skewed. Please review this suggestion for a design that handles two skewed tables:
- Sample both tables.
- For each skewed key (one with many records in at least one table), build a surrogate key in a GFCross style. E.g. if for this key there are 3M records from the left table and 7M from the right table, and there are 100 reducers available, build a GFCross grid with dimensions of sqrt(100*3/7) and sqrt(100*7/3), i.e. roughly 6.5 by 15.3, so that the product of the two dimensions matches the 100 reducers.

What do you say? Is this a necessary enhancement request? Or is it safe to assume that only one table will be skewed in each join?

Thanks,
Dudu and Ido
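
To make the grid sizing in the proposal concrete, here is a rough Python sketch. It is not Pig code and does not use any Pig API; the function names (ideal_dims, grid_dims, left_surrogates, right_surrogates) are hypothetical, and the replication scheme is an assumption about what "GFCross style" would mean here: each left record picks one random row and is copied to every column, and each right record picks one random column and is copied to every row. Rounding the square-root dimensions to integers by minimizing per-reducer input is also an assumption, not part of the proposal.

```python
import math
import random


def ideal_dims(left_count, right_count, reducers):
    """Real-valued grid dimensions from the proposal: rows * cols == reducers."""
    rows = math.sqrt(reducers * left_count / right_count)
    cols = math.sqrt(reducers * right_count / left_count)
    return rows, cols


def grid_dims(left_count, right_count, reducers):
    """Integer (rows, cols) with rows * cols <= reducers.

    Assumption: we pick the grid that minimizes the input of a single
    reducer, which receives 1/rows of the left records and 1/cols of
    the right records for the skewed key.
    """
    best = None
    for rows in range(1, reducers + 1):
        cols = reducers // rows
        load = left_count / rows + right_count / cols
        if best is None or load < best[0]:
            best = (load, rows, cols)
    return best[1], best[2]


def left_surrogates(key, rows, cols, rng):
    """A left record goes to one random row, replicated across all columns."""
    row = rng.randrange(rows)
    return [(key, row, col) for col in range(cols)]


def right_surrogates(key, rows, cols, rng):
    """A right record goes to one random column, replicated across all rows."""
    col = rng.randrange(cols)
    return [(key, row, col) for row in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(ideal_dims(3_000_000, 7_000_000, 100))
    rows, cols = grid_dims(3_000_000, 7_000_000, 100)
    print(rows, cols)
    # Any left/right pair for the same key shares exactly one surrogate key,
    # so each joined pair is produced once.
    shared = set(left_surrogates("k", rows, cols, rng)) & set(
        right_surrogates("k", rows, cols, rng)
    )
    print(len(shared))
```

With the numbers from the example (3M left, 7M right, 100 reducers), this rounding settles on a 7 by 14 grid, which uses 98 of the 100 reducers. The smaller table gets the smaller dimension, so it is split less and replicated more.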
