Lianhui Wang created SPARK-5763:
-----------------------------------

             Summary: Sort-based Groupby and Join to resolve skewed data
                 Key: SPARK-5763
                 URL: https://issues.apache.org/jira/browse/SPARK-5763
             Project: Spark
          Issue Type: Improvement
            Reporter: Lianhui Wang

SPARK-4644 provides a way to handle skewed data. However, when many keys are skewed, I think the approach in SPARK-4644 is inappropriate. Instead, we can use sort-merge to handle skewed group-by and skewed join. Because SPARK-2926 implements merge-sort, we can build sort-merge for skewed data on top of SPARK-2926. I have implemented sort-merge group-by, and it works very well on skewed data in my tests. Later I will implement sort-merge join to handle skewed joins.
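
To illustrate the sort-based group-by idea in general terms, here is a minimal sketch in plain Python. It is not the Spark implementation and does not use the SPARK-2926 code; the function name `sort_based_group_by` and its parameters are hypothetical, and the in-memory `sorted` call only stands in for an external merge-sort. The point it shows is that once records are ordered by key, each key's values arrive as one contiguous run and can be folded one at a time, so a heavily skewed key never needs all of its values held in memory at once.

```python
from itertools import groupby
from operator import itemgetter


def sort_based_group_by(records, combine, zero):
    """Aggregate (key, value) pairs by key using sort-then-scan.

    Hypothetical helper for illustration only; `combine` folds one value
    into the running accumulator, and `zero` is the initial accumulator.
    """
    # Assumption: sorted() stands in for an external (spilling) merge-sort.
    ordered = sorted(records, key=itemgetter(0))
    # After sorting, equal keys are adjacent, so each key forms one run.
    for key, run in groupby(ordered, key=itemgetter(0)):
        acc = zero
        for _, value in run:
            # Only the accumulator is kept, not the whole run of values.
            acc = combine(acc, value)
        yield key, acc


# Key "a" is the skewed key in this toy input.
records = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("c", 5)]
result = list(sort_based_group_by(records, lambda x, y: x + y, 0))
```

A hash-based group-by would instead keep per-key state for every key in a map; the sort-based variant trades that for the cost of the sort and a single sequential pass.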