Chenzhao Guo created SPARK-22671:
------------------------------------

             Summary: SortMergeJoin read more data when wholeStageCodegen is off compared with when it is on
                 Key: SPARK-22671
                 URL: https://issues.apache.org/jira/browse/SPARK-22671
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Chenzhao Guo

In SortMergeJoin with wholeStageCodegen enabled, an optimization already exists: if the left side of a partition is empty, there is no need to read the right side of the corresponding partition. This helps when many partitions of the left table are empty and the right table is large.

The code path without wholeStageCodegen does not apply this optimization, so it still reads the right side of a partition whose left side is empty.
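
To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: the names CountingSource, sort_merge_inner_join and skip_right_if_left_empty are hypothetical, and the model assumes that touching the right side means reading it in full (here because the toy sorts it before merging). The flag stands in for the two code paths described above.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); hypothetical row shape


class CountingSource:
    """One partition of a table; counts how many rows are actually read."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.rows_read = 0

    def __iter__(self) -> Iterator[Row]:
        for row in self._rows:
            self.rows_read += 1
            yield row


def sort_merge_inner_join(
    left: CountingSource,
    right: CountingSource,
    skip_right_if_left_empty: bool,
) -> List[Tuple[int, str, str]]:
    """Toy inner sort-merge join over a single partition.

    skip_right_if_left_empty=True models the behaviour reported for the
    wholeStageCodegen path; False models the path without it.
    """
    left_rows = sorted(left)  # the left side is always read
    if skip_right_if_left_empty and not left_rows:
        return []  # nothing can match, so the right side is never touched

    right_rows = sorted(right)  # assumption: this reads the whole right side
    joined: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(left_rows) and j < len(right_rows):
        left_key, right_key = left_rows[i][0], right_rows[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right_rows) and right_rows[j_end][0] == left_key:
                j_end += 1
            while i < len(left_rows) and left_rows[i][0] == left_key:
                for _, right_payload in right_rows[j:j_end]:
                    joined.append((left_key, left_rows[i][1], right_payload))
                i += 1
            j = j_end
    return joined


if __name__ == "__main__":
    for skip in (True, False):
        empty_left = CountingSource([])
        big_right = CountingSource([(1, "a"), (2, "b"), (3, "c")])
        sort_merge_inner_join(empty_left, big_right, skip)
        print(f"skip={skip}: right rows read = {big_right.rows_read}")
```

With an empty left partition, both calls return no rows, but only the call with skip_right_if_left_empty=False reads the three right-side rows. In Spark itself, the two paths are selected with the spark.sql.codegen.wholeStage setting; the sketch does not use any Spark API.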