Yash Datta created SPARK-7097:
---------------------------------

             Summary: Partitioned tables should only consider referred partitions in query during size estimation for checking against autoBroadcastJoinThreshold
                 Key: SPARK-7097
                 URL: https://issues.apache.org/jira/browse/SPARK-7097
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0, 1.1.1
            Reporter: Yash Datta
             Fix For: 1.4.0

Currently, when deciding whether to create a HashJoin or a ShuffleHashJoin, the size estimation for a partitioned table uses the size of the entire table. As a result, many query plans use shuffle hash joins even though the actual query, because of its additional filters, may refer to only a small number of partitions, and so could be run as a map-side hash join. In such cases the query plan should consider the size of only the referred partitions.
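
To illustrate the proposal, here is a minimal Python sketch of the decision, not Spark's actual planner code. The names `Partition`, `estimated_size`, and `choose_join` are hypothetical, the partition sizes are made up for the example, and the `<=` comparison against the threshold is an assumption about how the check is performed. The threshold value is the default of `spark.sql.autoBroadcastJoinThreshold` (10 MB).

```python
from dataclasses import dataclass

# Default of spark.sql.autoBroadcastJoinThreshold, in bytes (10 MB).
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for one partition of a partitioned table."""
    spec: dict          # partition column values, e.g. {"day": "2015-04-01"}
    size_in_bytes: int  # on-disk size of this partition


def estimated_size(partitions, predicate=None):
    """Sum the sizes of the partitions the query refers to.

    With predicate=None every partition counts (the current behaviour
    described in the issue); with a predicate only matching partitions count.
    """
    return sum(
        p.size_in_bytes
        for p in partitions
        if predicate is None or predicate(p.spec)
    )


def choose_join(size_in_bytes, threshold=AUTO_BROADCAST_JOIN_THRESHOLD):
    """Pick the join strategy name from the size estimate (assumed <= check)."""
    return "HashJoin" if size_in_bytes <= threshold else "ShuffleHashJoin"


# Illustrative table: 30 daily partitions of 4 MiB each (made-up numbers).
table = [
    Partition({"day": f"2015-04-{d:02d}"}, 4 * 1024 * 1024)
    for d in range(1, 31)
]

# Whole-table estimate versus an estimate restricted to one referred partition.
whole_table_size = estimated_size(table)
pruned_size = estimated_size(table, lambda spec: spec["day"] == "2015-04-01")

print(choose_join(whole_table_size))
print(choose_join(pruned_size))
```

With these example numbers the whole-table estimate exceeds the threshold, while the estimate restricted to the single referred partition falls below it, so only the second estimate selects the map-side hash join.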