Yash Datta created SPARK-7097:
---------------------------------

             Summary: Partitioned tables should only consider referred 
partitions in query during size estimation for checking against 
autoBroadcastJoinThreshold
                 Key: SPARK-7097
                 URL: https://issues.apache.org/jira/browse/SPARK-7097
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.1, 1.3.0, 1.2.2, 1.2.1, 1.2.0, 1.1.1
            Reporter: Yash Datta
             Fix For: 1.4.0


Currently when deciding about whether to create HashJoin or ShuffleHashJoin, 
the size estimation of partitioned tables involved considers the size of entire 
table. This results in many query plans using shuffle hash joins , where infact 
only a small number of partitions may be being referred by the actual query 
(due to additional filters), and hence these could be run using Map side hash 
join.

The query plan should consider the size of only the referred partitions in such 
cases



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to