[ https://issues.apache.org/jira/browse/SPARK-11111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-11111:
------------------------------------
    Component/s: SQL

> Fast null-safe join
> -------------------
>
>                 Key: SPARK-11111
>                 URL: https://issues.apache.org/jira/browse/SPARK-11111
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>
> Today, null safe joins are executed with a Cartesian product.
> {code}
> scala> sqlContext.sql("select * from t a join t b on (a.i <=> b.i)").explain
> == Physical Plan ==
> TungstenProject [i#2,j#3,i#7,j#8]
>  Filter (i#2 <=> i#7)
>   CartesianProduct
>    LocalTableScan [i#2,j#3], [[1,1]]
>    LocalTableScan [i#7,j#8], [[1,1]]
> {code}
> One option is to add this rewrite to the optimizer:
> {code}
> select *
> from t a
> join t b
>   on coalesce(a.i, <default>) = coalesce(b.i, <default>) AND (a.i <=> b.i)
> {code}
> Acceptance criteria: joins with only null safe equality should not result in
> a Cartesian product.
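
To illustrate why the proposed rewrite keeps the original semantics, here is a minimal pure-Python sketch. It is not Spark code and not the optimizer rule itself. Rows are tuples whose first field is the join key `i`, `None` stands in for SQL NULL, and `DEFAULT = 0` is a hypothetical choice for the `<default>` placeholder in the issue. The function names are made up for this example. The point is that the `coalesce(...) = coalesce(...)` conjunct gives the planner an ordinary equality key to hash on, while the residual `a.i <=> b.i` conjunct discards pairs that only matched because NULL was mapped onto a real value.

```python
from collections import defaultdict

DEFAULT = 0  # hypothetical stand-in for <default>; the issue does not fix a value


def null_safe_eq(x, y):
    """Model SQL's <=>: NULL <=> NULL is true, NULL <=> value is false."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def coalesce(x, default):
    """Model SQL's two-argument coalesce."""
    return default if x is None else x


def cartesian_join(left, right):
    """Current plan: compare every pair, keep those passing the <=> filter."""
    return [(a, b) for a in left for b in right if null_safe_eq(a[0], b[0])]


def rewritten_join(left, right, default=DEFAULT):
    """Proposed plan: hash join on the coalesced key, then the <=> residual filter."""
    buckets = defaultdict(list)
    for b in right:
        buckets[coalesce(b[0], default)].append(b)

    out = []
    for a in left:
        # Only rows in the same bucket are candidates, so no full cross product.
        for b in buckets.get(coalesce(a[0], default), []):
            # The residual filter drops NULL-vs-DEFAULT collisions.
            if null_safe_eq(a[0], b[0]):
                out.append((a, b))
    return out


left = [(1, "a"), (None, "b"), (0, "c")]
right = [(1, "x"), (None, "y"), (0, "z"), (2, "w")]
same = set(cartesian_join(left, right)) == set(rewritten_join(left, right))
```

With this sample data the NULL row and the `0` row land in the same hash bucket, which is exactly the case the `AND (a.i <=> b.i)` conjunct exists to handle. Dropping that conjunct from the rewrite would pair `(None, "b")` with `(0, "z")`.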