[ https://issues.apache.org/jira/browse/SPARK-29176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Enrico Minack resolved SPARK-29176.
-----------------------------------
    Resolution: Not A Problem

> Optimization should change join type to CROSS
> ---------------------------------------------
>
>                 Key: SPARK-29176
>                 URL: https://issues.apache.org/jira/browse/SPARK-29176
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Enrico Minack
>            Priority: Major
>
> The following query is a valid join but gets optimized into a Cartesian join of {{INNER}} type:
> {code:java}
> // In spark-shell these two imports are already in scope.
> import org.apache.spark.sql.functions.lit
> import spark.implicits._
> case class Value(id: Int, lower: String, upper: String)
> val values = Seq(Value(1, "one", "ONE")).toDS
> val join = values.join(values.withColumn("id", lit(1)), "id")
> {code}
> Catalyst optimizes this query into an inner join without any conditions. The {{CheckCartesianProducts}} rule then throws an exception.
> The following rules are involved:
> The {{FoldablePropagation}} rule propagates the constant {{lit(1)}} into the join condition, making it trivial:
> {code:java}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation ===
>  Project [id#3, lower#4, upper#5, lower#12, upper#13]   Project [id#3, lower#4, upper#5, lower#12, upper#13]
> !+- Join Inner, (id#3 = id#7)                           +- Join Inner, (id#3 = 1)
>     :- LocalRelation [id#3, lower#4, upper#5]              :- LocalRelation [id#3, lower#4, upper#5]
>     +- Project [1 AS id#7, lower#12, upper#13]             +- Project [1 AS id#7, lower#12, upper#13]
>        +- LocalRelation [id#11, lower#12, upper#13]           +- LocalRelation [id#11, lower#12, upper#13]
> {code}
> The {{PushPredicateThroughJoin}} rule pushes this join condition into the other branch of the query tree as a filter, removing the only join condition:
> {code:java}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
>  Project [id#3, lower#4, upper#5, lower#12, upper#13]   Project [id#3, lower#4, upper#5, lower#12, upper#13]
> !+- Join Inner, (id#3 = 1)                              +- Join Inner
> !   :- LocalRelation [id#3, lower#4, upper#5]              :- Filter (id#3 = 1)
> !   +- Project [1 AS id#7, lower#12, upper#13]             :  +- LocalRelation [id#3, lower#4, upper#5]
> !      +- LocalRelation [id#11, lower#12, upper#13]        +- Project [1 AS id#7, lower#12, upper#13]
> !                                                             +- LocalRelation [id#11, lower#12, upper#13]
> {code}
> The first rule reduces the join to a trivial join; the second rule removes the condition altogether. On both plans the optimizer throws an exception: {{Join condition is missing or trivial.}}
> A valid join that becomes trivial or Cartesian during optimization should have its type changed to {{CROSS}} rather than keep its original join type. It is hard for a user to see from the original query that it represents a Cartesian join, since a join condition is given. And setting {{spark.sql.crossJoin.enabled=true}} may not be desirable in a complex production environment.
> Do you agree that those optimization rules (and potentially others) need to alter the join type if the condition becomes trivial or is removed completely?
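
The proposal in the issue (relabel a join as {{CROSS}} once optimization has removed its condition) can be illustrated with a toy model. This is only a sketch under stated assumptions: the types {{Plan}}, {{Relation}}, {{Filter}} and {{Join}} and the functions {{markConditionlessJoins}} and {{hasImplicitCartesian}} are hypothetical and are not Catalyst's classes or rules; {{hasImplicitCartesian}} is a much simplified stand-in for the kind of check that {{CheckCartesianProducts}} performs.

```scala
// Toy model only: every type and function here is hypothetical, not Catalyst code.
object JoinTypeSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object Cross extends JoinType

  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  // condition = None models a join whose only condition was removed by an earlier rule.
  case class Join(left: Plan, right: Plan, joinType: JoinType,
                  condition: Option[String]) extends Plan

  // Hypothetical rule: an inner join left without a condition is relabelled as Cross.
  def markConditionlessJoins(plan: Plan): Plan = plan match {
    case Join(l, r, Inner, None) =>
      Join(markConditionlessJoins(l), markConditionlessJoins(r), Cross, None)
    case Join(l, r, t, c) =>
      Join(markConditionlessJoins(l), markConditionlessJoins(r), t, c)
    case Filter(c, child) => Filter(c, markConditionlessJoins(child))
    case r: Relation      => r
  }

  // Simplified stand-in for the check: flag inner joins that have no condition.
  def hasImplicitCartesian(plan: Plan): Boolean = plan match {
    case Join(_, _, Inner, None) => true
    case Join(l, r, _, _)        => hasImplicitCartesian(l) || hasImplicitCartesian(r)
    case Filter(_, child)        => hasImplicitCartesian(child)
    case _: Relation             => false
  }

  def main(args: Array[String]): Unit = {
    // Shape of the plan after the predicate was pushed below the join.
    val afterPushdown =
      Join(Filter("id = 1", Relation("left")), Relation("right"), Inner, None)
    println(hasImplicitCartesian(afterPushdown))                         // prints true
    println(hasImplicitCartesian(markConditionlessJoins(afterPushdown))) // prints false
  }
}
```

In this model the relabelling rule would have to run after any rule that can strip a join condition, which is why the issue asks whether the condition-removing rules themselves should adjust the join type.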