wangyum commented on a change in pull request #26257: [SPARK-29606][SQL]
Improve EliminateOuterJoin performance
URL: https://github.com/apache/spark/pull/26257#discussion_r347478495
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
##########
@@ -176,10 +176,16 @@ abstract class UnaryNode extends LogicalPlan {
allConstraints += EqualNullSafe(a.toAttribute, l)
case a @ Alias(e, _) =>
// For every alias in `projectList`, replace the reference in constraints by its attribute.
- allConstraints ++= allConstraints.map(_ transform {
- case expr: Expression if expr.semanticEquals(e) =>
- a.toAttribute
- })
+ allConstraints ++= allConstraints.map {
+ case e @ EqualNullSafe(l, _: AttributeReference)
+ if !l.isInstanceOf[AttributeReference] => e
Review comment:
For example:
```scala
import org.apache.spark.sql.catalyst.plans.logical.Project
spark.sql("CREATE TABLE IF NOT EXISTS spark_29606(a int, b int, c int) USING parquet")
spark.sql("SELECT a as a1, b as b1, c as c1, abc as abc1 FROM (SELECT a, b, c, a + b + c as abc FROM spark_29606) t")
.queryExecution.analyzed.asInstanceOf[Project].validConstraints.toSeq.sortBy(_.toString).foreach(println)
```
Here `child.constraints` contains the constraint `(((a#5 + b#6) + c#7) <=> abc#0)`.

**Before this PR**: for every alias, we replace the aliased reference in each
existing constraint with the alias attribute and add the result as a new
constraint. This generates a lot of constraints, and they seem to be useless.

**After this PR**: we avoid generating these useless constraints.
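
To make the growth concrete without a Spark session, here is a minimal sketch of the alias substitution on a toy expression tree. Everything in it is an assumption for illustration: `Attr`, `Add`, `EqNullSafe`, `transform` and `aliased` are hypothetical stand-ins for Catalyst's classes, plain `==` stands in for `semanticEquals`, and the `skipDerived` branch only mirrors the single `case` visible in the diff above, not necessarily the final code of this PR.

```scala
object ConstraintBlowup {
  // Toy stand-ins for Catalyst expressions (hypothetical, not Spark classes).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class EqNullSafe(left: Expr, right: Expr) extends Expr

  // Pre-order rewrite: apply the rule to the node, then recurse into the children of the result.
  def transform(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr =
    rule.applyOrElse(e, (x: Expr) => x) match {
      case Add(l, r)        => Add(transform(l)(rule), transform(r)(rule))
      case EqNullSafe(l, r) => EqNullSafe(transform(l)(rule), transform(r)(rule))
      case leaf             => leaf
    }

  // For each alias (expression -> alias attribute), rewrite the constraints collected so far
  // and add `expression <=> alias`. With skipDerived = true, constraints shaped like
  // `nonAttribute <=> attribute` are kept as-is instead of being rewritten.
  def aliased(child: Set[Expr], aliases: Seq[(Expr, Attr)], skipDerived: Boolean): Set[Expr] = {
    var all = child
    aliases.foreach { case (e, attr) =>
      all ++= all.map {
        case c @ EqNullSafe(l, _: Attr) if skipDerived && !l.isInstanceOf[Attr] => c
        case c => transform(c) { case x if x == e => attr }
      }
      all += EqNullSafe(e, attr)
    }
    all -- child
  }

  val (a, b, c, abc) = (Attr("a"), Attr("b"), Attr("c"), Attr("abc"))

  // The single child constraint from the example: ((a + b) + c) <=> abc
  val child: Set[Expr] = Set(EqNullSafe(Add(Add(a, b), c), abc))

  // The outer projection: a as a1, b as b1, c as c1, abc as abc1
  val aliases: Seq[(Expr, Attr)] =
    Seq(a -> Attr("a1"), b -> Attr("b1"), c -> Attr("c1"), abc -> Attr("abc1"))

  def main(args: Array[String]): Unit = {
    println(aliased(child, aliases, skipDerived = false).size) // rewrite every constraint
    println(aliased(child, aliases, skipDerived = true).size)  // skip `expr <=> attr` constraints
  }
}
```

In this toy model the first call yields 19 derived constraints (every combination of original and aliased names in the sum, plus the four `x <=> x1` equalities) and the second yields 4, because each alias roughly doubles the number of constraints that mention it.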
