wangyum commented on a change in pull request #26257: [SPARK-29606][SQL] Improve EliminateOuterJoin performance
URL: https://github.com/apache/spark/pull/26257#discussion_r347478495
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ##########
 @@ -176,10 +176,16 @@ abstract class UnaryNode extends LogicalPlan {
         allConstraints += EqualNullSafe(a.toAttribute, l)
       case a @ Alias(e, _) =>
         // For every alias in `projectList`, replace the reference in constraints by its attribute.
-        allConstraints ++= allConstraints.map(_ transform {
-          case expr: Expression if expr.semanticEquals(e) =>
-            a.toAttribute
-        })
+        allConstraints ++= allConstraints.map {
+          case e @ EqualNullSafe(l, _: AttributeReference)
+            if !l.isInstanceOf[AttributeReference] => e
 
 Review comment:
   For example:
   ```scala
   import org.apache.spark.sql.catalyst.plans.logical.Project
   spark.sql("CREATE TABLE IF NOT EXISTS spark_29606(a int, b int, c int) USING parquet")
   spark.sql("SELECT a as a1, b as b1, c as c1, abc as abc1 FROM (SELECT a, b, c, a + b + c as abc FROM spark_29606) t")
     .queryExecution.analyzed.asInstanceOf[Project].validConstraints.toSeq.sortBy(_.toString).foreach(println)
   ```
   `child.constraints` contains one constraint: `(((a#5 + b#6) + c#7) <=> abc#0)`:
   
   
   
![image](https://user-images.githubusercontent.com/5399861/69068511-dc27fb00-0a5f-11ea-98ab-dd0c52c27529.png)
   
   
   **Before this PR**: For every alias, we replace the matching references in the constraints with the alias's attribute. This generates a lot of constraints, and they seem to be useless:
   
![image](https://user-images.githubusercontent.com/5399861/69068611-0a0d3f80-0a60-11ea-98ca-4b23839c29cf.png)
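
   To make the growth concrete without a Spark session, here is a minimal toy model of the substitution loop. It is only a sketch: `Expr`, `Attr`, `Add`, `EqNullSafe`, `substitute` and `aliasedConstraints` are hypothetical stand-ins, not Catalyst classes, and it assumes that each alias also contributes its own `expr <=> alias` constraint and that the child's constraints are subtracted at the end (neither step is visible in the hunk above).

   ```scala
   // Toy expression tree; hypothetical stand-ins for Catalyst expressions.
   sealed trait Expr
   case class Attr(name: String) extends Expr
   case class Add(left: Expr, right: Expr) extends Expr
   case class EqNullSafe(left: Expr, right: Expr) extends Expr

   object ConstraintGrowth {
     // Replace every subtree equal to `from` with `to`.
     def substitute(e: Expr, from: Expr, to: Expr): Expr =
       if (e == from) to
       else e match {
         case Add(l, r) => Add(substitute(l, from, to), substitute(r, from, to))
         case EqNullSafe(l, r) =>
           EqNullSafe(substitute(l, from, to), substitute(r, from, to))
         case a: Attr => a
       }

     // Models the pre-PR loop: for each alias, add a rewritten copy of every
     // constraint collected so far.
     def aliasedConstraints(child: Set[Expr], aliases: Seq[(Expr, Attr)]): Set[Expr] = {
       var all = child
       aliases.foreach { case (e, alias) =>
         all ++= all.map(substitute(_, e, alias))
         all += EqNullSafe(e, alias) // assumption: the alias's own constraint
       }
       all -- child // assumption: only newly derived constraints are returned
     }

     // The single child constraint from the example: ((a + b) + c) <=> abc
     val child: Set[Expr] =
       Set(EqNullSafe(Add(Add(Attr("a"), Attr("b")), Attr("c")), Attr("abc")))

     // The outer projection: a AS a1, b AS b1, c AS c1, abc AS abc1
     val aliases: Seq[(Expr, Attr)] = Seq(
       Attr("a") -> Attr("a1"),
       Attr("b") -> Attr("b1"),
       Attr("c") -> Attr("c1"),
       Attr("abc") -> Attr("abc1"))

     def main(args: Array[String]): Unit =
       aliasedConstraints(child, aliases).toSeq.map(_.toString).sorted.foreach(println)
   }
   ```

   In this toy model the four aliases turn the single child constraint into 19 derived ones: every mix of original and aliased names in the sum (16 variants, minus the original) plus the four `x <=> x1` pairs. The count in real Catalyst will differ, but the doubling per alias is the same effect.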
   
   
   **After this PR**: We avoid generating these useless constraints:
   
![image](https://user-images.githubusercontent.com/5399861/69069075-d0890400-0a60-11ea-80e5-e54d162fb702.png)
   
   
   
   
