viirya commented on a change in pull request #29812:
URL: https://github.com/apache/spark/pull/29812#discussion_r493111576



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/WithFields.scala
##########
@@ -17,16 +17,29 @@
 
 package org.apache.spark.sql.catalyst.optimizer
 
-import org.apache.spark.sql.catalyst.expressions.WithFields
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.expressions.{Expression, GetStructField, WithFields}
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule
 
 
 /**
- * Combines all adjacent [[WithFields]] expression into a single [[WithFields]] expression.
+ * Optimizes [[WithFields]] expression chains.
  */
-object CombineWithFields extends Rule[LogicalPlan] {
+object OptimizeWithFields extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
+    case WithFields(structExpr, names, values) if names.distinct.length != names.length =>
+      val newNames = mutable.ArrayBuffer.empty[String]
+      val newValues = mutable.ArrayBuffer.empty[Expression]
+      names.zip(values).reverse.foreach { case (name, value) =>
+        if (!newNames.contains(name)) {
+          newNames += name
+          newValues += value
+        }
+      }
+      WithFields(structExpr, names = newNames.reverse.toSeq, valExprs = newValues.reverse.toSeq)

Review comment:
       Actually, I'd like to run these rules early, in the analysis stage, to 
simplify the `WithFields` tree. After #29587, I think it is very easy to write 
a bad `WithFields` tree. Once you hit that, it is very hard to debug, and the 
analyzer/optimizer spends a lot of time traversing the expression tree. So I 
think it is very useful to keep this rule to simplify the expression tree, but 
I don't think we want to do `ReplaceWithFieldsExpression` in the analysis stage.
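
For reference, here is a minimal standalone sketch of the deduplication in the hunk above, written against plain Scala collections rather than Catalyst expressions. `DedupSketch` and `dedupKeepLast` are hypothetical names used only for illustration; they are not part of Spark. Like the hunk, it compares names with `contains`, so case-sensitive matching is assumed.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of Spark: mirrors the dedup loop in the hunk.
object DedupSketch {
  // For each duplicated name, keep only the value that was written last.
  def dedupKeepLast[A](names: Seq[String], values: Seq[A]): (Seq[String], Seq[A]) = {
    val newNames = mutable.ArrayBuffer.empty[String]
    val newValues = mutable.ArrayBuffer.empty[A]
    // Walk from the end so the last write to each name is seen first.
    names.zip(values).reverse.foreach { case (name, value) =>
      if (!newNames.contains(name)) {
        newNames += name
        newValues += value
      }
    }
    // Restore the original relative order of the surviving entries.
    (newNames.reverse.toSeq, newValues.reverse.toSeq)
  }
}
```

With names `a, b, a` and values `1, 2, 3`, this yields names `b, a` and values `2, 3`: the surviving `a` sits at the position of its last occurrence, not its first.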






