cloud-fan commented on a change in pull request #34883:
URL: https://github.com/apache/spark/pull/34883#discussion_r767824415



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
##########
@@ -18,48 +18,17 @@
 package org.apache.spark.sql.catalyst.expressions
 
 /**
- * Rewrites an expression using rules that are guaranteed preserve the result 
while attempting
- * to remove cosmetic variations. Deterministic expressions that are `equal` 
after canonicalization
- * will always return the same answer given the same input (i.e. false 
positives should not be
- * possible). However, it is possible that two canonical expressions that are 
not equal will in fact
- * return the same answer given any input (i.e. false negatives are possible).
- *
- * The following rules are applied:
- *  - Names and nullability hints for [[org.apache.spark.sql.types.DataType]]s 
are stripped.
- *  - Names for [[GetStructField]] are stripped.
- *  - TimeZoneId for [[Cast]] and [[AnsiCast]] are stripped if `needsTimeZone` 
is false.
- *  - Commutative and associative operations ([[Add]] and [[Multiply]]) have 
their children ordered
- *    by `hashCode`.
- *  - [[EqualTo]] and [[EqualNullSafe]] are reordered by `hashCode`.
- *  - Other comparisons ([[GreaterThan]], [[LessThan]]) are reversed by 
`hashCode`.
- *  - Elements in [[In]] are reordered by `hashCode`.
+ * Reorders adjacent commutative operators such as [[And]] in the expression 
tree, according to
+ * the `hasCode` of leaf nodes, to remove cosmetic variations. Caller side 
should only call it on
+ * the root node of an expression tree that needs to be canonicalized.
  */
 object Canonicalize {
-  def execute(e: Expression): Expression = {
-    expressionReorder(ignoreTimeZone(ignoreNamesTypes(e)))
-  }
-
-  /** Remove names and nullability from types, and names from 
`GetStructField`. */
-  private[expressions] def ignoreNamesTypes(e: Expression): Expression = e 
match {
-    case a: AttributeReference =>
-      AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
-    case GetStructField(child, ordinal, Some(_)) => GetStructField(child, 
ordinal, None)
-    case _ => e
-  }
-
-  /** Remove TimeZoneId for Cast if needsTimeZone return false. */
-  private[expressions] def ignoreTimeZone(e: Expression): Expression = e match 
{
-    case c: CastBase if c.timeZoneId.nonEmpty && !c.needsTimeZone =>
-      c.withTimeZone(null)
-    case _ => e
-  }
-
   /** Collects adjacent commutative operations. */
   private def gatherCommutative(
       e: Expression,
       f: PartialFunction[Expression, Seq[Expression]]): Seq[Expression] = e 
match {
     case c if f.isDefinedAt(c) => f(c).flatMap(gatherCommutative(_, f))
-    case other => other :: Nil
+    case other => reorderCommutativeOperators(other) :: Nil

Review comment:
       Since we call `Canonicalize` only once now, we need to make it truly 
recursive.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to