[GitHub] [spark] viirya commented on a change in pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

GitBox Thu, 20 May 2021 15:24:39 -0700


viirya commented on a change in pull request #32586:
URL: https://github.com/apache/spark/pull/32586#discussion_r636514764




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -167,7 +167,28 @@ class EquivalentExpressions {
    * Returns all the equivalent sets of expressions.
    */
   def getAllEquivalentExprs: Seq[Seq[Expression]] = {
-    equivalenceMap.values.map(_.toSeq).toSeq
+    equivalenceMap.values.map(_.toSeq).toSeq.sortBy(_.head)(new ExpressionContainmentOrdering)
+  }
+
+  /**
+   * Orders [Expression] by parent/child relations. The child expression is smaller
+   * than parent expression. If there is child-parent relationships among the subexpressions,
+   * we want the child expressions come first than parent expressions, so we can replace
+   * child expressions in parent expressions with subexpression evaluation. Note that
+   * this is not for general expression ordering. For example, two irrelevant expressions
+   * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the usage here,
+   * the order of irrelevant expressions does not matter.
+   */
+  class ExpressionContainmentOrdering extends Ordering[Expression] {
+    override def compare(x: Expression, y: Expression): Int = {
+      if (x.semanticEquals(y)) {
+        0
+      } else if (x.find(_.semanticEquals(y)).isDefined) {

Review comment:
BTW, I think a better approach is to sort after filtering (e.g. size > 1 in
most use cases), because the number of sub-exprs left to sort should be smaller.
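
A rough sketch of what "sort after filter" could look like, using a toy expression tree rather than Catalyst. Everything here (Expr, Leaf, Add, Mul, ContainmentOrdering, SortAfterFilterSketch) is a hypothetical stand-in, not Spark's API; the ordering only mirrors the idea in the diff, with plain equality in place of semanticEquals.

```scala
// Toy stand-in for Catalyst's Expression (hypothetical, for illustration only).
sealed trait Expr {
  def children: Seq[Expr]
  // True if this node or any of its descendants equals `target`.
  def contains(target: Expr): Boolean =
    this == target || children.exists(_.contains(target))
}
case class Leaf(name: String) extends Expr { def children: Seq[Expr] = Nil }
case class Add(left: Expr, right: Expr) extends Expr { def children: Seq[Expr] = Seq(left, right) }
case class Mul(left: Expr, right: Expr) extends Expr { def children: Seq[Expr] = Seq(left, right) }

// Assumed shape of the containment ordering: a parent compares greater than
// an expression it contains. Like the ordering in the diff, this is not a
// general-purpose ordering; unrelated expressions both compare as "smaller".
object ContainmentOrdering extends Ordering[Expr] {
  def compare(x: Expr, y: Expr): Int =
    if (x == y) 0
    else if (x.contains(y)) 1
    else -1
}

object SortAfterFilterSketch {
  val a: Expr = Leaf("a")
  val b: Expr = Leaf("b")
  val c: Expr = Leaf("c")
  val sum: Expr = Add(a, b)
  val product: Expr = Mul(sum, c)

  // Equivalence sets as getAllEquivalentExprs might return them, unsorted.
  val groups: Seq[Seq[Expr]] =
    Seq(Seq(product, product), Seq(a), Seq(sum, sum), Seq(c))

  // Filter first: only sets with more than one member are candidates.
  val candidates: Seq[Seq[Expr]] = groups.filter(_.size > 1)

  // Then sort the (smaller) candidate list so that children come before parents.
  val ordered: Seq[Seq[Expr]] = candidates.sortBy(_.head)(ContainmentOrdering)
}
```

In this sketch the ordering is applied to two sets instead of four, and the set for a + b ends up ahead of the set for (a + b) * c.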




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

