Kimahriman commented on a change in pull request #30245:
URL: https://github.com/apache/spark/pull/30245#discussion_r661470618



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -65,11 +65,82 @@ class EquivalentExpressions {
     }
   }
 
+  private def addExprToSet(expr: Expression, set: mutable.Set[Expr]): Boolean = {
+    if (expr.deterministic) {
+      val e = Expr(expr)
+      if (set.contains(e)) {
+        true
+      } else {
+        set.add(e)
+        false
+      }
+    } else {
+      false
+    }
+  }
+
+  /**
+   * Adds only the expressions that are common to all of the given expressions, recursively.
+   * For example, given two expressions `(a + (b + (c + 1)))` and `(d + (e + (c + 1)))`,
+   * the common expression `(c + 1)` will be added into `equivalenceMap`.
+   */
+  private def addCommonExprs(
+      exprs: Seq[Expression],
+      addFunc: Expression => Boolean = addExpr): Unit = {
+    val exprSetForAll = mutable.Set[Expr]()
+    addExprTree(exprs.head, addExprToSet(_, exprSetForAll))
+
+    val commonExprSet = exprs.tail.foldLeft(exprSetForAll) { (exprSet, expr) =>
+      val otherExprSet = mutable.Set[Expr]()
+      addExprTree(expr, addExprToSet(_, otherExprSet))
+      exprSet.intersect(otherExprSet)
+    }
+
+    commonExprSet.foreach(expr => addFunc(expr.e))
+  }
+
+  // There are some special expressions for which we should not recurse into all of their children.
+  //   1. CodegenFallback: its children will not be used to generate code (eval() is called instead)
+  //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
+  //          false expressions in `If` may not get accessed, depending on the predicate
+  //          expression. We should only recurse into the predicate expression.
+  //   3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed under a certain
+  //                condition. We should only recurse into the first condition expression, as it
+  //                will always get accessed.
+  //   4. Coalesce: it is also a conditional expression; we should only recurse into the first
+  //                child, because the others may not get accessed.
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
+    case _: CodegenFallback => Nil
+    case i: If => i.predicate :: Nil
+    case c: CaseWhen => c.children.head :: Nil
+    case c: Coalesce => c.children.head :: Nil
+    case other => other.children
+  }
+
+  // For some special expressions we cannot just recurse into all of their children, but we can
+  // recursively add the common expressions shared between all of their children.
+  private def commonChildrenToRecurse(expr: Expression): Seq[Seq[Expression]] = expr match {
+    case i: If => Seq(Seq(i.trueValue, i.falseValue))
+    case c: CaseWhen =>
+      // We look at the subexpressions in the conditions and the values of `CaseWhen`
+      // separately, because a subexpression shared among the conditions will be evaluated no
+      // matter which condition is matched, but it does not need to be shared with the values.
+      // Similarly, a subexpression shared among the values does not need to appear in the
+      // conditions, because no matter which condition is true, it will be evaluated.
+      val conditions = c.branches.tail.map(_._1)

Review comment:
       FWIW, I also addressed this issue in https://github.com/apache/spark/pull/32987, which assumed `CaseWhen` (and `Coalesce`) should short-circuit and guard later conditions. The main benefit/difference is that if you have
   
   `CaseWhen(cond1, ..., cond1, ..., cond2, ...)`, then `cond1` gets pulled out as a subexpression, when I think it otherwise wouldn't, even with https://github.com/apache/spark/pull/33142.
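
   A Spark-free sketch may help illustrate the set-intersection idea behind `addCommonExprs` in the hunk above: collect the subtrees of each expression, then keep only those that appear in every one. The `Expr` ADT and helpers here are toy stand-ins, not Spark's `Expression` hierarchy, and the sketch ignores the determinism check and the `Expr` equality wrapper the real code uses.
   
   ```scala
   // Toy expression tree (not Spark's Expression hierarchy).
   sealed trait Expr
   case class Leaf(name: String) extends Expr
   case class Add(l: Expr, r: Expr) extends Expr
   
   // All subtrees of an expression, including the expression itself.
   def collectSubtrees(e: Expr): Set[Expr] = e match {
     case Add(l, r) => Set(e) ++ collectSubtrees(l) ++ collectSubtrees(r)
     case leaf      => Set(leaf)
   }
   
   // Subtrees common to every expression, mirroring the foldLeft/intersect
   // in addCommonExprs.
   def commonSubtrees(exprs: Seq[Expr]): Set[Expr] =
     exprs.map(collectSubtrees).reduce(_ intersect _)
   
   // (a + (b + (c + 1))) and (d + (e + (c + 1))) share the subtree (c + 1)
   // (plus its leaves c and 1, which the real code would skip as trivial).
   val cPlus1 = Add(Leaf("c"), Leaf("1"))
   val e1 = Add(Leaf("a"), Add(Leaf("b"), cPlus1))
   val e2 = Add(Leaf("d"), Add(Leaf("e"), cPlus1))
   val common = commonSubtrees(Seq(e1, e2))
   ```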




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


