[GitHub] [spark] viirya commented on a change in pull request #32559: [SPARK-35410][SQL] SubExpr elimination should not include redundant children exprs in conditional expression

GitBox Mon, 17 May 2021 11:21:36 -0700


viirya commented on a change in pull request #32559:
URL: https://github.com/apache/spark/pull/32559#discussion_r633219605




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -90,13 +90,20 @@ class EquivalentExpressions {
     val exprSetForAll = mutable.Set[Expr]()
     addExprTree(exprs.head, addExprToSet(_, exprSetForAll))
 
-    val commonExprSet = exprs.tail.foldLeft(exprSetForAll) { (exprSet, expr) =>
+    val candidateExprs = exprs.tail.foldLeft(exprSetForAll) { (exprSet, expr) =>
       val otherExprSet = mutable.Set[Expr]()
       addExprTree(expr, addExprToSet(_, otherExprSet))
       exprSet.intersect(otherExprSet)
     }
 
-    commonExprSet.foreach(expr => addFunc(expr.e))
+    // Not all expressions in the set should be added. We should filter out the subexprs.

Review comment:
       Yeah, I revised the method comment. Thanks.

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -82,21 +82,31 @@ class EquivalentExpressions {
   /**
    * Adds only expressions which are common in each of given expressions, in a recursive way.
    * For example, given two expressions `(a + (b + (c + 1)))` and `(d + (e + (c + 1)))`,
-   * the common expression `(c + 1)` will be added into `equivalenceMap`.
+   * the common expression `(c + 1)` will be added into `equivalenceMap`. Note that if an
+   * expression and its child expressions are all commonly occurred in each of given expressions,
+   * we filter out the child expressions. For example, if `((a + b) + c)` and `(a + b)` are
+   * common expressions, we only add `((a + b) + c)`.
    */
   private def addCommonExprs(
       exprs: Seq[Expression],
       addFunc: Expression => Boolean = addExpr): Unit = {
     val exprSetForAll = mutable.Set[Expr]()
     addExprTree(exprs.head, addExprToSet(_, exprSetForAll))
 
-    val commonExprSet = exprs.tail.foldLeft(exprSetForAll) { (exprSet, expr) =>
+    val candidateExprs = exprs.tail.foldLeft(exprSetForAll) { (exprSet, expr) =>
       val otherExprSet = mutable.Set[Expr]()
       addExprTree(expr, addExprToSet(_, otherExprSet))
       exprSet.intersect(otherExprSet)
     }
 
-    commonExprSet.foreach(expr => addFunc(expr.e))
+    // Not all expressions in the set should be added. We should filter out the subexprs.
+    val commonExprSet = candidateExprs.filter { candidateExpr =>
+      candidateExprs.forall { expr =>
+        expr == candidateExpr || expr.e.find(_.semanticEquals(candidateExpr.e)).isEmpty
+      }

Review comment:
       Yeah, I considered this part but couldn't come up with a better one.

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -82,21 +82,31 @@ class EquivalentExpressions {
   /**
    * Adds only expressions which are common in each of given expressions, in a recursive way.
    * For example, given two expressions `(a + (b + (c + 1)))` and `(d + (e + (c + 1)))`,
-   * the common expression `(c + 1)` will be added into `equivalenceMap`.
+   * the common expression `(c + 1)` will be added into `equivalenceMap`. Note that if an
+   * expression and its child expressions are all commonly occurred in each of given expressions,
+   * we filter out the child expressions. For example, if `((a + b) + c)` and `(a + b)` are
+   * common expressions, we only add `((a + b) + c)`.

Review comment:
       The so-called common expressions must occur in all branches/values. So
in the above case, `(a + b)` is actually the only common expression between the
two values `$"a" + $"b" + $"c"` and `$"a" + $"b"`.
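
       To make the point above concrete, here is a minimal sketch, not the
actual Catalyst code. `Expr`, `Attr`, `Add`, `subtrees` and `commonExprs` are
hypothetical stand-ins, and structural case-class equality is assumed in place
of `semanticEquals`. Leaf attributes are skipped as candidates, which is a
simplification made for this sketch.

```scala
// Illustrative stand-in for Catalyst expressions; these are NOT Spark's classes.
object CommonExprSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Collect every non-leaf subtree of `e`, including `e` itself.
  def subtrees(e: Expr): Set[Expr] = e match {
    case Attr(_)         => Set.empty[Expr]
    case add @ Add(l, r) => Set[Expr](add) ++ subtrees(l) ++ subtrees(r)
  }

  // An expression counts as "common" only if it occurs in every given value.
  // Assumes `values` is non-empty, like `exprs.head` in the diff.
  def commonExprs(values: Seq[Expr]): Set[Expr] =
    values.map(subtrees).reduce(_ intersect _)

  val a: Expr = Attr("a")
  val b: Expr = Attr("b")
  val c: Expr = Attr("c")

  val first: Expr  = Add(Add(a, b), c) // a + b + c
  val second: Expr = Add(a, b)         // a + b

  // Only (a + b) appears in both values; ((a + b) + c) is missing from `second`.
  val common: Set[Expr] = commonExprs(Seq(first, second))
}
```

       In this sketch `((a + b) + c)` never becomes a candidate because it does
not occur in the second value, so there is nothing to filter out.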

##########
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala
##########
@@ -309,6 +309,22 @@ class SubexpressionEliminationSuite extends SparkFunSuite with ExpressionEvalHel
       CodeGenerator.compile(code)
     }
   }
+
+  test("SPARK-35410: SubExpr elimination should not include redundant child exprs " +
+    "for conditional expressions") {

Review comment:
       So far it's the only one I can think of.

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -82,21 +82,31 @@ class EquivalentExpressions {
   /**
    * Adds only expressions which are common in each of given expressions, in a recursive way.
    * For example, given two expressions `(a + (b + (c + 1)))` and `(d + (e + (c + 1)))`,
-   * the common expression `(c + 1)` will be added into `equivalenceMap`.
+   * the common expression `(c + 1)` will be added into `equivalenceMap`. Note that if an
+   * expression and its child expressions are all commonly occurred in each of given expressions,
+   * we filter out the child expressions. For example, if `((a + b) + c)` and `(a + b)` are
+   * common expressions, we only add `((a + b) + c)`.
    */
   private def addCommonExprs(
       exprs: Seq[Expression],
       addFunc: Expression => Boolean = addExpr): Unit = {
     val exprSetForAll = mutable.Set[Expr]()
     addExprTree(exprs.head, addExprToSet(_, exprSetForAll))
 
-    val commonExprSet = exprs.tail.foldLeft(exprSetForAll) { (exprSet, expr) =>
+    val candidateExprs = exprs.tail.foldLeft(exprSetForAll) { (exprSet, expr) =>
       val otherExprSet = mutable.Set[Expr]()
       addExprTree(expr, addExprToSet(_, otherExprSet))
       exprSet.intersect(otherExprSet)
     }
 
-    commonExprSet.foreach(expr => addFunc(expr.e))
+    // Not all expressions in the set should be added. We should filter out the subexprs.
+    val commonExprSet = candidateExprs.filter { candidateExpr =>
+      candidateExprs.forall { expr =>
+        expr == candidateExpr || expr.e.find(_.semanticEquals(candidateExpr.e)).isEmpty
+      }

Review comment:
       BTW, the size of `candidateExprs` should be small.
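
       For reference, the pairwise filter in the hunk above can be sketched
outside Spark roughly as follows. This is only an illustration under stated
assumptions: `Expr`, `Attr`, `Add`, `contains` and `dropNested` are
hypothetical names, and plain equality stands in for `semanticEquals`. Each
candidate is checked against every other candidate, so the cost grows
quadratically with the number of candidates, which is presumably acceptable
while the set stays small.

```scala
// Illustrative stand-in for Catalyst expressions; these are NOT Spark's classes.
object FilterChildrenSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // True if `target` appears anywhere inside `tree`, including at its root.
  def contains(tree: Expr, target: Expr): Boolean =
    tree == target || (tree match {
      case Add(l, r) => contains(l, target) || contains(r, target)
      case Attr(_)   => false
    })

  // Keep a candidate only if no *other* candidate contains it.
  // Every candidate is compared with every other one: O(n^2) tree walks.
  def dropNested(candidates: Set[Expr]): Set[Expr] =
    candidates.filter { candidate =>
      candidates.forall { other =>
        other == candidate || !contains(other, candidate)
      }
    }

  val ab: Expr  = Add(Attr("a"), Attr("b")) // (a + b)
  val abc: Expr = Add(ab, Attr("c"))        // ((a + b) + c)

  // Both are candidates, but (a + b) is nested inside ((a + b) + c).
  val kept: Set[Expr] = dropNested(Set(ab, abc))
}
```

       Here `kept` holds only `((a + b) + c)`, matching the example in the
revised method comment.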




