[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-21 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1143112674


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -125,13 +128,27 @@ class EquivalentExpressions {
     }
   }
 
+  private def skipForShortcut(expr: Expression): Expression = {
+    if (skipForShortcutEnable) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      expr match {
+        case and: And => skipForShortcut(and.left)

Review Comment:
   I'm a bit worried about inconsistency. `childrenToRecurse` is not recursive 
either and it seems messy if we only make `skipForShortcut` recursive.
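
For illustration only (not code from the PR): a minimal sketch of the one-level versus recursive behaviour being discussed, written against a toy expression type rather than Spark's `Expression`. The names `skipOnce` and `skipRecursive` are hypothetical, and the `Or` case is an assumption, since the hunk above only shows the `And` case.

```scala
object SkipForShortcutSketch {
  // Toy expression tree; stands in for Catalyst expressions.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  // Non-recursive variant: unwraps a single short-circuit node.
  def skipOnce(expr: Expr): Expr = expr match {
    case And(l, _) => l
    case Or(l, _) => l
    case other => other
  }

  // Recursive variant: keeps following the left operand until it reaches
  // a node that is not a short-circuit expression.
  @scala.annotation.tailrec
  def skipRecursive(expr: Expr): Expr = expr match {
    case And(l, _) => skipRecursive(l)
    case Or(l, _) => skipRecursive(l)
    case other => other
  }
}
```

Given `And(And(a, b), c)`, `skipOnce` stops at `And(a, b)`, while `skipRecursive` continues down to `a`.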



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1140296638


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -125,13 +128,27 @@ class EquivalentExpressions {
     }
   }
 
+  private def skipForShortcut(expr: Expression): Expression = {
+    if (skipForShortcutEnable) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      expr match {
+        case and: And => skipForShortcut(and.left)

Review Comment:
   `childrenToRecurse` is recursive, so `skipForShortcut` doesn't need to be 
recursive.






[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1140005524


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children

Review Comment:
   The tests are correct but I'm confused about why the code works... if `And` 
is a root expression, we blindly take all its children here, right?
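
To make the question concrete, here is a toy model (a sketch under stated assumptions, not Spark code) of a `childrenToRecurse` that replaces `And`/`Or` children by their left operand, as in the `alwaysEvaluated.map` hunk quoted elsewhere in this thread. The `Expr` hierarchy, `Attr`, `Not`, and the explicit `skipShortcut` parameter are hypothetical stand-ins. In this sketch the rewrite applies only to the children of the expression passed in, so a root `And` still yields both of its operands.

```scala
object ChildrenToRecurseSketch {
  // Toy expression tree; stands in for Catalyst expressions.
  sealed trait Expr { def children: Seq[Expr] }
  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class Not(child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }
  final case class And(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class Or(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  // Takes every child of `expr`, then replaces each child that is itself a
  // short-circuit node by its left operand. `expr` itself is not inspected.
  def childrenToRecurse(expr: Expr, skipShortcut: Boolean): Seq[Expr] = {
    val alwaysEvaluated = expr.children
    if (skipShortcut) {
      alwaysEvaluated.map {
        case and: And => and.left
        case or: Or => or.left
        case other => other
      }
    } else {
      alwaysEvaluated
    }
  }
}
```

With `b = Attr("b")`, `childrenToRecurse(And(b, b), skipShortcut = true)` returns both operands, whereas `childrenToRecurse(Not(And(b, c)), skipShortcut = true)` returns only `b`.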






[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1140001228


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (skipShortcut) {

Review Comment:
   Maybe `skipInShortcut` is a clearer name.






[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r114835


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (skipShortcut) {

Review Comment:
   `skipShortcut` means we need to handle the shortcut expressions to skip CSE.






[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1139893632


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (shortcut) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      alwaysEvaluated.map {
+        case and: And => and.left
+        case or: Or => or.left
+        case other => other
+      }

Review Comment:
   Another idea is to make CSE more dynamic: only evaluate it if its first 
appearance needs to be evaluated. It can handle `ConditionalExpression` as well 
but is much harder to implement.
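
For illustration only: one way to read the "more dynamic" idea is to cache a common subexpression lazily, at the moment its first appearance is actually evaluated, so that a short-circuited branch never triggers it. The sketch below is a toy interpreter under that assumption, not Spark's codegen; `Leaf`, `Evaluator`, and `leafEvals` are hypothetical names introduced here.

```scala
object LazyCseSketch {
  // Toy boolean expression tree; stands in for Catalyst expressions.
  sealed trait Expr
  final case class Leaf(name: String, value: Boolean) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  // Evaluates with short-circuiting and memoizes each structurally equal
  // subexpression the first time it is really needed.
  final class Evaluator {
    private val cache = scala.collection.mutable.Map.empty[Expr, Boolean]
    // Counts how many times a leaf was actually computed (cache misses only).
    var leafEvals: Int = 0

    def eval(e: Expr): Boolean = cache.get(e) match {
      case Some(cached) => cached
      case None =>
        val result = e match {
          case Leaf(_, value) =>
            leafEvals += 1
            value
          case And(l, r) => eval(l) && eval(r)
          case Or(l, r) => eval(l) || eval(r)
        }
        cache(e) = result
        result
    }
  }
}
```

For `Or(a, And(b, b))` with `a` true, only `a` is computed and `b` is never touched; with `a` false, `b` is computed once and its second appearance is served from the cache.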






[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1139888574


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children

Review Comment:
   Shall we match `And`/`Or` here?






[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


cloud-fan commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1139887759


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -23,14 +23,17 @@ import scala.collection.mutable
 
 import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
 import org.apache.spark.sql.catalyst.expressions.objects.LambdaVariable
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.util.Utils
 
 /**
  * This class is used to compute equality of (sub)expression trees. Expressions can be added
  * to this class and they subsequently query for expression equality. Expression trees are
  * considered equal if for the same input(s), the same result is produced.
  */
-class EquivalentExpressions {
+class EquivalentExpressions(
+    shortcut: Boolean = SQLConf.get.subexpressionEliminationSkipForShotcutExpr) {

Review Comment:
   ```suggestion
       skipShortcut: Boolean = SQLConf.get.subexpressionEliminationSkipForShotcutExpr) {
   ```


