[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-21 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1143251312


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -125,13 +128,27 @@ class EquivalentExpressions {
 }
   }
 
+  private def skipForShortcut(expr: Expression): Expression = {
+    if (skipForShortcutEnable) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      expr match {
+        case and: And => skipForShortcut(and.left)

Review Comment:
   Makes sense; it should be consistent with `childrenToRecurse`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-20 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1142906959


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -125,13 +128,27 @@ class EquivalentExpressions {
 }
   }
 
+  private def skipForShortcut(expr: Expression): Expression = {
+    if (skipForShortcutEnable) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      expr match {
+        case and: And => skipForShortcut(and.left)

Review Comment:
   Yes, that is true. On second thought, though, `updateExprTree` has a side
effect (`updateExprInMap`) during recursion. If we decide to skip for a
shortcut, would it be better to return the final valid expression in one shot?
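
The "one shot" idea in this comment can be illustrated with a minimal sketch. It uses a toy expression tree rather than Catalyst's `Expression` classes; `Attr`, the `enabled` parameter, and the object name are assumptions made for illustration, and handling `Or` the same way as `And` is also an assumption. It is not the code from the PR.

```scala
object SkipForShortcutSketch {
  // Toy stand-ins for Catalyst expressions (hypothetical, for illustration only).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  // Follow the left child of nested And/Or nodes until reaching an expression
  // that is not a shortcut expression, and return it directly. No map is
  // touched along the way, so the walk itself has no side effects.
  @scala.annotation.tailrec
  def skipForShortcut(expr: Expr, enabled: Boolean): Expr =
    if (!enabled) {
      expr
    } else {
      expr match {
        case And(left, _) => skipForShortcut(left, enabled)
        case Or(left, _)  => skipForShortcut(left, enabled)
        case other        => other
      }
    }
}
```

For `Or(And(a, b), c)` the sketch returns `a` when `enabled` is true and the original expression when it is false.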



[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1140042906


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (skipShortcut) {

Review Comment:
   Changed to `skipForShortcut`, which aligns with the config name.



[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1140038457


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children

Review Comment:
   Oh, I see. The actual root expression is `If`, which is a
`ConditionalExpression`. Let me update it and the outdated test.



[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1139981736


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children

Review Comment:
   Not needed; the next round will cover it. I added some tests to cover the
case where `And` is not the root node.



[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-17 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1139980442


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (shortcut) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      alwaysEvaluated.map {
+        case and: And => and.left
+        case or: Or => or.left
+        case other => other
+      }

Review Comment:
   This is a kind of lazy evaluation. I remember an earlier PR that attempted
it, https://github.com/apache/spark/pull/32977 , and @viirya mentioned some
issues in
https://github.com/apache/spark/pull/32977#pullrequestreview-690266902.
   It seems the main issue is that the method will grow too large if we make
each common subexpression evaluation lazy. Something like:
   ```
   def common_subexpression_1() {
     if (isnull) {
       // evaluate
     } else {
       // return existing value
     }
   }
   ```
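
To make the pattern above concrete, here is a small sketch of one lazily evaluated common subexpression. `LazySubexpr`, `compute`, and `evalCount` are hypothetical names used only for illustration; Spark's generated code is Java and is structured differently. The point is that every lazy subexpression carries its own cache plus a branch on each access, which is the extra code the comment is concerned about.

```scala
object LazySubexprSketch {
  // Holder for one common subexpression (hypothetical, for illustration only).
  final class LazySubexpr[A](compute: () => A) {
    private var cached: Option[A] = None
    private var evaluations = 0

    // Number of times the underlying computation actually ran.
    def evalCount: Int = evaluations

    // Evaluate on first access; afterwards return the existing value.
    def get(): A = cached match {
      case Some(value) => value
      case None =>
        val value = compute()
        evaluations += 1
        cached = Some(value)
        value
    }
  }
}
```

Calling `get()` twice runs `compute` once; if `get()` is never called, it never runs.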



[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-15 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1138120937


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -864,6 +864,14 @@ object SQLConf {
       .checkValue(_ >= 0, "The maximum must not be negative")
       .createWithDefault(100)
 
+  val SUBEXPRESSION_ELIMINATION_SKIP_FOR_SHORTCUT_EXPR =
+    buildConf("spark.sql.subexpressionElimination.skipForShortcutExpr")
+      .internal()
+      .doc("When true, shortcut eliminate subexpression with `AND`, `OR`.")

Review Comment:
   sure, thank you @viirya 



[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-15 Thread via GitHub


ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1138102904


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (shortcut) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      alwaysEvaluated.map {
+        case and: And => and.left
+        case or: Or => or.left
+        case other => other
+      }

Review Comment:
   That is why we added a new config that is disabled by default. We cannot
decide which subexpressions will be evaluated before running. When this config
is enabled, it assumes that the left child is a shortcut, so the right child
can be skipped regardless of whether it contains common subexpressions.
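
The effect described here can be sketched on a toy expression tree. The classes below are simplified stand-ins, not Catalyst's; `countVisits` and the `skipForShortcut` parameter are hypothetical names, and modelling `If` as always evaluating only its predicate is an assumption of the sketch. It mirrors the shape of the `childrenToRecurse` change in the diff above and counts how often each expression is reached during recursion.

```scala
object ChildrenToRecurseSketch {
  // Toy stand-ins for Catalyst expressions (hypothetical, for illustration only).
  sealed trait Expr { def children: Seq[Expr] }
  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class And(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class Or(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class If(pred: Expr, ifTrue: Expr, ifFalse: Expr) extends Expr {
    def children: Seq[Expr] = Seq(pred, ifTrue, ifFalse)
  }

  def childrenToRecurse(expr: Expr, skipForShortcut: Boolean): Seq[Expr] = {
    // Assumption: only the predicate of If is always evaluated.
    val alwaysEvaluated = expr match {
      case If(pred, _, _) => Seq(pred)
      case other          => other.children
    }
    if (skipForShortcut) {
      // Assume the left child short-circuits, so only it is recursed into.
      alwaysEvaluated.map {
        case And(left, _) => left
        case Or(left, _)  => left
        case other        => other
      }
    } else {
      alwaysEvaluated
    }
  }

  // Count how many times each expression is reached during recursion.
  def countVisits(root: Expr, skipForShortcut: Boolean): Map[Expr, Int] = {
    def visit(e: Expr, acc: Map[Expr, Int]): Map[Expr, Int] = {
      val updated = acc.updated(e, acc.getOrElse(e, 0) + 1)
      childrenToRecurse(e, skipForShortcut).foldLeft(updated)((m, c) => visit(c, m))
    }
    visit(root, Map.empty)
  }
}
```

With `if(or(a, and(b, b)))` as the root, `b` is reached twice when the flag is off and not at all when it is on, so it would no longer be treated as a common subexpression. Note that, as in the diff, only children are mapped; a root `And` or `Or` is not itself skipped in this sketch.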


