[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1143251312

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ##
@@ -125,13 +128,27 @@ class EquivalentExpressions {
     }
   }

+  private def skipForShortcut(expr: Expression): Expression = {
+    if (skipForShortcutEnable) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      expr match {
+        case and: And => skipForShortcut(and.left)

Review Comment:
   Makes sense, it should be consistent with `childrenToRecurse`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1142906959

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ##
@@ -125,13 +128,27 @@ class EquivalentExpressions {
     }
   }

+  private def skipForShortcut(expr: Expression): Expression = {
+    if (skipForShortcutEnable) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      expr match {
+        case and: And => skipForShortcut(and.left)

Review Comment:
   Yes, that is true, but on second thought `updateExprTree` has a side effect (`updateExprInMap`) during recursion. If we decide to skip for shortcut, would it be better to return the final valid expression in one shot?
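
A minimal sketch of the "one shot" idea raised above: peel off shortcut operators until reaching the expression that is always evaluated, so a recursive walk with side effects never registers an intermediate node. The `Expr`, `Leaf`, `And`, and `Or` types are hypothetical stand-ins for the Catalyst classes, the `enabled` flag stands in for the config, and the `Or` case is an assumption because the quoted hunk is cut off after the `And` case.

```scala
import scala.annotation.tailrec

// Hypothetical toy expression tree; these are NOT the Catalyst classes.
sealed trait Expr
final case class Leaf(name: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object SkipSketch {
  // Follow the left child of every And/Or until a non-shortcut expression is
  // reached, and return that expression directly.
  // The Or case is assumed; only the And case is visible in the quoted diff.
  @tailrec
  def skipForShortcut(expr: Expr, enabled: Boolean): Expr =
    if (!enabled) expr
    else expr match {
      case And(left, _) => skipForShortcut(left, enabled)
      case Or(left, _)  => skipForShortcut(left, enabled)
      case other        => other
    }
}
```

With `enabled = true`, `And(Or(a, b), c)` resolves straight to `a`; with `enabled = false` the input is returned unchanged.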
[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1140042906

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (skipShortcut) {

Review Comment:
   Changed to `skipForShortcut`, which aligns with the config name.
[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1140038457

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children

Review Comment:
   Oh, I see. The actual root expression is `If`, which is a `ConditionalExpression`. Let me update it and the outdated test.
[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1139981736

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children

Review Comment:
   No need, the next round will cover it. I added some tests covering the case where `And` is not the root node.
[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1139980442

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (shortcut) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      alwaysEvaluated.map {
+        case and: And => and.left
+        case or: Or => or.left
+        case other => other
+      }

Review Comment:
   It's a kind of lazy evaluation. I remember an earlier PR attempted this, https://github.com/apache/spark/pull/32977, and @viirya mentioned some issues in https://github.com/apache/spark/pull/32977#pullrequestreview-690266902. It seems the main issue is that the method would grow too large if we made every common subexpression evaluation lazy. Something like:
   ```
   def common_subexpression_1() {
     if (isnull) {
       // evaluate
     } else {
       // return the existing value
     }
   }
   ```
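
To make the lazy-evaluation shape in the pseudocode above concrete, here is a small sketch of a memoizing holder. `LazySubexpr` and its `evalCount` counter are hypothetical names invented for this illustration; they are not part of Spark or of the earlier PR. The point is only that every lazily evaluated subexpression needs its own "already computed?" guard, which is the per-subexpression code-size cost mentioned above.

```scala
// Hypothetical holder for one common subexpression; not a Spark class.
final class LazySubexpr[A](compute: () => A) {
  private var cached: Option[A] = None
  private var evaluations = 0

  // Number of times the underlying computation actually ran.
  def evalCount: Int = evaluations

  // Guarded access: evaluate on first use, then return the stored value.
  def get: A = cached match {
    case Some(value) => value
    case None =>
      evaluations += 1
      val value = compute()
      cached = Some(value)
      value
  }
}

object LazyDemo {
  // Mirrors the `or(a, and(b, b))` example: `b` is only needed when `a` is false.
  def eval(a: Boolean, b: LazySubexpr[Boolean]): Boolean =
    a || (b.get && b.get)
}
```

When `a` is true, `b` is never computed; when `a` is false, it is computed once and reused for the second reference.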
[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1138120937

## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ##
@@ -864,6 +864,14 @@ object SQLConf {
     .checkValue(_ >= 0, "The maximum must not be negative")
     .createWithDefault(100)

+  val SUBEXPRESSION_ELIMINATION_SKIP_FOR_SHORTCUT_EXPR =
+    buildConf("spark.sql.subexpressionElimination.skipForShortcutExpr")
+      .internal()
+      .doc("When true, shortcut eliminate subexpression with `AND`, `OR`.")

Review Comment:
   Sure, thank you @viirya.
[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression
ulysses-you commented on code in PR #40446:
URL: https://github.com/apache/spark/pull/40446#discussion_r1138102904

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ##
@@ -128,10 +131,23 @@ class EquivalentExpressions {
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
   //   2. ConditionalExpression: use its children that will always be evaluated.
-  private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
-    case _: CodegenFallback => Nil
-    case c: ConditionalExpression => c.alwaysEvaluatedInputs
-    case other => other.children
+  private def childrenToRecurse(expr: Expression): Seq[Expression] = {
+    val alwaysEvaluated = expr match {
+      case _: CodegenFallback => Nil
+      case c: ConditionalExpression => c.alwaysEvaluatedInputs
+      case other => other.children
+    }
+    if (shortcut) {
+      // The subexpression may not need to eval even if it appears more than once.
+      // e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is true.
+      alwaysEvaluated.map {
+        case and: And => and.left
+        case or: Or => or.left
+        case other => other
+      }

Review Comment:
   That is why we added a new config that is disabled by default. We cannot decide before running which subexpressions will be evaluated. When this config is enabled, it assumes that the left child is a shortcut, so the right child can be skipped regardless of whether it contains common subexpressions.
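
The effect of that assumption can be sketched on a toy tree. Everything below is hypothetical: `Expr`, `Leaf`, `And`, `Or`, and `countOccurrences` are illustration-only stand-ins, the `skipForShortcut` parameter stands in for the config, and the counting walk is a simplification rather than the bookkeeping `EquivalentExpressions` actually performs. It only shows that once the right child of `And`/`Or` is skipped, a repeated expression there is no longer seen as common.

```scala
// Hypothetical toy expression tree; these are NOT the Catalyst classes.
sealed trait Expr { def children: Seq[Expr] }
final case class Leaf(name: String) extends Expr {
  def children: Seq[Expr] = Nil
}
final case class And(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}
final case class Or(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

object ShortcutSketch {
  // With the flag on, only the left child of And/Or is treated as always evaluated.
  def childrenToRecurse(expr: Expr, skipForShortcut: Boolean): Seq[Expr] =
    expr match {
      case And(left, _) if skipForShortcut => Seq(left)
      case Or(left, _) if skipForShortcut  => Seq(left)
      case other                           => other.children
    }

  // Count how often each subtree is reached through childrenToRecurse.
  def countOccurrences(root: Expr, skipForShortcut: Boolean): Map[Expr, Int] = {
    def go(e: Expr, acc: Map[Expr, Int]): Map[Expr, Int] = {
      val updated = acc.updated(e, acc.getOrElse(e, 0) + 1)
      childrenToRecurse(e, skipForShortcut).foldLeft(updated)((m, c) => go(c, m))
    }
    go(root, Map.empty)
  }
}
```

For `Or(a, And(b, b))`, the walk counts `b` twice with the flag off, and never reaches `b` with the flag on, so `b` would not be extracted as a common subexpression.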