[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23079

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23079#discussion_r234639734

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

```diff
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
           replaceNullWithFalse(cond) -> value
         }
         cw.copy(branches = newBranches)
+      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
```

Ah, I see. Sorry I missed it. Then it's safer to use a whitelist here.
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/23079#discussion_r234534798

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala

```diff
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
     testProjection(originalExpr = column, expectedExpr = column)
   }

+  test("replace nulls in lambda function of ArrayFilter") {
+    val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
```

Updated.
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/23079#discussion_r234508866

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala

```diff
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
     testProjection(originalExpr = column, expectedExpr = column)
   }

+  test("replace nulls in lambda function of ArrayFilter") {
+    val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
```

Actually, I intentionally made all three lambdas the same (the `MapFilter` one differs only in its lambda parameter). I can encapsulate this lambda function in a test utility function. Let me update the PR and see what you think.
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/23079#discussion_r234508561

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

```diff
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
           replaceNullWithFalse(cond) -> value
         }
         cw.copy(branches = newBranches)
+      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
```

I'm not sure whether that would be useful. First of all, the `replaceNullWithFalse` handling doesn't apply to all higher-order functions; in fact, it applies only to a very narrow set, namely those where a lambda function returns `BooleanType` and is immediately used as a predicate. A generic utility would certainly make this PR slightly simpler, but I don't know how useful it would be for other cases. I'd prefer to wait for more such transformation cases before introducing a new utility for the pattern. WDYT?
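The distinction rednaxelafx draws, that the rewrite is only safe where a lambda's `BooleanType` result is immediately consumed as a predicate, can be illustrated with plain Scala collections. This is a toy analogy, not Catalyst code: `Option[Boolean]` stands in for three-valued (true/false/null) logic, and the names `data` and `pred` are illustrative.

```scala
// Toy analogy: Option[Boolean] models SQL's three-valued logic,
// with None standing in for a null predicate result.
val data = Seq(1, 2, 3)

// A predicate that evaluates to "null" (None) for x > 1.
def pred(x: Int): Option[Boolean] = if (x > 1) None else Some(true)

// Filter semantics: an element survives only when the predicate is
// definitely true, so a null result behaves exactly like false and
// the null-to-false rewrite cannot change the output.
val filtered  = data.filter(x => pred(x).contains(true))
val rewritten = data.filter(x => pred(x).getOrElse(false))

// Transform semantics: null is an observable output value here,
// so rewriting it to false would change the result.
val transformed = data.map(pred)
```

This is why the rule covers `ArrayFilter`, `ArrayExists`, and `MapFilter` but not, say, `ArrayTransform`, whose lambda produces values rather than predicates.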
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23079#discussion_r234474562

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

```diff
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
           replaceNullWithFalse(cond) -> value
         }
         cw.copy(branches = newBranches)
+      case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
```

Shall we add a `withNewFunctions` method in `HigherOrderFunction`? Then we can simplify this rule to

```
case f: HigherOrderFunction => f.withNewFunctions(f.functions.map(replaceNullWithFalse))
```
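For illustration, here is a minimal, self-contained sketch of what such a `withNewFunctions` hook could look like on a toy expression tree. Every name here (`Expr`, `NullLit`, `ToyArrayFilter`, and so on) is hypothetical and not Catalyst's actual API; note also that the review elsewhere in this thread ultimately preferred an explicit whitelist of functions as the safer option over this generic form.

```scala
// Toy expression tree; names are illustrative, not Catalyst's classes.
sealed trait Expr
case object NullLit extends Expr
case object FalseLit extends Expr

// The suggested hook: a higher-order function exposes its embedded
// lambdas and can rebuild itself with rewritten replacements.
trait HigherOrderFunction extends Expr {
  def functions: Seq[Expr]
  def withNewFunctions(newFns: Seq[Expr]): HigherOrderFunction
}

case class ToyArrayFilter(input: String, function: Expr) extends HigherOrderFunction {
  def functions: Seq[Expr] = Seq(function)
  def withNewFunctions(newFns: Seq[Expr]): ToyArrayFilter = copy(function = newFns.head)
}

// With the hook in place, one generic case handles every
// higher-order function instead of one case per operator.
def replaceNullWithFalse(e: Expr): Expr = e match {
  case NullLit => FalseLit
  case hof: HigherOrderFunction =>
    hof.withNewFunctions(hof.functions.map(replaceNullWithFalse))
  case other => other
}
```

The trade-off discussed in the thread is that this generic case fires for all higher-order functions, including ones whose lambdas are not predicates, which is why an explicit per-operator whitelist was considered safer.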
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23079#discussion_r234467085

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala

```diff
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
     testProjection(originalExpr = column, expectedExpr = column)
   }

+  test("replace nulls in lambda function of ArrayFilter") {
+    val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
```

Test cases for `ArrayFilter` and `ArrayExists` seem to be identical. As we have those tests anyway, would it make sense to cover different lambda functions?
GitHub user rednaxelafx opened a pull request: https://github.com/apache/spark/pull/23079

[SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter

## What changes were proposed in this pull request?

Extend the `ReplaceNullWithFalse` optimizer rule introduced in SPARK-25860 (https://github.com/apache/spark/pull/22857) to also support optimizing predicates in the higher-order functions `ArrayExists`, `ArrayFilter`, and `MapFilter`. Also rename the rule to `ReplaceNullWithFalseInPredicate` to better reflect its intent.

## How was this patch tested?

Added new unit test cases to `ReplaceNullWithFalseInPredicateSuite` (renamed from `ReplaceNullWithFalseSuite`).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rednaxelafx/apache-spark catalyst-master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23079.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23079

commit 710c8862b3138f6146fe2309d6379707f8d4ac14
Author: Kris Mok
Date: 2018-11-18T09:09:53Z

    Extend ReplaceNullWithFalseInPredicate to support higher-order functions: ArrayExists, ArrayFilter, MapFilter
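The extension described above can be sketched on a toy expression tree. This is a simplified, hypothetical model, not the actual Catalyst implementation: it shows only the shape of the transformation, replacing a null literal with false inside positions known to be predicates, including the lambda body of an `ArrayFilter`.

```scala
// Toy expression tree modeling the rule's shape; all class names are
// illustrative stand-ins for Catalyst's real expressions.
sealed trait Expr
case object NullLit extends Expr
case object FalseLit extends Expr
case class GreaterThan(attr: String, value: Int) extends Expr
case class If(predicate: Expr, trueValue: Expr, falseValue: Expr) extends Expr
case class LambdaFunction(arg: String, body: Expr) extends Expr
case class ArrayFilter(input: String, function: LambdaFunction) extends Expr

// Recurse only through positions evaluated as predicates: an If whose
// result is itself a predicate, and the lambda body of a filter. A null
// predicate rejects a row just like false, so the rewrite is safe there.
def replaceNullWithFalse(e: Expr): Expr = e match {
  case NullLit => FalseLit
  case If(p, t, f) =>
    If(replaceNullWithFalse(p), replaceNullWithFalse(t), replaceNullWithFalse(f))
  case ArrayFilter(in, LambdaFunction(a, body)) =>
    ArrayFilter(in, LambdaFunction(a, replaceNullWithFalse(body)))
  case other => other
}
```

Applied to a filter whose lambda is `if (e > 0) null else false`, the sketch rewrites the null branch to false, mirroring what the PR does for the real `ArrayExists`, `ArrayFilter`, and `MapFilter` expressions.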