[GitHub] spark pull request #23079: [SPARK-26107][SQL] Extend ReplaceNullWithFalseInP...

2018-11-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23079


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




2018-11-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23079#discussion_r234639734
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
   replaceNullWithFalse(cond) -> value
 }
 cw.copy(branches = newBranches)
+  case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
--- End diff --

Ah, I see. Sorry, I missed it. Then it's safer to use a whitelist here.






2018-11-19 Thread rednaxelafx
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/23079#discussion_r234534798
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala
 ---
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
 testProjection(originalExpr = column, expectedExpr = column)
   }
 
+  test("replace nulls in lambda function of ArrayFilter") {
+val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
--- End diff --

Updated.






2018-11-18 Thread rednaxelafx
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/23079#discussion_r234508866
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala
 ---
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
 testProjection(originalExpr = column, expectedExpr = column)
   }
 
+  test("replace nulls in lambda function of ArrayFilter") {
+val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
--- End diff --

Actually, I intentionally made all three lambdas the same (the `MapFilter` 
one differs only in the lambda parameter). I can encapsulate this lambda 
function in a test utility function. Let me update the PR and see what you 
think.






2018-11-18 Thread rednaxelafx
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/23079#discussion_r234508561
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
   replaceNullWithFalse(cond) -> value
 }
 cw.copy(branches = newBranches)
+  case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
--- End diff --

I'm not sure that's useful. First of all, the 
`replaceNullWithFalse` handling doesn't apply to all higher-order functions. In 
fact, it applies only to a very narrow set: those where a lambda function returns 
`BooleanType` and its result is immediately used as a predicate. So a generic 
utility would certainly make this PR slightly simpler, but I don't know how 
useful it would be for other cases.
I'd prefer to wait for more such transformation cases before introducing a new 
utility for the pattern. WDYT?
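
To make the "immediately used as a predicate" condition concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark code: `Tri`, `filterTri`, `transformTri`, and both sample predicates are hypothetical names introduced for illustration, with `None` standing in for SQL NULL.

```scala
// Toy model, not Spark code: a predicate result is Some(true), Some(false),
// or None (standing in for SQL NULL).
object NullAsFalseInFilter {
  type Tri = Option[Boolean]

  // A filter keeps an element only when the predicate is definitely true,
  // so NULL and false are indistinguishable in this position.
  def filterTri[A](xs: Seq[A])(p: A => Tri): Seq[A] =
    xs.filter(x => p(x).contains(true))

  // A transform-style function returns the lambda result as a value,
  // so NULL and false remain distinguishable in its output.
  def transformTri[A](xs: Seq[A])(f: A => Tri): Seq[Tri] =
    xs.map(f)

  // Hypothetical predicate: NULL for positive values, true otherwise.
  val original: Int => Tri = e => if (e > 0) None else Some(true)

  // The same predicate after rewriting NULL to false.
  val rewritten: Int => Tri = e => if (e > 0) Some(false) else Some(true)
}
```

Under this model, `filterTri` gives the same result for `original` and `rewritten`, while `transformTri` does not, which is why the rewrite is only safe for the narrow set of functions described above.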






2018-11-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23079#discussion_r234474562
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -767,6 +767,15 @@ object ReplaceNullWithFalse extends Rule[LogicalPlan] {
   replaceNullWithFalse(cond) -> value
 }
 cw.copy(branches = newBranches)
+  case af @ ArrayFilter(_, lf @ LambdaFunction(func, _, _)) =>
--- End diff --

Shall we add a `withNewFunctions` method to `HigherOrderFunction`? Then we 
can simplify this rule to
```
case f: HigherOrderFunction =>
  f.withNewFunctions(f.functions.map(replaceNullWithFalse))
```
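
For comparison, a whitelist-style rule, which matches only functions whose lambda result is consumed as a predicate, might look roughly like the sketch below. This is a self-contained toy expression tree, not Catalyst: every type here (`Expr`, `Lambda`, the toy `ArrayFilter` and `ArrayTransform`, and so on) is a hypothetical stand-in, and `optimize` is an illustrative name rather than the rule in the PR.

```scala
// Toy expression tree, not Catalyst: all types below are illustrative stand-ins.
object WhitelistSketch {
  sealed trait Expr
  case object NullLit extends Expr
  case object TrueLit extends Expr
  case object FalseLit extends Expr
  final case class Ref(name: String) extends Expr
  final case class If(cond: Expr, thenE: Expr, elseE: Expr) extends Expr
  final case class Lambda(body: Expr) extends Expr
  final case class ArrayFilter(arg: Expr, f: Lambda) extends Expr
  final case class ArrayTransform(arg: Expr, f: Lambda) extends Expr

  // Rewrite NULL to false in positions whose value becomes the predicate result.
  def replaceNullWithFalse(e: Expr): Expr = e match {
    case NullLit => FalseLit
    case If(c, t, f) =>
      If(replaceNullWithFalse(c), replaceNullWithFalse(t), replaceNullWithFalse(f))
    case other => other
  }

  // Whitelist: only functions known to use the lambda result as a predicate
  // are rewritten; everything else passes through unchanged.
  def optimize(e: Expr): Expr = e match {
    case ArrayFilter(arg, Lambda(body)) =>
      ArrayFilter(arg, Lambda(replaceNullWithFalse(body)))
    case other => other
  }
}
```

In this sketch, the toy `ArrayTransform` is left untouched because its lambda result is returned as data, whereas a rule matching every higher-order function would rewrite it as well.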






2018-11-18 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/23079#discussion_r234467085
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala
 ---
@@ -298,6 +299,45 @@ class ReplaceNullWithFalseSuite extends PlanTest {
 testProjection(originalExpr = column, expectedExpr = column)
   }
 
+  test("replace nulls in lambda function of ArrayFilter") {
+val cond = GreaterThan(UnresolvedAttribute("e"), Literal(0))
--- End diff --

Test cases for `ArrayFilter` and `ArrayExists` seem to be identical. As we 
have those tests anyway, would it make sense to cover different lambda 
functions?






2018-11-18 Thread rednaxelafx
GitHub user rednaxelafx opened a pull request:

https://github.com/apache/spark/pull/23079

[SPARK-26107][SQL] Extend ReplaceNullWithFalseInPredicate to support 
higher-order functions: ArrayExists, ArrayFilter, MapFilter

## What changes were proposed in this pull request?

Extend the `ReplaceNullWithFalse` optimizer rule introduced in SPARK-25860 
(https://github.com/apache/spark/pull/22857) to also optimize the lambda 
predicates of the higher-order functions `ArrayExists`, `ArrayFilter`, and 
`MapFilter`.

Also rename the rule to `ReplaceNullWithFalseInPredicate` to better reflect 
its intent.

## How was this patch tested?

Added new unit test cases to the `ReplaceNullWithFalseInPredicateSuite` 
(renamed from `ReplaceNullWithFalseSuite`).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rednaxelafx/apache-spark catalyst-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23079


commit 710c8862b3138f6146fe2309d6379707f8d4ac14
Author: Kris Mok 
Date:   2018-11-18T09:09:53Z

Extend ReplaceNullWithFalseInPredicate to support higher-order functions: 
ArrayExists, ArrayFilter, MapFilter



