[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21852


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206739216
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,24 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue))
+  if branches.forall(_._2.semanticEquals(elseValue)) =>
+// For non-deterministic conditions with side effects, we cannot remove them, or change
+// the ordering. As a result, we try to remove the deterministic conditions from the tail.
+var hitNonDeterministicCond = false
+var i = branches.length
+while (i > 0 && !hitNonDeterministicCond) {
+  hitNonDeterministicCond = !branches(i - 1)._1.deterministic
+  if (!hitNonDeterministicCond) {
--- End diff --

nit: we can avoid this per-iteration `if` check by updating the final step
```scala
if (i == 0 && !hitNonDeterministicCond) {
  elseValue
} else {
  e.copy(branches = branches.take(i + 1).map(branch => (branch._1, elseValue)))
}
```
feel free to change it in your next PR.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-31 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206695251
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if branches
+.forall(_._2.semanticEquals(elseValue)) =>
+// For non-deterministic conditions with side effects, we cannot remove them, or change
+// the ordering. As a result, we try to remove the deterministic conditions from the tail.
--- End diff --

Should be

```scala
hitNonDetermin = !branches(i - 1)._1.deterministic
if (!hitNonDetermin) {
  i -= 1
}
```

Personally, I like functional style more, but it's more efficient to use Java style here. I updated as you suggested.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206500527
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if branches
+.forall(_._2.semanticEquals(elseValue)) =>
+// For non-deterministic conditions with side effects, we cannot remove them, or change
+// the ordering. As a result, we try to remove the deterministic conditions from the tail.
--- End diff --

I think it's more readable to write Java-style code here
```scala
var hitNonDetermin = false
var i = branches.length
while (i > 0 && !hitNonDetermin) {
  hitNonDetermin = !branches(i - 1)._1.deterministic
  i -= 1
}
if (i == 0) {
  elseValue
} else {
  e.copy(branches = branches.take(i))
}
```
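For reference, the tail scan discussed here can be exercised standalone. The sketch below models branch conditions as plain booleans (a `deterministic` flag per branch is an illustrative stand-in for Catalyst expressions, not Spark's API), and uses the conditional decrement from the follow-up comment so `i` ends at the first index that must be kept:

```scala
// Standalone model of the tail scan: walk branches from the end, dropping
// deterministic conditions until the first non-deterministic one is hit.
// Returns how many leading branches must be kept.
def keptBranches(deterministic: Seq[Boolean]): Int = {
  var hitNonDetermin = false
  var i = deterministic.length
  while (i > 0 && !hitNonDetermin) {
    hitNonDetermin = !deterministic(i - 1)
    if (!hitNonDetermin) i -= 1  // only step past conditions that are safe to drop
  }
  i
}

keptBranches(Seq(true, true, true))   // 0: the whole CaseWhen collapses to elseValue
keptBranches(Seq(true, false, true))  // 2: only the trailing deterministic branch is dropped
```

Decrementing only when the condition is deterministic keeps `i` pointing at the cut position directly, which is the off-by-one this review thread is ironing out.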


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206499326
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if branches
+.forall(_._2.semanticEquals(elseValue)) =>
--- End diff --

code style nit:
```scala
case e @ CaseWhen(branches, Some(elseValue))
    if branches.forall(_._2.semanticEquals(elseValue))
```


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206271589
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+val values = branches.map(_._2) :+ elseValue
+values.tail.forall(values.head.semanticEquals)
--- End diff --

I replaced the condition with `branches.forall(_._2.semanticEquals(elseValue))`, which is simpler.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206266243
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+val values = branches.map(_._2) :+ elseValue
+values.tail.forall(values.head.semanticEquals)
--- End diff --

I think what you suggested is the same as the code implemented in the PR, and `elseValue.deterministic` is not required since it is checked in `semanticEquals`.

That being said, the whole condition can be replaced by `branches.exists(_._2.semanticEquals(elseValue))`


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206000924
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
--- End diff --

@cloud-fan It sounds like that is what #21904 proposes?


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206000777
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+val values = branches.map(_._2) :+ elseValue
+values.tail.forall(values.head.semanticEquals)
--- End diff --

I think the case is: remove branches that have the same value as the `else`:
```scala
branches.exists(_._2.semanticEquals(elseValue))
```


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r206000642
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
--- End diff --

Ah, I see. Another optimization: we can remove branches that have the same condition. We can do it in the next PR.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205946975
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
--- End diff --

We cannot. When there is no `elseValue`, all the conditions are required to be evaluated before hitting the default `elseValue`, which is `null`.
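A toy evaluator makes the point concrete (`evalCaseWhen` below models CASE WHEN semantics only and is an illustrative helper, not Spark code): with no ELSE, a row matching no branch yields null, so even identical branch values cannot be collapsed away.

```scala
// Evaluate a CASE WHEN over pre-computed condition results; a missing
// else value defaults to null, modeled here as None.
def evalCaseWhen[T](branches: Seq[(Boolean, T)], elseValue: Option[T]): Option[T] =
  branches.collectFirst { case (true, v) => v }.orElse(elseValue)

// Every branch value is "x", yet without an ELSE the outcome still depends
// on whether any condition matches -- so the CaseWhen cannot be removed:
evalCaseWhen(Seq(false -> "x", false -> "x"), None)       // None, i.e. null
evalCaseWhen(Seq(false -> "x", false -> "x"), Some("x"))  // Some("x")
```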


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205924448
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,23 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
--- End diff --

Can we apply this optimization when there is no `elseValue`? And add one more case to remove `CaseWhen` if it has only one branch and its value is the same as `elseValue`.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205599224
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,29 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+val list = branches.map(_._2) :+ elseValue
+list.tail.forall(list.head.semanticEquals)
+  } =>
+// For non-deterministic conditions with side effects, we cannot remove them, or change
+// the ordering. As a result, we try to remove the deterministic conditions from the tail.
+val newBranches = branches.map(_._1)
--- End diff --

@viirya I think this can address your concern. Thanks.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205309619
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+val list = branches.map(_._2) :+ elseValue
+list.tail.forall(list.head.semanticEquals)
+  } =>
+// For non-deterministic conditions with side effects, we cannot remove them.
+// Since the outputs of all the branches are semantically equivalent, `elseValue`
+// is picked for all the branches.
+val newBranches = branches.map(_._1).filter(!_.deterministic).map(cond => (cond, elseValue))
--- End diff --

All conds must be deterministic; otherwise, a non-deterministic one that was not run before this rule could be run after it.
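The hazard can be sketched with a counter standing in for a side-effecting, non-deterministic condition (the names and helpers below are illustrative, not Spark's API): pruning a deterministic branch that precedes a non-deterministic one changes how often the latter is evaluated.

```scala
var calls = 0
def nonDetCond(): Boolean = { calls += 1; false }  // stand-in for e.g. rand() < 0.5

// Original: CASE WHEN detCond THEN x WHEN nonDetCond() THEN x ELSE x END.
// Short-circuiting means nonDetCond() never runs when detCond is true.
def evalOriginal(detCond: Boolean): String =
  if (detCond) "x" else if (nonDetCond()) "x" else "x"

// If the rule dropped the leading deterministic branch, nonDetCond() would
// run on that same row -- an observable behavior change.
def evalPruned(): String =
  if (nonDetCond()) "x" else "x"

evalOriginal(detCond = true)  // calls is still 0: side effect skipped
evalPruned()                  // calls is now 1: side effect executed
```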


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205306098
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,22 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+// With previous rules, it's guaranteed that there must be one branch.
--- End diff --

You're right. I removed the comment. Thanks.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205305691
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala ---
@@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper {
 None),
   CaseWhen(normalBranch :: trueBranch :: Nil, None))
   }
+
+  test("remove entire CaseWhen if all the outputs are semantic equivalence") {
--- End diff --

Yes, I plan to add a couple more tests tonight.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205303174
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala ---
@@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with PredicateHelper {
 None),
   CaseWhen(normalBranch :: trueBranch :: Nil, None))
   }
+
+  test("remove entire CaseWhen if all the outputs are semantic equivalence") {
--- End diff --

We may need a test case including a non-deterministic cond.


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205303069
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -416,6 +416,22 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+// With previous rules, it's guaranteed that there must be one branch.
--- End diff --

Is this comment correct?


---




[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-23 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/21852

[SPARK-24893] [SQL] Remove the entire CaseWhen if all the outputs are semantic equivalence

## What changes were proposed in this pull request?

Similar to SPARK-24890, if all the outputs of `CaseWhen` are semantically equivalent, `CaseWhen` can be removed.
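As a sketch of the intended rewrite on a toy expression tree (the `Expr`, `Literal`, `Attr`, and `CaseWhen` case classes below are illustrative stand-ins for Catalyst's, and plain `==` stands in for `semanticEquals`):

```scala
sealed trait Expr
case class Literal(value: Any) extends Expr
case class Attr(name: String) extends Expr
case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Expr) extends Expr

// If every branch produces the same value as the else clause, the whole
// CaseWhen is equivalent to that value (assuming side-effect-free conditions).
def simplify(e: Expr): Expr = e match {
  case CaseWhen(branches, elseValue) if branches.forall(_._2 == elseValue) =>
    elseValue
  case other => other
}

// CASE WHEN a THEN 'x' WHEN b THEN 'x' ELSE 'x' END  ==>  'x'
val expr = CaseWhen(Seq(Attr("a") -> Literal("x"), Attr("b") -> Literal("x")), Literal("x"))
simplify(expr)  // Literal("x")
```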

## How was this patch tested?

Tests added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark short-circuit-when

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21852


commit dc8de5fec6efc723ce01bcbaf0496c57b62e0ba2
Author: DB Tsai 
Date:   2018-07-23T19:15:58Z

remove casewhen if possible




---
