[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-02-07 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/16245


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/16245#discussion_r97211716
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -514,6 +514,34 @@ case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
 
 
 /**
+ * Reorders the predicates in `Filter` so more expensive expressions like UDF can evaluate later.
+ */
+object ReorderPredicatesInFilter extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case f @ Filter(pred, child) =>
+      // Extracts the deterministic suffix expressions from the Filter predicate.
+      val expressions = splitConjunctivePredicates(pred)
+      // The beginning index of the deterministic suffix expressions.
+      var splitIndex = -1
+      (expressions.length - 1 to 0 by -1).foreach { idx =>
+        if (splitIndex == -1 && !expressions(idx).deterministic) {
+          splitIndex = idx + 1
+        }
+      }
+      if (splitIndex == expressions.length) {
+        // All expressions are non-deterministic, no reordering.
+        f
+      } else {
+        val (nonDeterministics, deterministicExprs) = expressions.splitAt(splitIndex)
--- End diff --

Hmm, actually that's what I meant: perhaps there is some confusion between
`non-deterministic` and `non-foldable`? I think we can skip both of them in
short-circuit evaluation. Since those expressions are not `stateful`
(unfortunately, Spark SQL expressions don't have the concept of `stateful`),
skipping their evaluation is harmless, and that is exactly the short-circuit
logic of the `AND` expression.
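For reference, the splitting step in the quoted diff can be sketched as a standalone Scala snippet. This is a simplified stand-in, not the actual Catalyst rule: `Pred` is a hypothetical substitute for Catalyst's `Expression`, and the "UDFs last" cost criterion for the deterministic suffix is an assumption, since the quoted diff cuts off before the reordering itself.

```scala
// Hypothetical stand-in for Catalyst's Expression; `isUdf` is an assumed
// cost marker for "expensive" predicates.
case class Pred(name: String, deterministic: Boolean, isUdf: Boolean = false)

def reorder(preds: Seq[Pred]): Seq[Pred] = {
  // Start of the deterministic suffix: everything after the last
  // non-deterministic predicate. lastIndexWhere returns -1 when all
  // predicates are deterministic, so splitIndex becomes 0.
  val splitIndex = preds.lastIndexWhere(!_.deterministic) + 1
  if (splitIndex == preds.length) {
    preds  // all predicates are non-deterministic: nothing safe to reorder
  } else {
    val (prefix, deterministicSuffix) = preds.splitAt(splitIndex)
    // Only the deterministic suffix is reordered; the non-deterministic
    // prefix keeps its position and relative order. sortBy is stable,
    // so equally cheap predicates also keep their order.
    prefix ++ deterministicSuffix.sortBy(_.isUdf)
  }
}
```

With `'a > 1, rand(0), udf('b), 'c > 3`, only the suffix after `rand(0)` is touched, so the UDF moves behind `'c > 3` while `'a > 1` stays ahead of `rand(0)`.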





[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16245#discussion_r97211599
  

Yes. However, if the first expression in the `AND` is non-deterministic,
skipping it might change its next evaluation. So we can only reorder the
deterministic expressions that come after the non-deterministic expressions.
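The point above can be illustrated with a plain Scala sketch (not Spark code): a predicate backed by mutable state yields different answers depending on how many times it has been invoked, so letting a reordered cheap predicate short-circuit it changes what later rows see. `StatefulPred` is a hypothetical stand-in for a non-deterministic expression.

```scala
// A non-deterministic predicate whose result depends on invocation count:
// it alternates true, false, true, ...
class StatefulPred {
  private var calls = 0
  def eval(): Boolean = { calls += 1; calls % 2 == 1 }
}

// Original plan: the stateful predicate is evaluated on every row.
val everyRow = {
  val p = new StatefulPred
  Seq(true, true, true).map(_ && p.eval())  // Seq(true, false, true)
}

// After a hypothetical reordering, a cheap deterministic predicate
// short-circuits row 2, so p.eval() is skipped there. The internal
// state advances differently and row 3 now gets a different result.
val skipsRow2 = {
  val p = new StatefulPred
  Seq(true, false, true).map(_ && p.eval())  // Seq(true, false, false)
}
```

Row 3 flips from `true` to `false` purely because of the skipped evaluation, which is why only the deterministic suffix may be moved.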





[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/16245#discussion_r97211489
  

I mean `(rand() > 0) && b` should be equal to `b && (rand() > 0)`, and the
latter probably even has better performance, due to the short-circuit
evaluation of `AND`. Isn't it?
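The short-circuit effect being discussed can be seen in a plain Scala sketch (not Spark code): the operand order of `&&` determines how often the random predicate is sampled. `countSamples` and `randPred` are hypothetical names for this illustration.

```scala
import scala.util.Random

// Counts how many times a rand()-like predicate is evaluated over a set
// of rows, with the random predicate either first or second in the AND.
def countSamples(rows: Seq[Boolean], randomFirst: Boolean): Int = {
  val rng = new Random(0)
  var samples = 0
  def randPred(): Boolean = { samples += 1; rng.nextDouble() > 0 }
  rows.foreach { b =>
    if (randomFirst) randPred() && b  // random predicate always evaluated
    else b && randPred()              // skipped whenever b is false
  }
  samples
}

val rows = Seq(true, false, true, false)
val randFirst = countSamples(rows, randomFirst = true)   // sampled on all 4 rows
val bFirst = countSamples(rows, randomFirst = false)     // sampled on 2 rows only
```

Putting `b` first does save evaluations, but as noted in the reply above, each skipped sample also advances the random stream differently, which is exactly why the rule refuses to move deterministic predicates ahead of non-deterministic ones.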





[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16245#discussion_r97211371
  

Reordering non-deterministic expressions might change the evaluation
results. I think `foldable` expressions are already handled by another rule.
And as I remember, we don't have an explicit `stateful` expression; such an
expression is classified as `non-deterministic` too.





[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2017-01-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/16245#discussion_r97211330
  

I am a little confused about why we need to separate out the
`non-deterministic` expressions. Should it be `stateful` or `foldable`
instead?





[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2016-12-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16245#discussion_r91886133
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -514,6 +514,25 @@ case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
 
 
 /**
+ * Reorders the predicates in `Filter` so more expensive expressions like UDF can evaluate later.
+ */
+object ReorderPredicatesInFilter extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(pred, child) =>
+      // Reverses the expressions to get the suffix deterministic expressions in the predicate.
+      // E.g., the original expressions are 'a > 1, rand(0), 'b > 2, 'c > 3.
+      // The reversed expressions are 'c > 3, 'b > 2, rand(0), 'a > 1.
+      // The suffix deterministic expressions are 'c > 3, 'b > 2.
+      val (deterministicExprs, others) = splitConjunctivePredicates(pred).reverse
--- End diff --

The split is widely used in the optimizer. I think I may rewrite this
reverse and span to alleviate the performance concern.
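The two ways of finding the deterministic suffix can be sketched on a plain list of `deterministic` flags (a simplified stand-in for the expressions, not the actual Catalyst code):

```scala
// Flags for 'a > 1, rand(0), 'b > 2, 'c > 3 — only rand(0) is non-deterministic.
val flags = Seq(true, false, true, true)

// Original approach: reverse the list, then span on the leading run of
// deterministic flags. Builds two intermediate lists.
val suffixLen1 = flags.reverse.span(identity)._1.length

// Single-pass alternative: locate the last non-deterministic element
// directly, without intermediate collections. lastIndexWhere returns -1
// when every flag is true, making the whole list the suffix.
val suffixLen2 = flags.length - (flags.lastIndexWhere(!_) + 1)
```

Both agree that the deterministic suffix here is `'b > 2, 'c > 3` (length 2); the `lastIndexWhere` form avoids the reversed copy, which seems to be the rewrite being proposed.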





[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...

2016-12-11 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16245#discussion_r91881557
  

How is the performance of this split, reverse, and span?

