[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-31 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r214505595
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1357,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
 
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
--- End diff --

super nit: the comment at https://github.com/apache/spark/pull/22205/files#diff-a636a87d8843eeccca90140be91d4fafR1348 was not changed.


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22205


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213370527
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerRuleExclusionSuite.scala ---
@@ -21,7 +21,9 @@ import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.OPTIMIZER_EXCLUDED_RULES
+import org.apache.spark.util.Utils
--- End diff --

Not needed?


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213370444
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala ---
@@ -35,10 +36,18 @@ trait SharedSparkSession
   with Eventually { self: Suite =>
 
   protected def sparkConf = {
+    val excludeRules = if (getClass.getName.startsWith("org.apache.spark.sql")) {
+      // exclude this rule because it reduces code coverage by collapsing whole
+      // query plan into LocalRelation
+      Seq(ConvertToLocalRelation.ruleName)
+    } else {
+      Nil
+    }
     new SparkConf()
       .set("spark.hadoop.fs.file.impl", classOf[DebugFilesystem].getName)
       .set("spark.unsafe.exceptionOnMemoryLeak", "true")
       .set(SQLConf.CODEGEN_FALLBACK.key, "false")
+      .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, excludeRules.mkString(","))
--- End diff --

The same thing is needed for `TestHive`
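
A hedged sketch of what that could look like (hypothetical snippet, not taken from this PR; TestHive assembles its own `SparkConf`, so the exclusion would have to be repeated there):

```scala
// Hypothetical sketch, not from this PR: the Hive test harness would need the
// same OPTIMIZER_EXCLUDED_RULES entry when it builds its SparkConf.
import org.apache.spark.SparkConf
import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
import org.apache.spark.sql.internal.SQLConf

val hiveTestConf = new SparkConf()
  .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)
```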


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213124828
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
 
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+        if !hasUnevaluableExpr(condition) =>
--- End diff --

I suppose it is fine in this case. The only thing is that it violates the 
contract of the optimizer: it should not change the results of a query.
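
To make the concern concrete, a hedged illustration (my own snippet, not code from the PR): a non-deterministic predicate is normally re-evaluated on every action, which is exactly what eager evaluation inside the optimizer would freeze.

```scala
// Illustration only: folding a non-deterministic filter into a LocalRelation
// at optimization time would pin the sampled rows into the plan.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

val spark = SparkSession.builder().master("local[1]").appName("nondeterministic-filter").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3, 4, 5).toDF("id").filter(rand() < 0.5)
df.count()  // may differ between actions while the predicate is re-evaluated per run
df.count()  // ...but becomes fixed once the filter is evaluated eagerly by the optimizer
```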


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213113632
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
 
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+        if !hasUnevaluableExpr(condition) =>
--- End diff --

I am fine with introducing this change since we already did it in Project(LocalRelation).


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread bogdanrdc
Github user bogdanrdc commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213108632
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
 
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+        if !hasUnevaluableExpr(condition) =>
--- End diff --

ConvertToLocalRelation was already doing this for Project, so I assumed it's OK. What I did is exactly how Project(LocalRelation) works.
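
For comparison, a rough paraphrase (from memory, not a quote of this diff) of the Project case inside the rule's `transform` that the new Filter case mirrors:

```scala
// Paraphrased sketch of the pre-existing Project-over-LocalRelation case: the
// projection is evaluated eagerly against the local rows, guarded by the same
// unevaluable-expression check that the new Filter case reuses.
case Project(projectList, LocalRelation(output, data, isStreaming))
    if !projectList.exists(hasUnevaluableExpr) =>
  val projection = new InterpretedMutableProjection(projectList, output)
  projection.initialize(0)
  LocalRelation(projectList.map(_.toAttribute),
    data.map(projection(_).copy()), isStreaming)
```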


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213105696
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
 
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+        if !hasUnevaluableExpr(condition) =>
--- End diff --

We block all these optimizations in the other optimizer rules, e.g., ConstantFolding. Non-deterministic expressions are not foldable.
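
A hedged illustration of that point (my own snippet; the `foldable` flags are as I recall them for Catalyst expressions):

```scala
// Illustration only: ConstantFolding replaces expressions whose `foldable`
// flag is true; non-deterministic expressions report foldable = false and
// therefore survive optimization untouched.
import org.apache.spark.sql.catalyst.expressions.{Add, Literal, Rand}

println(Add(Literal(1), Literal(2)).foldable)  // expected: true  -> folded to Literal(3)
println(Rand(Literal(42L)).foldable)           // expected: false -> left in the plan
```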


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread bogdanrdc
Github user bogdanrdc commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213102214
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
 
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+        if !hasUnevaluableExpr(condition) =>
--- End diff --

That is OK, right?


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213101120
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
 
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+        if !hasUnevaluableExpr(condition) =>
--- End diff --

If the condition is non-deterministic, the values will always be the same after the plan is optimized. A DataFrame with non-deterministic filters will always return the same result.


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r213100428
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
     //   since the other rules might make two separate Unions operators adjacent.
     Batch("Union", Once,
       CombineUnions) ::
+    // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --

Can you put your comment above into the comment in the code?


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread bogdanrdc
Github user bogdanrdc commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r212918379
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
     //   since the other rules might make two separate Unions operators adjacent.
     Batch("Union", Once,
       CombineUnions) ::
+    // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --

It makes the optimizer faster for short queries; see the code above. Without this change, a query such as `Filter(LocalRelation)` would go through all the heavy optimizer rules. With this change, the query becomes just `LocalRelation`, which doesn't trigger many rules.
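
A hedged way to see the effect from the user side (my own snippet, assuming a local `SparkSession`; not part of the PR):

```scala
// Illustration only: once Filter over a LocalRelation is handled by
// ConvertToLocalRelation, the optimized plan of a short in-memory query is
// expected to collapse to a single LocalRelation node.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("local-relation-demo").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name").filter($"id" > 1)
println(df.queryExecution.optimizedPlan.treeString)  // expected: a bare LocalRelation
```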


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22205#discussion_r212894435
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
     //   since the other rules might make two separate Unions operators adjacent.
     Batch("Union", Once,
       CombineUnions) ::
+    // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --

Hi @bogdanrdc, could you explain more about this? Why is it necessary?


---




[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-23 Thread bogdanrdc
GitHub user bogdanrdc opened a pull request:

https://github.com/apache/spark/pull/22205

[SPARK-25212][SQL] Support Filter in ConvertToLocalRelation

## What changes were proposed in this pull request?
Support Filter in ConvertToLocalRelation, similar to how Project works.
Additionally, run ConvertToLocalRelation earlier in the Optimizer to simplify the plan. This is good for very short queries, which are often queries on local relations.
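
A minimal, self-contained sketch of the idea (my reading of the proposal, assuming an `InterpretedPredicate` helper and a `hasUnevaluableExpr` guard as discussed in the review; the merged diff is authoritative):

```scala
// Sketch only: evaluate the filter predicate eagerly against the local rows,
// mirroring the existing Project case and skipping unevaluable conditions.
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression, InterpretedPredicate, Unevaluable}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

object ConvertFilterToLocalRelationSketch extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(condition, LocalRelation(output, data, isStreaming))
        if !hasUnevaluableExpr(condition) =>
      val predicate = InterpretedPredicate.create(condition, output)
      predicate.initialize(0)
      LocalRelation(output, data.filter(row => predicate.eval(row)), isStreaming)
  }

  // Skip conditions that contain expressions which cannot be evaluated eagerly.
  private def hasUnevaluableExpr(expr: Expression): Boolean =
    expr.find(e => e.isInstanceOf[Unevaluable] && !e.isInstanceOf[AttributeReference]).isDefined
}
```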


## How was this patch tested?
New test. Manual benchmark.
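
A hypothetical shape for such a test (names and structure invented, not the PR's actual test code):

```scala
// Sketch only: optimize Filter(LocalRelation) with just this rule enabled and
// check that the plan collapses to a LocalRelation.
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
import org.apache.spark.sql.catalyst.plans.PlanTest
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.RuleExecutor

class ConvertFilterToLocalRelationSuiteSketch extends PlanTest {

  private object Optimize extends RuleExecutor[LogicalPlan] {
    val batches = Batch("LocalRelation", FixedPoint(100), ConvertToLocalRelation) :: Nil
  }

  test("filter on a local relation is evaluated eagerly") {
    val query = LocalRelation('a.int, 'b.int).where('a > 1).analyze
    val optimized = Optimize.execute(query)
    assert(optimized.isInstanceOf[LocalRelation])
  }
}
```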

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bogdanrdc/spark local-relation-filter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22205.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22205


commit 2a1cd27d7f188b51429553718cac91d87d667bbd
Author: Bogdan Raducanu 
Date:   2018-08-23T13:17:11Z

optimization + test

commit 421ee20e27c6d1ba9384528c765690103f407d34
Author: Bogdan Raducanu 
Date:   2018-08-23T15:07:18Z

debug benchmark + early batch




---
