[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r214505595

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1357,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
--- End diff --

super nit: the comment at https://github.com/apache/spark/pull/22205/files#diff-a636a87d8843eeccca90140be91d4fafR1348 was not updated.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22205
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213370527

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerRuleExclusionSuite.scala ---
@@ -21,7 +21,9 @@
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.internal.SQLConf.OPTIMIZER_EXCLUDED_RULES
+import org.apache.spark.util.Utils
--- End diff --

Not needed?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213370444

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala ---
@@ -35,10 +36,18 @@ trait SharedSparkSession with Eventually { self: Suite =>
   protected def sparkConf = {
+    val excludeRules = if (getClass.getName.startsWith("org.apache.spark.sql")) {
+      // exclude this rule because it reduces code coverage by collapsing whole
+      // query plan into LocalRelation
+      Seq(ConvertToLocalRelation.ruleName)
+    } else {
+      Nil
+    }
     new SparkConf()
       .set("spark.hadoop.fs.file.impl", classOf[DebugFilesystem].getName)
       .set("spark.unsafe.exceptionOnMemoryLeak", "true")
       .set(SQLConf.CODEGEN_FALLBACK.key, "false")
+      .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, excludeRules.mkString(","))
--- End diff --

The same thing is needed for `TestHive`.
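The exclusion mechanism being configured here can be modeled outside Spark. A minimal sketch of how a comma-separated `excludedRules` string filters an optimizer's rule list by rule name — the toy `Rule` type and rule set below are illustrative, not Catalyst's API:

```scala
// Toy model of spark.sql.optimizer.excludedRules: the config value is a
// comma-separated list of rule names, and matching rules are dropped.
object ExcludedRulesSketch {
  // Illustrative stand-in for a Catalyst rule: just a name.
  final case class Rule(ruleName: String)

  val allRules: Seq[Rule] =
    Seq(Rule("ConvertToLocalRelation"), Rule("CombineUnions"), Rule("ConstantFolding"))

  // Parse the config value and keep only rules whose names are not excluded.
  def effectiveRules(excludedConf: String): Seq[Rule] = {
    val excluded = excludedConf.split(",").map(_.trim).filter(_.nonEmpty).toSet
    allRules.filterNot(r => excluded.contains(r.ruleName))
  }

  def main(args: Array[String]): Unit = {
    // Excluding ConvertToLocalRelation, as the test harness above does.
    println(effectiveRules("ConvertToLocalRelation").map(_.ruleName).mkString(","))
  }
}
```

An empty config value leaves the rule list untouched, which matches the `Nil`/`mkString(",")` path in the diff above.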
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213124828

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+      if !hasUnevaluableExpr(condition) =>
--- End diff --

I suppose it is fine in this case. The only thing is that it violates the contract of the optimizer: it should not change the results of a query.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213113632

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+      if !hasUnevaluableExpr(condition) =>
--- End diff --

I am fine with introducing this change, since we already did the same for Project(LocalRelation).
Github user bogdanrdc commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213108632

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+      if !hasUnevaluableExpr(condition) =>
--- End diff --

ConvertToLocalRelation was already doing this for Project, so I assumed it is OK. What I did is exactly how Project(LocalRelation) works.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213105696

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+      if !hasUnevaluableExpr(condition) =>
--- End diff --

We block all these optimizations in the other optimizer rules, e.g., ConstantFolding. Non-deterministic expressions are not foldable.
Github user bogdanrdc commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213102214

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+      if !hasUnevaluableExpr(condition) =>
--- End diff --

That is OK, right?
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213101120

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] {
     case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) =>
       LocalRelation(output, data.take(limit), isStreaming)
+
+    case Filter(condition, LocalRelation(output, data, isStreaming))
+      if !hasUnevaluableExpr(condition) =>
--- End diff --

If the condition is non-deterministic, its values become fixed once the plan is optimized: a DataFrame with a non-deterministic filter will then always return the same result.
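The semantics point above can be illustrated outside Spark: if a non-deterministic predicate is evaluated once at optimization time, the surviving rows are baked into the local relation, so every later execution returns the same rows. A toy sketch, not Spark code:

```scala
import scala.util.Random

object NonDeterministicFilterSketch {
  val data: Seq[Int] = (1 to 100).toList

  // Re-evaluated on every execution, like rand() < 0.5 in a real query.
  def executeUnoptimized(): Seq[Int] = data.filter(_ => Random.nextBoolean())

  // "Optimized": the predicate ran exactly once, and the surviving rows were
  // materialized; every later execution just returns these same rows.
  val optimizedRelation: Seq[Int] = data.filter(_ => Random.nextBoolean())
  def executeOptimized(): Seq[Int] = optimizedRelation

  def main(args: Array[String]): Unit = {
    // Frozen: identical rows on every call.
    println(executeOptimized() == executeOptimized())
    // executeUnoptimized(), by contrast, may differ from call to call.
  }
}
```

This is the trade-off the reviewers weigh here: collapsing `Filter(LocalRelation)` eagerly changes when (and how many times) a non-deterministic condition is evaluated.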
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213100428

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
     // since the other rules might make two separate Unions operators adjacent.
     Batch("Union", Once, CombineUnions) ::
+    // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --

Can you put your explanation above into the code comment?
Github user bogdanrdc commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r212918379

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
     // since the other rules might make two separate Unions operators adjacent.
     Batch("Union", Once, CombineUnions) ::
+    // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --

It makes the optimizer faster for short queries; see the code above. Without this change, a query such as `Filter(LocalRelation)` would go through all the heavy optimizer rules. With this change, the query becomes just `LocalRelation`, which does not trigger many rules.
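The cost argument can be sketched with a toy two-phase optimizer: when an early, cheap pass collapses `Filter(LocalRelation)` into a `LocalRelation`, the later, heavier batch finds a trivial plan and does no work. The plan ADT and the "heavy batch" below are illustrative stand-ins, not Catalyst's:

```scala
object EarlyBatchSketch {
  sealed trait Plan
  final case class LocalRelation(rows: Seq[Int]) extends Plan
  final case class Filter(pred: Int => Boolean, child: Plan) extends Plan

  // Early, cheap rule: evaluate the filter against local rows immediately.
  def convertToLocalRelation(p: Plan): Plan = p match {
    case Filter(pred, LocalRelation(rows)) => LocalRelation(rows.filter(pred))
    case other => other
  }

  // Stand-in for the expensive rule batches: counts attempted work.
  def heavyBatch(p: Plan, countWork: () => Unit): Plan = p match {
    case lr: LocalRelation => lr // trivial plan, nothing to optimize
    case other => countWork(); other
  }

  def main(args: Array[String]): Unit = {
    var heavyWork = 0
    val query: Plan = Filter(_ % 2 == 0, LocalRelation((1 to 10).toList))
    // With the early batch, the heavy batch only ever sees a LocalRelation.
    heavyBatch(convertToLocalRelation(query), () => heavyWork += 1)
    println(heavyWork) // 0
    // Without it, the heavy batch has to process the Filter node.
    heavyBatch(query, () => heavyWork += 1)
    println(heavyWork) // 1
  }
}
```

In real Catalyst the saving is larger than a single rule invocation, since each batch runs many rules, possibly to fixed point.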
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r212894435

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -130,6 +130,10 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
     // since the other rules might make two separate Unions operators adjacent.
     Batch("Union", Once, CombineUnions) ::
+    // run this once earlier. this might simplify the plan and reduce cost of optimizer
--- End diff --

Hi @bogdanrdc, could you explain more about this? Why is it necessary?
GitHub user bogdanrdc opened a pull request: https://github.com/apache/spark/pull/22205

[SPARK-25212][SQL] Support Filter in ConvertToLocalRelation

## What changes were proposed in this pull request?

Support Filter in ConvertToLocalRelation, similar to how Project works. Additionally, in Optimizer, run ConvertToLocalRelation earlier to simplify the plan. This is good for very short queries, which often are queries on local relations.

## How was this patch tested?

New test. Manual benchmark.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bogdanrdc/spark local-relation-filter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22205.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22205

commit 2a1cd27d7f188b51429553718cac91d87d667bbd
Author: Bogdan Raducanu
Date: 2018-08-23T13:17:11Z

    optimization + test

commit 421ee20e27c6d1ba9384528c765690103f407d34
Author: Bogdan Raducanu
Date: 2018-08-23T15:07:18Z

    debug benchmark + early batch
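The shape of the new rule can be modeled end-to-end in plain Scala: eagerly evaluate the filter predicate over local rows, but leave the plan untouched when the condition cannot be evaluated at optimization time (e.g. it contains a subquery). All types and names below are illustrative stand-ins, not Catalyst's:

```scala
object FilterLocalRelationSketch {
  type Row = Map[String, Int]
  // An "expression": a predicate plus whether it can be evaluated eagerly.
  final case class Condition(pred: Row => Boolean, evaluable: Boolean)

  sealed trait Plan
  final case class LocalRelation(output: Seq[String], data: Seq[Row]) extends Plan
  final case class Filter(condition: Condition, child: Plan) extends Plan

  // Mirrors the guarded case this PR adds to ConvertToLocalRelation:
  // only an evaluable condition lets the Filter collapse into its child.
  def convertToLocalRelation(plan: Plan): Plan = plan match {
    case Filter(cond, LocalRelation(output, data)) if cond.evaluable =>
      LocalRelation(output, data.filter(cond.pred))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val rel = LocalRelation(Seq("a"), Seq(Map("a" -> 1), Map("a" -> 2), Map("a" -> 3)))
    val gt1 = Condition(row => row("a") > 1, evaluable = true)
    println(convertToLocalRelation(Filter(gt1, rel))) // collapsed to a LocalRelation
    // An unevaluable condition is left for later planning phases.
    val unevaluable = Condition(_ => true, evaluable = false)
    println(convertToLocalRelation(Filter(unevaluable, rel))) // still a Filter
  }
}
```

This mirrors how the existing `Project(LocalRelation)` case works: the rule pre-computes the operator's result row-by-row and replaces the subtree with a new `LocalRelation`.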