GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/14214
[SPARK-16545][SQL] Eliminate one unnecessary round of physical planning in ForeachSink

## Problem

As reported by [SPARK-16545](https://issues.apache.org/jira/browse/SPARK-16545), in `ForeachSink` we have initiated 3 rounds of physical planning. Specifically:

[1] In `StreamExecution`, [lastExecution.executedPlan](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L369)

[2] In `ForeachSink`, [foreachPartition()](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L69) calls withNewExecutionId(..., **_queryExecution_**), which in turn calls [**_queryExecution_**.executedPlan](https://github.com/apache/spark/blob/9a5071996b968148f6b9aba12e0d3fe888d9acd8/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L55)

[3] In `ForeachSink`, [val rdd = { ... incrementalExecution = new IncrementalExecution ...}](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L53)

## What changes were proposed in this pull request?

[1] should not be eliminated in general;

**[2] is eliminated by this patch, by replacing the `queryExecution` with the `incrementalExecution` provided by [3];**

[3] should be eliminated, but that cannot be done at this stage; let's revisit it when SPARK-16264 is resolved.

## How was this patch tested?
- checked manually that there are now only 2 rounds of physical planning in ForeachSink after this patch
- existing tests ensure it causes no regression

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark physical-3x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14214

----
commit 8ec635fe7403baf5149e3f6714872bf706b37cd7
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-07-15T02:12:02Z

    Fix foreachPartition

----
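
For readers following the round counting in the Problem section, here is a minimal toy sketch of why handing a different execution object to the ID-tracking wrapper costs an extra planning round. It assumes only that the plan is computed lazily, once per object, on first access. `ToyExecution`, `withExecution`, and `run` are hypothetical stand-ins written for this illustration; they are not Spark's actual classes or signatures.

```scala
// Toy model only: every name here is a hypothetical stand-in, not Spark's API.
object PlanningRounds {
  // Counts how many times "physical planning" runs across all instances.
  var planningRounds = 0

  // Stand-in for a QueryExecution-like object: planning is a lazy val,
  // so it runs at most once per instance, on first access.
  class ToyExecution(val name: String) {
    lazy val executedPlan: String = {
      planningRounds += 1
      s"plan($name)"
    }
  }

  // Stand-in for a withNewExecutionId-style wrapper: it touches the
  // executedPlan of whichever execution object it is handed.
  def withExecution[T](qe: ToyExecution)(body: => T): T = {
    val plan = qe.executedPlan // plans only if qe has not been planned yet
    body
  }

  // Returns the number of planning rounds for one simulated batch.
  def run(reuseIncremental: Boolean): Int = {
    planningRounds = 0

    // [1] the stream's own execution is planned once
    val lastExecution = new ToyExecution("stream")
    val streamPlan = lastExecution.executedPlan

    // the dataset's own execution object (the source of round [2])
    val queryExecution = new ToyExecution("dataset")

    // [3] the sink builds and plans a fresh incremental execution
    val incrementalExecution = new ToyExecution("incremental")
    val incrementalPlan = incrementalExecution.executedPlan

    // [2] before the patch the wrapper gets queryExecution (a new round);
    // after the patch it gets the already-planned incrementalExecution.
    val tracked = if (reuseIncremental) incrementalExecution else queryExecution
    withExecution(tracked) { () }

    planningRounds
  }

  def main(args: Array[String]): Unit = {
    println(s"before: ${run(reuseIncremental = false)} rounds") // before: 3 rounds
    println(s"after: ${run(reuseIncremental = true)} rounds")   // after: 2 rounds
  }
}
```

In this toy model, reusing the already-planned object removes exactly one round, which matches the 3-to-2 reduction reported under testing. It says nothing about the real cost of each round in Spark.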