GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/14214
[SPARK-16545][SQL] Eliminate one unnecessary round of physical planning in
ForeachSink
## Problem
As reported in
[SPARK-16545](https://issues.apache.org/jira/browse/SPARK-16545),
`ForeachSink` currently triggers 3 rounds of physical planning.
Specifically:
[1] In `StreamExecution`,
[lastExecution.executedPlan](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L369)
[2] In `ForeachSink`,
[foreachPartition()](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L69)
calls `withNewExecutionId(..., queryExecution)`, which in turn calls
[**_queryExecution_**.executedPlan](https://github.com/apache/spark/blob/9a5071996b968148f6b9aba12e0d3fe888d9acd8/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L55)
[3] In `ForeachSink`, [val rdd = { ... incrementalExecution = new
IncrementalExecution
...}](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L53)
## What changes were proposed in this pull request?
[1] should not be eliminated in general;
**[2] is eliminated by this patch, by replacing the `queryExecution` with
`incrementalExecution` provided by [3];**
[3] should be eliminated, but this cannot be done at this stage; let's
revisit it once SPARK-16264 is resolved.
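
To illustrate why reusing the execution from [3] removes round [2], here is a minimal toy model. It is only a sketch and does not use Spark's real classes: `ToyQueryExecution`, `batchBefore`, `batchAfter`, and `countRounds` are hypothetical names, and the model assumes each planning round corresponds to forcing a lazily built plan on a distinct execution object.

```scala
// Toy model only: these are NOT Spark's classes. The names mirror the PR text for readability.
object PlanningRounds {
  // Counts how many physical plans have been built across all toy executions.
  var planningRounds: Int = 0

  // Stand-in for an execution object whose physical plan is built lazily, once per instance.
  class ToyQueryExecution(val name: String) {
    lazy val executedPlan: String = {
      planningRounds += 1
      s"physical plan of $name"
    }
  }

  // Stand-in for withNewExecutionId: it forces the plan of whichever execution it is given.
  def withNewExecutionId[T](qe: ToyQueryExecution)(body: => T): T = {
    val forced = qe.executedPlan
    body
  }

  // Round [1]: the stream execution plans its own lastExecution.
  private def streamExecutionRound(): Unit = {
    val lastExecution = new ToyQueryExecution("lastExecution")
    val forced = lastExecution.executedPlan
  }

  // Before the patch: the sink hands the Dataset's execution to withNewExecutionId ([2])
  // and separately builds and plans its own incremental execution ([3]).
  def batchBefore(): Unit = {
    streamExecutionRound()
    val queryExecution = new ToyQueryExecution("data.queryExecution")
    val incrementalExecution = new ToyQueryExecution("incrementalExecution")
    withNewExecutionId(queryExecution) {
      incrementalExecution.executedPlan
    }
    ()
  }

  // After the patch: the same incremental execution is passed to withNewExecutionId,
  // so its lazily built plan is reused instead of planning a second object.
  def batchAfter(): Unit = {
    streamExecutionRound()
    val incrementalExecution = new ToyQueryExecution("incrementalExecution")
    withNewExecutionId(incrementalExecution) {
      incrementalExecution.executedPlan
    }
    ()
  }

  // Resets the counter, runs one toy batch, and reports how many plans were built.
  def countRounds(run: => Unit): Int = {
    planningRounds = 0
    run
    planningRounds
  }
}
```

In this model, `PlanningRounds.countRounds(PlanningRounds.batchBefore())` yields 3 and `PlanningRounds.countRounds(PlanningRounds.batchAfter())` yields 2, matching the counts described in this pull request.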
## How was this patch tested?
- manually checked that, with this patch, there are only 2 rounds of
physical planning in ForeachSink
- existing tests ensure it causes no regression
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lw-lin/spark physical-3x
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14214.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14214
----
commit 8ec635fe7403baf5149e3f6714872bf706b37cd7
Author: Liwei Lin <[email protected]>
Date: 2016-07-15T02:12:02Z
Fix foreachPartition
----