GitHub user brkyvz opened a pull request:
https://github.com/apache/spark/pull/19467
[SPARK-22238] Fix plan resolution bug caused by EnsureStatefulOpPartitioning
## What changes were proposed in this pull request?
In EnsureStatefulOpPartitioning, we check that the input RDD of a SparkPlan
has the partitioning that streaming stateful operators expect. The problem is
that we are not allowed to access this information during planning.
We added that check because CoalesceExec could actually create RDDs with
0 partitions. This PR fixes CoalesceExec so that when it reports
SinglePartition, the RDD it produces (the input RDD seen by the stateful
operator) really has 1 partition instead of 0.
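The invariant can be illustrated with a toy model in Python. This is not Spark's code. `naive_coalesce` and `fixed_coalesce` are hypothetical helpers, and each partition is assumed to be a plain list. The naive version mirrors the behaviour described above: with no parent partitions it returns zero partitions even when exactly one was requested. The fixed version returns a single empty partition in that case, so a "single partition" claim always holds.

```python
from typing import List, Sequence

# Hypothetical model: a partition is just a list of rows.
Partition = List[object]


def naive_coalesce(parents: Sequence[Partition], num_partitions: int) -> List[Partition]:
    """Merge parent partitions into at most num_partitions groups without a shuffle.

    With zero parent partitions this yields zero partitions, even if the
    caller asked for exactly one (the bug described in the PR).
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    groups = min(num_partitions, len(parents))
    result: List[Partition] = [[] for _ in range(groups)]
    # Round-robin parents into groups; the loop does not run when parents is empty.
    for i, part in enumerate(parents):
        result[i % groups].extend(part)
    return result


def fixed_coalesce(parents: Sequence[Partition], num_partitions: int) -> List[Partition]:
    """Same as naive_coalesce, but never returns 0 partitions when 1 is requested."""
    if num_partitions == 1 and len(parents) == 0:
        # Emit one empty partition so a single-partition claim stays true.
        return [[]]
    return naive_coalesce(parents, num_partitions)


if __name__ == "__main__":
    print(len(naive_coalesce([], 1)))  # 0: violates the single-partition claim
    print(len(fixed_coalesce([], 1)))  # 1: one empty partition
    print(fixed_coalesce([[1, 2], [3], [4, 5]], 1))  # [[1, 2, 3, 4, 5]]
```

With the invariant enforced where the partitioning is declared, a downstream check no longer has to inspect the concrete RDD to confirm it.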
## How was this patch tested?
Regression test in StreamingQuerySuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/brkyvz/spark stateful-op
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19467.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19467
----
commit 961ade120f7a179751e5ec45b24e159259de0bae
Author: Burak Yavuz <[email protected]>
Date: 2017-10-10T22:02:48Z
Fix plan resolution bug caused by EnsureStatefulOpPartitioning
----