GitHub user jose-torres opened a pull request:
https://github.com/apache/spark/pull/21239
[SPARK-24040][SS] Support single partition aggregates in continuous
processing.
## What changes were proposed in this pull request?
Support aggregates with exactly 1 partition in continuous processing.
A few small tweaks are needed to make this work:
* Replace currentEpoch tracking with a ThreadLocal. This means that the
current epoch is scoped to a task rather than a node, but I think that's
sustainable even once we add shuffle.
* Add a new testing-only flag to disable the UnsupportedOperationChecker
whitelist of allowed continuous processing nodes. I think this is preferable to
writing a pile of custom logic to enforce that there is in fact only 1
partition; we plan to support multi-partition aggregates before the next Spark
release, so we'd just have to tear that logic back out.
* Restart continuous processing queries from the first available
uncommitted epoch, rather than one that's guaranteed to be unused. This is
required for stateful operators to overwrite partial state from the previous
attempt at the epoch, and there was no specific motivation for the original
strategy. In another PR before stabilizing the StreamWriter API, we'll need to
narrow down and document more precise semantic guarantees for the epoch IDs.
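To illustrate the first bullet, here is a minimal sketch of task-scoped epoch tracking with a ThreadLocal. It is not the code from this patch: the class name EpochTrackerSketch, its method names, and the -1 "uninitialized" value are all assumptions made for illustration, and a plain thread stands in for a task.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; names and the -1 sentinel are illustrative assumptions,
// not taken from the patch.
final class EpochTrackerSketch {
    // Each thread (standing in for a task) lazily gets its own counter,
    // so two tasks on the same node never share an epoch value.
    private static final ThreadLocal<AtomicLong> CURRENT_EPOCH =
        ThreadLocal.withInitial(() -> new AtomicLong(-1L));

    private EpochTrackerSketch() {}

    // Sets the starting epoch for the calling thread only.
    static void initializeCurrentEpoch(long startEpoch) {
        CURRENT_EPOCH.get().set(startEpoch);
    }

    // Returns the calling thread's epoch, or -1 if it was never initialized.
    static long getCurrentEpoch() {
        return CURRENT_EPOCH.get().get();
    }

    // Advances the calling thread's epoch by one.
    static void incrementCurrentEpoch() {
        CURRENT_EPOCH.get().incrementAndGet();
    }
}
```

With this shape, a thread that never called initializeCurrentEpoch sees only its own default value, regardless of what other threads on the same JVM have set.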
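The restart rule in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: that committed epochs are contiguous, that the first epoch of a fresh query is 0, and that the latest committed epoch is available as an optional value. RestartEpochSketch and firstUncommittedEpoch are hypothetical names, not identifiers from the patch.

```java
import java.util.OptionalLong;

// Hypothetical sketch; assumes contiguous commits and a first epoch of 0.
final class RestartEpochSketch {
    private RestartEpochSketch() {}

    // Returns the epoch a restarted query would resume from: the one right
    // after the latest committed epoch, or 0 if nothing has committed yet.
    // Resuming here means a stateful operator rewrites the same epoch and
    // overwrites partial state left behind by the failed attempt.
    static long firstUncommittedEpoch(OptionalLong latestCommittedEpoch) {
        return latestCommittedEpoch.isPresent()
            ? latestCommittedEpoch.getAsLong() + 1L
            : 0L;
    }
}
```

By contrast, picking an epoch that is guaranteed to be unused would skip past any partially written epoch, leaving its partial state in place instead of overwriting it.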
## How was this patch tested?
New unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jose-torres/spark withAggr
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21239.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21239
----
commit c620978f98cda9b178afb08b87041d6154d5edd0
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-05-04T21:34:22Z
rebase on master
commit 4dbc10db1acae62e415c12da2d21dd2428692a7d
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-05-04T21:38:26Z
suite got left out of commit
commit 9b4aecd01951eb9c31671535e2a081a484a39d58
Author: Jose Torres <torres.joseph.f+github@...>
Date: 2018-05-04T22:15:48Z
move to EpochTracker
----