GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/21701
[SPARK-24730][SS] Add policy to choose max as global watermark when
streaming query has multiple watermarks
## What changes were proposed in this pull request?
Currently, when a streaming query has multiple watermark, the policy is to
choose the min of them as the global watermark. This is safe to do as the
global watermark moves with the slowest stream, and is therefore is safe as it
does not unexpectedly drop some data as late, etc. While this is indeed the
safe thing to do, in some cases, you may want the watermark to advance with the
fastest stream, that is, take the max of multiple watermarks. This PR is to add
that configuration. It makes the following changes.
- Adds a configuration to specify max as the policy.
- Saves the configuration in OffsetSeqMetadata because changing it in the
middle can lead to unpredictable results.
- For old checkpoints without the configuration, it assumes the default
policy as min (irrespective of the policy set at the session where the query is
being restarted). This is to ensure that existing queries are affected in any
way.
- [ ] Add a test for recovery from existing checkpoints.
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
Please review http://spark.apache.org/contributing.html before opening a
pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-24730
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21701.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21701
----
commit c0d1c6e0a5532eeab0848834d2dc348808e54069
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-07-03T04:28:05Z
Implemented
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]