GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/16547
[SPARK-19168][Structured Streaming] Improvement: filter late data using
watermark for `Append` mode
## What changes were proposed in this pull request?
Currently, late data is filtered using the watermark only in `Update` mode;
this patch applies the same filter in `Append` mode.
Note that this is an improvement rather than a correctness fix: `Append`
mode already produces correct results without it.
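To illustrate the idea, here is a minimal sketch in plain Scala of dropping late rows before they reach the state. It is not the code in this patch: `Event`, `processBatch`, and `AppendWatermarkFilterSketch` are hypothetical names, and treating a row as late when its event time is strictly below the watermark is an assumption made for the example.

```scala
// Hypothetical model for illustration only; this is not Spark's implementation.
final case class Event(key: String, eventTimeMs: Long)

object AppendWatermarkFilterSketch {

  /** Drops late rows, then folds the remaining rows into per-key counts.
    * Returns the new state and the number of state rows touched by this batch.
    */
  def processBatch(
      state: Map[String, Long],
      batch: Seq[Event],
      watermarkMs: Long): (Map[String, Long], Int) = {
    // Assumption: a row is late when its event time is strictly below the watermark.
    val onTime = batch.filter(_.eventTimeMs >= watermarkMs)

    // Only on-time rows update the state.
    val newState = onTime.foldLeft(state) { (acc, e) =>
      acc.updated(e.key, acc.getOrElse(e.key, 0L) + 1L)
    }

    // Number of distinct state rows written in this batch.
    val updatedRows = onTime.map(_.key).distinct.size
    (newState, updatedRows)
  }

  def main(args: Array[String]): Unit = {
    val state = Map("a" -> 1L)
    // "b" arrives with an event time below the watermark, so it is dropped.
    val batch = Seq(Event("a", 20000L), Event("b", 10000L))
    val (next, updated) = processBatch(state, batch, watermarkMs = 15000L)
    println(s"state=$next updated=$updated")
  }
}
```

With the filter in place, a late row neither adds a state row nor counts as an update, which is the kind of effect the `numRowsUpdated` checks are meant to observe.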
## How was this patch tested?
Commit #1 of this patch added `numRowsUpdated` checks in
`EventTimeWatermarkSuite.scala` (lines 139-141):
```scala
AddData(inputData, 10),
CheckLastBatch(),
// We also processed the data 10, which is less than watermark
assertNumStateRows(2, 1)
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lw-lin/spark append-filter
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16547.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16547
----
commit 86d088687b7da7f33022ee8693cc3fdd9228775b
Author: Liwei Lin <[email protected]>
Date: 2017-01-11T07:15:43Z
Also examine `numRowsUpdated` in test
commit 2a91e6f8612c01b61a4d501b22fbf2690fa36f4a
Author: Liwei Lin <[email protected]>
Date: 2017-01-11T07:21:33Z
Filter data less than watermark in `Append` mode
commit c9f62c161ad94af35ad237673c85428ce6094ac5
Author: Liwei Lin <[email protected]>
Date: 2017-01-11T07:38:02Z
Fix an error message
----