GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/22207
[SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated
records when `failOnDataLoss=false`
## What changes were proposed in this pull request?
When offsets are missing and `failOnDataLoss=false`, the Kafka v2 source
may return duplicate records.
This PR fixes the issue and also adds regression tests for all Kafka
readers.
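
As a rough illustration only (a toy model, not the Spark code touched by this PR), the sketch below shows one way a reader that tolerates missing offsets could emit duplicates: if a lenient fetch skips ahead to the next available record but the reader advances its cursor by only one, the same record is fetched again. The names `fetch`, `read_range`, and `advance_past_record` are hypothetical and exist only for this example; the actual cause and fix are in the PR diff.

```python
def fetch(log, offset, until):
    """Return the first available (offset, value) in [offset, until), or None.

    Models a lenient read that silently skips missing offsets
    (an assumption standing in for failOnDataLoss=false behavior).
    """
    for o in range(offset, until):
        if o in log:
            return o, log[o]
    return None


def read_range(log, start, until, advance_past_record):
    """Read all records in [start, until) using the lenient fetch above."""
    out = []
    next_offset = start
    while next_offset < until:
        rec = fetch(log, next_offset, until)
        if rec is None:
            break  # nothing left in the range
        out.append(rec)
        if advance_past_record:
            # Move the cursor past the record that was actually returned.
            next_offset = rec[0] + 1
        else:
            # Naive advance: ignores that the fetch may have skipped ahead.
            next_offset += 1
    return out


# Offsets 2 and 3 are missing from this toy partition.
log = {0: "a", 1: "b", 4: "c", 5: "d"}
buggy = read_range(log, 0, 6, advance_past_record=False)
fixed = read_range(log, 0, 6, advance_past_record=True)
```

In this toy model, the naive cursor returns the record at offset 4 once for each of the requested offsets 2, 3, and 4, while advancing past the returned record's offset yields each record exactly once.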
## How was this patch tested?
New tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zsxwing/spark SPARK-25214
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22207.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22207
----
commit f2d4d67c765a298d23964b26ec07596839f008fa
Author: Shixiong Zhu <zsxwing@...>
Date: 2018-08-23T17:46:52Z
Fix the issue that Kafka v2 source may return duplicated records when is
----