GitHub user harishreedharan opened a pull request:
https://github.com/apache/spark/pull/3655
[SPARK-4704][STREAMING] Reliable Kafka Receiver can lose data if the block
generator fails to store data.
The Reliable Kafka Receiver commits offsets only after events have actually
been stored, which ensures that on restart it resumes from where it left off.
But if the store() call fails and the block generator reports an error, the
receiver does nothing and keeps reading from the current offset rather than
from the last committed offset. As a result, the messages between the last
commit and the current offset are lost.
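A toy simulation of the failure mode described above. It makes some simplifying assumptions: messages are integer offsets, each block covers a contiguous range of offsets, and the committed value is the next offset to read after a restart. `run_receiver` and `failing_blocks` are hypothetical names and are not Spark or Kafka APIs.

```python
def run_receiver(messages, block_size, failing_blocks):
    """Simulate a receiver that ignores store() failures (the pre-fix behavior).

    Returns (stored, committed): the messages that reached storage and the
    offset a restarted receiver would resume from.
    """
    stored = []
    committed = 0   # next offset to read after a restart
    position = 0    # current read position of the consumer
    block_index = 0
    while position < len(messages):
        block = messages[position:position + block_size]
        position += len(block)  # the consumer moves forward regardless of the outcome
        if block_index not in failing_blocks:
            stored.extend(block)
            committed = position  # commit only after a successful store
        # On a failed store nothing happens: no commit, but reading continues.
        block_index += 1
    return stored, committed


messages = list(range(9))
stored, committed = run_receiver(messages, block_size=3, failing_blocks={1})
lost = [m for m in messages if m not in stored]
print(stored, committed, lost)  # [0, 1, 2, 6, 7, 8] 9 [3, 4, 5]
```

The second block fails to store, but the third block's commit moves the offset to 9. A restart would therefore resume after the failed block, and offsets 3 to 5 would never be stored.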
This PR retries the store() call four times and, if it still fails, stops the
receiver with an error message and the last exception thrown by the store.
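A minimal sketch of the retry-then-stop behavior follows. It is not the code from this PR. It assumes "retries four times" means at most four store attempts per block. `store_and_commit` and its `store`, `commit` and `stop` callbacks are hypothetical stand-ins for the receiver's store(), offset commit and stop-with-error calls.

```python
def store_and_commit(block, store, commit, stop, max_attempts=4):
    """Store a block, retrying on failure; commit its offsets only on success.

    `store`, `commit` and `stop` are hypothetical callbacks. max_attempts=4
    is an assumed reading of "retries four times".
    Returns True if the block was stored and committed.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            store(block)
        except Exception as e:  # remember the most recent failure
            last_error = e
        else:
            commit(block)
            return True
    # Every attempt failed: stop rather than read past uncommitted data.
    stop("Error while storing block", last_error)
    return False


attempts = []

def flaky_store(block):
    attempts.append(block)
    if len(attempts) < 3:  # fail the first two attempts
        raise IOError("block manager unavailable")

committed = []
stopped = []
ok = store_and_commit([3, 4, 5], flaky_store, committed.append,
                      lambda msg, err: stopped.append((msg, err)))
print(ok, len(attempts), committed, stopped)  # True 3 [[3, 4, 5]] []
```

Stopping the receiver instead of moving on means no later block can commit past the failed one. When the receiver is next started, it resumes from the last committed offset.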
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/harishreedharan/spark kafka-failure-fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3655.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3655
----
commit 5e2e7ad479d2739c4f1bd62fd1d48b216b2bdce0
Author: Hari Shreedharan
Date: 2014-12-10T01:44:39Z
[SPARK-4704][STREAMING] Reliable Kafka Receiver can lose data if the block
generator fails to store data.
----