GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15387
[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice
## What changes were proposed in this pull request?
Kafka consumers can't subscribe or maintain heartbeat without polling, but
polling ordinarily consumes messages and adjusts position. We don't want this
on the driver, so we poll with a timeout of 0 and pause all topic partitions.
Some consumer strategies that seek to particular positions have to poll
first, but they weren't pausing immediately thereafter. Thus, there was a race
condition where the second poll() in the DStream start method might actually
adjust consumer position.
This patch eliminates (or at least drastically reduces the chance of) the
race condition by pausing in the relevant consumer strategies, and asserts on
startup that no messages were consumed.
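The ordering issue described above can be illustrated with a small, deterministic toy model. This is only a sketch: `StubConsumer`, `pauseAll`, `positionAfterStart`, and the partition name `topic-0` are hypothetical names made up for illustration. They are not the Kafka client API or the code in this patch, and the real race depends on fetch timing rather than happening on every run.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PausedPollSketch {

    /** Hypothetical stand-in for a consumer; this is NOT the Kafka client API. */
    static final class StubConsumer {
        private final Map<String, Long> logEnd;                      // latest offset per partition
        private final Map<String, Long> position = new HashMap<>();  // next offset to read
        private final Set<String> paused = new HashSet<>();
        private boolean assigned = false;

        StubConsumer(Map<String, Long> logEnd) {
            this.logEnd = new HashMap<>(logEnd);
        }

        /** First call only obtains an assignment; later calls fetch from unpaused partitions. */
        int poll() {
            if (!assigned) {
                for (String tp : logEnd.keySet()) {
                    position.put(tp, 0L);
                }
                assigned = true;
                return 0;
            }
            int fetched = 0;
            for (Map.Entry<String, Long> e : logEnd.entrySet()) {
                if (paused.contains(e.getKey())) {
                    continue; // paused partitions return nothing and keep their position
                }
                fetched += (int) (e.getValue() - position.get(e.getKey()));
                position.put(e.getKey(), e.getValue());
            }
            return fetched;
        }

        void seek(String tp, long offset) { position.put(tp, offset); }
        void pauseAll() { paused.addAll(position.keySet()); }
        long position(String tp) { return position.get(tp); }
    }

    /** Models a strategy that polls, seeks, and optionally pauses before start polls again. */
    public static long positionAfterStart(boolean pauseInStrategy) {
        Map<String, Long> logEnd = new HashMap<>();
        logEnd.put("topic-0", 10L);
        StubConsumer c = new StubConsumer(logEnd);

        c.poll();               // strategy: poll so that an assignment exists
        c.seek("topic-0", 5L);  // strategy: move to the requested starting offset
        if (pauseInStrategy) {
            c.pauseAll();       // the fix: pause before handing the consumer back
        }

        int consumed = c.poll(); // start method: the second poll
        if (pauseInStrategy && consumed != 0) {
            throw new IllegalStateException("messages were consumed on startup");
        }
        return c.position("topic-0");
    }

    public static void main(String[] args) {
        System.out.println("without pause: " + positionAfterStart(false)); // prints 10
        System.out.println("with pause:    " + positionAfterStart(true));  // prints 5
    }
}
```

In this model the unpaused variant ends at offset 10 instead of the requested 5, which is the kind of position drift the startup assertion is meant to catch.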
## How was this patch tested?
I reliably reproduced the intermittent test failure by inserting a
Thread.sleep directly before returning from SubscribePattern. The proposed
fix eliminated the failure.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/koeninger/spark-1 SPARK-17782
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15387.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15387
----
commit 1fc5863db88cac9dfd0be09318c4ca8779a51682
Author: cody koeninger <[email protected]>
Date: 2016-10-07T01:08:01Z
[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll being
called twice and moving position
----