[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 If you're worried about it then accept the alternative PR I linked. On Sun, Oct 9, 2016 at 11:37 PM, Shixiong Zhu wrote: > During the original implementation I had verified th

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-09 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15387 > During the original implementation I had verified that calling pause kills the internal message buffer, which is one of the complications leading to a cached consumer per partition. I obs

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 Let me know if you guys like that alternative PR better

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 If the concern is TD's comment, "Future calls to {@link #poll(long)} will not return any records from these partitions until they have been resumed using {@link #resume(Collection)}." http

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 Poll also isn't going to return you just messages for a single topicpartition, so to do what you're suggesting you'd have to go through all the messages and do additional processing, not jus
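The "additional processing" mentioned here can be pictured with a small sketch. It is a toy model, not the Kafka client API: the `Record` tuple and the `earliest_offsets` helper are hypothetical names introduced only for illustration. The idea is that a single poll result may mix records from several topic-partitions, so undoing it means scanning every record and finding, per partition, the lowest offset that was handed out.

```python
from collections import namedtuple

# Hypothetical record shape, for illustration only (not the Kafka ConsumerRecord class).
Record = namedtuple("Record", ["topic", "partition", "offset"])


def earliest_offsets(records):
    """Map each (topic, partition) to the smallest offset seen in the batch."""
    earliest = {}
    for r in records:
        key = (r.topic, r.partition)
        if key not in earliest or r.offset < earliest[key]:
            earliest[key] = r.offset
    return earliest


# One poll result that mixes two partitions of the same topic.
batch = [
    Record("t", 0, 5),
    Record("t", 1, 9),
    Record("t", 0, 6),
    Record("t", 1, 8),
]
rewind_targets = earliest_offsets(batch)
```

Each entry in `rewind_targets` is the offset a caller would have to seek that partition back to, which is the per-partition bookkeeping the comment argues the driver should not be doing.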

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 You don't want poll consuming messages; it's not just about offset correctness, the driver shouldn't be spending time or bandwidth doing that. What is the substantive concern with this solution

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15387 #15397 is the fix for structured streaming.

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15387 @koeninger If `poll(0)` returns non-empty results, how about just subtracting the result size from the offsets? Then we don't need to count on `poll(0)` not updating the offsets.
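A minimal sketch of the arithmetic behind this suggestion, for a single partition. The function name `rewind_by_result_size` and the sample values are hypothetical, and the sketch assumes the polled records occupy consecutive offsets, which would not hold if the log has gaps.

```python
def rewind_by_result_size(position, records):
    """Return the position to seek back to after an unwanted poll.

    Assumes the polled records occupy consecutive offsets.
    """
    return position - len(records)


position_before = 42                      # position before the unwanted poll
polled = ["m0", "m1", "m2"]               # records the poll happened to return
position_after = position_before + len(polled)
restored = rewind_by_result_size(position_after, polled)
```

Under that assumption `restored` equals `position_before`, so the correction would not depend on whether the poll moved the position in the first place.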

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 I set auto commit to false, and still recreated the test failure. That makes sense to me, consumer position should still be getting updated in memory even if it isn't saved to storage a
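The distinction between the in-memory position and the committed offset can be illustrated with a toy model. `ToyConsumer` is a hypothetical stand-in written for this sketch and is not the Kafka client; in particular, committing on every poll is a simplification of how auto commit behaves.

```python
class ToyConsumer:
    """Hypothetical single-partition consumer; not the Kafka client API."""

    def __init__(self, log, auto_commit=False):
        self.log = list(log)            # records of one partition
        self.position = 0               # in-memory fetch position
        self.committed = 0              # offset last saved to storage
        self.auto_commit = auto_commit

    def poll(self, max_records=10):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)     # advances regardless of commit settings
        if self.auto_commit:
            self.committed = self.position  # simplification: commit on every poll
        return batch


consumer = ToyConsumer(["a", "b", "c"], auto_commit=False)
records = consumer.poll()
state = (consumer.position, consumer.committed)
```

With auto commit off, `state` shows the position moved past the returned records while the committed offset stayed where it was, which matches the behaviour described in the comment.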

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15387 > but polling ordinarily consumes messages and adjusts position. Even if `enable.auto.commit` is `false`? In the doc, it says `automatically set as the last committed offset`, so I guess set

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 I'm not going to say anything is impossible, which is the point of the assert. If it does somehow happen, it will be at start, so should be obvious. The whole poll 0 / pause thing i

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15387 Merged build finished. Test PASSed.

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66479/

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15387 **[Test build #66479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66479/consoleFull)** for PR 15387 at commit [`aca55de`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15387 Merged build finished. Test PASSed.

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15387 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66477/

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15387 **[Test build #66477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66477/consoleFull)** for PR 15387 at commit [`1fc5863`](https://github.com/apache/spark/commit/

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15387 **[Test build #66479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66479/consoleFull)** for PR 15387 at commit [`aca55de`](https://github.com/apache/spark/commit/a

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15387 **[Test build #66477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66477/consoleFull)** for PR 15387 at commit [`1fc5863`](https://github.com/apache/spark/commit/1