[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-02-04 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72949130 I can merge this for now and we can focus on that issue later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-02-04 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3655

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-02-04 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72959021 Agreed. I will file a jira. We should discuss the issue there. This one was more of a break-fix, but that would be a more elaborate fix.

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-02-03 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784543 ok to test.

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-02-03 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784728 @harishreedharan This raises a higher-level question of whether the write ahead log (which is the component most likely to fail) should have its own retries independent of the

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-02-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72789875 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-02-03 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-72784691 [Test build #26710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26710/consoleFull) for PR 3655 at commit

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-01-14 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-70015327 No, this does prevent data loss - basically if the store fails multiple times, we shut down the receiver completely. So the new receiver which gets started starts
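The behaviour described in this comment (retry the store a bounded number of times, then shut the receiver down) can be sketched roughly as follows. This is only an illustration, not the code from the pull request: `StoreWithRetry`, `storeOrStop`, `store`, and `onGiveUp` are hypothetical names, and the real receiver's store and stop calls are stood in for by plain function arguments.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Spark: retries a store action a bounded
// number of times and hands the last error to a give-up callback.
object StoreWithRetry {

  /** Runs `store` up to `maxAttempts` times.
    * Returns true as soon as one attempt succeeds. If every attempt throws
    * a non-fatal exception, calls `onGiveUp` with the last error (for
    * example, to stop the receiver) and returns false.
    */
  def storeOrStop(maxAttempts: Int)(store: () => Unit)(onGiveUp: Throwable => Unit): Boolean = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def attempt(n: Int): Boolean = Try(store()) match {
      case Success(_)                    => true
      case Failure(_) if n < maxAttempts => attempt(n + 1) // try again
      case Failure(e) =>
        onGiveUp(e) // attempts exhausted: let the caller shut down
        false
    }

    attempt(1)
  }
}
```

A caller would pass the block-store call as `store` and the receiver shutdown as `onGiveUp`; what the replacement receiver does after restart is outside the scope of this sketch.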

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-01-12 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-69684611 Hi @harishreedharan, after carefully looking at the code, I think data will not be lost even in such a failure situation. For example, if we meet an exception in

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-01-12 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-69684946 So I think the aim of this patch is to fix recoverable data store problems with retries, not to prevent data loss. That's my thought, sorry for my

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-01-09 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-69384366 @tdas Any comments on this one, or is this one ready to go in?

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-12 Thread harishreedharan
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/3655#discussion_r21766800 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala --- @@ -201,12 +201,31 @@ class

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66393421 I messed up the jira number in the commit. Please fix it when merging.

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66398284 Thanks Hari, this seems to be a simple solution. BTW, should we make `count = 3` a configurable parameter? Otherwise LGTM. Original thoughts of introducing
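One way the suggestion to make `count = 3` configurable might look is sketched below. This is an assumption-laden illustration only: the key `example.store.maxRetries`, the `RetryConfig` object, and the use of a plain `Map[String, String]` in place of Spark's configuration object are all hypothetical and do not come from the pull request.

```scala
import scala.util.Try

// Hypothetical configuration lookup, not part of Spark.
object RetryConfig {
  // Default mirrors the hard-coded `count = 3` mentioned in the review.
  val DefaultMaxRetries: Int = 3

  // Hypothetical key name, chosen only for this example.
  val Key: String = "example.store.maxRetries"

  /** Reads the retry count from `settings`, falling back to the default
    * when the key is missing, not an integer, or less than 1.
    */
  def maxRetries(settings: Map[String, String]): Int =
    settings
      .get(Key)
      .flatMap(v => Try(v.trim.toInt).toOption)
      .filter(_ >= 1)
      .getOrElse(DefaultMaxRetries)
}
```

Falling back to the default on a malformed value keeps the receiver running with the existing behaviour; failing fast on a bad setting would be an equally reasonable choice.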

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66399105 [Test build #24282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24282/consoleFull) for PR 3655 at commit

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3655#issuecomment-66399109 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2014-12-09 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3655#discussion_r21586197 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala --- @@ -201,12 +201,31 @@ class ReliableKafkaReceiver[