Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Greg Temchenko
Hi,

This doesn't seem to be fixed yet.
I filed an issue in JIRA: https://issues.apache.org/jira/browse/SPARK-5505

Greg






Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Tathagata Das
This is an issue that is hard to resolve without rearchitecting the whole
Kafka Receiver. There are some workarounds worth looking into.

http://mail-archives.apache.org/mod_mbox/kafka-users/201312.mbox/%3CCAFbh0Q38qQ0aAg_cj=jzk-kbi8xwf+1m6xlj+fzf6eetj9z...@mail.gmail.com%3E
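One frequently suggested workaround for this class of rebalance failure is to give the high-level consumer more room to retry, by raising its rebalance and session-timeout settings. A minimal sketch, assuming Spark 1.x's KafkaUtils and Kafka 0.8 consumer property names; the ZooKeeper quorum, group id, and topic below are hypothetical:

import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object RebalanceTuningSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-rebalance-tuning"), Seconds(10))

    // Kafka 0.8 high-level consumer properties. The default of 4 retries is
    // what produces the "can't rebalance after 4 retries" error quoted later
    // in this thread; raising retries and backoff gives slow ZooKeeper
    // sessions more room before the consumer gives up.
    val kafkaParams = Map(
      "zookeeper.connect"            -> "zk-host:2181",      // hypothetical quorum
      "group.id"                     -> "my-consumer-group", // hypothetical group
      "rebalance.max.retries"        -> "10",
      "rebalance.backoff.ms"         -> "5000",
      "zookeeper.session.timeout.ms" -> "10000"
    )

    // createStream uses the Kafka high-level consumer under the hood, so the
    // properties above are passed straight through to it.
    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)

    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()
  }
}

This only reduces the chance of hitting the failure; it does not remove the underlying coordination problem described above.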



Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Dibyendu Bhattacharya
Or you can use this low-level Kafka consumer for Spark:
https://github.com/dibbhatt/kafka-spark-consumer

It is now part of http://spark-packages.org/ and has been running successfully
in Pearson's production environment for the past few months. Being a low-level
consumer, it does not have the re-balancing issue that the high-level consumer
has.

I also know of a few users who have shifted to this low-level consumer and
found it a more robust, fault-tolerant Kafka receiver for Spark.

Regards,
Dibyendu
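For readers who want to try it, here is a rough sketch of wiring the package into a streaming job. The ReceiverLauncher.launch entry point and the property names are assumptions based on the project's README around this time and may differ between versions, so treat them as placeholders and consult the repository for the current API:

import java.util.Properties

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumed import path from the kafka-spark-consumer project.
import consumer.kafka.ReceiverLauncher

object LowLevelConsumerSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("low-level-kafka"), Seconds(10))

    // Property names as documented in the project's README (assumed; verify
    // against the version you use).
    val props = new Properties()
    props.put("zookeeper.hosts", "zk-host")
    props.put("zookeeper.port", "2181")
    props.put("kafka.topic", "my-topic")          // hypothetical topic
    props.put("kafka.consumer.id", "my-consumer") // hypothetical consumer id

    // One receiver per Kafka partition is a common starting point.
    val numberOfReceivers = 3
    val stream = ReceiverLauncher.launch(
      ssc, props, numberOfReceivers, StorageLevel.MEMORY_AND_DISK_SER)

    stream.foreachRDD(rdd => println(s"received ${rdd.count()} messages"))
    ssc.start()
    ssc.awaitTermination()
  }
}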




Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Neelesh
We're planning to use this as well (Dibyendu's
https://github.com/dibbhatt/kafka-spark-consumer). Dibyendu, thanks for
the effort. So far it's working nicely. I think there is merit in making it
the default Kafka receiver for Spark Streaming.

-neelesh




Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Dibyendu Bhattacharya
Thanks, Neelesh. Glad to know this low-level consumer is working for you.

Dibyendu





Re: Error when Spark streaming consumes from Kafka

2014-11-23 Thread Bill Jay
Hi Dibyendu,

Thank you for the answer. I will try the Spark-Kafka consumer.

Bill




Error when Spark streaming consumes from Kafka

2014-11-22 Thread Bill Jay
Hi all,

I am using Spark to consume from Kafka. However, after the job had been running
for several hours, I saw the following failure on an executor:

kafka.common.ConsumerRebalanceFailedException: group-1416624735998_ip-172-31-5-242.ec2.internal-1416648124230-547d2c31 can't rebalance after 4 retries
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:432)
    at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:722)
    at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:212)
    at kafka.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:138)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:114)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:257)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


Does anyone know the reason for this exception? Thanks!

Bill


Re: Error when Spark streaming consumes from Kafka

2014-11-22 Thread Dibyendu Bhattacharya
I believe this has something to do with how the Kafka high-level API manages
consumers within a consumer group and how it re-balances during failures. You
can find some mention of this in the Kafka wiki:

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design

Due to various issues in the high-level API, Kafka is moving the high-level
consumer to a completely new set of APIs in Kafka 0.9.

Other than this coordination issue, the high-level consumer also has data-loss
issues.

You can probably try this Spark-Kafka consumer, which uses the low-level
SimpleConsumer API; it is more performant and has no data-loss scenarios:

https://github.com/dibbhatt/kafka-spark-consumer

Regards,
Dibyendu
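To make the high-level/low-level distinction concrete: with the SimpleConsumer API the application addresses a specific broker, partition, and offset directly, so ZooKeeper-based group rebalancing never enters the picture; the price is that leader discovery, offset tracking, and retry logic become the application's job. A minimal fetch sketch against Kafka 0.8 (the broker host, topic, and client id are hypothetical, and leader lookup, error handling, and offset management are omitted):

import kafka.api.FetchRequestBuilder
import kafka.consumer.SimpleConsumer

object SimpleConsumerSketch {
  def main(args: Array[String]): Unit = {
    // Connect directly to the broker that leads the partition. In real code,
    // discover the leader first (e.g. via a topic-metadata request).
    // Constructor args: host, port, soTimeout, bufferSize, clientId.
    val consumer = new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "my-client")
    try {
      val request = new FetchRequestBuilder()
        .clientId("my-client")
        .addFetch("my-topic", 0, 0L, 100000) // topic, partition, offset, fetchSize
        .build()
      val response = consumer.fetch(request)

      // Iterate over the fetched message set for (topic "my-topic", partition 0).
      for (msgAndOffset <- response.messageSet("my-topic", 0)) {
        val payload = msgAndOffset.message.payload
        val bytes = new Array[Byte](payload.limit())
        payload.get(bytes)
        println(s"offset=${msgAndOffset.offset} message=${new String(bytes, "UTF-8")}")
      }
    } finally {
      consumer.close()
    }
  }
}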
