Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-17 Thread Matthias Niehoff
heartbeat.interval.ms: default; group.max.session.timeout.ms: default; session.timeout.ms: 6; default values as of Kafka 0.10. Complete Kafka params: val kafkaParams = Map[String, String]( "bootstrap.servers" -> kafkaBrokers, "auto.offset.reset" -> "latest", "enable.auto.commit" -> "false",
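For orientation, a sketch of how a parameter map of that shape might look once complete; kafkaBrokers, the group id, and the deserializers below are placeholders, and the two timeout values are the Kafka 0.10 consumer defaults, not the poster's actual settings:

    // Sketch only; values are placeholders or 0.10 consumer defaults.
    val kafkaBrokers = "broker1:9092,broker2:9092"   // placeholder
    val kafkaParams = Map[String, String](
      "bootstrap.servers" -> kafkaBrokers,
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> "false",
      "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "group.id" -> "example-group",                 // hypothetical
      "heartbeat.interval.ms" -> "3000",             // 0.10 consumer default
      "session.timeout.ms" -> "30000"                // 0.10 consumer default
    )
    // Note: group.max.session.timeout.ms is a broker-side cap on
    // session.timeout.ms, not a consumer setting.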

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-14 Thread Cody Koeninger
For you or anyone else having issues with consumer rebalance, what are your settings for heartbeat.interval.ms, session.timeout.ms, and group.max.session.timeout.ms relative to your batch time? On Tue, Oct 11, 2016 at 10:19 AM, static-max wrote: > Hi, > > I run into the same exception > (org.apache.k
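The reasoning behind the question: in the 0.10.0 consumer, heartbeats are only sent from within poll(), so a batch that takes longer than session.timeout.ms makes the broker declare the consumer dead and rebalance the group. A sketch with illustrative numbers, not values taken from the thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val batchInterval = Seconds(5)  // illustrative
    val ssc = new StreamingContext(new SparkConf().setAppName("demo"), batchInterval)

    // session.timeout.ms should comfortably exceed the worst-case batch
    // processing time, and heartbeat.interval.ms should sit well below it
    // (the Kafka docs suggest at most a third of it). session.timeout.ms must
    // also stay under the broker's group.max.session.timeout.ms, or the
    // consumer is rejected when it tries to join the group.
    val timeoutParams = Map[String, String](
      "heartbeat.interval.ms" -> "3000",
      "session.timeout.ms" -> "30000"
    )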

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Cody Koeninger
The new underlying Kafka consumer prefetches data and is generally heavier weight, so it is cached on executors. Group id is part of the cache key. I assumed Kafka users would use different group ids for consumers they wanted to be distinct, since otherwise it would cause problems even with the norma
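Concretely, keeping group ids distinct per stream might look like the sketch below; the topic and group names are invented, and ssc and kafkaParams are assumed to be defined elsewhere in the job:

    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // One consumer group per stream, so the executor-side cached consumers
    // (keyed on group id and topic-partition) never collide.
    def streamFor(topic: String, groupId: String) =
      KafkaUtils.createDirectStream[String, String](
        ssc,
        PreferConsistent,
        Subscribe[String, String](Seq(topic), kafkaParams + ("group.id" -> groupId)))

    val adRequests = streamFor("ad-requests", "myjob-ad-requests")  // names invented
    val adClicks   = streamFor("ad-clicks", "myjob-ad-clicks")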

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
Good point, I changed the group id to be unique for the separate streams and now it works. Thanks! Although changing this is OK for us, I am interested in the why :-) With the old connector this was not a problem, nor is it AFAIK with the pure Kafka consumer API. 2016-10-11 14:30 GMT+02:00 Cody Koe

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Cody Koeninger
Just out of curiosity, have you tried using separate group ids for the separate streams? On Oct 11, 2016 4:46 AM, "Matthias Niehoff" wrote: > I stripped down the job to just consume the stream and print it, without > avro deserialization. When I only consume one topic, everything is fine. As > s

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
I re-ran the job with DEBUG log level for org.apache.spark, kafka.consumer and org.apache.kafka. Please find the output at http://pastebin.com/VgtRUQcB. Most of the delay is introduced by *16/10/11 13:20:12 DEBUG RecurringTimer: Callback for JobGenerator called at time x*, which repeats multiple

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
I stripped down the job to just consume the stream and print it, without Avro deserialization. When I only consume one topic, everything is fine. As soon as I add a second stream it gets stuck after about 5 minutes. So it basically boils down to: val kafkaParams = Map[String, String]( "bootstrap
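A self-contained version of such a stripped-down reproduction might look like the following; the broker address, topic names, and group id are placeholders, and per the rest of the thread it is the group id shared by both streams that makes the job stall:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object TwoTopicRepro {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("TwoTopicRepro"), Seconds(5))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",  // placeholder
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean),
          "group.id" -> "repro-group")  // shared by both streams: the failing setup

        // Two independent streams, no join; each just prints its values.
        Seq("topic-a", "topic-b").foreach { topic =>
          KafkaUtils.createDirectStream[String, String](
            ssc, PreferConsistent,
            Subscribe[String, String](Seq(topic), kafkaParams))
            .map(_.value)
            .print()
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }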

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Matthias Niehoff
This job will fail after about 5 minutes: object SparkJobMinimal { //read Avro schemas var stream = getClass.getResourceAsStream("/avro/AdRequestLog.avsc") val avroSchemaAdRequest = scala.io.Source.fromInputStream(stream).getLines.mkString stream.close stream = getClass.getResourceAsSt

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-10 Thread Matthias Niehoff
Yes, without committing the data the consumer rebalances. The job consumes 3 streams and processes them. When consuming only one stream it runs fine, but when consuming three streams, even without joining them, just deserializing the payload and triggering an output action, it fails. I will prepare a code sample

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-06 Thread Cody Koeninger
OK, so at this point, even without involving commitAsync, you're seeing consumer rebalances after a particular batch takes longer than the session timeout? Do you have a minimal code example you can share? On Tue, Oct 4, 2016 at 2:18 AM, Matthias Niehoff wrote: > Hi, > sorry for the late reply. A

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-04 Thread Matthias Niehoff
Hi, sorry for the late reply. A public holiday in Germany. Yes, it's using a unique group id which no other job or consumer group is using. I have increased the session.timeout to 1 minute and set max.poll.records to 1000. The processing takes ~1 second. 2016-09-29 4:41 GMT+02:00 Cody Koeninger:

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-28 Thread Cody Koeninger
Well, I'd start at the first thing suggested by the error, namely that the group has rebalanced. Is that stream using a unique group id? On Wed, Sep 28, 2016 at 5:17 AM, Matthias Niehoff wrote: > Hi, > > the stacktrace: > > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot b

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-28 Thread Matthias Niehoff
Hi, the stacktrace: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.

Re: Problems with new experimental Kafka Consumer for 0.10

2016-09-27 Thread Cody Koeninger
What's the actual stacktrace / exception you're getting related to the commit failure? On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff wrote: > Hi everybody, > > I am using the new Kafka Receiver for Spark Streaming for my job. When > running with the old consumer it runs fine. > > The job consumes 3 T

Problems with new experimental Kafka Consumer for 0.10

2016-09-27 Thread Matthias Niehoff
Hi everybody, I am using the new Kafka Receiver for Spark Streaming for my job. When running with the old consumer it runs fine. The job consumes 3 topics, saves the data to Cassandra, cogroups the topics, calls mapWithState and stores the results in Cassandra. After that I manually commit the Kafka o
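For reference, the manual commit described here is exposed in spark-streaming-kafka-0-10 through HasOffsetRanges and CanCommitOffsets; a minimal sketch, with the Cassandra/cogroup/mapWithState processing elided:

    import org.apache.kafka.clients.consumer.ConsumerRecord
    import org.apache.spark.streaming.dstream.InputDStream
    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    // `stream` is assumed to be the InputDStream returned by
    // KafkaUtils.createDirectStream for one of the three topics.
    def processAndCommit(stream: InputDStream[ConsumerRecord[String, String]]): Unit =
      stream.foreachRDD { rdd =>
        // Capture the offsets from the raw Kafka RDD before transforming it.
        val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

        // ... save to Cassandra, cogroup, mapWithState ...

        // Asynchronously commit this batch's offsets for the stream's group id.
        stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
      }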