Re: Continuous warning while consuming using new kafka-spark010 API

2016-11-04 Thread Cody Koeninger
I answered the duplicate post on the user mailing list; I'd say keep
the discussion there.

On Fri, Nov 4, 2016 at 12:14 PM, vonnagy <i...@vadio.com> wrote:
> Nitin,
>
> I am getting similar issues using Spark 2.0.1 and Kafka 0.10. I have two
> jobs, one that uses a Kafka stream and one that uses just the KafkaRDD.
>
> With the KafkaRDD, I continually get "Failed to get records" errors. I have
> adjusted the polling timeout with `spark.streaming.kafka.consumer.poll.ms` and
> the number of records per poll with Kafka's `max.poll.records`. Even when it
> gets records, it is extremely slow.
>
> When working with multiple KafkaRDDs in parallel I get the dreaded
> `ConcurrentModificationException`. Spark is supposed to use a
> CachedKafkaConsumer keyed by topic and partition, which should guarantee
> thread safety, but I continually get this error along with the polling
> timeout.
>
> Has anyone else tried to use Spark 2 with Kafka 0.10 and had any success? At
> this point it is completely useless in my experience. With Spark 1.6 and
> Kafka 0.8.x, I never had these problems.




Re: Continuous warning while consuming using new kafka-spark010 API

2016-11-04 Thread vonnagy
Nitin,

I am getting similar issues using Spark 2.0.1 and Kafka 0.10. I have two
jobs, one that uses a Kafka stream and one that uses just the KafkaRDD.

With the KafkaRDD, I continually get "Failed to get records" errors. I have
adjusted the polling timeout with `spark.streaming.kafka.consumer.poll.ms` and
the number of records per poll with Kafka's `max.poll.records`. Even when it
gets records, it is extremely slow.
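
For reference, the two knobs live in different places: the poll timeout is a
Spark setting on the SparkConf, while `max.poll.records` is a plain Kafka
consumer setting. A minimal sketch of a KafkaRDD-style batch read showing
where each goes; the brokers, group id, topic and offsets below are
placeholders, not my actual job:

import scala.collection.JavaConverters._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.OffsetRange

// Spark-side setting: how long a task waits on consumer.poll (ms)
val conf = new SparkConf()
  .setAppName("kafka-rdd-batch-sketch")
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")
val sc = new SparkContext(conf)

// Kafka-side consumer settings, including the cap on records per poll
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "max.poll.records" -> "500"
)

// Placeholder offset range for a single topic/partition
val offsetRanges = Array(OffsetRange("topic2", 0, 0L, 1000L))

val rdd = KafkaUtils.createRDD[String, String](
  sc, kafkaParams.asJava, offsetRanges, PreferConsistent)

rdd.map(r => (r.key, r.value)).count()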

When working with multiple KafkaRDDs in parallel I get the dreaded
`ConcurrentModificationException`. Spark is supposed to use a
CachedKafkaConsumer keyed by topic and partition, which should guarantee
thread safety, but I continually get this error along with the polling
timeout.
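
To illustrate the pattern (hypothetical code, building on the sketch above):
as I understand it, the cached consumer is shared per group id / topic /
partition within an executor, so two tasks using it at the same time is what
blows up, and running the actions one after another avoids it, at the cost of
the parallelism I actually want:

// Hypothetical: two KafkaRDDs over the same group id, topic and partition,
// reusing sc, kafkaParams and PreferConsistent from the sketch above.
val rddA = KafkaUtils.createRDD[String, String](
  sc, kafkaParams.asJava, Array(OffsetRange("topic2", 0, 0L, 500L)), PreferConsistent)
val rddB = KafkaUtils.createRDD[String, String](
  sc, kafkaParams.asJava, Array(OffsetRange("topic2", 0, 500L, 1000L)), PreferConsistent)

// Triggering actions on both from separate driver threads can land two tasks
// on the same cached consumer at once (KafkaConsumer is not thread-safe):
//   import scala.concurrent.{Await, Future}
//   import scala.concurrent.ExecutionContext.Implicits.global
//   import scala.concurrent.duration.Duration
//   Await.result(Future.sequence(Seq(Future(rddA.count()), Future(rddB.count()))), Duration.Inf)

// Running them sequentially avoids the concurrent access:
val countA = rddA.count()
val countB = rddB.count()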

Has anyone else tried to use Spark 2 with Kafka 0.10 and had any success? At
this point it is completely useless in my experience. With Spark 1.6 and
Kafka 0.8.x, I never had these problems.






Continuous warning while consuming using new kafka-spark010 API

2016-09-19 Thread Nitin Goyal
Hi All,

I am using the new kafka-spark010 API to consume messages from Kafka
(brokers running kafka 0.10.0.1).

I am continuously seeing the following warning, but only when a producer is
writing messages to Kafka in parallel (I have also increased
spark.streaming.kafka.consumer.poll.ms to 1024 ms) :-

16/09/19 16:44:53 WARN TaskSetManager: Lost task 97.0 in stage 32.0 (TID
4942, host-3): java.lang.AssertionError: assertion failed: Failed to get
records for spark-executor-example topic2 8 1052989 after polling for 1024

while, at the same time, I see this in the Spark UI for that job:
topic: topic2, partition: 8, offsets: 1051731 to 1066124

Code :-

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams.asScala)
)

stream.foreachRDD { rdd => rdd.filter(_ => false).collect }
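
(kafkaParams above is the usual new-consumer config map; the actual values
should not matter for the question, but a placeholder sketch of its shape,
with made-up brokers and group id, is:)

// Placeholder sketch of kafkaParams (a java.util.Map, hence the .asScala above)
import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = new java.util.HashMap[String, Object]()
kafkaParams.put("bootstrap.servers", "broker1:9092")
kafkaParams.put("key.deserializer", classOf[StringDeserializer])
kafkaParams.put("value.deserializer", classOf[StringDeserializer])
kafkaParams.put("group.id", "example-group")
kafkaParams.put("auto.offset.reset", "latest")
kafkaParams.put("enable.auto.commit", (false: java.lang.Boolean))

// topics is just the subscription list, e.g.:
//   val topics = List("topic2")

// and the poll timeout (the 1024 in the warning) is set on the SparkConf:
//   sparkConf.set("spark.streaming.kafka.consumer.poll.ms", "1024")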


Has anyone encountered this with the new API? Is this the expected
behaviour or am I missing something here?

-- 
Regards
Nitin Goyal