Hi,

The usual general questions are:

-- What is your Spark version?
-- What is your Kafka version?
-- Do you use the "standard" Kafka consumer, or do you implement something custom (your own multi-threaded consumer)?
The freshest docs: https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html

AFAIK, yes, you should use a unique group id for each stream (Kafka 0.10!):

    kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");

On Sun, Dec 11, 2016 at 5:51 PM, Anton Okolnychyi <[email protected]> wrote:

> Hi,
>
> I am experimenting with Spark Streaming and Kafka. I would appreciate it if
> someone could say whether the following assumption is correct.
>
> If I have multiple computations (each with its own output) on one stream
> (created via KafkaUtils.createDirectStream), then there is a chance of
> getting ConcurrentModificationException: KafkaConsumer is not safe for
> multi-threaded access. To solve this problem, I should create a new stream
> with a different "group.id" for each computation.
>
> Am I right?
>
> Best regards,
> Anton
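For concreteness, the "one stream per computation, each with its own group.id" approach might look like the sketch below, adapted from the Java example in the 0-10 integration guide linked above. The topic name ("events"), broker address, group ids, and the two computations are illustrative placeholders, not anything from this thread; the snippet assumes a reachable local broker and is not runnable without one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class TwoStreamsExample {

  // Hypothetical helper: builds Kafka params with a per-stream group.id.
  private static Map<String, Object> kafkaParams(String groupId) {
    Map<String, Object> p = new HashMap<>();
    p.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
    p.put("key.deserializer", StringDeserializer.class);
    p.put("value.deserializer", StringDeserializer.class);
    p.put("group.id", groupId); // unique per stream
    p.put("auto.offset.reset", "latest");
    p.put("enable.auto.commit", false);
    return p;
  }

  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("two-streams").setMaster("local[*]");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    // One stream (and thus one underlying KafkaConsumer) per computation,
    // each with its own group.id, so no consumer is shared across computations.
    JavaInputDStream<ConsumerRecord<String, String>> countStream =
        KafkaUtils.createDirectStream(jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(
                Arrays.asList("events"), kafkaParams("count-group")));

    JavaInputDStream<ConsumerRecord<String, String>> printStream =
        KafkaUtils.createDirectStream(jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(
                Arrays.asList("events"), kafkaParams("print-group")));

    countStream.count().print();                    // computation 1
    printStream.map(ConsumerRecord::value).print(); // computation 2

    jssc.start();
    jssc.awaitTermination();
  }
}
```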
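As to why the exception appears in the first place: KafkaConsumer checks on every call that only one thread is inside it at a time, and throws ConcurrentModificationException when that check fails. A minimal, simplified sketch of such a guard is below. This is NOT the actual Kafka source (the real consumer's check is reentrant for the owning thread, among other differences); it only illustrates why two computations driving one shared consumer can blow up.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicLong;

// Simplified, non-reentrant sketch of the single-threaded-access guard:
// each public method records the calling thread and refuses overlapping entry.
public class GuardedConsumer {
  private static final long NO_THREAD = -1L;
  private final AtomicLong currentThread = new AtomicLong(NO_THREAD);

  public void poll() {
    acquire();
    try {
      // ... fetch records here ...
    } finally {
      release();
    }
  }

  // Fails if any caller is already inside the consumer.
  public void acquire() {
    long id = Thread.currentThread().getId();
    if (!currentThread.compareAndSet(NO_THREAD, id)) {
      throw new ConcurrentModificationException(
          "KafkaConsumer is not safe for multi-threaded access");
    }
  }

  public void release() {
    currentThread.set(NO_THREAD);
  }
}
```

Sequential calls succeed; an overlapping second entry throws, which is exactly the failure mode of sharing one stream's consumer between computations.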
