Hi all, I have been using Storm (0.10.0-beta) with Kafka (0.8.2) to build a real-time data ingestion system.
The Kafka topic on which input messages arrive has a single partition and a replication factor of 1. The problem I am facing is that I am seeing duplicate messages read by the spouts. The number of duplicates is the same as the number of workers/machines I have in the Storm cluster.

While debugging, I found duplicate KafkaSpout instances reading from the same Kafka topic. Below is the config for the spout:

    builder.setSpout(spoutId, new KafkaSpout(kafkaConfig), 1)
           .setNumTasks(1)
           .setMaxTaskParallelism(1);

Even with the above config, I can see multiple instances of the spout running and consuming from the same Kafka topic, resulting in duplicate messages.

I also tried setting the number of workers for the topology to 1:

    config.setNumWorkers(1);

Even with this configuration, there are still 3 instances of the spout running.

The topology works fine in local mode, i.e. no duplicate messages are read from the Kafka topic. Duplicates appear only when the topology is run in distributed mode, even if the number of workers is set to 1 for the topology.

This question <https://groups.google.com/forum/#!searchin/storm-user/kafkaspout/storm-user/vzAlIhAOntw/yo_-rUs8cj0J> mentions a similar problem being solved by using KafkaSpout from the storm-kafka package; unfortunately, that is not working for me.

A similar question on Quora <https://www.quora.com/If-I-increase-the-parallelism-of-a-Kafka-spout-in-my-storm-topology-how-can-I-stop-it-from-reading-the-same-message-in-a-topic-multiple-times> says that just setting the parallelism hint should work.

A question on Stack Overflow <http://stackoverflow.com/questions/18267834/storm-kafka-multiple-spouts-how-to-share-the-load> mentions running multiple Kafka spouts, again achieved with the parallelism hint.

Unfortunately, none of the above suggestions are working for me as expected. How can I make sure my KafkaSpout reads no duplicate messages, or how can I create just a single instance of the spout?

Thanks & Regards,
Rahul Kavale
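
P.S. To make the symptom concrete, here is a toy model in plain Java of what I believe is happening. It uses no Storm or Kafka classes, and every name in it (DuplicateReadModel, SpoutInstance, emittedCopies) is made up for illustration. The assumption is that each spout instance keeps its own offset into the single partition and does not coordinate with the others, so each message is emitted once per instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: no Storm or Kafka classes are used; all names are hypothetical.
class DuplicateReadModel {

    // One independent reader of the partition, tracking its own offset.
    static final class SpoutInstance {
        private int offset = 0;

        // Returns every message this instance has not read yet and advances its offset.
        List<String> poll(List<String> partition) {
            List<String> batch = new ArrayList<>(partition.subList(offset, partition.size()));
            offset = partition.size();
            return batch;
        }
    }

    // Counts how many times each message is emitted when `instances` spouts
    // all read the same single partition without sharing offsets.
    static Map<String, Integer> emittedCopies(List<String> partition, int instances) {
        Map<String, Integer> copies = new LinkedHashMap<>();
        for (int i = 0; i < instances; i++) {
            SpoutInstance spout = new SpoutInstance();
            for (String message : spout.poll(partition)) {
                copies.merge(message, 1, Integer::sum);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("m1", "m2", "m3");
        System.out.println(emittedCopies(partition, 1)); // one instance: one copy each
        System.out.println(emittedCopies(partition, 3)); // three instances: three copies each
    }
}
```

With one instance every message shows up once; with three instances every message shows up three times, which matches the duplicate count I see on my three-machine cluster. If this model is right, the question is why more than one instance exists despite the parallelism settings above.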
