Hello, I have asked the question on stackoverflow as well here http://stackoverflow.com/questions/39737201/spark-kafka-per-topic-number-of-partitions-map-not-honored
I am confused about the "per topic number of partitions" parameter when creating a inputDstream using KafkaUtils.createStream(...) method. I am pasting the question here, please help. >From [Spark Documentation][1] > parameter topicMap of KafkaUtils.createStream(...) method determines "per-topic number of Kafka partitions to consume" [Javadoc here][2] So, when I created a kafka topic with 3 partitions and started a spark receiver as Map<String, Integer> topicMap = new HashMap<>(); topicMap.put(topic, 1); JavaPairReceiverInputDStream<String,String> inputDStream = KafkaUtils.createStream(javaStreamingContext, zookeeperQuorum, groupId, topicMap); I expected this receiver to receive messages from ONLY one partition of the 3 partitions that I created. However, when I check the offset checker, I see the following: Pid Offset logSize Lag Owner 0 9 9 0 none 1 11 11 0 none 2 7 7 0 none I expected this code to receive messages from one partition and then I thought I needed to start more receivers (one per partition) as given in the [documentation here][3] to cover all Kafka topic partitions. int numStreams = 3; List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<>(numStreams); for (int i = 0; i < numStreams; i++) { kafkaStreams.add(KafkaUtils.createStream(...)); } JavaPairDStream<String, String> unifiedStream = streamingContext.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size())); So, my question is can one receiver receive messages from all partitions? If so, what in the world does the topicMap(topic -> numPartitions) mean? [1]: http://spark.apache.org/docs/latest/streaming-kafka-integration.html [2]: http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html [3]: http://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers