Hi Bill,

You don't need to match the number of threads to the number of partitions in a topic. For example, if topic1 has 3 partitions and you set only 2 threads, ideally one thread will receive 2 partitions and the other thread will receive the remaining one. The exact assignment depends on Kafka's own scheduling, but data should not be lost.
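To make the distribution concrete, here is a minimal sketch in plain Python. It assumes a range-style assignment, where partitions are split into contiguous blocks and the first threads take any remainder. `assign_partitions` is a hypothetical helper written only for illustration, not a Kafka or Spark API; the real assignment is decided by Kafka's consumer rebalancing and may differ.

```python
def assign_partitions(num_partitions, num_threads):
    """Split partition ids 0..num_partitions-1 into contiguous ranges, one per thread.

    Illustrative only: assumes range-style assignment, which is an assumption
    about how Kafka distributes partitions, not something this code queries.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # Every thread gets `base` partitions; the first `extra` threads get one more.
    base, extra = divmod(num_partitions, num_threads)
    assignment = {}
    start = 0
    for thread in range(num_threads):
        count = base + (1 if thread < extra else 0)
        assignment[thread] = list(range(start, start + count))
        start += count
    return assignment


# topic1: 3 partitions, 2 threads. Every partition still has an owner.
topic1 = assign_partitions(3, 2)

# topic2: 4 partitions, 3 threads.
topic2 = assign_partitions(4, 3)

# More threads than partitions: the surplus threads receive nothing.
oversized = assign_partitions(3, 5)

print(topic1)
print(topic2)
print(oversized)
```

In every case each partition has exactly one owning thread, so fewer threads means less parallelism rather than skipped partitions.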
There is also no need to set more threads than there are partitions: each partition can be consumed by only one consumer in a group, so the extra threads would sit idle.

2015-05-19 7:46 GMT+08:00 Bill Jay <bill.jaypeter...@gmail.com>:

> Hi all,
>
> I am reading the docs of receiver-based Kafka consumer. The last
> parameters of KafkaUtils.createStream is per topic number of Kafka
> partitions to consume. My question is, does the number of partitions for
> topic in this parameter need to match the number of partitions in Kafka.
>
> For example, I have two topics, topic1 with 3 partitions and topic2 with 4
> partitions.
>
> If i specify 2 for topic1 and 3 for topic2 and feed them to the
> createStream function, will there be data loss? Or it will just be an
> inefficiency.
>
> Thanks!
>
> Bill