I believe when you go with option 1, it will distribute the consumers across your cluster (possibly on 6 machines), but I still don't see a way to tell which partition each one will consume from. If you want a consumer where you can specify the partition details and so on, then you are better off with the low-level consumer: <https://github.com/dibbhatt/kafka-spark-consumer>
Thanks
Best Regards

On Tue, Feb 24, 2015 at 9:36 AM, bit1...@163.com <bit1...@163.com> wrote:

> Hi,
> I am experimenting with Spark Streaming and Kafka integration. To read
> messages from Kafka in parallel, there are basically two ways:
> 1. Create many receivers, like (1 to 6).map(_ => KafkaUtils.createStream).
> 2. Specify many threads when calling KafkaUtils.createStream, like
> val topicMap = Map("myTopic" -> 6); this will create one receiver with 6
> reading threads.
>
> My question is which option is better. Option 2 sounds better to me
> because it saves a lot of cores (one receiver, one core), but I learned
> from somewhere else that option 1 is better, so I would ask and see how
> you guys elaborate on this. Thanks.
>
> ------------------------------
> bit1...@163.com
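
For what it's worth, the two options from the quoted question can be sketched roughly like this (assuming Spark Streaming 1.x with the spark-streaming-kafka artifact on the classpath; the ZooKeeper quorum "zk1:2181", the group id "myGroup", and the app name are placeholders, not anything from the thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaParallelismSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaParallelismSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Option 1: six separate receivers, each producing its own stream.
    // Each receiver occupies one executor core, and the scheduler can
    // spread them across the cluster; union() merges them back into one.
    val streams = (1 to 6).map { _ =>
      KafkaUtils.createStream(ssc, "zk1:2181", "myGroup", Map("myTopic" -> 1))
    }
    val unified = ssc.union(streams)

    // Option 2: one receiver with 6 consumer threads. All six threads
    // live inside a single receiver task, so they share one executor core.
    val single =
      KafkaUtils.createStream(ssc, "zk1:2181", "myGroup", Map("myTopic" -> 6))

    unified.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

This is only a sketch to make the trade-off concrete: option 1 pays six cores for receiving but parallelizes network ingestion across machines, while option 2 pays one core but funnels all consumption through a single receiver.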