On 10/27/2014 05:19 AM, Jianshi Huang wrote:
> Any suggestion? :)
>
> Jianshi
>
> On Thu, Oct 23, 2014 at 3:49 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>
>     The Kafka stream has 10 topics and the data rate is quite high
>     (~100K/s per topic).
>
>     Which configuration do you recommend?
>     - 1 Spark app consuming all Kafka topics
>     - 10 separate Spark apps, each consuming one topic
>
>     Assuming they have the same resource pool.
>
>     Cheers,
>     --
>     Jianshi Huang
>
>     LinkedIn: jianshi
>     Twitter: @jshuang
>     Github & Blog: http://huangjs.github.com/
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
Do you have time to try and benchmark both? I don't know anything about Kafka, but I would imagine the two options would perform similarly. That said, I would recommend running them all separately: adding a new data stream doesn't require killing a monolithic job, and an error in one stream affects only that job rather than bringing down everything in a monolithic one.

Regards,
Alec
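For reference, a minimal sketch of what the two layouts might look like with Spark Streaming's Kafka receiver API (Spark 1.x era, spark-streaming-kafka). The topic names, ZooKeeper quorum, group ID, and batch interval below are hypothetical placeholders, not taken from the thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaTopicLayout {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-consumer")
        val ssc = new StreamingContext(conf, Seconds(2))

        // Hypothetical connection details -- adjust to your cluster.
        val zkQuorum = "zk1:2181,zk2:2181"
        val groupId  = "consumer-group"

        // Option 1: one monolithic app subscribing to all 10 topics.
        // The map value is the number of receiver threads per topic.
        val allTopics  = (1 to 10).map(i => s"topic$i" -> 1).toMap
        val monolithic = KafkaUtils.createStream(ssc, zkQuorum, groupId, allTopics)
        monolithic.count().print()

        // Option 2: each app would instead subscribe to a single topic,
        // e.g. Map("topic1" -> 1), and be submitted as its own job so it
        // can be restarted or upgraded without touching the other nine.

        ssc.start()
        ssc.awaitTermination()
      }
    }

Submitting option 2 as ten independent jobs matches the recommendation above: a failure or redeploy of one topic's consumer leaves the other topics' streams running.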