On 10/27/2014 05:19 AM, Jianshi Huang wrote:
> Any suggestion? :)
>
> Jianshi
>
> On Thu, Oct 23, 2014 at 3:49 PM, Jianshi Huang
> <jianshi.hu...@gmail.com> wrote:
>
>     The Kafka stream has 10 topics and the data rate is quite high (~
>     100K/s per topic).
>
>     Which configuration do you recommend?
>     - 1 Spark app consuming all Kafka topics
>     - 10 separate Spark apps, each consuming one topic
>
>     Assuming they have the same resource pool.
>
>     Cheers,
>     -- 
>     Jianshi Huang
>
>     LinkedIn: jianshi
>     Twitter: @jshuang
>     Github & Blog: http://huangjs.github.com/
>
>
>
>
> -- 
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

Do you have time to try benchmarking both? I don't know anything about
Kafka, but I would imagine the performance of the two options would be
similar.

That said, I would recommend running them all separately: adding a new
data stream doesn't require killing a monolithic job, and an error in
one stream would hurt a single monolithic job far more than it would a
set of independent jobs.

Regards,

Alec
