With this low-level Kafka API <https://github.com/dibbhatt/kafka-spark-consumer/>, you can specify how many receivers you want to spawn, and most of the time they are distributed evenly across the executors. If they are not, you can put a sleep just after creating the context, which gives all executors time to connect to the driver; Spark will then distribute the receivers evenly.
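Something like the rough sketch below. It assumes the ReceiverLauncher.launch overload that takes a Scala StreamingContext, as described in the project's README; the property names, hosts, topic, and receiver count are placeholders you would adapt, so double-check everything against the library version you actually use.

import java.util.Properties

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

import consumer.kafka.ReceiverLauncher

object EvenReceiverExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("even-kafka-receivers")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Give every executor time to register with the driver before any
    // receiver is scheduled; otherwise the receivers tend to pile onto
    // the few executors that have already connected.
    Thread.sleep(10000)

    // Placeholder connection settings -- consult the kafka-spark-consumer
    // README for the exact property names your version expects.
    val props = new Properties()
    props.put("zookeeper.hosts", "zk-host")
    props.put("zookeeper.port", "2181")
    props.put("kafka.topic", "mytopic")
    props.put("kafka.consumer.id", "my-consumer")

    // e.g. 80 partitions spread over 40 receivers -> 2 partitions each
    val numberOfReceivers = 40
    val stream = ReceiverLauncher.launch(
      ssc, props, numberOfReceivers, StorageLevel.MEMORY_ONLY)

    stream.foreachRDD(rdd => println("Received " + rdd.count() + " messages"))

    ssc.start()
    ssc.awaitTermination()
  }
}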
Thanks
Best Regards

On Wed, May 13, 2015 at 9:03 PM, hotdog <[email protected]> wrote:

> I'm using Spark Streaming integrated with streaming-kafka.
>
> My Kafka topic has 80 partitions, while my machines have 40 cores. I found
> that when the job is running, the Kafka consumer processes are only deployed
> to 2 machines, and the bandwidth of those 2 machines becomes very, very high.
>
> I wonder, is there any way to control the Kafka consumers' dispatch?
