Try setting spark.streaming.kafka.maxRatePerPartition; it caps the number of records per second that the Spark Streaming direct consumer reads from each Kafka partition, so a single batch can no longer swallow the entire backlog. (Each batch is then bounded by rate x batch interval x number of partitions.)
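A minimal sketch with the 0.8 direct-stream connector (spark-streaming-kafka) is below; the broker address, topic name, batch interval, and rate values are placeholders to adjust for your setup:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object RateLimitedStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kafka-rate-limit-demo")
          // Cap each Kafka partition at 10,000 records/second; with a
          // 5-second batch interval this bounds each batch at 50,000
          // records per partition.
          .set("spark.streaming.kafka.maxRatePerPartition", "10000")
          // Optionally let Spark tune the ingest rate dynamically
          // (backpressure is available from Spark 1.5 onward).
          .set("spark.streaming.backpressure.enabled", "true")

        val ssc = new StreamingContext(conf, Seconds(5))

        val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
        val stream = KafkaUtils.createDirectStream[
          String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events"))

        // Count records per batch; with the cap in place the backlog is
        // drained over several batches instead of one giant batch.
        stream.map(_._2).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Note that maxRatePerPartition only applies to the direct stream; for the receiver-based stream the equivalent knob is spark.streaming.receiver.maxRate.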
-S

> On Mar 5, 2016, at 10:02 PM, Vinti Maheshwari <vinti.u...@gmail.com> wrote:
>
> Hello,
>
> I am trying to figure out why my Kafka + Spark job is running slowly. I found
> that Spark is consuming all of the messages out of Kafka in a single batch
> and not sending any messages to the other batches.
>
> 2016/03/05 21:57:05  0 events        -     -  queued
> 2016/03/05 21:57:00  0 events        -     -  queued
> 2016/03/05 21:56:55  0 events        -     -  queued
> 2016/03/05 21:56:50  0 events        -     -  queued
> 2016/03/05 21:56:45  0 events        -     -  queued
> 2016/03/05 21:56:40  4039573 events  6 ms  -  processing
>
> Does anyone know how this behavior can be changed so that the messages are
> load-balanced across all of the batches?
>
> Thanks,
> Vinti