How would backpressure know anything about the capacity of your system on the very first batch?
You should be able to set maxRatePerPartition at a value that makes sure your first batch doesn't blow things up, and let backpressure scale from there. A rough sketch of that configuration is at the end of this message.

On Wed, Oct 12, 2016 at 8:53 AM, Samy Dindane <s...@dindane.com> wrote:
> That's what I was looking for, thank you.
>
> Unfortunately, none of
>
> * spark.streaming.backpressure.initialRate
> * spark.streaming.backpressure.enabled
> * spark.streaming.receiver.maxRate
> * spark.streaming.receiver.initialRate
>
> changes how many records I get (I tried many different combinations).
>
> The only configuration that works is
> "spark.streaming.kafka.maxRatePerPartition".
> That's better than nothing, but it'd be useful to have backpressure
> enabled for automatic scaling.
>
> Do you have any idea why backpressure isn't working? How can I debug
> this?
>
>
> On 10/11/2016 06:08 PM, Cody Koeninger wrote:
>>
>> http://spark.apache.org/docs/latest/configuration.html
>>
>> "This rate is upper bounded by the values
>> spark.streaming.receiver.maxRate and
>> spark.streaming.kafka.maxRatePerPartition if they are set (see
>> below)."
>>
>> On Tue, Oct 11, 2016 at 10:57 AM, Samy Dindane <s...@dindane.com> wrote:
>>>
>>> Hi,
>>>
>>> Is it possible to limit the size of the batches returned by the Kafka
>>> consumer for Spark Streaming?
>>> I am asking because the first batch I get has hundreds of millions of
>>> records and it takes ages to process and checkpoint them.
>>>
>>> Thank you.
>>>
>>> Samy
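Here's a minimal sketch of what I mean, assuming the kafka-0-10 direct stream. The broker address, group id, topic name, batch interval, and the 1000 records/partition/sec cap are all placeholders; tune the cap to whatever your cluster can safely process in one batch.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object RateLimitedStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("rate-limited-kafka-stream")
      // Hard per-partition cap, applied to every batch including the
      // first. This is what keeps the initial batch from blowing up.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      // From the second batch onward, the backpressure estimator adjusts
      // the rate based on observed processing times, bounded above by
      // the cap set just above.
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(10))

    // Illustrative Kafka parameters; adjust for your cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "earliest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams))

    // Log the batch size so you can watch backpressure adapt.
    stream.foreachRDD { rdd => println(s"batch size: ${rdd.count()}") }

    ssc.start()
    ssc.awaitTermination()
  }
}

With a 10 second batch interval and that cap, no batch can exceed
10 * 1000 * (number of partitions) records, even when starting from a
large backlog, and backpressure scales the rate from there.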