I am 100% sure.

println(conf.get("spark.streaming.backpressure.enabled")) prints true.

On 10/12/2016 05:48 PM, Cody Koeninger wrote:
Just to make 100% sure, did you set

spark.streaming.backpressure.enabled

to true?

On Wed, Oct 12, 2016 at 10:09 AM, Samy Dindane <s...@dindane.com> wrote:

On 10/12/2016 04:40 PM, Cody Koeninger wrote:

How would backpressure know anything about the capacity of your system
on the very first batch?

Isn't "spark.streaming.backpressure.initialRate" supposed to do this?

You should be able to set maxRatePerPartition at a value that makes
sure your first batch doesn't blow things up, and let backpressure
scale from there.

Backpressure doesn't scale even when using maxRatePerPartition: when I
enable backpressure and set maxRatePerPartition to n, I always get n
records, even if my batch takes longer than batchDuration to finish.

* I set batchDuration to 1 sec: `val ssc = new StreamingContext(conf, Seconds(1))`
* I set backpressure.initialRate and/or maxRatePerPartition to 100,000 and
enable backpressure
* Since I can't process 100,000 records in 1 second, I expect backpressure
to kick in on the second batch and give me fewer than 100,000 records; but
this does not happen
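For reference, a minimal sketch of the setup described above (app name and the exact way the conf is built are illustrative, not from my actual job):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch of the configuration described above; values are the ones from my test.
val conf = new SparkConf()
  .setAppName("backpressure-test") // placeholder name
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.backpressure.initialRate", "100000")
  .set("spark.streaming.kafka.maxRatePerPartition", "100000")

// 1-second batches
val ssc = new StreamingContext(conf, Seconds(1))
```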

What am I missing here?

On Wed, Oct 12, 2016 at 8:53 AM, Samy Dindane <s...@dindane.com> wrote:

That's what I was looking for, thank you.

Unfortunately, none of

* spark.streaming.backpressure.initialRate
* spark.streaming.backpressure.enabled
* spark.streaming.receiver.maxRate
* spark.streaming.receiver.initialRate

changes how many records I get (I tried many different combinations).

The only configuration that works is spark.streaming.kafka.maxRatePerPartition.
That's better than nothing, but it'd be useful to have backpressure
for automatic scaling.

Do you have any idea why backpressure isn't working? How can I debug this?
On 10/11/2016 06:08 PM, Cody Koeninger wrote:


"This rate is upper bounded by the values
spark.streaming.receiver.maxRate and
spark.streaming.kafka.maxRatePerPartition if they are set (see below)."
On Tue, Oct 11, 2016 at 10:57 AM, Samy Dindane <s...@dindane.com> wrote:


Is it possible to limit the size of the batches returned by the Kafka
consumer for Spark Streaming?
I am asking because the first batch I get has hundreds of millions of
records, and it takes ages to process and checkpoint them.

Thank you.


To unsubscribe e-mail: user-unsubscr...@spark.apache.org