Hi,
On 10/13/2016 04:35 PM, Cody Koeninger wrote:
> So I see in the logs that PIDRateEstimator is choosing a new rate, and
> the rate it's choosing is 100.
But it always chooses 100, while all the other variables (processing time,
latestRate, etc.) change.
Also, the records per batch is
So I see in the logs that PIDRateEstimator is choosing a new rate, and
the rate it's choosing is 100.
That happens to be the default minimum of an (apparently undocumented) setting,
spark.streaming.backpressure.pid.minRate
Try setting that to 1 and see if there's different behavior.
BTW, how ma
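A minimal sketch of that suggestion, assuming the properties are set on the
SparkConf before the StreamingContext is created (the app name is made up):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("backpressure-test")  // illustrative name
      .set("spark.streaming.backpressure.enabled", "true")
      // Default minimum is 100; lowering it lets the PID estimator go below 100.
      .set("spark.streaming.backpressure.pid.minRate", "1")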
Hey Cody,
Thanks for the reply. Really helpful.
Following your suggestion, I set spark.streaming.backpressure.enabled to true
and maxRatePerPartition to 10.
I know I can handle 100k records at once, but definitely not in 1 second (the
batchDuration), so I expect the backpressure to kick in.
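In code, that configuration looks roughly like this (a sketch; the same values
could also be passed with --conf on spark-submit):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.streaming.backpressure.enabled", "true")
      // Cap for the direct stream: max records per second, per Kafka partition.
      .set("spark.streaming.kafka.maxRatePerPartition", "10")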
Cool, just wanted to make sure.
To answer your question about
> Isn't "spark.streaming.backpressure.initialRate" supposed to do this?
that configuration was added well after the integration of the direct
stream with the backpressure code, and was added only to the receiver
code, which the direct stream doesn't use.
I am 100% sure.
println(conf.get("spark.streaming.backpressure.enabled")) prints true.
On 10/12/2016 05:48 PM, Cody Koeninger wrote:
Just to make 100% sure, did you set
spark.streaming.backpressure.enabled
to true?
On Wed, Oct 12, 2016 at 10:09 AM, Samy Dindane wrote:
On 10/12/2016 04:40 PM, Cody Koeninger wrote:
How would backpressure know anything about the capacity of your system
on the very first batch?
You should be able to set maxRatePerPartition at a value that makes
sure your first batch doesn't blow things up, and let backpressure
scale from there.
On Wed, Oct 12, 2016 at 8:53 AM, Samy Dindane wrote:
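The per-batch cap implied by that advice is maxRatePerPartition x number of
partitions x batch duration; a quick sketch with made-up numbers:

    // All numbers below are illustrative, not from this thread.
    val maxRatePerPartition = 10  // spark.streaming.kafka.maxRatePerPartition
    val numPartitions       = 4   // Kafka partitions in the subscribed topic
    val batchSeconds        = 1   // streaming batch duration

    // Upper bound on records in any single batch, including the first:
    val maxRecordsPerBatch = maxRatePerPartition * numPartitions * batchSeconds
    println(maxRecordsPerBatch)   // 40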
That's what I was looking for, thank you.
Unfortunately, none of
* spark.streaming.backpressure.initialRate
* spark.streaming.backpressure.enabled
* spark.streaming.receiver.maxRate
* spark.streaming.receiver.initialRate
changes how many records I get (I tried many different combinations).
The
http://spark.apache.org/docs/latest/configuration.html
"This rate is upper bounded by the values
spark.streaming.receiver.maxRate and
spark.streaming.kafka.maxRatePerPartition if they are set (see
below)."
On Tue, Oct 11, 2016 at 10:57 AM, Samy Dindane wrote:
> Hi,
>
> Is it possible to limit the size of the batches returned by the Kafka
> consumer for Spark Streaming?
Hi,
Is it possible to limit the size of the batches returned by the Kafka consumer
for Spark Streaming?
I am asking because the first batch I get has hundreds of millions of records
and it takes ages to process and checkpoint them.
Thank you.
Samy
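Putting the thread's answer together: for the direct stream, the first batch is
capped with spark.streaming.kafka.maxRatePerPartition, and later batches are
adjusted by backpressure. A self-contained sketch, assuming Spark 2.x with the
spark-streaming-kafka-0-10 integration (broker address, topic, and group id are
placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf()
      .setAppName("limited-kafka-batches")
      .setMaster("local[2]")  // local master for testing only
      .set("spark.streaming.backpressure.enabled", "true")
      // Bounds every batch, including the first one, which backpressure
      // knows nothing about.
      .set("spark.streaming.kafka.maxRatePerPartition", "10")

    val ssc = new StreamingContext(conf, Seconds(1))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",           // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "batch-limit-example",      // placeholder group id
      "auto.offset.reset"  -> "earliest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Set("events"), kafkaParams))

    // With the settings above, each RDD should hold at most
    // maxRatePerPartition * partitions * batch seconds records.
    stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()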