How would backpressure know anything about the capacity of your system on the very first batch?
You should be able to set maxRatePerPartition at a value that makes sure your first batch doesn't blow things up, and let backpressure scale from there. A rough sketch of that configuration is at the end of this message.

On Wed, Oct 12, 2016 at 8:53 AM, Samy Dindane <s...@dindane.com> wrote:
> That's what I was looking for, thank you.
>
> Unfortunately, none of
>
> * spark.streaming.backpressure.initialRate
> * spark.streaming.backpressure.enabled
> * spark.streaming.receiver.maxRate
> * spark.streaming.receiver.initialRate
>
> changes how many records I get (I tried many different combinations).
>
> The only configuration that works is
> "spark.streaming.kafka.maxRatePerPartition".
> That's better than nothing, but it'd be useful to have backpressure
> enabled for automatic scaling.
>
> Do you have any idea why backpressure isn't working? How can I debug
> this?
>
>
> On 10/11/2016 06:08 PM, Cody Koeninger wrote:
>>
>> http://spark.apache.org/docs/latest/configuration.html
>>
>> "This rate is upper bounded by the values
>> spark.streaming.receiver.maxRate and
>> spark.streaming.kafka.maxRatePerPartition if they are set (see
>> below)."
>>
>> On Tue, Oct 11, 2016 at 10:57 AM, Samy Dindane <s...@dindane.com> wrote:
>>>
>>> Hi,
>>>
>>> Is it possible to limit the size of the batches returned by the Kafka
>>> consumer for Spark Streaming?
>>> I am asking because the first batch I get has hundreds of millions of
>>> records and it takes ages to process and checkpoint them.
>>>
>>> Thank you.
>>>
>>> Samy
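Here's a minimal sketch of what I mean, assuming the kafka-0-10 direct stream. The broker address, group id, topic name, batch interval, and the 1000 records/partition/sec cap are all placeholders; tune the cap to whatever your cluster can safely process in one batch.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object RateLimitedStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("rate-limited-kafka-stream")
      // Hard per-partition cap, applied to every batch including the
      // first. This is what keeps the initial batch from blowing up.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      // From the second batch onward, the backpressure estimator adjusts
      // the rate based on observed processing times, bounded above by
      // the cap set just above.
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(10))

    // Illustrative Kafka parameters; adjust for your cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "earliest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams))

    // Log the batch size so you can watch backpressure adapt.
    stream.foreachRDD { rdd => println(s"batch size: ${rdd.count()}") }

    ssc.start()
    ssc.awaitTermination()
  }
}

With a 10 second batch interval and that cap, no batch can exceed
10 * 1000 * (number of partitions) records, even when starting from a
large backlog, and backpressure scales the rate from there.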