Hi,
On 10/13/2016 04:35 PM, Cody Koeninger wrote:
> So I see in the logs that PIDRateEstimator is choosing a new rate, and
> the rate it's choosing is 100.
But it always chooses 100, while all the other variables (processing time,
latestRate, etc.) change.
Also, the records per batch is
So I see in the logs that PIDRateEstimator is choosing a new rate, and
the rate it's choosing is 100.
That happens to be the default minimum of an (apparently undocumented) setting,
spark.streaming.backpressure.pid.minRate
Try setting that to 1 and see if there's different behavior.
BTW, how ma
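A minimal sketch of that suggestion, assuming the properties are set on the
SparkConf before the StreamingContext is created (the app name is made up):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("backpressure-test")  // illustrative name
      .set("spark.streaming.backpressure.enabled", "true")
      // Default minimum is 100; lowering it lets the PID estimator go below 100.
      .set("spark.streaming.backpressure.pid.minRate", "1")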
Hey Cody,
Thanks for the reply. Really helpful.
Following your suggestion, I set spark.streaming.backpressure.enabled to true
and maxRatePerPartition to 10.
I know I can handle 100k records at once, but definitely not in 1 second (the
batchDuration), so I expect the backpressure to kick in.
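In code, that configuration looks roughly like this (a sketch; the same values
could also be passed with --conf on spark-submit):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.streaming.backpressure.enabled", "true")
      // Cap for the direct stream: max records per second, per Kafka partition.
      .set("spark.streaming.kafka.maxRatePerPartition", "10")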
Cool, just wanted to make sure.
To answer your question about
> Isn't "spark.streaming.backpressure.initialRate" supposed to do this?
that configuration was added well after the integration of the direct
stream with the backpressure code, and was added only to the receiver
code, which the direct stream doesn't use.
I am 100% sure.
println(conf.get("spark.streaming.backpressure.enabled")) prints true.
On 10/12/2016 05:48 PM, Cody Koeninger wrote:
Just to make 100% sure, did you set
spark.streaming.backpressure.enabled
to true?
On Wed, Oct 12, 2016 at 10:09 AM, Samy Dindane wrote:
On 10/12/2016 04:40 PM, Cody Koeninger wrote:
How would backpressure know anything about the capacity of your system
on the very first batch?
You should be able to set maxRatePerPartition at a value that makes
sure your first batch doesn't blow things up, and let backpressure
scale from there.
On Wed, Oct 12, 2016 at 8:53 AM, Samy Dindane wrote:
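The per-batch cap implied by that advice is maxRatePerPartition x number of
partitions x batch duration; a quick sketch with made-up numbers:

    // All numbers below are illustrative, not from this thread.
    val maxRatePerPartition = 10  // spark.streaming.kafka.maxRatePerPartition
    val numPartitions       = 4   // Kafka partitions in the subscribed topic
    val batchSeconds        = 1   // streaming batch duration

    // Upper bound on records in any single batch, including the first:
    val maxRecordsPerBatch = maxRatePerPartition * numPartitions * batchSeconds
    println(maxRecordsPerBatch)   // 40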
That's what I was looking for, thank you.
Unfortunately, none of
* spark.streaming.backpressure.initialRate
* spark.streaming.backpressure.enabled
* spark.streaming.receiver.maxRate
* spark.streaming.receiver.initialRate
changes how many records I get (I tried many different combinations).
The
http://spark.apache.org/docs/latest/configuration.html
"This rate is upper bounded by the values
spark.streaming.receiver.maxRate and
spark.streaming.kafka.maxRatePerPartition if they are set (see
below)."
On Tue, Oct 11, 2016 at 10:57 AM, Samy Dindane wrote:
> Hi,
>
> Is it possible to limit the size of the batches returned by the Kafka
> consumer for Spark Streaming?
Hi,
Is it possible to limit the size of the batches returned by the Kafka consumer
for Spark Streaming?
I am asking because the first batch I get has hundreds of millions of records
and it takes ages to process and checkpoint them.
Thank you.
Samy
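Putting the thread's answer together: for the direct stream, the first batch is
capped with spark.streaming.kafka.maxRatePerPartition, and later batches are
adjusted by backpressure. A self-contained sketch, assuming Spark 2.x with the
spark-streaming-kafka-0-10 integration (broker address, topic, and group id are
placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf()
      .setAppName("limited-kafka-batches")
      .setMaster("local[2]")  // local master for testing only
      .set("spark.streaming.backpressure.enabled", "true")
      // Bounds every batch, including the first one, which backpressure
      // knows nothing about.
      .set("spark.streaming.kafka.maxRatePerPartition", "10")

    val ssc = new StreamingContext(conf, Seconds(1))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",           // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "batch-limit-example",      // placeholder group id
      "auto.offset.reset"  -> "earliest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Set("events"), kafkaParams))

    // With the settings above, each RDD should hold at most
    // maxRatePerPartition * partitions * batch seconds records.
    stream.foreachRDD(rdd => println(s"records in batch: ${rdd.count()}"))

    ssc.start()
    ssc.awaitTermination()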