gaborgsomogyi commented on a change in pull request #23999: [docs]Add additional explanation for "Setting the max receiving rate" in streaming-programming-guide.md
URL: https://github.com/apache/spark/pull/23999#discussion_r264248341
########## File path: docs/streaming-programming-guide.md ##########
@@ -2036,7 +2036,7 @@ To run a Spark Streaming applications, you need to have the following.
 `spark.streaming.receiver.maxRate` for receivers and `spark.streaming.kafka.maxRatePerPartition`
 for Direct Kafka approach. In Spark 1.5, we have introduced a feature called *backpressure* that
 eliminate the need to set this rate limit, as Spark Streaming automatically figures out the
-rate limits and dynamically adjusts them if the processing conditions change. This backpressure
+rate limits and dynamically adjusts them if the processing conditions change.If the first batch of data is very large which causes the first batch is processing all the time and the task can not work normally , using a maximum rate limit can solve the problem .This backpressure

Review comment:
I see the intention, but I agree with Sean and think this change doesn't make the doc better. I agree that if the first batch's processing time is significantly longer than the batch interval, then microbatches can queue up, but I would rephrase things.
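For context, a minimal sketch of how the settings under discussion would be applied. `spark.streaming.receiver.maxRate`, `spark.streaming.kafka.maxRatePerPartition`, and `spark.streaming.backpressure.enabled` are standard Spark Streaming properties; the app name, master, numeric rates, and batch interval below are illustrative assumptions, not recommendations from this thread.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: cap the ingest rate so an oversized first batch cannot queue up
// later microbatches. All numeric values here are placeholders.
val conf = new SparkConf()
  .setAppName("RateLimitedStreamingApp") // hypothetical app name
  .setMaster("local[2]")                 // local master, for illustration only
  // Upper bound, in records per second, for each receiver-based input stream.
  .set("spark.streaming.receiver.maxRate", "1000")
  // Upper bound, in records per second per partition, for the Direct Kafka approach.
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // Backpressure (Spark 1.5+) then adjusts the rates dynamically below these caps.
  .set("spark.streaming.backpressure.enabled", "true")

// Batch interval: if the first batch's processing time greatly exceeds this,
// subsequent microbatches queue up, which is the situation the PR describes.
val ssc = new StreamingContext(conf, Seconds(10))
```

With the caps in place, backpressure governs the steady-state rate while the static limits bound how much the very first batches can pull in.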
