hehuiyuan commented on a change in pull request #23999: [docs]Add additional 
explanation for "Setting the max receiving rate" in 
streaming-programming-guide.md
URL: https://github.com/apache/spark/pull/23999#discussion_r264024237
 
 

 ##########
 File path: docs/streaming-programming-guide.md
 ##########
 @@ -2036,7 +2036,7 @@ To run a Spark Streaming applications, you need to have 
the following.
   `spark.streaming.receiver.maxRate` for receivers and 
`spark.streaming.kafka.maxRatePerPartition`
   for Direct Kafka approach. In Spark 1.5, we have introduced a feature called 
*backpressure* that
   eliminate the need to set this rate limit, as Spark Streaming automatically 
figures out the
-  rate limits and dynamically adjusts them if the processing conditions 
change. This backpressure
+  rate limits and dynamically adjusts them if the processing conditions 
change.If the first batch of data is very large which causes the first batch is 
processing all the time and the task can not work normally , using a maximum 
rate limit can solve the problem .This backpressure
 
 Review comment:
   First of all, thank you for your reply. Maybe I didn't express myself very 
accurately.
   
   The original document implies that once backpressure is enabled, there is 
no need to set this rate limit. However, in real usage, such as Spark 
Streaming consuming from Kafka, the first batch of data is often very large, 
so the first batch keeps processing for a long time and affects the normal 
operation of the job. Even when the first batch eventually finishes, it takes 
much longer than the batch interval, and processing of the subsequent batches 
is less efficient than it would be if the first batch had been processed 
within the batch interval before continuing with the subsequent batches. This 
is especially noticeable for Spark Streaming on Kubernetes.
   
   In short, what I want to say is that the statement "enabling backpressure 
removes the need to set a rate limit" is not rigorous.
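
A rough sketch of why the limit still helps for the Direct Kafka approach: with `spark.streaming.kafka.maxRatePerPartition` set, the size of any batch, including the first one, is bounded by rate × partitions × batch interval. The partition count, batch interval, rate value, and the helper name `max_records_per_batch` below are made up for illustration; only the two configuration keys are real Spark settings.

```python
# Illustrative arithmetic only: all numeric values are hypothetical.

def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_s):
    """Upper bound on records in one batch for the Direct Kafka approach.

    maxRatePerPartition is a per-second, per-partition limit, so the
    bound is the product of rate, partition count, and batch interval.
    """
    return max_rate_per_partition * num_partitions * batch_interval_s


# Backpressure and an explicit maximum rate are set together.
conf = {
    "spark.streaming.backpressure.enabled": "true",
    "spark.streaming.kafka.maxRatePerPartition": "1000",  # hypothetical value
}

cap = max_records_per_batch(
    int(conf["spark.streaming.kafka.maxRatePerPartition"]),
    num_partitions=10,      # hypothetical topic with 10 partitions
    batch_interval_s=5,     # hypothetical 5-second batch interval
)
print(cap)
```

Without the maximum rate, the first batch has no such bound and may try to read the whole backlog, since backpressure has no completed batch to learn from yet.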
   

