datasherlock commented on PR #101:
URL: https://github.com/apache/bahir/pull/101#issuecomment-1416768410

   The backpressure implementation seems buggy. My understanding is that the 
backpressure mechanism controls the input rate but should never exceed 
`spark.streaming.receiver.maxRate`. That limit doesn't seem to be honoured: 
we're noticing that the receiver input rate breaches 
`spark.streaming.receiver.maxRate` every now and then, which puts a lot of 
pressure on the pipeline.
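   
   For reference, these are the settings in play (the config keys are the 
standard Spark Streaming ones; the values are just the ones from this report, 
and the app name is illustrative):
   
```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("receiver-maxrate-repro") // hypothetical app name
  // Enable the rate controller that adapts ingestion to processing speed
  .set("spark.streaming.backpressure.enabled", "true")
  // Per-receiver ceiling in records/second; backpressure is expected to
  // stay at or below this rate, never above it
  .set("spark.streaming.receiver.maxRate", "1500")
```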
   
   Context: I created a Spark Scala app with 900 receivers, 
`spark.streaming.receiver.maxRate=1500`, and `batchInterval=60s`. My 
understanding is that the total number of records per batch should be at most 
`900 * 1500 * 60 = 81,000,000 records`. But I am noticing that some batches go 
as high as 776,732,455 records, where the processing time is far greater than 
the `batchInterval`.
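
   The arithmetic behind the expected cap, as a quick sketch 
(`maxRecordsPerBatch` is just an illustrative helper, not a Spark API):

```scala
object BatchCeiling {
  // Upper bound on records per batch: every receiver ingesting at the
  // configured maxRate for the full batch interval
  def maxRecordsPerBatch(receivers: Int, maxRatePerSec: Int, batchIntervalSec: Int): Long =
    receivers.toLong * maxRatePerSec * batchIntervalSec

  def main(args: Array[String]): Unit = {
    // 900 receivers, maxRate = 1500 records/sec each, 60s batches
    val ceiling = maxRecordsPerBatch(900, 1500, 60)
    println(ceiling) // 81000000

    // Observed batch size from this report, far above the expected cap
    println(776732455L > ceiling) // true
  }
}
```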


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
