Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
If the batch duration is 10 seconds, a new batch is scheduled every 10
seconds, regardless of whether the previous batch has completed.
If a particular batch (which is supposed to complete within its 10-second
interval) takes longer (for example, 50 seconds in the attached screenshot),
the additional 40 seconds are added to the scheduling delay of the next
batch. If poll time is included in processing time, it can cause this sudden
jump in the scheduling delay of subsequent batches.
This scheduling delay is cleared once some batches take less than 10
seconds. For example, the first batch in the screenshot had a 4s scheduling
delay, which was cleared for the next batch because that batch took only 5s
to process.
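
To make the arithmetic above concrete, here is a minimal sketch of how the delay carries over from one batch to the next. It is a simplified model, not Spark's scheduler: it assumes batches run strictly one at a time, and the function name `scheduling_delays` and its parameters are hypothetical.

```python
def scheduling_delays(processing_times, batch_interval=10.0, initial_delay=0.0):
    """Return the scheduling delay (seconds) of each batch in a toy model.

    Assumptions (not taken from Spark's code): batch i is scheduled at
    i * batch_interval, batches are processed one at a time, and a batch
    starts as soon as it is scheduled and the previous batch has finished.
    """
    delays = []
    free_at = initial_delay  # time at which the processor becomes free
    for i, processing_time in enumerate(processing_times):
        scheduled_at = i * batch_interval
        started_at = max(scheduled_at, free_at)
        delays.append(started_at - scheduled_at)
        free_at = started_at + processing_time
    return delays


# First batch starts 4s late but takes only 5s, so the next batch has no delay.
# The second batch then takes 50s, pushing 40s of delay onto the third batch.
delays = scheduling_delays([5, 50, 5, 5, 5], batch_interval=10.0, initial_delay=4.0)
print(delays)  # [4.0, 0.0, 40.0, 35.0, 30.0]
```

In this model each later 5s batch recovers 5s of the backlog, which is why the delay only clears once batches finish in less than the batch duration.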
We are using backpressure to automatically control the record count based on
batch processing speed.
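
For readers unfamiliar with the setting, backpressure is enabled through Spark configuration. The sketch below only builds `spark-submit` arguments; the configuration keys are real Spark Streaming settings, but the values and the helper `to_submit_args` are illustrative assumptions, not the settings used in our job.

```python
# Real Spark Streaming configuration keys; the values are illustrative only.
backpressure_conf = {
    "spark.streaming.backpressure.enabled": "true",
    # Assumed starting rate (records/sec) before the first batch completes.
    "spark.streaming.backpressure.initialRate": "1000",
    # Assumed upper bound per Kafka partition (records/sec).
    "spark.streaming.kafka.maxRatePerPartition": "5000",
}


def to_submit_args(conf):
    """Turn a config dict into a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


print(" ".join(to_submit_args(backpressure_conf)))
```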
<img width="1253" alt="screen shot 2018-07-02 at 6 51 13 pm"
src="https://user-images.githubusercontent.com/2279976/42166788-c1c3890a-7e29-11e8-8d74-c2c251c7a6a1.png">