uncleGen edited a comment on issue #24283: [SPARK-27355][SS] Make query execution more sensitive to epoch message late or lost

URL: https://github.com/apache/spark/pull/24283#issuecomment-480701061

@gaborgsomogyi Thanks for your reply. [#23156](https://github.com/apache/spark/pull/20936) introduced a maximum queue-size threshold, beyond which the stream is stopped with an error. In [#23156](https://github.com/apache/spark/pull/20936) we used the same threshold for different queues, i.e. `partitionCommits`, `partitionOffsets`, and `epochsWaitingToBeCommitted`. Generally, the sizes of `partitionCommits` and `partitionOffsets` grow much faster than that of `epochsWaitingToBeCommitted`. The stream may fail within 10 epochs if the partition number is 100, but we may wait for 10000 epochs before failure if the partition number is 1 (if I understand correctly). That is a very long time before the query fails. Well, this may just be a harsh boundary condition. The main concern of this PR is to split these two thresholds so that query execution becomes more sensitive to epoch messages that are late or lost. If you feel that 10 epochs is too sensitive for intermittent problems, we can relax this condition to 100 or some other value.
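To make the growth-rate argument concrete, here is a minimal Scala sketch of the split-threshold idea. This is not the actual `EpochCoordinator` code; the threshold names and values are hypothetical and only illustrate why a separate, smaller cap on `epochsWaitingToBeCommitted` detects late or lost epoch messages sooner than one shared cap sized for the per-partition queues.

```scala
import scala.collection.mutable

// Hypothetical sketch of the split-threshold idea, not Spark's real implementation.
object EpochBacklogSketch {
  // Per-partition queues grow by roughly numPartitions entries per epoch,
  // while the epoch-level queue grows by at most one entry per epoch.
  val partitionQueueThreshold = 10000 // shared cap for partitionCommits / partitionOffsets
  val epochQueueThreshold     = 100   // separate, smaller cap for epochsWaitingToBeCommitted

  val partitionCommits           = mutable.Queue[(Long, Int)]() // (epoch, partitionId)
  val epochsWaitingToBeCommitted = mutable.Queue[Long]()        // epoch ids

  def checkBacklog(): Unit = {
    // With a single shared threshold, a 1-partition query needs ~partitionQueueThreshold
    // epochs of backlog before it fails; with a separate epoch-level threshold it fails
    // after only epochQueueThreshold stuck epochs, regardless of partition count.
    if (partitionCommits.size > partitionQueueThreshold) {
      throw new IllegalStateException("partition commit backlog exceeded threshold")
    }
    if (epochsWaitingToBeCommitted.size > epochQueueThreshold) {
      throw new IllegalStateException(
        "too many epochs waiting to be committed; epoch messages may be late or lost")
    }
  }
}
```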
