uncleGen edited a comment on issue #24283: [SPARK-27355][SS] Make query 
execution more sensitive to epoch message late or lost
URL: https://github.com/apache/spark/pull/24283#issuecomment-480701061
 
 
   @gaborgsomogyi Thanks for your reply.
[#23156](https://github.com/apache/spark/pull/20936) introduced a maximum queue
threshold before stopping the stream with an error. In
[#23156](https://github.com/apache/spark/pull/20936), we used the same
threshold for the different queues, i.e. `partitionCommits`, `partitionOffsets` and
`epochsWaitingToBeCommitted`. Generally, the sizes of `partitionCommits` and
`partitionOffsets` grow much faster than that of `epochsWaitingToBeCommitted`,
since they receive one entry per partition per epoch. The stream may fail within
10 epochs if the partition number is 100, but we may wait for 10000 epochs before
failure if the partition number is 1 (if I understand correctly). That is a very
long time before the query fails. Granted, this may just be a harsh boundary
condition. The main concern of this PR is to split the single threshold into two
separate ones, to make query execution more sensitive to late or lost epoch
messages. If you feel that 10 epochs is too sensitive for some intermittent
problems, we can relax this threshold to 100 or another value.
