viirya commented on pull request #30812: URL: https://github.com/apache/spark/pull/30812#issuecomment-749227210
> Can you explain a little why Spark cannot distribute the tasks evenly in the cluster? It would help me understand why this is not a problem for general tasks.

I ran some streaming queries with stateful operations recently. When the first batch takes its payload from the latest offsets, that batch can finish very quickly. An executor may then be assigned more than one task, because it finishes its previous task quickly and becomes available again. Generally this is not a problem: it doesn't matter whether the tasks are evenly distributed, because they all finish very quickly anyway. But for SS stateful tasks, subsequent batches choose task locations based on the previous batch, so the initial skew is carried forward. That is why this is an issue only for SS.
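The feedback loop described above can be sketched with a toy scheduler (hypothetical Python, not Spark code): a fast executor drains most of a tiny first batch, and every later stateful batch inherits that placement because tasks prefer the executor that holds their state. The executor names, speeds, and the `schedule_batch` helper are all illustrative assumptions, not Spark APIs.

```python
import heapq
from collections import Counter

def schedule_batch(num_tasks, executors, speed, preferred=None):
    """Toy scheduler, not Spark's. `speed[e]` is tasks/second.
    With `preferred` set (the stateful-batch case), each task goes back
    to the executor that ran it last batch, regardless of current load."""
    if preferred is not None:
        return list(preferred)
    # Min-heap of (time the executor frees up, executor name):
    # each task goes to whichever executor becomes available first.
    heap = [(0.0, e) for e in executors]
    heapq.heapify(heap)
    placement = []
    for _ in range(num_tasks):
        t, e = heapq.heappop(heap)
        placement.append(e)
        heapq.heappush(heap, (t + 1.0 / speed[e], e))
    return placement

executors = ["exec-1", "exec-2", "exec-3"]
# exec-1 churns through the near-empty first batch far faster than the others.
speed = {"exec-1": 10.0, "exec-2": 1.0, "exec-3": 1.0}

batch0 = schedule_batch(12, executors, speed)
batch1 = schedule_batch(12, executors, speed, preferred=batch0)

print(Counter(batch0))  # heavily skewed toward exec-1
print(batch1 == batch0)  # later batches lock in the same skewed placement
```

The point of the sketch is only the second line of output: once the stateful batches reuse the previous batch's locations, the accidental skew of the quick first batch never rebalances.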
