cloud-fan commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-819341329


   Correct me if I'm wrong: Spark already tries its best to schedule
Structured Streaming (SS) tasks on executors that hold existing state store
data, and this is implemented via preferred locations. The problem we are
solving here is the first micro-batch, where no state store data exists yet,
so we want to spread its tasks evenly across the cluster. This avoids future
skew, where many SS tasks end up running on very few executors.
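
To make the intended behavior concrete, here is a minimal toy model of it in Python. It is only a sketch under my own assumptions: `assign_preferred_executors`, its parameters, and the least-loaded tie-breaking rule are hypothetical names and choices for illustration, not Spark APIs and not the code in this PR.

```python
from collections import Counter
from typing import Dict, List


def assign_preferred_executors(
    num_partitions: int,
    executors: List[str],
    state_locations: Dict[int, str],
) -> Dict[int, str]:
    """Pick a preferred executor for each state store partition.

    Hypothetical model: `state_locations` maps a partition id to the
    executor that already holds its state (empty for the first micro-batch).
    """
    if not executors:
        raise ValueError("need at least one executor")

    live = set(executors)
    load = Counter({e: 0 for e in executors})
    assignment: Dict[int, str] = {}

    # Later micro-batches: stay on the executor that already has the state.
    for p in range(num_partitions):
        loc = state_locations.get(p)
        if loc in live:
            assignment[p] = loc
            load[loc] += 1

    # First micro-batch (or lost state): give each remaining partition to the
    # currently least-loaded executor, breaking ties by list order.
    for p in range(num_partitions):
        if p not in assignment:
            target = min(executors, key=lambda e: (load[e], executors.index(e)))
            assignment[p] = target
            load[target] += 1

    return assignment
```

Calling it with an empty `state_locations` models the first micro-batch and spreads partitions evenly; feeding the result back in as `state_locations` models later batches, which keep the same placement.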




