[GitHub] [spark] cloud-fan commented on pull request #32136: [WIP][SPARK-35022][CORE] Task Scheduling Plugin in Spark

GitBox Wed, 14 Apr 2021 01:37:26 -0700


cloud-fan commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-819341329



   Correct me if I'm wrong: Spark already tries its best to schedule 
Structured Streaming (SS) tasks on executors that hold existing state store 
data, and this is implemented via preferred locations. The problem we are 
solving here is the first micro-batch, where no state store data exists yet 
and we want to spread its tasks evenly across the cluster. Otherwise, later 
micro-batches follow that initial placement, and many SS tasks can end up 
running on very few executors.
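
   To make the two cases concrete, here is a minimal sketch in plain Python. 
It is not Spark's scheduler and uses no Spark API: `assign_tasks`, the executor 
names, and the `preferred` mapping are all hypothetical, chosen only to 
illustrate "honor the preferred location when state exists, otherwise pick the 
least-loaded executor".

```python
def assign_tasks(partitions, executors, preferred):
    """Hypothetical placement helper, NOT Spark's actual scheduling logic.

    preferred maps a partition id to the executor that already holds its
    state store data; partitions with no entry have no existing state.
    """
    load = {e: 0 for e in executors}  # tasks assigned so far per executor
    assignment = {}
    for p in partitions:
        target = preferred.get(p)
        if target not in load:
            # No existing state (e.g. the first micro-batch): spread evenly
            # by choosing the executor with the fewest assigned tasks.
            target = min(executors, key=lambda e: load[e])
        assignment[p] = target
        load[target] += 1
    return assignment


executors = ["exec-1", "exec-2", "exec-3"]

# First micro-batch: nothing is preferred, so tasks are spread evenly.
first_batch = assign_tasks(range(6), executors, preferred={})

# Later micro-batches: each task prefers the executor holding its state,
# so the initial placement is what persists.
later_batch = assign_tasks(range(6), executors, preferred=first_batch)
```

   In this sketch the later batch reproduces the first batch's placement, 
which is why an uneven first placement would carry forward as skew.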


