viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-818550415
#30812 is a previous attempt to use the locality mechanism to stabilize state store location. Basically, what I want to do is avoid having Spark schedule streaming tasks that use a state store (let me call them stateful tasks) on arbitrary executors. In short, doing so wastes resources on the state store and costs extra time restoring the state store on different executors. For this use case, the current locality mechanism seems a hacky approach, as we can just blindly assign stateful tasks to executors evenly. We do not know if the assignment makes sense to the scheduler. It makes me think that we may need an API that we can use to provide scheduling suggestions.
