Ngone51 edited a comment on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-844858158
> However I'm not sure if stage level scheduling can deal with executor lost case. Based on above comment, seems it cannot. That will be a major concern for the use-case here. During the task scheduling, once an executor is lost, we may need the scheduler be able to re-schedule the task to a particular executor (e.g. reused PVC in our case).

So what if the state store resource is **required** rather than **optional**? That means a task won't launch until it gets the required state store. So in your PVC case, the task will wait until the PVC is re-mounted to some executor.

And if we make the state store resource required, we should do something similar for the HDFS state store on executor lost. For example, we should reconstruct the state store on other active executors (or, in reality, we may not even need to reconstruct the state store: moving the `StateStoreProviderId`s into other active executors' metadata (e.g., `ExecutorData`) should be enough), so that the state store resources always exist and scheduling won't hang.
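To make the last idea concrete, here is a minimal sketch of the "move the provider ids instead of rebuilding the store" bookkeeping. It is not actual Spark scheduler code; the map, method names, and round-robin choice are all illustrative assumptions, with only `StateStoreProviderId`-style ids taken from the discussion above:

```scala
import scala.collection.mutable

object StateStoreReassignSketch {
  // Hypothetical map from executor id to the state store provider ids it
  // hosts (in Spark this would live alongside ExecutorData in the backend).
  val providersByExecutor = mutable.Map[String, mutable.Set[String]]()

  def registerProvider(execId: String, providerId: String): Unit =
    providersByExecutor.getOrElseUpdate(execId, mutable.Set.empty) += providerId

  // On executor lost, hand the lost executor's provider ids to the remaining
  // active executors (round-robin here), so a task that *requires* a state
  // store resource always has somewhere to be scheduled and never hangs.
  def onExecutorLost(lostExecId: String): Unit = {
    val orphaned = providersByExecutor.remove(lostExecId).getOrElse(mutable.Set.empty)
    val active = providersByExecutor.keys.toSeq.sorted
    if (active.nonEmpty) {
      orphaned.toSeq.zipWithIndex.foreach { case (pid, i) =>
        providersByExecutor(active(i % active.size)) += pid
      }
    }
  }
}
```

The point of the sketch is only that the reassignment is pure metadata movement; the actual store contents would still be reloaded lazily (e.g. from the checkpoint on HDFS, or the re-mounted PVC) by whichever executor the task lands on.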
