Ngone51 commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-844858158
> However I'm not sure if stage level scheduling can deal with executor lost case. Based on above comment, seems it cannot. That will be a major concern for the use-case here. During the task scheduling, once an executor is lost, we may need the scheduler be able to re-schedule the task to a particular executor (e.g. reused PVC in our case).

So what if the state store resource is **required**, not **optional**? That means the task won't launch until it gets the required state store. So in your PVC case, the task will wait until the PVC is re-mounted on some executor.

And if we make the state store resource **required**, we should do a similar thing for the HDFS state store on executor loss. For example, we should reconstruct the state store on other active executors so that the state store resources always exist and scheduling won't hang.
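To make the required vs. optional distinction concrete, here is a minimal Python sketch of the idea. It is only a toy model, not Spark code: `Executor`, `Task`, `pick_executor`, and `reconstruct_lost_stores` are hypothetical names invented for illustration, and the "least loaded executor" placement rule is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Executor:
    executor_id: str
    # Ids of the state stores this (hypothetical) executor currently holds.
    state_stores: set = field(default_factory=set)


@dataclass
class Task:
    task_id: int
    state_store: str       # id of the state store the task needs
    required: bool = True  # True: wait for the store; False: only a preference


def pick_executor(task: Task, executors: list) -> Optional[Executor]:
    """Return an executor for the task, or None if the task must keep waiting."""
    holders = [e for e in executors if task.state_store in e.state_stores]
    if holders:
        return holders[0]
    if task.required:
        # No live executor holds the store, so the task is not launched.
        return None
    # Optional resource: fall back to any live executor.
    return executors[0] if executors else None


def reconstruct_lost_stores(lost: Executor, executors: list) -> None:
    """On executor loss, re-create its state stores on the remaining executors."""
    if not executors:
        return
    for store in sorted(lost.state_stores):
        # Assumed placement rule: pick the executor holding the fewest stores.
        target = min(executors, key=lambda e: len(e.state_stores))
        target.state_stores.add(store)


if __name__ == "__main__":
    exec_a = Executor("a", {"store-0"})
    exec_b = Executor("b", {"store-1"})
    task = Task(task_id=0, state_store="store-0")

    live = [exec_b]  # executor "a" has been lost
    print(pick_executor(task, live))  # the required store is missing, so no executor
    reconstruct_lost_stores(exec_a, live)
    print(pick_executor(task, live).executor_id)
```

In this model a required resource with no reconstruction step leaves the task waiting indefinitely after an executor loss, which is the hang described above; re-creating the store on a live executor (or re-mounting the PVC) is what lets scheduling proceed.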
