viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-845734433
> So what if the state store resource is **required**, not **optional**? It means the task won't launch until it gets the required state store. So in your PVC case, the task will wait until the PVC is re-mounted to some executor. And if we make the state store resource required, we should do a similar thing for the HDFS state store on executor loss. For example, we should reconstruct the state store on other active executors (or we may not even have to reconstruct the state store in reality; moving the `StateStoreProviderId`s to other active executors' metadata (e.g., ExecutorData) should be enough) so that the state store resources always exist and scheduling won't hang.

No. In our use case, we want to get rid of HDFS for state store checkpoints. So the task will wait until the PVC is re-mounted to another, new executor. Our state store is checkpointed to the PVC, not HDFS. That is why I question whether stage-level scheduling can handle such a case, because it is one of the requirements of this proposed plugin API.
