Ngone51 edited a comment on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-844858158


   > However, I'm not sure if stage level scheduling can deal with the executor lost case. Based on the above comment, it seems it cannot. That would be a major concern for the use-case here. During task scheduling, once an executor is lost, we may need the scheduler to be able to re-schedule the task to a particular executor (e.g., one with the reused PVC in our case).
   
   So what if the state store resource is **required**, not **optional**? That means the task won't launch until it gets the required state store. So in your PVC case, the task would wait until the PVC is re-mounted to some executor. And if we make the state store resource required, we should do a similar thing for the HDFS state store on executor loss. For example, we should reconstruct the state store on other active executors (or, perhaps, we don't even have to reconstruct the state store in reality; moving the `StateStoreProviderId`s to other active executors' metadata (e.g., `ExecutorData`) should be enough), so that the state store resources always exist and scheduling won't hang.
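
   To make the relocation idea concrete, here is a minimal sketch (not actual Spark scheduler code; `ProviderId`, `StateStoreLocations`, and `reassignOnExecutorLost` are hypothetical names standing in for `StateStoreProviderId` and the per-executor metadata in `ExecutorData`) of how the provider-id bookkeeping could be moved to a surviving executor on executor loss, so the required resource never disappears from the scheduler's view:

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-in for StateStoreProviderId.
   case class ProviderId(operatorId: Long, partitionId: Int, storeName: String)

   // Hypothetical per-executor bookkeeping, analogous to what could live in ExecutorData.
   class StateStoreLocations {
     private val providerIdsByExecutor = mutable.Map.empty[String, mutable.Set[ProviderId]]

     def register(executorId: String, id: ProviderId): Unit =
       providerIdsByExecutor.getOrElseUpdate(executorId, mutable.Set.empty) += id

     // On executor loss, move its provider ids onto active executors instead of dropping
     // them, so tasks that require the state store resource can still be scheduled.
     def reassignOnExecutorLost(lostExecutorId: String, activeExecutorIds: Seq[String]): Unit = {
       val orphaned = providerIdsByExecutor.remove(lostExecutorId).getOrElse(mutable.Set.empty)
       if (orphaned.nonEmpty && activeExecutorIds.nonEmpty) {
         // Simple round-robin placement; a real policy could prefer executors that re-mounted the PVC.
         orphaned.toSeq.zipWithIndex.foreach { case (id, i) =>
           val target = activeExecutorIds(i % activeExecutorIds.size)
           providerIdsByExecutor.getOrElseUpdate(target, mutable.Set.empty) += id
         }
       }
     }

     def executorsFor(id: ProviderId): Seq[String] =
       providerIdsByExecutor.collect { case (exec, ids) if ids.contains(id) => exec }.toSeq
   }
   ```

   With the provider ids always attached to some active executor's metadata, a scheduler that treats the state store as a required resource would always have a candidate executor to launch the task on (or to wait for, until the PVC is re-mounted), rather than hanging.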

