viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-844836740
> This sounds feasible to me. We can treat the state store as a resource for the streaming task. And since the `StateStoreProviderId` is shared between batches, tasks in different batches must be assigned the same state store as long as they require the same `StateStoreProviderId` (which is guaranteed by the stage level scheduling mechanism). Here's what the pseudo code may look like:

> On the other side, the driver should be able to update `ExecutorData.resourcesInfo` when `StateStoreCoordinatorRef` receives the active state store instance registration, so that the executor would contain the state store resource.

Thanks @xuanyuanking and @Ngone51. This might be a feasible direction to unblock this. Let me think about it and maybe POC it locally.

Roughly looking at it, stage level scheduling can deal with the particular-task-to-particular-executor issue. This looks okay. #32422 is based on the API added here, so there is a hook/API we can rely on to change the scheduler behavior for the first batch. Specifically, this is a streaming concept, so without an API/hook there it would be ugly to add hacky code into the scheduler to fit this need. We can discuss this, of course.

However, I'm not sure stage level scheduling can deal with the executor-lost case. Based on the above comment, it seems it cannot, and that is a major concern for the use case here. During task scheduling, once an executor is lost, we may need the scheduler to be able to re-schedule the task to a particular executor (e.g. one with the reused PVC, in our case).
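To make the two points above concrete, here is a minimal, hypothetical model of the idea being discussed: the driver-side coordinator tracks which executor holds each active state store instance (so the scheduler could treat it as an executor-local resource), and the executor-lost case shows the gap stage level scheduling does not cover on its own. This is plain Python, not Spark code; the class and method names (`register`, `preferred_executor`, `executor_lost`) are illustrative and do not correspond to actual Spark APIs.

```python
class StateStoreCoordinatorModel:
    """Toy driver-side model: maps state store provider ids to the
    executor currently holding that state store instance."""

    def __init__(self):
        # provider_id -> executor_id for active state store instances
        self.active_instances = {}

    def register(self, provider_id, executor_id):
        # On registration the driver could also update the executor's
        # resource info, so the scheduler sees the state store as a
        # resource available on that executor.
        self.active_instances[provider_id] = executor_id

    def preferred_executor(self, provider_id):
        # A task needing this provider id should be scheduled here.
        return self.active_instances.get(provider_id)

    def executor_lost(self, executor_id):
        # The concern raised above: when an executor is lost, its
        # provider ids no longer map anywhere. Extra logic (e.g.
        # tracking which executor picks up the reused PVC) would be
        # needed to re-establish the mapping and re-schedule tasks.
        lost = [p for p, e in self.active_instances.items()
                if e == executor_id]
        for p in lost:
            del self.active_instances[p]
        return lost
```

For example, after `register("q1/op0/part0", "exec-1")`, a task for that provider id prefers `exec-1`; after `executor_lost("exec-1")` the preference is gone, which is exactly the re-scheduling hole discussed in the comment.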
