Ngone51 edited a comment on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-844775582
Thanks for the ping @xuanyuanking
> does it fit into stage level scheduling
This sounds feasible to me. We can treat the state store as a resource for
the streaming task. And since the `StateStoreProviderId` is shared between
batches, tasks across batches will be assigned to the same state store as long as
they require the same `StateStoreProviderId` (which is guaranteed by the stage
level scheduling mechanism). The pseudo code may look like:
```scala
class StateStoreRDD(...) extends RDD[...] {
  ...
  // Declare the state store (identified by its StateStoreProviderId) as a
  // required task resource, so the stage level scheduling mechanism pins the
  // task to the executor that holds that state store instance.
  this.withResources(new ResourceProfile().add(stateStoreProviderId))
  ...
}
```
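For reference, with the current stage level scheduling API this could be expressed roughly as below. This is only a sketch: it assumes the `StateStoreProviderId` can be encoded as a custom task resource name (the helper `withStateStoreResource` and the `stateStore.<id>` naming are made up for illustration), and it ignores how the cluster manager would discover such a resource.
```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}

// Sketch: attach a ResourceProfile that requires one unit of a custom task
// resource derived from the state store provider id. The resource name
// scheme ("stateStore.<providerId>") is hypothetical.
def withStateStoreResource[T](rdd: RDD[T], providerIdStr: String): RDD[T] = {
  val taskReqs = new TaskResourceRequests().resource(s"stateStore.$providerIdStr", 1)
  val profile = new ResourceProfileBuilder().require(taskReqs).build()
  rdd.withResources(profile)
}
```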
On the other side, the driver should be able to update
`ExecutorData.resourcesInfo` when `StateStoreCoordinatorRef` receives the
active state store instance registration, so that the executor is known to hold the
state store resource.
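A very rough sketch of that coordinator-side hook (the names `onActiveInstanceReported`, `schedulerBackend.addExecutorResource`, and `resourceName` below are hypothetical; no such methods exist today):
```scala
// Illustrative only: when the coordinator learns that an executor hosts an
// active state store instance, it could also inform the scheduler backend so
// that the executor's ExecutorData.resourcesInfo gains a matching entry.
private def onActiveInstanceReported(
    providerId: StateStoreProviderId,
    host: String,
    executorId: String): Unit = {
  instances.put(providerId, ExecutorCacheTaskLocation(host, executorId))
  // Hypothetical hook; CoarseGrainedSchedulerBackend has no such method today.
  schedulerBackend.addExecutorResource(executorId, resourceName(providerId), amount = 1)
}
```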
One thing we need to pay attention to: there might be no
available executor for a specific `StateStoreProviderId`, due to executor
loss or because it's the 1st batch (where the state store hasn't been established
yet), which could make the scheduling hang. Thus, I'm thinking of making the
state store an "optional" resource.
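To make "optional" a bit more concrete, one possible interpretation (purely a sketch; none of these hooks exist in the scheduler) is a locality-wait style fallback: prefer an executor that already holds the state store, but after a bounded wait allow the task to run anywhere so scheduling can never hang.
```scala
// Hypothetical sketch of an "optional" resource: prefer an executor that
// already holds the state store, but fall back to any executor after a
// wait so scheduling can't hang on the 1st batch or after executor loss.
def chooseExecutor(
    candidates: Seq[String],       // alive executor ids
    withStateStore: Set[String],   // executors currently holding the store
    waitedMs: Long,
    maxWaitMs: Long = 3000L): Option[String] = {
  candidates.find(withStateStore.contains) match {
    case some @ Some(_) => some                                  // preferred executor available
    case None if waitedMs >= maxWaitMs => candidates.headOption  // give up waiting, run anywhere
    case None => None                                            // keep the task pending for now
  }
}
```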
While stage level scheduling solves the "must run on a particular executor"
problem, the problem of uneven distribution in the first batch still exists. I
don't have a good idea yet, but I think we can add hack code in the scheduler
anyway (e.g., the strategy you added in #32422) as long
as we know it's the 1st batch.
Thoughts?