Ngone51 commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-845694478
> For the stage level scheduling option, is the state store essentially the same across all executors?

No. Tasks with different partition ids must use different state stores, and tasks with the same partition id across micro-batches must use the same state store.

> How is state store reconstructed when executor lost? I assume its when a streaming task is assigned and the executor is missing the state store and not automatically on executor lost?

Yes. Currently, it's reconstructed lazily, and that's normally preferable because reconstruction involves I/O: we need to read the persisted state store data from HDFS and rebuild an in-memory state store instance for fast access.

> Just thinking out loud, if we still have to do something in the scheduler as plugin and that fulfills both requirements then why have 2 solutions (1 stage level scheduling and 1 scheduler plugin).

I may not have presented my idea clearly. What I actually mean is that we don't add the plugin, but replace it with stage level scheduling plus an even-distribution/spreading strategy added to the scheduler directly (behind an option, maybe, as you mentioned).
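To make the per-partition mapping and lazy reconstruction concrete, here is a minimal standalone sketch (not Spark's actual `StateStore` API; all names and the checkpoint-loading callback are illustrative). Each partition id owns its own store, a store is only rebuilt from the checkpoint when a task first touches it on an executor, and losing the executor's memory forces a lazy rebuild on the next access:

```scala
import scala.collection.mutable

// Hypothetical registry keyed by partition id; loadFromCheckpoint stands in
// for the expensive read of persisted state data (e.g. from HDFS).
class StateStoreRegistry(loadFromCheckpoint: Int => Map[String, String]) {
  private val stores = mutable.Map.empty[Int, mutable.Map[String, String]]
  var reconstructions = 0 // counts the expensive, I/O-backed rebuilds

  // Lazy reconstruction: the I/O cost is paid only when a streaming task
  // actually needs this partition's store on this executor.
  def get(partitionId: Int): mutable.Map[String, String] =
    stores.getOrElseUpdate(partitionId, {
      reconstructions += 1
      mutable.Map(loadFromCheckpoint(partitionId).toSeq: _*)
    })

  // Simulate executor loss: in-memory stores vanish, checkpoint survives.
  def dropAll(): Unit = stores.clear()
}
```

Usage: two tasks with the same partition id in consecutive micro-batches hit the same in-memory store (one rebuild total); after a simulated executor loss, nothing is rebuilt eagerly, only the next access pays the cost again.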
