Ngone51 commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-845694478


   > For the stage level scheduling option, is the state store essentially the 
same across all executors?
   
   No. Tasks with different partition ids must use different state stores, and tasks with the same partition id must use the same state store across micro-batches.
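   To make the mapping concrete, here is a minimal illustrative sketch (not Spark's actual API; the registry class and ids are hypothetical) of how state stores are effectively keyed per partition, so the same partition id resolves to the same store across micro-batches:

   ```python
   # Hypothetical sketch: state stores keyed by (operator_id, partition_id).
   # Different partition ids -> different stores; the same partition id
   # resolves to the same store across micro-batches.
   class StateStoreRegistry:
       def __init__(self):
           self._stores = {}

       def get_store(self, operator_id, partition_id):
           key = (operator_id, partition_id)
           if key not in self._stores:
               self._stores[key] = {}  # stand-in for an in-memory state store
           return self._stores[key]

   registry = StateStoreRegistry()
   batch1 = registry.get_store(operator_id=0, partition_id=3)  # micro-batch 1
   batch2 = registry.get_store(operator_id=0, partition_id=3)  # micro-batch 2
   other = registry.get_store(operator_id=0, partition_id=4)

   assert batch1 is batch2      # same partition id across batches: same store
   assert batch1 is not other   # different partition ids: different stores
   ```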
   
   > How is the state store reconstructed when an executor is lost? I assume it's 
when a streaming task is assigned and the executor is missing the state store, and 
not automatically on executor loss?
   
   Yes. Currently, it's reconstructed lazily, and that's normally preferable 
because reconstruction involves I/O: it needs to read the persisted state store 
data from HDFS and rebuild an in-memory state store instance for fast access.
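   A rough sketch of that lazy pattern (hypothetical names; Spark's real provider lives in Scala internals) is: keep an in-memory cache on the executor, and only when a task asks for a store that is missing, e.g. after the executor holding it was lost, read the persisted checkpoint data back and rebuild it:

   ```python
   # Hypothetical sketch of lazy state store reconstruction.
   def load_from_checkpoint(partition_id):
       # Stand-in for reading persisted state store data (e.g. from HDFS).
       return {"partition": partition_id, "restored": True}

   class LazyStateStoreProvider:
       def __init__(self):
           self._cache = {}  # in-memory stores held by this executor

       def get_store(self, partition_id):
           # Reconstruct only on first access, not eagerly on executor loss.
           if partition_id not in self._cache:
               self._cache[partition_id] = load_from_checkpoint(partition_id)
           return self._cache[partition_id]

   provider = LazyStateStoreProvider()
   store = provider.get_store(3)        # first access triggers reconstruction
   store_again = provider.get_store(3)  # later accesses hit the in-memory cache
   assert store is store_again
   ```

   The trade-off this illustrates is that the reconstruction cost (the HDFS read) is paid only by tasks that actually need the missing store, instead of up front for every store the lost executor held.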
   
   > Just thinking out loud, if we still have to do something in the scheduler 
as plugin and that fulfills both requirements then why have 2 solutions (1 
stage level scheduling and 1 scheduler plugin).
   
   I may not have presented my idea clearly. I actually mean we don't add the 
plugin, but replace it with the stage level scheduling solution plus an even 
distribution/spreading strategy (behind an option, maybe, as you mentioned) added 
to the scheduler directly.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


