viirya commented on pull request #32136:
URL: https://github.com/apache/spark/pull/32136#issuecomment-832927163


   Even state store distribution is just one benefit the plugin enables. 
Locality cannot be used programmatically to control stateful tasks. For 
example, after setting a long locality wait, once an executor is lost, will 
Spark be able to choose another executor that meets our needs? We plan to 
further enhance the current SS checkpoint mechanism. One necessary piece is 
being able to control stateful task location in cases like that.
   
   I'm willing to limit the change to SS only, but unfortunately Spark doesn't 
provide an API for task scheduling. Since I began working on SS in the last 
few months, I have felt that SS is a somewhat neglected module. Some important 
features have been stuck for the past few years. SS is far behind other 
streaming solutions in features. We still believe Spark can be our streaming 
solution. Driven by customer needs for their streaming applications, we are 
working to revive these features (session window, RocksDB state store) and 
also plan new enhancements (checkpoint).
   
   I'd appreciate it if you could reconsider the possibility of adding this 
scheduling plugin. I'm open to changing the API to be more general for other 
use cases if you think that is better.
   
   cc @dbtsai @dongjoon-hyun 
   
   




