viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-832927163
Even state store distribution is just one benefit the plugin enables. Locality cannot be used programmatically to control stateful tasks. For example, after setting a long locality wait, once an executor is lost, will Spark be able to choose another executor that meets our needs?

We plan to further enhance the current SS checkpoint mechanism. One necessary piece is being able to control stateful task location in cases like that. I'm willing to limit the change to SS only, but unfortunately Spark doesn't provide an API for task scheduling.

Since I began working on SS in the last few months, I have felt that SS is a somewhat neglected module. Some important features have been stuck for the past few years, and SS is far behind other streaming solutions in features. We still believe Spark can be our streaming solution. Driven by customer needs in their streaming applications, we are working to revive these features (session window, RocksDB state store) and are also planning new enhancements (checkpoint).

I would appreciate it if you could reconsider the possibility of adding this scheduling plugin. I'm open to changing the API to be more general for other use cases if you think that is better.

cc @dbtsai @dongjoon-hyun
