viirya commented on pull request #30770: URL: https://github.com/apache/spark/pull/30770#issuecomment-747142154
> The problem is, this is completely relying on luck - this doesn't give any help on physical plan. Again the problem exists even without the PR, but then shouldn't we fix the root cause instead of extending the possibility of luck? At least Spark should be able to know there're other executors still keeping the state, and taking into account while planning.

We already have preferred locations for stateful operations. That is how Spark takes state location into account when planning physical stateful operations. I think users can adjust the locality wait to force Spark to honor those locations.

The purpose of this proposal is to stabilize the unloading behavior, so that we avoid unloading some stores earlier and others later. That variation makes the query behavior harder to estimate: a query may work at one time because it unloads stores earlier, and not work at another time because it unloads them later.

If you think this should not be a configurable item, I can remove the configuration and only check whether the alive time is more than the maintenance interval.
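
For reference, the locality wait mentioned above is controlled by `spark.locality.wait` (default `3s`). A minimal sketch of raising it in `spark-defaults.conf` follows; the `10s` value is only an illustrative assumption, not a recommendation, and the right value depends on the workload:

```properties
# Illustrative value only (assumption): wait longer for a preferred
# location before falling back to a less-local executor.
spark.locality.wait    10s
```

A longer wait makes it more likely that a stateful task lands on the executor that already holds its state store, at the cost of possible scheduling delay.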
