viirya edited a comment on pull request #30770:
URL: https://github.com/apache/spark/pull/30770#issuecomment-747142154


   > The problem is, this is completely relying on luck - this doesn't give any 
help on physical plan. Again the problem exists even without the PR, but then 
shouldn't we fix the root cause instead of extending the possibility of luck? 
At least Spark should be able to know there're other executors still keeping 
the state, and taking into account while planning.
   
   We already have preferred locations for stateful operations. This is how 
Spark takes state location into account when planning physical stateful 
operations. I think users can increase the locality wait to force Spark to 
honor those preferred locations.
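
As a rough illustration of the locality-wait adjustment, the sketch below 
builds `--conf` flags for `spark-submit`. `spark.locality.wait` is an existing 
Spark setting; the helper name `spark_submit_conf_flags` and the value `30s` 
are assumptions made for this example, not recommendations.

```python
def spark_submit_conf_flags(conf):
    """Turn a dict of Spark settings into spark-submit --conf arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


# Illustrative value only: a longer wait gives the scheduler more time to
# place a task on its preferred (state-holding) executor before falling back.
locality_conf = {"spark.locality.wait": "30s"}

print(" ".join(spark_submit_conf_flags(locality_conf)))
```

A longer wait trades scheduling latency for a better chance that a stateful 
task lands on the executor that already holds its state.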
   
   The purpose of this proposal is to stabilize the unloading behavior, not 
just to increase the chance of getting lucky. It avoids unloading some stores 
earlier and others later, which makes the query behavior harder to estimate. 
It is possible that a query works when it unloads stores earlier and 
sometimes fails when it unloads them later.
   
   If you think we should not make it a configurable item, I can remove the 
configuration and only check whether the alive time exceeds the maintenance 
interval. That would also help stabilize this behavior.
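
A minimal sketch of that fixed check, assuming an inactive store is unloaded 
only once its alive time exceeds the maintenance interval. This is not the 
code in the PR: `LoadedStore`, its fields, and `stores_to_unload` are 
hypothetical names, and the 60-second interval is just an example value.

```python
from dataclasses import dataclass


@dataclass
class LoadedStore:
    store_id: str
    loaded_at_ms: int  # when this executor loaded the store (hypothetical field)
    active: bool       # whether this executor is still the active owner


def stores_to_unload(stores, now_ms, maintenance_interval_ms):
    """Return ids of inactive stores alive longer than the maintenance interval."""
    return [
        s.store_id
        for s in stores
        if not s.active and now_ms - s.loaded_at_ms > maintenance_interval_ms
    ]


stores = [
    LoadedStore("a", loaded_at_ms=0, active=False),       # inactive, alive 70s
    LoadedStore("b", loaded_at_ms=50_000, active=False),  # inactive, alive 20s
    LoadedStore("c", loaded_at_ms=0, active=True),        # still active
]

print(stores_to_unload(stores, now_ms=70_000, maintenance_interval_ms=60_000))
```

With this rule every inactive store gets the same minimum lifetime, so the 
unload timing no longer depends on when the maintenance task happens to run.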
   
   




