HeartSaVioR commented on pull request #30770:
URL: https://github.com/apache/spark/pull/30770#issuecomment-747187019


   > why we don't do unloading asap
   
   Because the conversation is one way - executor registers to driver, executor 
queries from driver, but driver doesn't notify executors. If we can make 
inactive state being unloaded ASAP it's ideal, but I haven't investigated on 
the side-effect.
   
   The possible problem (you've mentioned) comes from the fact there're 
multiple loads on the state across executors. We're providing the estimated 
memory usage of state store in metric and if I'm not mistaken inactive state is 
not counted. That is a huge miss on heap-memory based state store provider. 
Again that's an existing problem, but once we know the problem I don't want to 
expand the possibility.
   
   So IMHO the right direction would be either trying our best to unload 
inactive state ASAP, or considering the replication as the further improvement. 
Not somewhere in between. Even the latter wouldn't be an improvement if we 
could enforce Spark to respect the active executor of the state.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to