HeartSaVioR commented on pull request #30770:
URL: https://github.com/apache/spark/pull/30770#issuecomment-747289740


   My preference is to unload state as soon as possible once it is no longer 
being used. If that is achievable, I don't think we need to investigate 
alternatives.
   
   The difference between TTL and replication is the eviction condition. 
Assume something bad happens (the original proposal only helps in an edge case, 
so I don't think that's a crazy assumption) and Spark somehow assigns the task 
for the same state partition to executor A for batch 1, B for batch 2, C for 
batch 3, and so on. TTL-based eviction will keep every one of those copies 
loaded until each reaches its TTL. In other words, the maximum number of 
loaded copies of the state at any moment is nondeterministic.
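   
   To make that concrete, here is a toy Python model, not Spark code. The 
function name is hypothetical, and measuring the TTL in batches rather than 
wall-clock time is a simplifying assumption. It counts how many executors 
still hold a copy of one state partition after each batch.

```python
def loaded_copies_with_ttl(assignments, ttl_batches):
    """Count loaded copies of one state partition after each batch.

    assignments[i] is the executor that ran the partition's task in batch i.
    Assumption: a copy is evicted once it has gone unused for `ttl_batches`
    batches (a stand-in for a wall-clock TTL).
    """
    last_used = {}  # executor id -> index of the last batch that used its copy
    counts = []
    for batch, executor in enumerate(assignments):
        last_used[executor] = batch
        # Drop copies whose idle time has reached the TTL.
        last_used = {e: b for e, b in last_used.items() if batch - b < ttl_batches}
        counts.append(len(last_used))
    return counts


# Stable scheduling keeps a single copy; a task that hops between executors
# keeps accumulating copies until the TTL starts to expire them.
stable = loaded_copies_with_ttl(["A", "A", "A", "A"], ttl_batches=10)
hopping = loaded_copies_with_ttl(["A", "B", "C", "D"], ttl_batches=10)
```

With the same TTL, the copy count depends entirely on how the scheduler 
happened to place the tasks, which is the behavior described above.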
   
   Instead, we could restrict the maximum number of loaded copies of the same 
state store. The coordinator can maintain the latest N active executors for 
each state store, and an executor can evict the state during maintenance if it 
isn't included in that list. Assuming the maintenance interval isn't too long, 
Spark will be able to keep the number of copies roughly close to the maximum 
for each state store. (We could even add a separate maintenance interval used 
only for unloading state, which could be much shorter. The existing interval is 
set to a high value because of the cost of snapshotting the state.)
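   
   A minimal sketch of that idea, written as a standalone Python model rather 
than against Spark's actual coordinator. The class, method, and parameter 
names (`StoreCoordinatorSketch`, `report_active`, `should_keep`, 
`run_maintenance`, `max_copies`) are all hypothetical.

```python
from collections import OrderedDict, defaultdict


class StoreCoordinatorSketch:
    """Tracks the latest `max_copies` active executors per state store."""

    def __init__(self, max_copies):
        if max_copies < 1:
            raise ValueError("max_copies must be at least 1")
        self.max_copies = max_copies
        # store id -> executors ordered from least to most recently active
        self._active = defaultdict(OrderedDict)

    def report_active(self, store_id, executor_id):
        """Record that `executor_id` just used `store_id`."""
        executors = self._active[store_id]
        executors.pop(executor_id, None)  # re-reporting moves it to the end
        executors[executor_id] = True
        while len(executors) > self.max_copies:
            executors.popitem(last=False)  # forget the least recent executor

    def should_keep(self, store_id, executor_id):
        """True if the executor is still in the store's active list."""
        return executor_id in self._active.get(store_id, ())


def run_maintenance(coordinator, executor_id, loaded_stores):
    """Return the stores the executor keeps; the rest would be unloaded."""
    return {s for s in loaded_stores if coordinator.should_keep(s, executor_id)}


coordinator = StoreCoordinatorSketch(max_copies=2)
for executor in ["A", "B", "C"]:  # the same partition hops A -> B -> C
    coordinator.report_active("store-0", executor)

kept_on_a = run_maintenance(coordinator, "A", {"store-0"})  # A is not listed
kept_on_c = run_maintenance(coordinator, "C", {"store-0"})  # C is listed
```

In this model the number of loaded copies can exceed `max_copies` only until 
the stale executors run their next maintenance pass, which is why a shorter 
unload-only interval would tighten the bound.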
   
   The state store coordinator (in the driver) has full control of the 
eviction condition, which would help when preferredLocation is called. (That's 
easily achievable in the TTL case as well once executors report to the 
coordinator, though.)
   
   That only works if end users can tolerate multiple copies (the number 
should be configurable) of a state store among executors. Given that the 
problematic case assumes large state, neither TTL nor replication will work for 
that case. We should simply reduce the unnecessary inactive state.




