HeartSaVioR commented on pull request #30770: URL: https://github.com/apache/spark/pull/30770#issuecomment-747289740
My preference is unloading state as soon as possible if it's not being used. So if this is achievable I don't think we need to investigate alternatives. The difference between TTL and replication is the condition on eviction. Assume some bad thing happens (the original proposal only helps in edge case, so I think it's not crazy to assume the bad thing) and Spark somehow assigns task for the same state partition to executor A for batch 1 and B for batch 2 and C for batch 3, etc. TTL based eviction will end up keeping all states as loaded unless it reaches TTL. That said, max copies of states on the fly are indeterministic. Instead, we could restrict max number of loaded state for the same state store. Coordinator can maintain the latest N active executors for the state store, and executor can evict the state on maintenance if the executor isn't included in the list. Assuming the maintenance interval isn't too long, Spark will be able to maintain roughly closed to the max number of copies for state store. (We could even add another maintenance interval for only unloading state which can be even reduced down heavily. The interval is set to the high value because of cost on snapshotting the state.) The state store coordinator (in driver) has full control of the condition of eviction, which would help when preferredLocation is getting called. (That's easily achievable on TTL case as well once executor is reporting to coordinator, though.) That only works if end users could tolerate multiple copies (should be configurable) of state store among executors. Given the problematic case assumes the large state, neither TTL nor replication will work for the case. We should just have to reduce down the unnecessary inactive state. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
