squito commented on issue #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched URL: https://github.com/apache/spark/pull/22806#issuecomment-460324135 A ConcurrentHashMap alone isn't enough to solve all the issues. TSM also expects `successful` and `taskSuccessful` to be in sync. If they are not controlled by one lock, they may not be. You could make `taskSuccessful` an `AtomicInteger` so they'd be eventually consistent, but then we've got to make sure there are no errors there (eg. would anything crazy happen if you somehow declare a taskset completed because `taskSuccessful == numTasks`, but you haven't yet marked `successful` for every task?). And even `successful` is used in multiple places -- a CHM doesn't protect the *values* from concurrent modification, so you'd need to make sure that you wouldn't eg. double increment `taskSuccessful` from two simultaneous completions for the same task. you probably meant something to handle all this, and maybe it can be done, but at least so far, its not totally obvious to me what the right solution is. @Ngone51 that is an interesting possibility ... I will think through that some more as well. I think it could be made to work, though the more duplicated state we have, it will only get even harder to reason about this already complex beast.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
