squito commented on issue #22806: [SPARK-25250][CORE] : Late zombie task 
completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#issuecomment-460324135
 
 
   A ConcurrentHashMap alone isn't enough to solve all the issues.  TSM also 
expects `successful` and `taskSuccessful` to be in sync.  If they are not 
controlled by one lock, they may not be.  You could make `taskSuccessful` an 
`AtomicInteger` so they'd be eventually consistent, but then we've got to make 
sure there are no errors there (eg. would anything crazy happen if you somehow 
declare a taskset completed because `taskSuccessful == numTasks`, but you 
haven't yet marked `successful` for every task?).  And even `successful` is 
used in multiple places -- a CHM doesn't protect the *values* from concurrent 
modification, so you'd need to make sure that you wouldn't eg. double increment 
`taskSuccessful` from two simultaneous completions for the same task.
   
   you probably meant something to handle all this, and maybe it can be done, 
but at least so far, its not totally obvious to me what the right solution is.
   
   @Ngone51 that is an interesting possibility ... I will think through that 
some more as well.  I think it could be made to work, though the more 
duplicated state we have, it will only get even harder to reason about this 
already complex beast.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to