tgravescs commented on issue #24497: [SPARK-27630][CORE] Stage retry causes totalRunningTasks calculation to be negative
URL: https://github.com/apache/spark/pull/24497#issuecomment-491441977

So the reason it goes negative is that the task end event comes in after the new stage attempt has started, which causes it to decrement the entry in the stageIdToNumRunningTask(stage) map. Yeah, I'm surprised we didn't see that more often, but it would be timing dependent. What @squito says makes sense to me.

I was just looking a bit and I wonder if we have an issue with stageIdToSpeculativeTaskIndices as well. If the stageId gets put back in there, you could have an issue, though it seems unlikely. I wonder if it makes sense to look at using the stage attempt id for a few of these. I think you could have a similar problem with stageIdToTaskIndices if the new stage attempt had started other tasks before you got the task end. I would have to take a more thorough look to verify.
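
To make the ordering problem concrete, here is a rough simulation of the event sequence described above. It is only a sketch and not the Spark code: the `Tracker` class, its method names, and the assumption that submitting a stage attempt resets that stage's running-task counter to zero are all hypothetical stand-ins for the real listener logic. It compares keying the counter by stage id alone against keying it by (stage id, stage attempt id).

```python
from collections import defaultdict


class Tracker:
    """Toy model of a running-task counter (hypothetical, not Spark's class).

    key_by_attempt=False keys the counter by stage id only, similar in
    spirit to stageIdToNumRunningTask.
    key_by_attempt=True keys it by (stage id, stage attempt id).
    """

    def __init__(self, key_by_attempt):
        self.key_by_attempt = key_by_attempt
        self.running = defaultdict(int)

    def _key(self, stage_id, attempt_id):
        return (stage_id, attempt_id) if self.key_by_attempt else stage_id

    def on_stage_submitted(self, stage_id, attempt_id):
        # Assumption: a newly submitted attempt starts its counter at zero.
        self.running[self._key(stage_id, attempt_id)] = 0

    def on_task_start(self, stage_id, attempt_id):
        self.running[self._key(stage_id, attempt_id)] += 1

    def on_task_end(self, stage_id, attempt_id):
        self.running[self._key(stage_id, attempt_id)] -= 1

    def total_running_tasks(self):
        return sum(self.running.values())


def replay(tracker):
    """Replay the problematic ordering: the retry starts before the old
    attempt's task end event is delivered."""
    tracker.on_stage_submitted(stage_id=1, attempt_id=0)
    tracker.on_task_start(stage_id=1, attempt_id=0)
    tracker.on_stage_submitted(stage_id=1, attempt_id=1)  # stage retry
    tracker.on_task_end(stage_id=1, attempt_id=0)         # late end event
    return tracker.total_running_tasks()


if __name__ == "__main__":
    print("keyed by stage id:      ", replay(Tracker(key_by_attempt=False)))
    print("keyed by stage attempt: ", replay(Tracker(key_by_attempt=True)))
```

In this toy model the stage-keyed counter ends up below zero because the late end event lands on the counter that the retry already reset, while the attempt-keyed counter charges the end event to the attempt that actually started the task. Whether the same change is the right fix for stageIdToSpeculativeTaskIndices and stageIdToTaskIndices would still need the more thorough look mentioned above.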
