squito commented on issue #24497: [SPARK-27630][CORE]Stage retry causes 
totalRunningTasks calculation to be negative
URL: https://github.com/apache/spark/pull/24497#issuecomment-489706450
 
 
   The problem is that the `StageCompleted` event for a stage retry arrives before
all of those tasks finish. Meanwhile, when that happens because a task failed 4
times, we intentionally want to set the stage total to 0 for SPARK-11334.
Those two seem to conflict with each other. To be honest, though, I don't
understand why that fix was used in SPARK-11334: it seems like you should just
wait for the task end events, and not adjust any `numRunning` counts just on
`StageCompleted` events.
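
To make the conflict concrete, here is a toy Python model. It is not Spark's `ExecutorAllocationManager` listener code. The class names, event names, and event order are all hypothetical. One counter zeroes itself once no stage attempt is active, loosely in the spirit of the SPARK-11334 reset. The other changes only on task start/end events, keyed by (stage, attempt), so late task ends from an old attempt still balance out.

```python
from collections import defaultdict


class ResetOnStageCompleted:
    """Hypothetical counter that zeroes itself once no stage attempt is active,
    loosely mimicking the SPARK-11334-style reset (not Spark's real code)."""

    def __init__(self):
        self.num_running = 0
        self.active_attempts = set()

    def on_stage_submitted(self, stage, attempt):
        self.active_attempts.add((stage, attempt))

    def on_stage_completed(self, stage, attempt):
        self.active_attempts.discard((stage, attempt))
        if not self.active_attempts:
            # Reset even though tasks of this attempt may still be running.
            self.num_running = 0

    def on_task_start(self, stage, attempt):
        self.num_running += 1

    def on_task_end(self, stage, attempt):
        self.num_running -= 1


class CountFromTaskEvents:
    """Hypothetical alternative: only task start/end events move the count,
    tracked per (stage, attempt) so late ends from an old attempt still balance."""

    def __init__(self):
        self.running = defaultdict(int)  # (stage, attempt) -> running tasks

    @property
    def num_running(self):
        return sum(self.running.values())

    def on_stage_submitted(self, stage, attempt):
        pass  # stage lifecycle events never touch the count

    def on_stage_completed(self, stage, attempt):
        pass

    def on_task_start(self, stage, attempt):
        self.running[(stage, attempt)] += 1

    def on_task_end(self, stage, attempt):
        key = (stage, attempt)
        self.running[key] -= 1
        if self.running[key] == 0:
            del self.running[key]


# Hypothetical event order for one stage retry.
EVENTS = [
    ("stage_submitted", 0, 0),
    ("task_start", 0, 0), ("task_start", 0, 0), ("task_start", 0, 0),
    ("task_end", 0, 0),         # one task fails
    ("stage_completed", 0, 0),  # attempt 0 ends while 2 of its tasks still run
    ("stage_submitted", 0, 1),  # the retry attempt
    ("task_end", 0, 0), ("task_end", 0, 0),      # late ends from attempt 0
    ("task_start", 0, 1), ("task_start", 0, 1),  # retry tasks start
]


def replay(model, events):
    """Feed events to a model and record num_running after each one."""
    trace = []
    for kind, stage, attempt in events:
        getattr(model, "on_" + kind)(stage, attempt)
        trace.append(model.num_running)
    return trace


print(replay(ResetOnStageCompleted(), EVENTS))  # [0, 1, 2, 3, 2, 0, 0, -1, -2, -1, 0]
print(replay(CountFromTaskEvents(), EVENTS))    # [0, 1, 2, 3, 2, 2, 2, 1, 0, 1, 2]
```

With the reset, the count dips to -2 and ends at 0 even though two retry tasks are running. Counting only from task events, keyed by stage attempt, keeps it non-negative and ends at 2.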
   
   I'm amazed this hasn't been reported before; it must be causing dynamic
allocation to perform poorly on any stage retry. I'd like to make sure we
understand this properly and look a bit more at the history here.
   
   @tgravescs @abellina  I think you'll be interested in this one too.
