Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21577

> * fixed the issue Mridul brought up, but I think the race that Tom describes still exists. I'm just not sure it would cause problems, since as far as I can tell it can only happen in a map stage, not a result stage.

@vanzin, which race were you referring to here? I think tracking the stage across attempts fixes both of the ones I mentioned in reference to scenario 2 for Mridul:

> There is another case though here where T1_1.1 could have just asked to be committed, but not yet committed; then if it gets delayed committing, the new stage attempt starts and T1_1.2 asks if it can commit and is granted, so then both try to commit at the same time, causing corruption.

Fixed, because T1_1.2 won't be allowed to commit: we track the first stage attempt as committing.

> The caveat there though would be that since T1_1.1 was committed, the second stage attempt could finish and call commitJob while T1_1.2 is committing, since Spark thinks it doesn't need to wait for T1_1.2. Anyway, this seems very unlikely, but we should protect against it.

T1_1.2 shouldn't ever be allowed to commit, since we track across the attempts, so it would never commit after the stage itself has completed.
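To make the fix described above concrete, here is a minimal sketch of commit tracking keyed by (stage, partition) rather than by stage attempt, so a retried task like T1_1.2 is denied while T1_1.1 holds the commit authorization. The class and method names are illustrative only, not Spark's actual OutputCommitCoordinator API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: commit permission is tracked per (stage, partition),
// NOT per stage attempt, so an authorization granted in stage attempt 1
// still blocks task retries launched by stage attempt 2.
public class CommitCoordinatorSketch {
    // key = "stage:partition", value = task attempt authorized to commit
    private final Map<String, Integer> committers = new ConcurrentHashMap<>();

    public boolean canCommit(int stage, int partition, int taskAttempt) {
        String key = stage + ":" + partition;
        // putIfAbsent is atomic: the first attempt to ask wins; later
        // attempts (even from a new stage attempt) see the existing winner.
        Integer winner = committers.putIfAbsent(key, taskAttempt);
        return winner == null || winner == taskAttempt;
    }
}
```

Under this scheme, once T1_1.1 is recorded as the committer for its partition, T1_1.2 can never be granted permission, which also rules out the commitJob caveat: no unauthorized attempt can still be committing after the stage completes.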