cloud-fan edited a comment on issue #24375: [SPARK-27474][CORE] try best to not submit tasks when the partitions are already completed
URL: https://github.com/apache/spark/pull/24375#issuecomment-483513266

I think we are discussing an optimization (saving resources) rather than a bug? Nothing will go wrong even without #21131.

UPDATE: Normal tasks can all complete even if they belong to the same partition. So it's just a matter of saving resources by avoiding submitting tasks whose corresponding partitions are already marked as completed.

For tasks that write to file sources, which need to commit to the central coordinator, only one task can complete per partition. In this case, if a task from the zombie TSM completes first, then the corresponding task in the active TSM will fail, get retried, and fail again, until the stage attempt is aborted. Then a new stage attempt will be created. The job doesn't fail, but a lot of resources are wasted.

If the task from the active TSM completes first, then the corresponding task from the zombie TSM will fail. This is totally fine, as a zombie TSM does not retry tasks.

That said, this PR tries to avoid the worst case described above. Even if we go through the event loop now, I don't think it will take so long that the task from the active TSM has already been retried 3 times.
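To illustrate the worst case above, here is a minimal sketch (plain Python, not Spark's actual `OutputCommitCoordinator` code; all names are illustrative) of a coordinator that authorizes at most one commit per partition. Once a task from the zombie attempt wins the commit for a partition, every retry of the active attempt's task for that partition is denied, which is the wasted-resource scenario the PR tries to avoid:

```python
# Hypothetical sketch of a central commit coordinator: at most one task
# may commit each partition, regardless of which stage attempt it came from.
class CommitCoordinator:
    def __init__(self):
        # partition -> label of the stage attempt that won the commit
        self.committed = {}

    def can_commit(self, partition, stage_attempt):
        if partition in self.committed:
            return False  # another task already committed this partition
        self.committed[partition] = stage_attempt
        return True


def run_task(coord, partition, stage_attempt):
    """Simulate a file-writing task; True means its commit was authorized."""
    return coord.can_commit(partition, stage_attempt)


if __name__ == "__main__":
    coord = CommitCoordinator()
    # Worst case: the zombie attempt's task for partition 0 commits first...
    print(run_task(coord, partition=0, stage_attempt="zombie"))   # True
    # ...so the active attempt's task for partition 0 fails on every retry.
    for attempt in range(4):
        print(run_task(coord, partition=0, stage_attempt="active"))  # False
```

A normal (non-committing) task has no such exclusivity, which is why duplicate completions are harmless there and skipping already-completed partitions is purely a resource saving.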
