cloud-fan edited a comment on issue #24375: [SPARK-27474][CORE] try best to not 
submit tasks when the partitions are already completed
URL: https://github.com/apache/spark/pull/24375#issuecomment-483513266
 
 
   I think we are discussing an optimization (saving resources) rather than a bug? 
Nothing will go wrong even without #21131
   
   UPDATE:
   Normal tasks can all complete even if they belong to the same partition, so this 
is purely a matter of saving resources by not submitting tasks whose corresponding 
partitions are already marked as completed. 
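   The idea can be sketched as a simple filter at stage-resubmission time. This is an illustrative Python model, not Spark's actual Scala implementation; the names are hypothetical:

   ```python
   # Sketch: when a stage is resubmitted, skip tasks whose partitions were
   # already reported complete by an earlier task set.

   def tasks_to_submit(all_partitions, completed_partitions):
       """Return only the partitions that still need a task."""
       return [p for p in all_partitions if p not in completed_partitions]

   # Partitions 0 and 2 finished in the previous attempt, so only
   # partitions 1 and 3 need new tasks.
   print(tasks_to_submit([0, 1, 2, 3], {0, 2}))  # -> [1, 3]
   ```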
   
   For tasks that write to file sources, which need to commit to the central 
coordinator, only one task can complete per partition. In this case, if a 
task from the zombie TSM completes first, the corresponding task in the active 
TSM will fail, be retried, and fail again, until the stage attempt is 
aborted. Then a new stage attempt is created. The job doesn't fail, but 
a lot of resources are wasted.
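   A toy model of the commit arbitration (a simplified assumption about the behavior, not Spark's actual `OutputCommitCoordinator` code; class and method names are illustrative): the first attempt to ask for a partition wins, so if a zombie task commits first, the active attempt's task for that partition is denied and fails.

   ```python
   # Toy commit coordinator: only one task may commit each partition.
   class CommitCoordinator:
       def __init__(self):
           # partition -> (stage_attempt, task_attempt) holding the commit right
           self._committer = {}

       def can_commit(self, partition, stage_attempt, task_attempt):
           holder = self._committer.setdefault(
               partition, (stage_attempt, task_attempt))
           return holder == (stage_attempt, task_attempt)

   coord = CommitCoordinator()
   # A task from the zombie attempt commits partition 0 first...
   print(coord.can_commit(0, stage_attempt=0, task_attempt=5))  # True
   # ...so the active attempt's task for the same partition is denied.
   print(coord.can_commit(0, stage_attempt=1, task_attempt=0))  # False
   ```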
   
   If the task from the active TSM completes first, then the corresponding task 
from the zombie TSM will fail. This is totally fine, as a zombie TSM does not 
retry tasks.
   
   That said, this PR tries to avoid the worst case described above. Even though we 
now go through the event loop, I don't think it will take so long that the 
task from the active TSM has already been retried 3 times.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]

