seayoun commented on issue #26975: [SPARK-30325][CORE] markPartitionCompleted 
cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-569853896
 
 
   > Let me expand on case 2:
   > If T1 finishes first, the corresponding partition in TSM2 (call it P1) will be marked as successful too. Then the executor gets lost; since T2 is still running, we won't reset `successful(P1)` to false.
   > Then, possibly, other partitions in TSM2 could be marked as successful by other tasks, and TSM2 would think all of its partitions have finished, but actually P1 has been lost and is never recomputed.
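
   To make the race concrete, here is a minimal, hypothetical Scala sketch of the scenario quoted above. `TaskSetManager`, `successful`, `markPartitionCompleted`, and `isAllDone` below are simplified stand-ins I made up for illustration, not Spark's real classes:

```scala
// A minimal, hypothetical model of the race described above.
// These are simplified stand-ins, NOT Spark's real implementations.
object PartitionRaceSketch {

  // One task set manager tracking which partitions it believes are done.
  class TaskSetManager(val name: String, numPartitions: Int) {
    val successful = Array.fill(numPartitions)(false)

    // Mirrors the idea of markPartitionCompleted: another attempt's
    // success is copied into this TSM without this TSM running the task.
    def markPartitionCompleted(partition: Int): Unit =
      successful(partition) = true

    def isAllDone: Boolean = successful.forall(identity)
  }

  def main(args: Array[String]): Unit = {
    val tsm1 = new TaskSetManager("TSM1", numPartitions = 2)
    val tsm2 = new TaskSetManager("TSM2", numPartitions = 2)

    // T1 (in TSM1) finishes partition P1 = 0 first; TSM2 copies the result.
    tsm1.successful(0) = true
    tsm2.markPartitionCompleted(0)

    // The executor that produced P1's output is now lost. In TSM2, T2 for
    // P1 is still "running", so successful(0) is never reset to false
    // (the behavior under discussion).

    // Later, TSM2's own task finishes partition 1 ...
    tsm2.successful(1) = true

    // ... and TSM2 now believes everything finished, although P1's output
    // lives on a dead executor and is never recomputed by this task set.
    println(s"TSM2 thinks all partitions done: ${tsm2.isAllDone}") // true
  }
}
```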
   
   
   There are two more cases in this situation:
   
   1. T1 and T2 run on different executors: losing one executor doesn't affect the other attempt, so this case is fine.
   2. T1 and T2 run on the same executor: T2 will not be retried, since T1 has already succeeded.
   Consider an analogous situation: a stage has finished, and then an executor holding that stage's shuffle files gets lost. We can't reschedule the stage, since it has already finished; instead, the retry happens when the next stage hits a `FetchFailedException` (see the sketch after this list).
   The case we discussed is like that one: we won't reschedule the task in the finished TSM, so I think they are similar.
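
   To make the analogy concrete, here is a minimal, hypothetical sketch of that recovery path. `FetchFailedException`, `availableMapOutputs`, and the retry logic below are simplified stand-ins for illustration, not Spark's actual `DAGScheduler` code:

```scala
// A minimal, hypothetical sketch: a finished stage is not rescheduled
// proactively when an executor holding its shuffle output dies; recovery
// happens only when the *next* stage fails to fetch that output.
object FetchFailureRetrySketch {

  // Stand-in for Spark's real fetch-failure exception.
  case class FetchFailedException(mapId: Int)
      extends Exception(s"missing shuffle output for map $mapId")

  // Shuffle outputs visible to the next stage; the lost executor's
  // entry has been removed.
  var availableMapOutputs: Set[Int] = Set(0) // map output 1 was lost

  def runNextStageTask(neededMapId: Int): Unit =
    if (!availableMapOutputs.contains(neededMapId))
      throw FetchFailedException(neededMapId)

  def main(args: Array[String]): Unit = {
    try {
      runNextStageTask(neededMapId = 1)
    } catch {
      case FetchFailedException(mapId) =>
        // Only now does the scheduler learn the parent output is gone,
        // so it resubmits the parent stage to recompute that map output.
        println(s"resubmitting parent stage to recompute map $mapId")
        availableMapOutputs += mapId // recomputed
        runNextStageTask(neededMapId = 1) // succeeds after recompute
    }
  }
}
```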
   
   So I think this behavior is reasonable. What do you think?
