jiangxb1987 commented on issue #26975: [SPARK-30325][CORE] 
markPartitionCompleted cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-569365741
 
 
   There are multiple corner cases that the current solution does not handle.
   Imagine we have two TaskSetManagers (TSMs), M1 and M2, working on the same 
stage, and denote their corresponding tasks for a specific partition as T1 and T2:
   
   1. T1 and T2 might be scheduled on different executors (E1 and E2), and both 
tasks have finished. Then E2 is lost. With the approach suggested by this PR, 
the partition in M2 will be marked as not successful and a new pending task 
will be added, which is unnecessary because the shuffle files are on E1;
   2. T1 and T2 might be scheduled on the same executor, where T1 has finished 
but T2 is still running. Then the executor is lost. Because T2 is still running, 
the partition will not be marked as not successful. Later, another task may 
finish and mark the TSM as finished even though the shuffle files have been 
lost, which introduces a new regression.
   I don't have a solution yet. I'm wondering whether we can put the 
successful task information into taskInfos inside markPartitionCompleted. If 
that is possible, the second problem mentioned above could probably be 
resolved.
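
   To make the two corner cases easier to follow, here is a minimal toy model 
in Python. It is not Spark code: `ToyTSM`, `task_finished`, `executor_lost`, 
the `record_info` flag, and the simplified `mark_partition_completed` are all 
hypothetical stand-ins invented for illustration, and the executor-loss rule 
only approximates the behaviour described above. It tracks a single partition 
and assumes the shuffle output lives on the executor where T1 finished.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTSM:
    """Hypothetical stand-in for a TaskSetManager; tracks one partition only."""
    name: str
    successful: bool = False
    own_finished_on: Optional[str] = None     # executor where this TSM's own task finished
    recorded_output_on: Optional[str] = None  # set only when record_info=True (the idea floated above)
    pending_tasks: int = 0

    def task_finished(self, executor: str) -> None:
        # This TSM's own task attempt finished on `executor`.
        self.own_finished_on = executor
        self.successful = True

    def mark_partition_completed(self, output_executor: str, record_info: bool = False) -> None:
        # Another TSM's task completed the partition; optionally remember where.
        self.successful = True
        if record_info:
            self.recorded_output_on = output_executor

    def executor_lost(self, executor: str) -> None:
        # Reset the partition only if this TSM knows of a finished task on the lost executor.
        known = {self.own_finished_on, self.recorded_output_on}
        if self.successful and executor in known:
            self.successful = False
            self.pending_tasks += 1


def case_1():
    """T1 finished on E1, T2 finished on E2, then E2 is lost."""
    shuffle_output_on = "E1"              # assumption: the shuffle files are T1's, on E1
    m2 = ToyTSM("M2")
    m2.task_finished("E2")
    lost = "E2"
    m2.executor_lost(lost)
    output_alive = shuffle_output_on != lost
    return m2.successful, m2.pending_tasks, output_alive


def case_2(record_info: bool = False):
    """T1 finished on E while T2 is still running on E, then E is lost."""
    m2 = ToyTSM("M2")
    m2.mark_partition_completed("E", record_info=record_info)  # triggered by T1's success
    m2.executor_lost("E")                 # T2 never finished, so M2 has no finished task of its own on E
    return m2.successful, m2.pending_tasks
```

   In this model, `case_1()` shows M2 adding a pending task even though the 
output on E1 is still alive, and `case_2()` shows M2 keeping the partition 
marked successful after the output is gone. Calling `case_2(record_info=True)` 
resets the partition, which is the effect the taskInfos idea is hoping for; 
whether the real scheduler can do this is still an open question.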

