jiangxb1987 commented on issue #26975: [SPARK-30325][CORE] 
markPartitionCompleted cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-569365197
 
 
   There are multiple corner cases not handled by the current solution.
   Imagine we have two TSMs (M1 and M2) working on the same stage, and denote 
their tasks for a specific partition as T1 and T2:
   1. T1 and T2 might be scheduled on different executors (E1 and E2). T1 has 
finished but T2 is still running. Then E2 gets lost. With the approach 
suggested by this PR, the partition in M2 will be marked as not successful and 
a new pending task will be added, which is unnecessary because the 
shuffle files are on E1;
   2. T1 and T2 might be scheduled on the same executor. T1 has finished 
but T2 is still running. Then the executor gets lost. Since T2 is still 
running, the partition will not be marked as not successful. Later, another 
task may finish and mark the TSM as finished even though the shuffle files 
have been lost, so this introduces a new regression.
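
   The two cases above can be restated as a small toy model. This is only a 
sketch, not Spark code: `PartitionState` and `needsRecompute` are hypothetical 
names, and it assumes the shuffle files live only on the executor that ran the 
finished attempt (no external shuffle service).

```scala
// Toy model of the two cases; none of these names are Spark APIs.
object ShuffleOutputCases {
  // finishedOn: executor where the finished attempt (T1) wrote its shuffle files.
  // runningOn:  executor where the other attempt (T2) is still running.
  final case class PartitionState(finishedOn: String, runningOn: String)

  // The partition has to be recomputed only when the lost executor is the one
  // that holds the finished attempt's shuffle files.
  def needsRecompute(state: PartitionState, lostExecutor: String): Boolean =
    state.finishedOn == lostExecutor

  def main(args: Array[String]): Unit = {
    // Case 1: T1 finished on E1, T2 still running on E2, then E2 is lost.
    val case1 = PartitionState(finishedOn = "E1", runningOn = "E2")
    println(s"case 1 needs recompute: ${needsRecompute(case1, "E2")}")

    // Case 2: T1 finished and T2 still running on the same executor, then it is lost.
    val case2 = PartitionState(finishedOn = "E1", runningOn = "E1")
    println(s"case 2 needs recompute: ${needsRecompute(case2, "E1")}")
  }
}
```

   In case 1 no recompute is needed, so the extra pending task is wasted work. 
In case 2 a recompute is needed, so leaving the partition marked as successful 
is the bug.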
   
   I don't have a solution yet. I'm wondering whether we can put the 
successful task's information into `taskInfos` inside `markPartitionCompleted`. 
If that is possible, the second problem I mentioned above could probably 
be resolved.
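
   A rough sketch of that idea, using a toy class rather than the real 
`TaskSetManager`: `ToyTaskInfo`, `ToyTaskSetManager`, the `pending` set and the 
`markPartitionCompleted(info)` signature are all assumptions made for 
illustration, and the real executor-lost handling checks more conditions than 
this.

```scala
import scala.collection.mutable

object RecordSuccessfulTask {
  // Hypothetical, trimmed-down stand-in for a task record.
  final case class ToyTaskInfo(taskId: Long, index: Int, executorId: String)

  final class ToyTaskSetManager(numPartitions: Int) {
    val successful: Array[Boolean] = Array.fill(numPartitions)(false)
    val taskInfos: mutable.Map[Long, ToyTaskInfo] = mutable.HashMap.empty
    val pending: mutable.Set[Int] = mutable.Set.empty[Int]

    // Assumed variant: the caller passes the record of the task (possibly from
    // another TSM) that finished the partition, so its executor is remembered.
    def markPartitionCompleted(info: ToyTaskInfo): Unit = {
      successful(info.index) = true
      taskInfos(info.taskId) = info
    }

    // On executor loss, re-queue every successful partition whose recorded
    // finished task ran on the lost executor.
    def executorLost(execId: String): Unit = {
      for (info <- taskInfos.values if info.executorId == execId) {
        if (successful(info.index)) {
          successful(info.index) = false
          pending += info.index
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val m2 = new ToyTaskSetManager(numPartitions = 1)
    // T1 (owned by M1) finished partition 0 on E1; M2 records that fact.
    m2.markPartitionCompleted(ToyTaskInfo(taskId = 1L, index = 0, executorId = "E1"))
    m2.executorLost("E2") // unrelated executor: nothing is re-queued
    m2.executorLost("E1") // executor holding the shuffle files: partition 0 is re-queued
    println(s"successful(0)=${m2.successful(0)}, pending=${m2.pending}")
  }
}
```

   With the finished task recorded, losing the executor that holds the shuffle 
files re-queues the partition, while losing an unrelated executor leaves it 
alone.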

