Ngone51 opened a new pull request #23871: [SPARK-23433][SPARK-25250] [CORE] 
Later created TaskSet should learn about the finished partitions 
URL: https://github.com/apache/spark/pull/23871
 
 
   ## What changes were proposed in this pull request?
   
   This is an alternative solution to #22806.
   
   #21131 first made it possible for a task that completed successfully in a 
zombie TaskSet to also mark the corresponding partition as finished in the 
active TaskSet. That change assumes an active TaskSet always exists for the 
stage when this happens. This is not always true: the active TaskSet may not 
have been created yet when the earlier task succeeds, which is why #22806 hit 
the issue.
   
   This PR extends #21131's behavior by adding `stageIdToFinishedPartitions` 
to TaskSchedulerImpl, which records the finished partition whenever a task 
(from a zombie or an active TaskSet) succeeds. A TaskSet created later can then 
learn about the finished partitions by looking them up in 
`stageIdToFinishedPartitions` and will not launch duplicate tasks.
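
To illustrate the idea, here is a minimal toy model in Python. It is not Spark's Scala code: the `Scheduler` and `TaskSet` classes and all of their methods are hypothetical stand-ins for TaskSchedulerImpl and its task sets, and only the per-stage map of finished partitions corresponds to the `stageIdToFinishedPartitions` described above.

```python
from collections import defaultdict


class TaskSet:
    """Hypothetical stand-in for one attempt of a stage's task set."""

    def __init__(self, stage_id, partitions):
        self.stage_id = stage_id
        self.pending = set(partitions)  # partitions that still need a task
        self.is_zombie = False

    def mark_partition_completed(self, partition):
        # No-op if the partition was already completed or never pending.
        self.pending.discard(partition)


class Scheduler:
    """Hypothetical stand-in for TaskSchedulerImpl (assumption, not Spark's API)."""

    def __init__(self):
        # Plays the role of `stageIdToFinishedPartitions`: stage id -> finished partitions.
        self.stage_id_to_finished_partitions = defaultdict(set)
        self.task_sets = defaultdict(list)

    def submit_task_set(self, stage_id, partitions):
        # Earlier attempts for the same stage become zombies.
        for earlier in self.task_sets[stage_id]:
            earlier.is_zombie = True
        task_set = TaskSet(stage_id, partitions)
        # The new task set learns about partitions that finished before it existed.
        for partition in self.stage_id_to_finished_partitions[stage_id]:
            task_set.mark_partition_completed(partition)
        self.task_sets[stage_id].append(task_set)
        return task_set

    def task_succeeded(self, stage_id, partition):
        # Record the success whether it came from a zombie or an active task set.
        self.stage_id_to_finished_partitions[stage_id].add(partition)
        for task_set in self.task_sets[stage_id]:
            task_set.mark_partition_completed(partition)


scheduler = Scheduler()
first = scheduler.submit_task_set(stage_id=0, partitions=[0, 1, 2])
# A task for partition 1 succeeds before the next attempt's task set is created.
scheduler.task_succeeded(stage_id=0, partition=1)
second = scheduler.submit_task_set(stage_id=0, partitions=[0, 1, 2])
print(sorted(second.pending))  # [0, 2]
```

In this sketch the second task set starts with only partitions 0 and 2 pending, so no duplicate task is launched for partition 1 even though no active task set existed when that partition finished.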
   
   ## How was this patch tested?
   
   Add.

