Github user markhamstra commented on the issue:

    https://github.com/apache/spark/pull/15213
  
    This doesn't make sense to me.  The DAGSchedulerEventProcessLoop runs on a 
single thread and processes a single event from its queue at a time.
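
    For context, this is the standard single-threaded event-loop pattern. A minimal sketch of how such a loop serializes event handling (this is not Spark's actual EventLoop code; the names here are simplified for illustration):

    ```scala
    import java.util.concurrent.LinkedBlockingDeque

    // Toy event loop: one daemon thread drains the queue and handles
    // one event at a time, so handlers can never overlap or race.
    abstract class ToyEventLoop[E](name: String) {
      private val eventQueue = new LinkedBlockingDeque[E]()
      @volatile private var stopped = false

      private val eventThread = new Thread(name) {
        setDaemon(true)
        override def run(): Unit = {
          try {
            while (!stopped) {
              val event = eventQueue.take() // blocks until an event is available
              onReceive(event)              // events are handled strictly one at a time
            }
          } catch {
            case _: InterruptedException => // interrupted during shutdown; exit
          }
        }
      }

      def start(): Unit = eventThread.start()
      def stop(): Unit = { stopped = true; eventThread.interrupt() }
      def post(event: E): Unit = eventQueue.put(event) // callers only enqueue; they never handle directly

      protected def onReceive(event: E): Unit
    }
    ```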
    
    When the first CompletionEvent resulting from a fetch failure is handled, stages are added to failedStages and a ResubmitFailedStages event is queued. After handleTaskCompletion is done, the next event from the queue is processed. Because events are dequeued and handled sequentially, either the ResubmitFailedStages event is handled before the CompletionEvent for the second fetch failure, or the second CompletionEvent is handled before the ResubmitFailedStages event.

    If the ResubmitFailedStages event is handled first, then resubmitFailedStages clears failedStages, and nothing prevents the subsequent CompletionEvent from queueing another ResubmitFailedStages event to handle the additional fetch failures. If instead the second CompletionEvent is handled before the ResubmitFailedStages event, then the additional stages are added to the non-empty failedStages set, and there is no need to schedule another ResubmitFailedStages event: the one queued by the first CompletionEvent is still pending, and handling it will also cover the stages newly added by the second CompletionEvent. In either ordering, all of the failed stages are handled, and there is no race condition. A toy sketch of this scheduling logic follows.
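
    To make the two orderings concrete, here is a hedged sketch of the invariant described above, not the real DAGScheduler code; MiniScheduler, handleFetchFailure, and the post callback are hypothetical names. A ResubmitFailedStages event is queued only when failedStages goes from empty to non-empty, and resubmitFailedStages drains the set:

    ```scala
    import scala.collection.mutable

    // Toy model: `post` stands in for enqueueing on the event loop.
    class MiniScheduler(post: String => Unit) {
      val failedStages = mutable.HashSet[Int]()

      // Models handleTaskCompletion for a CompletionEvent with a fetch failure.
      def handleFetchFailure(stageId: Int): Unit = {
        val resubmitAlreadyQueued = failedStages.nonEmpty
        failedStages += stageId
        if (!resubmitAlreadyQueued) {
          // First failure since the last resubmit: queue exactly one event.
          post("ResubmitFailedStages")
        }
        // Otherwise the pending ResubmitFailedStages event will also cover
        // this newly added stage, so no second event is needed.
      }

      // Models handling of the queued ResubmitFailedStages event.
      def resubmitFailedStages(): Unit = {
        val toResubmit = failedStages.toSeq.sorted
        failedStages.clear() // later failures may queue a fresh resubmit event
        toResubmit.foreach(id => println(s"resubmitting stage $id"))
      }
    }
    ```

    In the first ordering, resubmitFailedStages empties the set, so the next fetch failure queues a fresh event; in the second, the new stage joins a non-empty set that the still-pending event will drain. Either way, no failed stage is dropped.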

