Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21558
  
    so I looked through the code and it certainly appears to be a bug in the 
existing code (not just the v2 datasource API).  If one stage gets a fetch 
failure and leaves any tasks running with attempt 0, those tasks could conflict 
with the restarted stage, since its tasks would all start with attempt 0 as 
well.  When I say it could, I mean it would be a race if they go to commit at 
about the same time.  It's probably more of an issue if one task commits and 
the job commit starts, and then the other task starts to commit its output; you 
could end up with an incomplete/corrupt file.  We should see the warning 
"Authorizing duplicate request to commit" in the logs if this occurs.
    
    @rdblue does this match what you are seeing?


