GitHub user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21577
  
    +1. This fixes the commit coordinator problem where two separate tasks can
    both be authorized to commit the same partition. That case could lead to
    duplicate data (for example, if both tasks generated unique file names
    using a random UUID).
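A toy model of the guarantee under discussion may help. This is a sketch under stated assumptions, not Spark's OutputCommitCoordinator: `ToyCommitCoordinator`, `TaskId` and their methods are hypothetical names. It assumes a task attempt is identified by its stage attempt together with its attempt number, that the first attempt to ask for a partition wins, and that requests arriving after the stage finishes are denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskId:
    # Hypothetical task identity. Including the stage attempt means that
    # attempt 0 of a retried stage is not mistaken for attempt 0 of the
    # original stage.
    stage_attempt: int
    attempt_number: int


class ToyCommitCoordinator:
    """Hypothetical model: at most one task attempt may commit each partition."""

    def __init__(self):
        # (stage, partition) -> TaskId that won the right to commit
        self._winners = {}
        self._finished_stages = set()

    def can_commit(self, stage, partition, task):
        if stage in self._finished_stages:
            # Late requests after the stage has finished are denied.
            return False
        # The first requester for a partition becomes the winner; only a
        # request from that exact TaskId is authorized afterwards.
        winner = self._winners.setdefault((stage, partition), task)
        return winner == task

    def stage_finished(self, stage):
        self._finished_stages.add(stage)
```

If `TaskId` held only `attempt_number`, attempt 0 of a retried stage would compare equal to the recorded winner and a second, separate task would be authorized. The model also shows that the same `TaskId` can be authorized and then denied once `stage_finished` has been called, which is the situation described in the next paragraph.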
    
    However, this doesn't address the problem I hit in practice, where a file
    was created twice and deleted once because the same task attempt number was
    both allowed to commit and denied commit by the coordinator (after the
    stage had finished).
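How "created twice and deleted once" can lose committed output is easiest to see with a naming scheme derived from the attempt number. The sketch below assumes, hypothetically, that a writer names its file from the partition and attempt number and that a denied task deletes its own file on abort; `task_output_path` and `run_task` are illustrative helpers, not Spark APIs.

```python
import os
import tempfile


def task_output_path(out_dir, partition, attempt_number, unique_id=None):
    # Hypothetical naming scheme. Without unique_id, two tasks that share a
    # partition and attempt number map to the same file path.
    name = f"part-{partition:05d}-attempt-{attempt_number}"
    if unique_id is not None:
        name += f"-{unique_id}"
    return os.path.join(out_dir, name)


def run_task(out_dir, partition, attempt_number, authorized, unique_id=None):
    path = task_output_path(out_dir, partition, attempt_number, unique_id)
    # Both tasks create (or overwrite) their output file.
    with open(path, "w") as f:
        f.write("data")
    if not authorized:
        # Abort: the denied task removes what it believes is its own file.
        os.remove(path)
    return path


with tempfile.TemporaryDirectory() as out_dir:
    committed = run_task(out_dir, 0, 0, authorized=True)
    denied = run_task(out_dir, 0, 0, authorized=False)
    # Same path, so the abort deleted the committed output.
    print(committed == denied, os.path.exists(committed))
```

Adding a random suffix keeps the abort from touching the committed file, but, as noted above, if the coordinator ever authorizes two tasks it turns into duplicate data instead, so correctness still depends on a single authorized attempt per partition.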
    
    We still need the solution proposed in
    https://github.com/apache/spark/pull/21558 for the v2 API. But that is
    mainly a v2 API problem, because that API guarantees that implementations
    can rely on the attempt ID.

