Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19623#discussion_r148226526
  
    --- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java
 ---
    @@ -75,8 +82,10 @@
       /**
        * Aborts this writing job because some data writers are failed to write 
the records and aborted,
        * or the Spark job fails with some unknown reasons, or {@link 
#commit(WriterCommitMessage[])}
    -   * fails. If this method fails(throw exception), the underlying data 
source may have garbage that
    -   * need to be cleaned manually, but these garbage should not be visible 
to data source readers.
    +   * fails.
    +   *
    +   * If an exception was thrown, the underlying data source may have 
garbage that need to be
    +   * cleaned manually, but these garbage should not be visible to data 
source readers.
    --- End diff --
    
    bit optimistic on the "should". Maybe: "state of the destination is 
undefined". 
    
    The issue here is what if there's a network failure and abort() fails for 
that. it hasn't made any changes to the destination. Thus its state is whatever 
the previous state of the dest was. If no task was committed, nothing should be 
directly observable. 1+ task committed and there *may* be output. If job commit 
failed and abort() was called then, nobody is going to have any idea of the 
final state.
    
    Maybe: "Implementations are required to perform best-effort cleanup 
irrespective of the current state of the destination".
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to