Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/19269
> The only contract Spark needs is: data written/committed by tasks should
> not be visible to data source readers until the job-level commit. But it
> can be visible to others, like other writing tasks, so it's possible for
> data sources to implement "abort the output of the other writer".
I'm not following what you mean here.
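The first half of that contract is easy to picture with a file-based source, though: tasks write under a staging directory that readers never list, and the driver publishes everything at job commit. A minimal sketch of that idea (the `StagedFileWriter` class and `.staging` layout are illustrative assumptions, not part of the API in this PR):

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

class StagedFileWriter(table: Path) {
  private val staging = table.resolve(".staging")

  // Runs on an executor: task output lands under .staging/, which readers
  // never list, so uncommitted data stays invisible to them. Other writers
  // can still see these files, which is what would make "abort the output
  // of the other writer" implementable.
  def writeAndCommitTask(taskId: Long, rows: Seq[String]): Path = {
    Files.createDirectories(staging)
    val file = staging.resolve(s"part-$taskId")
    Files.write(file, rows.mkString("\n").getBytes("UTF-8"))
    file // this path plays the role of the task's commit message
  }

  // Runs on the driver at job commit: move each staged file into the table
  // directory. (A real source would need an atomic metadata swap to make
  // the whole job visible all at once, not a per-file rename loop.)
  def commitJob(stagedFiles: Seq[Path]): Unit = {
    stagedFiles.foreach { f =>
      Files.move(f, table.resolve(f.getFileName), StandardCopyOption.ATOMIC_MOVE)
    }
  }
}
```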
> making DataSourceV2Writer.abort take commit messages is still a
> "best-effort" way to clean up the data
Agreed. We should state something like this in the job-level abort docs: "Commit
messages passed to abort are the messages for all task commits that succeeded
and sent their commit message to the driver. It is possible, though unlikely,
for an executor to successfully commit data to a data source but fail before
sending the commit message to the driver."
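To make that concrete, a sketch of what a best-effort abort could look like with the proposed signature; `FileCommitMessage` and the staging sweep are hypothetical details for illustration, not the actual API:

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object BestEffortAbort {
  // Assumed message type for this sketch: each task reports the file it
  // committed.
  case class FileCommitMessage(committedFile: Path)

  // Driver-side abort: clean up everything the driver knows about, then
  // sweep staging for output from tasks that committed but died before
  // their commit message reached the driver.
  def abortJob(staging: Path, messages: Seq[FileCommitMessage]): Unit = {
    // Only tasks whose commit message reached the driver appear here, so
    // this list can be incomplete; that is what makes abort "best-effort".
    messages.foreach(m => Files.deleteIfExists(m.committedFile))

    // Defensive sweep for unreported task output.
    if (Files.isDirectory(staging)) {
      Files.list(staging).iterator().asScala.foreach(p => Files.deleteIfExists(p))
    }
  }
}
```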