Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19269
Several things to discuss:
1. Since Spark can't disable speculation at runtime, there is currently not
much benefit in providing an interface for a data source to disable
speculation: a data source can check the Spark conf at the beginning and throw
an exception if speculation is enabled (see the first sketch after this list).
We can add it later via a mix-in trait.
2. The only contract Spark needs is: data written/committed by tasks must not
be visible to data source readers until the job-level commit. It may, however,
be visible to others, such as other writing tasks, so it's possible for data
sources to implement "abort the output of another writer" (see the second
sketch after this list).
3. The `WriteCommitMessage` can include statistics (it's an empty
interface), so data sources can aggregate statistics on the driver side (see
the third sketch after this list).
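
For point 1, a minimal sketch of what a data source could do today, assuming it
has access to the active `SparkSession`; the `assertNoSpeculation` helper and
its enclosing object are illustrative names, not part of this PR:

```scala
import org.apache.spark.sql.SparkSession

object SpeculationCheck {
  // Fail fast at write-planning time if task speculation is enabled.
  // "spark.speculation" is the standard Spark conf key for speculation.
  def assertNoSpeculation(spark: SparkSession): Unit = {
    val speculationEnabled =
      spark.sparkContext.getConf.getBoolean("spark.speculation", defaultValue = false)
    if (speculationEnabled) {
      throw new UnsupportedOperationException(
        "This data source cannot tolerate speculative tasks; please disable spark.speculation.")
    }
  }
}
```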
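
For point 2, one way to picture the contract is a staging-directory protocol:
tasks write only under a staging path, the job-level commit is what makes files
visible to readers, and any writer can clean up another task's staged output
before that. This is only an illustration of the contract, not the PR's API;
all names are made up and it uses the local filesystem for brevity:

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

class StagingCommitProtocol(stagingDir: Path, finalDir: Path) {
  // Task-level output: visible to other writers (it lives in the shared
  // staging directory), but never to readers, who only look at finalDir.
  def taskOutputPath(taskId: Int, attemptId: Int): Path =
    stagingDir.resolve(s"task-$taskId-attempt-$attemptId.data")

  // "Abort the output of another writer": any writer may delete a staged file.
  def abortTask(taskId: Int, attemptId: Int): Unit =
    Files.deleteIfExists(taskOutputPath(taskId, attemptId))

  // Job-level commit: only now does output become visible to readers.
  def commitJob(committedTaskFiles: Seq[Path]): Unit =
    committedTaskFiles.foreach { staged =>
      Files.move(staged, finalDir.resolve(staged.getFileName),
        StandardCopyOption.ATOMIC_MOVE)
    }
}
```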
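
For point 3, a sketch of how a data source might piggyback statistics on its
commit messages and aggregate them on the driver. The `WriteCommitMessage`
trait below is just a stand-in for the empty interface discussed above, and the
other names are illustrative:

```scala
object StatsCommitExample {
  // Stand-in for the empty marker interface.
  trait WriteCommitMessage extends Serializable

  // A per-task commit message that also carries write statistics.
  case class TaskWriteStats(rowsWritten: Long, bytesWritten: Long) extends WriteCommitMessage

  // Driver-side job commit: aggregate the statistics from all tasks, then make
  // the output visible to readers (details elided).
  def commitJob(messages: Seq[WriteCommitMessage]): Unit = {
    val stats = messages.collect { case s: TaskWriteStats => s }
    val totalRows  = stats.map(_.rowsWritten).sum
    val totalBytes = stats.map(_.bytesWritten).sum
    println(s"Committed $totalRows rows ($totalBytes bytes)")
    // ... finalize the job output here ...
  }
}
```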
cc @steveloughran @rdblue