rdblue commented on a change in pull request #23208: [SPARK-25530][SQL] data
source v2 API refactor (batch write)
URL: https://github.com/apache/spark/pull/23208#discussion_r240394058
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java
##########
@@ -25,7 +25,10 @@
* The base interface for v2 data sources which don't have a real catalog.
Implementations must
* have a public, 0-arg constructor.
* <p>
- * The major responsibility of this interface is to return a {@link Table} for
read/write.
+ * The major responsibility of this interface is to return a {@link Table} for
read/write. If you
+ * want to allow end-users to write data to non-existing tables via write APIs
in `DataFrameWriter`
+ * with `SaveMode`, you must return a {@link Table} instance even if the table
doesn't exist. The
+ * table schema can be empty in this case.
Review comment:
`SaveMode` is incompatible with the SPIP to standarize behavior that was
voted on and accepted. The save mode in `DataFrameWriter` must be used to
create v2 plans that have well-defined behavior and cannot be passed to
implementations in the final version of the v2 read/write API.
I see no reason to put off removing `SaveMode` from the API. If we remove it
now, we will avoid having more versions of this API that are **fundamentally
broken**. We will avoid more implementations that rely on it, not aware that it
will be removed.
To your point about whether it is safe: the only case where this is actually
used is `SaveMode.Overwrite` and `SaveMode.Append`. To replace those, all that
needs to happen is to define what kind of overwrite should happen here (dynamic
or truncate).
I can supply the logical plan and physical implementation in a follow-up PR
because I already have all this written and waiting to go in. Or, I can add a
PR to merge first if you'd like to have these changes depend on that
implementation.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]