rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge
the two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#discussion_r316409612
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -251,37 +251,17 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     assertNotBucketed("save")
-    val session = df.sparkSession
-    val cls = DataSource.lookupDataSource(source, session.sessionState.conf)
-    val canUseV2 = canUseV2Source(session, cls) && partitioningColumns.isEmpty
-
-    // In Data Source V2 project, partitioning is still under development.
-    // Here we fallback to V1 if partitioning columns are specified.
-    // TODO(SPARK-26778): use V2 implementations when partitioning feature is supported.
-    if (canUseV2) {
-      val provider = cls.getConstructor().newInstance().asInstanceOf[TableProvider]
+    val maybeV2Provider = lookupV2Provider()
+    // TODO(SPARK-26778): use V2 implementations when partition columns are specified
+    if (maybeV2Provider.isDefined && partitioningColumns.isEmpty) {
Review comment:
This case can throw an exception instead of silently falling back. If the provider is v2 and there is no catalog, then Spark can only append or overwrite (see the cases below). Append and overwrite rely on existing tables, so they must fail in v2 if the table does not exist. And if the user specified partition columns for a table that already exists, that is an error.
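To make the suggested behavior concrete, here is a minimal, self-contained Scala sketch of the decision the review describes. It is not the actual Spark implementation: `decide`, its parameters, and the `Decision` ADT are hypothetical names; only `maybeV2Provider`-style presence and `partitioningColumns` mirror the diff, and the error branch is the reviewer's proposal rather than existing code.

```scala
// Hypothetical sketch of the save-path fallback logic under discussion.
// Assumption: when a v2 provider is found but the user specified partition
// columns for an existing table, the operation should error rather than
// silently fall back to the v1 path.
object SaveFallbackSketch {
  sealed trait Decision
  case object UseV2 extends Decision
  case object FallBackToV1 extends Decision
  final case class Fail(msg: String) extends Decision

  def decide(
      hasV2Provider: Boolean,
      partitioningColumns: Seq[String],
      tableExists: Boolean): Decision = {
    if (hasV2Provider && partitioningColumns.isEmpty) {
      // Safe to use the v2 path: no partitioning was requested.
      UseV2
    } else if (hasV2Provider && tableExists) {
      // Reviewer's point: append/overwrite target an existing table, so
      // user-specified partition columns are an error, not a fallback case.
      Fail("Partition columns cannot be specified for an existing table")
    } else {
      // No v2 provider, or v2 with partitioning but no existing table:
      // fall back to the v1 write path.
      FallBackToV1
    }
  }
}
```

The key design point is that falling back to v1 would hide a user mistake: append and overwrite in v2 require the table to already exist, so partition columns supplied at write time cannot take effect and should be rejected loudly.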