rdblue commented on a change in pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
URL: https://github.com/apache/spark/pull/23208#discussion_r240394365
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -241,32 +241,28 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
assertNotBucketed("save")
- val cls = DataSource.lookupDataSource(source, df.sparkSession.sessionState.conf)
- if (classOf[DataSourceV2].isAssignableFrom(cls)) {
- val source = cls.getConstructor().newInstance().asInstanceOf[DataSourceV2]
- source match {
- case provider: BatchWriteSupportProvider =>
- val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
- source,
- df.sparkSession.sessionState.conf)
- val options = sessionOptions ++ extraOptions
-
+ val session = df.sparkSession
+ val cls = DataSource.lookupDataSource(source, session.sessionState.conf)
+ if (classOf[TableProvider].isAssignableFrom(cls)) {
+ val provider = cls.getConstructor().newInstance().asInstanceOf[TableProvider]
+ val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+ provider, session.sessionState.conf)
+ val options = sessionOptions ++ extraOptions
+ val dsOptions = new DataSourceOptions(options.asJava)
+ provider.getTable(dsOptions) match {
+ case table: SupportsBatchWrite =>
+ val relation = DataSourceV2Relation.create(table, dsOptions)
+ // TODO: revisit it. We should not create the `AppendData` operator for `SaveMode.Append`.
+ // We should create new end-users APIs for the `AppendData` operator.
Review comment:
I see no reason to make this API depend on migrating the file source. We know that `SaveMode` must be removed, so it makes no sense to create a broken file source implementation now and then remove it afterward.
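For readers following the diff, the new `save()` branch boils down to: check whether the looked-up class is a `TableProvider`, instantiate it reflectively through its no-arg constructor, ask it for a table, and dispatch on whether that table supports batch writes. The sketch below reproduces only that control flow. `Table`, `SupportsBatchWrite`, `TableProvider`, `InMemoryProvider`, `ReadOnlyProvider` and `resolveWrite` are simplified, hypothetical stand-ins (options are a plain `Map` instead of `DataSourceOptions`); they are not Spark's actual interfaces or signatures, and `SaveMode` is deliberately absent.

```scala
// Hypothetical, simplified stand-ins for the DSv2 types named in the diff.
// These are NOT Spark's real interfaces; they only model the dispatch shape.
trait Table { def name: String }
trait SupportsBatchWrite extends Table
trait TableProvider { def getTable(options: Map[String, String]): Table }

// A provider whose tables can be batch-written (hypothetical).
class InMemoryProvider extends TableProvider {
  def getTable(options: Map[String, String]): Table =
    new SupportsBatchWrite { val name = options.getOrElse("table", "default") }
}

// A provider whose tables cannot be batch-written (hypothetical).
class ReadOnlyProvider extends TableProvider {
  def getTable(options: Map[String, String]): Table =
    new Table { val name = "read-only" }
}

object WritePathSketch {
  // Mirrors the new branch: class check, reflective instantiation via the
  // public no-arg constructor, then a match on the table's capability.
  def resolveWrite(cls: Class[_], options: Map[String, String]): Either[String, String] =
    if (classOf[TableProvider].isAssignableFrom(cls)) {
      val provider = cls.getConstructor().newInstance().asInstanceOf[TableProvider]
      provider.getTable(options) match {
        case table: SupportsBatchWrite => Right(s"batch write to ${table.name}")
        case other                     => Left(s"table ${other.name} does not support batch write")
      }
    } else {
      // In the real code the non-v2 case falls through to the existing v1 save path.
      Left(s"${cls.getName} is not a TableProvider; fall back to the v1 path")
    }

  def main(args: Array[String]): Unit = {
    println(resolveWrite(classOf[InMemoryProvider], Map("table" -> "t1")))
    println(resolveWrite(classOf[ReadOnlyProvider], Map.empty))
    println(resolveWrite(classOf[String], Map.empty))
  }
}
```

Dispatching on a capability trait keeps the decision about which logical write to plan separate from the provider lookup, which is why the remaining `SaveMode` mapping stands out as the piece still tied to the old API.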
----------------------------------------------------------------
This is an automated message from the Apache Git Service.