Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23208#discussion_r239596456

```diff
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -241,32 +241,28 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     assertNotBucketed("save")

-    val cls = DataSource.lookupDataSource(source, df.sparkSession.sessionState.conf)
-    if (classOf[DataSourceV2].isAssignableFrom(cls)) {
-      val source = cls.getConstructor().newInstance().asInstanceOf[DataSourceV2]
-      source match {
-        case provider: BatchWriteSupportProvider =>
-          val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
-            source,
-            df.sparkSession.sessionState.conf)
-          val options = sessionOptions ++ extraOptions
-
+    val session = df.sparkSession
+    val cls = DataSource.lookupDataSource(source, session.sessionState.conf)
+    if (classOf[TableProvider].isAssignableFrom(cls)) {
+      val provider = cls.getConstructor().newInstance().asInstanceOf[TableProvider]
+      val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+        provider, session.sessionState.conf)
+      val options = sessionOptions ++ extraOptions
+      val dsOptions = new DataSourceOptions(options.asJava)
+      provider.getTable(dsOptions) match {
+        case table: SupportsBatchWrite =>
+          val relation = DataSourceV2Relation.create(table, dsOptions)
+          // TODO: revisit it. We should not create the `AppendData` operator for `SaveMode.Append`.
+          // We should create new end-user APIs for the `AppendData` operator.
```

--- End diff --

The example in the referenced comment is this:

```
spark.range(1).write.format("source").save("non-existent-path")
```

If the path for a path-based table doesn't exist, then I think the table doesn't exist. And if the table doesn't exist, then the operation for `save` should be CTAS instead of AppendData.

Here, I think the right behavior is to check whether the provider returns a table. If it doesn't, then the table doesn't exist and the plan should be CTAS.
If it does return a table, that table must provide the schema used to validate the AppendData operation. Since we don't currently have CTAS, the missing-table case should throw an exception stating that the table doesn't exist and can't be created.

More generally, the meaning of SaveMode with v1 sources is not always reliable. I think the right approach is what @cloud-fan suggests: create a new write API for v2 tables that maps cleanly to the new logical plans (I've proposed one and would be happy to open a PR). Once the logical plans are in place, we can go back through this API and move it over to v2 where the behaviors match.
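To make the dispatch rule above concrete, here is a rough sketch. This is an illustrative model, not actual Spark code: `Table`, `TableProvider`, and the plan names are simplified stand-ins for the v2 interfaces discussed in this thread, and `getTable` returning an `Option` is an assumption made for clarity.

```scala
sealed trait WritePlan
case object AppendData extends WritePlan
case object CreateTableAsSelect extends WritePlan

// Simplified stand-ins for the v2 interfaces under discussion.
trait Table { def schema: Seq[String] }
trait TableProvider { def getTable(path: String): Option[Table] }

def planForSave(provider: TableProvider, path: String): WritePlan =
  provider.getTable(path) match {
    case Some(table) =>
      // Table exists: validate the write against table.schema, then append.
      AppendData
    case None =>
      // Path (and thus table) doesn't exist, so this should plan CTAS.
      // Until CTAS lands, this branch would instead throw an exception
      // saying the table doesn't exist and can't be created.
      CreateTableAsSelect
  }
```

The point of the sketch is that the existence check lives in the planner, not in each source's SaveMode handling, which is where the v1 behavior became unreliable.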