yuchenhuo commented on a change in pull request #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#discussion_r366672547
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -554,12 +556,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
     val command = (mode, tableOpt) match {
-      case (_, Some(table: V1Table)) =>
+      case (_, Some(_: V1Table)) =>
         return saveAsTable(TableIdentifier(ident.name(), ident.namespace().headOption))
       case (SaveMode.Append, Some(table)) =>
         checkPartitioningMatchesV2Table(table)
-        AppendData.byName(DataSourceV2Relation.create(table), df.logicalPlan, extraOptions.toMap)
+        val v2Relation =
+          DataSourceV2Relation.create(table, catalogManager.catalogIdentifier(catalog), Seq(ident))
+        AppendData.byName(v2Relation, df.logicalPlan, extraOptions.toMap)
       case (SaveMode.Overwrite, _) =>
Review comment:
Still probably a dumb question: why does DDL/DML affect how we generate the query plan? I'm asking because in the `save()` function of `DataFrameWriter`, we do generate a `DataSourceV2Relation` for `Overwrite` mode, so I'm curious why there is such a difference here.
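To make the question concrete, here is a minimal sketch (with hypothetical stand-in types, not the real Spark classes) of the branching that the diff above touches: V1 tables fall back to the legacy `saveAsTable` path for every mode, while the V2 `Append` branch builds a relation that, after this change, also carries the catalog identifier.

```scala
// Hypothetical stand-in types for illustration only -- NOT real Spark classes.
sealed trait Mode
case object Append extends Mode
case object Overwrite extends Mode

sealed trait Table { def name: String }
case class V1Table(name: String) extends Table
case class V2Table(name: String) extends Table

sealed trait Command
case class SaveAsTableV1(name: String) extends Command
case class AppendData(name: String, catalogId: Option[String]) extends Command
case class OverwriteByExpression(name: String) extends Command

def buildCommand(mode: Mode, tableOpt: Option[Table],
                 catalogId: Option[String]): Command =
  (mode, tableOpt) match {
    // V1 tables always take the legacy saveAsTable path, regardless of mode
    case (_, Some(t: V1Table)) => SaveAsTableV1(t.name)
    // V2 append: the relation now also carries the catalog identifier
    case (Append, Some(t)) => AppendData(t.name, catalogId)
    // Overwrite builds its own command; details elided in this sketch
    case (Overwrite, _) =>
      OverwriteByExpression(tableOpt.map(_.name).getOrElse("<unresolved>"))
    case _ => sys.error("unsupported mode/table combination")
  }
```

The sketch only mirrors the shape of the match; the real code additionally threads `df.logicalPlan` and the write options into each command.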