Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20387#discussion_r166030958

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -17,17 +17,151 @@ package org.apache.spark.sql.execution.datasources.v2
 
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{AnalysisException, SaveMode}
+import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
-import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
-import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, ReadSupportWithSchema, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, SupportsPushDownCatalystFilters, SupportsPushDownFilters, SupportsPushDownRequiredColumns, SupportsReportStatistics}
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
 
 case class DataSourceV2Relation(
-    fullOutput: Seq[AttributeReference],
-    reader: DataSourceReader)
-  extends LeafNode with MultiInstanceRelation with DataSourceReaderHolder {
+    source: DataSourceV2,
+    options: Map[String, String],
+    path: Option[String] = None,
+    table: Option[TableIdentifier] = None,
--- End diff --

@cloud-fan, sorry if it was not clear: yes, I have considered it, and I think it is a bad idea. I also sent a note to the dev list about this issue if you want more context. There are two main reasons:

1. Your proposal creates more places that are responsible for creating a `DataSourceOptions` with the right property names. Every place where we have a `TableIdentifier` and want to convert it to a `DataSourceV2Relation` would have to copy the same logic and worry about using the same properties. That is hard to maintain and error-prone: what happens if we decide not to pass the database when it is `None` in a `TableIdentifier`? We would have to validate every place that creates a v2 relation. If we pass `TableIdentifier` here instead, there is a single code path that does the conversion (see the sketch below). It also makes it easier to pass `TableIdentifier` through to the data sources if we choose to update the API.

2. There is no reason to use `DataSourceOptions` outside of v2 at this point. This PR doesn't expose the v2-specific options class to other places in the codebase. Instead, it uses a map for generic options and classes that can be used in pattern matching where possible (also illustrated below). Again, this leaves fewer places that create v2-internal classes, which is easier to maintain. If you want to add methods to the options class so that implementations can easily access the path and table name, we can do that in a follow-up PR.
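To make the first point concrete, here is a minimal sketch of a single conversion path. The names are assumptions for illustration only: `makeV2Options` and the `"path"`/`"table"`/`"database"` keys are hypothetical, not the PR's actual code. The point is that the property names are defined in exactly one place.

```scala
import scala.collection.JavaConverters._

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.sources.v2.DataSourceOptions

object DataSourceV2OptionsSketch {
  // Hypothetical single conversion point: every caller that holds a path
  // or a TableIdentifier builds DataSourceOptions through this method, so
  // the property names are defined once. If we later decide, say, to omit
  // the database key when it is None, only this method changes.
  def makeV2Options(
      options: Map[String, String],
      path: Option[String],
      table: Option[TableIdentifier]): DataSourceOptions = {
    val withPath = options ++ path.map("path" -> _)
    val withTable = withPath ++
      table.map(t => "table" -> t.table) ++
      table.flatMap(_.database).map("database" -> _)
    new DataSourceOptions(withTable.asJava)
  }
}
```

For the second point, a sketch of the pattern-matching style the comment favors, assuming the `table` field shown in the diff above: downstream code matches on plain Scala fields instead of constructing the v2-internal `DataSourceOptions` at every call site.

```scala
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

// Hypothetical: match on the relation's fields rather than rebuilding
// option maps at each call site.
def describe(relation: DataSourceV2Relation): String = relation.table match {
  case Some(ident) => s"v2 relation for table ${ident.unquotedString}"
  case None => "v2 relation with no table identifier"
}
```

Either way, only one module needs to know the v2 property-name conventions.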