Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r170088798
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -35,15 +35,14 @@ case class DataSourceV2Relation(
options: Map[String, String],
projection: Seq[AttributeReference],
filters: Option[Seq[Expression]] = None,
- userSpecifiedSchema: Option[StructType] = None) extends LeafNode with
MultiInstanceRelation {
+ userSpecifiedSchema: Option[StructType] = None)
+ extends LeafNode with MultiInstanceRelation with DataSourceV2QueryPlan {
--- End diff --
Extending `DataSourceV2QueryPlan` changes the definition of equality, which
I don't think is correct. At a minimum, I would say that two relations are
equal if they produce the same sequence of records. Equality as implemented in
this PR would allow completely different folders of data to be considered equal
as long as they produce the same schema and have the same filters.
In general, I don't see the utility of `DataSourceV2QueryPlan` for the
purpose of this PR, which is to improve explain results. This doesn't use the
source name if it is named, which is a regression in the explain results. It
also doesn't indicate that the source is v2. And finally, it doesn't show the
most important part of the scan, which is where the data comes from.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]