Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r170307194
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -35,15 +35,14 @@ case class DataSourceV2Relation(
options: Map[String, String],
projection: Seq[AttributeReference],
filters: Option[Seq[Expression]] = None,
- userSpecifiedSchema: Option[StructType] = None) extends LeafNode with
MultiInstanceRelation {
+ userSpecifiedSchema: Option[StructType] = None)
+ extends LeafNode with MultiInstanceRelation with DataSourceV2QueryPlan {
--- End diff --
Part of the utility of an immutable plan is to make equality work
correctly. I thought that was clear, but sorry if it was not. Maybe I should
have pointed it out explicitly.
Equality should definitely be based on all of the inputs that affect the
output rows. That includes location or table identifier, user schema, filters,
requested projection, implementation, and other options. I think case class
equality is correct here.
> there is no consensus about how to explain a data source v2 relation, my
PR tries to have people focus on this part and have a consensus.
That's a good goal for this PR, so I think you should roll back the changes
to the relation so equality is not affected by it.
> Do you mean the path option?
Yes. The data that will be scanned is an important part. As you know, I
think this should be part of the relation itself and not options. That would
solve this problem.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]