Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20647#discussion_r170307194
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
    @@ -35,15 +35,14 @@ case class DataSourceV2Relation(
         options: Map[String, String],
         projection: Seq[AttributeReference],
         filters: Option[Seq[Expression]] = None,
    -    userSpecifiedSchema: Option[StructType] = None) extends LeafNode with 
MultiInstanceRelation {
    +    userSpecifiedSchema: Option[StructType] = None)
    +  extends LeafNode with MultiInstanceRelation with DataSourceV2QueryPlan {
    --- End diff --
    
    Part of the utility of an immutable plan is to make equality work 
correctly. I thought that was clear, but sorry if it was not. Maybe I should 
have pointed it out explicitly.
    
    Equality should definitely be based on all of the inputs that affect the 
output rows. That includes location or table identifier, user schema, filters, 
requested projection, implementation, and other options. I think case class 
equality is correct here.
    
    > there is no consensus about how to explain a data source v2 relation, my 
PR tries to have people focus on this part and have a consensus.
    
    That's a good goal for this PR, so I think you should roll back the changes 
to the relation so equality is not affected by it.
    
    > Do you mean the path option?
    
    Yes. The data that will be scanned is an important part. As you know, I 
think this should be part of the relation itself and not options. That would 
solve this problem.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to