Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20647#discussion_r170088798
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
    @@ -35,15 +35,14 @@ case class DataSourceV2Relation(
         options: Map[String, String],
         projection: Seq[AttributeReference],
         filters: Option[Seq[Expression]] = None,
    -    userSpecifiedSchema: Option[StructType] = None) extends LeafNode with 
MultiInstanceRelation {
    +    userSpecifiedSchema: Option[StructType] = None)
    +  extends LeafNode with MultiInstanceRelation with DataSourceV2QueryPlan {
    --- End diff --
    
    Extending `DataSourceV2QueryPlan` changes the definition of equality, which 
I don't think is correct. At a minimum, I would say that two relations are 
equal if they produce the same sequence of records. Equality as implemented in 
this PR would allow completely different folders of data to be considered equal 
as long as they produce the same schema and have the same filters.
    
    In general, I don't see the utility of `DataSourceV2QueryPlan` for the 
purpose of this PR, which is to improve explain results. This doesn't use the 
source name if it is named, which is a regression in the explain results. It 
also doesn't indicate that the source is v2. And finally, it doesn't show the 
most important part of the scan, which is where the data comes from.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to