[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

cloud-fan Thu, 22 Feb 2018 20:05:13 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20647#discussion_r170162261
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
    @@ -35,15 +35,14 @@ case class DataSourceV2Relation(
         options: Map[String, String],
         projection: Seq[AttributeReference],
         filters: Option[Seq[Expression]] = None,
    -    userSpecifiedSchema: Option[StructType] = None) extends LeafNode with 
MultiInstanceRelation {
    +    userSpecifiedSchema: Option[StructType] = None)
    +  extends LeafNode with MultiInstanceRelation with DataSourceV2QueryPlan {
    --- End diff --
    
    Sorry I didn't realize you removed the extending of 
`DataSourceReaderHolder`, otherwise I would point it out in your PR. In general 
the question is, how should we define the equality of data source relation? It 
was defined by `output, reader.getClass, filters` before, and your PR changed 
it to the default equality of `DataSourceV2Relation` silently.
    
    The major difference is, should `options` take part in the equality? The 
answer is obviously yes, like `path`, so I'll add `options` to the equality.
    
    BTW `DataSourceV2QueryPlan` is needed, as there are 3 plans need to 
implement explain: `DataSourceV2Relation`, `StreamingDataSourceV2Relation`, 
`DataSourceV2ScanExec`.
    
    > This doesn't use the source name if it is named
    
    I missed that, let me add back.
    
    > doesn't indicate that the source is v2
    
    I'll improve it
    
    > it doesn't show the most important part of the scan
    
    Do you mean the path option? First I don't think showing all the options is 
a good idea, as it can be a lot. My future plan is to show these standard 
options like path, table, etc. Again this is something added silently, there is 
no consensus about how to explain a data source v2 relation, my PR tries to 
have people focus on this part and have a consensus.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

Reply via email to