Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14712#discussion_r75485564
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala
 ---
    @@ -72,9 +72,12 @@ case class LogicalRelation(
       // expId can be different but the relation is still the same.
       override lazy val cleanArgs: Seq[Any] = Seq(relation)
     
    -  @transient override lazy val statistics: Statistics = Statistics(
    -    sizeInBytes = BigInt(relation.sizeInBytes)
    -  )
    +  // statistics inherited from a MetastoreRelation
    +  @transient var inheritedStats: Option[Statistics] = None
    --- End diff --
    
    This looks a little hacky. We are trying to support parquet/orc hive tables 
right? In that case `MetastoreRelation` will be converted to `LogicalRelation` 
and we will lost the statistics.
    
    I think it's more reasonable to convert table relation to `LogicalRelation` 
in the planner, so that when we see the logical plan tree it's clearer that we 
are reading a table, and it's easier to get the statistics. `LogicalRelation` 
should only be used when reading data source files. This is out of the scope 
and we can do it in another PR.
    
    For now, I'm ok to not support this case, as parquet/orc conversion for 
hive serde tables will be refatored soon.
    
    cc @yhuai @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to