[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

hvanhovell Wed, 13 Dec 2017 02:03:42 -0800

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19864#discussion_r156610279
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ---
    @@ -71,9 +74,10 @@ case class InMemoryRelation(
     
       override def computeStats(): Statistics = {
         if (batchStats.value == 0L) {
    -      // Underlying columnar RDD hasn't been materialized, no useful statistics information
    -      // available, return the default statistics.
    -      Statistics(sizeInBytes = child.sqlContext.conf.defaultSizeInBytes)
    +      // Underlying columnar RDD hasn't been materialized, use the stats from the plan to cache when
    +      // applicable
    +      statsOfPlanToCache.getOrElse(Statistics(sizeInBytes =
    +        child.sqlContext.conf.defaultSizeInBytes))
    --- End diff --
    
    Mweh - this seems very arbitrary.
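
For readers without the full file at hand, the fallback order in the added lines can be sketched in isolation. This is only an illustrative sketch, not Spark's code: `Statistics` here is a simplified stand-in case class, `StatsFallback` and its parameter names are hypothetical, and the `else` branch (returning the accumulated batch size) is an assumption because the diff hunk does not show it.

```scala
// Simplified stand-in for Spark's Statistics; hypothetical, for illustration only.
final case class Statistics(sizeInBytes: BigInt)

object StatsFallback {
  // materializedBytes  ~ batchStats.value in the diff
  // planStats          ~ statsOfPlanToCache in the diff
  // defaultSizeInBytes ~ child.sqlContext.conf.defaultSizeInBytes in the diff
  def computeStats(
      materializedBytes: Long,
      planStats: Option[Statistics],
      defaultSizeInBytes: Long): Statistics = {
    if (materializedBytes == 0L) {
      // Nothing cached yet: prefer the stats of the plan being cached,
      // and fall back to the configured default only when none exist.
      planStats.getOrElse(Statistics(sizeInBytes = BigInt(defaultSizeInBytes)))
    } else {
      // Assumption: once batches exist, their accumulated size is reported.
      Statistics(sizeInBytes = BigInt(materializedBytes))
    }
  }
}
```

With this shape, an unmaterialized relation reports the plan's estimate when one is supplied and the configured default otherwise, which is the choice the comment above is questioning.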



