GitHub user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19864#discussion_r155140125
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
    @@ -94,14 +94,16 @@ class CacheManager extends Logging {
           logWarning("Asked to cache already cached data.")
         } else {
           val sparkSession = query.sparkSession
    -      cachedData.add(CachedData(
    -        planToCache,
    -        InMemoryRelation(
    -          sparkSession.sessionState.conf.useCompression,
    -          sparkSession.sessionState.conf.columnBatchSize,
    -          storageLevel,
    -          sparkSession.sessionState.executePlan(planToCache).executedPlan,
    -          tableName)))
    +      val inMemoryRelation = InMemoryRelation(
    +        sparkSession.sessionState.conf.useCompression,
    +        sparkSession.sessionState.conf.columnBatchSize,
    +        storageLevel,
    +        sparkSession.sessionState.executePlan(planToCache).executedPlan,
    +        tableName)
    +      if (planToCache.conf.cboEnabled && planToCache.stats.rowCount.isDefined) {
    --- End diff --
    
    No. If CBO is disabled, the relation's sizeInBytes is the file size. See:

    https://github.com/apache/spark/blob/5c3a1f3fad695317c2fff1243cdb9b3ceb25c317/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala#L85
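
To make the point concrete, here is a minimal, Spark-free sketch of the behaviour described above. It is a simplified model, not Spark's actual code: `Stats`, `FileRelation`, `fileSizes` and `analyzedRowCount` are hypothetical names introduced only for illustration. The assumption it encodes is that a file-based relation can always report a size derived from its files, while a row count is only available when CBO is enabled and statistics exist.

```scala
// Simplified model for illustration only; these are NOT Spark's classes.
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

// Hypothetical file-backed relation: a list of file sizes plus an optional
// row count that would come from previously collected statistics.
final case class FileRelation(fileSizes: Seq[Long], analyzedRowCount: Option[BigInt]) {
  def stats(cboEnabled: Boolean): Stats =
    Stats(
      // The size is derived from the files, so it is defined with or without CBO.
      sizeInBytes = fileSizes.map(BigInt(_)).sum,
      // The row count is only exposed when CBO is enabled (assumption of this model).
      rowCount = if (cboEnabled) analyzedRowCount else None)
}

object StatsDemo extends App {
  val rel = FileRelation(Seq(1024L, 2048L), Some(BigInt(100)))

  val withoutCbo = rel.stats(cboEnabled = false)
  val withCbo = rel.stats(cboEnabled = true)

  println(s"CBO off: size=${withoutCbo.sizeInBytes}, rows=${withoutCbo.rowCount}")
  println(s"CBO on:  size=${withCbo.sizeInBytes}, rows=${withCbo.rowCount}")
}
```

In this model a guard of the form `cboEnabled && rowCount.isDefined` filters out the CBO-disabled case even though a meaningful `sizeInBytes` is still available there.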
    
    


