GitHub user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/19864#discussion_r156610279
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
---
@@ -71,9 +74,10 @@ case class InMemoryRelation(
override def computeStats(): Statistics = {
if (batchStats.value == 0L) {
- // Underlying columnar RDD hasn't been materialized, no useful statistics information
- // available, return the default statistics.
- Statistics(sizeInBytes = child.sqlContext.conf.defaultSizeInBytes)
+ // Underlying columnar RDD hasn't been materialized, use the stats from the plan to cache when
+ // applicable
+ statsOfPlanToCache.getOrElse(Statistics(sizeInBytes =
+ child.sqlContext.conf.defaultSizeInBytes))
--- End diff --
Mweh - this seems very arbitrary.