Github user CodingCat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19864#discussion_r156445522
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
---
@@ -60,7 +62,8 @@ case class InMemoryRelation(
     @transient child: SparkPlan,
     tableName: Option[String])(
     @transient var _cachedColumnBuffers: RDD[CachedBatch] = null,
-    val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator)
+    val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator,
+    statsOfPlanToCache: Option[Statistics] = None)
--- End diff --
My two cents here: I haven't looked into the code that makes this parameter affect the `equals`/`hashCode` logic, but we may not want `equals`/`hashCode` to depend on it.
In Spark SQL we usually compare plans based on their string representation, not on that representation plus stats info. For example, we try to reuse a cached plan based on the string representation of the execution plan, not on anything combined with stats info.
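For reference, a Scala case class generates `equals`, `hashCode` and `toString` only from its first parameter list. A parameter placed in the second (curried) list, as `statsOfPlanToCache` is in the diff above, therefore does not take part in equality unless the class or a parent overrides those methods. The minimal sketch below illustrates this default behaviour. `Stats`, `CachedRelation` and `EqualityDemo` are hypothetical stand-ins, not Spark classes, and the sketch does not check whether Spark's `InMemoryRelation` or its parents override equality.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's classes.
final case class Stats(sizeInBytes: BigInt)

// Like InMemoryRelation, this case class has a second (curried) parameter list.
// Scala derives equals/hashCode/toString only from the FIRST parameter list.
case class CachedRelation(planString: String)(
    val statsOfPlanToCache: Option[Stats] = None)

object EqualityDemo {
  // Same first-list argument, different stats in the second list.
  val withStats: CachedRelation = CachedRelation("Project [a#1]")(Some(Stats(BigInt(1024))))
  val withoutStats: CachedRelation = CachedRelation("Project [a#1]")()

  // A lookup that relies on equality finds the cached entry whatever its stats are.
  val cached: Seq[CachedRelation] = Seq(withStats)
  def lookup(plan: CachedRelation): Option[CachedRelation] = cached.find(_ == plan)

  def main(args: Array[String]): Unit = {
    println(withStats == withoutStats)                   // true
    println(withStats.hashCode == withoutStats.hashCode) // true
    println(lookup(withoutStats).isDefined)              // true
    // The stats are still reachable as a member; they just don't affect equality.
    println(withStats.statsOfPlanToCache)                // Some(Stats(1024))
  }
}
```

If equality did need to consider the stats, the parameter would have to move into the first list (or `equals`/`hashCode` be overridden), which is exactly the dependency the comment above argues against.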