GitHub user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r184116235
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala
---
@@ -503,7 +503,7 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
case plan: InMemoryRelation => plan
}.head
// InMemoryRelation's stats is file size before the underlying RDD is materialized
- assert(inMemoryRelation.computeStats().sizeInBytes === 740)
+ assert(inMemoryRelation.computeStats().sizeInBytes === 800)
--- End diff --
Our optimizer uses these statistics to choose plans (e.g., in join
algorithm selection). Thus, the plans could be completely different if the file
size increases by 8 percent. Could you give us more context? cc @rdblue
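
To illustrate the concern, here is a minimal sketch of size-based join selection. It is a simplified model, not Spark's planner code: the object `JoinSelectionSketch`, the `JoinChoice` types, and the `chooseJoin` helper are hypothetical names made up for this example. The only value taken from Spark is the 10 MB default of `spark.sql.autoBroadcastJoinThreshold`.

```scala
// Simplified, hypothetical model of size-based join selection.
// This is NOT Spark's planner code; all names here are illustrative.
object JoinSelectionSketch {
  sealed trait JoinChoice
  case object BroadcastHashJoin extends JoinChoice
  case object SortMergeJoin extends JoinChoice

  // Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
  val DefaultThreshold: Long = 10L * 1024 * 1024

  // Broadcast the relation only if its estimated size fits within the
  // threshold; a negative threshold is treated as "broadcast disabled".
  def chooseJoin(sizeInBytes: BigInt, threshold: Long = DefaultThreshold): JoinChoice =
    if (threshold >= 0 && sizeInBytes <= BigInt(threshold)) BroadcastHashJoin
    else SortMergeJoin
}
```

In this model, an estimate that moves from 740 to 800 bytes changes the chosen join only when the threshold lies between the two values, for example `chooseJoin(BigInt(740), 780L)` versus `chooseJoin(BigInt(800), 780L)`. The same effect applies at any scale when an estimate sits close to the configured threshold.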
---