GitHub user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21070#discussion_r184116235
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
    @@ -503,7 +503,7 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
                 case plan: InMemoryRelation => plan
               }.head
               // InMemoryRelation's stats is file size before the underlying RDD is materialized
    -          assert(inMemoryRelation.computeStats().sizeInBytes === 740)
    +          assert(inMemoryRelation.computeStats().sizeInBytes === 800)
    --- End diff --
    
    Our optimizer uses these statistics to choose plans (e.g., when selecting 
a join algorithm). The plans could therefore be completely different if the file 
size increases by about 8 percent (740 to 800 bytes here). Could you give us 
more context? cc @rdblue 
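
To illustrate the concern, here is a minimal, self-contained sketch of a size-based join choice. It is not Spark's planner code: `JoinSelectionSketch`, `chooseJoin`, and the 770-byte threshold are hypothetical names and values picked for illustration. The only Spark facts assumed are that a relation whose estimated `sizeInBytes` is at or below `spark.sql.autoBroadcastJoinThreshold` can be broadcast, and that a negative threshold disables broadcasting.

```scala
object JoinSelectionSketch {
  sealed trait JoinStrategy
  case object BroadcastHashJoin extends JoinStrategy
  case object SortMergeJoin extends JoinStrategy

  // Hypothetical helper: broadcast only when the size estimate fits under the
  // threshold; a negative threshold means broadcasting is disabled.
  def chooseJoin(sizeInBytes: BigInt, broadcastThreshold: Long): JoinStrategy =
    if (broadcastThreshold >= 0 && sizeInBytes <= BigInt(broadcastThreshold)) BroadcastHashJoin
    else SortMergeJoin

  def main(args: Array[String]): Unit = {
    // Hypothetical threshold that happens to fall between the old and new estimates.
    val threshold = 770L
    println(chooseJoin(BigInt(740), threshold)) // estimate before this change
    println(chooseJoin(BigInt(800), threshold)) // estimate after this change
  }
}
```

With a threshold between the two estimates, the same query flips from a broadcast join to a sort-merge join, which is why a change from 740 to 800 bytes in the reported size deserves an explanation.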

