[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

CodingCat Tue, 12 Dec 2017 09:55:10 -0800

Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19864#discussion_r156445522
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ---
    @@ -60,7 +62,8 @@ case class InMemoryRelation(
         @transient child: SparkPlan,
         tableName: Option[String])(
         @transient var _cachedColumnBuffers: RDD[CachedBatch] = null,
    -    val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator)
    +    val batchStats: LongAccumulator = child.sqlContext.sparkContext.longAccumulator,
    +    statsOfPlanToCache: Option[Statistics] = None)
    --- End diff --
    
    My two cents here: I haven't looked into the code that makes this parameter influence equals and hashCode, but we may not want equals/hashCode to depend on it.
    
    In Spark SQL we usually compare plans by their string representation, without the stats info. For example, we try to reuse a cached plan based on the execution plan's string representation alone, not on that representation plus stats info.
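
A minimal sketch of the concern, using hypothetical stand-in types (`Stats` and `Relation` are illustrative names, not Spark's classes). It relies only on the Scala rule that a case class's generated `equals` and `hashCode` cover the first parameter list. Whether Spark's own plan-comparison code also consults the second parameter list is not checked here.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark's real classes.
final case class Stats(sizeInBytes: BigInt)

// Same shape as the diff above: the stats field lives in the second parameter list.
final case class Relation(output: Seq[String], tableName: Option[String])(
    val statsOfPlanToCache: Option[Stats] = None)

object EqualityDemo {
  def main(args: Array[String]): Unit = {
    val withStats    = Relation(Seq("id"), Some("t"))(Some(Stats(10)))
    val withoutStats = Relation(Seq("id"), Some("t"))(None)

    // Compiler-generated equals/hashCode use only the first parameter list,
    // so the two relations compare equal even though their stats differ.
    println(withStats == withoutStats)
    println(withStats.hashCode == withoutStats.hashCode)
  }
}
```

Under this assumption, a field placed in the second parameter list stays out of the generated equality, which is the behaviour the comment asks for. Moving it into the first list would make two otherwise identical relations with different stats unequal.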



