Github user Ngone51 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19285#discussion_r162962203
--- Diff:
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -233,17 +235,13 @@ private[spark] class MemoryStore(
}
if (keepUnrolling) {
- // We successfully unrolled the entirety of this block
- val arrayValues = vector.toArray
- vector = null
- val entry =
- new DeserializedMemoryEntry[T](arrayValues,
SizeEstimator.estimate(arrayValues), classTag)
- val size = entry.size
+ // We need more precise value
+ val size = valuesHolder.esitimatedSize(false)
--- End diff --
`roughly = false` tells more than just "estimate the size of the vector"; it
also signals that unrolling has finished, so `storeValue` will not be called
anymore.
As for the "heavy work" (in my understanding, mostly
`SizeEstimator.estimate(arrayValues)`): yes, I agree with you. That heavy work
is done regardless of whether we create an entry, and regardless of whether the
entry is finally put into the memory store.
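To make the contract concrete, here is a minimal sketch (the class name,
`finished` flag, and per-element size are hypothetical simplifications, not the
real Spark implementation) of the behavior described above: a cheap rough
estimate while unrolling is in progress, and one final call with
`roughly = false` that both sizes the values precisely and marks the holder as
finished, after which `storeValue` must not be called again.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the PR's ValuesHolder; sizes are illustrative.
class SketchValuesHolder[T] {
  private val vector = new ArrayBuffer[T]
  private var finished = false

  def storeValue(value: T): Unit = {
    // Once the precise estimate has been taken, unrolling is over.
    require(!finished, "estimatedSize(roughly = false) was already called")
    vector += value
  }

  def estimatedSize(roughly: Boolean): Long = {
    if (roughly) {
      // Cheap running guess (16 bytes per element, illustration only),
      // safe to call repeatedly while unrolling.
      vector.length.toLong * 16L
    } else {
      // Precise pass; in Spark this is where the expensive
      // SizeEstimator.estimate-style walk would happen. Freeze the holder.
      finished = true
      vector.length.toLong * 16L
    }
  }
}
```

The point of the flag is exactly what the comment says: `roughly = false` is
not only a sizing mode, it is the "unroll has finished" signal.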
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]