[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

Ngone51 Sun, 21 Jan 2018 20:56:23 -0800

Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19285#discussion_r162848405
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
    @@ -162,26 +162,29 @@ private[spark] class MemoryStore(
       }
     
       /**
    -   * Attempt to put the given block in memory store as values.
    +   * Attempt to put the given block in memory store as values or bytes.
        *
        * It's possible that the iterator is too large to materialize and store in memory. To avoid
        * OOM exceptions, this method will gradually unroll the iterator while periodically checking
        * whether there is enough free memory. If the block is successfully materialized, then the
        * temporary unroll memory used during the materialization is "transferred" to storage memory,
        * so we won't acquire more memory than is actually needed to store the block.
        *
    -   * @return in case of success, the estimated size of the stored data. In case of failure, return
    -   *         an iterator containing the values of the block. The returned iterator will be backed
    -   *         by the combination of the partially-unrolled block and the remaining elements of the
    -   *         original input iterator. The caller must either fully consume this iterator or call
    -   *         `close()` on it in order to free the storage memory consumed by the partially-unrolled
    -   *         block.
    +   * @param blockId The block id.
    +   * @param values The values which need be stored.
    +   * @param classTag the [[ClassTag]] for the block.
    +   * @param memoryMode The values saved mode.
    +   * @param valuesHolder A holder that supports storing record of values into memory store as
    +   *        values or bytes.
    +   * @return if the block is stored successfully, return the stored data size. Else return the
    +   *         memory has used for unroll the block.
    --- End diff --
    
    First, I think you would not disagree that a partially-unrolled case exists in the failure situation.
    
    Second,
    >The block can be unrolled fully, but the used memory exceeded the request and can't request the extra memory.
    
    Yeah, I know. But my point is that, because of `memoryCheckPeriod`, a fully unrolled block doesn't mean we have reserved unroll memory for all of its values (that only happens when the last element in the iterator satisfies % memoryCheckPeriod == 0). And here we are talking about `the memory has used for unroll the block`. So it is not accurate to say 'the block was unrolled fully, so the used memory is for all the values'.
    
    So, mostly, it would be `partially-unrolled`. WDYT?
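
To illustrate the `memoryCheckPeriod` point, here is a minimal sketch of a periodic check. It is a simplified, hypothetical model and not the actual `MemoryStore` code: the `UnrollSketch` object, the `unroll` helper, its parameters, and the assumption that every reservation request succeeds are all made up for illustration.

```scala
// Simplified, hypothetical model of periodic unroll-memory checks.
// Names echo the discussion, but this is NOT Spark's MemoryStore code.
object UnrollSketch {
  /** Returns (actualSize, reservedMemory) after unrolling every element. */
  def unroll(
      elementSizes: Seq[Long],
      memoryCheckPeriod: Int,
      initialMemoryThreshold: Long,
      memoryGrowthFactor: Double): (Long, Long) = {
    var actualSize = 0L
    var reserved = initialMemoryThreshold
    var elementsUnrolled = 0
    for (size <- elementSizes) {
      actualSize += size
      elementsUnrolled += 1
      // Memory is re-checked only once every `memoryCheckPeriod` elements.
      if (elementsUnrolled % memoryCheckPeriod == 0 && actualSize >= reserved) {
        // Assumption: every reservation request succeeds in this sketch.
        reserved = (actualSize * memoryGrowthFactor).toLong
      }
    }
    (actualSize, reserved)
  }
}
```

With a check period of 4, unrolling 11 elements leaves the last 3 unchecked, so the reserved amount can lag behind what was actually unrolled; with 12 elements the final check lands on the last element and the reservation covers everything:

```scala
val sizes11 = Seq.fill(8)(100L) ++ Seq.fill(3)(500L)
val sizes12 = Seq.fill(8)(100L) ++ Seq.fill(4)(500L)
UnrollSketch.unroll(sizes11, 4, 100L, 1.5) // (2300, 1200): reserved < actual
UnrollSketch.unroll(sizes12, 4, 100L, 1.5) // (2800, 4200): reserved covers all values
```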


