Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11436#discussion_r54661775
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
    @@ -648,8 +647,38 @@ private[spark] class BlockManager(
       }
     
       /**
    -   * @return true if the block was stored or false if the block was 
already stored or an
    -   *         error occurred.
    +   * Retrieve the given block if it exists, otherwise call the provided 
`makeIterator` method
    +   * to compute the block, persist it, and return its values.
    +   *
    +   * @return either a BlockResult if the block was successfully cached, or 
an iterator if the block
    +   *         could not be cached.
    +   */
    +  def getOrElseUpdate(
    +      blockId: BlockId,
    +      level: StorageLevel,
    +      makeIterator: () => Iterator[Any]): Either[BlockResult, 
Iterator[Any]] = {
    +    // Initially we hold no locks on this block.
    +    doPut(blockId, IteratorValues(makeIterator), level, 
downgradeToReadLock = true) match {
    --- End diff ---
    
    Some of the inherent complexity here comes from the fact that `doPut()` can 
fail to cache a new block at `MEMORY_ONLY` when the block is too large to 
fully unroll and cannot be dropped to disk. In that case, `doPut()` needs to 
return an iterator that chains the partially-unrolled values to the rest of 
the original iterator; that's why `doPut()` has this unusual return type.
    
    In the `else` branch in your code example, we need to handle the case where 
the block already exists by reading it from the block store, not by calling 
`makeIterator()` to compute it.
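
    To make the intended contract concrete, here is a self-contained toy sketch. 
This is not Spark's actual implementation: `ToyBlockManager`, its `maxCacheSize` 
cap, and the simplified `doPut` are all hypothetical stand-ins that only model 
the two behaviors discussed above (read-from-store when the block exists, and 
chaining partially-unrolled values when caching fails).

    ```scala
    import scala.collection.mutable

    case class BlockResult(values: Iterator[Any])

    // Hypothetical, simplified model -- not Spark's BlockManager.
    class ToyBlockManager(maxCacheSize: Int) {
      private val store = mutable.Map[String, Seq[Any]]()

      // Try to cache the block. If the iterator is too large to fully
      // unroll, return the partially-unrolled values chained with the
      // rest of the original iterator instead of failing outright.
      private def doPut(
          blockId: String,
          it: Iterator[Any]): Either[Seq[Any], Iterator[Any]] = {
        val unrolled = mutable.ArrayBuffer[Any]()
        while (it.hasNext && unrolled.size < maxCacheSize) {
          unrolled += it.next()
        }
        if (!it.hasNext) {
          store(blockId) = unrolled.toSeq
          Left(unrolled.toSeq)
        } else {
          // Could not cache: hand back everything consumed so far
          // plus the remainder of the original iterator.
          Right(unrolled.iterator ++ it)
        }
      }

      def getOrElseUpdate(
          blockId: String,
          makeIterator: () => Iterator[Any]): Either[BlockResult, Iterator[Any]] = {
        // If the block already exists, read it from the store;
        // do NOT call makeIterator() to recompute it.
        store.get(blockId) match {
          case Some(values) => Left(BlockResult(values.iterator))
          case None =>
            doPut(blockId, makeIterator()) match {
              case Left(values)   => Left(BlockResult(values.iterator))
              case Right(chained) => Right(chained)
            }
        }
      }
    }
    ```

    The key point the sketch illustrates is that a `Right` result still yields 
every element of the computation, even though the block was never cached.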

