[GitHub] spark pull request: [SPARK-1777] Prevent OOMs from single partitio...

mateiz Sat, 19 Jul 2014 20:35:17 -0700

Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1165#discussion_r15147419
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -463,16 +463,15 @@ private[spark] class BlockManager(
                   val values = dataDeserialize(blockId, bytes)
                   if (level.deserialized) {
                     // Cache the values before returning them
    -                // TODO: Consider creating a putValues that also takes in a iterator?
    -                val valuesBuffer = new ArrayBuffer[Any]
    -                valuesBuffer ++= values
    -                memoryStore.putValues(blockId, valuesBuffer, level, returnValues = true).data
    -                  match {
    -                    case Left(values2) =>
    -                      return Some(new BlockResult(values2, DataReadMethod.Disk, info.size))
    -                    case _ =>
    -                      throw new SparkException("Memory store did not return back an iterator")
    -                  }
    +                val putResult = memoryStore.putValues(
    +                  blockId, values, level, returnValues = true, allowPersistToDisk = false)
    +                putResult.data match {
    +                  case Left(it) =>
    +                    return Some(new BlockResult(it, DataReadMethod.Disk, info.size))
    +                  case _ =>
    +                    // This only happens if we dropped the values back to disk (which is never)
    +                    throw new SparkException("Memory store did not return an iterator!")
    +                }
    --- End diff --
    
    Isn't it possible that the partition turns out to be too large as we
    unroll it here? That is certainly less common than it being too large the
    first time we read it, but I can see it happening, for example when someone
    stores a block as MEMORY_AND_DISK.
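
    To make the concern concrete, here is a minimal sketch of unrolling an
    iterator under a memory budget. Everything in it is an assumption for
    illustration: `UnrollSketch`, `unrollSafely`, `budgetBytes`, and `sizeOf`
    are hypothetical names, not part of Spark's BlockManager or MemoryStore
    API. The idea is that a failed unroll hands back an iterator over all the
    values instead of throwing, so the caller can keep reading the block from
    its disk copy.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not Spark's actual API.
object UnrollSketch {
  /** Materialize `values` in memory unless the running size estimate exceeds
    * `budgetBytes`.
    *
    * Returns Left(values) when everything fit, or Right(iterator) when the
    * budget was exceeded. The returned iterator yields the elements already
    * unrolled followed by the remaining ones, so no data is lost.
    */
  def unrollSafely[T](
      values: Iterator[T],
      budgetBytes: Long,
      sizeOf: T => Long): Either[Vector[T], Iterator[T]] = {
    val buffer = ArrayBuffer.empty[T]
    var used = 0L
    while (values.hasNext) {
      val v = values.next()
      buffer += v
      used += sizeOf(v)
      if (used > budgetBytes) {
        // Too large to cache: give the caller a plain iterator instead.
        return Right(buffer.iterator ++ values)
      }
    }
    Left(buffer.toVector)
  }
}
```

    With four elements estimated at 4 bytes each, a 16-byte budget yields a
    Left holding all four values, while an 8-byte budget yields a Right whose
    iterator still produces all four values in order.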



