Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1165#discussion_r15147419 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -463,16 +463,15 @@ private[spark] class BlockManager( val values = dataDeserialize(blockId, bytes) if (level.deserialized) { // Cache the values before returning them - // TODO: Consider creating a putValues that also takes in a iterator? - val valuesBuffer = new ArrayBuffer[Any] - valuesBuffer ++= values - memoryStore.putValues(blockId, valuesBuffer, level, returnValues = true).data - match { - case Left(values2) => - return Some(new BlockResult(values2, DataReadMethod.Disk, info.size)) - case _ => - throw new SparkException("Memory store did not return back an iterator") - } + val putResult = memoryStore.putValues( + blockId, values, level, returnValues = true, allowPersistToDisk = false) + putResult.data match { + case Left(it) => + return Some(new BlockResult(it, DataReadMethod.Disk, info.size)) + case _ => + // This only happens if we dropped the values back to disk (which is never) + throw new SparkException("Memory store did not return an iterator!") + } --- End diff -- Isn't it possible that as we unroll the partition here, it will be too large? It's certainly less common than it being too large the first time we read it, but I can see it happening. I'm thinking of the case where someone stores a block as MEMORY_AND_DISK.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---