Github user andrewor14 commented on a diff in the pull request:
https://github.com/apache/spark/pull/11791#discussion_r57394680
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -876,21 +876,40 @@ private[spark] class BlockManager(
if (level.useMemory) {
// Put it in memory first, even if it also has useDisk set to true;
// We will drop it to disk later if the memory store can't hold it.
- memoryStore.putIterator(blockId, iterator(), level, classTag)
match {
- case Right(s) =>
- size = s
- case Left(iter) =>
- // Not enough space to unroll this block; drop to disk if
applicable
- if (level.useDisk) {
- logWarning(s"Persisting block $blockId to disk instead.")
- diskStore.put(blockId) { fileOutputStream =>
- serializerManager.dataSerializeStream(blockId,
fileOutputStream, iter)(classTag)
+ if (level.deserialized) {
+ memoryStore.putIteratorAsValues(blockId, iterator(), classTag)
match {
+ case Right(s) =>
+ size = s
+ case Left(iter) =>
+ // Not enough space to unroll this block; drop to disk if
applicable
+ if (level.useDisk) {
+ logWarning(s"Persisting block $blockId to disk instead.")
+ diskStore.put(blockId) { fileOutputStream =>
+ serializerManager.dataSerializeStream(blockId,
fileOutputStream, iter)(classTag)
+ }
+ size = diskStore.getSize(blockId)
+ } else {
+ iteratorFromFailedMemoryStorePut = Some(iter)
}
- size = diskStore.getSize(blockId)
- } else {
- iteratorFromFailedMemoryStorePut = Some(iter)
- }
+ }
+ } else { // !level.deserialized
+ memoryStore.putIteratorAsBytes(blockId, iterator(), classTag)
match {
+ case Right(s) =>
+ size = s
+ case Left(partiallySerializedValues) =>
+ // Not enough space to unroll this block; drop to disk if
applicable
+ if (level.useDisk) {
+ logWarning(s"Persisting block $blockId to disk instead.")
+ diskStore.put(blockId) { fileOutputStream =>
+
partiallySerializedValues.finishWritingToStream(fileOutputStream)
+ }
+ size = diskStore.getSize(blockId)
+ } else {
+ iteratorFromFailedMemoryStorePut =
Some(partiallySerializedValues.valuesIterator)
+ }
--- End diff --
duplicate code >:(
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]