Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19458#discussion_r143527010
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -100,7 +102,9 @@ private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean
/** List all the blocks currently stored on disk by the disk manager. */
def getAllBlocks(): Seq[BlockId] = {
- getAllFiles().map(f => BlockId(f.getName))
+    // SPARK-22227: the Try guards against temporary files written
+    // during shuffle which do not correspond to valid block IDs.
+ getAllFiles().flatMap(f => Try(BlockId(f.getName)).toOption)
--- End diff ---
This has the effect of swallowing a number of possible exceptions. At a
minimum, I think you'd want to unpack this and log the error. But is there a
more explicit way of excluding temp files? It seems like getAllFiles
shouldn't return those either?
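
As a rough sketch of the "unpack and log" alternative, assuming
DiskBlockManager mixes in Spark's Logging trait (as it does in this file),
it might look something like this:

```scala
import scala.util.{Failure, Success, Try}

/** List all the blocks currently stored on disk by the disk manager. */
def getAllBlocks(): Seq[BlockId] = {
  getAllFiles().flatMap { f =>
    Try(BlockId(f.getName)) match {
      case Success(id) => Some(id)
      case Failure(e) =>
        // Surface the failure instead of silently dropping the file.
        logWarning(s"Skipping file with unparseable block ID name ${f.getName}", e)
        None
    }
  }
}
```

And a sketch of excluding temp files explicitly instead; the
`temp_shuffle_` / `temp_local_` name prefixes are an assumption based on
how this class names its temporary blocks, so verify them against
createTempShuffleBlock / createTempLocalBlock:

```scala
def getAllBlocks(): Seq[BlockId] = {
  getAllFiles()
    // Assumed temp-file prefixes; these files are written during shuffle
    // and do not correspond to valid block IDs.
    .filterNot { f =>
      f.getName.startsWith("temp_shuffle_") || f.getName.startsWith("temp_local_")
    }
    .map(f => BlockId(f.getName))
}
```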
---