Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/5136#discussion_r27020752
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala ---
@@ -91,7 +90,12 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkCon
   /** List all the files currently stored on disk by the disk manager. */
   def getAllFiles(): Seq[File] = {
     // Get all the files inside the array of array of directories
-    subDirs.flatten.filter(_ != null).flatMap { dir =>
+    subDirs.flatMap { dir =>
--- End diff --
I think you have a decent point. Yes, the example I gave happened to involve
strings, which have `final` fields, but imagine a different example that
doesn't. I think I am implicitly reasoning that the file creation, for example,
must happen-before the assignment within one thread (this is not a question of
the Java memory model and visibility). I also don't think you can see a memory
location before the default object initialization finishes, since that is
atomic with respect to the Java program (not the constructor body). That, plus
the end of the first `synchronized` block acting as a memory barrier, causes all
the writes to be visible.
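A minimal Scala sketch of the pattern being reasoned about, under stated assumptions: `Slot` and `LazySlots` are hypothetical names, not Spark classes, and `Slot` deliberately has a non-final `var` field so the `final`-field guarantee that covers strings does not apply. The object is constructed and stored inside one `synchronized` block, and a reader that takes the same monitor sees both the stored reference and the field written before it.

```scala
// Hypothetical stand-in for a lazily created directory handle. The field is a
// plain `var`, so Java's final-field guarantees do NOT apply to it.
class Slot(var name: String)

// Illustrative sketch (not Spark's API): every read and write of `slots`
// happens while holding the monitor of the `slots` array itself.
class LazySlots(n: Int) {
  private val slots = new Array[Slot](n)

  def getOrCreate(i: Int): Slot = slots.synchronized {
    val old = slots(i)
    if (old != null) {
      old
    } else {
      // In program order, construction (including the write to `name`)
      // precedes the array store; releasing the monitor publishes both.
      val created = new Slot(f"$i%02x")
      slots(i) = created
      created
    }
  }

  // Acquiring the same monitor guarantees the reader sees the finished Slot.
  def peek(i: Int): Option[Slot] = slots.synchronized { Option(slots(i)) }
}
```

The design choice here is that readers lock too: visibility of the non-final field comes from the monitor release/acquire pair, not from anything special about the object being published.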
So I made too strong an assertion in saying "this sort of thing can never
happen"; it depends a little more on exactly what's happening.
So, hm, given that there was discussion here, I can see the argument for being
safe and leaving in the extra copy "just in case". I suppose the question is how
expensive or error-prone it is.
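To make the cost side of that question concrete, here is a sketch of the "extra copy" variant, assuming each inner array of `subDirs` is guarded by its own monitor as in the lazy-creation code. `SubDirTable`, `put` and `allDirs` are hypothetical names for illustration, not the code in this pull request, and plain `String` paths stand in for `File` objects. Each inner array is cloned while its lock is held, and null filtering happens on the copies outside the lock.

```scala
// Hypothetical table shaped like `subDirs` in the diff: an array of inner
// arrays, where each inner array is guarded by its own monitor.
class SubDirTable(outer: Int, inner: Int) {
  private val subDirs: Array[Array[String]] =
    Array.fill(outer)(new Array[String](inner))

  // Writers store into an inner array only while holding that array's lock.
  def put(i: Int, j: Int, dir: String): Unit = subDirs(i).synchronized {
    subDirs(i)(j) = dir
  }

  // The "just in case" variant: snapshot each inner array under its lock,
  // then flatten and drop empty (null) slots on the private copies.
  def allDirs(): Seq[String] = {
    val snapshots: Array[Array[String]] =
      subDirs.map(dir => dir.synchronized { dir.clone() })
    snapshots.flatten.filter(_ != null).toSeq
  }
}
```

The copy is shallow: it duplicates only the arrays of references, so its cost grows with the number of slots rather than the number of files, and nothing slow (such as listing directory contents) runs while a lock is held. A later write does not affect a snapshot already returned.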