Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/9214#issuecomment-150708764
Hey, so I'm curious about two things here:
1) If we just always replaced the output with a new one using a file
rename, would we actually have a problem? I think that any thread that has a
file open will still be reading from the old version of the file if you do a
rename. You should double-check this, but I don't think it will switch
mid-file. That might mean the "last task wins" strategy works.
2) Otherwise, what I would do is store the status in a separate file,
similar to the .index file we have for sort-based shuffle. There's no memory
overhead and it's easy to read it back again when we're given a map task and we
see that an output block for it already exists.
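The rename behavior in point 1 is standard POSIX semantics: `rename()` atomically replaces the directory entry, while any process that already has the file open keeps reading the old inode, so a reader never switches mid-file. A minimal Python sketch of that behavior (not Spark code; file names are made up):

```python
import os
import tempfile

# A "map task" writes its shuffle output.
d = tempfile.mkdtemp()
path = os.path.join(d, "shuffle_0_0.data")
with open(path, "w") as f:
    f.write("old output")

reader = open(path)  # a reducer opens the file before re-execution

# A re-executed task writes to a temp file and renames it over the
# original -- atomic on POSIX filesystems ("last task wins").
tmp = path + ".tmp"
with open(tmp, "w") as f:
    f.write("new output")
os.rename(tmp, path)

old_view = reader.read()       # still "old output": the open file
reader.close()                 # descriptor references the old inode
new_view = open(path).read()   # "new output": new opens see the rename
print(old_view, new_view)
```

So an in-flight read finishes consistently against whichever version it opened, which is what makes the "last task wins" strategy plausible.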
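For point 2, a hypothetical sketch of a sort-based-shuffle-style index file: cumulative byte offsets of each partition in the data file, stored as big-endian longs with a leading zero. The function names and exact layout here are illustrative assumptions, not Spark's actual implementation:

```python
import os
import struct
import tempfile

def write_index(path, partition_lengths):
    # Store cumulative offsets: [0, len0, len0+len1, ...]
    offsets = [0]
    for n in partition_lengths:
        offsets.append(offsets[-1] + n)
    with open(path, "wb") as f:
        for off in offsets:
            f.write(struct.pack(">q", off))  # 8-byte big-endian long

def read_index(path):
    with open(path, "rb") as f:
        data = f.read()
    offsets = [struct.unpack_from(">q", data, i)[0]
               for i in range(0, len(data), 8)]
    # Recover per-partition lengths from consecutive offsets.
    return [b - a for a, b in zip(offsets, offsets[1:])]

d = tempfile.mkdtemp()
idx = os.path.join(d, "shuffle_0_0.index")
write_index(idx, [10, 0, 25])
print(read_index(idx))  # [10, 0, 25]
```

Reading the status back is a single small sequential read, so there is no per-task memory overhead to keep.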
Regarding shuffle files getting corrupted somehow, I think this is super
unlikely and I haven't seen many systems try to defend against this. If this
were an issue, we'd also have to worry about data cached with DISK_ONLY being
corrupted, etc. I think this is considered in systems like HDFS because they
store a huge amount of data for a very long time, but I don't think it's a
major problem in Spark, and we can always add checksums later if we see it
happen.