Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1616#issuecomment-54380698
@andrewor14 I don't think that it was a problem before, but the reason is
perhaps a little subtle:
The old `fetchFile` first downloads the file to a temporary file and then
moves that temporary file to its final destination.
Although the parent directory of the temporary files (`spark.local.dir`) is
shared by all executors, the actual temporary file is created through
`File.createTempFile`, so it should have a unique name. After downloading the
file, `fetchFile` moves it to `targetDir` and renames it. When fetching a file
on an executor, `targetDir` is `SparkFiles.getRootDirectory`, which is a
per-application temporary directory, so there's no potential for
cross-application conflicts.
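To make the workflow above concrete, here is a minimal Java sketch of the
"download to a uniquely named temp file, then move into `targetDir`" pattern.
It is not Spark's actual `fetchFile`: the class `FetchSketch`, the helper
`fetchViaTempFile`, the `fetchFileTemp` prefix, and the file name `dep.jar`
are hypothetical names, and overwriting an existing destination is an
assumption made to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class FetchSketch {
    // Hypothetical helper (not Spark's API): copy `in` to targetDir/fileName
    // by way of a uniquely named temporary file created inside tempDir.
    static File fetchViaTempFile(InputStream in, File tempDir, File targetDir,
                                 String fileName) throws IOException {
        // File.createTempFile generates a unique name, so concurrent
        // downloads into a shared tempDir do not collide at this step.
        File tempFile = File.createTempFile("fetchFileTemp", null, tempDir);
        // createTempFile already created an empty file, hence REPLACE_EXISTING.
        Files.copy(in, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);

        // The destination name is fixed, so this is the step where two
        // callers can conflict if they share the same targetDir.
        // Assumption: the sketch simply overwrites an existing destination.
        File targetFile = new File(targetDir, fileName);
        Files.move(tempFile.toPath(), targetFile.toPath(),
                   StandardCopyOption.REPLACE_EXISTING);
        return targetFile;
    }

    static InputStream bytes(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    static String read(File f) throws IOException {
        return new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the shared parent directory of the temporary files.
        File sharedTemp = Files.createTempDirectory("local-dir").toFile();

        // Per-application target directories: each caller keeps its own copy.
        File appA = Files.createTempDirectory("app-a").toFile();
        File appB = Files.createTempDirectory("app-b").toFile();
        File a = fetchViaTempFile(bytes("from app A"), sharedTemp, appA, "dep.jar");
        File b = fetchViaTempFile(bytes("from app B"), sharedTemp, appB, "dep.jar");
        System.out.println(read(a) + " / " + read(b));

        // Shared target directory: both callers resolve to the same path,
        // so the second move replaces what the first one wrote.
        File shared = Files.createTempDirectory("shared-target").toFile();
        fetchViaTempFile(bytes("from app A"), sharedTemp, shared, "dep.jar");
        File last = fetchViaTempFile(bytes("from app B"), sharedTemp, shared, "dep.jar");
        System.out.println(read(last));
    }
}
```

In the sketch, the temp-file step is safe even with a shared directory
because of the unique name; only the final move depends on whether
`targetDir` is private to the caller.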
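This PR uses that same code path to perform the actual download. The
potential conflict occurs because `targetDir` is `localDir`, rather than the
per-application directory, when downloading a file that's not present in the
cache, so the final move is no longer isolated per application.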