[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...

JoshRosen Wed, 03 Sep 2014 16:03:07 -0700

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1616#issuecomment-54380698
  
    @andrewor14 I don't think that it was a problem before, but the reason is 
perhaps a little subtle:
    
    The old `fetchFile` has a workflow where it first downloads the file to a 
temporary file and then moves that temporary file to its final destination.  
Although the parent directory of the temporary files (`spark.local.dir`) is 
shared by all executors, the actual temporary file is created through 
`File.createTempFile`, so it should have a unique name.  After downloading the 
file, `fetchFile` moves it to `targetDir` and renames it.  When fetching a file 
on an executor, `targetDir` is `SparkFiles.getRootDirectory`, which is a 
per-application temporary directory, so there's no potential for 
cross-application conflicts. 
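
    For illustration, the download-then-move workflow described above can be 
sketched roughly as follows.  This is a minimal sketch, not Spark's actual 
`fetchFile`: the class name, method name, parameters, and the `fetchFileTemp` 
prefix are all hypothetical, and the sketch simply overwrites an existing 
destination file, which the real code may not do.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of a "download to a temp file, then move" workflow.
final class FetchSketch {

    /**
     * Copies `source` into a uniquely named temporary file inside
     * `sharedTempDir` (standing in for the shared local directory), then
     * moves it to `targetDir/fileName` (standing in for the final destination).
     */
    static Path fetchViaTempFile(InputStream source, File sharedTempDir,
                                 File targetDir, String fileName) throws IOException {
        // createTempFile generates a name that is not already in use, so
        // concurrent callers sharing sharedTempDir get distinct temp files.
        File tempFile = File.createTempFile("fetchFileTemp", null, sharedTempDir);
        try {
            // The temp file already exists (empty), so the copy must replace it.
            Files.copy(source, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
            Path target = targetDir.toPath().resolve(fileName);
            // Simplification: overwrite whatever is already at the destination.
            return Files.move(tempFile.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the copy or move failed.
            Files.deleteIfExists(tempFile.toPath());
        }
    }
}
```

    In this sketch the temporary name is always unique, so whether two callers 
can collide depends only on whether they share `targetDir` and `fileName`.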
    
    This PR uses that same code path to perform the actual download.  The 
potential conflict occurs because, when downloading a file that's not present 
in the cache, `targetDir` is `localDir` rather than the per-application 
`SparkFiles.getRootDirectory`.


