[GitHub] spark issue #14396: [SPARK-16787] SparkContext.addFile() should not throw if...

JoshRosen Fri, 29 Jul 2016 12:04:56 -0700

Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/14396
  
    A few other notes:
    
    - It's important that we keep the timestamps because they're necessary for 
cross-worker / cross-executor JAR + file download caching to work correctly. 
This is a relatively obscure feature which was added a few releases ago in 
order to improve scalability when running large clusters with many executors 
per worker. Theoretically, different applications will generally have different 
timestamps for the same JAR, so we could just as well use app IDs for this, but 
I'd rather not make such an invasive change now.
    - If we assume that `addFile()` will never be called with files that have 
the same name but different contents (so as not to brick executors), then 
allowing files to be downloaded so that the content comparison can be performed 
on executors will lead to performance problems if `addFile()` is called 
repeatedly with the same argument, since executors will re-download the file 
each time even though its contents are identical.
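
To make the trade-off in the second point concrete, here is a minimal sketch in Python. It is not Spark's actual executor code: the class names, the `download` callable, and the "skip if the known timestamp is at least as new" rule are all assumptions made for illustration. The first fetcher decides from the (name, timestamp) pair alone, while the second has to download the file before it can compare contents.

```python
import hashlib


class TimestampKeyedFetcher:
    """Hypothetical executor-side fetcher that trusts (name, timestamp)."""

    def __init__(self, download):
        self._download = download  # assumed callable: url -> bytes
        self._known = {}           # file name -> newest timestamp fetched
        self.downloads = 0

    def fetch(self, name, url, timestamp):
        # Assumption: a file already fetched at this or a newer timestamp
        # does not need to be fetched again.
        if self._known.get(name, -1) >= timestamp:
            return False
        self._download(url)
        self.downloads += 1
        self._known[name] = timestamp
        return True


class ContentComparingFetcher:
    """Hypothetical fetcher that detects conflicts by comparing contents."""

    def __init__(self, download):
        self._download = download
        self._digests = {}         # file name -> SHA-256 of fetched bytes
        self.downloads = 0

    def fetch(self, name, url):
        # The comparison needs the bytes, so every call pays for a download.
        data = self._download(url)
        self.downloads += 1
        digest = hashlib.sha256(data).hexdigest()
        previous = self._digests.get(name)
        if previous is not None and previous != digest:
            raise ValueError(f"{name} was re-added with different contents")
        self._digests[name] = digest
        return True
```

Under these assumptions, calling `fetch` three times with the same arguments costs one download with the timestamp-keyed fetcher and three with the content-comparing one, which is the repeated-download cost described above.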



