GitHub user kishorvpatil opened a pull request:
https://github.com/apache/spark/pull/15627
[SPARK-18099][YARN] Fail if same files added to distributed cache for
--files and --archives
## What changes were proposed in this pull request?
During spark-submit, if the YARN distributed cache is asked to add the same file
under both --files and --archives, this code change ensures that the existing
Spark YARN distributed cache behaviour is retained, i.e. it warns and fails if
the same file is listed in both --files and --archives.
## How was this patch tested?
Manually tested:
1. If the same jar is listed in both --jars and --files, the job is still
submitted.
- i.e. the functionality from [SPARK-14423] (#12203) is unchanged.
2. If the same file is listed in both --files and --archives, the job fails
to submit.
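The two manual test cases above can be modelled with a minimal Python sketch. It is an illustration of the described rule only, not Spark's actual YARN client code (which is written in Scala). The function `plan_distributed_cache`, the `DuplicateResourceError` type, the plain string comparison of paths, and the choice to treat any repeat outside --jars as fatal are all assumptions made for this sketch.

```python
import logging
from typing import List, Sequence, Tuple

log = logging.getLogger("dist-cache-sketch")


class DuplicateResourceError(ValueError):
    """Hypothetical error: one path was requested twice outside --jars."""


def plan_distributed_cache(
    jars: Sequence[str], files: Sequence[str], archives: Sequence[str]
) -> List[Tuple[str, str]]:
    """Return (path, option) pairs to localize, in submission order.

    Assumed rule, modelled on the PR description:
    - a repeat that involves --jars is skipped with a warning (case 1);
    - any other repeat, e.g. --files vs --archives, aborts (case 2).
    """
    seen = {}  # path -> option that added it first
    plan: List[Tuple[str, str]] = []
    for option, paths in (("--jars", jars), ("--files", files), ("--archives", archives)):
        for path in paths:
            first = seen.get(path)
            if first is None:
                seen[path] = option
                plan.append((path, option))
            elif "--jars" in (first, option):
                # Keep the first copy and carry on submitting.
                log.warning("%s already added via %s; skipping %s", path, first, option)
            else:
                # Same path requested as both a plain file and an archive
                # (or twice in one list): refuse to submit.
                raise DuplicateResourceError(
                    f"Attempt to add {path} multiple times ({first}, {option})"
                )
    return plan
```

For example, `plan_distributed_cache(["lib.jar"], ["lib.jar"], [])` logs a warning and returns `[("lib.jar", "--jars")]`, while `plan_distributed_cache([], ["data.zip"], ["data.zip"])` raises `DuplicateResourceError`. Failing fast here avoids localizing one path both as a plain file and as an extracted archive.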
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kishorvpatil/spark spark18099
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15627.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15627
----
commit 9bb16236ad7bbb982e0ffaa73899ebc11df9e6ee
Author: Kishor Patil <[email protected]>
Date: 2016-10-25T18:19:46Z
Dist cache yarn during submit should throw error for adding same file under
archives and files
----