GitHub user yanboliang opened a pull request:
https://github.com/apache/spark/pull/15888
[SPARK-18444][SPARKR] SparkR running in yarn-cluster mode should not
download Spark package.
## What changes were proposed in this pull request?
When a SparkR job runs in yarn-cluster mode, it downloads the Spark
package from the Apache website, which is unnecessary. For example:
```
./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
```
This produces the following output:
```
Attaching package: ‘SparkR’
The following objects are masked from ‘package:stats’:
cov, filter, lag, na.omit, predict, sd, var, window
The following objects are masked from ‘package:base’:
as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
rank, rbind, sample, startsWith, subset, summary, transform, union
Spark not found in SPARK_HOME:
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
......
```
There is no ```SPARK_HOME``` in yarn-cluster mode because the R process runs on
a remote host in the YARN cluster rather than on the client host. The JVM comes
up first, and the R process then connects to it. In this case we should
never need to download Spark, since Spark is already running.
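The decision described above can be sketched roughly as follows. This is an illustrative Python sketch, not the R code changed by this patch: the function names `is_cluster_mode` and `should_download_spark`, and the exact checks on the master and deploy-mode strings, are assumptions made for the example.

```python
import os


def is_cluster_mode(master: str, deploy_mode: str) -> bool:
    """Return True when the driver (and so the R process) runs inside the cluster.

    Hypothetical helper: treats the legacy "yarn-cluster" master string or an
    explicit "cluster" deploy mode as cluster mode.
    """
    return master == "yarn-cluster" or deploy_mode == "cluster"


def should_download_spark(master: str, deploy_mode: str, spark_home: str) -> bool:
    """Decide whether a local Spark package download would be needed.

    Hypothetical helper illustrating the reasoning in the description.
    """
    # In cluster mode the JVM is already up before the R process connects,
    # so Spark is running and no download should ever be attempted.
    if is_cluster_mode(master, deploy_mode):
        return False
    # Otherwise, a download is only needed when SPARK_HOME does not point
    # at an existing directory.
    return not (spark_home and os.path.isdir(spark_home))
```

With this sketch, `should_download_spark("yarn-cluster", "", "")` returns `False` even though `SPARK_HOME` is empty, which is the behavior the pull request argues for.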
## How was this patch tested?
Offline test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yanboliang/spark spark-18444
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15888.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15888
----
commit 16aa40086f8e2e58f7e3d7c3ec95a2e4d5967e5b
Author: Yanbo Liang <[email protected]>
Date: 2016-11-15T10:01:38Z
SparkR running in yarn-cluster mode should not download Spark package.
----