HyukjinKwon opened a new pull request #33887: URL: https://github.com/apache/spark/pull/33887
### What changes were proposed in this pull request?

This PR proposes to ask users whether they want to download and install Spark when SparkR installed from CRAN cannot find it. A `SPARKR_ASK_INSTALLATION` environment variable was added so the prompt can be disabled in case other notebook projects are affected.

### Why are the changes needed?

This is required for CRAN. SparkR is currently removed from CRAN: https://cran.r-project.org/web/packages/SparkR/index.html. See also https://lists.apache.org/thread.html/r02b9046273a518e347dfe85f864d23d63d3502c6c1edd33df17a3b86%40%3Cdev.spark.apache.org%3E

### Does this PR introduce _any_ user-facing change?

Yes, `sparkR.session(...)` will ask whether users want to download and install the Spark package when they are in the plain R shell or `Rscript`.

### How was this patch tested?

Manually tested as below:

**R**

Valid input (`n`):

```
> sparkR.session(master="local")
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n
```
```
Error in sparkCheckInstall(sparkHome, master, deployMode) :
  Please make sure Spark package is installed in this machine.
  - If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME.
  - If not, you may run install.spark function to do the job.
```

Invalid input:

```
> sparkR.session(master="local")
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc
```
```
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n):
```

Valid input (`y`):

```
> sparkR.session(master="local")
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): y
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
Preferred mirror site found: https://ftp.riken.jp/net/apache/spark
Downloading spark-3.3.0 for Hadoop 2.7 from:
- https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz
trying URL 'https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz'
...
```

**Rscript**

```
cat tmp.R
```
```
library(SparkR, lib.loc = c(file.path(".", "R", "lib")))
sparkR.session(master="local")
```
```
Rscript tmp.R
```

Valid input (`n`):

```
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n
```
```
Error in sparkCheckInstall(sparkHome, master, deployMode) :
  Please make sure Spark package is installed in this machine.
  - If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME.
  - If not, you may run install.spark function to do the job.
Calls: sparkR.session -> sparkCheckInstall
```

Invalid input:

```
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc
```
```
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n):
```

Valid input (`y`):

```
...
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/Users/hyukjin.kwon/Library/Caches/spark]? (y/n): y
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided. Looking for preferred site from apache website...
Preferred mirror site found: https://ftp.riken.jp/net/apache/spark
Downloading spark-3.3.0 for Hadoop 2.7 from:
...
```

`bin/sparkR` and `bin/spark-submit *.R` are not affected (tested).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
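For illustration, the prompt behavior described above could be sketched roughly as below. This is a hypothetical sketch, not the actual patch: the function name `askForSparkInstall`, the exact wording of the `SPARKR_ASK_INSTALLATION` check, and its default value are assumptions for the example.

```r
# Hypothetical sketch of the y/n prompt logic. Assumes SPARKR_ASK_INSTALLATION
# defaults to "true" and any other value disables the prompt.
askForSparkInstall <- function(cacheDir) {
  # Opt-out for environments (e.g. notebooks) that cannot read from stdin.
  if (tolower(Sys.getenv("SPARKR_ASK_INSTALLATION", "true")) != "true") {
    return(TRUE)
  }
  msg <- sprintf(paste0(
    "Will you download and install (or reuse if it exists) ",
    "Spark package under the cache [%s]? (y/n): "), cacheDir)
  while (TRUE) {
    # readline() only works in the interactive R shell; under Rscript,
    # read one line from stdin instead.
    answer <- if (interactive()) {
      readline(msg)
    } else {
      cat(msg)
      readLines(file("stdin"), n = 1)
    }
    if (tolower(answer) == "y") return(TRUE)
    if (tolower(answer) == "n") return(FALSE)
    # Any other input repeats the question, as in the manual tests above.
  }
}
```

A `FALSE` return would then lead to the `sparkCheckInstall` error shown in the transcripts, while `TRUE` proceeds to `install.spark`.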
