HyukjinKwon opened a new pull request #33887:
URL: https://github.com/apache/spark/pull/33887


   ### What changes were proposed in this pull request?
   
   This PR proposes to ask users whether they want to download and install Spark when they use SparkR installed from CRAN.
   
   The `SPARKR_ASK_INSTALLATION` environment variable was added so the prompt can be disabled in case other notebook projects are affected.
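   
   For context, here is a minimal sketch of how such a confirmation flow can work in R. It is illustrative only, not the actual SparkR code: the helper names are hypothetical, and it assumes that setting `SPARKR_ASK_INSTALLATION=FALSE` disables the prompt and that a non-interactive `Rscript` run reads the answer from stdin.
   
   ```r
   # Illustrative sketch only; these helpers are hypothetical, not SparkR's API.
   promptEnabled <- function() {
     # Assumption: SPARKR_ASK_INSTALLATION=FALSE would skip the prompt.
     !identical(toupper(Sys.getenv("SPARKR_ASK_INSTALLATION", "TRUE")), "FALSE")
   }
   
   readAnswer <- function(prompt) {
     if (interactive()) {
       readline(prompt)              # plain R shell
     } else {
       cat(prompt)                   # Rscript: readline() returns "" here,
       con <- file("stdin")          # so read one line from stdin instead
       on.exit(close(con))
       readLines(con, n = 1L)
     }
   }
   
   confirmInstall <- function(cacheDir) {
     if (!promptEnabled()) return(TRUE)  # skip the prompt when disabled
     prompt <- sprintf(
       "Will you download and install (or reuse if it exists) Spark package under the cache [%s]? (y/n): ",
       cacheDir)
     repeat {                        # re-ask on anything other than y/n,
       answer <- tolower(readAnswer(prompt))  # matching the transcripts below
       if (answer %in% c("y", "n")) return(identical(answer, "y"))
     }
   }
   ```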
   
   ### Why are the changes needed?
   
   This is required for CRAN. SparkR is currently removed from CRAN: https://cran.r-project.org/web/packages/SparkR/index.html.
   See also 
https://lists.apache.org/thread.html/r02b9046273a518e347dfe85f864d23d63d3502c6c1edd33df17a3b86%40%3Cdev.spark.apache.org%3E
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, `sparkR.session(...)` will ask whether users want to download and install the Spark package if they are in the plain R shell or `Rscript`.
   
   ### How was this patch tested?
   
   Manually tested as below:
   **R**
   
   Valid input (`n`):
   
   ```
   > sparkR.session(master="local")
   Spark not found in SPARK_HOME:
   Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n
   ```
   ```
   Error in sparkCheckInstall(sparkHome, master, deployMode) :
     Please make sure Spark package is installed in this machine.
   - If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME.
   - If not, you may run install.spark function to do the job.
   ```
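   
   For reference, the two remediation paths named in the error above look like this in practice (the path is a placeholder):
   
   ```r
   # Option 1: point SparkR at an existing Spark installation.
   Sys.setenv(SPARK_HOME = "/path/to/spark")  # or sparkR.session(sparkHome = "...")
   sparkR.session(master = "local")
   
   # Option 2: let SparkR download a distribution into its cache.
   install.spark()
   sparkR.session(master = "local")
   ```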
   
   Invalid input:
   
   ```
   > sparkR.session(master="local")
   Spark not found in SPARK_HOME:
   Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc
   ```
   ```
   Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n):
   ```
   
   Valid input (`y`):
   
   ```
   > sparkR.session(master="local")
   Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): y
   Spark not found in the cache directory. Installation will start.
   MirrorUrl not provided.
   Looking for preferred site from apache website...
   Preferred mirror site found: https://ftp.riken.jp/net/apache/spark
   Downloading spark-3.3.0 for Hadoop 2.7 from:
   - https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz
   trying URL 'https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz'
   ...
   ```
   
   
   **Rscript**
   
   ```
   cat tmp.R
   ```
   ```
   library(SparkR, lib.loc = c(file.path(".", "R", "lib")))
   sparkR.session(master="local")
   ```
   
   ```
   Rscript tmp.R
   ```
   
   Valid input (`n`):
   
   ```
   Spark not found in SPARK_HOME:
   Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n
   ```
   ```
   Error in sparkCheckInstall(sparkHome, master, deployMode) :
     Please make sure Spark package is installed in this machine.
   - If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME.
   - If not, you may run install.spark function to do the job.
   Calls: sparkR.session -> sparkCheckInstall
   ```
   
   Invalid input:
   
   ```
   Spark not found in SPARK_HOME:
   Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc
   ```
   ```
   Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n):
   ```
   
   Valid input (`y`):
   
   ```
   ...
   Spark not found in SPARK_HOME:
   Will you download and install (or reuse if it exists) Spark package under the cache [/Users/hyukjin.kwon/Library/Caches/spark]? (y/n): y
   Spark not found in the cache directory. Installation will start.
   MirrorUrl not provided.
   Looking for preferred site from apache website...
   Preferred mirror site found: https://ftp.riken.jp/net/apache/spark
   Downloading spark-3.3.0 for Hadoop 2.7 from:
   ...
   ```
   
   `bin/sparkR` and `bin/spark-submit *.R` are not affected (tested).
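   
   If other notebook projects need to avoid the interactive prompt entirely, the new environment variable can be set before the session starts. A sketch, assuming `FALSE` is the value that disables the prompt (not confirmed here):
   
   ```r
   # Hypothetical tmp.R variant that skips the prompt.
   Sys.setenv(SPARKR_ASK_INSTALLATION = "FALSE")  # assumed disabling value
   library(SparkR, lib.loc = c(file.path(".", "R", "lib")))
   sparkR.session(master = "local")
   ```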
   
   

