spark git commit: [SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check

2017-02-14 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master ab9872db1 -> a3626ca33


[SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check

## What changes were proposed in this pull request?

- This is caused by the changes in SPARK-18444 and SPARK-18643, under which we
no longer install Spark when `master = ""` (the default). It is also related to
SPARK-18449, since the real `master` value is not known at the time the R code
in `sparkR.session` runs. (`master` cannot default to "local" because it could
be overridden by the spark-submit command line or Spark config.)
- As a result, running SparkR as a package in an IDE works fine, but the CRAN
check does not, because it launches the tests via a non-interactive script.
- The fix is to add a check to the beginning of each test file and the
vignettes (see the sketch below); the same would also work by changing
`sparkR.session()` to `sparkR.session(master = "local")` in the tests, but
being more explicit seems better.

## How was this patch tested?

Tested this by reverting the version to 2.1, since the check needs to download
the release jar with a matching version. But because there are changes in 2.2
(specifically around SparkR ML) that are incompatible with 2.1, some tests fail
in this configuration. This will need to be ported to branch-2.1 and retested
with the 2.1 release jar.

Tested manually as:
```
# modify DESCRIPTION to revert version to 2.1.0
SPARK_HOME=/usr/spark R CMD build pkg
# run cran check without SPARK_HOME
R CMD check --as-cran SparkR_2.1.0.tar.gz
```

Author: Felix Cheung 

Closes #16720 from felixcheung/rcranchecktest.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3626ca3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3626ca3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3626ca3

Branch: refs/heads/master
Commit: a3626ca333e6e1881e2f09ccae0fa8fa7243223e
Parents: ab9872d
Author: Felix Cheung 
Authored: Tue Feb 14 13:51:27 2017 -0800
Committer: Shivaram Venkataraman 
Committed: Tue Feb 14 13:51:27 2017 -0800

--
 R/pkg/R/install.R| 16 +---
 R/pkg/R/sparkR.R |  6 ++
 R/pkg/tests/run-all.R|  3 +++
 R/pkg/vignettes/sparkr-vignettes.Rmd |  3 +++
 4 files changed, 21 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a3626ca3/R/pkg/R/install.R
--
diff --git a/R/pkg/R/install.R b/R/pkg/R/install.R
index 72386e6..4ca7aa6 100644
--- a/R/pkg/R/install.R
+++ b/R/pkg/R/install.R
@@ -21,9 +21,9 @@
 #' Download and Install Apache Spark to a Local Directory
 #'
 #' \code{install.spark} downloads and installs Spark to a local directory if
-#' it is not found. The Spark version we use is the same as the SparkR version.
-#' Users can specify a desired Hadoop version, the remote mirror site, and
-#' the directory where the package is installed locally.
+#' it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
+#' returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
+#' Hadoop version, the remote mirror site, and the directory where the package is installed locally.
 #'
 #' The full url of remote file is inferred from \code{mirrorUrl} and \code{hadoopVersion}.
 #' \code{mirrorUrl} specifies the remote path to a Spark folder. It is followed by a subfolder
@@ -68,6 +68,16 @@
 #'  \href{http://spark.apache.org/downloads.html}{Apache Spark}
 install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
                           localDir = NULL, overwrite = FALSE) {
+  sparkHome <- Sys.getenv("SPARK_HOME")
+  if (isSparkRShell()) {
+    stopifnot(nchar(sparkHome) > 0)
+    message("Spark is already running in sparkR shell.")
+    return(invisible(sparkHome))
+  } else if (!is.na(file.info(sparkHome)$isdir)) {
+    message("Spark package found in SPARK_HOME: ", sparkHome)
+    return(invisible(sparkHome))
+  }
+
   version <- paste0("spark-", packageVersion("SparkR"))
   hadoopVersion <- tolower(hadoopVersion)
   hadoopVersionName <- hadoopVersionName(hadoopVersion)
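
For illustration only, a minimal sketch of how the new short-circuit in
`install.spark()` behaves from the user's side; the `/usr/spark` path is an
assumption, not something taken from this commit:
```
# With SPARK_HOME pointing at an existing Spark install, install.spark()
# now returns that directory instead of downloading a release.
Sys.setenv(SPARK_HOME = "/usr/spark")   # hypothetical local install
library(SparkR)
install.spark()
# message: "Spark package found in SPARK_HOME: /usr/spark"

# Without SPARK_HOME set (as in the CRAN check environment), execution falls
# through to the existing download logic for the release matching
# packageVersion("SparkR").
```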

http://git-wip-us.apache.org/repos/asf/spark/blob/a3626ca3/R/pkg/R/sparkR.R
--
diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index 870e76b..61773ed 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
@@ -588,13 +588,11 @@ processSparkPackages <- function(packages) {
 sparkCheckInstall <- function(sparkHome, master, deployMode) {
   if (!isSparkRShell()) {
     if (!is.na(file.info(sparkHome)$isdir)) {
-      msg <- paste0("Spark package found in SPARK_HOME: ", sparkHome)
-      message(msg)
+      message("Spark package found in SPARK_HOME: ", sparkHome)
       NULL
     } else {
       if (interactive() 

spark git commit: [SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check

2017-02-14 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 f837ced4c -> 7763b0b8b


[SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check

## What changes were proposed in this pull request?

- This is caused by the changes in SPARK-18444 and SPARK-18643, under which we
no longer install Spark when `master = ""` (the default). It is also related to
SPARK-18449, since the real `master` value is not known at the time the R code
in `sparkR.session` runs. (`master` cannot default to "local" because it could
be overridden by the spark-submit command line or Spark config.)
- As a result, running SparkR as a package in an IDE works fine, but the CRAN
check does not, because it launches the tests via a non-interactive script.
- The fix is to add a check to the beginning of each test file and the
vignettes; the same would also work by changing `sparkR.session()` to
`sparkR.session(master = "local")` in the tests, but being more explicit
seems better.

## How was this patch tested?

Tested this by reverting the version to 2.1, since the check needs to download
the release jar with a matching version. But because there are changes in 2.2
(specifically around SparkR ML) that are incompatible with 2.1, some tests fail
in this configuration. This will need to be ported to branch-2.1 and retested
with the 2.1 release jar.

Tested manually as:
```
# modify DESCRIPTION to revert version to 2.1.0
SPARK_HOME=/usr/spark R CMD build pkg
# run cran check without SPARK_HOME
R CMD check --as-cran SparkR_2.1.0.tar.gz
```

Author: Felix Cheung 

Closes #16720 from felixcheung/rcranchecktest.

(cherry picked from commit a3626ca333e6e1881e2f09ccae0fa8fa7243223e)
Signed-off-by: Shivaram Venkataraman 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7763b0b8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7763b0b8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7763b0b8

Branch: refs/heads/branch-2.1
Commit: 7763b0b8bd33b0baa99434136528efb5de261919
Parents: f837ced
Author: Felix Cheung 
Authored: Tue Feb 14 13:51:27 2017 -0800
Committer: Shivaram Venkataraman 
Committed: Tue Feb 14 13:51:37 2017 -0800

--
 R/pkg/R/install.R| 16 +---
 R/pkg/R/sparkR.R |  6 ++
 R/pkg/tests/run-all.R|  3 +++
 R/pkg/vignettes/sparkr-vignettes.Rmd |  3 +++
 4 files changed, 21 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7763b0b8/R/pkg/R/install.R
--
diff --git a/R/pkg/R/install.R b/R/pkg/R/install.R
index 72386e6..4ca7aa6 100644
--- a/R/pkg/R/install.R
+++ b/R/pkg/R/install.R
@@ -21,9 +21,9 @@
 #' Download and Install Apache Spark to a Local Directory
 #'
 #' \code{install.spark} downloads and installs Spark to a local directory if
-#' it is not found. The Spark version we use is the same as the SparkR version.
-#' Users can specify a desired Hadoop version, the remote mirror site, and
-#' the directory where the package is installed locally.
+#' it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
+#' returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
+#' Hadoop version, the remote mirror site, and the directory where the package is installed locally.
 #'
 #' The full url of remote file is inferred from \code{mirrorUrl} and \code{hadoopVersion}.
 #' \code{mirrorUrl} specifies the remote path to a Spark folder. It is followed by a subfolder
@@ -68,6 +68,16 @@
 #'  \href{http://spark.apache.org/downloads.html}{Apache Spark}
 install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
                           localDir = NULL, overwrite = FALSE) {
+  sparkHome <- Sys.getenv("SPARK_HOME")
+  if (isSparkRShell()) {
+    stopifnot(nchar(sparkHome) > 0)
+    message("Spark is already running in sparkR shell.")
+    return(invisible(sparkHome))
+  } else if (!is.na(file.info(sparkHome)$isdir)) {
+    message("Spark package found in SPARK_HOME: ", sparkHome)
+    return(invisible(sparkHome))
+  }
+
   version <- paste0("spark-", packageVersion("SparkR"))
   hadoopVersion <- tolower(hadoopVersion)
   hadoopVersionName <- hadoopVersionName(hadoopVersion)

http://git-wip-us.apache.org/repos/asf/spark/blob/7763b0b8/R/pkg/R/sparkR.R
--
diff --git a/R/pkg/R/sparkR.R b/R/pkg/R/sparkR.R
index 870e76b..61773ed 100644
--- a/R/pkg/R/sparkR.R
+++ b/R/pkg/R/sparkR.R
@@ -588,13 +588,11 @@ processSparkPackages <- function(packages) {
 sparkCheckInstall <- function(sparkHome, master, deployMode) {
   if (!isSparkRShell()) {
     if (!is.na(file.info(sparkHome)$isdir)) {
-      msg <- paste0("Spark package found in SPARK_HOME: ", sparkHome)
-      message(msg)