[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-29 Thread junyangq
Github user junyangq closed the pull request at:

https://github.com/apache/spark/pull/14666


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-19 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75513347
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,33 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+is_master_local <- function(master) {
+  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
+}
+
+getRemoteMasterInfo <- function(master) {
+  hostPort <- sub("spark://", "", master)
+  host <- sub(":.*", "", hostPort)
--- End diff --

Oh, did you mean cluster mode?





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-19 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75444995
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,33 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+is_master_local <- function(master) {
+  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
+}
+
+getRemoteMasterInfo <- function(master) {
+  hostPort <- sub("spark://", "", master)
+  host <- sub(":.*", "", hostPort)
--- End diff --

Yes, SparkR supports yarn-cluster.





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-19 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75440848
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,33 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+is_master_local <- function(master) {
+  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
+}
+
+getRemoteMasterInfo <- function(master) {
+  hostPort <- sub("spark://", "", master)
+  host <- sub(":.*", "", hostPort)
--- End diff --

Just fixed client mode (similar to local mode). Was YARN cluster mode
supported originally?





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75289101
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,33 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+is_master_local <- function(master) {
+  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
+}
+
+getRemoteMasterInfo <- function(master) {
+  hostPort <- sub("spark://", "", master)
+  host <- sub(":.*", "", hostPort)
--- End diff --

Is this going to break YARN client and cluster modes, then?
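To make this question concrete, the following check (not part of the PR) shows how the `is_master_local()` regex from the diff classifies common master strings. Any non-empty master that does not match, including `yarn-client` and `yarn-cluster`, would fall into the new remote-RBackend branch. The regex also rejects `local[4,2]`, which Spark accepts as local mode with a max-failures count.

```r
# Copy of is_master_local() as quoted in the diff.
is_master_local <- function(master) {
  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
}

masters <- c("local", "local[4]", "local[*]", "local[4,2]",
             "spark://10.0.0.5:7077", "yarn-client", "yarn-cluster")
# FALSE entries would take the new remote-RBackend branch.
print(data.frame(master = masters, is_local = is_master_local(masters)))
```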





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-18 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75275020
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,33 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+is_master_local <- function(master) {
+  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
+}
+
+getRemoteMasterInfo <- function(master) {
+  hostPort <- sub("spark://", "", master)
+  host <- sub(":.*", "", hostPort)
--- End diff --

Yeah, that's correct. For other modes like YARN cluster, it could be a little
more complicated.





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-17 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75239846
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
 args = path,
 sparkHome = sparkHome,
 jars = jars,
 sparkSubmitOpts = submitOps,
 packages = packages)
-# wait atmost 100 seconds for JVM to launch
-wait <- 0.1
-for (i in 1:25) {
-  Sys.sleep(wait)
-  if (file.exists(path)) {
-break
+  # wait atmost 100 seconds for JVM to launch
+  wait <- 0.1
+  for (i in 1:25) {
+Sys.sleep(wait)
+if (file.exists(path)) {
+  break
+}
+wait <- wait * 1.25
   }
-  wait <- wait * 1.25
-}
-if (!file.exists(path)) {
-  stop("JVM is not ready after 10 seconds")
-}
-f <- file(path, open = "rb")
-backendPort <- readInt(f)
-monitorPort <- readInt(f)
-rLibPath <- readString(f)
-close(f)
-file.remove(path)
-if (length(backendPort) == 0 || backendPort == 0 ||
-length(monitorPort) == 0 || monitorPort == 0 ||
-length(rLibPath) != 1) {
-  stop("JVM failed to launch")
+  if (!file.exists(path)) {
+stop("JVM is not ready after 10 seconds")
+  }
+  f <- file(path, open = "rb")
+  backendPort <- readInt(f)
+  monitorPort <- readInt(f)
+  rLibPath <- readString(f)
+  close(f)
+  file.remove(path)
+  if (length(backendPort) == 0 || backendPort == 0 ||
+  length(monitorPort) == 0 || monitorPort == 0 ||
+  length(rLibPath) != 1) {
+stop("JVM failed to launch")
+  }
+  if (rLibPath != "") {
+assign(".libPath", rLibPath, envir = .sparkREnv)
+.libPaths(c(rLibPath, .libPaths()))
+  }
+  host <- "localhost"
+} else {
+  backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+sparkEnvirMap[["backend.port"]]
--- End diff --

Never mind, I confused it with host.
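The launch loop quoted in this hunk sleeps with exponential backoff: 0.1 s initially, growing by a factor of 1.25 per attempt, for 25 attempts. As a quick sanity check (not part of the PR), the worst-case total sleep can be summed directly. It comes to roughly 105 seconds, which roughly matches the "wait atmost 100 seconds" comment and is far from the "JVM is not ready after 10 seconds" error message. The sketch ignores time spent in `file.exists()`.

```r
# Worst-case sleep time of the quoted backoff loop (file checks ignored).
wait <- 0.1
total <- 0
for (i in 1:25) {
  total <- total + wait  # stands in for Sys.sleep(wait)
  wait <- wait * 1.25
}
print(total)

# Same value from the geometric-series closed form: a * (r^n - 1) / (r - 1)
closedForm <- 0.1 * (1.25^25 - 1) / (1.25 - 1)
print(closedForm)
```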





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-17 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75239809
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,33 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+is_master_local <- function(master) {
+  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
+}
+
+getRemoteMasterInfo <- function(master) {
+  hostPort <- sub("spark://", "", master)
+  host <- sub(":.*", "", hostPort)
--- End diff --

Does that mean it only supports standalone mode?
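Only the first two statements of `getRemoteMasterInfo()` are visible in the quoted diff, so the port handling below is an assumption, written to match how the caller uses `$host` and `$port`. The sketch illustrates why the parsing effectively targets standalone-style `spark://host:port` URLs: a YARN master string passes through unchanged and is treated as a host name.

```r
# Hypothetical completion of getRemoteMasterInfo(); only its first two
# statements appear in the quoted diff, the port handling here is assumed.
getRemoteMasterInfo <- function(master) {
  hostPort <- sub("spark://", "", master)
  host <- sub(":.*", "", hostPort)
  # Assumed: a port is returned only when "host:port" is present.
  port <- if (grepl(":", hostPort)) sub(".*:", "", hostPort) else NULL
  list(host = host, port = port)
}

str(getRemoteMasterInfo("spark://10.0.0.5:8000"))  # host "10.0.0.5", port "8000"
str(getRemoteMasterInfo("spark://10.0.0.5"))       # host "10.0.0.5", port NULL
str(getRemoteMasterInfo("yarn-client"))            # host "yarn-client", port NULL
```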





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-17 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75234053
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
 args = path,
 sparkHome = sparkHome,
 jars = jars,
 sparkSubmitOpts = submitOps,
 packages = packages)
-# wait atmost 100 seconds for JVM to launch
-wait <- 0.1
-for (i in 1:25) {
-  Sys.sleep(wait)
-  if (file.exists(path)) {
-break
+  # wait atmost 100 seconds for JVM to launch
+  wait <- 0.1
+  for (i in 1:25) {
+Sys.sleep(wait)
+if (file.exists(path)) {
+  break
+}
+wait <- wait * 1.25
   }
-  wait <- wait * 1.25
-}
-if (!file.exists(path)) {
-  stop("JVM is not ready after 10 seconds")
-}
-f <- file(path, open = "rb")
-backendPort <- readInt(f)
-monitorPort <- readInt(f)
-rLibPath <- readString(f)
-close(f)
-file.remove(path)
-if (length(backendPort) == 0 || backendPort == 0 ||
-length(monitorPort) == 0 || monitorPort == 0 ||
-length(rLibPath) != 1) {
-  stop("JVM failed to launch")
+  if (!file.exists(path)) {
+stop("JVM is not ready after 10 seconds")
+  }
+  f <- file(path, open = "rb")
+  backendPort <- readInt(f)
+  monitorPort <- readInt(f)
+  rLibPath <- readString(f)
+  close(f)
+  file.remove(path)
+  if (length(backendPort) == 0 || backendPort == 0 ||
+  length(monitorPort) == 0 || monitorPort == 0 ||
+  length(rLibPath) != 1) {
+stop("JVM failed to launch")
+  }
+  if (rLibPath != "") {
+assign(".libPath", rLibPath, envir = .sparkREnv)
+.libPaths(c(rLibPath, .libPaths()))
+  }
+  host <- "localhost"
+} else {
+  backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+sparkEnvirMap[["backend.port"]]
--- End diff --

Not really. I just used the default one. Any concern about that?





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-17 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75233185
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
 args = path,
 sparkHome = sparkHome,
 jars = jars,
 sparkSubmitOpts = submitOps,
 packages = packages)
-# wait atmost 100 seconds for JVM to launch
-wait <- 0.1
-for (i in 1:25) {
-  Sys.sleep(wait)
-  if (file.exists(path)) {
-break
+  # wait atmost 100 seconds for JVM to launch
+  wait <- 0.1
+  for (i in 1:25) {
+Sys.sleep(wait)
+if (file.exists(path)) {
+  break
+}
+wait <- wait * 1.25
   }
-  wait <- wait * 1.25
-}
-if (!file.exists(path)) {
-  stop("JVM is not ready after 10 seconds")
-}
-f <- file(path, open = "rb")
-backendPort <- readInt(f)
-monitorPort <- readInt(f)
-rLibPath <- readString(f)
-close(f)
-file.remove(path)
-if (length(backendPort) == 0 || backendPort == 0 ||
-length(monitorPort) == 0 || monitorPort == 0 ||
-length(rLibPath) != 1) {
-  stop("JVM failed to launch")
+  if (!file.exists(path)) {
+stop("JVM is not ready after 10 seconds")
+  }
+  f <- file(path, open = "rb")
+  backendPort <- readInt(f)
+  monitorPort <- readInt(f)
+  rLibPath <- readString(f)
+  close(f)
+  file.remove(path)
+  if (length(backendPort) == 0 || backendPort == 0 ||
+  length(monitorPort) == 0 || monitorPort == 0 ||
+  length(rLibPath) != 1) {
+stop("JVM failed to launch")
+  }
+  if (rLibPath != "") {
+assign(".libPath", rLibPath, envir = .sparkREnv)
+.libPaths(c(rLibPath, .libPaths()))
+  }
+  host <- "localhost"
+} else {
+  backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+sparkEnvirMap[["backend.port"]]
--- End diff --

So do you set that manually when you test?





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-17 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75232576
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
 args = path,
 sparkHome = sparkHome,
 jars = jars,
 sparkSubmitOpts = submitOps,
 packages = packages)
-# wait atmost 100 seconds for JVM to launch
-wait <- 0.1
-for (i in 1:25) {
-  Sys.sleep(wait)
-  if (file.exists(path)) {
-break
+  # wait atmost 100 seconds for JVM to launch
+  wait <- 0.1
+  for (i in 1:25) {
+Sys.sleep(wait)
+if (file.exists(path)) {
+  break
+}
+wait <- wait * 1.25
   }
-  wait <- wait * 1.25
-}
-if (!file.exists(path)) {
-  stop("JVM is not ready after 10 seconds")
-}
-f <- file(path, open = "rb")
-backendPort <- readInt(f)
-monitorPort <- readInt(f)
-rLibPath <- readString(f)
-close(f)
-file.remove(path)
-if (length(backendPort) == 0 || backendPort == 0 ||
-length(monitorPort) == 0 || monitorPort == 0 ||
-length(rLibPath) != 1) {
-  stop("JVM failed to launch")
+  if (!file.exists(path)) {
+stop("JVM is not ready after 10 seconds")
+  }
+  f <- file(path, open = "rb")
+  backendPort <- readInt(f)
+  monitorPort <- readInt(f)
+  rLibPath <- readString(f)
+  close(f)
+  file.remove(path)
+  if (length(backendPort) == 0 || backendPort == 0 ||
+  length(monitorPort) == 0 || monitorPort == 0 ||
+  length(rLibPath) != 1) {
+stop("JVM failed to launch")
+  }
+  if (rLibPath != "") {
+assign(".libPath", rLibPath, envir = .sparkREnv)
+.libPaths(c(rLibPath, .libPaths()))
+  }
+  host <- "localhost"
+} else {
+  backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+sparkEnvirMap[["backend.port"]]
--- End diff --

From `sparkConfig` in sparkR.session? Users can provide their own. 
Otherwise, the default ones will be used.
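As the reply says, `backend.port` is not an environment variable. It is a key looked up in the config map built from `sparkConfig`. The following is a minimal sketch of that lookup-with-default pattern. `getConfOrDefault()` is a hypothetical helper, and building the map with `list2env()` is an assumption about how `sparkConfig` is converted.

```r
# Hypothetical helper mirroring the `if (!is.null(...)) ... else ...` pattern
# in the diff: look a key up in the config environment, else use a default.
getConfOrDefault <- function(envirMap, key, default) {
  value <- envirMap[[key]]  # NULL when the key is absent from the environment
  if (is.null(value)) default else value
}

# Assumed: sparkConfig (a named list) ends up as an environment like this.
sparkEnvirMap <- list2env(list(backend.port = "9000"))

print(getConfOrDefault(sparkEnvirMap, "backend.port", "8000"))  # user-supplied
print(getConfOrDefault(sparkEnvirMap, "monitor.port", "8001"))  # default
```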





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-17 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75232452
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -369,7 +395,7 @@ sparkR.session <- function(
   if (!exists(".sparkRjsc", envir = .sparkREnv)) {
 sparkExecutorEnvMap <- new.env()
 sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, sparkExecutorEnvMap,
-   sparkJars, sparkPackages)
+sparkJars, sparkPackages)
--- End diff --

I thought the arguments should be aligned?





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-17 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75232295
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
 args = path,
 sparkHome = sparkHome,
 jars = jars,
 sparkSubmitOpts = submitOps,
 packages = packages)
-# wait atmost 100 seconds for JVM to launch
-wait <- 0.1
-for (i in 1:25) {
-  Sys.sleep(wait)
-  if (file.exists(path)) {
-break
+  # wait atmost 100 seconds for JVM to launch
+  wait <- 0.1
+  for (i in 1:25) {
+Sys.sleep(wait)
+if (file.exists(path)) {
+  break
+}
+wait <- wait * 1.25
   }
-  wait <- wait * 1.25
-}
-if (!file.exists(path)) {
-  stop("JVM is not ready after 10 seconds")
-}
-f <- file(path, open = "rb")
-backendPort <- readInt(f)
-monitorPort <- readInt(f)
-rLibPath <- readString(f)
-close(f)
-file.remove(path)
-if (length(backendPort) == 0 || backendPort == 0 ||
-length(monitorPort) == 0 || monitorPort == 0 ||
-length(rLibPath) != 1) {
-  stop("JVM failed to launch")
+  if (!file.exists(path)) {
+stop("JVM is not ready after 10 seconds")
+  }
+  f <- file(path, open = "rb")
+  backendPort <- readInt(f)
+  monitorPort <- readInt(f)
+  rLibPath <- readString(f)
+  close(f)
+  file.remove(path)
+  if (length(backendPort) == 0 || backendPort == 0 ||
+  length(monitorPort) == 0 || monitorPort == 0 ||
+  length(rLibPath) != 1) {
+stop("JVM failed to launch")
+  }
+  if (rLibPath != "") {
+assign(".libPath", rLibPath, envir = .sparkREnv)
+.libPaths(c(rLibPath, .libPaths()))
+  }
+  host <- "localhost"
+} else {
+  backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+sparkEnvirMap[["backend.port"]]
+  } else {
+"8000"
+  }
+  monitorPort <- if (!is.null(sparkEnvirMap[["monitor.port"]])) {
+sparkEnvirMap[["monitor.port"]]
+  } else {
+"8001"
+  }
+  host <- getRemoteMasterInfo(master)$host
+  port <- getRemoteMasterInfo(master)$port
+  if (is.null(port)) {
+message(sprintf("Use backend port %s.", backendPort))
+  } else {
+message(sprintf("Use backedn port %s parsed from master.", port))
+backendPort <- port
+  }
+  master <- "local"   # have connected to RBackend, use local mode
 }
-assign(".monitorConn", socketConnection(port = monitorPort), envir = .sparkREnv)
+assign(".monitorConn", socketConnection(host = host, port = monitorPort),
+   envir = .sparkREnv)
 assign(".backendLaunched", 1, envir = .sparkREnv)
-if (rLibPath != "") {
-  assign(".libPath", rLibPath, envir = .sparkREnv)
-  .libPaths(c(rLibPath, .libPaths()))
-}
   }
 
   .sparkREnv$backendPort <- backendPort
   tryCatch({
-connectBackend("localhost", backendPort)
+connectBackend(host, backendPort)
   },
   error = function(err) {
-stop("Failed to connect JVM\n")
+stop(paste0("Failed to connect JVM\n", existingPort))
--- End diff --

Oh this shouldn't have been changed in this PR...





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-16 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75050631
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
 args = path,
 sparkHome = sparkHome,
 jars = jars,
 sparkSubmitOpts = submitOps,
 packages = packages)
-# wait atmost 100 seconds for JVM to launch
-wait <- 0.1
-for (i in 1:25) {
-  Sys.sleep(wait)
-  if (file.exists(path)) {
-break
+  # wait atmost 100 seconds for JVM to launch
+  wait <- 0.1
+  for (i in 1:25) {
+Sys.sleep(wait)
+if (file.exists(path)) {
+  break
+}
+wait <- wait * 1.25
   }
-  wait <- wait * 1.25
-}
-if (!file.exists(path)) {
-  stop("JVM is not ready after 10 seconds")
-}
-f <- file(path, open = "rb")
-backendPort <- readInt(f)
-monitorPort <- readInt(f)
-rLibPath <- readString(f)
-close(f)
-file.remove(path)
-if (length(backendPort) == 0 || backendPort == 0 ||
-length(monitorPort) == 0 || monitorPort == 0 ||
-length(rLibPath) != 1) {
-  stop("JVM failed to launch")
+  if (!file.exists(path)) {
+stop("JVM is not ready after 10 seconds")
+  }
+  f <- file(path, open = "rb")
+  backendPort <- readInt(f)
+  monitorPort <- readInt(f)
+  rLibPath <- readString(f)
+  close(f)
+  file.remove(path)
+  if (length(backendPort) == 0 || backendPort == 0 ||
+  length(monitorPort) == 0 || monitorPort == 0 ||
+  length(rLibPath) != 1) {
+stop("JVM failed to launch")
+  }
+  if (rLibPath != "") {
+assign(".libPath", rLibPath, envir = .sparkREnv)
+.libPaths(c(rLibPath, .libPaths()))
+  }
+  host <- "localhost"
+} else {
+  backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+sparkEnvirMap[["backend.port"]]
--- End diff --

How is backend.port passed to the R process? I don't see where this
environment variable is set.





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-16 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75045315
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
--- End diff --

Indentation issue.





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-16 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75045227
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -369,7 +395,7 @@ sparkR.session <- function(
   if (!exists(".sparkRjsc", envir = .sparkREnv)) {
 sparkExecutorEnvMap <- new.env()
 sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, sparkExecutorEnvMap,
-   sparkJars, sparkPackages)
+sparkJars, sparkPackages)
--- End diff --

Indentation.





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-16 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14666#discussion_r75045186
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
   warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
 " please use the --packages commandline instead", sep = ","))
 }
+host <- "localhost"
 backendPort <- existingPort
   } else {
-path <- tempfile(pattern = "backend_port")
-submitOps <- getClientModeSparkSubmitOpts(
+
+if (!nzchar(master) || is_master_local(master)) {
+  path <- tempfile(pattern = "backend_port")
+  submitOps <- getClientModeSparkSubmitOpts(
 Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
 sparkEnvirMap)
-launchBackend(
+  launchBackend(
 args = path,
 sparkHome = sparkHome,
 jars = jars,
 sparkSubmitOpts = submitOps,
 packages = packages)
-# wait atmost 100 seconds for JVM to launch
-wait <- 0.1
-for (i in 1:25) {
-  Sys.sleep(wait)
-  if (file.exists(path)) {
-break
+  # wait atmost 100 seconds for JVM to launch
+  wait <- 0.1
+  for (i in 1:25) {
+Sys.sleep(wait)
+if (file.exists(path)) {
+  break
+}
+wait <- wait * 1.25
   }
-  wait <- wait * 1.25
-}
-if (!file.exists(path)) {
-  stop("JVM is not ready after 10 seconds")
-}
-f <- file(path, open = "rb")
-backendPort <- readInt(f)
-monitorPort <- readInt(f)
-rLibPath <- readString(f)
-close(f)
-file.remove(path)
-if (length(backendPort) == 0 || backendPort == 0 ||
-length(monitorPort) == 0 || monitorPort == 0 ||
-length(rLibPath) != 1) {
-  stop("JVM failed to launch")
+  if (!file.exists(path)) {
+stop("JVM is not ready after 10 seconds")
+  }
+  f <- file(path, open = "rb")
+  backendPort <- readInt(f)
+  monitorPort <- readInt(f)
+  rLibPath <- readString(f)
+  close(f)
+  file.remove(path)
+  if (length(backendPort) == 0 || backendPort == 0 ||
+  length(monitorPort) == 0 || monitorPort == 0 ||
+  length(rLibPath) != 1) {
+stop("JVM failed to launch")
+  }
+  if (rLibPath != "") {
+assign(".libPath", rLibPath, envir = .sparkREnv)
+.libPaths(c(rLibPath, .libPaths()))
+  }
+  host <- "localhost"
+} else {
+  backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) {
+sparkEnvirMap[["backend.port"]]
+  } else {
+"8000"
+  }
+  monitorPort <- if (!is.null(sparkEnvirMap[["monitor.port"]])) {
+sparkEnvirMap[["monitor.port"]]
+  } else {
+"8001"
+  }
+  host <- getRemoteMasterInfo(master)$host
+  port <- getRemoteMasterInfo(master)$port
+  if (is.null(port)) {
+message(sprintf("Use backend port %s.", backendPort))
+  } else {
+message(sprintf("Use backedn port %s parsed from master.", port))
+backendPort <- port
+  }
+  master <- "local"   # have connected to RBackend, use local mode
 }
-assign(".monitorConn", socketConnection(port = monitorPort), envir = .sparkREnv)
+assign(".monitorConn", socketConnection(host = host, port = monitorPort),
+   envir = .sparkREnv)
 assign(".backendLaunched", 1, envir = .sparkREnv)
-if (rLibPath != "") {
-  assign(".libPath", rLibPath, envir = .sparkREnv)
-  .libPaths(c(rLibPath, .libPaths()))
-}
   }
 
   .sparkREnv$backendPort <- backendPort
   tryCatch({
-connectBackend("localhost", backendPort)
+connectBackend(host, backendPort)
   },
   error = function(err) {
-stop("Failed to connect JVM\n")
+stop(paste0("Failed to connect JVM\n", existingPort))
--- End diff --

Add host here?
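One way to act on this suggestion is to include the host and port actually used in the error message. The diff currently reports `existingPort`, which appears to be set only in the existing-backend branch. `connectOrFail()` and its `connectFn` argument are hypothetical. They stand in for the internal `connectBackend()` call so the sketch runs without a JVM.

```r
# Hypothetical: report both host and port when the backend connection fails.
# connectFn stands in for SparkR's internal connectBackend(host, port).
connectOrFail <- function(host, backendPort, connectFn) {
  tryCatch(
    connectFn(host, backendPort),
    error = function(err) {
      stop(sprintf("Failed to connect JVM at %s:%s\n%s",
                   host, backendPort, conditionMessage(err)), call. = FALSE)
    }
  )
}

# Simulated failure so the message can be inspected.
failing <- function(host, port) stop("connection refused")
msg <- tryCatch(connectOrFail("10.0.0.5", "8000", failing),
                error = conditionMessage)
cat(msg, "\n")
```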





[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...

2016-08-16 Thread junyangq
GitHub user junyangq opened a pull request:

https://github.com/apache/spark/pull/14666

[SPARK-16578][SparkR] Enable SparkR to connect to a remote machine running 
RBackend

## What changes were proposed in this pull request?

This PR tries to enable SparkR to connect to a remote machine that runs the 
RBackend. 

* If not set otherwise, the default ports are used: 8000 for the backend and
8001 for the monitor. They can be set with `spark.r.backendPort` and
`spark.r.monitorPort` on the RBackend side, and with `backend.port` and
`monitor.port` in `sparkConfig` on the client side.
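Reading the diff, the client appears to resolve the backend port in this order: a port embedded in a `spark://host:port` master, then `backend.port` from `sparkConfig`, then the default 8000. The monitor port follows the same config-then-default rule, with 8001 as the default. The following is a minimal sketch of that precedence. `resolveBackendPort()` is a hypothetical name, not a SparkR function.

```r
# Sketch of the backend-port precedence seen in the diff: a port in a
# spark://host:port master wins, then `backend.port` from sparkConfig, then 8000.
resolveBackendPort <- function(master, sparkConfig = list()) {
  hostPort <- sub("spark://", "", master)
  fromMaster <- if (grepl(":", hostPort)) sub(".*:", "", hostPort) else NULL
  fromConfig <- sparkConfig[["backend.port"]]  # NULL when not supplied
  if (!is.null(fromMaster)) {
    fromMaster
  } else if (!is.null(fromConfig)) {
    fromConfig
  } else {
    "8000"
  }
}

print(resolveBackendPort("spark://10.0.0.5:9100", list(backend.port = "9000")))
print(resolveBackendPort("spark://10.0.0.5", list(backend.port = "9000")))
print(resolveBackendPort("spark://10.0.0.5"))
```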

## How was this patch tested?

R unit tests. Manual tests: connecting to a local standalone cluster and to an
AWS EC2 instance.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junyangq/spark SPARK-16578-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14666


commit b221b5fb4043dc891c74dee513dcacffd6951b1a
Author: Junyang Qian 
Date:   2016-08-04T07:05:27Z

Enable SparkR to talk to a remote cluster.



