[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq closed the pull request at: https://github.com/apache/spark/pull/14666 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75513347

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,33 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +is_master_local <- function(master) {
    +  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
    +}
    +
    +getRemoteMasterInfo <- function(master) {
    +  hostPort <- sub("spark://", "", master)
    +  host <- sub(":.*", "", hostPort)
    --- End diff --

    Oh, did you mean cluster mode?
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75444995

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,33 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +is_master_local <- function(master) {
    +  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
    +}
    +
    +getRemoteMasterInfo <- function(master) {
    +  hostPort <- sub("spark://", "", master)
    +  host <- sub(":.*", "", hostPort)
    --- End diff --

    Yes, SparkR supports yarn-cluster.
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75440848

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,33 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +is_master_local <- function(master) {
    +  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
    +}
    +
    +getRemoteMasterInfo <- function(master) {
    +  hostPort <- sub("spark://", "", master)
    +  host <- sub(":.*", "", hostPort)
    --- End diff --

    Just fixed the client mode (similar to local mode). Was YARN cluster supported originally?
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75289101

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,33 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +is_master_local <- function(master) {
    +  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
    +}
    +
    +getRemoteMasterInfo <- function(master) {
    +  hostPort <- sub("spark://", "", master)
    +  host <- sub(":.*", "", hostPort)
    --- End diff --

    Is this going to break YARN client and cluster mode then?
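
To make the YARN question concrete, here is a minimal base-R sketch of which master strings the quoted pattern treats as local. The function body is copied from the diff; the sample master strings are illustrative assumptions, not values taken from the PR.

```r
# Pattern copied from the diff under review: it matches "local",
# "local[N]" and "local[*]", and nothing else.
is_master_local <- function(master) {
  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
}

# Illustrative master strings (assumed for this sketch).
masters <- c("local", "local[4]", "local[*]",
             "yarn-client", "spark://example-host:7077")

# One logical per master string; names are kept so the output is readable.
is_local <- vapply(masters, is_master_local, logical(1))
print(is_local)
```

Any master string that is not local, including a YARN one, would fall through to the remote branch of the patch, which appears to be the concern raised in this comment.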
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75275020

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,33 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +is_master_local <- function(master) {
    +  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
    +}
    +
    +getRemoteMasterInfo <- function(master) {
    +  hostPort <- sub("spark://", "", master)
    +  host <- sub(":.*", "", hostPort)
    --- End diff --

    Yeah, that's correct. For others like YARN cluster, it could be a little more complicated.
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14666#discussion_r75239846 --- Diff: R/pkg/R/sparkR.R --- @@ -159,55 +159,81 @@ sparkR.sparkContext <- function( warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell", " please use the --packages commandline instead", sep = ",")) } +host <- "localhost" backendPort <- existingPort } else { -path <- tempfile(pattern = "backend_port") -submitOps <- getClientModeSparkSubmitOpts( + +if (!nzchar(master) || is_master_local(master)) { + path <- tempfile(pattern = "backend_port") + submitOps <- getClientModeSparkSubmitOpts( Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"), sparkEnvirMap) -launchBackend( + launchBackend( args = path, sparkHome = sparkHome, jars = jars, sparkSubmitOpts = submitOps, packages = packages) -# wait atmost 100 seconds for JVM to launch -wait <- 0.1 -for (i in 1:25) { - Sys.sleep(wait) - if (file.exists(path)) { -break + # wait atmost 100 seconds for JVM to launch + wait <- 0.1 + for (i in 1:25) { +Sys.sleep(wait) +if (file.exists(path)) { + break +} +wait <- wait * 1.25 } - wait <- wait * 1.25 -} -if (!file.exists(path)) { - stop("JVM is not ready after 10 seconds") -} -f <- file(path, open = "rb") -backendPort <- readInt(f) -monitorPort <- readInt(f) -rLibPath <- readString(f) -close(f) -file.remove(path) -if (length(backendPort) == 0 || backendPort == 0 || -length(monitorPort) == 0 || monitorPort == 0 || -length(rLibPath) != 1) { - stop("JVM failed to launch") + if (!file.exists(path)) { +stop("JVM is not ready after 10 seconds") + } + f <- file(path, open = "rb") + backendPort <- readInt(f) + monitorPort <- readInt(f) + rLibPath <- readString(f) + close(f) + file.remove(path) + if (length(backendPort) == 0 || backendPort == 0 || + length(monitorPort) == 0 || monitorPort == 0 || + length(rLibPath) != 1) { +stop("JVM failed to launch") + } + if (rLibPath != "") { +assign(".libPath", rLibPath, envir = .sparkREnv) 
+.libPaths(c(rLibPath, .libPaths())) + } + host <- "localhost" +} else { + backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) { +sparkEnvirMap[["backend.port"]]
--- End diff --

Never mind, I mixed it up with host.
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75239809

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,33 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +is_master_local <- function(master) {
    +  grepl("^local(\\[([0-9]+|\\*)\\])?$", master, perl = TRUE)
    +}
    +
    +getRemoteMasterInfo <- function(master) {
    +  hostPort <- sub("spark://", "", master)
    +  host <- sub(":.*", "", hostPort)
    --- End diff --

    Does that mean it only supports standalone mode?
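
The standalone-only concern can be illustrated with a small base-R sketch of the parsing shown in the diff. The helper name `parse_master`, the anchored `^spark://` pattern, and the port extraction are assumptions made for this sketch; the quoted diff only shows how the host is derived.

```r
# Hypothetical helper (name and port handling are assumptions for this sketch).
parse_master <- function(master) {
  host_port <- sub("^spark://", "", master)  # strip the scheme, as in the diff
  host <- sub(":.*", "", host_port)          # keep the text before the first colon
  # Only report a port when the remaining string actually contains a colon.
  port <- if (grepl(":", host_port, fixed = TRUE)) {
    sub("^[^:]*:", "", host_port)
  } else {
    NULL
  }
  list(host = host, port = port)
}

standalone <- parse_master("spark://example-host:8000")
yarn <- parse_master("yarn-client")
str(standalone)
str(yarn)
```

A YARN master string has no host to extract, so under this kind of parsing the string itself comes back as the "host". That is consistent with the reading that the approach targets standalone-style `spark://host:port` masters.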
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14666#discussion_r75234053 --- Diff: R/pkg/R/sparkR.R --- @@ -159,55 +159,81 @@ sparkR.sparkContext <- function( warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell", " please use the --packages commandline instead", sep = ",")) } +host <- "localhost" backendPort <- existingPort } else { -path <- tempfile(pattern = "backend_port") -submitOps <- getClientModeSparkSubmitOpts( + +if (!nzchar(master) || is_master_local(master)) { + path <- tempfile(pattern = "backend_port") + submitOps <- getClientModeSparkSubmitOpts( Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"), sparkEnvirMap) -launchBackend( + launchBackend( args = path, sparkHome = sparkHome, jars = jars, sparkSubmitOpts = submitOps, packages = packages) -# wait atmost 100 seconds for JVM to launch -wait <- 0.1 -for (i in 1:25) { - Sys.sleep(wait) - if (file.exists(path)) { -break + # wait atmost 100 seconds for JVM to launch + wait <- 0.1 + for (i in 1:25) { +Sys.sleep(wait) +if (file.exists(path)) { + break +} +wait <- wait * 1.25 } - wait <- wait * 1.25 -} -if (!file.exists(path)) { - stop("JVM is not ready after 10 seconds") -} -f <- file(path, open = "rb") -backendPort <- readInt(f) -monitorPort <- readInt(f) -rLibPath <- readString(f) -close(f) -file.remove(path) -if (length(backendPort) == 0 || backendPort == 0 || -length(monitorPort) == 0 || monitorPort == 0 || -length(rLibPath) != 1) { - stop("JVM failed to launch") + if (!file.exists(path)) { +stop("JVM is not ready after 10 seconds") + } + f <- file(path, open = "rb") + backendPort <- readInt(f) + monitorPort <- readInt(f) + rLibPath <- readString(f) + close(f) + file.remove(path) + if (length(backendPort) == 0 || backendPort == 0 || + length(monitorPort) == 0 || monitorPort == 0 || + length(rLibPath) != 1) { +stop("JVM failed to launch") + } + if (rLibPath != "") { +assign(".libPath", rLibPath, envir = .sparkREnv) 
+.libPaths(c(rLibPath, .libPaths())) + } + host <- "localhost" +} else { + backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) { +sparkEnvirMap[["backend.port"]]
--- End diff --

Not really. I just used the default one. Any concerns about that?
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14666#discussion_r75233185 --- Diff: R/pkg/R/sparkR.R --- @@ -159,55 +159,81 @@ sparkR.sparkContext <- function( warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell", " please use the --packages commandline instead", sep = ",")) } +host <- "localhost" backendPort <- existingPort } else { -path <- tempfile(pattern = "backend_port") -submitOps <- getClientModeSparkSubmitOpts( + +if (!nzchar(master) || is_master_local(master)) { + path <- tempfile(pattern = "backend_port") + submitOps <- getClientModeSparkSubmitOpts( Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"), sparkEnvirMap) -launchBackend( + launchBackend( args = path, sparkHome = sparkHome, jars = jars, sparkSubmitOpts = submitOps, packages = packages) -# wait atmost 100 seconds for JVM to launch -wait <- 0.1 -for (i in 1:25) { - Sys.sleep(wait) - if (file.exists(path)) { -break + # wait atmost 100 seconds for JVM to launch + wait <- 0.1 + for (i in 1:25) { +Sys.sleep(wait) +if (file.exists(path)) { + break +} +wait <- wait * 1.25 } - wait <- wait * 1.25 -} -if (!file.exists(path)) { - stop("JVM is not ready after 10 seconds") -} -f <- file(path, open = "rb") -backendPort <- readInt(f) -monitorPort <- readInt(f) -rLibPath <- readString(f) -close(f) -file.remove(path) -if (length(backendPort) == 0 || backendPort == 0 || -length(monitorPort) == 0 || monitorPort == 0 || -length(rLibPath) != 1) { - stop("JVM failed to launch") + if (!file.exists(path)) { +stop("JVM is not ready after 10 seconds") + } + f <- file(path, open = "rb") + backendPort <- readInt(f) + monitorPort <- readInt(f) + rLibPath <- readString(f) + close(f) + file.remove(path) + if (length(backendPort) == 0 || backendPort == 0 || + length(monitorPort) == 0 || monitorPort == 0 || + length(rLibPath) != 1) { +stop("JVM failed to launch") + } + if (rLibPath != "") { +assign(".libPath", rLibPath, envir = .sparkREnv) 
+.libPaths(c(rLibPath, .libPaths())) + } + host <- "localhost" +} else { + backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) { +sparkEnvirMap[["backend.port"]]
--- End diff --

So do you set that manually when you test?
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14666#discussion_r75232576 --- Diff: R/pkg/R/sparkR.R --- @@ -159,55 +159,81 @@ sparkR.sparkContext <- function( warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell", " please use the --packages commandline instead", sep = ",")) } +host <- "localhost" backendPort <- existingPort } else { -path <- tempfile(pattern = "backend_port") -submitOps <- getClientModeSparkSubmitOpts( + +if (!nzchar(master) || is_master_local(master)) { + path <- tempfile(pattern = "backend_port") + submitOps <- getClientModeSparkSubmitOpts( Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"), sparkEnvirMap) -launchBackend( + launchBackend( args = path, sparkHome = sparkHome, jars = jars, sparkSubmitOpts = submitOps, packages = packages) -# wait atmost 100 seconds for JVM to launch -wait <- 0.1 -for (i in 1:25) { - Sys.sleep(wait) - if (file.exists(path)) { -break + # wait atmost 100 seconds for JVM to launch + wait <- 0.1 + for (i in 1:25) { +Sys.sleep(wait) +if (file.exists(path)) { + break +} +wait <- wait * 1.25 } - wait <- wait * 1.25 -} -if (!file.exists(path)) { - stop("JVM is not ready after 10 seconds") -} -f <- file(path, open = "rb") -backendPort <- readInt(f) -monitorPort <- readInt(f) -rLibPath <- readString(f) -close(f) -file.remove(path) -if (length(backendPort) == 0 || backendPort == 0 || -length(monitorPort) == 0 || monitorPort == 0 || -length(rLibPath) != 1) { - stop("JVM failed to launch") + if (!file.exists(path)) { +stop("JVM is not ready after 10 seconds") + } + f <- file(path, open = "rb") + backendPort <- readInt(f) + monitorPort <- readInt(f) + rLibPath <- readString(f) + close(f) + file.remove(path) + if (length(backendPort) == 0 || backendPort == 0 || + length(monitorPort) == 0 || monitorPort == 0 || + length(rLibPath) != 1) { +stop("JVM failed to launch") + } + if (rLibPath != "") { +assign(".libPath", rLibPath, envir = .sparkREnv) 
+.libPaths(c(rLibPath, .libPaths())) + } + host <- "localhost" +} else { + backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) { +sparkEnvirMap[["backend.port"]]
--- End diff --

From `sparkConfig` in `sparkR.session`? Users can provide their own values. Otherwise, the default ones will be used.
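
The fallback described in this reply can be sketched in base R as follows. The helper `resolve_port` and the environment `conf` are hypothetical stand-ins for the lookup on `sparkEnvirMap`; the default values 8000 and 8001 are the ones stated in the PR description.

```r
# Hypothetical helper: return the configured value for `key`,
# or `default` when the key is not set.
resolve_port <- function(conf, key, default) {
  value <- conf[[key]]  # NULL when the key is absent from the environment
  if (is.null(value)) default else value
}

# An environment standing in for sparkEnvirMap (assumption for this sketch);
# only the backend port is supplied by the user here.
conf <- new.env()
assign("backend.port", "9000", envir = conf)

backend_port <- resolve_port(conf, "backend.port", "8000")  # user-supplied value
monitor_port <- resolve_port(conf, "monitor.port", "8001")  # falls back to default
cat("backend:", backend_port, "monitor:", monitor_port, "\n")
```

Keeping the defaults in one helper would avoid repeating the `if (!is.null(...))` pattern for each port, though the patch itself spells the two lookups out inline.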
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75232452

    --- Diff: R/pkg/R/sparkR.R ---
    @@ -369,7 +395,7 @@ sparkR.session <- function(
       if (!exists(".sparkRjsc", envir = .sparkREnv)) {
         sparkExecutorEnvMap <- new.env()
         sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, sparkExecutorEnvMap,
    -       sparkJars, sparkPackages)
    +                        sparkJars, sparkPackages)
    --- End diff --

    I thought the arguments should be aligned?
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user junyangq commented on a diff in the pull request: https://github.com/apache/spark/pull/14666#discussion_r75232295 --- Diff: R/pkg/R/sparkR.R --- @@ -159,55 +159,81 @@ sparkR.sparkContext <- function( warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell", " please use the --packages commandline instead", sep = ",")) } +host <- "localhost" backendPort <- existingPort } else { -path <- tempfile(pattern = "backend_port") -submitOps <- getClientModeSparkSubmitOpts( + +if (!nzchar(master) || is_master_local(master)) { + path <- tempfile(pattern = "backend_port") + submitOps <- getClientModeSparkSubmitOpts( Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"), sparkEnvirMap) -launchBackend( + launchBackend( args = path, sparkHome = sparkHome, jars = jars, sparkSubmitOpts = submitOps, packages = packages) -# wait atmost 100 seconds for JVM to launch -wait <- 0.1 -for (i in 1:25) { - Sys.sleep(wait) - if (file.exists(path)) { -break + # wait atmost 100 seconds for JVM to launch + wait <- 0.1 + for (i in 1:25) { +Sys.sleep(wait) +if (file.exists(path)) { + break +} +wait <- wait * 1.25 } - wait <- wait * 1.25 -} -if (!file.exists(path)) { - stop("JVM is not ready after 10 seconds") -} -f <- file(path, open = "rb") -backendPort <- readInt(f) -monitorPort <- readInt(f) -rLibPath <- readString(f) -close(f) -file.remove(path) -if (length(backendPort) == 0 || backendPort == 0 || -length(monitorPort) == 0 || monitorPort == 0 || -length(rLibPath) != 1) { - stop("JVM failed to launch") + if (!file.exists(path)) { +stop("JVM is not ready after 10 seconds") + } + f <- file(path, open = "rb") + backendPort <- readInt(f) + monitorPort <- readInt(f) + rLibPath <- readString(f) + close(f) + file.remove(path) + if (length(backendPort) == 0 || backendPort == 0 || + length(monitorPort) == 0 || monitorPort == 0 || + length(rLibPath) != 1) { +stop("JVM failed to launch") + } + if (rLibPath != "") { +assign(".libPath", rLibPath, envir = .sparkREnv) 
+.libPaths(c(rLibPath, .libPaths())) + } + host <- "localhost" +} else { + backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) { +sparkEnvirMap[["backend.port"]] + } else { +"8000" + } + monitorPort <- if (!is.null(sparkEnvirMap[["monitor.port"]])) { +sparkEnvirMap[["monitor.port"]] + } else { +"8001" + } + host <- getRemoteMasterInfo(master)$host + port <- getRemoteMasterInfo(master)$port + if (is.null(port)) { +message(sprintf("Use backend port %s.", backendPort)) + } else { +message(sprintf("Use backedn port %s parsed from master.", port)) +backendPort <- port + } + master <- "local" # have connected to RBackend, use local mode } -assign(".monitorConn", socketConnection(port = monitorPort), envir = .sparkREnv) +assign(".monitorConn", socketConnection(host = host, port = monitorPort), + envir = .sparkREnv) assign(".backendLaunched", 1, envir = .sparkREnv) -if (rLibPath != "") { - assign(".libPath", rLibPath, envir = .sparkREnv) - .libPaths(c(rLibPath, .libPaths())) -} } .sparkREnv$backendPort <- backendPort tryCatch({ -connectBackend("localhost", backendPort) +connectBackend(host, backendPort) }, error = function(err) { -stop("Failed to connect JVM\n") +stop(paste0("Failed to connect JVM\n", existingPort))
--- End diff --

Oh this shouldn't have been changed in this PR...
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14666#discussion_r75050631 --- Diff: R/pkg/R/sparkR.R --- @@ -159,55 +159,81 @@ sparkR.sparkContext <- function( warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell", " please use the --packages commandline instead", sep = ",")) } +host <- "localhost" backendPort <- existingPort } else { -path <- tempfile(pattern = "backend_port") -submitOps <- getClientModeSparkSubmitOpts( + +if (!nzchar(master) || is_master_local(master)) { + path <- tempfile(pattern = "backend_port") + submitOps <- getClientModeSparkSubmitOpts( Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"), sparkEnvirMap) -launchBackend( + launchBackend( args = path, sparkHome = sparkHome, jars = jars, sparkSubmitOpts = submitOps, packages = packages) -# wait atmost 100 seconds for JVM to launch -wait <- 0.1 -for (i in 1:25) { - Sys.sleep(wait) - if (file.exists(path)) { -break + # wait atmost 100 seconds for JVM to launch + wait <- 0.1 + for (i in 1:25) { +Sys.sleep(wait) +if (file.exists(path)) { + break +} +wait <- wait * 1.25 } - wait <- wait * 1.25 -} -if (!file.exists(path)) { - stop("JVM is not ready after 10 seconds") -} -f <- file(path, open = "rb") -backendPort <- readInt(f) -monitorPort <- readInt(f) -rLibPath <- readString(f) -close(f) -file.remove(path) -if (length(backendPort) == 0 || backendPort == 0 || -length(monitorPort) == 0 || monitorPort == 0 || -length(rLibPath) != 1) { - stop("JVM failed to launch") + if (!file.exists(path)) { +stop("JVM is not ready after 10 seconds") + } + f <- file(path, open = "rb") + backendPort <- readInt(f) + monitorPort <- readInt(f) + rLibPath <- readString(f) + close(f) + file.remove(path) + if (length(backendPort) == 0 || backendPort == 0 || + length(monitorPort) == 0 || monitorPort == 0 || + length(rLibPath) != 1) { +stop("JVM failed to launch") + } + if (rLibPath != "") { +assign(".libPath", rLibPath, envir = .sparkREnv) 
+.libPaths(c(rLibPath, .libPaths())) + } + host <- "localhost" +} else { + backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) { +sparkEnvirMap[["backend.port"]]
--- End diff --

How is `backend.port` passed to the R process? I don't see how this environment variable is set.
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75045315

    --- Diff: R/pkg/R/sparkR.R ---
    @@ -159,55 +159,81 @@ sparkR.sparkContext <- function(
           warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell",
                         " please use the --packages commandline instead", sep = ","))
         }
    +    host <- "localhost"
         backendPort <- existingPort
       } else {
    -    path <- tempfile(pattern = "backend_port")
    -    submitOps <- getClientModeSparkSubmitOpts(
    +
    +    if (!nzchar(master) || is_master_local(master)) {
    +      path <- tempfile(pattern = "backend_port")
    +      submitOps <- getClientModeSparkSubmitOpts(
             Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"),
             sparkEnvirMap)
    -    launchBackend(
    +      launchBackend(
    --- End diff --

    Indentation issue.
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14666#discussion_r75045227

    --- Diff: R/pkg/R/sparkR.R ---
    @@ -369,7 +395,7 @@ sparkR.session <- function(
       if (!exists(".sparkRjsc", envir = .sparkREnv)) {
         sparkExecutorEnvMap <- new.env()
         sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, sparkExecutorEnvMap,
    -       sparkJars, sparkPackages)
    +                        sparkJars, sparkPackages)
    --- End diff --

    Indentation.
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14666#discussion_r75045186 --- Diff: R/pkg/R/sparkR.R --- @@ -159,55 +159,81 @@ sparkR.sparkContext <- function( warning(paste("sparkPackages has no effect when using spark-submit or sparkR shell", " please use the --packages commandline instead", sep = ",")) } +host <- "localhost" backendPort <- existingPort } else { -path <- tempfile(pattern = "backend_port") -submitOps <- getClientModeSparkSubmitOpts( + +if (!nzchar(master) || is_master_local(master)) { + path <- tempfile(pattern = "backend_port") + submitOps <- getClientModeSparkSubmitOpts( Sys.getenv("SPARKR_SUBMIT_ARGS", "sparkr-shell"), sparkEnvirMap) -launchBackend( + launchBackend( args = path, sparkHome = sparkHome, jars = jars, sparkSubmitOpts = submitOps, packages = packages) -# wait atmost 100 seconds for JVM to launch -wait <- 0.1 -for (i in 1:25) { - Sys.sleep(wait) - if (file.exists(path)) { -break + # wait atmost 100 seconds for JVM to launch + wait <- 0.1 + for (i in 1:25) { +Sys.sleep(wait) +if (file.exists(path)) { + break +} +wait <- wait * 1.25 } - wait <- wait * 1.25 -} -if (!file.exists(path)) { - stop("JVM is not ready after 10 seconds") -} -f <- file(path, open = "rb") -backendPort <- readInt(f) -monitorPort <- readInt(f) -rLibPath <- readString(f) -close(f) -file.remove(path) -if (length(backendPort) == 0 || backendPort == 0 || -length(monitorPort) == 0 || monitorPort == 0 || -length(rLibPath) != 1) { - stop("JVM failed to launch") + if (!file.exists(path)) { +stop("JVM is not ready after 10 seconds") + } + f <- file(path, open = "rb") + backendPort <- readInt(f) + monitorPort <- readInt(f) + rLibPath <- readString(f) + close(f) + file.remove(path) + if (length(backendPort) == 0 || backendPort == 0 || + length(monitorPort) == 0 || monitorPort == 0 || + length(rLibPath) != 1) { +stop("JVM failed to launch") + } + if (rLibPath != "") { +assign(".libPath", rLibPath, envir = .sparkREnv) 
+.libPaths(c(rLibPath, .libPaths())) + } + host <- "localhost" +} else { + backendPort <- if (!is.null(sparkEnvirMap[["backend.port"]])) { +sparkEnvirMap[["backend.port"]] + } else { +"8000" + } + monitorPort <- if (!is.null(sparkEnvirMap[["monitor.port"]])) { +sparkEnvirMap[["monitor.port"]] + } else { +"8001" + } + host <- getRemoteMasterInfo(master)$host + port <- getRemoteMasterInfo(master)$port + if (is.null(port)) { +message(sprintf("Use backend port %s.", backendPort)) + } else { +message(sprintf("Use backedn port %s parsed from master.", port)) +backendPort <- port + } + master <- "local" # have connected to RBackend, use local mode } -assign(".monitorConn", socketConnection(port = monitorPort), envir = .sparkREnv) +assign(".monitorConn", socketConnection(host = host, port = monitorPort), + envir = .sparkREnv) assign(".backendLaunched", 1, envir = .sparkREnv) -if (rLibPath != "") { - assign(".libPath", rLibPath, envir = .sparkREnv) - .libPaths(c(rLibPath, .libPaths())) -} } .sparkREnv$backendPort <- backendPort tryCatch({ -connectBackend("localhost", backendPort) +connectBackend(host, backendPort) }, error = function(err) { -stop("Failed to connect JVM\n") +stop(paste0("Failed to connect JVM\n", existingPort))
--- End diff --

Add the host here?
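
One way to read this suggestion is that the error raised on a failed connection should name the endpoint. The base-R sketch below shows that idea under stated assumptions: `connect_stub` is a hypothetical stand-in for `connectBackend` that always fails, and the message wording is illustrative rather than what the PR used.

```r
# Hypothetical stand-in for connectBackend(); it always fails so that
# the error handler below is exercised.
connect_stub <- function(host, port) {
  stop("connection refused")
}

connect_or_explain <- function(host, port) {
  tryCatch(
    connect_stub(host, port),
    error = function(err) {
      # Include both host and port so the failing endpoint is visible.
      stop(sprintf("Failed to connect JVM at %s:%s (%s)",
                   host, port, conditionMessage(err)),
           call. = FALSE)
    }
  )
}

# Capture the resulting message instead of aborting the script.
msg <- tryCatch(connect_or_explain("example-host", "8000"),
                error = function(e) conditionMessage(e))
print(msg)
```

Reporting the host alongside the port matters more once the backend can be remote, since "localhost" is no longer implied.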
[GitHub] spark pull request #14666: [SPARK-16578][SparkR] Enable SparkR to connect to...
GitHub user junyangq opened a pull request:

    https://github.com/apache/spark/pull/14666

    [SPARK-16578][SparkR] Enable SparkR to connect to a remote machine running RBackend

    ## What changes were proposed in this pull request?

    This PR tries to enable SparkR to connect to a remote machine that runs the RBackend.

    * Default port numbers are used if not set otherwise: 8000 for the backend and 8001 for the monitor. They can be set by `spark.r.backendPort` and `spark.r.monitorPort` on the RBackend side and by `backend.port` and `monitor.port` in `sparkConfig` on the client side.

    ## How was this patch tested?

    R unit test.
    Manual test: connect to local standalone cluster and to AWS EC2 instance.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/junyangq/spark SPARK-16578-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14666

commit b221b5fb4043dc891c74dee513dcacffd6951b1a
Author: Junyang Qian
Date:   2016-08-04T07:05:27Z

    Enable SparkR to talk to a remote cluster.