[GitHub] spark pull request #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-...

2016-09-19 Thread zjffdu
Github user zjffdu closed the pull request at:

https://github.com/apache/spark/pull/14639





[GitHub] spark pull request #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-...

2016-08-18 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14639#discussion_r75347400
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -344,6 +344,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @note sparkR.session since 2.0.0
 sparkR.session <- function(
   master = "",
+  deployMode = "",
--- End diff --

Hmm, I think the standard way to do that in Spark is with the deploy mode config:

YARN cluster:
master=yarn
deploy-mode=cluster

YARN client:
master=yarn
deploy-mode=client

deploy-mode defaults to client.
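
For instance, from R (a sketch; I'm assuming the standard spark.submit.deployMode property and that sparkConfig passes it straight through as a Spark conf entry):

  library(SparkR)
  # YARN client mode; deploy-mode defaults to client anyway
  sparkR.session(master = "yarn",
                 sparkConfig = list(spark.submit.deployMode = "client"))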

That said, does it make sense for the SparkSession API to support deploy-mode 
cluster at all? In YARN cluster mode the driver JVM starts first and then 
launches its companion R process, so by the time R code is running the R 
SparkSession API would not be able to change the master or the deploy-mode. 
(Unless we want to support a remote R process connecting to a Spark driver 
running in the YARN cluster - but that's another JIRA.)

In every other case (standalone, Mesos), only master is used.

I think one could still set spark.deployMode in the named list today (though, 
again, it won't take effect).

Perhaps we should just document in the programming guide how to use the 
SparkSession R API with YARN?







[GitHub] spark pull request #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-...

2016-08-18 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14639#discussion_r75344423
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -344,6 +344,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @note sparkR.session since 2.0.0
 sparkR.session <- function(
   master = "",
+  deployMode = "",
--- End diff --

We should move this to the end of the param list as @felixcheung says. On a 
different note, is there a reason this should be a parameter to 
`sparkR.session`? It would be better if users didn't have to think about the 
deploy mode at all and we inferred it from the spark-submit options.
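
Something along these lines, say (only a sketch - SPARKR_SUBMIT_ARGS is the 
environment variable SparkR's launcher sets, but the parsing here is 
illustrative):

  # infer the deploy mode from the spark-submit arguments instead
  inferDeployMode <- function() {
    submitArgs <- Sys.getenv("SPARKR_SUBMIT_ARGS", "")
    if (grepl("--deploy-mode +cluster", submitArgs)) {
      "cluster"
    } else {
      "client"  # spark-submit's default when --deploy-mode is absent
    }
  }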





[GitHub] spark pull request #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-...

2016-08-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14639#discussion_r75117822
  
--- Diff: R/pkg/R/sparkR.R ---
@@ -344,6 +344,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @note sparkR.session since 2.0.0
 sparkR.session <- function(
   master = "",
+  deployMode = "",
--- End diff --

I don't think we should add the parameter in the middle; that would break 
users who pass parameters by position.

On the other hand, do we need this as a formal parameter at all? Most cluster 
setups I know of already have the deploy mode in the Spark conf - and to 
@shivaram's point, why isn't SPARK_HOME set? If it is, we wouldn't be 
downloading anyway, right?
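
i.e. something like (a sketch; downloadSpark() here is a hypothetical stand-in 
for whatever the download fallback ends up being):

  # only fall back to downloading Spark when SPARK_HOME is not set
  sparkHome <- Sys.getenv("SPARK_HOME", "")
  if (!nzchar(sparkHome)) {
    sparkHome <- downloadSpark()  # hypothetical helper, not a real SparkR API
  }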


