[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR

2015-06-16 Thread Rick Moritz (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588271#comment-14588271 ]

Rick Moritz commented on SPARK-6816:


Apparently this work-around is no longer needed for spark-1.4.0, which invokes 
a shell script instead of going directly to java as sparkR-pkg did, and fetches 
the required environment parameters.
With spark-defaults being respected and SPARK_MEM available for memory 
options, there probably isn't a whole lot that needs to be passed via -D to 
the shell script.
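
For illustration, a minimal sketch of what this can look like on 1.4.0, assuming the HDP options live in conf/spark-defaults.conf. The property names below are standard Spark settings rather than something stated in this comment, and NNN stands for the cluster's HDP version:

{code}
# Assumed contents of conf/spark-defaults.conf (illustrative only; NNN is the
# cluster's HDP version placeholder):
#   spark.driver.extraJavaOptions    -Dhdp.version=NNN
#   spark.executor.extraJavaOptions  -Dhdp.version=NNN
library(SparkR)
# With the defaults file respected, no -D flags need to be passed by hand:
sc <- sparkR.init(master = "yarn-client", appName = "SparkR")
{code}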

 Add SparkConf API to configure SparkR
 -

 Key: SPARK-6816
 URL: https://issues.apache.org/jira/browse/SPARK-6816
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

 Right now the only way to configure SparkR is to pass in arguments to 
 sparkR.init. The goal is to add an API similar to SparkConf on Scala/Python 
 to make configuration easier



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR

2015-06-02 Thread Rick Moritz (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568807#comment-14568807 ]

Rick Moritz commented on SPARK-6816:


[~shivaram], I am integrating SparkR into an RStudio server (I believe this is 
a rather common use case), so using bin/SparkR won't work in this 
case, as far as I can tell. Thanks for the suggestion nonetheless.




[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR

2015-06-01 Thread Rick Moritz (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567187#comment-14567187 ]

Rick Moritz commented on SPARK-6816:


One current drawback of SparkR's configuration options is the inability to set 
driver VM options. These are crucial when attempting to run SparkR on a 
Hortonworks HDP cluster, as both the driver and the application master need to be 
aware of the hdp.version variable in order to resolve the classpath.

While it is possible to pass this variable to the executors, there is no way to 
pass it to the driver, except via the following exploit/work-around:

The SPARK_MEM variable can be abused to pass the required parameters to the 
driver's VM by using string concatenation. Setting the variable to, e.g., 
"512m -Dhdp.version=NNN" appends the -D option to the -X option which is 
currently read from this environment variable. Adding a secondary environment 
variable that gets parsed for JVM options would be far more obvious and 
less hacky, as would adding a separate environment list for the driver, extending 
what's currently available for executors.
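
For clarity, a hedged sketch of how the described work-around might be applied from an R session (e.g. under RStudio Server). NNN is the placeholder used above; the sparkEnvir argument and the spark.executor.extraJavaOptions key are taken from the SparkR and Spark documentation, not from this comment:

{code}
library(SparkR)
# Work-around sketch: smuggle the -D flag into the driver JVM via SPARK_MEM,
# set before sparkR.init launches the backend. "NNN" is a placeholder.
Sys.setenv(SPARK_MEM = "512m -Dhdp.version=NNN")
# Executors can receive the same flag through a Spark property:
sc <- sparkR.init(master = "yarn-client", appName = "SparkR-HDP",
                  sparkEnvir = list(spark.executor.extraJavaOptions = "-Dhdp.version=NNN"))
{code}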

I'm adding this as a comment to this issue, since I believe it is sufficiently 
closely related not to warrant a separate issue.




[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR

2015-06-01 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568150#comment-14568150 ]

Shivaram Venkataraman commented on SPARK-6816:
--

[~RPCMoritz] Have you tried launching SparkR from the new scripts in 
`bin/sparkR` (or `bin/spark-submit` if you want to run a script)? In these 
cases you should be able to pass any spark-submit options, like driver-memory or 
driver-java-options, to the scripts. However, I have to say that this is not a 
direct fix for this issue, but more of a work-around.




[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR

2015-04-09 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488237#comment-14488237 ]

Shivaram Venkataraman commented on SPARK-6816:
--

Comments from the SparkR JIRA:

Shivaram Venkataraman added a comment - 14/Feb/15 10:32 AM
I looked at this recently and I think the existing arguments to `sparkR.init` 
pretty much cover all the options that are exposed in SparkConf.
We could split things out of the function arguments into a separate SparkConf 
object (something like PySpark's 
https://github.com/apache/spark/blob/master/python/pyspark/conf.py), but the 
setter methods don't translate very well to the style we use in SparkR. For 
example, it would be something like `setAppName(setMaster(conf, "local"), 
"SparkR")` instead of `conf.setMaster().setAppName()`.
The other thing brought up by this JIRA is that we should parse arguments 
passed to spark-submit or set in spark-defaults.conf. I think this should 
happen automatically with SPARKR-178.
Sun Rui, Zongheng Yang: any thoughts on this?
  
Zongheng Yang added a comment - 15/Feb/15 12:07 PM
I'm +1 on not using the builder pattern in R. What about using a named list or 
an environment to simulate a SparkConf? For example, users can write something 
like:
{code}
> conf <- list(spark.master = "local[2]", spark.executor.memory = "12g")
> conf
$spark.master
[1] "local[2]"

$spark.executor.memory
[1] "12g"
{code}
and pass the named list to `sparkR.init()`.
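
For comparison, the existing API can already take something close to this shape: a hedged sketch, assuming sparkR.init's sparkEnvir argument accepts a named list of Spark properties (as in the SparkR examples of the time):

{code}
# Sketch only: reuse the proposed named list as sparkR.init's sparkEnvir argument.
conf <- list(spark.master = "local[2]", spark.executor.memory = "12g")
sc <- sparkR.init(master = conf$spark.master,
                  appName = "SparkR",
                  sparkEnvir = conf)
{code}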
  
Shivaram Venkataraman added a comment - 15/Feb/15 5:50 PM
I think the named list might be okay (one thing is that we will have nested 
named lists for things like executorEnv). However, I am not sure if named lists 
are better than just passing named arguments to `sparkR.init`. I guess the 
better way to ask my question is: what functionality do we want to provide to 
the users?
Right now users can pretty much set anything they want in the SparkConf using 
sparkR.init.
One piece of functionality that is missing is printing the conf and inspecting what 
config variables are set. We could, say, add a getConf(sc) which returns a named 
list to provide this feature.
Is there any other functionality we need?
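
To make the idea concrete, a purely hypothetical sketch of such a getConf; the sparkConf attribute and the stand-in context object are inventions for illustration, and nothing like this exists in SparkR yet:

{code}
# Hypothetical sketch: if sparkR.init kept the effective settings in a named
# list attached to the context object, getConf(sc) could simply return it.
getConf <- function(sc) {
  attr(sc, "sparkConf")    # assumed attribute, not part of SparkR today
}

# Stand-in object in place of a real SparkContext, for illustration only:
sc <- structure(list(), sparkConf = list(spark.master = "local[2]",
                                         spark.executor.memory = "12g"))
str(getConf(sc))
{code}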
  
Zongheng Yang added a comment - 21/Feb/15 3:22 PM
IMO using a named list provides more flexibility: it's ordinary data that users 
can operate on and transform. Using only parameter passing in the constructor locks 
users into operating on code instead of data. It'd also be easier to just return 
the saved named list if we're going to implement getConf()?
Some relevant discussions: https://aphyr.com/posts/321-builders-vs-option-maps
  
Shivaram Venkataraman added a comment - 22/Feb/15 4:33 PM
Hmm okay - named lists are not quite the same as option maps though. To move 
forward, it'll be good to see how the new API functions we want on the R side 
should look.
Let's keep this discussion open, but I'm going to change the priority / 
description (we are already able to read in spark-defaults.conf now that 
SPARKR-178 has been merged).
