[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR
[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588271#comment-14588271 ]

Rick Moritz commented on SPARK-6816:

Apparently this work-around is no longer needed for spark-1.4.0, which invokes a shell script instead of going directly to java as sparkR-pkg did, and fetches the required environment parameters. With spark-defaults being respected, and SPARK_MEM available for memory options, there probably isn't a whole lot that needs to be passed by -D to the shell script.

Add SparkConf API to configure SparkR
-------------------------------------
Key: SPARK-6816
URL: https://issues.apache.org/jira/browse/SPARK-6816
Project: Spark
Issue Type: New Feature
Components: SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

Right now the only way to configure SparkR is to pass in arguments to sparkR.init. The goal is to add an API similar to SparkConf on Scala/Python to make configuration easier.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR
[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568807#comment-14568807 ]

Rick Moritz commented on SPARK-6816:

[~shivaram], I am integrating SparkR into an RStudio server (I would believe this to be a rather common use case), so using bin/sparkR won't work in this case, as far as I can tell. Thanks for the suggestion nonetheless.
[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR
[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567187#comment-14567187 ]

Rick Moritz commented on SPARK-6816:

One current drawback with SparkR's configuration options is the inability to set driver VM options. These are crucial when attempting to run SparkR on a Hortonworks HDP, as both the driver and the application master need to be aware of the hdp.version variable in order to resolve the classpath. While it is possible to pass this variable to the executors, there's no way to pass this option to the driver, excepting the following exploit/work-around: the SPARK_MEM variable can be abused to pass the required parameters to the driver's VM by using string concatenation. Setting the variable to (e.g.) {{512m -Dhdp.version=NNN}} appends the -D option to the -X option which is currently read from this environment variable.

A far more obvious and less hacky approach would be to add a secondary variable to the System.env which gets parsed for JVM options, or to add a separate environment list for the driver, extending what's currently available for executors.

I'm adding this as a comment to this issue, since I believe it is sufficiently closely related not to warrant a separate issue.
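A sketch of the SPARK_MEM concatenation work-around described in the comment above. The hdp.version value is a placeholder (the original comment elides it as NNN); substitute your cluster's actual HDP version.

```shell
# Abuse SPARK_MEM to smuggle a -D option into the driver JVM flags.
# "2.2.0.0-0000" is a placeholder HDP version, not a real one.
export SPARK_MEM="512m -Dhdp.version=2.2.0.0-0000"

# The launcher interpolates SPARK_MEM into the driver's -Xmx flag roughly
# as "java -Xmx$SPARK_MEM ...", so the concatenated value expands to both
# a memory setting and the system property:
echo "-Xmx$SPARK_MEM"
# -Xmx512m -Dhdp.version=2.2.0.0-0000
```

This relies on the launcher not quoting or validating SPARK_MEM, which is exactly why the comment calls it an exploit rather than a supported mechanism.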
[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR
[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568150#comment-14568150 ]

Shivaram Venkataraman commented on SPARK-6816:

[~RPCMoritz] Have you tried launching SparkR from the new scripts in `bin/sparkR` (or `bin/spark-submit` if you want to run a script)? In these cases you should be able to pass any spark-submit options like driver-memory or driver-java-options to the scripts. However, I have to say that this is not a direct fix for this issue, but more of a work-around.
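A sketch of the suggested invocation, passing the driver options through the launch scripts instead of the SPARK_MEM hack. The hdp.version value is again a placeholder; this is a command-line fragment, not a runnable standalone script.

```shell
# Interactive SparkR shell with driver memory and driver JVM options set
# via spark-submit flags (placeholder hdp.version):
./bin/sparkR --driver-memory 2g \
  --driver-java-options "-Dhdp.version=2.2.0.0-0000"

# Or, for a standalone R script, via spark-submit directly:
./bin/spark-submit \
  --driver-java-options "-Dhdp.version=2.2.0.0-0000" \
  my_script.R
```

As the comment notes, this only works when Spark's own scripts launch the JVM, which is why it does not help the RStudio-server use case mentioned earlier in the thread.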
[jira] [Commented] (SPARK-6816) Add SparkConf API to configure SparkR
[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488237#comment-14488237 ]

Shivaram Venkataraman commented on SPARK-6816:

Comments from the SparkR JIRA:

Shivaram Venkataraman added a comment - 14/Feb/15 10:32 AM

I looked at this recently and I think the existing arguments to `sparkR.init` pretty much cover all the options that are exposed in SparkConf. We could split things out of the function arguments into a separate SparkConf object (something like PySpark's https://github.com/apache/spark/blob/master/python/pyspark/conf.py), but the setter methods don't translate very well to the style we use in SparkR. For example, it would be something like `setAppName(setMaster(conf, "local"), "SparkR")` instead of `conf.setMaster().setAppName()`.

The other thing brought up by this JIRA is that we should parse arguments passed to spark-submit or set in spark-defaults.conf. I think this should automatically happen with SPARKR-178. Sun Rui, Zongheng Yang: any thoughts on this?

Zongheng Yang (concretevitamin) added a comment - 15/Feb/15 12:07 PM

I'm +1 on not using the builder pattern in R. What about using a named list or an environment to simulate a SparkConf? For example, users can write something like:

{code}
conf <- list(spark.master = "local[2]", spark.executor.memory = "12g")
conf
$spark.master
[1] "local[2]"

$spark.executor.memory
[1] "12g"
{code}

and pass the named list to `sparkR.init()`.

Shivaram Venkataraman (shivaram) added a comment - 15/Feb/15 5:50 PM

I think the named list might be okay (one thing is that we will have nested named lists for things like executorEnv). However, I am not sure if named lists are better than just passing named arguments to `sparkR.init`.
I guess the better way to ask my question is: what functionality do we want to provide to the users? Right now users can pretty much set anything they want in the SparkConf using sparkR.init. One piece of functionality that is missing is printing the conf and, say, inspecting what config variables are set. We could add a getConf(sc) which returns a named list to provide this feature. Is there any other functionality we need?

Zongheng Yang (concretevitamin) added a comment - 21/Feb/15 3:22 PM

IMO using a named list provides more flexibility: it's ordinary data that users can operate on and transform. Using only parameter-passing in the constructor locks users into operating on code instead of data. It'd also be easier to just return the saved named list if we're going to implement getConf(). Some relevant discussion: https://aphyr.com/posts/321-builders-vs-option-maps

Shivaram Venkataraman (shivaram) added a comment - 22/Feb/15 4:33 PM

Hmm, okay - named lists are not quite the same as option maps though. To move forward, it'll be good to see what the new API functions we want on the R side should look like. Let's keep this discussion open, but I'm going to change the priority/description (we are already able to read in spark-defaults.conf now that SPARKR-178 has been merged).
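A hedged sketch of how the named-list conf and the proposed getConf() accessor discussed above might look in R. Everything here is hypothetical: neither a named-list argument to `sparkR.init` nor `getConf` existed in SparkR at the time of this discussion, and the `sc$conf` field is an assumption made purely for illustration.

{code}
# Hypothetical sketch only: a plain named list standing in for SparkConf,
# including the nested list for executorEnv mentioned above.
conf <- list(spark.master = "local[2]",
             spark.executor.memory = "12g",
             spark.executorEnv = list(LD_LIBRARY_PATH = "/opt/native"))

# A getConf(sc) as proposed could simply return the saved named list
# (assuming sc keeps the list it was initialized with):
getConf <- function(sc) sc$conf

# Because the conf is ordinary data, users can inspect and transform it
# with base R, which is the flexibility argument made in the thread:
conf[["spark.master"]]                              # "local[2]"
modifyList(conf, list(spark.master = "local[4]"))   # returns an updated copy
{code}

This illustrates the option-map side of the trade-off: inspection and transformation come for free, at the cost of no compile-time checking of key names.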