My workflow was to install RStudio on a cluster launched using the Spark EC2 scripts. However, I did a bunch of tweaking after that (like copying the Spark installation over, etc.). When I get some time I'll try to write the steps down in the JIRA.
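In rough outline, the post-launch tweaks look something like the sketch below. This is only a sketch with assumed paths (/root/spark for the spark-ec2 installation, /home/rstudio for the RStudio user's home), not the exact steps from the demo; run it as root in an R session on the master node.

    # Copy the Spark installation that spark-ec2 placed under /root into the
    # rstudio user's home so RStudio Server can reach it (paths are assumptions).
    file.copy("/root/spark", "/home/rstudio", recursive = TRUE)

    # The copied tree is still owned by root; hand it over to the rstudio user
    # so the folder permissions work from inside RStudio.
    system("chown -R rstudio:rstudio /home/rstudio/spark")

Because the copy is recursive, the cluster config that spark-ec2 wrote into /root/spark/conf comes along with it.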
Thanks
Shivaram

On Fri, Jun 26, 2015 at 10:21 AM, <m...@redoakstrategic.com> wrote:

> So you created an EC2 instance with RStudio installed first, then
> installed Spark under that same username? That makes sense; I just want
> to verify your workflow.
>
> Thank you again for your willingness to help!
>
> On Fri, Jun 26, 2015 at 10:13 AM -0700, "Shivaram Venkataraman"
> <shiva...@eecs.berkeley.edu> wrote:
>
>> I was using RStudio on the master node of the same cluster in the demo.
>> However, I had installed Spark under the user `rstudio` (i.e.
>> /home/rstudio), and that makes the permissions work correctly. You will
>> need to copy the config files from /root/spark/conf after installing
>> Spark, though, and it might need some more manual tweaks.
>>
>> Thanks
>> Shivaram
>>
>> On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson
>> <m...@redoakstrategic.com> wrote:
>>
>>> Thanks!
>>>
>>> In your demo video, were you using RStudio to hit a separate EC2 Spark
>>> cluster? I noticed from your browser that you were using EC2 at the
>>> time, so I was just curious. That appears to be one of the possible
>>> workarounds: fire up a separate EC2 instance with RStudio Server that
>>> initializes the Spark context against a separate Spark cluster.
>>>
>>> On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman
>>> <shiva...@eecs.berkeley.edu> wrote:
>>>
>>> We don't have a documented way to use RStudio on EC2 right now. We have
>>> a ticket open at https://issues.apache.org/jira/browse/SPARK-8596 to
>>> discuss workarounds and potential solutions for this.
>>>
>>> Thanks
>>> Shivaram
>>>
>>> On Fri, Jun 26, 2015 at 6:27 AM, RedOakMark <m...@redoakstrategic.com>
>>> wrote:
>>>
>>>> Good morning,
>>>>
>>>> I am having a bit of trouble finalizing the installation and usage of
>>>> the newest Spark version, 1.4.0, deployed to an Amazon EC2 instance
>>>> with RStudio running on top of it.
>>>>
>>>> Using these instructions
>>>> (http://spark.apache.org/docs/latest/ec2-scripts.html) we can fire up
>>>> an EC2 instance, which we have been successful doing; the cluster
>>>> launches from the command line without an issue. Then I installed
>>>> RStudio Server on the same EC2 instance (the master) and successfully
>>>> logged into it through the web browser, using the test/test user.
>>>>
>>>> This is where I get stuck: within RStudio, when I try to find the
>>>> folder where SparkR is installed so that I can load the SparkR library
>>>> and initialize a SparkContext, I get permission errors on the folders,
>>>> or the library cannot be found because I cannot locate the folder it
>>>> sits in.
>>>>
>>>> Has anyone successfully launched and used SparkR 1.4.0 in this way,
>>>> with RStudio Server running on top of the master instance? Are we on
>>>> the right track, or should we manually launch a cluster and attempt to
>>>> connect to it from another instance running R?
>>>>
>>>> Thank you in advance!
>>>>
>>>> Mark
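For anyone hitting the permission errors described above: once Spark lives under the rstudio user's home directory, loading SparkR 1.4.0 from RStudio Server looks roughly like the sketch below. The SPARK_HOME path and the master URL are assumptions and placeholders, not settings confirmed in this thread.

    # Point R at the Spark installation (path assumed from this thread).
    Sys.setenv(SPARK_HOME = "/home/rstudio/spark")

    # In 1.4.0, SparkR ships inside the Spark distribution rather than on CRAN,
    # so add its R library directory to the search path before loading it.
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)

    # Initialize the SparkContext against the cluster; replace the placeholder
    # with the EC2 master's hostname.
    sc <- sparkR.init(master = "spark://<ec2-master-hostname>:7077")
    sqlContext <- sparkRSQL.init(sc)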