I would like to programmatically start a Spark cluster in EC2 from another app running in EC2, run my job, and then destroy the cluster. Launching a Spark EMR cluster with the SDK is easy enough, but I ran into two problems:

1) I was only able to retrieve the address of the master node from the EMR console, not via the SDK.

2) I was not able to connect to the master from my app after setting "spark://public_dns:7077" as the master in the SparkConf (where public_dns is the address listed for the cluster on the EMR console page). I kept getting "all masters are unresponsive" errors. A sketch of what I'm attempting is below.

In addition, the Amazon docs only describe running Spark jobs on EMR by ssh'ing to the master, launching a Spark shell, and running the jobs from there. Is it even possible to do this programmatically from another app, or do you have to log into the master and run jobs from the shell if you want to use Spark on Amazon EMR?
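For concreteness, here is roughly the kind of code I have in mind. It's only a sketch: the cluster ID and region are placeholders, and I'm assuming the EMR DescribeCluster call exposes the master address via getMasterPublicDnsName (which is what I expected to work instead of copying the DNS name from the console):

    import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
    import com.amazonaws.regions.{Region, Regions}
    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient
    import com.amazonaws.services.elasticmapreduce.model.DescribeClusterRequest
    import org.apache.spark.{SparkConf, SparkContext}

    object EmrSparkLauncher {
      def main(args: Array[String]): Unit = {
        // Placeholder cluster id, e.g. from the RunJobFlow result or the console
        val clusterId = "j-XXXXXXXXXXXXX"

        val emr = new AmazonElasticMapReduceClient(new DefaultAWSCredentialsProviderChain)
        emr.setRegion(Region.getRegion(Regions.US_EAST_1)) // assumption: cluster is in us-east-1

        // Ask EMR for the master's public DNS instead of reading it off the console
        val masterDns = emr
          .describeCluster(new DescribeClusterRequest().withClusterId(clusterId))
          .getCluster
          .getMasterPublicDnsName

        // This is the step that fails for me with "all masters are unresponsive"
        val conf = new SparkConf()
          .setAppName("remote-emr-job")
          .setMaster(s"spark://$masterDns:7077")
        val sc = new SparkContext(conf)

        println(sc.parallelize(1 to 100).sum())
        sc.stop()
      }
    }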
The second approach I tried was simply calling the spark-ec2 script from my app, passing the same parameters that I use to launch the cluster manually from the CLI. This fails because the boto ec2.connect call inside the script returns None when the script is invoked from my app (Scala/Java on Play), whereas it works perfectly when invoked from the CLI. Is there a recommended way to launch EC2 clusters dynamically from within an app running in EC2?
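In case it helps to see it concretely, this is roughly how I'm invoking the script (a sketch; the paths, key pair name, and cluster name are placeholders). I suspect the difference between the CLI and my app may be the environment the child process inherits, so here I pass the AWS credentials explicitly:

    import java.io.File
    import scala.sys.process._

    object SparkEc2Launcher {
      def launchCluster(): Int = {
        // Placeholder location of the spark-ec2 script on the app host
        val sparkEc2Dir = new File("/opt/spark/ec2")
        val cmd = Seq(
          "./spark-ec2",
          "--key-pair=my-keypair",                      // hypothetical key pair
          "--identity-file=/home/app/my-keypair.pem",   // hypothetical .pem path
          "--region=us-east-1",
          "--slaves=2",
          "launch", "my-spark-cluster"
        )

        // Pass credentials explicitly so the script doesn't depend on whatever
        // environment the Play process happened to be started with
        val env = Seq(
          "AWS_ACCESS_KEY_ID"     -> sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""),
          "AWS_SECRET_ACCESS_KEY" -> sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", "")
        )

        // Run the script, streaming its stdout/stderr into our logs
        Process(cmd, sparkEc2Dir, env: _*).!(ProcessLogger(println, Console.err.println))
      }
    }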