I think the driver binds to a random port by default, but this can be changed using the "spark.driver.port" system property. So you should be able to set that property and open only that port. See: http://spark.incubator.apache.org/docs/latest/configuration.html.
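For example, something like this should work (a sketch against the 0.8-era configuration style, where Spark settings are plain Java system properties; the port number and security-group names below are illustrative, not prescriptive):

```shell
# Pin the driver to a fixed port instead of a random one, so only that
# single port needs to be open for master -> driver traffic.
# Spark shells of this era pick up settings from Java system properties:
SPARK_JAVA_OPTS="-Dspark.driver.port=51000" ./spark-shell

# Then allow just that port from the cluster's security group to the
# web server's security group, e.g. with the EC2 API tools
# (both group names here are made up for the example):
ec2-authorize my-webserver-group -P tcp -p 51000 -o my-spark-cluster-group
```

I haven't verified this end-to-end, so treat the exact flags as a starting point rather than a recipe.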
On Tue, Nov 19, 2013 at 10:16 AM, Matt Cheah <[email protected]> wrote:

> I determined that it is a firewall issue. Allowing "all traffic" to the
> cluster where the shell was running hack-fixed the problem.
>
> That being said, what ports do I have to open to allow the Spark master
> to communicate back to the driver? I've heard this is required. And
> obviously allowing all traffic is bad…
>
> -Matt Cheah
>
> From: Aaron Davidson <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, November 18, 2013 8:28 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: EC2 node submit jobs to separate Spark Cluster
>
> The main issue with running a spark-shell locally is that it orchestrates
> the actual computation, so you want it to be "close" to the actual Worker
> nodes for latency reasons. Running a spark-shell on EC2 in the same region
> as the Spark cluster avoids this problem.
>
> The error you're seeing seems to indicate a different issue. Check the
> Master web UI (accessible on port 8080 at the master's IP address) to make
> sure that Workers are successfully registered and have the expected amount
> of memory available to Spark. You can also check how much memory your
> spark-shell is trying to get per executor. Two common problems are (1) an
> abandoned spark-shell is holding onto all of your cluster's resources, or
> (2) you've manually configured your spark-shell to try to get more memory
> than your Workers have available. Both of these should be visible in the
> web UI.
>
> On Mon, Nov 18, 2013 at 5:00 PM, Matt Cheah <[email protected]> wrote:
>
>> Hi,
>>
>> I'm working with an infrastructure that already has its own web server
>> set up on EC2. I would like to set up a *separate* Spark cluster on EC2
>> with the scripts and have the web server submit jobs to this Spark
>> cluster.
>>
>> Is it possible to do this? I'm getting some errors running the spark
>> shell on the web server: "Initial job has not accepted any resources;
>> check your cluster UI to ensure that workers are registered and have
>> sufficient memory". I have heard that it's not possible for any local
>> computer to connect to the spark cluster, but I was wondering if other
>> EC2 nodes could have their firewalls configured to allow this.
>>
>> We don't want to deploy the web server on the master node of the Spark
>> cluster.
>>
>> Thanks,
>>
>> -Matt Cheah
