I determined that it is a firewall issue. Allowing "all traffic" to the cluster 
where the shell was running hack-fixed the problem.

That being said, what ports do I have to open to allow the spark master to 
communicate back to the driver? I've heard this is required. And obviously 
allowing all traffic is bad…
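
For what it's worth, I believe the driver listens on a handful of ports that Spark otherwise picks at random, and that they can be pinned via configuration so a narrower firewall rule suffices. A sketch (property names are from the Spark configuration docs; I haven't verified these against our version, and some only exist in newer releases):

```
# A sketch, not verified on our setup: pin the driver-side ports Spark
# otherwise chooses at random, then open only these to the cluster.
# Set as Java system properties (-Dspark.driver.port=...) on the driver,
# or in spark-defaults.conf in later Spark versions.
spark.driver.port        51000   # Master/executors connect back to the driver here
spark.fileserver.port    51001   # driver's HTTP file server for jars/files
spark.blockManager.port  51002   # block transfers between driver and executors
spark.ui.port            4040    # driver web UI (optional to expose)
```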

-Matt Cheah

From: Aaron Davidson <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, November 18, 2013 8:28 PM
To: "[email protected]" <[email protected]>

Subject: Re: EC2 node submit jobs to separate Spark Cluster

The main issue with running a spark-shell locally is that it orchestrates the 
actual computation, so you want it to be "close" to the actual Worker nodes for 
latency reasons. Running a spark-shell on EC2 in the same region as the Spark 
cluster avoids this problem.

The error you're seeing seems to indicate a different issue. Check the Master 
web UI (accessible on port 8080 at the master's IP address) to make sure that 
Workers are successfully registered and they have the expected amount of memory 
available to Spark. You can also check to see how much memory your spark-shell 
is trying to get per executor. A couple common problems are (1) an abandoned 
spark-shell is holding onto all of your cluster's resources or (2) you've 
manually configured your spark-shell to try to get more memory than your 
Workers have available. Both of these should be visible in the web UI.
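
If it's the second case, you can dial down the per-executor request when launching the shell. For example (SPARK_MEM is the 0.8-era knob; I believe later versions use the spark.executor.memory property instead):

```
# A sketch: request less memory per executor than the Workers advertise
# in the Master web UI (port 8080). SPARK_MEM is the Spark 0.8-era
# setting; newer versions read the spark.executor.memory property.
SPARK_MEM=2g ./spark-shell
```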


On Mon, Nov 18, 2013 at 5:00 PM, Matt Cheah <[email protected]> wrote:
Hi,

I'm working with an infrastructure that already has its own web server set up 
on EC2. I would like to set up a separate spark cluster on EC2 with the scripts 
and have the web server submit jobs to this spark cluster.

Is it possible to do this? I'm getting errors running the spark shell on the web 
server: "Initial job has not accepted any resources; 
check your cluster UI to ensure that workers are registered and have sufficient 
memory". I have heard that it's not possible for any local computer to connect 
to the spark cluster, but I was wondering if other EC2 nodes could have their 
firewalls configured to allow this.

We don't want to deploy the web server on the master node of the spark cluster.

Thanks,

-Matt Cheah
